Wiki page
[input] by
mario
2014-03-20 01:02:31.
<h2> new input() </h2>
The <kbd>[input](finfo/php7/input.php)</kbd> class wraps the superglobals `$_REQUEST`, `$_GET`, `$_POST`, `$_SERVER` and `$_COOKIE`. It provides streamlined sanitization with unobtrusive filter names and a unique semi-fluent syntax:
<code><pre>
$_REQUEST<mark style="background: linear-gradient(#f7f6f5,#f7e655,#f7f6f5); color:#fa3">->text</mark>["content"]
</pre></code>
Filtering functions can also be chained, as in `$_GET->text->html["title"]`. There are various whitelisting and sanitizing methods for that.
* Addresses input constraint validation at the earliest feasible entry point.
* Unifies access through a central verification mechanism.
* Allows reliable, immediate interpolation of input into many target contexts.
Additionally, it can still shadow and audit casual, unverified accesses. Its overall API simplicity is meant to *encourage* safety through minimal effort.
<h2> Available filters </h2>
There's a wide range of built-in methods. Often the basic filters are sufficient and best suited for combination.
<table>
<colgroup width="21%"></colgroup>
<colgroup width="11%"></colgroup>
<colgroup width="22%"></colgroup>
<colgroup width="56%"></colgroup>
<tr>
<th>Method</th>
<th>Type</th>
<th>Sample</th>
<th>Usage</th>
</tr>
<tr>
<td>int</td>
<td>cast</td>
<td>123</td>
<td>Only numeric characters, cast to integer.</td>
</tr>
<tr>
<td>name</td>
<td>white</td>
<td>abc12_x3</td>
<td>Alphanumeric symbols only.</td>
</tr>
<tr>
<td>id</td>
<td>white</td>
<td>xy_2.1</td>
<td>Alphanumeric chars, dot and underscore.</td>
</tr>
<tr>
<td>words</td>
<td>white</td>
<td>abc def</td>
<td>Text with minimal punctuation (only spaces allowed).</td>
</tr>
<tr>
<td>text</td>
<td>white</td>
<td>Hello, World!</td>
<td>Common natural text with basic punctuation (including quotes, but no &lt; &gt;).</td>
</tr>
<tr>
<td>filename</td>
<td>filter</td>
<td>basename.txt</td>
<td>Replace all non-alphanumeric characters with underscores.</td>
</tr>
<tr>
<td>float</td>
<td>cast</td>
<td>3.14159</td>
<td>Cast to float.</td>
</tr>
<tr>
<td>boolean</td>
<td>cast</td>
<td>true, false</td>
<td>Converts "false/true" or "0/1" or "off/on" and "no/yes" to boolean.</td>
</tr>
<tr>
<td>ascii</td>
<td>white</td>
<td>Aa#*:",\n\0~</td>
<td>Characters in the ASCII range 0 .. 127</td>
</tr>
<tr>
<td>nocontrol</td>
<td>white</td>
<td>Aa#*:",\n~</td>
<td>Filters out control characters (&lt; 32), except \r \n \t.</td>
</tr>
<tr>
<td>spaces</td>
<td>filter</td>
<td>Single line</td>
<td>Turns linebreaks / whitespace (\r \n \t) into spaces only.</td>
</tr>
<tr>
<td>q</td>
<td>black</td>
<td>\"value\"</td>
<td>Shorthand for <code>addslashes</code>.</td>
</tr>
<tr>
<td>escape</td>
<td>black</td>
<td>\ []"{}'$`!´&amp;?/&gt;&lt;|*~;^</td>
<td>Broader escaping of well-known meta characters (quotes and regex).</td>
</tr>
<tr>
<td>html</td>
<td>filter</td>
<td>&amp;</td>
<td>htmlspecialchars (shorthand)</td>
</tr>
<tr>
<th>Structural</th>
<th colspan=3>Following filters constrain specific input formats.</th>
</tr>
<tr>
<td>datetime</td>
<td>white</td>
<td>1999-12-31T23:59:59Z</td>
<td>HTML5 datetime values</td>
</tr>
<tr>
<td>date</td>
<td>white</td>
<td>2015-07-17</td>
<td>Just date string.</td>
</tr>
<tr>
<td>time</td>
<td>white</td>
<td>23:59:20.17</td>
<td>Time specifier.</td>
</tr>
<tr>
<td>color</td>
<td>white</td>
<td>#FF5022</td>
<td>Hex color value.</td>
</tr>
<tr>
<td>tel</td>
<td>white</td>
<td>+1-347-2214144</td>
<td>International-format telephone number.</td>
</tr>
<tr>
<td>iconv</td>
<td>filter</td>
<td><br></td>
<td>Convert input to UTF-8</td>
</tr>
<tr>
<td>utf7</td>
<td>black</td>
<td><br></td>
<td>Filter some UTF-7 out.</td>
</tr>
<tr>
<td>ip</td>
<td>white</td>
<td>::1</td>
<td>IPv4 or IPv6 address</td>
</tr>
<tr>
<td>ipv4</td>
<td>white</td>
<td>134.22.7.207</td>
<td>IPv4 address only</td>
</tr>
<tr>
<td>public</td>
<td>white</td>
<td>8.8.4.4</td>
<td>Validate IP to be public.</td>
</tr>
<tr>
<td>email</td>
<td>white</td>
<td>you@gmail.com</td>
<td>Syntactically valid email address.</td>
</tr>
<tr>
<td>url</td>
<td>white</td>
<td><br></td>
<td>Ensure URL syntax xxx:///</td>
</tr>
<tr>
<td>http</td>
<td>white</td>
<td>http://localhost/</td>
<td>More conservative http:// URL constraint.</td>
</tr>
<tr>
<td>uri</td>
<td>white</td>
<td><br></td>
<td>More generic URI syntax.</td>
</tr>
<tr>
<td>xml</td>
<td>cast</td>
<td><br></td>
<td>Create a SimpleXML object from input.</td>
</tr>
<tr>
<td>json</td>
<td>cast</td>
<td>{"key":"value"}</td>
<td>json_decode()</td>
</tr>
<tr>
<td>purify</td>
<td>filter</td>
<td><b>basic</b></td>
<td>Utilizes HTMLPurifier</td>
</tr>
<tr>
<th>NOP</th>
<th colspan=3>Virtual / control filters.</th>
</tr>
<tr>
<td>log</td>
<td>control</td>
<td><br></td>
<td>Raw value access with logging.</td>
</tr>
<tr>
<td>raw</td>
<td>control</td>
<td><br></td>
<td>Raw access with E_NOTICE (is the default).</td>
</tr>
<tr>
<td>disallow</td>
<td>control</td>
<td><br></td>
<td>Disallow unfiltered variable access (configurable per INPUT_DIRECT).</td>
</tr>
<tr>
<td>is</td>
<td>control</td>
<td><br></td>
<td>A meta filter that applies the following filter chain, then checks whether the content would have passed unaffected. Returns a boolean indicating whether all constraints were matched.</td>
</tr>
<tr>
<th><b>Parameterized</b></th>
<th colspan=3>These filters require method access <code>$_GET->default("id", "index")</code> instead of the plain array key syntax.</th>
</tr>
<tr>
<td>length(ID, 20)</td>
<td>filter</td>
<td>Hello Wo</td>
<td>Cuts strings to maximum given length.</td>
</tr>
<tr>
<td>range(ID, 1, 15)</td>
<td>white</td>
<td>17</td>
<td>Constrains numeric input to the given range.</td>
</tr>
<tr>
<td>default</td>
<td>filter</td>
<td>…</td>
<td>Uses default value, if no input present.</td>
</tr>
<tr>
<td>regex</td>
<td>white/black</td>
<td>…</td>
<td>Custom regular expression method <code>->regex("field", "/(abc)/")</code></td>
</tr>
<tr>
<td>in_array</td>
<td>white</td>
<td>a,b,c</td>
<td>Can be used with an array parameter, or a simpler comma-separated list of allowed values.</td>
</tr>
<tr>
<td><br></td>
<td><br></td>
<td><br></td>
<td><br></td>
</tr>
<tr>
<th>Multi-Apply</th>
<th colspan=3>Following filters work on a set of input variables, instead of a single one.</th>
</tr>
<tr>
<td>array</td>
<td>control</td>
<td><br></td>
<td>Is automatically applied to input subarrays, so filters are run on each entry.</td>
</tr>
<tr>
<td>list</td>
<td>control</td>
<td><br></td>
<td>Combine multiple input variables per name (comma-separated list) and apply filtering collectively; finally return a named result array.</td>
</tr>
<tr>
<td>multi</td>
<td>control</td>
<td><br></td>
<td>Also grabs a list of input variables. However, it does not run filters on the scalars within, but passes the combined set to the filter functions. This is used in combination with e.g. <code>http_build_query</code>.</td>
</tr>
<tr>
<th>Global functions</th>
<th colspan=3><br></th>
</tr>
<tr>
<td>strtolower</td>
<td>filter</td>
<td><br></td>
<td rowspan=3>Any global function can actually be chained. It just needs to accept one parameter (the input string), modify it, and return a value. Custom userland functions can thus be utilized.</td>
</tr>
<tr>
<td>urlencode</td>
<td>filter</td>
<td><br></td>
</tr>
<tr>
<td>strip_tags</td>
<td>filter</td>
<td><br></td>
</tr>
<tr>
<td><br></td>
<td><br></td>
<td><br></td>
<td><br></td>
</tr>
<tr>
<th>Ill-advised filters</th>
<th colspan=3>Care should be taken here. Liberal application will lead to a false sense of security.</th>
</tr>
<tr>
<td>sql</td>
<td>filter</td>
<td><br></td>
<td>Configurable <code>PDO::quote</code> shorthand.</td>
</tr>
<tr>
<td>mysql</td>
<td>filter</td>
<td><br></td>
<td>Shorthand to <code>mysql_real_escape_string</code> (doubly discouraged).</td>
</tr>
<tr>
<td>xss</td>
<td>black</td>
<td><br></td>
<td>Minimal XSS blacklist</td>
</tr>
</table>
As mentioned, any global function can be utilized implicitly. A few [core string functions](http://php.net/strings) are useful in this context, but the intended targets are custom functions.
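To illustrate how plain global functions fit into a chain, and what the `is` meta filter from the table checks, here is a minimal sketch written for PHP 8. The helper `passes_unchanged()` is hypothetical and not part of `input.php`; it only mimics the idea of running a chain and testing whether the value survived unaltered.

```php
<?php
// Hypothetical helper, NOT part of input.php: mimics the idea behind `->is`.
function passes_unchanged(string $value, callable ...$filters): bool
{
    $filtered = $value;
    foreach ($filters as $filter) {
        $filtered = $filter($filtered);   // run each filter in order
    }
    // The input "passes" only if no filter had to alter it.
    return $filtered === $value;
}

$clean = passes_unchanged("hello world", "trim", "strip_tags");
$dirty = passes_unchanged("<b>hello</b>", "strip_tags");

var_dump($clean);  // bool(true)
var_dump($dirty);  // bool(false)
```

Any function that takes one string and returns one string slots into such a chain, which is why custom userland functions work as filters.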
<h3> Binding filters </h3>
One can even *bind* new functions or class methods using:
<code><pre>
$_GET->_filtername = array("AppFilter", "validSessionID");
</pre></code>
It is imperative, however, to shadow the filter names with an underscore `_` prefix when binding. See `input.inspekt.php` for examples. Bound filters can still be chained:
<code><pre>
$_GET->text->validSessionID["var"]
</pre></code>
(Btw, to use some of the `input` filter methods statically and outside of their scope, one could use `$value = input::_datetime($value);` for instance.)
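How such binding might work internally can be sketched as follows (PHP 8). This is an assumption-laden illustration, not the actual `input.php` code: the class `BindableInput`, the body of `AppFilter::validSessionID`, and the 32-digit hex rule are all made up for the example.

```php
<?php
// Hypothetical sketch; names and behaviour are assumptions, not the real input.php.
class AppFilter
{
    // Accept only 32-digit lowercase hex session IDs, otherwise return "".
    public static function validSessionID(string $value): string
    {
        return preg_match('/^[a-f0-9]{32}$/', $value) ? $value : "";
    }
}

class BindableInput implements ArrayAccess
{
    private array $vars;
    private array $bound = [];   // filter name => callable
    private array $chain = [];   // callables queued for the next access

    public function __construct(array $vars) { $this->vars = $vars; }

    // $in->_name = callable; the "_" prefix marks a filter registration.
    public function __set(string $name, mixed $callable): void
    {
        if ($name[0] !== "_" || !is_callable($callable)) {
            throw new InvalidArgumentException("Use \$in->_filtername = callable");
        }
        $this->bound[substr($name, 1)] = $callable;
    }

    // $in->name queues a previously bound filter and allows further chaining.
    public function __get(string $name): mixed
    {
        if (!isset($this->bound[$name])) {
            throw new InvalidArgumentException("Unknown filter: $name");
        }
        $this->chain[] = $this->bound[$name];
        return $this;
    }

    // ["key"] access runs the queued filters on the named variable.
    public function offsetGet(mixed $key): mixed
    {
        $value = (string) ($this->vars[$key] ?? "");
        foreach ($this->chain as $callable) {
            $value = call_user_func($callable, $value);
        }
        $this->chain = [];
        return $value;
    }

    public function offsetExists(mixed $key): bool { return isset($this->vars[$key]); }
    public function offsetSet(mixed $key, mixed $value): void { $this->vars[$key] = $value; }
    public function offsetUnset(mixed $key): void { unset($this->vars[$key]); }
}

$get = new BindableInput(["sid" => "0123456789abcdef0123456789abcdef", "bad" => "../etc"]);
$get->_validSessionID = array("AppFilter", "validSessionID");
$get->_trim = "trim";

$sid = $get->validSessionID["sid"];         // passes the whitelist unchanged
$bad = $get->trim->validSessionID["bad"];   // rejected, becomes ""
```

The underscore prefix keeps registration (`__set`) and chaining (`__get`) apart, so a bound name cannot be confused with a filter lookup.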
<h3> Complex filters </h3>
With `->list` and `->multi` you can utilize some craftier features. For instance:
<code><pre>
$_GET->multi->http_build_query["id,name,title"]
</pre></code>
This rebuilds a URL-encoded string from three input variables.
<h2> Wrapper implementation </h2>
Basically, the filters are initialized for all superglobals like this:
<code><pre>
$_GET = new input($_GET);
</pre></code>
The original variables are stored in `->__vars[]` internally. Each `$_GET->filtername` pseudo-method access is accumulated in a filter chain.
The first array `["key"]` or method `("key")` request applies the filter chain to the named input variable, then returns the constrained value.
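The mechanism described above can be approximated in a few lines (PHP 8). This is a stripped-down sketch under stated assumptions, not the real implementation: `MiniInput`, its three private filters, and the fallback to global functions are simplified stand-ins.

```php
<?php
// Hypothetical, stripped-down illustration; NOT the real input.php.
class MiniInput implements ArrayAccess
{
    private array $vars;          // original input values
    private array $chain = [];    // filter names collected so far

    public function __construct(array $vars) { $this->vars = $vars; }

    // $in->int, $in->html, ...: remember the filter name, return $this for chaining.
    public function __get(string $filter): mixed
    {
        $this->chain[] = $filter;
        return $this;
    }

    // $in->html("key"): method-style access, filter name plus variable name.
    public function __call(string $filter, array $args): mixed
    {
        $this->chain[] = $filter;
        return $this->offsetGet($args[0]);
    }

    // ["key"] access applies the accumulated chain, then resets it.
    public function offsetGet(mixed $key): mixed
    {
        $value = $this->vars[$key] ?? "";
        foreach ($this->chain as $filter) {
            $value = $this->apply($filter, $value);
        }
        $this->chain = [];
        return $value;
    }

    public function offsetExists(mixed $key): bool { return isset($this->vars[$key]); }
    public function offsetSet(mixed $key, mixed $value): void { $this->vars[$key] = $value; }
    public function offsetUnset(mixed $key): void { unset($this->vars[$key]); }

    // Built-in "_name" methods take precedence; otherwise try a global function.
    private function apply(string $filter, mixed $value): mixed
    {
        $own = "_" . $filter;
        if (method_exists($this, $own)) {
            return $this->$own($value);
        }
        if (function_exists($filter)) {
            return $filter($value);
        }
        throw new InvalidArgumentException("Unknown filter: $filter");
    }

    private function _int(mixed $v): int { return (int) preg_replace('/\D/', '', (string) $v); }
    private function _name(mixed $v): string { return preg_replace('/\W+/', '', (string) $v); }
    private function _html(mixed $v): string { return htmlspecialchars((string) $v); }
}

$get = new MiniInput(["id" => "42abc", "title" => "<b>Hi</b>", "user" => "Bob!"]);

$id    = $get->int["id"];                 // 42
$user  = $get->name->strtolower["user"];  // "bob"
$title = $get->html("title");             // "&lt;b&gt;Hi&lt;/b&gt;"
```

Property access only records names, so nothing is filtered until a key is actually requested; that is what makes the syntax "semi-fluent".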
<h2> Filter chain defaults </h2>
It's possible to define a default filter for remaining `$_GET["old"]` accesses with the <b><code>INPUT_DIRECT</code></b> constant.
* By default it uses "raw", which just prints a notice.
* It can also be set to "disallow" to prevent such uses.
* Another alternative would be "q" to emulate magic quotes (not recommended).
* Or using "sql" to securely use `$_POST["fields"]` in SQL strings, if that's the default target (also not recommended).
Another option is to predefine a filter chain on a particular superglobal with `->always()`:
<code><pre>
$_POST->xss->nocontrol->always();
</pre></code>
Then any plain `$_POST["access"]` would still use these filters. Additional, more context-specific filters can still be intermixed.
It's equivalent to having the filter chain built up before accessing an entry:
<code><pre>
$_GET->filter->name->and->more;
$_GET["var"]
</pre></code>
Btw, to reset a default filter chain, use `->__always = array();`.
<h3> Predeclaring filters for raw access </h3>
While this somewhat amounts to **magic_quotes 2.0**, you can also pre-define filter chains on a variable name basis:
<code><pre>
$_GET->__rules["old_id"] = array("int", array());
</pre></code>
This is suitable for bolting a minimum of safety onto old code whose data flow is structurally hard to fix otherwise.
<h2> Differences to plain <code>$_GET</code> / <code>$_POST</code> / <code>$_REQUEST</code> </h2>
Because the whole <code>ArrayAccess</code> and <code>Iterator</code> interfaces are implemented, it's easy to transition existing code to <code>new input()</code>. There are few behavioural discrepancies.<br><br>
One thing that won't work for example is the common / olden idiom:
<code><pre>
if ($_POST) {
</pre></code>
To probe for presence of input data, one should check one of the keys, or rather:
<code><pre>
if (count($_POST)) {
</pre></code>
Which has the same effect.
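The reason lies in PHP's type juggling: an object converts to `true` in a boolean context no matter what it holds, whereas `count()` asks the object itself via `Countable`. A minimal demonstration, using a hypothetical `EmptyBag` class as a stand-in for an input wrapper without any variables:

```php
<?php
// Hypothetical stand-in for an input wrapper holding no variables.
class EmptyBag implements Countable
{
    private array $vars = [];

    public function count(): int
    {
        return count($this->vars);
    }
}

$post = new EmptyBag();

$truthy  = (bool) $post;   // objects are truthy, even when "empty"
$entries = count($post);   // Countable::count() reports the real size

var_dump($truthy);   // bool(true)
var_dump($entries);  // int(0)
```

So `if ($_POST)` would always enter the branch once the superglobal is wrapped, while `if (count($_POST))` keeps the original meaning.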
<h3> Methods <code>->has()</code>, <code>->no()</code>, <code>->keys()</code> </h3>
These three convenience methods make some idioms more readable. Instead of testing for <code>isset($_GET["key"])</code> one can now write <code>$_GET->has("key")</code>, or probe for the opposite with <code>$_GET->no("sleep")</code>.
<p>In place of <code>array_keys()</code> there's now <code>$_REQUEST->keys()</code>, also slightly shorter.</p>
<h2> Notice emission </h2>
Syntactic salt à la `isset($_GET["id"]) ? $_GET["id"] : ""` for silent value substitution has become commonplace.
It's made redundant here, because `input{}` itself already probes for the existence of variables. Notices for absent values are only generated afterwards, and only if requested. Thus they can be re-enabled when needed, unlike with the `isset` and `?:` suppression syntax.
`INPUT_DIRECT` controls the default filter for `$_GET["raw"]` access. If it's set to `raw`, that filter engages, and `raw` honors `INPUT_SILENCE`: by default it still emits useful notices; if `INPUT_SILENCE` is set to `1`, it no longer does.
Rewritten code can default to `$_REQUEST->raw->default("id", 123)` instead. This provides the default value substitution while still permitting notices to be brought back for debugging.
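The difference to `isset() ? :` can be sketched like this (PHP 8). The function `fetch_raw()` and its `$silence` parameter are hypothetical and merely stand in for what `raw`, `default` and `INPUT_SILENCE` are described as doing above; the real class may behave differently in detail.

```php
<?php
// Hypothetical helper illustrating the idea; the real class consults INPUT_SILENCE.
function fetch_raw(array $vars, string $key, mixed $default = null, bool $silence = false): mixed
{
    if (array_key_exists($key, $vars)) {
        return $vars[$key];
    }
    // The existence check happened first, so the notice is a deliberate, switchable step.
    if (!$silence) {
        trigger_error("Undefined input variable '$key'", E_USER_NOTICE);
    }
    return $default;
}

// Collect notices instead of printing them, so the effect is observable.
$notices = [];
set_error_handler(function (int $no, string $msg) use (&$notices): bool {
    $notices[] = $msg;
    return true;
}, E_USER_NOTICE);

$id    = fetch_raw([], "id", 123);         // default substituted, notice recorded
$quiet = fetch_raw([], "id", 123, true);   // same default, no notice

restore_error_handler();
```

Both calls yield the default value, but only the first leaves a trace, which is the debugging hook that hard-coded `isset` ternaries remove for good.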
<h2> Closing remarks </h2>
Using such an input filter **does not mean one can forgo database escaping** et al. It just adds another layer of format constraining, and thus security, on top.
And it's a very simple and convenient layer. (Complexity seldom helps with that.)