PHP utility collection with hybrid and fluent APIs.

Wiki page [input] by mario 2014-03-20 01:02:31.
<h2> new input() </h2>

The <kbd>[input](finfo/php7/input.php)</kbd> class wraps the superglobals `$_REQUEST`, `$_GET`, `$_POST`, `$_SERVER` and `$_COOKIE`. It provides streamlined sanitization with unobtrusive filter names and a unique semi-fluent syntax:

<code><pre>
    $_REQUEST<mark style="background: linear-gradient(#f7f6f5,#f7e655,#f7f6f5); color:#fa3">-&gt;text</mark>["content"]
</pre></code>

Filtering functions can also be chained, as in `$_GET->text->html["title"]`. There are various whitelisting and sanitizing methods for that.

  *  Validates input constraints at the earliest feasible entry point.

  *  Unifies access through a central verification mechanism.

  *  Allows filtered input to be interpolated directly into many target contexts.

Additionally it can still shadow/audit casual and unverified accesses. Its overall API simplicity is meant to *encourage* safety through minimal effort.



<h2> Available filters </h2>

There's a wide range of built-in methods. Often the basic filters are sufficient and best suited for combination.

<table>
<colgroup width="21%"></colgroup>
<colgroup width="11%"></colgroup>
<colgroup width="22%"></colgroup>
<colgroup width="46%"></colgroup>
<tr>
	<th>Method</th>
	<th>Type</th>
	<th>Sample</th>
	<th>Usage</th>
</tr>
<tr>
	<td>int</td>
	<td>cast</td>
	<td>123</td>
	<td>Only numeric characters, cast to integer.</td>
</tr>
<tr>
	<td>name</td>
	<td>white</td>
	<td>abc12_x3</td>
	<td>Alphanumeric symbols only.</td>
</tr>
<tr>
	<td>id</td>
	<td>white</td>
	<td>xy_2.1</td>
	<td>Alphanumeric chars, dot and underscore.</td>
</tr>
<tr>
	<td>words</td>
	<td>white</td>
	<td>abc def</td>
	<td>Text with minimal punctuation (only spaces allowed).</td>
</tr>
<tr>
	<td>text</td>
	<td>white</td>
	<td>Hello, World!</td>
	<td>Common natural text with basic punctuation (including quotes, but no &lt; &gt;).</td>
</tr>
<tr>
	<td>filename</td>
	<td>filter</td>
	<td>basename.txt</td>
	<td>Replace all non-alphanumeric characters with underscores.</td>
</tr>
<tr>
	<td>float</td>
	<td>cast</td>
	<td>3.14159</td>
	<td>Cast to float.</td>
</tr>
<tr>
	<td>boolean</td>
	<td>cast</td>
	<td>true, false</td>
	<td>Converts &quot;false/true&quot; or &quot;0/1&quot; or &quot;off/on&quot; and &quot;no/yes&quot; to boolean.</td>
</tr>
<tr>
	<td>ascii</td>
	<td>white</td>
	<td>Aa#*:&quot;,\n\0~</td>
	<td>Characters in the ASCII range 0 .. 127</td>
</tr>
<tr>
	<td>nocontrol</td>
	<td>white</td>
	<td>Aa#*:&quot;,\n~</td>
	<td>Filters out control characters (&lt; 32), except \r \n \t.</td>
</tr>
<tr>
	<td>spaces</td>
	<td>filter</td>
	<td>Single line</td>
	<td>Turns linebreaks / whitespace (\r \n \t) into spaces only.</td>
</tr>
<tr>
	<td>q</td>
	<td>black</td>
	<td>\&quot;value\&quot;</td>
	<td>Shorthand for <code>addslashes</code>.</td>
</tr>
<tr>
	<td>escape</td>
	<td>black</td>
	<td>\ []&quot;{}'$`!´&amp;?/&gt;&lt;|*~;^</td>
	<td>Broader escaping of well-known meta characters (quotes and regex).</td>
</tr>
<tr>
	<td>html</td>
	<td>filter</td>
	<td>&amp;amp;</td>
	<td>htmlspecialchars (shorthand)</td>
</tr>
<tr>
	<th>Structural</th>
	<th colspan=3>Following filters constrain specific input formats.</th>
</tr>
<tr>
	<td>datetime</td>
	<td>white</td>
	<td>1999-12-31T23:59:59Z</td>
	<td>HTML5 datetime values</td>
</tr>
<tr>
	<td>date</td>
	<td>white</td>
	<td>2015-07-17</td>
	<td>Just date string.</td>
</tr>
<tr>
	<td>time</td>
	<td>white</td>
	<td>23:59:20.17</td>
	<td>Time specifier.</td>
</tr>
<tr>
	<td>color</td>
	<td>white</td>
	<td>#FF5022</td>
	<td>Hex color value.</td>
</tr>
<tr>
	<td>tel</td>
	<td>white</td>
	<td>+1-347-2214144</td>
	<td>International-format telephone number.</td>
</tr>
<tr>
	<td>iconv</td>
	<td>filter</td>
	<td><br></td>
	<td>Convert input to UTF-8</td>
</tr>
<tr>
	<td>utf7</td>
	<td>black</td>
	<td><br></td>
	<td>Filter some UTF-7 out.</td>
</tr>
<tr>
	<td>ip</td>
	<td>white</td>
	<td>::1</td>
	<td>IPv4 or IPv6 address</td>
</tr>
<tr>
	<td>ipv4</td>
	<td>white</td>
	<td>134.22.7.207</td>
	<td>IPv4 address only</td>
</tr>
<tr>
	<td>public</td>
	<td>white</td>
	<td>8.8.4.4</td>
	<td>Validate IP to be public.</td>
</tr>
<tr>
	<td>email</td>
	<td>white</td>
	<td>you@gmail.com</td>
	<td>Syntactically valid email address.</td>
</tr>
<tr>
	<td>url</td>
	<td>white</td>
	<td><br></td>
	<td>Ensure URL syntax xxx:///</td>
</tr>
<tr>
	<td>http</td>
	<td>white</td>
	<td>http://localhost/</td>
	<td>More conservative http:// URL constraint.</td>
</tr>
<tr>
	<td>uri</td>
	<td>white</td>
	<td><br></td>
	<td>More generic URI syntax.</td>
</tr>
<tr>
	<td>xml</td>
	<td>cast</td>
	<td><br></td>
	<td>Create a SimpleXML object from input.</td>
</tr>
<tr>
	<td>json</td>
	<td>cast</td>
	<td>{&quot;key&quot;:&quot;value&quot;}</td>
	<td>json_decode()</td>
</tr>
<tr>
	<td>purify</td>
	<td>filter</td>
	<td>&lt;b&gt;basic&lt;/b&gt;</td>
	<td>Utilizes HTMLPurifier</td>
</tr>
<tr>
	<th>NOP</th>
	<th colspan=3>Virtual / control filters.</th>
</tr>
<tr>
	<td>log</td>
	<td>control</td>
	<td><br></td>
	<td>Raw value access with logging.</td>
</tr>
<tr>
	<td>raw</td>
	<td>control</td>
	<td><br></td>
	<td>Raw access with E_NOTICE (the default).</td>
</tr>
<tr>
	<td>disallow</td>
	<td>control</td>
	<td><br></td>
	<td>Disallow unfiltered variable access (configurable via INPUT_DIRECT).</td>
</tr>
<tr>
	<td>is</td>
	<td>control</td>
	<td><br></td>
	<td>Meta filter that applies the filter chain following it, then checks whether the content would have passed unaffected. Returns a boolean indicating whether all constraints matched.</td>
</tr>
<tr>
	<th><b>Parameterized</b></th>
	<th colspan=3>These filters require method access <code>$_GET-&gt;default(&quot;id&quot;, &quot;index&quot;)</code> instead of the plain array key syntax.</th>
</tr>
<tr>
	<td>length(ID, 20)</td>
	<td>filter</td>
	<td>Hello Wo</td>
	<td>Cuts strings to maximum given length.</td>
</tr>
<tr>
	<td>range(ID, 1, 15)</td>
	<td>white</td>
	<td>17</td>
	<td>Constrains numeric input to the given range.</td>
</tr>
<tr>
	<td>default</td>
	<td>filter</td>
	<td>…</td>
	<td>Uses default value, if no input present.</td>
</tr>
<tr>
	<td>regex</td>
	<td>white/black</td>
	<td>…</td>
	<td>Custom regular expression method <code>-&gt;regex(&quot;field&quot;, &quot;/(abc)/&quot;)</code></td>
</tr>
<tr>
	<td>in_array</td>
	<td>white</td>
	<td>a,b,c</td>
	<td>Can be used with an array parameter, or a simpler comma-separated list of allowed values.</td>
</tr>
<tr>
	<td><br></td>
	<td><br></td>
	<td><br></td>
	<td><br></td>
</tr>
<tr>
	<th>Multi-Apply</th>
	<th colspan=3>Following filters work on a set of input variables, instead of a single one.</th>
</tr>
<tr>
	<td>array</td>
	<td>control</td>
	<td><br></td>
	<td>Is automatically applied to input subarrays, so filters are run on each entry.</td>
</tr>
<tr>
	<td>list</td>
	<td>control</td>
	<td><br></td>
	<td>Combine multiple input variables per name (comma-separated list) and apply filtering collectively; finally return a named result array.</td>
</tr>
<tr>
	<td>multi</td>
	<td>control</td>
	<td><br></td>
	<td>Also grabs a list of input variables. It does not run filters on the scalars within, but passes the combined set to the filter functions. This is used in combination with e.g. <code>http_build_query</code></td>
</tr>
<tr>
	<th>Global functions</th>
	<th colspan=3><br></th>
</tr>
<tr>
	<td>strtolower</td>
	<td>filter</td>
	<td><br></td>
	<td rowspan=3>Any global function can be chained. It just needs to accept one parameter (the input string) and return the modified value. Custom userland functions can thus be utilized.</td>
</tr>
<tr>
	<td>urlencode</td>
	<td>filter</td>
	<td><br></td>
	</tr>
<tr>
	<td>strip_tags</td>
	<td>filter</td>
	<td><br></td>
	</tr>
<tr>
	<td><br></td>
	<td><br></td>
	<td><br></td>
	<td><br></td>
</tr>
<tr>
	<th>Ill-advised filters</th>
	<th colspan=3>Care should be taken here. Liberal application will lead to a false sense of security.</th>
</tr>
<tr>
	<td>sql</td>
	<td>filter</td>
	<td><br></td>
	<td>Configurable <code>PDO::quote</code> shorthand.</td>
</tr>
<tr>
	<td>mysql</td>
	<td>filter</td>
	<td><br></td>
	<td>Shorthand to <code>mysql_real_escape_string</code> (doubly discouraged).</td>
</tr>
<tr>
	<td>xss</td>
	<td>black</td>
	<td><br></td>
	<td>Minimal XSS blacklist</td>
</tr>
</table>
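
To make the filter types more concrete, here is a rough sketch of how a `cast` filter, a `white` filter and the `is` meta filter might behave. The `sketch_*` function names and the regular expressions are assumptions for illustration; the actual patterns in `input.php` may differ.

```php
<?php
// Hypothetical stand-ins for three table entries, not the library's code.

// "int" (cast): keep digits only, then cast to integer.
function sketch_int($value) {
    return (int) preg_replace('/[^0-9]+/', '', (string) $value);
}

// "name" (white): keep alphanumeric characters and underscores only.
function sketch_name($value) {
    return preg_replace('/[^A-Za-z0-9_]+/', '', (string) $value);
}

// "is" (control): run a filter and report whether the value passed unchanged.
function sketch_is($filter, $value) {
    return (string) $filter($value) === (string) $value;
}

var_dump(sketch_int("12a3"));                    // int(123)
var_dump(sketch_name("abc12_x3;"));              // string(8) "abc12_x3"
var_dump(sketch_is("sketch_name", "abc12_x3"));  // bool(true)
var_dump(sketch_is("sketch_name", "abc 12"));    // bool(false)
```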


As mentioned, any global function can be utilized implicitly. A few [core string functions](http://php.net/strings) are useful in this context. But the intended target is custom functions.

<h3> Binding filters </h3>

One can even *bind* new functions or class methods using:

     $_GET->_filtername = array("AppFilter", "validSessionID");

It's imperative to shadow the filter names with an underscore `_` prefix, however; see `input.inspekt.php` for examples. A filter bound as `_validSessionID` can then still be chained under its plain name:

     $_GET->text->validSessionID["var"]

(Btw, to use some of the `input` filter methods statically and outside of their scope, one could use `$value = input::_datetime($value);` for instance.)
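
The bound value is an ordinary PHP callable. As a sketch, the class below gives the `AppFilter::validSessionID` callback a body; the 32-hex-character rule is purely an assumption for illustration, not something the library prescribes.

```php
<?php
// Hypothetical implementation of the callback named in the binding example.
class AppFilter
{
    // Accept exactly 32 lowercase hex characters, else return an empty string.
    public static function validSessionID($value)
    {
        return preg_match('/\A[0-9a-f]{32}\z/', (string) $value) ? $value : "";
    }
}

// The same array("Class", "method") form can be invoked directly.
$callback = array("AppFilter", "validSessionID");
var_dump(call_user_func($callback, "0123456789abcdef0123456789abcdef"));
var_dump(call_user_func($callback, "not-a-session"));  // string(0) ""
```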


<h3> Complex filters </h3>

With `->list` and `->multi` you can utilize some more crafty features. For instance:

     $_GET->multi->http_build_query["id,name,title"]

This rebuilds a URL-encoded string from three input variables.
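
For reference, this is what plain `http_build_query` produces for such a combined set. The `$vars` array and its sample values are made up for illustration and stand in for what `->multi` would collect.

```php
<?php
// Stand-in for the three request variables gathered by ->multi.
$vars = array("id" => "42", "name" => "foo bar", "title" => "A&B");

// Pass the separator explicitly so the result does not depend on php.ini.
$query = http_build_query($vars, "", "&");

echo $query, "\n";  // id=42&name=foo+bar&title=A%26B
```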


<h2> Wrapper implementation </h2>

Basically the filters are initialized for all superglobals like:

     $_GET = new input($_GET);

The original variables are stored in `->__vars[]` internally. Each `$_GET->filtername` pseudo-method access is accumulated in a filter chain.

The first array `["key"]` or method `("key")` access then applies the filter chain to the named input variable and returns the constrained value.
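
The mechanism can be approximated with PHP's `__get`, `__call` and `ArrayAccess`. The following is a simplified sketch under that assumption; `SketchInput` and its members are illustrative names, and the real `input` class does considerably more.

```php
<?php
// Simplified sketch of the wrapper idea, not the actual input class.
class SketchInput implements ArrayAccess
{
    private $vars;             // original input array
    private $chain = array();  // filter names collected so far

    public function __construct(array $vars) { $this->vars = $vars; }

    // $in->trim->strtoupper : each pseudo-property access queues one filter.
    public function __get($filter)
    {
        $this->chain[] = $filter;
        return $this;
    }

    // $in->int("key") : method-style access queues the filter, then fetches.
    public function __call($filter, $args)
    {
        $this->chain[] = $filter;
        return $this->offsetGet($args[0]);
    }

    // $in["key"] : apply the queued filters, reset the chain, return the value.
    #[\ReturnTypeWillChange]
    public function offsetGet($key)
    {
        $value = isset($this->vars[$key]) ? $this->vars[$key] : "";
        $chain = $this->chain;
        $this->chain = array();
        foreach ($chain as $filter) {
            // Prefer an underscore-prefixed method, else a global function.
            $value = method_exists($this, "_$filter")
                ? $this->{"_$filter"}($value)
                : call_user_func($filter, $value);
        }
        return $value;
    }

    public function offsetExists($key): bool { return isset($this->vars[$key]); }
    public function offsetSet($key, $value): void { $this->vars[$key] = $value; }
    public function offsetUnset($key): void { unset($this->vars[$key]); }

    // One built-in filter for the demonstration.
    private function _int($value)
    {
        return (int) preg_replace('/[^0-9]+/', '', (string) $value);
    }
}

$in = new SketchInput(array("id" => "12a3", "title" => " hello "));
var_dump($in->int["id"]);                  // int(123)
var_dump($in->trim->strtoupper["title"]);  // string(5) "HELLO"
var_dump($in->int("id"));                  // int(123)
```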


<h2> Filter chain defaults </h2>

It's possible to define a default filter for remaining `$_GET["old"]` accesses with the <b><code>INPUT_DIRECT</code></b> constant. 

  *  By default it uses "raw", which just prints a notice.
  *  It can also be set to "disallow" to prevent such uses.
  *  Another alternative would be "q" to emulate magic quotes (not recommended).
  *  Or using "sql" to securely use `$_POST["fields"]` in SQL strings, if that's the default target (also not recommended).

Another option is to predefine a filter chain on a particular superglobal with `->always()`:

     $_POST->xss->nocontrol->always();

Then any raw `$_POST["access"]` would still use these filters. Additional, more context-specific filters can still be mixed in.

It's equivalent to having the filter chain built up, before accessing an entry:

     $_GET->filter->name->and->more;
     $_GET["var"]

Btw, to reset a default filter chain, use `->__always = array();`


<h3> Predeclaring filters for raw access </h3>

While this somewhat amounts to **magic_quotes 2.0**, you can also pre-define filter chains on a variable name basis:

     $_GET->__rules["old_id"] = array("int", array());

This is suitable for bolting a minimum of safety onto old code, whose data flow is structurally hard to fix otherwise.


<h2> Differences to plain <code>$_GET</code> / <code>$_POST</code> / <code>$_REQUEST</code> </h2>

Because the whole <code>ArrayAccess</code> and <code>Iterator</code> interfaces are implemented, it's easy to transition existing code to <code>new input()</code>. There are few behavioural discrepancies.

One thing that won't work, for example, is the common older idiom below, because an object always evaluates to true in a boolean context:

<code><pre>
 if ($_POST) {
</pre></code>

To probe for the presence of input data, one should check for one of the keys, or better:

<code><pre>
  if (count($_POST)) {
</pre></code>

Which has the same effect.
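
The difference is easy to reproduce with any array-like object. The snippet below uses PHP's built-in `ArrayObject` as a stand-in for the wrapper, which is an assumption made only to keep the example self-contained.

```php
<?php
// ArrayObject stands in for an array-like wrapper object here.
$empty = new ArrayObject(array());

var_dump((bool) $empty);  // bool(true): the object itself is truthy
var_dump(count($empty));  // int(0): counting reveals that it holds no entries
```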


<h3> Methods <code>-&gt;has()</code>, <code>-&gt;no()</code>, <code>-&gt;keys()</code> </h3>

These three convenience methods make some idioms more readable. Instead of testing for <code>isset($_GET["key"])</code> one can now write: <code>$_GET-&gt;has("key")</code>. Or to probe for the opposite: <code>$_GET-&gt;no("sleep")</code>.

<p>In place of <code>array_keys()</code> there's now <code>$_REQUEST-&gt;keys()</code>, also slightly shorter.</p>


<h2> Notice emission </h2>

Syntactic salt à la `isset($_GET["id"]) ? $_GET["id"] : ""` for silent value substitution has become commonplace.

It's made redundant here, because `input{}` itself already probes for existence of variables. Notices for absent values are only generated afterwards, and only if requested. Thus they can be re-enabled when needed, unlike with the `isset` and `?:` suppression syntax.

`INPUT_DIRECT` controls the default filter for `$_GET["raw"]` access. If it's set to `raw`, that filter engages, and `raw` honors `INPUT_SILENCE`. By default it still emits useful notices; if `INPUT_SILENCE` is set to `1`, it no longer does.

Rewritten code can default to `$_REQUEST->raw->default("id", 123)`, however. This provides the default value substitution while still permitting notices to be brought back for debugging.
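
The idea of substituting a default while keeping notices switchable can be sketched in plain PHP. The function `sketch_default` and its `$silence` flag are hypothetical and only mimic the role that `INPUT_SILENCE` plays in the text above.

```php
<?php
// Hypothetical helper: default substitution with an optional notice.
function sketch_default(array $vars, $key, $default, $silence = true)
{
    if (isset($vars[$key])) {
        return $vars[$key];
    }
    if (!$silence) {
        // Unlike isset()/?: suppression, the notice can be switched back on.
        trigger_error("Absent input variable: $key", E_USER_NOTICE);
    }
    return $default;
}

var_dump(sketch_default(array("id" => "7"), "id", 123));  // string(1) "7"
var_dump(sketch_default(array(), "id", 123));             // int(123)
```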


<h2> Closing remarks </h2>

Using such an input filter **does not mean one can forgo database escaping** et al. It just adds another layer of format constraining, and thus security, on top.
And it's a very simple and convenient layer. (Complexity seldom helps with that.)

