PHP utility collection with hybrid and fluent APIs.

Artifact 256b0f874ef79e569cf751431526d4f981c779b8:

Wiki page [input] by mario 2015-04-11 22:46:18.
D 2015-04-11T22:46:18.390
L input
N text/x-markdown
P c468372d7d1d71ef8860a5f1503ccc843f095445
U mario
W 17447
<h2> new input() </h2>

The <kbd>[input](finfo/php7/input.php)</kbd> class wraps the superglobals `$_REQUEST`, `$_GET`, `$_POST`, `$_SERVER` and `$_COOKIE`. It provides streamlined sanitization with unobtrusive filter names, and a unique semi-fluent syntax:

<code><pre>
    $_REQUEST<mark style="background: linear-gradient(#f7f6f5,#f7e655,#f7f6f5); color:#fa3">-&gt;text</mark>["content"]
</pre></code>

Filtering functions can also be chained, as in `$_GET->text->html["title"]`. Most sanitizing methods excise unwanted literals, several validate or drop whole values, some perform escaping, and a few are just blacklists.

  *  This approach addresses input constraint validation at the earliest feasible entry point.

  *  Unifies access through a central verification mechanism, to shadow/audit unverified retrieval.

  *  Often permits reliable and instant target context interpolation.

The API is kept trivial on purpose, which *encourages* its use early on. Minimal effort and all.



<h2> Available filters </h2>

There's a wide range of built-in methods. Often the basic filters are sufficient and best suited for combination.

<table>
<colgroup width="21%"></colgroup>
<colgroup width="11%"></colgroup>
<colgroup width="22%"></colgroup>
<colgroup width="56%"></colgroup>
<tr>
	<th>Method</th>
	<th>Type</th>
	<th>Sample</th>
	<th>Usage</th>
</tr>
<tr>
	<td>int</td>
	<td>cast</td>
	<td>123</td>
	<td>Only numeric characters, cast to integer.</td>
</tr>
<tr>
	<td>name</td>
	<td>white</td>
	<td>abc12_x3</td>
	<td>Alphanumeric symbols only.</td>
</tr>
<tr>
	<td>id</td>
	<td>white</td>
	<td>xy_2.1</td>
	<td>Alphanumeric chars, dot and underscore.</td>
</tr>
<tr>
	<td>words</td>
	<td>white</td>
	<td>abc def</td>
	<td>Text with minimal punctuation (only spaces allowed).</td>
</tr>
<tr>
	<td>text</td>
	<td>white</td>
	<td>Hello, World!</td>
	<td>Common natural text with basic punctuation (including quotes, but no &lt; &gt;).</td>
</tr>
<tr>
	<td>filename</td>
	<td>filter</td>
	<td>basename.txt</td>
	<td>Replace all non-alphanumeric characters with underscores.</td>
</tr>
<tr>
	<td>float</td>
	<td>cast</td>
	<td>3.14159</td>
	<td>Cast to float.</td>
</tr>
<tr>
	<td>boolean</td>
	<td>cast</td>
	<td>true, false</td>
	<td>Converts &quot;false/true&quot; or &quot;0/1&quot; or &quot;off/on&quot; and &quot;no/yes&quot; to boolean.</td>
</tr>
<tr>
	<td>ascii</td>
	<td>white</td>
	<td>Aa#*:",\n\0~</td>
	<td>Characters in the ASCII range 0 .. 127</td>
</tr>
<tr>
	<td>nocontrol</td>
	<td>white</td>
	<td>Aa#*:",\n~</td>
	<td>Filters out control characters (&lt; 32), except \r \n \t.</td>
</tr>
<tr>
	<td>spaces</td>
	<td>filter</td>
	<td>Single line</td>
	<td>Turns linebreaks / whitespace (\r \n \t) into spaces only.</td>
</tr>
<tr>
	<td>q</td>
	<td>black</td>
	<td>\"value\"</td>
	<td>Shorthand for <code>addslashes</code>.</td>
</tr>
<tr>
	<td>escape</td>
	<td>black</td>
	<td>\ []"{}'$`!´&amp;?/&gt;&lt;|*~;^</td>
	<td>Broader escaping of well-known meta characters (quotes and regex).</td>
</tr>
<tr>
	<td>html</td>
	<td>escape</td>
	<td>&amp;amp;</td>
	<td>htmlspecialchars (shorthand)</td>
</tr>
<tr>
	<th>Structural</th>
	<th colspan=3>Following filters constrain specific input formats.</th>
</tr>
<tr>
	<td>datetime</td>
	<td>white</td>
	<td>1999-12-31T23:59:59Z</td>
	<td>HTML5 datetime values</td>
</tr>
<tr>
	<td>date</td>
	<td>white</td>
	<td>2015-07-17</td>
	<td>Just date string.</td>
</tr>
<tr>
	<td>time</td>
	<td>white</td>
	<td>23:59:20.17</td>
	<td>Time specifier.</td>
</tr>
<tr>
	<td>color</td>
	<td>white</td>
	<td>#FF5022</td>
	<td>Hex color value.</td>
</tr>
<tr>
	<td>tel</td>
	<td>white</td>
	<td>+1-347-2214144</td>
	<td>International-format telephone number.</td>
</tr>
<tr>
	<td>iconv</td>
	<td>filter</td>
	<td><br></td>
	<td>Convert input to UTF-8</td>
</tr>
<tr>
	<td>utf7</td>
	<td>black</td>
	<td><br></td>
	<td>Filter some UTF-7 out.</td>
</tr>
<tr>
	<td>ip</td>
	<td>white</td>
	<td>::1</td>
	<td>IPv4 or IPv6 address</td>
</tr>
<tr>
	<td>ipv4</td>
	<td>white</td>
	<td>134.22.7.207</td>
	<td>IPv4 address only</td>
</tr>
<tr>
	<td>public</td>
	<td>white</td>
	<td>8.8.4.4</td>
	<td>Validate IP to be public.</td>
</tr>
<tr>
	<td>email</td>
	<td>white</td>
	<td>you@gmail.com</td>
	<td>Syntactically valid email address.</td>
</tr>
<tr>
	<td>url</td>
	<td>white</td>
	<td><br></td>
	<td>Ensure URL syntax xxx:///</td>
</tr>
<tr>
	<td>http</td>
	<td>white</td>
	<td>http://localhost/</td>
	<td>More conservative http:// URL constraint.</td>
</tr>
<tr>
	<td>uri</td>
	<td>white</td>
	<td><br></td>
	<td>More generic URI syntax.</td>
</tr>
<tr>
	<td>xml</td>
	<td>cast</td>
	<td><br></td>
	<td>Create a SimpleXML object from input.</td>
</tr>
<tr>
	<td>json</td>
	<td>cast</td>
	<td>{"key":"value"}</td>
	<td>json_decode()</td>
</tr>
<tr>
	<td>purify</td>
	<td>white</td>
	<td>&lt;b&gt;basic&lt;/b&gt;</td>
	<td>Utilizes HTMLPurifier</td>
</tr>
<tr>
	<th>Behaviour</th>
	<th colspan=3>Virtual / control filters.</th>
</tr>
<tr>
	<td>log</td>
	<td>control</td>
	<td><br></td>
	<td>Raw value access with logging.</td>
</tr>
<tr>
	<td>raw</td>
	<td>control</td>
	<td><br></td>
	<td>Raw access with E_NOTICE (is the default).</td>
</tr>
<tr>
	<td>disallow</td>
	<td>control</td>
	<td><br></td>
	<td>Disallow unfiltered variable access (configurable per INPUT_DIRECT).</td>
</tr>
<tr>
	<td>is</td>
	<td>control</td>
	<td><br></td>
	<td>A meta filter that applies the following filter chain, then checks whether the content would have passed unaffected. Returns a boolean indicating whether all constraints were matched.</td>
</tr>
<tr>
	<th><b>Parameterized</b></th>
	<th colspan=3>These filters require method access <code>$_GET-&gt;default("id", "index")</code> instead of the plain array key syntax. Alternatively use the ellipsis <code>…</code> syntax.</th>
</tr>
<tr>
	<td>length(ID, 20)</td>
	<td>limit</td>
	<td>Hello Wo</td>
	<td>Cuts strings to maximum given length.</td>
</tr>
<tr>
	<td>range(ID, 1, 17)</td>
	<td>limit</td>
	<td>17</td>
	<td>Constrains numeric input to the given range.</td>
</tr>
<tr>
	<td>default</td>
	<td>filter</td>
	<td>…</td>
	<td>Uses default value, if no input was present.</td>
</tr>
<tr>
	<td>regex</td>
	<td>white/black</td>
	<td>…</td>
	<td>Custom regular expression method <code>-&gt;regex(&quot;field&quot;, &quot;/(abc)/&quot;)</code></td>
</tr>
<tr>
	<td>in_array</td>
	<td>white</td>
	<td>a,b,c</td>
	<td>Can be used with an array parameter or a simpler comma-separated list of allowed values.</td>
</tr>
<tr>
	<td><br></td>
	<td><br></td>
	<td><br></td>
	<td><br></td>
</tr>
<tr>
	<th>Multi-Apply</th>
	<th colspan=3>Following filters work on a set of input variables, instead of a single one.</th>
</tr>
<tr>
	<td>array</td>
	<td>control</td>
	<td><br></td>
	<td>Is automatically applied to input subarrays, so filters are run on each entry.</td>
</tr>
<tr>
	<td>list</td>
	<td>control</td>
	<td><br></td>
	<td>Combines multiple input variables per name (comma-separated list) and applies filtering collectively; finally returns an associative result array.</td>
</tr>
<tr>
	<td>multi</td>
	<td>control</td>
	<td><br></td>
	<td>Also grabs a list of input variables. `multi` does not run filters on scalars within, but passes the combined set to filter functions. This is used in combination with e.g. <code>http_build_query</code></td>
</tr>
<tr>
	<th>Global functions</th>
	<th colspan=3><br></th>
</tr>
<tr>
	<td>strtolower</td>
	<td>filter</td>
	<td><br></td>
	<td rowspan=3>Any global function can actually be chained. It just needs to accept one parameter (the input string) and return the modified value. Custom userland functions can thus be utilized.</td>
</tr>
<tr>
	<td>urlencode</td>
	<td>filter</td>
	<td><br></td>
	</tr>
<tr>
	<td>strip_tags</td>
	<td>filter</td>
	<td><br></td>
	</tr>
<tr>
	<td><br></td>
	<td><br></td>
	<td><br></td>
	<td><br></td>
</tr>
<tr>
	<th>Ill-advised filters</th>
	<th colspan=3>Care should be taken here. Liberal application will lead to a false sense of security.</th>
</tr>
<tr>
	<td>sql</td>
	<td>filter</td>
	<td><br></td>
	<td>Configurable <code>PDO::quote</code> shorthand.</td>
</tr>
<tr>
	<td>mysql</td>
	<td>filter</td>
	<td><br></td>
	<td>Shorthand to <code>mysql_real_escape_string</code> (doubly discouraged).</td>
</tr>
<tr>
	<td>xss</td>
	<td>black</td>
	<td><br></td>
	<td>Minimal XSS blacklist</td>
</tr>
</table>


As mentioned, any global function can be utilized implicitly. A few [core string functions](http://php.net/strings) are useful in this context. But the intended target is custom functions.
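
To give a feel for what the basic whitelist filters amount to, here is a rough plain-PHP sketch. The `sketch_*` function names and the exact character classes are assumptions inferred from the table's "Usage" and "Sample" columns, not taken from `input.php`:

```php
<?php
// Hypothetical stand-ins for three filters from the table above.
// The regexes are guesses based on the table, not the library's own code.

// "int": keep only digits, then cast.
function sketch_int($s)  { return (int) preg_replace('/[^0-9]/', '', (string) $s); }

// "name": alphanumerics (underscore included, as the sample suggests).
function sketch_name($s) { return preg_replace('/[^A-Za-z0-9_]/', '', (string) $s); }

// "id": alphanumerics, dot and underscore.
function sketch_id($s)   { return preg_replace('/[^A-Za-z0-9_.]/', '', (string) $s); }

echo sketch_int("12a3"), "\n";            // 123
echo sketch_name("abc12_x3!"), "\n";      // abc12_x3
echo sketch_id("xy_2.1<script>"), "\n";   // xy_2.1script
```

Note how such filters excise unwanted characters rather than reject the whole value, which matches the "white" type in the table.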



<h3> Binding filters </h3>

One can even *bind* new functions using:

     $_POST->_filtername = function($s) { ... }

Likewise class or object methods with:

     $_GET->_filtername = array("AppFilter", "validSessionID");

It's imperative, however, to shadow the filter names with an underscore `_` prefix. See `input.inspekt.php` for some examples. Such bound methods can be chained just as well:

     $_GET->text->validSessionID["var"]

(Btw, to use some of the `input` filter methods statically and outside of their scope, one could use `$value = input::_datetime($value);` for instance.)
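
Both binding forms are ordinary PHP callables. The following sketch shows what such a custom filter might look like on its own; `AppFilter::validSessionID` is the hypothetical name used above, and the 32-hex-digit rule is purely an assumption for illustration:

```php
<?php
// Hypothetical custom filter class; the session ID format is made up.
class AppFilter
{
    // Whitelist-style validator: returns the value if acceptable, else NULL.
    public static function validSessionID($s)
    {
        return preg_match('/^[0-9a-f]{32}$/', (string) $s) ? $s : null;
    }
}

// A closure and a static method reference, the two forms shown above.
$closure = function ($s) { return trim((string) $s); };
$method  = array("AppFilter", "validSessionID");

// Chaining by hand: trim first, then validate.
$good = call_user_func($method, call_user_func($closure, "  " . str_repeat("a", 32) . " "));
$bad  = call_user_func($method, "not-a-session-id");

var_dump($good);   // string(32) "aaaa...": 32 times "a"
var_dump($bad);    // NULL
```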



<h3> Array filters </h3>

Any input variable name that corresponds to a single-level array (as in `<input name="answers[]">`) will automatically be managed by <b><code>-&gt;array</code></b>, which applies successive filters on each value entry, so `$_REQUEST->text["answers"][0]` will still resolve.

But there is also <b><code>-&gt;list</code></b> for *regrouping* multiple input variable names into an associative array. It's useful to apply one set of filters onto each value, but retain them as a named set afterwards.

To filter and then localize just three known input variables, `extract` suddenly becomes a useful idiom:

     extract( $_GET->list->name["user,id,tag"] );

Input names can either be passed as a comma-separated list, or as an actual array of names. PHP 5.4 syntax allows a neat utilization of name constants `$_GET->list->text[[URLPARAM_TITLE, URLPARAM_NAME]]` then.

The <b><code>-&gt;multi</code></b> wrapper instead does not traverse each subvalue. It pipes the whole named array to its downstream filter function. Its primary purpose is:

     $_GET->multi->http_build_query["id,name,title"]

Which is the most concise way in the known universe to rebuild a URL-encoded string from three input variables. (No extra code was written for that in `input.php`. It just accrued as a by-product.)
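
For comparison, this is roughly what the two wrappers boil down to in plain PHP. `sketch_list()` and `sketch_multi()` are hypothetical helpers written for this page, and `$get` is a made-up stand-in for `$_GET`:

```php
<?php
// Stand-in for $_GET; keys and values are invented.
$get = array("id" => "42", "name" => "mario!", "title" => "Hello World", "debug" => "1");

// Like ->list: pick the named keys, filter each value, keep the names.
function sketch_list(array $vars, $names, callable $filter)
{
    $names  = is_array($names) ? $names : explode(",", $names);
    $picked = array_intersect_key($vars, array_flip($names));
    return array_map($filter, $picked);   // one input array: string keys survive
}

// Like ->multi: hand the whole named set to a single function.
function sketch_multi(array $vars, $names, callable $fn)
{
    $names = is_array($names) ? $names : explode(",", $names);
    return $fn(array_intersect_key($vars, array_flip($names)));
}

// Assumed approximation of the "name" filter.
$name_filter = function ($s) { return preg_replace('/[^A-Za-z0-9_]/', '', $s); };

$set   = sketch_list($get, "id,name", $name_filter);
$query = sketch_multi($get, "id,name,title", 'http_build_query');

print_r($set);        // Array ( [id] => 42 [name] => mario )
echo $query, "\n";    // id=42&name=mario%21&title=Hello+World
```

The difference is visible in the callbacks: the `list` variant calls its filter once per value, the `multi` variant calls it once with the whole array.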



<h3> Parameterized methods </h3>

For filters like `->range` or `->length` you would normally have to use the method access syntax `->length("varname", 20)`.

But you can also combine literal parameters into the function name, using the ellipsis <code>…</code> symbol (with <kbd>AltGr</kbd>+<kbd>.</kbd> on Linux, <kbd>⌥</kbd>+<kbd>.</kbd> for Apple, or <kbd>Alt</kbd>+<kbd>0</kbd><kbd>1</kbd><kbd>3</kbd><kbd>3</kbd> on Windows).

     $_GET->int->range…1…59->html["minutes"]

Which still allows chaining other filters thereafter. And this syntax novelty keeps the code a bit more readable.
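
This works because PHP accepts bytes above 0x7F in identifiers, so a UTF-8 `…` is simply part of the property name. The sketch below shows one way such a name could be taken apart again; `EllipsisDemo` is a hypothetical class and makes no claim about how `input.php` does it. The file needs to be saved as UTF-8:

```php
<?php
// Hypothetical demo: split "range…1…59" into a filter name and its arguments.
class EllipsisDemo
{
    public $seen = array();   // collected (filter, args) pairs

    public function __get($name)
    {
        $args   = explode("…", $name);   // "range…1…59" => ["range", "1", "59"]
        $filter = array_shift($args);    // first part is the filter name
        $this->seen[] = array($filter, $args);
        return $this;                    // allow further chaining
    }
}

$demo = new EllipsisDemo();
$demo->int->range…1…59;

print_r($demo->seen);
// [0] => ["int", []]
// [1] => ["range", ["1", "59"]]
```

The arguments arrive as strings, so a real filter would still have to cast them.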


<h3> Context targeting </h3>

The `input` wrappers primarily encapsulate early access to unvetted remote input. This avoids delayed sanitization and an effortful data flow tracing through application layers.

But some filter combinations are perfectly suitable to skip the application logic, and combine input constraining and output context preparation.

For instance replaying form input becomes as simple as:

     echo <<<FORM
        <input name=title value="{$_POST->text->html['title']}">
        <input name=email value="{$_POST->email->html['email']}">
     FORM;

While this is highly inadvisable (and ultimately *more effort* than just using parameterized queries!) one could do the same for SQL queries:

     pdo_query("INSERT INTO comments VALUES ('{$_POST->id->mysql['name']}') ");

The complex curly ("var expression") syntax makes this utilization of input filters in string context suitable in quite a few cases.

With preset/default filters (see `->always()`), one could even use the simple PHP3 syntax in double quoted string context.



<h2> Wrapper implementation </h2>

Basically the filters are initialized for all superglobals like:

     $_GET = new input($_GET);

The original variables are stored in `->__vars[]` internally. Each `$_GET->filtername` pseudo-method access is accumulated in a filter chain.

The first use of array `["key"]` or method `("key")` requests, applies the filter chain to the named input variable, then returns the constrained value.
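
The mechanism can be sketched in a few lines. `MiniInput` below is a hypothetical, heavily reduced stand-in (two built-in filters plus global functions), assumes PHP 8.0+, and is not the actual `input` class:

```php
<?php
// Minimal sketch of the chain-then-apply mechanism described above.
class MiniInput implements ArrayAccess
{
    private array $vars;         // original values (the text calls this ->__vars)
    private array $chain = [];   // filter names collected by __get

    public function __construct(array $vars) { $this->vars = $vars; }

    // $in->text->html : remember each filter name, return $this for chaining.
    public function __get(string $filter): static
    {
        $this->chain[] = $filter;
        return $this;
    }

    // $in->...["key"] : run the collected chain on one value, then reset it.
    public function offsetGet(mixed $key): mixed
    {
        $value = $this->vars[$key] ?? null;
        foreach ($this->chain as $filter) {
            $value = $this->apply($filter, $value);
        }
        $this->chain = [];
        return $value;
    }

    private function apply(string $filter, mixed $value): mixed
    {
        if ($value === null) {
            return null;   // absent input stays absent
        }
        switch ($filter) {
            case "int":  return (int) preg_replace('/[^0-9]/', '', $value);
            case "html": return htmlspecialchars($value);
            default:     return is_callable($filter) ? $filter($value) : $value;
        }
    }

    public function offsetExists(mixed $key): bool { return isset($this->vars[$key]); }
    public function offsetSet(mixed $key, mixed $value): void { $this->vars[$key] = $value; }
    public function offsetUnset(mixed $key): void { unset($this->vars[$key]); }
}

$in = new MiniInput(["title" => "<b>Hi</b>", "page" => "7x"]);
echo $in->html["title"], "\n";               // &lt;b&gt;Hi&lt;/b&gt;
echo $in->int["page"], "\n";                 // 7
echo $in->strtoupper->html["title"], "\n";   // &lt;B&gt;HI&lt;/B&gt;
```

Resetting the chain after each lookup is a design choice of this sketch; it keeps one access from leaking filters into the next.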



<h2> Filter chain defaults </h2>

It's possible to define a default filter for remaining `$_GET["old"]` accesses with the <b><code>INPUT_DIRECT</code></b> constant. 

  *  Per default it uses `"raw"` which just prints a notice. (Though this filter is primarily there because it's unavoidable to access some specific values literally anyway.)
  *  It can also be set to `"disallow"` to prevent any unfiltered access.
  *  Alternatively `"log"` to get an overview of where to watch out.
  *  Very inadvisable but feasible are also `"q"` to simulate magic_quotes, or `"sql"` if that's the primary variable target, or possibly `"html"` to have a minimum of XSS protection for dated web apps where most variables would otherwise end up unsanitized in HTML context.

Another option is to predefine a filter chain on a particular superglobal with `->always()`:

     $_POST->xss->nocontrol->always();

Then any raw `$_POST["access"]` would still use these filters. Yet additional, more context-specific filters could also be intermixed.

It's equivalent to having the filter chain built up, before accessing an entry:

     $_GET->filter->name->and->more;
     $_GET["var"]

Btw, to reset a default filter chain, use `->__always = array()`;



<h3> Predeclaring filters for raw access </h3>

While this somewhat amounts to **magic_quotes 2.0**, you can also pre-define filter chains on a variable name basis:

     $_GET->__rules["old_id"] = array("int", array());

This is suitable for bolting a minimum of safety onto old code, whose data flow is structurally hard to fix otherwise.


<h2> Differences to plain <code>$_GET</code> / <code>$_POST</code> / <code>$_REQUEST</code> </h2>

Because the whole <code>ArrayAccess</code> and <code>Iterator</code> interfaces are implemented, it's easy to transition existing code to <code>new input()</code>. There are few behavioural discrepancies.<br><br>

One thing that won't work for example is the olden idiom:

<code><pre>
 if ($_POST) {
</pre></code>

The same can be achieved however just as readably with:

<code><pre>
  if (count($_POST)) {
</pre></code>

Though it's generally preferable and more contemporary to just probe for one of the input values, e.g. submit button name, etc.
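
The reason is that PHP treats every object as truthy, regardless of what it wraps. A small demonstration, with `EmptyBag` as a hypothetical stand-in for the wrapper (implementing `Countable` here is an assumption about how `count()` is supported):

```php
<?php
// An object wrapping an empty array is still truthy in boolean context.
class EmptyBag implements Countable
{
    private array $vars;

    public function __construct(array $vars = []) { $this->vars = $vars; }

    // count($bag) delegates to the wrapped array.
    public function count(): int { return count($this->vars); }
}

$post = new EmptyBag([]);

var_dump((bool) $post);   // bool(true): `if ($post)` would always run
var_dump(count($post));   // int(0):     `if (count($post))` behaves as expected
```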


<h3> ArrayInterface+ <code>-&gt;has()</code>, <code>-&gt;no()</code>, <code>-&gt;keys()</code> </h3>

These three convenience methods shorten some array handling. Instead of testing for <code>isset($_GET["key"])</code> one can alternatively write <code>$_GET-&gt;has("key")</code> now. Or to probe for the opposite <code>$_GET-&gt;no("sleep")</code>.

<p>And in place of <code>array_keys()</code> there's <code>$_REQUEST-&gt;keys()</code> for instance.</p>

<p>Notice that these three are actual methods, not chainable filters.</p>




<h2> Notice emission </h2>

Syntactic salt à la `(isset($_GET["id"]) ? $_GET["id"] : "")` for silent value substitution has become commonplace.

It's made redundant here, because `input{}` itself already probes for existence of variables. Notices for absent values are only generated afterwards, and only if requested. Thus they can be reenabled when needed, unlike with the irrevocable `isset ?:` super suppression syntax.

Rather utilize `INPUT_QUIET` to control it at incursion. Set this constant to `1` prior to loading `input.php` to eschew notices and just receive `NULL` for absent input data. For uncovering non-systemic or structural flow deviations you could then easily reenable them later.

Rewritten code might also utilize `$_REQUEST->default("id", 123)` for applying preset values. Because of its centralized role you could thus alternatively adapt `->default` or even inject a different default handler *when* the need arises.
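
The idea of a single switchable lookup can be sketched as follows. `SKETCH_QUIET` and `sketch_fetch()` are hypothetical names invented for this example; they only mimic the described behaviour of `INPUT_QUIET` and `->default`:

```php
<?php
// One central lookup decides whether an absent value raises a notice.
const SKETCH_QUIET = 1;   // set to 0 to get notices for absent values again

function sketch_fetch(array $vars, string $key, $default = null)
{
    if (array_key_exists($key, $vars)) {
        return $vars[$key];
    }
    if (!SKETCH_QUIET) {
        // Only reached when notices are switched on.
        trigger_error("undefined input variable '$key'", E_USER_NOTICE);
    }
    return $default;
}

$get = ["id" => "5"];

var_dump(sketch_fetch($get, "id"));          // string(1) "5"
var_dump(sketch_fetch($get, "page"));        // NULL
var_dump(sketch_fetch($get, "page", 123));   // int(123)
```

Unlike `isset() ? :` scattered through the code, flipping one constant brings the notices back everywhere.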



<h2> Closing remarks </h2>

This new-fangled input filter does not attempt to avoid all context-specific escaping.

 - It's not an excuse to forgo e.g. parameterized DB queries.

 - It only adds another layer of format constraining and thus a bit of reliability atop.  
   And it's a rather convenient layer at that!

It only makes sense with customized filters, on a per-project basis. And it's not overly suitable for anything but mid-sized applications. (Anything larger benefits more from combined filter+validation rules that most frameworks bake in anyway.)





Z c4ae2cc91e4326564086da9c2f2bf581