PHP utility collection with hybrid and fluent APIs.



Update of "input"


Overview

Artifact ID: 169fc11cb6e4d28e34a1294899486c2d8d65ec63
Page Name:input
Date: 2014-03-29 03:55:00
Original User: mario
Mimetype:text/x-markdown
Parent: 21bec6aaa41ebc961af33ce937471821dc6ffbde (diff)
Next f4c04016bc16865cc9cb43a73cdc89630146bc8d
Content

new input()

The input class wraps the superglobals $_REQUEST, $_GET, $_POST, $_SERVER and $_COOKIE. It provides streamlined sanitization with unobtrusive filter names and a unique semi-fluent syntax:

    $_REQUEST->text["content"]

Filtering functions can also be chained, as in $_GET->text->html["title"]. Most sanitizing methods excise unwanted literals, several validate or drop whole values, some perform escaping, and a few are just blacklists.
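
To make the chain concrete, here is roughly what `$_GET->text->html["title"]` boils down to in plain PHP. This is only a sketch: `filter_text()` and `filter_html()` are hypothetical stand-ins, and the character class is an assumption that may differ from the actual whitelist in input.php.

```php
<?php
// Hypothetical stand-ins for the ->text and ->html filters (names and regex are assumptions).
function filter_text($value) {
    // Whitelist: drop everything except word characters, whitespace and basic punctuation.
    return preg_replace('/[^\w\s.,:;!?()\'"\/-]/u', '', (string) $value);
}
function filter_html($value) {
    // Escape the remaining text for HTML context.
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

$raw   = 'Tom & "Jerry" <script>';
$title = filter_html(filter_text($raw));   // filters run left to right, as in the chain
```

The first step excises unwanted literals, the second escapes what is left, which is why the order of the chain matters.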

  • This approach addresses input constraint validation at the earliest feasible entry point.

  • Unifies access through a central verification mechanism.

  • Allows reliable input interpolation instantly into many target contexts.

Additionally it can still shadow/audit casual and unverified accesses. Its overall API simplicity is meant to encourage safety through minimal effort.

Available filters

There's a wide range of built-in methods. Often the basic filters are sufficient and best suited for combination.

| Method | Type | Sample | Usage |
|---|---|---|---|
| int | cast | 123 | Only numeric characters, cast to integer. |
| name | white | abc12_x3 | Alphanumeric symbols only. |
| id | white | xy_2.1 | Alphanumeric chars, dot and underscore. |
| words | white | abc def | Text with minimal interpunction (only spaces allowed). |
| text | white | Hello, World! | Common natural text with basic interpunction (including quotes, but no < >). |
| filename | filter | basename.txt | Replace all non-alphanumeric characters with underscores. |
| float | cast | 3.14159 | Cast to float. |
| boolean | cast | true, false | Converts "false/true" or "0/1" or "off/on" and "no/yes" to boolean. |
| ascii | white | Aa#:",n0~ | Characters in the ASCII range 0 .. 127. |
| nocontrol | white | Aa#:",n~ | Filters out control characters (< 32), except `\r` `\n` `\t`. |
| spaces | filter | Single line | Turns linebreaks / whitespace (`\r` `\n` `\t`) into spaces only. |
| q | black | "value" | Shorthand for addslashes. |
| escape | black | []"{}'$\`!´&?/><\|*~;^ | Broader escaping of well-known meta characters (quotes and regex). |
| html | filter | `&amp;` | htmlspecialchars (shorthand). |
| **Structural** | | | Following filters constrain specific input formats. |
| datetime | white | 1999-12-31T23:59:59Z | HTML5 datetime values. |
| date | white | 2015-07-17 | Just date string. |
| time | white | 23:59:20.17 | Time specifier. |
| color | white | #FF5022 | Hex color value. |
| tel | white | +1-347-2214144 | International-format telephone number. |
| iconv | filter | | Convert input to UTF-8. |
| utf7 | black | | Filter some UTF-7 out. |
| ip | white | ::1 | IPv4 or IPv6 address. |
| ipv4 | white | 134.22.7.207 | IPv4 address only. |
| public | white | 8.8.4.4 | Validate IP to be public. |
| email | white | you@gmail.com | Syntactically valid email address. |
| url | white | | Ensure URL syntax xxx:/// |
| http | white | http://localhost/ | More conservative http:// URL constraint. |
| uri | white | | More generic URI syntax. |
| xml | cast | | Create a SimpleXML object from input. |
| json | cast | {"key":"value"} | json_decode() |
| purify | filter | `<b>basic</b>` | Utilizes HTMLPurifier. |
| **NOP** | | | Virtual / control filters. |
| log | control | | Raw value access with logging. |
| raw | control | | Raw access with E_NOTICE (the default). |
| disallow | control | | Disallow unfiltered variable access (configurable per INPUT_DIRECT). |
| is | control | | A meta filter that applies the following filter chain, then checks whether the content would have passed unaffected. Returns a boolean indicating whether all constraints were matched. |
| **Parameterized** | | | These filters require method access $_GET->default("id", "index") instead of the plain array key syntax. Alternatively, use the ellipsis … syntax. |
| length(ID, 20) | filter | Hello Wo | Cuts strings to maximum given length. |
| range(ID, 1, 15) | white | 17 | Constrains numeric input to the given range. |
| default | filter | … | Uses default value, if no input present. |
| regex | white/black | … | Custom regular expression method ->regex("field", "/(abc)/") |
| in_array | white | a,b,c | Can be used with an array parameter, or a simpler comma-separated list of allowed values. |
| **Multi-Apply** | | | Following filters work on a set of input variables, instead of a single one. |
| array | control | | Is automatically applied to input subarrays, so filters are run on each entry. |
| list | control | | Combine multiple input variables per name (comma-separated list) and apply filtering collectively; finally return a named result array. |
| multi | control | | Also grabs a list of input variables, but does not run filters on the scalars within; instead it passes the combined set to the filter functions. This is used in combination with e.g. http_build_query. |
| **Global functions** | | | |
| strtolower | filter | | Any global function can actually be chained. It just needs to accept one parameter, modify its input (string), and return a value. Custom userland functions can thus be utilized. |
| urlencode | filter | | |
| strip_tags | filter | | |
| **Inadvisable filters** | | | Care should be taken here. Liberal application will lead to a false sense of security. |
| sql | filter | | Configurable PDO::quote shorthand. |
| mysql | filter | | Shorthand to mysql_real_escape_string (doubly discouraged). |
| xss | black | | Minimal XSS blacklist. |
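
The `is` meta filter listed above can be pictured as "filter, then compare". The following is a minimal sketch of that idea only; `filter_alnum()` and `is_unaffected()` are hypothetical names, and the string comparison is an assumption about how "passed unaffected" might be judged.

```php
<?php
// Hypothetical whitelist filter: keeps ASCII letters and digits only.
function filter_alnum($value) {
    return preg_replace('/[^a-zA-Z0-9]/', '', (string) $value);
}

// Sketch of the "is" idea: run the filters, then check whether the value survived unchanged.
function is_unaffected($value, array $filters) {
    $filtered = $value;
    foreach ($filters as $fn) {
        $filtered = call_user_func($fn, $filtered);
    }
    // Compare as strings so that casting filters such as intval() can still count as "unchanged".
    return (string) $filtered === (string) $value;
}

$clean = is_unaffected('abc123', array('filter_alnum'));     // nothing was removed
$dirty = is_unaffected('abc<123>', array('filter_alnum'));   // "<" and ">" were removed
```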

As mentioned, any global function can be utilized implicitly. A few core string functions are useful in this context, but the intended targets are custom functions.

Binding filters

One can even bind new functions or class methods using:

 $_GET->_filtername = array("AppFilter", "validSessionID");

It's imperative, however, to shadow the filter names with an underscore _ prefix. See input.inspekt.php for some examples. Such bound methods can be chained just as well:

 $_GET->text->validSessionID["var"]

(Btw, to use some of the input filter methods statically and outside of their scope, one could use $value = input::_datetime($value); for instance.)
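
How such a binding could work internally can be sketched with PHP's `__set` and `__get` magic methods. The class below is a hypothetical illustration, not the code from input.php; the 32-hex-digit session ID rule and the `get()` accessor are assumptions made for the example.

```php
<?php
// Hypothetical application filter, mirroring the AppFilter::validSessionID name used above.
class AppFilter {
    public static function validSessionID($value) {
        // Accept only 32 lowercase hex characters; anything else is dropped to NULL.
        return preg_match('/^[0-9a-f]{32}$/', (string) $value) ? $value : null;
    }
}

// Minimal sketch of underscore-prefixed binding; not the actual input.php code.
class BindingSketch {
    private $vars;
    private $bound = array();   // filter name => callable
    private $chain = array();   // filters queued for the next access

    public function __construct(array $vars) { $this->vars = $vars; }

    // $obj->_name = callable; registers a filter under "name".
    public function __set($name, $callable) {
        $this->bound[ltrim($name, '_')] = $callable;
    }
    // $obj->name queues a filter and returns $this for chaining.
    public function __get($name) {
        $this->chain[] = $name;
        return $this;
    }
    // Apply the queued filters to one variable, then reset the chain.
    public function get($key) {
        $value = isset($this->vars[$key]) ? $this->vars[$key] : null;
        foreach ($this->chain as $name) {
            // Bound callables take precedence; otherwise fall back to a global function.
            $fn = isset($this->bound[$name]) ? $this->bound[$name] : $name;
            $value = call_user_func($fn, $value);
        }
        $this->chain = array();
        return $value;
    }
}

$in = new BindingSketch(array('sid' => str_repeat('a', 32), 'bad' => 'xyz'));
$in->_validSessionID = array('AppFilter', 'validSessionID');
$ok  = $in->validSessionID->get('sid');         // passes the check unchanged
$bad = $in->trim->validSessionID->get('bad');   // fails the check, becomes NULL
```

The underscore keeps the assignment from colliding with the read access `->validSessionID`, which has to stay free for queuing the filter.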

Array filters

Any input variable name that corresponds to a single-level array (as in <input name="answers[]">) will automatically be managed by ->array, which applies successive filters to each value entry, so $_REQUEST->text["answers"][0] will still resolve.

But there is also ->list for regrouping multiple input variable names into an associative array. It's useful for applying one set of filters to each value while retaining them as a named set afterwards.

To filter and then localize three input variables, extract suddenly becomes a useful idiom:

 extract( $_GET->list->name["user,id,tag"] );

Input names can be passed either as a comma-separated list or as an actual array of names. PHP 5.4 syntax then allows a neat use of name constants: $_GET->list->text[[URLPARAM_TITLE, URLPARAM_NAME]].
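
In plain PHP, the `->list` idiom corresponds roughly to the loop below. It is a sketch under assumptions: `list_filter()` and `filter_name()` are hypothetical helpers, and the regex only approximates the "name" whitelist.

```php
<?php
// Plain-PHP approximation of ->list->name["user,id,tag"]; helper names are hypothetical.
function list_filter(array $input, $names, $filter) {
    // Accept "a,b,c" as well as array("a", "b", "c").
    $names = is_array($names) ? $names : explode(',', $names);
    $result = array();
    foreach ($names as $name) {
        $value = isset($input[$name]) ? $input[$name] : null;
        $result[$name] = call_user_func($filter, $value);   // same filter for every entry
    }
    return $result;   // keyed by variable name, ready for extract()
}
function filter_name($value) {
    // Alphanumeric characters and underscore only (assumed rule).
    return preg_replace('/[^\w]/', '', (string) $value);
}

$get = array('user' => 'mario!', 'id' => 'x-12', 'tag' => '<php>', 'other' => 'ignored');
extract(list_filter($get, 'user,id,tag', 'filter_name'));
// Now $user, $id and $tag exist as filtered local variables; "other" was never requested.
```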

The ->multi wrapper, in contrast, does not traverse each subvalue. It pipes the whole named array to its downstream filter function. Its primary purpose is:

 $_GET->multi->http_build_query["id,name,title"]

Which is the most concise way in the known universe to rebuild a URL-encoded string from three input variables. (No extra code was written for that in input.php. It just accrued as a by-product.)
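
For comparison, the same result without the wrapper takes a few more steps. This sketch assumes a hypothetical `multi_apply()` helper; only `http_build_query`, `array_intersect_key` and `array_flip` are actual PHP functions.

```php
<?php
// Sketch of what ->multi->http_build_query["id,name,title"] amounts to.
function multi_apply(array $input, $names, $fn) {
    $names = is_array($names) ? $names : explode(',', $names);
    // Keep only the requested keys, then hand the whole array to $fn at once.
    $subset = array_intersect_key($input, array_flip($names));
    return call_user_func($fn, $subset);
}

$get   = array('id' => '7', 'name' => 'a b', 'title' => 'x&y', 'debug' => '1');
$query = multi_apply($get, 'id,name,title', 'http_build_query');
// With PHP's default settings $query is "id=7&name=a+b&title=x%26y"
```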

Parameterized methods

For filters like ->range or ->length you normally have to use the method access syntax ->length("varname", 20).

But you can also combine literal parameters into the function name, using the ellipsis … symbol (with AltGr+. on Linux, ⌥+. for Apple, or Alt+0133 on Windows).

 $_GET->int->range…1…59->html["minutes"]

This still allows chaining other filters thereafter, and the syntax novelty keeps the code a bit more readable.
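
This works because PHP accepts bytes above 0x7F in identifiers, so `range…1…59` is a single property name that `__get` receives as a string. A minimal sketch of how such a name could be taken apart follows; the class name and the `$calls` log are hypothetical and exist only for illustration.

```php
<?php
// Sketch: split a pseudo-property name such as "range…1…59" into filter name and arguments.
// Only the … convention comes from the text above; the class itself is hypothetical.
class EllipsisSketch {
    public $calls = array();   // recorded as array(filterName, array(args...))

    public function __get($name) {
        $parts  = explode("…", $name);   // "range…1…59" => array("range", "1", "59")
        $filter = array_shift($parts);   // first part is the filter name
        $this->calls[] = array($filter, $parts);
        return $this;                    // keep the chain going
    }
}

$s = new EllipsisSketch();
$s->int->range…1…59->html;   // source file must be saved as UTF-8
```

After the last line, `$s->calls` holds three entries, with the middle one carrying the arguments "1" and "59" as strings.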

Context targeting

The input wrappers primarily encapsulate early access to unvetted remote input. This avoids delayed sanitization and an effortful data flow tracing through application layers.

But some filter combinations are perfectly suitable to skip the application logic, and combine input constraining and output context preparation.

For instance replaying form input becomes as simple as:

 echo <<<FORM
    <input name=title value="{$_POST->text->html['title']}">
    <input name=email value="{$_POST->email->html['email']}">
 FORM;

While this is highly inadvisable (and ultimately more effort than just using parameterized queries!) one could do the same for SQL queries:

 pdo_query("INSERT INTO comments VALUES ('{$_POST->id->mysql['name']}') ");

The complex curly ("var expression") syntax makes this utilization of input filters in string context suitable in quite a few cases.

With preset/default filters (see ->always()), one could even use the simple PHP3 syntax in double quoted string context.

Wrapper implementation

Basically the filters are initialized for all superglobals like:

 $_GET = new input($_GET);

The original variables are stored in ->__vars[] internally. Each $_GET->filtername pseudo-method access is accumulated in a filter chain.

The first array ["key"] or method ("key") access applies the filter chain to the named input variable, then returns the constrained value.
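
The mechanism can be sketched in a few lines with `__get`, `__call` and `ArrayAccess`. This is an illustrative miniature, not the code from input.php: the class name is hypothetical, and it only resolves filter names to global functions.

```php
<?php
// Minimal sketch of the wrapper idea: pseudo-properties queue filters,
// array or method access applies them. Not the actual input.php code.
class InputSketch implements ArrayAccess {
    private $vars;              // original input array
    private $chain = array();   // queued filter names

    public function __construct(array $vars) { $this->vars = $vars; }

    public function __get($filter) {
        $this->chain[] = $filter;
        return $this;
    }
    public function __call($filter, $args) {
        // ->filter("key") queues the last filter and resolves immediately.
        $this->chain[] = $filter;
        return $this->offsetGet($args[0]);
    }
    #[\ReturnTypeWillChange]
    public function offsetGet($key) {
        $value = isset($this->vars[$key]) ? $this->vars[$key] : null;
        foreach ($this->chain as $fn) {
            $value = call_user_func($fn, $value);   // global functions act as filters here
        }
        $this->chain = array();                     // the chain is consumed on access
        return $value;
    }
    public function offsetExists($key): bool { return isset($this->vars[$key]); }
    public function offsetSet($key, $value): void { $this->vars[$key] = $value; }
    public function offsetUnset($key): void { unset($this->vars[$key]); }
}

$get = new InputSketch(array('title' => '  <B>Hi</B> '));
$a = $get->trim->strip_tags->strtolower['title'];   // array access ends the chain
$b = $get->trim->htmlspecialchars('title');         // method access ends the chain
```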

Filter chain defaults

It's possible to define a default filter for remaining $_GET["old"] accesses with the INPUT_DIRECT constant.

  • Per default it uses "raw" which just prints a notice. (Though this filter is primarily there because it's unavoidable to access a few specific values literally anyway.)
  • It can also be set to "disallow" to prevent any unfiltered access.
  • Alternatively "log" to get an overview of where to watch out.
  • Very inadvisable but feasible are also "q" to simulate magic_quotes, or "sql" if that's the primary variable target, or possibly "html" to have a minimum of XSS protection for dated web apps where most variables would otherwise end up unsanitized in HTML context.

Another option is to predefine a filter chain on a particular superglobal with ->always():

 $_POST->xss->nocontrol->always();

Then any plain $_POST["key"] access would still use these filters. Yet additional, more context-specific filters could also be intermixed.

It's equivalent to having the filter chain built up, before accessing an entry:

 $_GET->filter->name->and->more;
 $_GET["var"]

Btw, to reset a default filter chain, use ->__always = array();

Predeclaring filters for raw access

While this somewhat amounts to magic_quotes 2.0, you can also pre-define filter chains on a variable name basis:

 $_GET->__rules["old_id"] = array("int", array());

This is suitable for bolting a minimum of safety onto old code, whose data flow is structurally hard to fix otherwise.

Differences to plain $_GET / $_POST / $_REQUEST

Because the whole ArrayAccess and Iterator interfaces are implemented, it's easy to transition existing code to new input(). There are few behavioural discrepancies.

One thing that won't work for example is the olden idiom:

 if ($_POST) {

To probe for presence of input data, one should check one of the keys, or rather:

  if (count($_POST)) {

Which has the same effect.
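
The reason is that PHP converts any ordinary object to true in boolean context, regardless of what it wraps. A small sketch with a hypothetical `CountableSketch` class shows the difference:

```php
<?php
// Sketch: why `if ($_POST)` stops working once $_POST is an object.
class CountableSketch implements Countable {
    private $vars;
    public function __construct(array $vars) { $this->vars = $vars; }
    public function count(): int { return count($this->vars); }
}

$empty   = new CountableSketch(array());
$truthy  = (bool) $empty;        // userland objects always convert to true
$hasData = count($empty) > 0;    // count() reflects the wrapped array instead
```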

Methods ->has(), ->no(), ->keys()

These three convenience methods shorten some array handling. Instead of testing for isset($_GET["key"]) one can alternatively write $_GET->has("key") now. Or, to probe for the opposite, $_GET->no("sleep").

And in place of array_keys() there's $_REQUEST->keys() for instance.

Notice that these three are actual methods, not chainable filters.
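
Their behaviour can be pictured as thin wrappers around `isset` and `array_keys`. The class below is a hypothetical sketch that assumes `->has()` follows `isset` semantics, as the comparison above suggests.

```php
<?php
// Sketch of the three helpers as plain methods on a wrapper (hypothetical class).
class HelperSketch {
    private $vars;
    public function __construct(array $vars) { $this->vars = $vars; }

    public function has($key) { return isset($this->vars[$key]); }    // like isset($_GET["key"])
    public function no($key)  { return !isset($this->vars[$key]); }   // the negation
    public function keys()    { return array_keys($this->vars); }     // like array_keys($_GET)
}

$req   = new HelperSketch(array('id' => '5', 'tag' => 'php'));
$hasId = $req->has('id');
$noNap = $req->no('sleep');
$names = $req->keys();
```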

Notice emission

Syntactic salt à la (isset($_GET["id"]) ? $_GET["id"] : "") for silent value substitution has become commonplace.

It's made redundant here, because input{} itself already probes for existence of variables. Notices for absent values are only generated afterwards, and only if requested. Thus they can be reenabled when needed, unlike with the irrevocable isset ?: super suppression syntax.

Rather, use INPUT_QUIET to control notices at the point of entry. Set this constant to 1 prior to loading input.php to suppress notices and just receive NULL for absent input data. To uncover non-systemic or structural flow deviations, you can then easily re-enable them later.

Rewritten code might also utilize $_REQUEST->default("id", 123) for applying preset values. Because of its centralized role you could thus alternatively adapt ->default or even inject a different default handler when the need arises.
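
The benefit of a central default handler over scattered isset ?: expressions can be sketched as follows. `input_default()` is a hypothetical stand-in for `->default`, assuming that "no input present" means the key is not set.

```php
<?php
// Sketch of a centralized default handler, replacing scattered isset ?: expressions.
function input_default(array $input, $key, $default = null) {
    // One place to change if absent values should later be logged or handled differently.
    return isset($input[$key]) ? $input[$key] : $default;
}

$request = array('page' => '4');
$id   = input_default($request, 'id', 123);   // absent, falls back to 123
$page = input_default($request, 'page', 1);   // present, keeps the input value "4"
```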

Closing remarks

Using such an input filter does not mean one can forgo database escaping (or parameterization) et al. It just adds another layer of format constraining and thus a bit of security atop.

And it's a very simple and convenient layer! (Complexity seldom abets security.)