PHP utility collection with hybrid and fluent APIs.


Wiki page [log] by mario 2015-01-11 07:23:38.
<h2> Structured and hierarchical logging with :token-parametric API </h2>

 * <small> State: ***experimental*** </small>
 * <small> Category: logging </small>
 * <small> Features: journaling, structured, hierarchical </small>
 * <small> Backend: SQLite, JSON, <del>fluentd</del>, <del>logstash</del> </small>
 * <small> Signature: hybrid, parametric </small>

**logStruck <kbd>`ł`</kbd>** implements a logging API and SQLite/JSON storage backend.

 * Its purpose is storing *structured* log data and retaining log event hierarchies.
 * Implements a hybrid and terse function interface.
 * Accepts plain string messages, Ruby-style `:token` categorizers and placeholders, and array data.
 * Implicitly captures and maps PHP errors, unhandled exceptions, and most importantly *`assert()`ions*.

Unlike other PHP logging frameworks, it's not primarily a text/line-oriented message dump.



## Quick example

Invocations can be along the lines of:

     ł(':warn', ':wikiauth', "User doesn't have permission", $pageObj, ':vars', $_SESSION);

All the fun is in the `:token` literals, and passing arrays or objects.



## Database schema, primary fields

All columns in the database schema are *primary fields*. Any extra data ends up in the `context` array.

<style>
 table.dbstruct {
   width: 75%;
   margin-left: 3%;
 }
 table.dbstruct td {
   vertical-align: top;
 }
 table.dbstruct tr:nth-child(2n) {
   background: #efefef;
 }
</style>
<table class=dbstruct>
<tr><td> <kbd>i</kbd>  </td>  <td>PRIM</td>  <td rowspan=3>Here `i` is the primary index, `g` the event group, and `p` the parent reference. Together they allow displaying event group hierarchies. <img src="raw/783ad438cb4577058910a4eaccabbba327789fc3?m=image/png" align=bottom width=485 height=93 alt="log tree"></td></tr>
<tr><td> <kbd>g</kbd>  </td>  <td>INT</td>            </tr>
<tr><td> <kbd>p</kbd>  </td>  <td>INT</td>            </tr>
<tr><td> <kbd>timestamp</kbd>   </td>  <td>REAL</td> <td>Timestamp with microseconds.</td></tr>
<tr><td> <kbd>timestr</kbd></td>  <td>TEXT</td> <td>ISO DateTime string. In GMT/UTC of course.</td></tr>
<tr><td> <kbd>host</kbd>   </td>  <td>TEXT</td> <td>Hostname.</td></tr>
<tr><td> <kbd>pri</kbd>    </td>  <td>INT</td> <td>Priority number (0…7).</td></tr>
<tr><td> <kbd>**prio**</kbd></td>  <td>TEXT</td> <td>Priority string (emerg…info)</td></tr>
<tr><td> <kbd>source</kbd> </td>  <td>TEXT</td> <td>log | sys | lang | excpt | assert</td></tr>
<tr><td> <kbd>errno</kbd>  </td>  <td>INT</td> <td>0…32767</td></tr>
<tr><td> <kbd>app</kbd>    </td>  <td>TEXT</td> <td>AppName.php</td></tr>
<tr><td> <kbd>**section**</kbd></td> <td>TEXT</td> <td>Application structure / module / part / section.</td></tr>
<tr><td> <kbd>file</kbd>   </td>  <td>TEXT</td> <td>path/file.php</td></tr>
<tr><td> <kbd>line</kbd>   </td>  <td>INT</td> <td>125</td></tr>
<tr><td> <kbd>version</kbd></td>  <td>TEXT</td> <td>Meta data from source code.</td></tr>
<tr><td> <kbd>**message**</kbd></td>  <td>TEXT</td> <td>Primary log event message string.</td></tr>
<tr><td> <kbd>doc</kbd>    </td>  <td>TEXT</td> <td>Extra documentation / long message / href.</td></tr>
<tr><td> <kbd>backtrace</kbd></td>  <td>JSON</td> <td>Backtrace array, as collected by `:backtrace`.</td></tr>
<tr><td> <kbd>code</kbd>   </td>  <td>TEXT</td> <td>Extracted code context (3 lines).</td></tr>
<tr><td> <kbd>vars</kbd>   </td>  <td>JSON</td> <td>Main $vars[] array.</td></tr>
<tr><td> <kbd>context</kbd></td>  <td>JSON</td> <td>Additional / user-defined fields.</td></tr>
</table>
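
Here is a minimal sketch of how a viewer might rebuild the nested tree from the `i`, `g` and `p` columns. It assumes the rows have already been fetched as associative arrays. The function name `build_event_tree()` and the convention that an empty `p` (NULL or 0) marks a root event are assumptions, not part of logStruck.

```php
<?php
// Hypothetical helper, not part of logStruck: nest the rows of one event
// group by their parent reference `p`. Rows with an empty `p` (NULL or 0)
// are treated as roots (an assumption).
function build_event_tree(array $rows, int $group): array
{
    $nodes = [];
    // Index the rows of the requested group by their primary id `i`.
    foreach ($rows as $row) {
        if ((int)$row['g'] === $group) {
            $nodes[(int)$row['i']] = $row + ['children' => []];
        }
    }
    ksort($nodes);
    $roots = [];
    // Attach each node by reference, so later additions stay visible.
    foreach (array_keys($nodes) as $id) {
        $parent = (int)($nodes[$id]['p'] ?? 0);
        if ($parent && isset($nodes[$parent])) {
            $nodes[$parent]['children'][] = &$nodes[$id];
        } else {
            $roots[] = &$nodes[$id];
        }
    }
    return $roots;
}
```

Passing references keeps the walk to a single pass. A child appended to a node after that node was itself attached still shows up in the final tree.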


## Flexible parametric API

The chief invocation method is <kbd>ł()</kbd>. On the surface it's a plain procedural function, and thus available globally. Behind the scenes it keeps a primary logger group. Alternatively, it can be invoked via <kbd>ł::<em style=color:#630>option</em>_<em style=color:#226>tokens</em>()</kbd>.

For plain old log messages it's as simple to use as:

     ł("A thing happened.");

Usually, however, you also want to convey a <u>priority</u>:

     ł("Warnful warning", ':warn');

The <kbd>:token</kbd> attributes are the most interesting concept in this logging API. They simulate Ruby-style symbols. In PHP, however, they need to be quoted as strings.

Besides priority levels, an *important* use is classifying the <u>application section</u>. Any :token that isn't a reserved keyword is simply assumed to refer to an application module:

     ł(':auth', "Authorization error", ':notice');

Make up memorable designators to categorize your log messages according to your application structure and code flow.

An obvious benefit of the :token syntax is that it allows <u>freely ordered</u> parameter lists.  
These are all equivalent:

     ł(':warn', ':db', "Database error");
     ł(':db', ':warn', "Database error");
     ł(':warn', "Database error", ':db');

You'd probably want to be somewhat consistent, though. But the flexibility occasionally helps readability, in particular when passing functional :tokens or arrays/lists.
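
Order independence falls out naturally if each argument is classified by its shape rather than its position. The following is a hypothetical sketch, not logStruck's code. It only handles priority tokens, section tokens and plain message strings. The short priority names and the `info` default are assumptions.

```php
<?php
// Hypothetical sketch: classify :token and string arguments into event
// fields, independent of their order. Priority names and the 'info'
// default are assumptions.
const PRIO_TOKENS = [
    'emerg' => 0, 'alert' => 1, 'crit' => 2, 'err' => 3,
    'warn' => 4, 'notice' => 5, 'info' => 6, 'debug' => 7,
];

function classify_args(string ...$args): array
{
    $event = ['pri' => 6, 'prio' => 'info', 'section' => null, 'message' => ''];
    foreach ($args as $arg) {
        if (strlen($arg) > 1 && $arg[0] === ':') {
            $name = substr($arg, 1);
            if (array_key_exists($name, PRIO_TOKENS)) {
                // Reserved priority keyword.
                $event['pri'] = PRIO_TOKENS[$name];
                $event['prio'] = $name;
            } else {
                // Any other :token names an application section.
                $event['section'] = $name;
            }
        } else {
            // Plain strings become the message, wherever they appear.
            $event['message'] = $arg;
        }
    }
    return $event;
}
```

With this scheme, all three `Database error` calls above produce the same event.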


#### Array data

As mentioned, this logger API isn't meant for just string data. You often want to convey <u>context data</u>, and additional attributes. It's often as simple as just attaching an array to the parameter list:

     ł(':debug', "Front controller state", $RequestVars);

Now those would end up in the `context` database field.<br>
To retain them in the `vars` column, you have two options:

     ł(':debug', "Front controller state", ['vars' => $RequestVars]);

Or the more fancy <kbd>:vars</kbd> <u>array designator</u>:

     ł(':debug', "Front controller state", ':vars', $RequestVars);

You might also use this for other fields like `:message` / `:code` / `:doc`, as long as the following parameter is an array.
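
The routing rules for array parameters can be sketched as follows, under stated assumptions. The function `map_array_args()` and the `ARRAY_FIELDS` list are hypothetical. A designator token applies to the next array parameter. A bare array sends keys that name a field to that field and everything else to `context`.

```php
<?php
// Hypothetical sketch: route array arguments to `vars`, other fields,
// or `context`. The field list is an assumption for illustration.
const ARRAY_FIELDS = ['vars', 'message', 'code', 'doc', 'backtrace'];

function map_array_args(...$args): array
{
    $event = ['vars' => null, 'context' => []];
    $designator = null;
    foreach ($args as $arg) {
        if (is_string($arg) && $arg !== '' && $arg[0] === ':'
            && in_array(substr($arg, 1), ARRAY_FIELDS, true)) {
            // Array designator: remember it for the next array parameter.
            $designator = substr($arg, 1);
        } elseif (is_array($arg) && $designator !== null) {
            // The designated array lands in the named field.
            $event[$designator] = $arg;
            $designator = null;
        } elseif (is_array($arg)) {
            // Keys naming a field go there; everything else is context.
            foreach ($arg as $key => $value) {
                if (in_array($key, ARRAY_FIELDS, true)) {
                    $event[$key] = $value;
                } else {
                    $event['context'][$key] = $value;
                }
            }
        }
        // Other strings (messages, priority tokens) are ignored here.
    }
    return $event;
}
```

In this sketch, `['vars' => $RequestVars]` and `':vars', $RequestVars` produce identical events.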


#### Data mapping with `"field: value"`

Plain strings usually end up in the `message` field. But the structured database schema has more <u>fields</u>, each with a specific purpose. You can easily populate them with the key:value syntax.

     ł("File reading error", "errno: EACCES");

Here `errno` is actually an integer field, so the value will be converted afterwards.

A more interesting field to take care of is the <u>`doc`</u> column.

     ł("Cache directory locked", "doc: ?wiki=SetCachePerm");

Nowadays logs are often consumed by machines rather than humans. For some projects, however, you may wish to be more descriptive: augment the coarse, technical message summary with a human-readable description for non-programmers. (In other logging APIs this is usually an afterthought, if implemented at all, and seldom manageable in log processors/viewers.)

While you could use the `doc:` field for long prosaic documentation, this needlessly bloats the datastore. Prefer hyperlinks or references instead. A `"?tktid=12345"` or `"See setup.txt on chmod cache"` is a helpful minimum. Relative link references are easiest to process.

All of the *primary fields* could also be set using the key:value scheme:

     ł('section: auth', "Auth warning");
     ł("message: $php_errormsg");
     ł('source: sys', "Exec failure", "errno: $retval", [$cmd]);
     ł("Regex failed", "code: $rx", "errno: $preg_last_error");

Fields that aren't primary log event columns end up in the `context` database array.

     ł("Special needs logging", "var1: $var1", "method: $callback");

This is basically equivalent to using a `["key"=>"value", ...]` list. Again, prefer what's more readable in whatever context.
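
The key:value mapping can be sketched like this. `map_field_strings()`, both field lists and the integer casting rule are assumptions made for illustration; logStruck's actual conversion may differ. In this sketch, non-numeric values such as symbolic errno names are kept as strings.

```php
<?php
// Hypothetical sketch of "field: value" string mapping. Field lists and
// the integer casting rule are assumptions.
const SCALAR_FIELDS = [
    'timestamp', 'timestr', 'host', 'pri', 'prio', 'source', 'errno', 'app',
    'section', 'file', 'line', 'version', 'message', 'doc', 'code', 'p', 'g',
];
const INT_FIELDS = ['pri', 'errno', 'line', 'p', 'g'];

function map_field_strings(string ...$args): array
{
    $event = ['context' => []];
    foreach ($args as $arg) {
        if ($arg !== '' && $arg[0] === ':') {
            continue;  // :tokens are handled elsewhere
        }
        // A lowercase identifier followed by ":" counts as a field key.
        if (preg_match('/^([a-z_][a-z0-9_]*):\s*(.*)$/s', $arg, $m)) {
            [, $key, $value] = $m;
            if (in_array($key, INT_FIELDS, true) && is_numeric($value)) {
                $value = (int)$value;  // numeric strings for integer columns
            }
            if (in_array($key, SCALAR_FIELDS, true)) {
                $event[$key] = $value;
            } else {
                $event['context'][$key] = $value;  // non-primary keys
            }
        } else {
            $event['message'] = $arg;  // plain strings form the message
        }
    }
    return $event;
}
```

Restricting keys to lowercase identifiers keeps ordinary messages like `"Database error"` from being taken for fields. A message such as `"http://…"` would still be misread, so a real parser would need stricter rules.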


#### Injectors

As if there weren't enough flexibility already, <kbd>:tokens</kbd> can also refer to data source functions.

You can augment log events with a <kbd>:backtrace</kbd> most of the time:

     ł(':warn', "How did we get here?", ':backtrace');

The placeholder token will be substituted by an array, before being pushed to the log store.

Likewise you can interpolate some common vars:

     ł(':debug', "Front Controller startup", ':server');

Or extract pretty much all available meta data:

     ł(':debug', "Debug by logging", ':backtrace', ':file', ':code', ':version');

The built-in error, exception, and assert handlers, for example, do this automatically to varying degrees.
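
Injector tokens map naturally onto PHP's own `debug_backtrace()`. This is a hypothetical sketch. `apply_injectors()` and where each injector stores its data (for instance `context['server']`) are assumptions, not logStruck's actual behavior.

```php
<?php
// Hypothetical sketch of :token injectors that fill event fields from
// runtime state. Function name and target keys are assumptions.
function apply_injectors(array $event, string ...$tokens): array
{
    // Frame 0 describes the call to apply_injectors(), i.e. the caller's file/line.
    $trace = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS);
    foreach ($tokens as $token) {
        switch ($token) {
            case ':backtrace':
                $event['backtrace'] = $trace;
                break;
            case ':file':
                $event['file'] = $trace[0]['file'] ?? null;
                $event['line'] = $trace[0]['line'] ?? null;
                break;
            case ':code':
                // Three source lines around the calling line.
                $file = $trace[0]['file'] ?? null;
                $line = $trace[0]['line'] ?? 0;
                if ($file && is_readable($file)) {
                    $src = file($file, FILE_IGNORE_NEW_LINES);
                    $event['code'] = implode("\n", array_slice($src, max(0, $line - 2), 3));
                }
                break;
            case ':server':
                $event['context']['server'] = $_SERVER;
                break;
        }
    }
    return $event;
}
```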



#### Hybrid <kbd>ł::<em style=color:#630>option</em>_<em style=color:#226>tokens</em>()</kbd>

The :token scheme is pretty neat, but certainly not to everyone's liking, and sometimes less readable than plain boring method calls. Therefore the `ł()` function and the `ł::` class go hand in hand. Instead of listing tokens as arguments, you can just compact them into a virtual method name:

     ł::debug_auth("Authorization failed", $UserObject);

You can even freely mix in injector callbacks and *one* array designator:

     ł::warn_db_backtrace_file_vars("DB error", $stmt);

Again, you're the programmer. Make sound choices on a case-by-case basis. Don't be clingy with stale semantics.
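
PHP's `__callStatic()` magic method makes such virtual method names cheap to support. The class name `LogSketch` stands in for `ł` here. Returning the parameter list, instead of forwarding it to the logger, is a simplification for illustration.

```php
<?php
// Hypothetical sketch of the hybrid option_tokens() form: __callStatic()
// splits the virtual method name on "_" and turns each part into a :token.
class LogSketch
{
    public static function __callStatic(string $name, array $args): array
    {
        $tokens = array_map(
            function (string $part): string { return ':' . $part; },
            explode('_', $name)
        );
        // A real logger would forward this list to its ł() entry point;
        // the sketch just returns the combined parameter list.
        return array_merge($tokens, $args);
    }
}
```

One consequence of splitting on `_` is that section names used this way cannot themselves contain underscores.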



## All the <kbd>:tokens</kbd>!

Around two dozen :token names are reserved keywords / internal field names:

<table class=dbstruct style="width:75%">

<tr><th colspan=3>Priority levels</th></tr>
<tr><td><kbd>:debug</kbd></td> <td>`7`</td> <td>Low-level debug events.</td></tr>
<tr><td><kbd>:info</kbd></td> <td>`6`</td> <td>Process flow infos etc.</td></tr>
<tr><td><kbd>:notice</kbd>, <kbd>:note</kbd></td> <td>`5`</td> <td>Lowest priority language notices.</td></tr>
<tr><td><kbd>:warning</kbd>, <kbd>:warn</kbd></td> <td>`4`</td> <td>Warnings.</td></tr>
<tr><td><kbd>:error</kbd>, <kbd>:err</kbd></td> <td>`3`</td> <td>PHP or system error.</td></tr>
<tr><td><kbd>:critical</kbd>, <kbd>:crit</kbd></td> <td>`2`</td> <td>This can't be good.</td></tr>
<tr><td><kbd>:alert</kbd>, <kbd>:alrt</kbd></td> <td>`1`</td> <td>Turn on the bat light.</td></tr>
<tr><td><kbd>:emergency</kbd>, <kbd>:emerg</kbd></td> <td>`0`</td> <td>Someone call the president.</td></tr>

<tr><th colspan=3>Source / generator</th></tr>
<tr><td><kbd>:log</kbd></td> <td></td> <td>Application origin, normal/manual log calls.</td></tr>
<tr><td><kbd>:sys</kbd></td> <td></td> <td>System-level events and errno codes.</td></tr>
<tr><td><kbd>:lang</kbd></td> <td></td> <td>Language errors, warnings, notices, etc.</td></tr>
<tr><td><kbd>:exception</kbd></td> <td></td> <td>Language/runtime exceptions.</td></tr>
<tr><td><kbd>:assert</kbd></td> <td></td> <td>`assert()` warnings.</td></tr>

<tr><th colspan=3>Field names</th></tr>
<tr><td colspan=3>Any database column / primary field name can be represented as a `:token`. In practice, however, only <kbd>:vars</kbd> is really useful, to map the following array parameter.</td></tr>

<tr><th colspan=3>Aliases</th></tr>
<tr><td colspan=3>Besides the prio levels, there are a few more aliases for common fields. Among them: <kbd>:documentation</kbd> and <kbd>:help</kbd> for :doc, <kbd>:priority</kbd> for :prio, <kbd>:msg</kbd> for :message, <kbd>:language</kbd> for <kbd>:lang</kbd>, <kbd>:exc</kbd> for :exception, <kbd>:app</kbd> as an alias for the <kbd>:log</kbd> generator source, <kbd>:trace</kbd> and <kbd>:stack</kbd> as aliases for :backtrace, and <kbd>:env</kbd> for :server.</td></tr>

<tr><th colspan=3>Injector calls</th></tr>
<tr><td><kbd>:backtrace</kbd></td> <td></td> <td>Populates backtrace.</td></tr>
<tr><td><kbd>:server</kbd></td> <td></td> <td>Inserts $_SERVER array into `context`.</td></tr>
<tr><td><kbd>:file</kbd></td> <td></td> <td>Uncovers `file` and `line` from backtrace.</td></tr>
<tr><td><kbd>:version</kbd></td> <td></td> <td>Reads out meta data (file/scm version, and section) from script comments.</td></tr>
<tr><td><kbd>:code</kbd></td> <td></td> <td>Inserts 3 lines of `code` context.</td></tr>
<tr><td><kbd>:p</kbd></td> <td></td> <td>Tries to deduce log event hierarchy from prior calls, sections, and backtraces. (Not yet implemented.)</td></tr>

</table>

Any other `:token` name can be used freely to classify and group your application flow. They'll be used as **section** names.
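
The token table above can be expressed as a lookup routine. This is a hypothetical sketch: the return format is an assumption, and the `excpt` source value is borrowed from the `source` column description. `:code` appears only as an injector here, so its use as an array designator is not covered.

```php
<?php
// Hypothetical classifier for the reserved :token table. Return format
// and canonical names are assumptions.
const PRIORITIES = [
    'debug' => 7, 'info' => 6, 'notice' => 5, 'note' => 5,
    'warning' => 4, 'warn' => 4, 'error' => 3, 'err' => 3,
    'critical' => 2, 'crit' => 2, 'alert' => 1, 'alrt' => 1,
    'emergency' => 0, 'emerg' => 0,
];
const SOURCES = [
    'log' => 'log', 'app' => 'log', 'sys' => 'sys', 'lang' => 'lang',
    'language' => 'lang', 'exception' => 'excpt', 'exc' => 'excpt',
    'assert' => 'assert',
];
const INJECTORS = [
    'backtrace' => 'backtrace', 'trace' => 'backtrace', 'stack' => 'backtrace',
    'server' => 'server', 'env' => 'server', 'file' => 'file',
    'version' => 'version', 'code' => 'code', 'p' => 'p',
];
const FIELD_ALIASES = [
    'doc' => 'doc', 'documentation' => 'doc', 'help' => 'doc',
    'prio' => 'prio', 'priority' => 'prio',
    'message' => 'message', 'msg' => 'message', 'vars' => 'vars',
];

function resolve_token(string $token): array
{
    $name = ltrim($token, ':');
    if (array_key_exists($name, PRIORITIES)) {
        return ['pri', PRIORITIES[$name]];
    }
    if (array_key_exists($name, SOURCES)) {
        return ['source', SOURCES[$name]];
    }
    if (array_key_exists($name, INJECTORS)) {
        return ['injector', INJECTORS[$name]];
    }
    if (array_key_exists($name, FIELD_ALIASES)) {
        return ['field', FIELD_ALIASES[$name]];
    }
    // Everything else classifies the application section.
    return ['section', $name];
}
```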



## Setup

You obviously need a readily available `log.db` SQLite store. Best keep it `DOCUMENT_ROOT`-relative, so it's easy to declare on instantiation:

    ł::$db = "$_SERVER[DOCUMENT_ROOT]/config/log.db";
    ł::$app = "YourAppID";
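
If no `log.db` exists yet, one could be created with PDO's SQLite driver. This sketch is a guess at suitable DDL derived from the primary fields table. The table name `log` and the function `open_log_store()` are assumptions. JSON columns are declared as `TEXT` holding `json_encode()`d data. This is not logStruck's own schema file.

```php
<?php
// Hypothetical sketch: create/open a log.db SQLite store with the primary
// fields listed above. Table name and DDL are assumptions.
function open_log_store(string $path): PDO
{
    $db = new PDO("sqlite:$path");
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE IF NOT EXISTS log (
        i INTEGER PRIMARY KEY, g INT, p INT,
        timestamp REAL, timestr TEXT, host TEXT,
        pri INT, prio TEXT, source TEXT, errno INT,
        app TEXT, section TEXT, file TEXT, line INT, version TEXT,
        message TEXT, doc TEXT,
        backtrace TEXT, code TEXT, vars TEXT, context TEXT  -- JSON as TEXT
    )');
    return $db;
}
```

It would be called with the same path as above, e.g. `open_log_store("$_SERVER[DOCUMENT_ROOT]/config/log.db")`.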

You can of course manually load the library. Most autoloaders would already load it implicitly because of the class reference. (Even PSR-x ones, and they'd even be accidentally correct for once with case-sensitive Unicode lookups here.)

While you ought to use `:section` names for logging calls, you can also override/update the default throughout your application flow with:

    ł::$section = "forum";

Or likewise adapt properties of the global logger group via `ł()->section=..`.


#### Default injectors

The `$logger` instance in `ł()` takes a list of default :token options and injectors. You should adapt it directly to enable further features.

       $logger = new ł(':log', ':backtrace', ':file', ':version', ':p');

For instance, this would engage full event population for all manual logging calls.
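
Conceptually, a logger group just prepends its defaults to each call's parameter list. In this hypothetical sketch, `LoggerGroup` and `params()` are invented names. Whether per-call tokens actually override the defaults in logStruck is not confirmed; here they are simply processed last.

```php
<?php
// Hypothetical sketch of a logger group holding default :token options
// and injectors. Class and method names are assumptions.
class LoggerGroup
{
    /** @var array default tokens applied to every call */
    private $defaults;

    public function __construct(string ...$defaults)
    {
        $this->defaults = $defaults;
    }

    // Effective parameter list for one call: defaults first, then the
    // per-call arguments, so call-specific tokens are processed last.
    public function params(...$args): array
    {
        return array_merge($this->defaults, $args);
    }
}
```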


#### Avoid complexity

  *  Yes, you could actually run multiple logger groups, or pass around the `$logger` handler. Don't do it. (Kind of works, but wasn't intended to.)

  *  The alternative branches and `store()` implementations are meant to be patched in. It makes zero sense to DI / runtime-bind them. You're only going to use one approach in practice, so don't complicate instantiation.

  *  Take care with leaking information through logs. It's tempting to include a backtrace for all calls. But authorization-sensitive variable states in particular may only be useful for concrete debugging tasks, not in all log events. (That's an important consideration for any logging scheme, but more so for logStruck due to its much simpler API and utilization.)


<br>
<br>


### Notes / Rationale

 * So, this is all either genius, or completely bonkers.

 * logStruck is decidedly different from most other logging libraries in PHP. It doesn't follow PSR-3 (lowest common denominator) or historic line/text-oriented logging. In particular, it tries to avoid parsing+reformatting backends or making structured/JSON logging an afterthought.

 * Entirely intended as a userland runtime. Mostly suitable for wee projects. Its primary use case is application-level debugging and auditability.

 * The function name <kbd>ł</kbd> isn't completely settled on. (Maybe a bit too much striving for novelty.)

 * Extensibility of the database scheme is easily done, but not planned for. 

 * Alternative logging backends are best implemented in branches.  
      
      * (In this day and age of GitHub forkeritis anyway.)
    
      * It doesn't seem sensible to impose a configuration-centric instantiation.
    
      * However, making `$ł->db` just a Callable would be trivial.
    
      * The API and JSON-logging design is specifically meant to avoid
       Monolog-style message formatting / parsing / filter chains.
       Events are structured from the start and shouldn't be downconverted
       to suit textdump interfaces.

<!-- -->

 * Inspired by structlog, cabin, journald, graylog, PEAR log even, and with logstash/fluentd in mind.

 * The fancy ':token' signature is used in place of named params and constant literals in PHP.

 * Currently just inserts one-dimensional events. The API mapping is
   too crude still for spatial message/section/prio collections.


### ToDo

 * Log events are only associated with a primary group event as of now.
   The `:p` filter will allow regrouping events automatically from context information.

      * It'll scan the backtrace for matching prior code paths, and obviously the used :sections, to relate events to one another.

      * Alternatively control it manually:

              $p = ł("first");
              ł("second", "p:$p");

        A logger call returns the new event id. And `p:` is just the field that keeps the handle.

<!-- -->

 * The alternative JSON file-append store just keeps event-local ids/groups=1/parent ids. This needs an insertion transaction or trigger for reconverting into a SQLite store. (Rather simple.)

 * To mix application and server-level logging, an injector for Apache's `UNIQUE_ID` might make sense. It would still require log post-processing to turn it into a sequential event id. Alternatively, stash all entries with a UUID.

 * Investigate whether logstash (supposedly works with multiline/mutate), fluentd or graylog2 make suitable targets. Neither seems to provide incremental log ids on submission. Each requires post processing on pushed events, or incremental json/sqlite imports.

 * Most obviously, this implementation hinges on a proper GUI. It would be kinda pointless to make another ETL toy, were it not for the absence of hierarchical support in existing log viewers. While a web viewer would suffice, only an actual desktop tool would benefit from the intended JSON → SQLite storage.
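
The planned manual `p:` linking from the ToDo list can be illustrated with a toy in-memory store. `MemoryLog` and its `log()` method are hypothetical stand-ins for `ł()`. They only mimic the described contract: each call returns the new event id, and a later `"p:<id>"` argument records it as the parent.

```php
<?php
// Hypothetical in-memory store illustrating the planned "p:" parent
// linking. Each call returns the new event id for later reference.
class MemoryLog
{
    /** @var array<int, array> events indexed by id */
    public $events = [];
    private $nextId = 1;

    public function log(string $message, string ...$fields): int
    {
        $event = ['i' => $this->nextId++, 'p' => 0, 'message' => $message];
        foreach ($fields as $field) {
            // Only the "p:<id>" field is interpreted in this sketch.
            if (preg_match('/^p:\s*(\d+)$/', $field, $m)) {
                $event['p'] = (int)$m[1];
            }
        }
        $this->events[$event['i']] = $event;
        return $event['i'];
    }
}

$log = new MemoryLog();
$p = $log->log("first");
$log->log("second", "p:$p");
```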
