<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="shortcut icon" href="img/favicon.ico">
<title>global .fmt db - logfmt1</title>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700" />
<link rel="stylesheet" href="css/theme.css" />
<link rel="stylesheet" href="css/theme_extra.css" />
<link href="custom.css" rel="stylesheet" />
<link href="syntax.css" rel="stylesheet" />
<script>
// Current page data
var mkdocs_page_name = "global .fmt db";
var mkdocs_page_input_path = "fmt.md";
var mkdocs_page_url = null;
</script>
<script src="js/jquery-2.1.1.min.js" defer></script>
<script src="js/modernizr-2.8.3.min.js" defer></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-scroll">
<div class="wy-side-nav-search">
<a href="." class="icon icon-home"> logfmt1</a>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul>
<li class="toctree-l1"><a class="reference internal" href="index.html">Intro</a>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="log.fmt.html">.log.fmt</a>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="logopen.html">logopen()</a>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="regex.html">regex()</a>
</li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal current" href="fmt.html">global .fmt db</a>
<ul class="current">
<li class="toctree-l2"><a class="reference internal" href="#fmt-example">.fmt Example</a>
</li>
<li class="toctree-l2"><a class="reference internal" href="#class">class:</a>
</li>
<li class="toctree-l2"><a class="reference internal" href="#record">record:</a>
</li>
<li class="toctree-l2"><a class="reference internal" href="#separator">separator:</a>
</li>
<li class="toctree-l2"><a class="reference internal" href="#placeholder">placeholder:</a>
</li>
<li class="toctree-l2"><a class="reference internal" href="#rewrite">rewrite:</a>
</li>
<li class="toctree-l2"><a class="reference internal" href="#fields">fields:</a>
</li>
<li class="toctree-l2"><a class="reference internal" href="#expand">expand:</a>
</li>
<li class="toctree-l2"><a class="reference internal" href="#alias">alias:</a>
</li>
<li class="toctree-l2"><a class="reference internal" href="#container">container:</a>
</li>
<li class="toctree-l2"><a class="reference internal" href="#glob">glob:</a>
</li>
<li class="toctree-l2"><a class="reference internal" href="#comment-fields">#comment: fields</a>
</li>
<li class="toctree-l2"><a class="reference internal" href="#other-format-files">Other format files</a>
<ul>
<li class="toctree-l3"><a class="reference internal" href="#grok-definitions">.grok definitions</a>
</li>
<li class="toctree-l3"><a class="reference internal" href="#lnav-formats">.lnav formats</a>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="update-logfmt.html">update-logfmt</a>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="logex.html">logex</a>
</li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href=".">logfmt1</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href=".">Docs</a> »</li>
<li>global .fmt db</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main">
<div class="section">
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>❮❗❯ This is all very provisional. (First draft. Names might still change.)</p>
</div>
<h2 id="global-fmt-database">Global .fmt database</h2>
<p>While each log file should be accompanied by a <a href="log.fmt.html">.fmt descriptor</a>,
the global database in <code>/usr/share/logfmt/</code> contains a full .fmt field
definition for each class. Combining the two makes it possible to construct
a regex.</p>
<p>Most notably, the <code>"fields":</code> and <code>"placeholder":</code> entries are used to turn the
<code>"record":</code> string definition into a capture pattern.</p>
<h3 id="fmt-example">.fmt Example</h3>
<p>The Apache format definition (apache.fmt) contains:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="nt">"class"</span><span class="p">:</span> <span class="s2">"apache generic"</span><span class="p">,</span>
<span class="nt">"separator"</span><span class="p">:</span> <span class="s2">" "</span><span class="p">,</span>
<span class="nt">"rewrite"</span><span class="p">:</span> <span class="p">{</span>
<span class="nt">"%[\\d!,+\\-]+"</span><span class="p">:</span> <span class="s2">"%"</span><span class="p">,</span>
<span class="nt">"%%"</span><span class="p">:</span> <span class="s2">"%"</span>
<span class="p">},</span>
<span class="nt">"placeholder"</span><span class="p">:</span> <span class="s2">"%[<>]?(?:\\w*\\{[^\\}]+\\})?\\^?\\w+"</span><span class="p">,</span>
<span class="nt">"fields"</span><span class="p">:</span> <span class="p">{</span>
<span class="nt">"%a"</span><span class="p">:</span> <span class="p">{</span> <span class="nt">"id"</span><span class="p">:</span> <span class="s2">"remote_addr"</span><span class="p">,</span> <span class="nt">"rx"</span><span class="p">:</span> <span class="s2">"[\\d.:a-f]+"</span> <span class="p">},</span>
<span class="nt">"%h"</span><span class="p">:</span> <span class="p">{</span> <span class="nt">"id"</span><span class="p">:</span> <span class="s2">"remote_host"</span><span class="p">,</span> <span class="nt">"rx"</span><span class="p">:</span> <span class="s2">"[\\w\\-.:]+"</span> <span class="p">},</span>
<span class="nt">"%{c}h"</span><span class="p">:</span> <span class="p">{</span> <span class="nt">"id"</span><span class="p">:</span> <span class="s2">"remote_host"</span><span class="p">,</span> <span class="nt">"rx"</span><span class="p">:</span> <span class="s2">"[\\w\\-.:]+"</span> <span class="p">},</span>
<span class="nt">"%A"</span><span class="p">:</span> <span class="p">{</span> <span class="nt">"id"</span><span class="p">:</span> <span class="s2">"local_address"</span><span class="p">,</span> <span class="nt">"rx"</span><span class="p">:</span> <span class="s2">"[\\d.:a-f]+"</span> <span class="p">},</span>
<span class="nt">"%u"</span><span class="p">:</span> <span class="p">{</span> <span class="nt">"id"</span><span class="p">:</span> <span class="s2">"remote_user"</span><span class="p">,</span> <span class="nt">"rx"</span><span class="p">:</span> <span class="s2">"[\\-\\w@.]+"</span> <span class="p">},</span>
<span class="nt">"%t"</span><span class="p">:</span> <span class="p">{</span> <span class="nt">"id"</span><span class="p">:</span> <span class="s2">"request_time"</span><span class="p">,</span> <span class="nt">"rx"</span><span class="p">:</span> <span class="s2">"\\[?(\\d[\\d:\\w\\s:./\\-+,;]+)\\]?"</span> <span class="p">},</span>
<span class="err">…</span>
<span class="p">},</span>
<span class="nt">"alias"</span><span class="p">:</span> <span class="p">{</span>
<span class="nt">"remote_address"</span><span class="p">:</span> <span class="s2">"remote_addr"</span><span class="p">,</span>
<span class="nt">"ip"</span><span class="p">:</span> <span class="s2">"remote_addr"</span><span class="p">,</span>
<span class="nt">"file"</span><span class="p">:</span> <span class="s2">"request_file"</span><span class="p">,</span>
<span class="nt">"size"</span><span class="p">:</span> <span class="s2">"bytes_sent"</span><span class="p">,</span>
<span class="err">…</span>
<span class="p">},</span>
<span class="nt">"expand"</span><span class="p">:</span> <span class="p">{</span>
<span class="nt">"%\\{([^{}]+)\\}t"</span><span class="p">:</span> <span class="p">{</span>
<span class="nt">"id"</span><span class="p">:</span> <span class="s2">"request_time"</span><span class="p">,</span>
<span class="nt">"class"</span><span class="p">:</span> <span class="s2">"strftime"</span><span class="p">,</span>
<span class="nt">"record"</span><span class="p">:</span> <span class="s2">"$1"</span>
<span class="p">}</span>
<span class="p">},</span>
<span class="nt">"container"</span><span class="p">:</span> <span class="p">{</span>
<span class="nt">"message"</span><span class="p">:</span> <span class="p">{</span>
<span class="nt">"id"</span><span class="p">:</span> <span class="s2">"$1"</span><span class="p">,</span>
<span class="nt">"value"</span><span class="p">:</span> <span class="s2">"$2"</span><span class="p">,</span>
<span class="nt">"rx"</span><span class="p">:</span> <span class="s2">"\\[(\\w+) \"(.*?)\"\\]"</span><span class="p">,</span>
<span class="nt">"class"</span><span class="p">:</span> <span class="s2">"apache mod_security"</span>
<span class="p">}</span>
<span class="p">},</span>
<span class="nt">"glob"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"/var/log/apache*/*acc*.log"</span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div>
<p>Unlike the local .log.fmt descriptors, it usually does not describe a default "record" format.</p>
<h3 id="class">class:</h3>
<p>The class in the global database is largely decorative. Instead, the filenames
define the inheritance of rules/fields. The "class" declared by
a .log.fmt is mapped onto <code>/usr/share/logfmt/application.variant.fmt</code>.</p>
<ul>
<li>Usually there's just one variant level per log type, but the lookup is
supposed to be mildly recursive.</li>
<li>Essentially it should merge <code>*.log.fmt</code> with <code>appclass.variant.fmt</code>, with
<code>appclass.fmt</code> applied last, so the most specific definitions are retained.</li>
<li>There's also a generic "grok" class, but the patterns therein are largely
static (not built from variable format strings).</li>
<li>Some special classes like "json" might exist. (Not supported by logfmt1.)</li>
</ul>
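<p>The merge order described above can be sketched in Python as follows. This is only an illustration, not logfmt1's actual code: the helper names <code>candidate_files</code> and <code>merge</code> are hypothetical, and mapping a space-separated class such as "apache combined" onto <code>apache.combined.fmt</code> is an assumption.</p>
<div class="highlight"><pre><span></span><code>FMT_DIR = "/usr/share/logfmt"  # global database directory named above

def candidate_files(cls, fmt_dir=FMT_DIR):
    """List .fmt files for a class, most specific first (assumed naming)."""
    parts = cls.split()
    return [
        fmt_dir + "/" + ".".join(parts[:n]) + ".fmt"
        for n in range(len(parts), 0, -1)
    ]

def merge(specific, generic):
    """Merge two .fmt dicts; values from `specific` win over `generic`."""
    merged = dict(generic)
    for key, value in specific.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # merge nested maps such as "fields" entry by entry
            merged[key] = merge(value, merged[key])
        else:
            merged[key] = value
    return merged
</code></pre></div>
<p>A loader could start from the local <code>*.log.fmt</code> and fold in each file from <code>candidate_files()</code> in turn, so that the generic <code>appclass.fmt</code> only fills in what is still missing.</p>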
<h3 id="record">record:</h3>
<p>The "record" entry is not usually present in the global .fmt definition.
Some super specific variant definitions (for example apache.error.fmt) or
static formats (syslog.fmt) might include one, however.</p>
<h3 id="separator">separator:</h3>
<p>Most log formats use spaces to separate %placeholder fields, and simpler
implementations might just split the "record" declaration on this separator.</p>
<h3 id="placeholder">placeholder:</h3>
<p>Rather than splitting on the separator, logfmt1 uses a regex definition of
possible %placeholder strings to map onto fields. It should account for
prefixes/suffixes, unless those were cleared by the <code>rewrite</code> map.</p>
<p>Not all format strings use <code>%\w+</code> to signal placeholders. In nginx, for instance,
the sigil <code>$\w+</code> introduces placeholders (variable names, really).</p>
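<p>As a rough illustration, the <code>placeholder</code> regex from the Apache example above can be applied to a format string with Python's <code>re.findall</code>. The nginx pattern and the helper name <code>find_placeholders</code> are assumptions made for this sketch.</p>
<div class="highlight"><pre><span></span><code>import re

# "placeholder" regex copied from the apache.fmt example above
APACHE_PLACEHOLDER = r"%[&lt;&gt;]?(?:\w*\{[^\}]+\})?\^?\w+"
# assumed equivalent for nginx, with the $ sigil escaped
NGINX_PLACEHOLDER = r"\$\w+"

def find_placeholders(record, placeholder_rx):
    """Return all placeholder strings found in a "record" format string."""
    return re.findall(placeholder_rx, record)

print(find_placeholders('%h %l %u %t "%r" %&gt;s %b', APACHE_PLACEHOLDER))
# ['%h', '%l', '%u', '%t', '%r', '%&gt;s', '%b']
print(find_placeholders('$remote_addr - $remote_user [$time_local]', NGINX_PLACEHOLDER))
# ['$remote_addr', '$remote_user', '$time_local']
</code></pre></div>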
<h3 id="rewrite">rewrite:</h3>
<p>A list/map of regexes to apply before any transformations or field lookups.
It can be used to mask or simplify placeholder definitions (for instance, to
clean up the Apache conditional prefixes) or regex meta characters.</p>
<ul>
<li>The <code>record</code> field starts as a static string, but is meant to be turned
into a regex.</li>
<li>Therefore meta characters (such as <code>|</code> or <code>[]</code>) have to be
taken care of, which is what the <code>rewrite</code> map is lazily used for.</li>
<li>Better implementations might look up the placeholders, and automatically
escape the rest of the "record" format string.</li>
</ul>
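<p>Both approaches can be sketched briefly. <code>apply_rewrite</code> runs the <code>rewrite</code> map from the Apache example over a format string, while <code>escape_literals</code> shows the "better implementation" idea of escaping everything between placeholders. Both function names are hypothetical, and this is not necessarily how logfmt1 implements either step.</p>
<div class="highlight"><pre><span></span><code>import re

# "rewrite" map copied from the apache.fmt example above
REWRITE = {
    r"%[\d!,+\-]+": "%",  # drop conditional prefixes such as %!200,304
    r"%%": "%",
}

def apply_rewrite(record, rewrite):
    """Apply each rewrite regex to the record string, in map order."""
    for rx, replacement in rewrite.items():
        record = re.sub(rx, replacement, record)
    return record

def escape_literals(record, placeholder_rx):
    """Regex-escape all text between placeholders, keeping placeholders as-is."""
    out, pos = [], 0
    for match in re.finditer(placeholder_rx, record):
        out.append(re.escape(record[pos:match.start()]))
        out.append(match.group(0))
        pos = match.end()
    out.append(re.escape(record[pos:]))
    return "".join(out)
</code></pre></div>
<p>For example, <code>apply_rewrite("%400,501{User-agent}i %!200,304{Referer}i", REWRITE)</code> yields <code>%{User-agent}i %{Referer}i</code>, which the static <code>fields</code> or <code>expand</code> entries can then match.</p>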
<h3 id="fields">fields:</h3>
<p>The core of the global .fmt definitions is the field list. Each entry defines a
static %F placeholder and associates it with a default field name (id:) and
regex (rx:) or even a grok definition (grok:).</p>
<table>
<thead>
<tr>
<th>key</th>
<th>purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>%F</code></td>
<td><strong>JSON key</strong>: static placeholder string (not a regex itself)</td>
</tr>
<tr>
<td>id</td>
<td>field identifier, as specified by the application (internal name)</td>
</tr>
<tr>
<td>rx</td>
<td>regex which %F placeholder gets replaced with</td>
</tr>
<tr>
<td>grok</td>
<td>as an alternative to rx, %F might be turned into %PATTERN:id</td>
</tr>
<tr>
<td>type</td>
<td>"int" and "float" could designate strictly numeric fields</td>
</tr>
</tbody>
</table>
<div class="admonition notes">
<p class="admonition-title">Notes</p>
<ul>
<li>As part of the regex transformation, a <code>%F</code> could be turned into
<code>(?<id>\S+)</code> for instance.</li>
<li>If the <code>rx</code> contains an unnamed capture group <code>(…)</code>, that group should be
turned into the named capture group, instead of the whole match. (To account
for implicit wrapping.)</li>
<li>The <code>rx</code> itself might however specify named subgroups (like request_line
in Apache logs, itself comprised of _method, _path, _protocol, or the
datetime made up of tm_wday, tm_year, tm_whatever).</li>
<li><code>\S+</code> is also used as fallback for entirely undefined placeholders
(no expand definition matched) in logfmt1.</li>
<li><code>grok</code> isn't currently used, but might allow for simpler transformations
(indirectly into a grok pattern, and later a regex).</li>
</ul>
</div>
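<p>The notes above can be condensed into a small Python sketch. It is a simplified illustration rather than logfmt1's implementation: the function names are hypothetical, Python spells named groups <code>(?P&lt;id&gt;…)</code> rather than <code>(?&lt;id&gt;…)</code>, undefined placeholders are left as an unnamed <code>\S+</code>, and literal text in the record is assumed to be regex-safe already.</p>
<div class="highlight"><pre><span></span><code>import re

# An unescaped "(" that does not start a (?...) construct.
# Simplification: parentheses inside character classes are not recognized.
UNNAMED_GROUP = re.compile(r"(?&lt;!\\)\((?!\?)")

def field_to_group(field):
    """Turn one "fields" entry into a named capture group."""
    name = field["id"]
    rx = field.get("rx", r"\S+")
    if UNNAMED_GROUP.search(rx):
        # name the first unnamed group instead of wrapping the whole rx
        return UNNAMED_GROUP.sub(lambda m: "(?P&lt;%s&gt;" % name, rx, count=1)
    return "(?P&lt;%s&gt;%s)" % (name, rx)

def record_to_regex(record, fields, placeholder_rx):
    """Replace each placeholder in `record` with its capture pattern."""
    def replace(match):
        field = fields.get(match.group(0))
        if field is None:
            return r"\S+"  # fallback for undefined placeholders (unnamed here)
        return field_to_group(field)
    return re.sub(placeholder_rx, replace, record)

# field definitions copied from the apache.fmt example above
fields = {
    "%h": {"id": "remote_host", "rx": r"[\w\-.:]+"},
    "%u": {"id": "remote_user", "rx": r"[\-\w@.]+"},
    "%t": {"id": "request_time", "rx": r"\[?(\d[\d:\w\s:./\-+,;]+)\]?"},
}
rx = record_to_regex("%h %u %t", fields, r"%\w+")
row = re.match(rx, "127.0.0.1 frank [10/Oct/2000:13:55:36 -0700]").groupdict()
# row["request_time"] is "10/Oct/2000:13:55:36 -0700", without the brackets
</code></pre></div>
<p>Because the <code>%t</code> regex carries its own unnamed group, the brackets stay outside the captured value, which is the point of the implicit wrapping rule.</p>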
<h3 id="expand">expand:</h3>
<p>The expand declarations are used to construct unknown fields/placeholders.
Instead of static %placeholders, each entry describes a regex to detect
new/variant placeholders. Thus it simply can be applied before
separator/placeholder are looked at, to augment the known <code>fields</code> list.</p>
<table>
<thead>
<tr>
<th>key</th>
<th>purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>%\{(\w+)\}t</code></td>
<td><strong>JSON key</strong>: a regex to detect mutable placeholders</td>
</tr>
<tr>
<td>id</td>
<td>name for newly created fields entry, might use the capture's $1</td>
</tr>
<tr>
<td>rx</td>
<td>for static definitions (often just \S+)</td>
</tr>
<tr>
<td>if_quoted</td>
<td>alternative regex, if placeholder was enclosed in "%\w+" quotes</td>
</tr>
<tr>
<td>class</td>
<td>recurse into other .fmt types</td>
</tr>
<tr>
<td>record</td>
<td>can be set to $2 if class: recursion is defined</td>
</tr>
</tbody>
</table>
<div class="admonition notes">
<p class="admonition-title">Notes</p>
<ul>
<li>Typically it suffices to specify the <code>id</code> and <code>rx</code> field.</li>
<li>If no <code>id</code> is given, then the regex capture is normalized into
an identifier (non-alphanumerics stripped, all lowercased).</li>
<li>The <code>id</code> or <code>record</code> value may also be set from regex captures
(e.g. <code>$1</code> or <code>$2</code>) or compound values (<code>"id": "newfield_$1"</code>).</li>
<li>logfmt1 also allows recursing into other format types via <code>class</code>
(which is used to expand the captured <code>"record": "$1"</code> into regex
tokens).</li>
</ul>
</div>
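<p>A minimal sketch of the expansion step might look like this. The function names are hypothetical, <code>class</code> recursion is left out, and letting static <code>fields</code> entries take precedence over expanded ones is an assumption rather than documented behaviour.</p>
<div class="highlight"><pre><span></span><code>import re

def substitute(template, match):
    """Fill $1, $2, ... in `template` from the capture groups of `match`."""
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))), template)

def normalize(text):
    """Strip non-alphanumerics and lowercase (used when no id is given)."""
    return re.sub(r"[^A-Za-z0-9]", "", text).lower()

def expand_fields(record, fields, expand):
    """Return a copy of `fields` augmented with placeholders found via `expand`."""
    fields = dict(fields)
    for rx, rule in expand.items():
        for match in re.finditer(rx, record):
            placeholder = match.group(0)
            if placeholder in fields:
                continue  # assumption: static definitions take precedence
            if "id" in rule:
                name = substitute(rule["id"], match)
            else:
                capture = match.group(1) if match.re.groups else placeholder
                name = normalize(capture)
            fields[placeholder] = {"id": name, "rx": rule.get("rx", r"\S+")}
    return fields
</code></pre></div>
<p>With a made-up rule such as <code>{"%\\{([^{}]+)\\}i": {"rx": "\\S+"}}</code>, the placeholder <code>%{User-Agent}i</code> would gain a fields entry with the id <code>useragent</code>.</p>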
<h3 id="alias">alias:</h3>
<p>Maps alternative/more common field names onto the declared field <code>id</code>s.</p>
<p>To get to some state of standardization, the field ids usually refer
to application-internal names (for instance, the <code>log_pfn_register(…,…,cb_id)</code>
names in Apache). Those aren't always the more commonly used identifiers.</p>
<p>Thus aliases make sense not just for convenience, but also for compatibility
with other common names (e.g. W3C extended log format names like <code>cs-time</code>).</p>
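<p>How a log processor exposes aliases is up to the implementation. One possible approach, sketched here with a hypothetical <code>with_aliases</code> helper, is to copy each aliased value into the parsed row under its alternative name.</p>
<div class="highlight"><pre><span></span><code># "alias" map copied from the apache.fmt example above
ALIAS = {
    "remote_address": "remote_addr",
    "ip": "remote_addr",
    "file": "request_file",
    "size": "bytes_sent",
}

def with_aliases(row, alias):
    """Return a copy of `row` that also answers to the alias names."""
    out = dict(row)
    for name, target in alias.items():
        # only add an alias if its target was captured and the name is free
        if target in row and name not in out:
            out[name] = row[target]
    return out
</code></pre></div>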
<h3 id="container">container:</h3>
<p>The container map is used by logopen() to extract additional fields (lists even) from one
of the existing fields. This is usually done at row traversal, and makes
sense for application-specific subformats in logs, such as any <code>key=value</code>
lists in the main message field.</p>
<table>
<thead>
<tr>
<th>key</th>
<th>purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>message</code></td>
<td><strong>JSON key</strong>: from which field to extract</td>
</tr>
<tr>
<td>rx</td>
<td>regex to detect and capture (key)=(value) fields</td>
</tr>
<tr>
<td>id</td>
<td>unpacked field name (usually just <code>$1</code> from the rx capture)</td>
</tr>
<tr>
<td>value</td>
<td>value from capture (so <code>$2</code> typically)</td>
</tr>
<tr>
<td>class</td>
<td>decorative description (no .fmt recursion supported in logfmt1)</td>
</tr>
</tbody>
</table>
<div class="admonition notes">
<p class="admonition-title">Notes</p>
<ul>
<li>The entries here might become lists, since commonly there's just one
<code>message</code> field in logs, yet multiple key:value schemes might be
utilized within.</li>
<li>Or the target field might become an <code>"extract_from":</code> property, and
<code>container</code> a list itself.</li>
<li>Still not sure if automatic list conversion is a good idea.
Standard fields get an enumeration suffix <code>(?&lt;request_uri2&gt;…)</code> if
duplicated.</li>
</ul>
</div>
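<p>To illustrate the idea, the following sketch unpacks a mod_security style message using the <code>container</code> entry from the Apache example. The <code>unpack</code> and <code>substitute</code> helpers are hypothetical and do not mirror logopen()'s actual code; repeated keys simply overwrite each other here, since list handling is still undecided.</p>
<div class="highlight"><pre><span></span><code>import re

# "container" entry copied from the apache.fmt example above
CONTAINER = {
    "message": {
        "id": "$1",
        "value": "$2",
        "rx": r'\[(\w+) "(.*?)"\]',
        "class": "apache mod_security",
    }
}

def substitute(template, match):
    """Fill $1, $2, ... in `template` from the capture groups of `match`."""
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))), template)

def unpack(row, container):
    """Return a copy of `row` with the fields extracted from container sources."""
    out = dict(row)
    for source, rule in container.items():
        for match in re.finditer(rule["rx"], row.get(source, "")):
            # a repeated key overwrites the earlier value in this sketch
            out[substitute(rule["id"], match)] = substitute(rule["value"], match)
    return out

row = {"message": 'ModSecurity: Warning. [id "960015"] [msg "Request Missing an Accept Header"]'}
unpacked = unpack(row, CONTAINER)
# unpacked now also holds the keys "id" and "msg"
</code></pre></div>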
<h3 id="glob">glob:</h3>
<p>Might be used by log processors to look up a log class, based on file names,
if no .log.fmt is declared.</p>
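<p>Such a lookup could be sketched with Python's <code>fnmatch</code> module. The <code>class_for_file</code> helper and the way the format definitions are passed in are assumptions for illustration only.</p>
<div class="highlight"><pre><span></span><code>import fnmatch

def class_for_file(path, formats):
    """Return the class of the first format whose "glob" list matches `path`."""
    for fmt in formats:
        for pattern in fmt.get("glob", []):
            # note: unlike shell globbing, fnmatch lets "*" match "/" as well
            if fnmatch.fnmatchcase(path, pattern):
                return fmt.get("class")
    return None

formats = [{"class": "apache generic", "glob": ["/var/log/apache*/*acc*.log"]}]
print(class_for_file("/var/log/apache2/access.log", formats))
# apache generic
</code></pre></div>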
<h3 id="comment-fields">#comment: fields</h3>
<p>Documentation entries in the .fmt files have keys starting with <code>#</code>, for example
<code>"#license":</code> or <code>"#origin":</code>. This is simpler than using JSON with
comments (JSOL/JSON5).</p>
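<p>Since such entries are ordinary JSON keys, a standard parser reads them without special handling. A consumer might then separate them from the rules, as in this sketch (the <code>split_comments</code> helper is hypothetical and only looks at top-level keys):</p>
<div class="highlight"><pre><span></span><code>import json

def split_comments(fmt_text):
    """Parse a .fmt document and separate "#..." keys from the actual rules."""
    data = json.loads(fmt_text)
    docs = {key: value for key, value in data.items() if key.startswith("#")}
    rules = {key: value for key, value in data.items() if not key.startswith("#")}
    return rules, docs
</code></pre></div>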
<hr />
<h3 id="other-format-files">Other format files</h3>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>This section is about fictional features.</p>
</div>
<h4 id="grok-definitions">.grok definitions</h4>
<blockquote>
<p>Not implemented yet.</p>
</blockquote>
<p>The logfmt/ directory might also contain .grok files, which get transformed
into .fmt structures. (Probably with the grok: parameter for fields, and
a grok: pattern table alongside regular fields:).</p>
<p>There's already a pretransformed <code>grok.fmt</code>, which currently requires
<code>%{GROK:%{PATTERN:id}}</code> references, however.</p>
<h4 id="lnav-formats">.lnav formats</h4>
<blockquote>
<p>Not implemented yet.</p>
</blockquote>
<p>Likewise, we could use lnav .json format definitions. Those are static
too, however.</p>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="update-logfmt.html" class="btn btn-neutral float-right" title="update-logfmt">Next <span class="icon icon-circle-arrow-right"></span></a>
<a href="regex.html" class="btn btn-neutral" title="regex()"><span class="icon icon-circle-arrow-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="https://www.mkdocs.org/">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" role="note" aria-label="versions">
<span class="rst-current-version" data-toggle="rst-current-version">
<span><a href="regex.html" style="color: #fcfcfc;">« Previous</a></span>
<span style="margin-left: 15px"><a href="update-logfmt.html" style="color: #fcfcfc">Next »</a></span>
</span>
</div>
<script>var base_url = '.';</script>
<script src="js/theme.js" defer></script>
<script defer>
window.onload = function () {
SphinxRtdTheme.Navigation.enable(true);
};
</script>
</body>
</html>