A .log.fmt for each log file

Rationale

Log parsing is a curse, because each application has its own format, and oftentimes configurable fields at that. Various attempts at standardizing logs have failed, or are bound to. Logging services and database storage are largely just symptomatic kludges, with JSON logs and not-quite-JSON formats held back by inertia.

Instead, logfmt1 aims to provide a descriptor for each log file, in order to make it parseable. Until you know what's in a file, anything you attempt is guesswork.

So the idea is to have a *.fmt next to each *.log file, with a descriptor such as:

{
   "class": "apache combined",
   "record": "%h %l %u %t \"%r\" %>s %b"
}

Notably, the "record" field should be the most current format string that the application itself uses. To resolve the placeholders, an application reference is kept in "class", which allows combining the format string with placeholder field definitions from the global .fmt database (/usr/share/logfmt).
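As a rough sketch of how a format string and placeholder definitions could be combined into a parser: the snippet below turns a "record" string into a regular expression with named groups. The names APACHE_FIELDS and record_to_regex, and all field ids and regexes in it, are assumptions made up for illustration; they are not the definitions shipped in the global database.

```python
import re

# Hypothetical placeholder definitions, standing in for what a global
# database entry for the "apache" class might provide. The ids and
# regexes are illustrative assumptions only.
APACHE_FIELDS = {
    "%h": {"id": "remote_host", "rx": r"\S+"},
    "%l": {"id": "remote_logname", "rx": r"\S+"},
    "%u": {"id": "remote_user", "rx": r"\S+"},
    "%t": {"id": "request_time", "rx": r"\[[^\]]+\]"},
    "%r": {"id": "request_line", "rx": r"[^\"]*"},
    "%>s": {"id": "status", "rx": r"\d{3}"},
    "%b": {"id": "bytes_sent", "rx": r"\d+|-"},
}

def record_to_regex(record, fields):
    """Turn a format string into a compiled regex with named groups."""
    # Try longer placeholders first so a short one cannot shadow a long one.
    names = sorted(fields, key=len, reverse=True)
    splitter = re.compile("(" + "|".join(re.escape(n) for n in names) + ")")
    parts = []
    for token in splitter.split(record):
        if token in fields:
            # Known placeholder: substitute its regex as a named group.
            parts.append("(?P<%s>%s)" % (fields[token]["id"], fields[token]["rx"]))
        else:
            # Literal text between placeholders must match verbatim.
            parts.append(re.escape(token))
    return re.compile("^" + "".join(parts) + "$")

if __name__ == "__main__":
    rx = record_to_regex('%h %l %u %t "%r" %>s %b', APACHE_FIELDS)
    line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326'
    print(rx.match(line).groupdict())
```

Placeholders that are missing from the definitions are treated as literal text here, so an unknown field makes the line fail to match rather than being guessed at.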

common classes

There aren't many predefined classes yet, but special values that could work without a current "record" declaration might be:

"class": "grok syslog"
Reads the corresponding definition from a .grok (or perhaps preconverted) pattern definition. These are largely static patterns.
"class": "inilog"
For Heroku/Go "logfmt" style logs consisting only of key=value fields.
"class": "json appmoniker"
For real JSON logs, with an application identifier here (for decoration)
"class": "apache common"
Reads a predefined/static "record" definition from the global apache.common.fmt, which of course means parsing would fail if the user changed the LogFormat declaration in Apache.

Note that predefined classes undermine the purpose of logfmt1, in that they're only suitable for static/non-variant log formats.
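To illustrate why a class like "inilog" can get by without a "record" string: key=value lines describe themselves, so a generic parser is enough. The following is a simplified sketch, not the logfmt1 implementation; the function name parse_keyvalue_line is made up, and the quoting rules (double quotes with backslash escapes) are an assumption about the format.

```python
import re

# One key=value pair; the value is either a double-quoted string
# (backslash escapes allowed) or a bare token without whitespace.
PAIR = re.compile(r'([^\s=]+)=("(?:[^"\\]|\\.)*"|\S*)')

def parse_keyvalue_line(line):
    """Parse one key=value log line into a dict of strings."""
    row = {}
    for key, value in PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            # Strip the quotes and undo backslash escapes inside them.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        row[key] = value
    return row

if __name__ == "__main__":
    print(parse_keyvalue_line('at=info method=GET path="/a b" status=200'))
```

Bare words without an "=" are silently skipped in this sketch; a real parser would have to decide how to treat them.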

additional fields

The *.log.fmt itself might declare definitions such as aliases and more specific/custom placeholders.

{
   "class": "apache cust3",
   "record": "%a %h %{iso}t '%r' %s",
   "fields": {
       "%{iso}t": { "id": "datetime", "rx": "..." }
   },
   "alias": {
       "iso8601": "datetime"
   }
}

These ought to be joined with, and override, any global fmt definitions. Such user customizations are more likely to be applied to the global definitions anyway. Care should be taken by update-logfmt or applications not to jettison user-customized *.log.fmt options.
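The join-and-override behaviour could look roughly like the sketch below. It assumes both the global definition and the local *.log.fmt are plain JSON objects that have already been loaded into dicts; merge_definitions is a hypothetical helper name, and the exact merge rules of logfmt1 may differ.

```python
def merge_definitions(global_fmt, local_fmt):
    """Join a local *.log.fmt descriptor onto the global definitions."""
    merged = dict(global_fmt)
    for key, value in local_fmt.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested maps such as "fields" and "alias" are joined entry
            # by entry; the local entry wins on conflict.
            merged[key] = {**merged[key], **value}
        else:
            # Scalars such as "class" and "record" are replaced outright.
            merged[key] = value
    return merged

if __name__ == "__main__":
    # Made-up global entry, for illustration only.
    global_fmt = {
        "record": "%h %l %u %t",
        "fields": {"%h": {"id": "remote_host", "rx": "\\S+"}},
    }
    local_fmt = {
        "class": "apache cust3",
        "record": "%a %h %{iso}t '%r' %s",
        "fields": {"%{iso}t": {"id": "datetime", "rx": "..."}},
        "alias": {"iso8601": "datetime"},
    }
    print(merge_definitions(global_fmt, local_fmt))
```

Neither input is modified, so a tool could write the merged result elsewhere without touching the user's customized file.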

rationale

Having the .fmt files adjacent to log files seems the most convenient option.

  • Appending a .fmt suffix to the ….log filename doesn't obstruct tab completion as much as replacing .log with .fmt would.
  • Doesn't require a lookup table or directory, with additional permission or updating woes.
  • And (over time) enables applications themselves to create a .log.fmt for each log file. (That's kinda the goal. The update-logfmt scripts are a stop-gap workaround.)
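Because the descriptor sits next to the log, finding it takes no lookup table: a reader only has to append the suffix. A minimal sketch, assuming the descriptor is a JSON file as in the examples above (load_descriptor is a hypothetical helper name):

```python
import json
import os

def load_descriptor(log_path):
    """Return the parsed descriptor for log_path, or None if there is none."""
    fmt_path = log_path + ".fmt"  # e.g. access.log -> access.log.fmt
    if not os.path.isfile(fmt_path):
        return None
    with open(fmt_path, encoding="utf-8") as handle:
        return json.load(handle)
```

Returning None for a missing descriptor leaves it to the caller to decide whether to fall back to a predefined class or to give up.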