Check-in [7d5c807be6]
Overview
Comment: | Comment updates, fixed script wrappers, unify update-logfmt to python version. |
---|---|
SHA3-256: | 7d5c807be675936e53d8ee0a0f491493 |
User & Date: | mario 2020-12-16 16:38:42 |
Context
2020-12-17
16:34 | @inject __getattr__ for simpler tk.Widget lookups check-in: 45a8f2658a user: mario tags: trunk | |
2020-12-16
16:38 | Comment updates, fixed script wrappers, unify update-logfmt to python version. check-in: 7d5c807be6 user: mario tags: trunk | |
10:42 | Enable [Wrap] button as submenu, just defers to according recipes however. Update dependencies to new logfmt1 check-in: 5c5f0ae2d7 user: mario tags: trunk | |
Changes
Changes to logfmt1/README.md.
**logfmt1** is meant for universal log parsing, while reducing manual
configuration and avoiding a restriction to basic log variants. It handles
`*.log.fmt` files to transform LogFormat / placeholder strings into regular
expressions (with named capture groups).

    {
       "class": "apache combined",
       "record": "%h %l %u %t \"%r\" %>s %b"
    }

This would, for instance, resolve to:

    (?<remote_host>[\\w\\-.:]+) (?<remote_logname>[\\w\\-.:]+) (?<remote_user>[\\-\\w@.]+) \\[?(?<request_time>\\d[\\d:\\w\\s:./\\-+,;]+)\\]? "(?<request_line>(?<request_method>\\w+) (?<request_path>\\S+) (?<request_protocol>[\\w/\\d.]+))" (?<status>-|\\d\\d\\d) (?<bytes_sent>\\d+|-)

This Python package currently just comes with:

 * `.fmt` definitions for apache + strftime + grok placeholders.
 * `logex` - a basic log extractor
 * And `update-logfmt` to create/rewrite `*.log.fmt` files globally.

It originated in [modseccfg](https://fossil.include-once.org/modseccfg/).
You should ideally install the [system package](https://apt.include-once.org/) however:

    apt install python3-logfmt1

This will yield the proper `/usr/share/logfmt/` structure and the
run-parts wrapper `update-logfmt`.

### logfmt1

To manually craft a regex:

    import logfmt1, json
    fmt = json.load(open("/.../access.log.fmt", "r"))
    rx = logfmt1.regex(fmt)
    rx = logfmt1.rx2re(rx)   # turn into Python regex

Or with plain old guesswork / presuming a standard log format:

    rx = logfmt1.regex({"class": "apache combined"})

Though that's of course not the intended use case, and hinges on
predefined formats in /usr/share/logfmt/.
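To illustrate the placeholder-to-regex idea, here is a minimal standalone sketch. It does not use or reproduce the actual `logfmt1.regex()` / `logfmt1.rx2re()` code; `sketch_regex`, `sketch_rx2re` and the reduced `FIELDS` table are hypothetical names, with the field patterns copied from the example output in the README.

```python
import re

# Reduced placeholder table (assumption: only the fields needed for this demo).
FIELDS = {
    "%h": {"id": "remote_host", "rx": r"[\w\-.:]+"},
    "%l": {"id": "remote_logname", "rx": r"[\w\-.:]+"},
    "%u": {"id": "remote_user", "rx": r"[\-\w@.]+"},
    "%>s": {"id": "status", "rx": r"-|\d\d\d"},
    "%b": {"id": "bytes_sent", "rx": r"\d+|-"},
}
# Placeholder pattern as declared in the format database.
PLACEHOLDER = re.compile(r"%[<>]?(?:\w*\{[^\}]+\})?\^?\w+")

def sketch_regex(record):
    """Replace each known placeholder with a PCRE-style named group."""
    def repl(m):
        field = FIELDS.get(m.group(0))
        if field is None:
            return re.escape(m.group(0))  # unknown placeholder: keep it literal
        return "(?<{id}>{rx})".format(**field)
    return PLACEHOLDER.sub(repl, record)

def sketch_rx2re(rx):
    """Python's re module expects (?P<name>...) instead of (?<name>...)."""
    # The lookahead leaves lookbehinds such as (?<! and (?<= untouched.
    return re.sub(r"\(\?<(?=\w+>)", "(?P<", rx)

rx = sketch_rx2re(sketch_regex("%h %l %u %>s %b"))
row = re.match(rx, "127.0.0.1 - mario 200 512").groupdict()
print(row["remote_host"], row["status"])
```

The real format database carries many more fields plus rewrite rules, so this only shows the general shape of the conversion.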
### logfmt1.logopen()

`logopen(fn=…)` is basically a file-like iterator that yields dictionaries
rather than text strings.

    for row in logfmt1.logopen(".../access.log"):
        print(row["request_time"])

It also provides a basic regex/formatstring debugging feature (via the
`debug=True` parameter or with `logex -D`):

![failed regex section](https://imgur.com/QBKzDsK.png)

### logex

A very rudimentary extractor for log files:

    logex .../access.log --tab @host @date +id

It also picks up the `.fmt` implicitly. (Kinda the whole point of this project.)

### update-logfmt

The Python package does bundle a run-parts wrapper, but just the apache
collector, and a local Python copy of the format database. It should discover
all (Apache) `*.log` files nonetheless and pair them with `.fmt` declarations.

And that's sort of the main aspect of this project: establish `.log.fmt` files
until application vendors come around to making logs parseable.
The rules database structure is subject to change, and is only one possible
implementation. There might also be simpler approaches (grok mapping) to
generate regexps for format strings.
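The "iterator that yields dictionaries" idea can be sketched without the package itself. The following is only an illustration under assumptions: `iter_rows` is a hypothetical helper, the two-field regex is made up for the demo, and lines that fail to match are simply skipped.

```python
import io
import re

def iter_rows(f, rx):
    """Yield one dict of named groups per matching line of a file-like object."""
    pattern = re.compile(rx)
    for line in f:
        m = pattern.match(line)
        if m:
            yield m.groupdict()
        # non-matching lines are skipped in this sketch

# In-memory stand-in for an access log (made-up content).
log = io.StringIO("127.0.0.1 200\nnot a log line\n10.0.0.2 404\n")
rows = list(iter_rows(log, r"(?P<remote_host>[\d.]+) (?P<status>\d\d\d)"))
for row in rows:
    print(row["remote_host"], row["status"])
```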
Changes to logfmt1/logex.py.
︙ | ︙ | |||
    import sys, re, json
    import traceback, dateutil.parser
    import logfmt1

    def main():
        pass

    #-- args
    argv = sys.argv
    space = " "
    if "--tab" in argv:
        space = "\t"
    if "--csv" in argv:
        space = ","
︙ | ︙ |
Changes to logfmt1/logfmt1.py.
    # encoding: utf-8
    # api: python
    # title: python3-logfmt1
    # description: handle *.log.fmt specifiers and regex conversion
    # type: transform
    # category: io
    # version: 0.4-p2
    # license: Apache-2.0
    # pack:
    #    logfmt1.py=/usr/lib/python3/dist-packages/
    #    update_logfmt.py=/usr/bin/update-logfmt
    #    ./logex.py=/usr/bin/logex
    #    share=/usr/share/logfmt
    # architecture: all
    # depends: python (>= 3.6)
    # url: https://fossil.include-once.org/modseccfg/wiki/logfmt1
    #
    # Logging format strings to regex conversion.
︙ | ︙ | |||
            #"record": "%h %l %u %t \"%r\" %>s %b",
            #"regex": "(?<remote_host>\S+) …",
            "separator": " ",
            "rewrite": {
                "%[\d!,+\-]+": "%",   # strip Apache placeholder conditions
                "(?<!\\\\)([\[\]])": r"\\$1",   # escape meta chars
                "%%": "%",
            },
            "placeholder": "%[<>]?(?:\w*\{[^\}]+\})?\^?\w+",
            # placeholder definitions to build regex: from
            "fields": {
                "%a": { "id": "remote_addr", "rx": "[\d.:a-f]+" },
                "%{c}a": { "id": "remote_addr", "rx": "[\d.:a-f]+" },
                "%h": { "id": "remote_host", "rx": "[\w\-.:]+" },
                "%{c}h": { "id": "remote_host", "rx": "[\w\-.:]+" },
                "%A": { "id": "local_address", "rx": "[\d.:a-f]+" },
                "%u": { "id": "remote_user", "rx": "[\-\w@.]+" },
                "%l": { "id": "remote_logname", "rx": "[\w\-.:]+" },   # %alias `loglevel` (errlog)
                "%t": { "id": "request_time", "rx": "\[?(\d[\d:\w\s:./\-+,;]+)\]?" },   # might be "local" formatting, e.g. [01/Mnt/2020:11:22:33 +0100], %alias `ctime`
                "%{u}t": { "id": "request_time", "rx": "\d+/\w+/\d+:\d+:\d+:\d+\.\d+\s\+\d+" },   # 01/Mnt/2020:11:22:33.12345 +0100 no implicit brackets
                "%{cu}t": { "id": "request_time", "rx": "\d+-\w+-\d+\s\d+:\d+:\d+\.\d+" },   # error.log-only, 2020-01-31 11:22:33.901234, compact ISO 8601 format, no implicit brackets
                "%{msec_frac}t": { "id": "msec_frac", "rx": "[\d.]+" },
                "%{usec_frac}t": { "id": "usec_frac", "rx": "[\d.]+" },
                "%f": { "id": "request_file", "rx": "[^\s\"]+" },
                "%b": { "id": "bytes_sent", "rx": "\d+|-" },
                "%B": { "id": "bytes_sent", "rx": "\d+|-" },
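A rough sketch of what the `rewrite` rules do to a format string, assuming they are applied as ordered regex substitutions before placeholder lookup (the hunk does not show how logfmt1 actually applies them). `apply_rewrite` is a hypothetical helper; the bracket-escaping rule is left out because its `$1` replacement syntax would first have to be translated to Python's `\1`.

```python
import re

# Two of the rewrite rules from the format definition, as (pattern, replacement).
REWRITE = [
    (r"%[\d!,+\-]+", "%"),  # strip Apache placeholder conditions like %!200,304
    (r"%%", "%"),           # literal percent sign
]

def apply_rewrite(record):
    """Apply each rewrite rule in order to a LogFormat string."""
    for pattern, replacement in REWRITE:
        record = re.sub(pattern, replacement, record)
    return record

cleaned = apply_rewrite("%!200,304{Referer}i %400,501{User-agent}i")
print(cleaned)
```

After this step the conditional placeholders match the plain `%{Referer}i` style entries, so no separate field definitions are needed per status-code condition.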
︙ | ︙ | |||
        def __next__(self):
            line = self.f.readline()
            if not line:  # should be implied really
                raise StopIteration()
            m = self.rx.match(line)
            if m:
                d = m.groupdict()
                if self.container:
                    self.container_expand(d)
                if self.duplicate:
                    for trg,src in self.alias.items():
                        if src in d and not trg in d:
                            d[trg] = d[src]
                return d
            elif self.debug:
                self.debug_rx(line)
                if self.fail:
                    raise StopIteration()
            elif self.fail:
                raise StopIteration()
            else:
                pass # just try next line

        # pass .close() and similar to file object
        def __getattr__(self, name):
            return getattr(self.f, name)

        # add [key "value"] fields
        def container_expand(self, d):
            for k,opt in self.container.items():
                if k in d:
                    for id,val in re.findall(opt["rx"], d[k]):
                        if not id in d:
                            d[id] = val
                        elif not isinstance(d[id], list):
                            d[id] = [d[id], val]
                        else:
                            d[id].append(val)

        # ANSI output for debugging regex/fmt string
        def debug_rx(self, line):
            rx = self.rx.pattern
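The effect of the new `container_expand` method is easier to see on sample data. Below is a standalone copy of its logic as a plain function; the `container` definition and the sample row are assumptions for the demo (the hunk does not show what the real container rules look like), and the loop variable is renamed to avoid shadowing the builtin `id`.

```python
import re

def container_expand(d, container):
    """Unpack [key "value"] pairs found in a container field into the row dict."""
    for k, opt in container.items():
        if k in d:
            for key, val in re.findall(opt["rx"], d[k]):
                if key not in d:
                    d[key] = val               # first occurrence: plain string
                elif not isinstance(d[key], list):
                    d[key] = [d[key], val]     # second occurrence: promote to list
                else:
                    d[key].append(val)         # further occurrences: extend list

# Hypothetical container rule and row, loosely modelled on mod_security messages.
container = {"message": {"rx": r'\[(\w+) "(.*?)"\]'}}
row = {"message": '[id "942100"] [tag "sqli"] [tag "owasp"] [tag "crs"]'}
container_expand(row, container)
print(row["id"], row["tag"])
```

Repeated keys end up as lists while unique keys stay plain strings, so callers have to handle both types.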
︙ | ︙ |
Changes to logfmt1/setup.py.
︙ | ︙ | |||
                "./share/*",
                "./share/update/*"
            ],
        },
        #data_files=[],
        entry_points={
            "console_scripts": [
                "logex=logfmt1.logex:main",
                "update-logfmt=logfmt1.update_logfmt:main",
            ]
        }
    )
Changes to logfmt1/share/apache.combined.fmt.
    {
        "class": "apache combined",
        "record": "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"",
        "glob": [
            "*.access.log"
        ]
    }
Changes to logfmt1/share/apache.fmt.
    {
        "class": "apache generic",
        "separator": " ",
        "rewrite": {
            "%[\\d!,+\\-]+": "%",
            "(?<!\\\\)([\\[\\]])": "\\\\$1",
            "%%": "%"
        },
        "placeholder": "%[<>]?(?:\\w*\\{[^\\}]+\\})?\\^?\\w+",
        "fields": {
            "%a": {
                "id": "remote_addr",
︙ | ︙ | |||
            },
            "%l": {
                "id": "remote_logname",
                "rx": "[\\w\\-.:]+"
            },
            "%t": {
                "id": "request_time",
                "rx": "\\[?(\\d[\\d:\\w\\s:./\\-+,;]+)\\]?"
            },
            "%{u}t": {
                "id": "request_time",
                "rx": "\\d+/\\w+/\\d+:\\d+:\\d+:\\d+\\.\\d+\\s\\+\\d+"
            },
            "%{cu}t": {
                "id": "request_time",
︙ | ︙ |
Changes to logfmt1/share/update/apache.
    #!/usr/bin/env python3
    # description: extract *Log options from all apache *.conf to create .log.fmt files
    #
    # This is a simpler version of the modseccfg vhost reader.
    #

    import os, re, sys, random
    import subprocess
    import traceback
    import json
    from pprint import pprint
    import logfmt1


    # extraction patterns
    class rx:
        # a conf file '(*) /etc/apache2/main.conf'
        dump_includes = re.compile("^\s*\([\d*]+\)\s+(.+)$", re.M)
        # directives we care about (to detect relevant .conf files)
        interesting = re.compile("""
            ^ \s*
             ( (Error|Custom|Global|Forensic|Transfer)Log | (Error)?LogFormat )     # log directives
            """,
            re.M|re.I|re.X
        )
        # extract directive line including line continuations (<\><NL>)
        configline = re.compile(
            """
            ^
            [\ \\t]*                          # whitespace \h*
︙ | ︙ | |||
        def transferlog(self, args):
            self.customlog([args[0], "transfer"])
        def logformat(self, args):
            if len(args) == 1: args[1] = "transfer"
            tmp.log_formats[args[1]] = args[0].replace('\\"', '"')
        def errorlogformat(self, args):
            self.logformat([args[0], "error"])
        # could look into LoadModule directives to determine errorlogformat
        # from e.g. mpm_prefork being present


    # scan for APACHE_ENV= vars
    def read_env_vars():
        for fn in tmp.env_locations:
            if os.path.exists(fn):
                src = open(fn, "r", encoding="utf-8").read()
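One detail in `logformat` is worth a note: assigning `args[1]` on a one-element list raises `IndexError` in Python, so a `LogFormat` directive without a nickname would fail at that line. A possible variant is sketched below as a free function; `log_formats` stands in for `tmp.log_formats`, and this is a suggestion rather than what the check-in contains.

```python
log_formats = {}  # stand-in for tmp.log_formats in the collector script

def logformat(args):
    """Register a LogFormat string; the nickname defaults to "transfer"."""
    fmt = args[0]
    nickname = args[1] if len(args) > 1 else "transfer"
    # un-escape quotes the same way the original handler does
    log_formats[nickname] = fmt.replace('\\"', '"')

logformat(['%h \\"%r\\"'])             # no nickname given
logformat(["%h %l %u %t", "common"])   # explicit nickname
print(log_formats)
```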
︙ | ︙ |
Deleted logfmt1/update-logfmt.
|
| < < |
Name change from logfmt1/update_logfmt_all.py to logfmt1/update_logfmt.py.
    #!/usr/bin/env python3
    # encoding: utf-8
    # title: update-logfmt
    # description: invoke ./share/update/* scripts
    # type: virtual
    #
    # Stub that reimplements run-parts

    import os, re

    def main():
        pass

    for dir in ["/usr/share/logfmt/update", re.sub("[.\w]+$", "share/update", __file__)]:
        if os.path.exists(dir):
            os.system(f"run-parts {dir}")
            break
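In this stub the loop runs at import time, and `main()` only exists so that the `console_scripts` entry point has something to call. A variant that moves the loop into `main()` might look like the following sketch; `candidate_dirs` and the injectable `run` parameter are hypothetical additions to make the logic testable, and `subprocess.call` replaces `os.system` purely as a design choice.

```python
import os
import re
import subprocess

def candidate_dirs(script_path):
    """System-wide update dir first, then share/update next to the script."""
    return [
        "/usr/share/logfmt/update",
        re.sub(r"[.\w]+$", "share/update", script_path),
    ]

def main(dirs=None, run=subprocess.call):
    """Invoke run-parts on the first existing update directory."""
    for d in (dirs or candidate_dirs(__file__)):
        if os.path.exists(d):
            return run(["run-parts", d])
    return 1  # no update directory found

# Demo with a fake runner, so nothing is actually executed here.
calls = []
status = main(dirs=["/nonexistent/logfmt/update", "."],
              run=lambda cmd: calls.append(cmd) or 0)
print(status, calls)
```

Passing the command as a list avoids shell quoting issues if the package path contains spaces.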