GUI editor to tame mod_security rules

⌈⌋ branch:  modseccfg


Check-in [7d5c807be6]


Overview
Comment: Comment updates, fixed script wrappers, unify update-logfmt to python version.
SHA3-256: 7d5c807be675936e53d8ee0a0f491493bab9603e7512c001272dee0ac48771f4
User & Date: mario 2020-12-16 16:38:42
Context
2020-12-17
16:34 @inject __getattr__ for simpler tk.Widget lookups check-in: 45a8f2658a user: mario tags: trunk
2020-12-16
16:38 Comment updates, fixed script wrappers, unify update-logfmt to python version. check-in: 7d5c807be6 user: mario tags: trunk
10:42 Enable [Wrap] button as submenu, just defers to according recipes however. Update dependencies to new logfmt1 check-in: 5c5f0ae2d7 user: mario tags: trunk
Changes

Changes to logfmt1/README.md.

Old version:

**logfmt1** handles `*.log.fmt` files to transform LogFormat / placeholder
strings to regular expressions (named capture groups). Currently just comes
with rules for Apache definitions. It bundles a `logex` and `update-logfmt`
to create/rewrite `*.log.fmt` files globally.

    {
       "class": "apache combined",
       "record": "%h %l %u %t \"%r\" %>s %b",
    }

It's basically meant for universal log parsing, whilst reducing manual
configuration or the restrain on basic log variants. It originated in
[modseccfg](https://fossil.include-once.org/modseccfg/). This Python
package is mostly a stub. You should preferrably install the
[system package](https://apt.include-once.org/):


    apt install python3-logfmt1

This will yield the proper `/usr/share/logfmt/` structure and the run-parts
wrapper `update-logfmt`. The grok placeholders are supported, but remain
untested.


### logfmt1

To craft a regex:

    import logfmt1, json
    fmt = json.load(open("/.../access.log.fmt", "r"))
    rx = logfmt1.regex(fmt)
    rx = logfmt1.rx2re(rx)   # turn into Python regex

Or with plain old guesswork / presuming a standard log format:

    rx = logfmt1.regex({"class": "apache combined"})

Though that's of course not the intended use case, and hinges on
predefined formats in /usr/share/logfmt/.
















### logex

Very crudementary extractor for log files:

    logex .../access.log --tab @host @date +id

Which of course handles the `.fmt` implicitly.



### update-logfmt

The Python package does bundle a run-parts wrapper, but just the apache
collector, and a local Python copy of the format database. It should discover
all `*.log` files nonetheless and pair them with `.fmt` declarations.






New version:

**logfmt1** is meant for universal log parsing, reducing manual
configuration and the restriction to basic log variants. It handles `*.log.fmt`
files to transform LogFormat / placeholder strings into regular expressions
(with named capture groups).

    {
       "class": "apache combined",
       "record": "%h %l %u %t \"%r\" %>s %b"
    }

For instance, this would resolve to:

    (?<remote_host>[\\w\\-.:]+) (?<remote_logname>[\\w\\-.:]+) (?<remote_user>[\\-\\w@.]+)
    \\[?(?<request_time>\\d[\\d:\\w\\s:./\\-+,;]+)\\]? "(?<request_line>(?<request_method>\\w+)
    (?<request_path>\\S+) (?<request_protocol>[\\w/\\d.]+))" (?<status>-|\\d\\d\\d)
    (?<bytes_sent>\\d+|-)
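
The placeholder-to-regex step can be sketched in a few lines. This is only an illustration, not the actual `logfmt1.regex()` code: `FIELDS` is a hypothetical, trimmed-down copy of the rule table, `format_to_regex` is a made-up helper name, and the sketch emits Python-style `(?P<name>…)` groups directly instead of the `(?<name>…)` form shown above.

```python
import re

# Hypothetical, trimmed-down rule set modelled on the "fields" table in logfmt1.py.
FIELDS = {
    "%h": {"id": "remote_host", "rx": r"[\w\-.:]+"},
    "%l": {"id": "remote_logname", "rx": r"[\w\-.:]+"},
    "%u": {"id": "remote_user", "rx": r"[\-\w@.]+"},
    "%>s": {"id": "status", "rx": r"-|\d\d\d"},
    "%b": {"id": "bytes_sent", "rx": r"\d+|-"},
}
# Placeholder pattern as it appears in the rule set of this check-in.
PLACEHOLDER = re.compile(r"%[<>]?(?:\w*\{[^\}]+\})?\^?\w+")

def format_to_regex(record):
    """Escape literal text and swap each known placeholder for a named group."""
    out, pos = [], 0
    for m in PLACEHOLDER.finditer(record):
        out.append(re.escape(record[pos:m.start()]))  # literal text between placeholders
        field = FIELDS[m.group(0)]                    # KeyError for placeholders not in this sketch
        out.append(f"(?P<{field['id']}>{field['rx']})")
        pos = m.end()
    out.append(re.escape(record[pos:]))
    return "".join(out)

rx = format_to_regex("%h %l %u %>s %b")
row = re.match(rx, "203.0.113.7 - alice 200 512").groupdict()
print(row)
```

The real rule set additionally applies `rewrite` rules and aliases before this step; those are left out here.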
    
This Python package currently just comes with:

  * `.fmt` definitions for apache + strftime + grok placeholders.
  * `logex` - a basic log extractor
  * And `update-logfmt` to create/rewrite `*.log.fmt` files globally.

It originated in [modseccfg](https://fossil.include-once.org/modseccfg/).

You should ideally install the [system package](https://apt.include-once.org/)
however:

    apt install python3-logfmt1

This will yield the proper `/usr/share/logfmt/` structure and the run-parts
wrapper `update-logfmt`.



### logfmt1

To manually craft a regex:

    import logfmt1, json
    fmt = json.load(open("/.../access.log.fmt", "r"))
    rx = logfmt1.regex(fmt)
    rx = logfmt1.rx2re(rx)   # turn into Python regex

Or with plain old guesswork / presuming a standard log format:

    rx = logfmt1.regex({"class": "apache combined"})

Though that's of course not the intended use case, and hinges on
predefined formats in /usr/share/logfmt/.
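
The `rx2re()` step exists because Python's `re` module does not accept the `(?<name>…)` group syntax. Its implementation is not part of this check-in, so the following is just a minimal sketch under the assumption that the conversion boils down to rewriting `(?<name>` into `(?P<name>`; `to_python_regex` is a hypothetical name.

```python
import re

def to_python_regex(rx):
    """Rewrite (?<name> into (?P<name>, leaving lookbehinds (?<= and (?<! untouched."""
    return re.sub(r"\(\?<(?=[A-Za-z_]\w*>)", "(?P<", rx)

rx = r"(?<remote_host>[\w\-.:]+) (?<status>-|\d\d\d)"
pattern = re.compile(to_python_regex(rx))
row = pattern.match("203.0.113.7 200").groupdict()
print(row)
```

The lookahead in the substitution pattern is what keeps lookbehind assertions intact, since those also start with `(?<`.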


### logfmt1.logopen()

`logopen(fn=…)` is basically a file-like iterator that yields
dictionaries rather than text strings.

    for row in logfmt1.logopen(".../access.log"):
        print(row["request_time"])

And it provides a basic regex/formatstring debugging feature (via
`debug=True` parameter or with `logex -D`):

![failed regex section](https://imgur.com/QBKzDsK.png)
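
To illustrate the idea of a file-like iterator that yields dictionaries, here is a self-contained sketch. `DictLineReader` is a hypothetical stand-in, not the `logopen` class itself; it takes the regex as an argument rather than looking up a `.fmt` file, and it simply skips lines that do not match.

```python
import io
import re

class DictLineReader:
    """Hypothetical stand-in for a logopen()-style reader."""

    def __init__(self, f, rx):
        self.f = f
        self.rx = re.compile(rx)

    def __iter__(self):
        return self

    def __next__(self):
        # keep reading until a line matches or the file is exhausted
        for line in self.f:
            m = self.rx.match(line)
            if m:
                return m.groupdict()
        raise StopIteration

log = io.StringIO("203.0.113.7 200\nnot a log line\n198.51.100.2 404\n")
rows = list(DictLineReader(log, r"(?P<remote_host>\S+) (?P<status>\d{3})"))
print(rows)
```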


### logex

Very rudimentary extractor for log files:

    logex .../access.log --tab @host @date +id

Which also handles the `.fmt` implicitly. (Kinda the whole point of
this project.)


### update-logfmt

The Python package does bundle a run-parts wrapper, but just the apache
collector, and a local Python copy of the format database. It should discover
all (Apache) `*.log` files nonetheless and pair them with `.fmt` declarations.

And that's sort of the main aspect of this project: establish `.log.fmt` files
until application vendors come around to making logs parseable. The rules
database structure is subject to change, and is only one possible implementation.
There might also be simpler approaches (grok mapping) to generate regexps
for format strings.

Changes to logfmt1/logex.py.

Old version:


import sys, re, json
import traceback, dateutil.parser
import logfmt1





#-- args
argv = sys.argv
space = " "
if "--tab" in argv:
    space = "\t"
if "--csv" in argv:
    space = ","

New version:


import sys, re, json
import traceback, dateutil.parser
import logfmt1


def main():
    pass
    
#-- args
argv = sys.argv
space = " "
if "--tab" in argv:
    space = "\t"
if "--csv" in argv:
    space = "," 

Changes to logfmt1/logfmt1.py.

Old version:

# encoding: utf-8
# api: python
# title: python3-logfmt1
# description: handle *.log.fmt specifiers and regex conversion
# type: transform
# category: io
# version: 0.4
# license: Apache-2.0
# pack:
#    logfmt1.py=/usr/lib/python3/dist-packages/
#    update-logfmt=/usr/bin/
#    ./logex.py=/usr/bin/logex
#    share=/usr/share/logfmt
# architecture: all
# depends: python (>= 3.6)
# url: https://fossil.include-once.org/modseccfg/wiki/logfmt1
#
# Logging format strings to regex conversion.
................................................................................

        #"record": "%h %l %u %t \"%r\" %>s %b",

        #"regex": "(?<remote_host>\S+) …",

        "separator": " ",
        "rewrite": {
            "%[\d!,]+": "%",      # strip Apache placehoder conditions
            "(?<!\\\\)([\[\]])": r"\\$1",  # escape meta chars
            "%%": "%",
        },
        "placeholder": "%[<>]?(?:\w*\{[^\}]+\})?\^?\w+",

        # placeholder definitions to build regex: from
        "fields": {
................................................................................
            "%a": { "id": "remote_addr", "rx": "[\d.:a-f]+" },
            "%{c}a": { "id": "remote_addr", "rx": "[\d.:a-f]+" },
            "%h": { "id": "remote_host", "rx": "[\w\-.:]+" },
            "%{c}h": { "id": "remote_host", "rx": "[\w\-.:]+" },
            "%A": { "id": "local_address", "rx": "[\d.:a-f]+" },
            "%u": { "id": "remote_user", "rx": "[\-\w@.]+" },
            "%l": { "id": "remote_logname", "rx": "[\w\-.:]+" },   # %alias `loglevel` (errlog)
            "%t": { "id": "request_time", "rx": "\[(\d[\d:\w\s:./\-+,;]+)\]" }, # might be "local" formatting, e.g. [01/Mnt/2020:11:22:33 +0100], %alias `ctime`
            "%{u}t": { "id": "request_time", "rx": "\d+/\w+/\d+:\d+:\d+:\d+\.\d+\s\+\d+" },  # 01/Mnt/2020:11:22:33.12345 +0100 no implicit brackets
            "%{cu}t": { "id": "request_time", "rx": "\d+-\w+-\d+\s\d+:\d+:\d+\.\d+" },  # error.log-only, 2020-01-31 11:22:33.901234, compact ISO 8601 format, no implicit brackets
            "%{msec_frac}t": { "id": "msec_frac", "rx": "[\d.]+" },
            "%{usec_frac}t": { "id": "usec_frac", "rx": "[\d.]+" },
            "%f": { "id": "request_file", "rx": "[^\s\"]+" },
            "%b": { "id": "bytes_sent", "rx": "\d+|-" },
            "%B": { "id": "bytes_sent", "rx": "\d+|-" },
................................................................................
    def __next__(self):
        line = self.f.readline()
        if not line:  # should be implied really
            raise StopIteration()
        m = self.rx.match(line)
        if m:
            d = m.groupdict()

            self.container_expand(d)
            if self.duplicate:
                for trg,src in self.alias.items():
                    if src in d and not trg in d:
                        d[trg] = d[src]
            return d
        elif self.debug:
            self.debug_rx(line)
            if self.fail:
                raise StopIteration()
        elif self.fail:
            raise StopIteration()
        else:
            pass # jusst try next line
    
    # pass .close() and similar to file object
    def __getattr__(self, name):
        return getattr(self.f, name)

    # add [key "value"] fields
    def container_expand(self, d):
        for k,opt in self.container.items():
            if k in d:
                for id,val in re.findall(opt["rx"], d[k]):
                    if not id in d:
                        d[id] = val
                    elif not isinstance(d[id], "list"):
                        d[id] = [d[id], val]
                    else:
                        d[id].append(val)

    # ANSI output for debugging regex/fmt string
    def debug_rx(self, line):
        rx = self.rx.pattern






New version:

# encoding: utf-8
# api: python
# title: python3-logfmt1
# description: handle *.log.fmt specifiers and regex conversion
# type: transform
# category: io
# version: 0.4-p2
# license: Apache-2.0
# pack:
#    logfmt1.py=/usr/lib/python3/dist-packages/
#    update_logfmt.py=/usr/bin/update-logfmt
#    ./logex.py=/usr/bin/logex
#    share=/usr/share/logfmt
# architecture: all
# depends: python (>= 3.6)
# url: https://fossil.include-once.org/modseccfg/wiki/logfmt1
#
# Logging format strings to regex conversion.
................................................................................

        #"record": "%h %l %u %t \"%r\" %>s %b",

        #"regex": "(?<remote_host>\S+) …",

        "separator": " ",
        "rewrite": {
            "%[\d!,+\-]+": "%",      # strip Apache placehoder conditions
            "(?<!\\\\)([\[\]])": r"\\$1",  # escape meta chars
            "%%": "%",
        },
        "placeholder": "%[<>]?(?:\w*\{[^\}]+\})?\^?\w+",

        # placeholder definitions to build regex: from
        "fields": {
................................................................................
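
The widened `rewrite` rule (`%[\d!,]+` becoming `%[\d!,+\-]+`) can be tried in isolation. The format string below is made up for illustration; it mixes status-code conditions with a `-` modifier of the kind used in error log formats, which is presumably what the extra characters are meant to strip.

```python
import re

OLD_RULE = re.compile(r"%[\d!,]+")      # before this check-in
NEW_RULE = re.compile(r"%[\d!,+\-]+")   # after this check-in

# Illustrative format string, not taken from a real configuration.
fmt = "%!200,304{User-agent}i %-m %400,501{Referer}i"

old_result = OLD_RULE.sub("%", fmt)
new_result = NEW_RULE.sub("%", fmt)
print(old_result)
print(new_result)
```

With the old rule the `%-m` placeholder keeps its modifier and would not be found in a field table keyed by `%m`; the new rule reduces it to the plain placeholder.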
            "%a": { "id": "remote_addr", "rx": "[\d.:a-f]+" },
            "%{c}a": { "id": "remote_addr", "rx": "[\d.:a-f]+" },
            "%h": { "id": "remote_host", "rx": "[\w\-.:]+" },
            "%{c}h": { "id": "remote_host", "rx": "[\w\-.:]+" },
            "%A": { "id": "local_address", "rx": "[\d.:a-f]+" },
            "%u": { "id": "remote_user", "rx": "[\-\w@.]+" },
            "%l": { "id": "remote_logname", "rx": "[\w\-.:]+" },   # %alias `loglevel` (errlog)
            "%t": { "id": "request_time", "rx": "\[?(\d[\d:\w\s:./\-+,;]+)\]?" }, # might be "local" formatting, e.g. [01/Mnt/2020:11:22:33 +0100], %alias `ctime`
            "%{u}t": { "id": "request_time", "rx": "\d+/\w+/\d+:\d+:\d+:\d+\.\d+\s\+\d+" },  # 01/Mnt/2020:11:22:33.12345 +0100 no implicit brackets
            "%{cu}t": { "id": "request_time", "rx": "\d+-\w+-\d+\s\d+:\d+:\d+\.\d+" },  # error.log-only, 2020-01-31 11:22:33.901234, compact ISO 8601 format, no implicit brackets
            "%{msec_frac}t": { "id": "msec_frac", "rx": "[\d.]+" },
            "%{usec_frac}t": { "id": "usec_frac", "rx": "[\d.]+" },
            "%f": { "id": "request_file", "rx": "[^\s\"]+" },
            "%b": { "id": "bytes_sent", "rx": "\d+|-" },
            "%B": { "id": "bytes_sent", "rx": "\d+|-" },
................................................................................
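
The `%t` change makes the surrounding brackets optional. A quick comparison of both patterns, copied from the old and new rule sets, against a bracketed and an unbracketed timestamp (the sample values are made up):

```python
import re

OLD_T = re.compile(r"\[(\d[\d:\w\s:./\-+,;]+)\]")    # brackets required
NEW_T = re.compile(r"\[?(\d[\d:\w\s:./\-+,;]+)\]?")  # brackets optional

samples = ["[16/Dec/2020:16:38:42 +0100]", "16/Dec/2020:16:38:42 +0100"]
for text in samples:
    old, new = OLD_T.fullmatch(text), NEW_T.fullmatch(text)
    print(text, "| old matches:", bool(old), "| new captures:", new.group(1))
```

Both inputs yield the same captured timestamp with the new pattern, whereas the old one rejects the unbracketed form.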
    def __next__(self):
        line = self.f.readline()
        if not line:  # should be implied really
            raise StopIteration()
        m = self.rx.match(line)
        if m:
            d = m.groupdict()
            if self.container:
                self.container_expand(d)
            if self.duplicate:
                for trg,src in self.alias.items():
                    if src in d and not trg in d:
                        d[trg] = d[src]
            return d
        elif self.debug:
            self.debug_rx(line)
            if self.fail:
                raise StopIteration()
        elif self.fail:
            raise StopIteration()
        else:
            pass # just try next line
    
    # pass .close() and similar to file object
    def __getattr__(self, name):
        return getattr(self.f, name)

    # add [key "value"] fields
    def container_expand(self, d):
        for k,opt in self.container.items():
            if k in d:
                for id,val in re.findall(opt["rx"], d[k]):
                    if not id in d:
                        d[id] = val
                    elif not isinstance(d[id], list):
                        d[id] = [d[id], val]
                    else:
                        d[id].append(val)

    # ANSI output for debugging regex/fmt string
    def debug_rx(self, line):
        rx = self.rx.pattern
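
The `isinstance(d[id], list)` fix in `container_expand()` matters because passing the string `"list"` as the second argument raises a `TypeError` as soon as a key repeats. A standalone sketch of the same collection logic follows; `expand` and the `[key "value"]` pattern are illustrative assumptions, not the shipped container rules.

```python
import re

def expand(d, key, rx):
    """Collect [name "value"] pairs from d[key]; repeated names become lists."""
    for name, val in re.findall(rx, d.get(key, "")):
        if name not in d:
            d[name] = val
        elif not isinstance(d[name], list):   # the old code passed "list" (a str) here
            d[name] = [d[name], val]
        else:
            d[name].append(val)
    return d

row = {"message": '[id "941100"] [tag "xss"] [tag "owasp"] [tag "paranoia"]'}
expand(row, "message", r'\[(\w+) "([^"]*)"\]')
print(row["id"], row["tag"])

try:
    isinstance([], "list")
except TypeError as err:
    print("old check fails:", err)
```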

Changes to logfmt1/setup.py.

Old version:

           "./share/*",
           "./share/update/*"
        ],
    },
    #data_files=[],
    entry_points={
        "console_scripts": [
            "logex=logfmt1.logex",
            "update-logfmt=logfmt1.update_logfmt_all",
        ]
    }
)








New version:

           "./share/*",
           "./share/update/*"
        ],
    },
    #data_files=[],
    entry_points={
        "console_scripts": [
            "logex=logfmt1.logex:main",
            "update-logfmt=logfmt1.update_logfmt:main",
        ]
    }
)
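
The `console_scripts` change adds the `:main` suffix, i.e. the `module:function` form that names a callable. How such a spec is resolved can be approximated as below; `resolve` is a hypothetical helper and uses the stdlib `json` module as a stand-in target, since real installers generate wrapper scripts rather than calling anything like this.

```python
import importlib

def resolve(spec):
    """Resolve 'package.module:attr' to an object, roughly the way entry points are looked up."""
    module_name, _, attr = spec.partition(":")
    obj = importlib.import_module(module_name)
    for part in attr.split(".") if attr else []:
        obj = getattr(obj, part)
    return obj

func = resolve("json:dumps")      # module plus callable
module = resolve("json")          # module only: nothing to call
print(func({"ok": True}), callable(module))
```

Without the function part, the spec only points at a module, which is why the old `logex=logfmt1.logex` entry could not serve as a script wrapper.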

Changes to logfmt1/share/apache.combined.fmt.

Old version:

{
    "class": "apache combined",
    "record": "%h %l %u %t \"%r\" %>s %b",
    "glob": [
        "*.access.log"
    ]
}


New version:

{
    "class": "apache combined",
    "record": "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"",
    "glob": [
        "*.access.log"
    ]
}
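
With the Referer and User-agent placeholders added, the `record` now describes the full combined format. For orientation, a hand-written regex for such a line might look like the sketch below. It is not the regex logfmt1 generates; the group names `referer` and `user_agent` and the sample line are assumptions.

```python
import re

# Hand-written approximation of the combined format; not generated by logfmt1.
COMBINED = re.compile(
    r'(?P<remote_host>\S+) (?P<remote_logname>\S+) (?P<remote_user>\S+) '
    r'\[(?P<request_time>[^\]]+)\] "(?P<request_line>[^"]*)" '
    r'(?P<status>\d{3}|-) (?P<bytes_sent>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

line = ('203.0.113.7 - - [16/Dec/2020:16:38:42 +0100] "GET /index.html HTTP/1.1" '
        '200 512 "https://example.org/" "curl/7.68.0"')
row = COMBINED.match(line).groupdict()
print(row["status"], row["referer"], row["user_agent"])
```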

Changes to logfmt1/share/apache.fmt.

Old version:

{
    "class": "apache generic",
    "separator": " ",
    "rewrite": {
        "%[\\d!,]+": "%",
        "(?<!\\\\)([\\[\\]])": "\\\\$1",
        "%%": "%"
    },
    "placeholder": "%[<>]?(?:\\w*\\{[^\\}]+\\})?\\^?\\w+",
    "fields": {
        "%a": {
            "id": "remote_addr",
................................................................................
        },
        "%l": {
            "id": "remote_logname",
            "rx": "[\\w\\-.:]+"
        },
        "%t": {
            "id": "request_time",
            "rx": "\\[(\\d[\\d:\\w\\s:./\\-+,;]+)\\]"
        },
        "%{u}t": {
            "id": "request_time",
            "rx": "\\d+/\\w+/\\d+:\\d+:\\d+:\\d+\\.\\d+\\s\\+\\d+"
        },
        "%{cu}t": {
            "id": "request_time",




New version:

{
    "class": "apache generic",
    "separator": " ",
    "rewrite": {
        "%[\\d!,+\\-]+": "%",
        "(?<!\\\\)([\\[\\]])": "\\\\$1",
        "%%": "%"
    },
    "placeholder": "%[<>]?(?:\\w*\\{[^\\}]+\\})?\\^?\\w+",
    "fields": {
        "%a": {
            "id": "remote_addr",
................................................................................
        },
        "%l": {
            "id": "remote_logname",
            "rx": "[\\w\\-.:]+"
        },
        "%t": {
            "id": "request_time",
            "rx": "\\[?(\\d[\\d:\\w\\s:./\\-+,;]+)\\]?"
        },
        "%{u}t": {
            "id": "request_time",
            "rx": "\\d+/\\w+/\\d+:\\d+:\\d+:\\d+\\.\\d+\\s\\+\\d+"
        },
        "%{cu}t": {
            "id": "request_time",

Changes to logfmt1/share/update/apache.

Old version:

#!/usr/bin/env python3





import os, re, sys, random
import subprocess
import traceback
import json
from pprint import pprint
try:
    import logfmt1
except:
    from modseccfg import logfmt1


# extraction patterns
class rx:
    # a conf file '(*) /etc/apache2/main.conf'
    dump_includes = re.compile("^\s*\([\d*]+\)\s+(.+)$", re.M)
    # directives we care about (to detect relevant .conf files)
    interesting = re.compile("""
        ^ \s*
         ( (Error|Custom|Global|Forensic|Transfer)Log | (Error)?LogFormat )           # log directivess
        """,
        re.M|re.I|re.X
    )
    # extract directive line including line continuations (<\><NL>)
    configline = re.compile(
        """ ^
        [\ \\t]*                          # whitespace \h*
................................................................................
    def transferlog(self, args):
        self.customlog([args[0], "transfer"])
    def logformat(self, args):
        if len(args) == 1: args[1] = "transfer"
        tmp.log_formats[args[1]] = args[0].replace('\\"', '"')
    def errorlogformat(self, args):
        self.logformat([args[0], "error"])





# scan for APACHE_ENV= vars
def read_env_vars():
    for fn in tmp.env_locations:
        if os.path.exists(fn):
            src = open(fn, "r", encoding="utf-8").read()

New version:

#!/usr/bin/env python3
# description: extract *Log options from all apache *.conf to create .log.fmt files
# 
# This is a simpler version of the modseccfg vhost reader.
#

import os, re, sys, random
import subprocess
import traceback
import json
from pprint import pprint

import logfmt1




# extraction patterns
class rx:
    # a conf file '(*) /etc/apache2/main.conf'
    dump_includes = re.compile("^\s*\([\d*]+\)\s+(.+)$", re.M)
    # directives we care about (to detect relevant .conf files)
    interesting = re.compile("""
        ^ \s*
         ( (Error|Custom|Global|Forensic|Transfer)Log | (Error)?LogFormat )           # log directives
        """,
        re.M|re.I|re.X
    )
    # extract directive line including line continuations (<\><NL>)
    configline = re.compile(
        """ ^
        [\ \\t]*                          # whitespace \h*
................................................................................
    def transferlog(self, args):
        self.customlog([args[0], "transfer"])
    def logformat(self, args):
        if len(args) == 1: args.append("transfer")   # no nickname given; args[1] = … would raise IndexError
        tmp.log_formats[args[1]] = args[0].replace('\\"', '"')
    def errorlogformat(self, args):
        self.logformat([args[0], "error"])
        
    # could look into LoadModule directives to determine errorlogformat
    # from e.g. mpm_prefork being present


# scan for APACHE_ENV= vars
def read_env_vars():
    for fn in tmp.env_locations:
        if os.path.exists(fn):
            src = open(fn, "r", encoding="utf-8").read()
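
The `dump_includes` pattern from the `rx` class above can be checked against a small sample. The sample text imitates the include listing that Apache prints (lines such as `(*) /etc/apache2/main.conf`, per the comment in the source); the concrete paths are made up.

```python
import re

# Same pattern as rx.dump_includes, written as a raw string.
dump_includes = re.compile(r"^\s*\([\d*]+\)\s+(.+)$", re.M)

sample = """Included configuration files:
  (*) /etc/apache2/apache2.conf
    (146) /etc/apache2/mods-enabled/alias.conf
"""

conf_files = dump_includes.findall(sample)
print(conf_files)
```

The heading line is skipped because it lacks the parenthesised marker, so only the file paths are returned.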

Deleted logfmt1/update-logfmt.

#!/bin/sh
run-parts /usr/share/logfmt/update/




Name change from logfmt1/update_logfmt_all.py to logfmt1/update_logfmt.py.

Old version:

# description: invoke ./share/update/* scripts
# type: virtual
#
# Stub that reimplements run-parts

import os, re




for dir in [re.sub("[.\w]+$", "share/update", __file__), "/usr/share/logfmt/update"]:
    if os.path.exists(dir):
        os.system(f"run-parts {dir}")
        break

New version:

# description: invoke ./share/update/* scripts
# type: virtual
#
# Stub that reimplements run-parts

import os, re

def main():
    pass

for dir in ["/usr/share/logfmt/update", re.sub("[.\w]+$", "share/update", __file__)]:
    if os.path.exists(dir):
        os.system(f"run-parts {dir}")
        break
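
The stub still shells out to the `run-parts` binary. On systems without it, the same behaviour could be approximated in pure Python, roughly as sketched below. `run_parts` is a hypothetical helper, not part of this check-in; the name filter mirrors the default rule of Debian's run-parts, which only runs files whose names consist of letters, digits, underscores and hyphens.

```python
import os
import re
import subprocess

def run_parts(directory):
    """Run each executable file in `directory` in sorted order; return exit codes by name."""
    codes = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # skip names that Debian's run-parts would ignore by default
        if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
            continue
        if os.path.isfile(path) and os.access(path, os.X_OK):
            codes[name] = subprocess.run([path]).returncode
    return codes
```

It would be called with the first existing directory, in the same way the loop above picks between `/usr/share/logfmt/update` and the bundled `share/update`.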