Update of "AutoupdateRegex"
Artifact ID: | 5296b02773abc1ba3b3ce2041520bd950adeaf1e |
---|---|
Page Name: | AutoupdateRegex |
Date: | 2014-08-03 18:28:36 |
Original User: | mario |
Mimetype: | text/x-markdown |
Parent: | c58f2c9764fe38adc0e57161df2c71edb63c8d0a (diff) |
Next | c24bbc1b08d86e26959b70fc6a3a9413cee9cd5d |
The Autoupdate "regex" module is the most versatile for updating release information. Besides RegExp matching (for text sources), it now also supports XPath and jQuery-style selections, making it more suitable for HTML project websites.
See also Dr. Changelog for trying it out.
Field Rules
It can be configured in the Autoupdate Rules/Regex project field, which expects a list of key = ... entries. Each key can list a URL and one or more RegExp, XPath or jQuery expressions.
version = http://example.com/download.html
version = /(\d+\.\d+(\.\d+)+)/
changes = http://example.com/news.html
changes = $("#main .release div.current")
changes = /Summary:\s*(.+?)\R\R/smix
scope = ~((minor|major) (bugfix|cleanup|security))~
state = ~(stable|beta|prerelease)~i
download = $("a.download").attr("href")
It will not update general project descriptions, but only version= and changes=, or optionally scope=, state= and download=.
- URLs should precede the extraction expressions.
- For regex rules the first capture group [1] will be used as the result.
- All regex flags /Umixus are allowed, plus a special /* match-all flag, but not /e of course.
- Use line breaks to separate rule assignments. Comments in between will effectively be ignored.
- XPath expressions, for example, take the form changes = (//ul)[1]/li
- jQuery-style selectors can chain multiple selector functions, as in $("div").find("#first"), but not JavaScript expressions of course.
- Field names may be preceded by $ or %, as in $version = /([\d.]+)/
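To make the rule format above more concrete, here is a minimal Python sketch of how such a field could be split into per-key rule lists. It is not the module's actual parser: the function names `classify` and `parse_rules`, the delimiter set `/ ~ #`, and the heuristics for telling XPath apart from regex are all assumptions made for illustration.

```python
import re

SAMPLE = r'''
version = http://example.com/download.html
version = /(\d+\.\d+(\.\d+)+)/
changes = $("#main .release div.current")
changes = (//ul)[1]/li
# a comment line between rules
$state = ~(stable|beta|prerelease)~i
'''

def classify(value):
    """Guess the rule type from its first characters (heuristic, assumed)."""
    if re.match(r"https?://", value):
        return "url"
    if value.startswith("$("):
        return "jquery"
    if value.startswith(("//", "(/")):
        return "xpath"
    # delimiter, pattern, same delimiter, optional flags (including '*')
    if re.match(r"([/~#]).+\1[A-Za-z*]*$", value):
        return "regex"
    return "unknown"

def parse_rules(text):
    """Collect `key = value` lines per key, keeping their written order."""
    rules = {}
    for line in text.splitlines():
        m = re.match(r"\s*[$%]?(\w+)\s*=\s*(.+?)\s*$", line)
        if m is None:
            continue  # blank lines and comments are skipped
        key, value = m.groups()
        rules.setdefault(key, []).append((classify(value), value))
    return rules

rules = parse_rules(SAMPLE)
for key, entries in rules.items():
    print(key, [kind for kind, _ in entries])
```

Keeping the entries as an ordered list per key matters, because later rules for the same key operate on the result of earlier ones.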
URL sources
Initially the primary Autoupdate URL is used as the source for extraction. It's equivalent to listing a URL for version =. Each subsequent field extraction will reuse the most recently retrieved page. Like-named URL entries in Other URLs will also be recognized.
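The page-reuse behaviour can be pictured with a small Python sketch. It assumes a hypothetical `extract_fields` helper that receives rules as ordered (field, value) pairs and a `fetch` callable; the values here are bare Python patterns rather than delimited rules, and none of these names come from the module itself.

```python
import re

def extract_fields(rules, fetch, primary_url):
    """Walk the rules in order, re-fetching only when a URL entry appears."""
    page = fetch(primary_url)          # the primary URL is the initial source
    text, current_field, results = page, None, {}
    for field, value in rules:
        if field != current_field:     # a new field starts from the last page
            current_field, text = field, page
        if value.startswith(("http://", "https://")):
            page = text = fetch(value)
        else:
            match = re.search(value, text, re.S)
            text = match.group(1) if match else ""
            results[field] = text
    return results

# Stand-in for real HTTP retrieval, so the sketch runs offline.
PAGES = {
    "http://example.com/": "Current version 1.2.3",
    "http://example.com/news.html": "Summary: faster parsing\n\nMore text",
}
fetched = []

def fake_fetch(url):
    fetched.append(url)
    return PAGES[url]

results = extract_fields(
    [
        ("version", r"version ([\d.]+)"),
        ("changes", "http://example.com/news.html"),
        ("changes", r"Summary:\s*(.+?)\n\n"),
        ("state", r"(faster|slower)"),
    ],
    fake_fetch,
    "http://example.com/",
)
print(results, fetched)
```

In this sketch the state rule lists no URL of its own, so it reads from the news page that the changes rule retrieved.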
Regex multi-match /* flag
There's a special regex flag /* for a preg_match_all mode. It's used by the listing for the Linux kernel (which is a git log), for instance:
changes = /^Date:.+\R\R\s+(.+)\s+[ ]commit/m*
Here multiple occurrences will be found and merged into a changelog list.
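As a rough analogue of the match-all mode, the following Python sketch uses `re.findall` on a made-up git log excerpt. The pattern is a simplified adaptation (Python's `re` has no `\R`, so `\n` is used), and the way the matches are joined into a list is an assumption, not the module's documented output format.

```python
import re

# Invented sample resembling `git log` output.
LOG = "\n".join([
    "commit 1111111",
    "Author: A <a@example.com>",
    "Date:   Mon Aug 4 10:00:00 2014 +0200",
    "",
    "    Fix scheduler race",
    "",
    "commit 2222222",
    "Author: B <b@example.com>",
    "Date:   Sun Aug 3 09:00:00 2014 +0200",
    "",
    "    Add new driver",
    "",
])

# Each Date: line is followed by an empty line and the indented subject.
subjects = re.findall(r"^Date:.+\n\n\s+(.+)", LOG, flags=re.M)

# Merge all captures into one changelog text, one entry per line.
changelog = "\n".join(subjects)
print(changelog)
```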
Slicing
Oftentimes, however, it's simpler to just narrow down the extraction area. Repeating key=/regex/ specifiers is therefore often useful:
changes = /Changelog(.+?)\Z/s
changes = /(.+)---/
It's sometimes sensible to mix XPath/jQuery extractions first and a regex thereafter to cut out the actual result:
version = $("article h4")
version = ~Version ([\d.]+)~
Matching rules thus iteratively isolate the field to be populated.
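The iterative narrowing can be sketched in a few lines of Python. The helper name `apply_slices` and the sample text are hypothetical; each step simply keeps the first capture group of the previous result.

```python
import re

def apply_slices(text, patterns):
    """Apply each (pattern, flags) in turn, keeping capture group 1."""
    for pattern, flags in patterns:
        match = re.search(pattern, text, flags)
        if match is None:
            return None      # a failed step means nothing could be isolated
        text = match.group(1)
    return text

PAGE = "Intro\nChangelog\n* fixed bug\n* added feature\n---\nFooter"

changes = apply_slices(PAGE, [
    (r"Changelog\n(.+?)\Z", re.S),   # step 1: everything after the heading
    (r"(.+?)\n---", re.S),           # step 2: cut off at the separator
])
print(changes)
```

Two short patterns like these are usually easier to get right than a single expression that has to anchor both the start and the end of the wanted block.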
jQuery-style selector chaining
Often it suffices to call the main $() CSS selector function. One could again use multiple slicing rules, but many jQuery-style subfunctions can be chained in one line:
changes = $(".article .first").next().find("li")
XPath and jQuery rules cannot wrap across line breaks (unlike RegExps with the /x flag).
Examples
Neither regular expressions nor CSS selectors are magic black boxes. They're both quite easy to understand and use:
If you use semantic versioning, then you can keep the \d+.\d+.\d+ version= field. To even allow for -beta or -dev.2 suffixes:
version = /(\d+\.\d+(\.\d+)+(-\w+(?:\.\w+)*)*)/
You can of course precede this regex with more concrete context matches. If for example you were to use metadata comments:
version = ~ ^\h* [/#*]+ \h*version:\h* (\d+(?:\.\d+)+[-.\w]+) ~mix
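To see what the semantic-versioning pattern shown earlier accepts, it can be tried out in Python. This sketch uses a variant with balanced parentheses; the sample strings are invented, and the `find_version` helper is only for illustration.

```python
import re

# Group 1 wraps the whole version, since the first capture group is the result.
SEMVER = re.compile(r"(\d+\.\d+(\.\d+)+(-\w+(?:\.\w+)*)*)")

def find_version(text):
    match = SEMVER.search(text)
    return match.group(1) if match else None

print(find_version("Release 1.2.3 is out"))
print(find_version("Version 1.4.0-dev.2 (unstable)"))
print(find_version("v2.0.0-beta released"))
print(find_version("only 1.2 here"))   # two components are not enough
```

One caveat of this pattern: the suffix part also accepts dotted words, so a file name such as foo-2.0.0-beta.tar.gz would be captured up to its extension. Preceding context matches help to avoid that.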
Extracting a Changelog summary is more difficult. If you want to eschew manual release submissions on freshcode.club you may wish to adopt a coherent README or CHANGELOG scheme.
For example I use a history\n------\n marker in the README, where it's easy to match the pre-summarized changes:
changes = /history\R-----+\R+[\d.]+\R(.+?)\R\R/s
The \R is a linebreak placeholder (all CR, LF, CRLF variants), and \R\R hence an empty line.
For the changes field any - or # and * at the start of lines get stripped, btw.
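The history-marker rule and the subsequent bullet stripping can be approximated in Python as follows. Python's `re` lacks `\R`, so the sketch substitutes an explicit alternation; the README text is invented, and the exact stripping performed by the site is an assumption here.

```python
import re

NL = r"(?:\r\n|\n|\r)"   # stand-in for PCRE's \R

README = (
    "MyTool\n======\n\n"
    "history\n-------\n\n"
    "1.2.3\n- fixed crash on empty input\n- added --verbose flag\n\n"
    "1.2.2\n- initial release\n\n"
)

# Marker line, underline, the version line, then the block up to an empty line.
pattern = rf"history{NL}-----+{NL}+[\d.]+{NL}(.+?){NL}{NL}"
match = re.search(pattern, README, flags=re.S)
raw_changes = match.group(1) if match else ""

# Drop leading list markers (-, #, *) from each line.
changes = re.sub(r"(?m)^[ \t]*[-#*]+[ \t]*", "", raw_changes)
print(changes)
```

Only the topmost release block is captured, because the lazy `(.+?)` stops at the first empty line.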
You still ought to keep the changelog in an end-user approachable writing style.
hidden releases
If you can't uncover a suitable source for $changes=, then your automated release submission will be classified as hidden. The project entry will thus stay current, but no frontpage listing (or notification) will occur.
The regex module will also likely be rate limited, so it won't rescan your website daily.