AutoupdateRegex
The Autoupdate "regex" module is the most versatile for collecting release information from project pages. Besides RegExp matching (for text sources), it now also supports XPath and jQuery-style selections, which ease scraping of HTML project websites.
See also Dr. Changelog for trying it out.
Field Rules
It can be configured in the Autoupdate Rules/Regex project field, where it expects a list of key = ... entries. Each key can list a URL, or one or more RegExp, XPath or jQuery expressions.
version = http://example.com/download.html
version = /(\d+\.\d+(\.\d+)+)/
changes = http://example.com/news.html
changes = $("#main .release div.current")
changes = /Summary:\s*(.+?)\R\R/smix
scope = ~((minor|major) (bugfix|cleanup|security))~
state = ~(stable|beta|prerelease)~i
download = $("a.download").attr("href")
It will not update general project descriptions, but only version= and changes=, or optionally scope=, state= and download=.
- URLs should precede the extraction expressions.
- For regex rules the first capture group (..) will be used as result.
- All regex flags /Umixus are allowed, and a special /* match-all flag is provided.
- Use line breaks to separate rule assignments. Comments in between will effectively be ignored.
- XPath expressions for example take the form changes = (//ul)[1]/li
- jQuery-style selectors can chain multiple selector functions, as in $("div").find("#first")
- Field/key names may be prefixed with $ or % as in $version = /([\d.]+)/
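To make the rule format more tangible, here is a minimal Python sketch of how such a field could be read into per-key rule lists. It is not the module's actual parser: the function names parse_rules and classify, the type-guessing heuristics, and the shape of the free-form comment line are all assumptions for illustration, and multi-line /x regexes are not handled.

```python
import re

def classify(value):
    """Guess the rule type from its leading characters (heuristic, assumed)."""
    if re.match(r"https?://", value):
        return "url"
    if value.startswith("$("):
        return "jquery"
    if value.startswith(("//", "(//")):
        return "xpath"
    if value[0] in "/~":
        return "regex"
    return "unknown"

def parse_rules(text):
    """Group 'key = value' lines by key, preserving their order."""
    rules = {}
    for line in text.splitlines():
        # An optional $ or % prefix is dropped; lines without a
        # leading "key =" part are treated as comments and skipped.
        m = re.match(r"\s*[$%]?(\w+)\s*=\s*(.+?)\s*$", line)
        if not m:
            continue
        key, value = m.group(1), m.group(2)
        rules.setdefault(key, []).append((classify(value), value))
    return rules

sample = r"""
version = http://example.com/download.html
version = /(\d+\.\d+(\.\d+)+)/
(latest stable only)
changes = (//ul)[1]/li
$state = ~(stable|beta|prerelease)~i
"""

rules = parse_rules(sample)
for key, entries in rules.items():
    print(key, entries)
```

Keeping the entries as an ordered list per key matters, because a URL is expected first and later expressions are applied in sequence.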
URL sources
Initially the primary Autoupdate URL is used as source for extraction. It's equivalent to listing a URL for version =. Each subsequent field extraction will reuse the most recently retrieved page. Like-named URL entries in Other URLs will also be recognized.
Regex multi-match /* flag
There's a special regex flag /* for a preg_match_all mode. It's used by the listing for the Linux kernel (which is a git log) for instance:
changes = /^Date:.+\R\R\s+(.+)\s+[ ]commit/m*
Here multiple occurrences will be found, and merged into a changelog list. (So it's somewhat like the /g flag in JavaScript.)
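The difference between a single match and the match-all mode can be sketched in Python, where re.search plays the role of a plain match and re.findall roughly corresponds to preg_match_all. The log excerpt and the simplified pattern below are made up for illustration, and since Python's re module has no \R escape, it is spelled out by hand here.

```python
import re

# Stand-in for PCRE's \R (any CR, LF or CRLF line break).
NL = r"(?:\r\n|\n|\r)"

# Hypothetical, shortened git-log-like text.
log = (
    "commit 1111\n"
    "Date:   Mon Jan 1 2024\n"
    "\n"
    "    Fix buffer handling\n"
    "\n"
    "commit 2222\n"
    "Date:   Sun Dec 31 2023\n"
    "\n"
    "    Add new driver\n"
    "\n"
    "commit 3333\n"
)

# Simplified variant of the rule above, with \R expanded.
pattern = r"^Date:.+\R\R\s+(.+)\s+commit".replace(r"\R", NL)

# Without the match-all flag only the first occurrence is used.
first = re.search(pattern, log, re.M).group(1)

# With it, every occurrence is collected into a list.
all_matches = re.findall(pattern, log, re.M)

print(first)
print(all_matches)
```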
Slicing
Oftentimes it's simpler to just narrow down the extraction area, however. Therefore repeating key=/regex/ specifiers is often useful:
changes = /Changelog(.+?)\Z/s
changes = /(.+)---/
It's sometimes sensible to apply an XPath/jQuery extraction first and a regex thereafter to cut out the actual result:
version = $("article h4")
version = ~Version ([\d.]+)~
Matching rules thus iteratively isolate the field to be populated.
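The iterative narrowing can be pictured as a small loop in which every rule only sees the first capture group of the rule before it. The following Python sketch assumes that behaviour; slice_field, the sample README text and the exact patterns are illustrative, and the second pattern carries the dot-all flag here so that it can span several lines.

```python
import re

def slice_field(text, steps):
    """Apply each (pattern, flags) pair to the previous capture."""
    for pattern, flags in steps:
        m = re.search(pattern, text, flags)
        if m is None:
            return None          # a failed rule leaves the field empty
        text = m.group(1)        # the next rule only sees this slice
    return text

# Hypothetical README content.
readme = (
    "Intro text\n"
    "Changelog\n"
    "* fixed crash\n"
    "* new icons\n"
    "---\n"
    "License: MIT\n"
)

steps = [
    (r"Changelog(.+?)\Z", re.S),   # everything after the heading
    (r"(.+?)\n---", re.S),         # then cut off at the separator
]

result = slice_field(readme, steps)
print(result.strip())
```

Two short patterns like these are usually easier to get right than one long expression that has to anchor both ends at once.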
jQuery-style selector chaining
Often it suffices to call the main $() CSS selector function. One could again use multiple slicing rules, but many jQuery-style subfunctions can be chained in one line:
changes = $(".article .first").next().find("li")
XPath and jQuery rule assignments can only be single-line directives. (Unlike RegExps with the /x flag, which can wrap around linebreaks.)
References
See regular-expressions.info for a simple RegExp introduction. Otherwise check out jQ & CSS selectors and the w3.org spec or jQuery pseudo selectors for CSS selectors. And the XPath / Selenium cheat sheet or an Xpath/Regex overview for XPath examples.
Examples Regex
If you use semantic versioning, then you can keep the \d+.\d+.\d+ version= field. To even allow for -beta or -dev.2 suffixes:
version = /(\d+\.\d+(\.\d+)+(-\w+(?:\.\w+)*)*)/
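A suffix-aware version pattern can be tried out quickly in Python before saving it as a rule. This sketch uses the pattern with balanced parentheses, /(\d+\.\d+(\.\d+)+(-\w+(?:\.\w+)*)*)/, on a few invented sample strings; find_version is just a helper name for this example.

```python
import re

# Body of the version rule, without the / delimiters.
VERSION = re.compile(r"(\d+\.\d+(\.\d+)+(-\w+(?:\.\w+)*)*)")

def find_version(text):
    """Return the first capture group, or None if nothing matches."""
    m = VERSION.search(text)
    return m.group(1) if m else None

samples = [
    "Release 1.4.2-beta is out",
    "Version 2.0.10-dev.2 (nightly)",
    "Stable 3.1.0",
    "v1.2",                      # only two components
]
for text in samples:
    print(repr(text), "->", find_version(text))
```

Two things show up when experimenting: a two-part version such as 1.2 is not matched, because the pattern asks for at least three numeric components, and a file name like foo-1.4.2-beta.tar.gz would pull the extension into the suffix, so page text is a safer source than download links.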
You can of course precede this regex with more concrete context matches. If for example you were to use metadata comments:
version = ~ ^\h* [/#*]+ \h*version:\h* (\d+(?:\.\d+)+[-.\w]+) ~mix
Extracting a Changelog summary is more difficult. If you want to eschew manual release submissions on freshcode.club you may wish to adopt a coherent README or CHANGELOG scheme.
For example I use a history\n------\n marker in the README, where it's easy to match the pre-summarized changes:
changes = /history\R-----+\R+[\d.]+\R(.+?)\R\R/s
The \R is a linebreak placeholder (all CR, LF, CRLF variants), and \R\R hence an empty line.
For the changes field any - or # and * at the start of lines get stripped, btw.
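As a rough illustration of both steps, the following Python sketch extracts the newest history entry from a made-up README and then strips leading list markers. Python's re has no \R, so it is expanded manually, and the cleanup expression is only a guess at what the stripping might look like, not the module's real code.

```python
import re

# Stand-in for PCRE's \R (any CR, LF or CRLF line break).
NL = r"(?:\r\n|\n|\r)"

# Hypothetical README using the history/------ marker.
readme = (
    "myproject\n"
    "=========\n"
    "\n"
    "history\n"
    "-------\n"
    "\n"
    "1.2.0\n"
    "- fixed config loading\n"
    "* added dark theme\n"
    "\n"
    "1.1.0\n"
    "- initial public release\n"
)

# The changes rule from above, with \R expanded for Python.
pattern = r"history\R-----+\R+[\d.]+\R(.+?)\R\R".replace(r"\R", NL)

changes = re.search(pattern, readme, re.S).group(1)

# Assumed cleanup: drop -, # or * markers at the start of each line.
cleaned = re.sub(r"(?m)^[ \t]*[-#*]+[ \t]*", "", changes)

print(cleaned)
```

Because the lazy (.+?) stops at the first empty line, only the entry for the newest version ends up in the field.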
You still ought to keep the changelog in an end-user approachable writing style.
hidden releases
If you can't uncover a suitable source for $changes= then your automated release submission will be classified as hidden. Thus the project entry will stay current, but no frontpage listing (or notification) will occur.
The regex module will also likely be rate limited, so it won't rescan your website daily.
interval= rule
All Autoupdate modules additionally support the interval = 7 rule; the number specifies the minimum number of days that must pass before any new release lookup is attempted.
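In other words, the check amounts to a simple date comparison. A minimal Python sketch, assuming the last lookup date is stored somewhere per project (lookup_due and its parameters are invented names):

```python
from datetime import date, timedelta

def lookup_due(last_check, interval_days, today):
    """True once at least interval_days have passed since last_check."""
    return today - last_check >= timedelta(days=interval_days)

last = date(2024, 1, 1)
print(lookup_due(last, 7, date(2024, 1, 5)))   # four days later
print(lookup_due(last, 7, date(2024, 1, 8)))   # seven days later
```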