The Autoupdate "regex" module is the most versatile option for collecting release information from project pages. Besides RegExp matching (for text sources), it now also supports XPath and jQuery-style selectors, which make scraping HTML project websites easier.
See also Dr. Changelog to try it out.
It is configured in the Autoupdate Rules/Regex project field, which expects a list of key = ... entries. Each key can list a URL and one or more RegExp, XPath or jQuery expressions.
version = http://example.com/download.html
version = /(\d+\.\d+(\.\d+)+)/
changes = http://example.com/news.html
changes = $("#main .release div.current")
changes = /Summary:\s*(.+?)\R\R/smix
scope = ~((minor|major) (bugfix|cleanup|security))~
state = ~(stable|beta|prerelease)~i
download = $("a.download").attr("href")
It will not update general project descriptions, but only
changes= or optionally
- URLs should precede the extraction expressions.
- For regex rules the first capture group (..) will be used as the result.
- All regex flags /Umixus are allowed, and a special /* match-all flag is provided.
- Use line breaks to separate rule assignments. Comments in between will effectively be ignored.
- XPath expressions, for example, take the form changes = (//ul)/li
- jQuery-style selectors can chain multiple selector functions, as in $("div").find("#first")
- Field/key names may be prefixed with $, as in $version = /([\d.]+)/.
Initially the primary Autoupdate URL is used as the source for extraction. That is equivalent to listing a URL for version =. Each subsequent field extraction reuses the most recently retrieved page. Like-named URL entries in Other URLs will also be recognized.
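To make the layout of such a rule list concrete, here is a minimal Python sketch that groups key = ... lines by field name while keeping their order. It is not the module's actual parser: the helper name parse_rules, the choice to skip any line that is not a plain assignment (treating it as a comment), and the restriction to single-line rules are all assumptions made for illustration.

```python
def parse_rules(text):
    """Group `key = expression` lines by key, preserving their order.

    Hypothetical helper: lines that are not plain assignments are skipped
    as comments, and multi-line /x regexes are not handled.
    """
    rules = {}
    for line in text.splitlines():
        key, sep, expr = line.partition("=")
        key = key.strip().lstrip("$")      # optional "$" prefix on field names
        if not sep or not key.isidentifier():
            continue                       # blank line or comment
        rules.setdefault(key, []).append(expr.strip())
    return rules


EXAMPLE = r"""
version = http://example.com/download.html
version = /(\d+\.\d+(\.\d+)+)/
changes = http://example.com/news.html
changes = $("#main .release div.current")
"""

for field, expressions in parse_rules(EXAMPLE).items():
    print(field, expressions)
```

Each field ends up with its URL first and the extraction expressions after it, which mirrors the ordering the rules are expected to follow.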
Regex multi-match /* flag
There's a special regex flag /* for a preg_match_all mode. It's used by the listing for the Linux kernel (whose source is a git log), for instance:
changes = /^Date:.+\R\R\s+(.+)\s+[ ]commit/m*
Here multiple occurrences will be found and merged into a changelog list. (So it's somewhat like the
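The match-all behaviour can be approximated in Python with re.findall, which likewise returns the first capture group of every match. This is only a sketch: the log excerpt is made up, the pattern is a simplified variant of the rule above, and \r?\n stands in for \R because Python's re module has no \R escape.

```python
import re

# Made-up excerpt in the usual `git log` layout (not real kernel history).
GIT_LOG = """\
commit 1111111
Author: A. Dev <a@example.com>
Date:   Mon Jan 1 10:00:00 2024 +0000

    Fix null pointer in driver

commit 2222222
Author: B. Dev <b@example.com>
Date:   Tue Jan 2 11:00:00 2024 +0000

    Add new scheduler option
"""

# Simplified variant of the rule; \r?\n replaces PCRE's \R.
PATTERN = re.compile(r"^Date:.+\r?\n\r?\n\s+(.+)", re.M)


def collect_changes(log_text):
    """Return the first capture group of every match (match-all mode)."""
    return PATTERN.findall(log_text)


# Merge all matches into one changelog text, one entry per line.
changelog = "\n".join(collect_changes(GIT_LOG))
print(changelog)
```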
Often it's simpler, however, to just narrow down the extraction area. Repeating key=/regex/ specifiers is therefore often useful:
changes = /Changelog(.+?)\Z/s
changes = /(.+)---/
It's sometimes sensible to apply an XPath/jQuery extraction first and a regex afterwards to cut out the actual result:
version = $("article h4")
version = ~Version ([\d.]+)~
Matching rules thus iteratively isolate the field to be populated.
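The narrowing idea can be sketched in Python as a loop in which each regex only sees the first capture group of the previous match. The helper name narrow and the sample text are assumptions for illustration, and the real module may differ in details such as how a failed step is handled.

```python
import re


def narrow(text, steps):
    """Apply each (pattern, flags) step to the previous first capture group."""
    for pattern, flags in steps:
        match = re.search(pattern, text, flags)
        if match is None:
            return None          # a step found nothing, so give up
        text = match.group(1)
    return text


STEPS = [
    (r"Changelog(.+?)\Z", re.S),  # changes = /Changelog(.+?)\Z/s
    (r"(.+)---", 0),              # changes = /(.+)---/
]

# Made-up page text for demonstration.
PAGE = "Project notes\nChangelog\nfixed parser, new option --- see git log for details\n"

result = narrow(PAGE, STEPS)
print(result.strip())
```

The first step cuts away everything before the Changelog marker, and the second keeps only what precedes the --- separator on that line.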
jQuery-style selector chaining
Often it suffices to call the main $() CSS selector function. One could again use multiple slicing rules, but many jQuery-style subfunctions can be chained in one line:
changes = $(".article .first").next().find("li")
XPath and jQuery rule assignments can only be single-line directives. (Unlike RegExps with the /x flag, which can wrap around linebreaks.)
See regular-expressions.info for a simple RegExp introduction. Otherwise check out jQ & CSS selectors and the w3.org spec or jQuery pseudo selectors for CSS selectors. And the XPath / Selenium cheat sheet or an Xpath/Regex overview for XPath examples.
If you use semantic versioning, you can keep the \d+.\d+.\d+ pattern for the version= field. To also allow for -dev.2 suffixes:
version = /(\d+\.\d+(\.\d+)+(-\w+(?:\.\w+)*)*)/
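As a quick check of how such a pattern behaves, here it is written out with balanced parentheses and run through Python's re module. The sample strings and the helper name find_version are made up, and Python's regex engine is only a stand-in for the PCRE engine the module itself uses.

```python
import re

# Semantic-versioning pattern with an optional -suffix part.
SEMVER = re.compile(r"(\d+\.\d+(\.\d+)+(-\w+(?:\.\w+)*)*)")


def find_version(text):
    """Return the first capture group of the first match, or None."""
    match = SEMVER.search(text)
    return match.group(1) if match else None


print(find_version("Release 1.4.2-dev.2 is out"))
print(find_version("Stable release 3.10.1 (2024)"))
print(find_version("Version 2.0 released"))
```

Two things are worth noting: a two-part version such as 2.0 does not match, because the pattern requires at least three numeric components, and a file extension directly after a suffix (as in foo-1.4.2-dev.2.tar.gz) would be absorbed into the result.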
You can of course precede this regex with more concrete context matches, for example if you were to use metadata comments:
version = ~ ^\h* [/#*]+ \h*version:\h* (\d+(?:\.\d+)+[-.\w]+) ~mix
Extracting a Changelog summary is more difficult. If you want to eschew manual release submissions on freshcode.club you may wish to adopt a coherent README or CHANGELOG scheme.
For example, I use a history\n------\n marker in the README, where it's easy to match the pre-summarized changes:
changes = /history\R-----+\R+[\d.]+\R(.+?)\R\R/s
\R is a linebreak placeholder (all CR, LF, CRLF variants), and \R\R hence an empty line.
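The following Python sketch shows what this rule picks out of a README. The README text and the helper name extract_changes are invented for the example, and since Python's re module has no \R escape, an explicit alternation that treats CRLF as a single line break stands in for it.

```python
import re

# Stand-in for PCRE's \R; the lookahead keeps "\r\n" from counting as two breaks.
NL = r"(?:\r\n|\n|\r(?!\n))"

# changes = /history\R-----+\R+[\d.]+\R(.+?)\R\R/s
CHANGES = re.compile(
    "history" + NL + "-----+" + NL + "+" + r"[\d.]+" + NL + "(.+?)" + NL + NL,
    re.S,
)

# Made-up README for demonstration.
README = (
    "myproject\n=========\n\n"
    "history\n-------\n\n"
    "1.2.0\n* Added XPath support\n* Fixed URL handling\n\n"
    "1.1.0\n* Initial release\n"
)


def extract_changes(text):
    """Return the newest changelog block below the history marker, or None."""
    match = CHANGES.search(text)
    return match.group(1) if match else None


print(extract_changes(README))
```

The lazy (.+?) stops at the first empty line, so only the entries of the newest version are captured.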
In the changes field any * at the start of lines gets stripped, btw.
You still ought to keep the changelog in an end-user approachable writing style.
If you can't uncover a suitable source for $changes= then your automated release submission will be classified as hidden. Thus the project entry will stay current, but no frontpage listing (or notification) will occur.
The regex module will also likely be rate limited, so it won't rescan your website daily.
All Autoupdate modules additionally support the interval = 7 rule; the number specifies the minimum number of days before any new release lookup is attempted.