

Update of "AutoupdateRegex"


Overview

Artifact ID: 3eb5337e1efb4d324416ed777a64c2bc79a88095
Page Name: AutoupdateRegex
Date: 2014-08-03 17:27:36
Original User: mario
Mimetype: text/x-markdown
Parent: 0ec51a2245a10bb50355e5b225c400159e142b9e
Next: 82faeace153aa7ffac7709f4425198f4ee24d76a
Content

The Autoupdate "regex" module is the most versatile one for updating release information. Besides RegExp matching (for text sources), it now also supports XPath and jQuery-style selections, which makes it more suitable for HTML project websites.

See also Dr. Changelog to try it out.

Field Rules

It is configured in the Autoupdate Rules/Regex project field, which expects a list of key = ... entries. Each key can list a URL followed by one or more RegExp, XPath or jQuery expressions.

version = http://example.com/download.html
version = /(\d+\.\d+(\.\d+)+)/

changes = http://example.com/news.html
changes = $("#main .release div.current")
changes = /Summary:\s*(.+?)\R\R/smix

scope = ~((minor|major) (bugfix|cleanup|security))~
state = ~(stable|beta|prerelease)~i
download = $("a.download").attr("href")

It will not update general project descriptions. It only populates version= and changes=, and optionally scope=, state= and download=.

  • URLs should precede the extraction expressions.
  • For regex rules the first capture group [1] will be used as the result.
  • All regex flags /Umixus are allowed, plus a special /* match-all flag, but not /e of course.
  • Use line breaks to separate rule assignments. Comments in between will effectively be ignored.
  • XPath expressions take a form such as changes = (//ul)[1]/li
  • jQuery-style selectors can chain multiple selector functions, as in $("div").find("#first"), but cannot contain JavaScript expressions of course.
  • Field names may be preceded by $ or % as in $version = /([\d.]+)/.
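To make the rule format above more concrete, here is a minimal Python sketch of how such a field could be split into per-key rule lists. It is not the module's actual parser; the names `parse_rules` and `RULE_LINE` are hypothetical, and the treatment of comments (any line that does not look like `key = value` is skipped) is an assumption.

```python
import re

# Hypothetical line format: optional "$" or "%" prefix, a field name, "=", a value.
RULE_LINE = re.compile(r"^\s*[$%]?(\w+)\s*=\s*(.+?)\s*$")

def parse_rules(text):
    """Group `key = value` lines by field name, keeping rule order per key."""
    rules = {}
    for line in text.splitlines():
        m = RULE_LINE.match(line)
        if m is None:
            continue  # blank lines and comment lines are ignored (assumption)
        key, value = m.group(1), m.group(2)
        rules.setdefault(key, []).append(value)
    return rules

if __name__ == "__main__":
    field = r'''
version = http://example.com/download.html
version = /(\d+\.\d+(\.\d+)+)/
# a comment between rules
$changes = $("#main .release div.current")
'''
    for key, values in parse_rules(field).items():
        print(key, values)
```

Keeping the values in their original order matters, because a URL is expected to come before the expressions that are applied to it.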

URL sources

Initially the primary Autoupdate URL is used as the source for extraction. This is equivalent to listing a URL for version =. Each subsequent field extraction reuses the most recently retrieved page. Like-named URL entries in Other URLs will also be recognized.

Regex multi-match /* flag

There's a special regex flag /* for a preg_match_all mode. It's used by the listing for the Linux kernel (which is a git log) for instance:

changes = /^Date:.+\R\R\s+(.+)\s+[ ]commit/m*

Here multiple occurrences will be found and merged into a changelog list.
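The match-all behaviour can be approximated in Python as follows. This is only a sketch: `match_all` is a hypothetical helper, the log text is made up, and the pattern is a simplified stand-in for the kernel rule (Python's `re` has no `\R`, so plain `\n` is used).

```python
import re

def match_all(pattern, text, flags=0):
    """Collect capture group 1 of every match, similar in spirit to preg_match_all."""
    return [m.group(1) for m in re.finditer(pattern, text, flags)]

# Invented sample in the shape of `git log` output.
LOG = (
    "commit 1111\n"
    "Author: A <a@example.org>\n"
    "Date:   Mon Aug 4 10:00:00 2014\n"
    "\n"
    "    Fix foo\n"
    "\n"
    "commit 2222\n"
    "Author: B <b@example.org>\n"
    "Date:   Sun Aug 3 09:00:00 2014\n"
    "\n"
    "    Add bar\n"
)

if __name__ == "__main__":
    # Each commit subject becomes one changelog entry.
    for entry in match_all(r"^Date:.+\n\n\s+(.+)", LOG, re.M):
        print("*", entry)
```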

Slicing

Often, however, it's simpler to just narrow down the extraction area step by step. Repeating key=/regex/ specifiers is useful for that:

changes = /Changelog(.+?)\Z/s
changes = /(.+)---/

It's sometimes sensible to mix XPath/jQuery extractions first and a regex thereafter to cut out the actual result:

version = $("article h4")
version = ~Version ([\d.]+)~

Matching rules thus iteratively isolate the field to be populated.
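The iterative narrowing can be pictured with a small Python sketch, in which each rule is applied to the result of the previous one. The helper `slice_text` and the sample text are hypothetical, and only regex rules are covered; XPath or jQuery steps would need an HTML library.

```python
import re

def slice_text(text, rules):
    """Apply (pattern, flags) rules in order; each one narrows the previous result."""
    for pattern, flags in rules:
        m = re.search(pattern, text, flags)
        if m is None:
            return None  # one failing rule means nothing can be extracted
        text = m.group(1)  # the first capture group becomes the new input
    return text

# Invented README-like source.
README = "Intro text\nChangelog\nFixed crash on start --- see tracker\n"

RULES = [
    (r"Changelog(.+?)\Z", re.S),  # keep everything after the heading
    (r"(.+)---", 0),              # then cut at the separator
]

if __name__ == "__main__":
    print(repr(slice_text(README, RULES)))
```

Note that `\Z` in Python matches only at the very end of the string, so the first step here keeps the trailing newline; PCRE treats a final line break slightly differently.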

jQuery-style selector chaining

Often it suffices to call the main $() CSS selector function. And one could again use multiple slicing rules, but many jQuery-style subfunctions can be chained in one line:

changes = $(".article .first").next().find("li")

XPath and jQuery selectors however cannot wrap around linebreaks. (Unlike RegExps with the /x flag.)

Examples

If you use semantic versioning, you can keep a \d+.\d+.\d+ pattern for the version= field. To also allow suffixes such as -beta or -dev.2:

version = /(\d+\.\d+(\.\d+)+(-\w+(?:\.\w+)*)*)/

You can of course precede this regex with more concrete context matches. If, for example, you were to use meta data comments:

version = ~^\s*(?:#|//|\*)\s*version:\s*(\d+(?:\.\d+)+[-.\w]+)~mi

Extracting a Changelog summary is more difficult. If you want to eschew manual release submissions on freshcode.club you may wish to adopt a coherent README or CHANGELOG scheme.

For example, I use a history\n------\n marker in the README, which makes it easy to match the pre-summarized changes:

changes = /history\R-----+\R+[\d.]+\R(.+?)\R\R/s

The \R is a linebreak placeholder (all CR, LF, CRLF variants), and \R\R hence an empty line.
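For readers testing such a rule outside of PCRE: Python's `re` module has no `\R`, so a sketch has to substitute an explicit alternation. The helper `search_first_group`, the `NEWLINE` constant and the README sample are all hypothetical; the lookarounds keep a CRLF pair from being counted as two line breaks.

```python
import re

# Stand-in for PCRE's \R covering CRLF, LF and CR. The lookarounds prevent
# "\r\n" from being split into two separate line breaks on backtracking.
NEWLINE = r"(?:\r\n|(?<!\r)\n|\r(?!\n))"

def search_first_group(pattern, text, flags=0):
    """Run a PCRE-style pattern containing \\R and return capture group 1."""
    m = re.search(pattern.replace(r"\R", NEWLINE), text, flags)
    return m.group(1) if m else None

# The rule from above, with the /s flag passed as re.S.
HISTORY_RULE = r"history\R-----+\R+[\d.]+\R(.+?)\R\R"

# Invented README excerpt following the described scheme.
README = (
    "history\n"
    "-------\n"
    "\n"
    "1.2.3\n"
    "Fixed login bug.\n"
    "Added XPath support.\n"
    "\n"
    "1.2.2\n"
    "Older stuff.\n"
)

if __name__ == "__main__":
    print(search_first_group(HISTORY_RULE, README, re.S))
```

The same call works on a CRLF version of the file, since the empty line is then recognized as two CRLF pairs rather than four single characters.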

For the changes field any - or # and * at the start of lines get stripped, btw.
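The stripping of list markers might look roughly like this in Python. The exact behaviour of the module is not documented here, so the handling of indentation and of repeated markers in `strip_markers` (a hypothetical name) is an assumption.

```python
import re

def strip_markers(changes):
    """Remove leading -, # or * markers (and surrounding blanks) from each line."""
    return re.sub(r"^[ \t]*[-#*]+[ \t]*", "", changes, flags=re.M)

if __name__ == "__main__":
    print(strip_markers("- Fixed A\n* Added B\n# Note"))
```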

You still ought to keep the changelog in an end-user approachable writing style.

Hidden releases

If you can't uncover a suitable source for $changes= then your automated release submission will be classified as hidden. The project entry will thus stay current, but no frontpage listing (or notification) will occur.

The regex module will also likely be rate limited, so it won't rescan your website daily.