- File html2mallard/README.md — part of check-in [ada19bd287] at 2021-03-26 12:13:35 on branch trunk — html2mallard update: support direct .md conversion, and http:// url params, doc updates. (user: mario size: 4718)
html2mallard / mkdocs-mallard
It's a very basic regex extraction (→ I'm looking forward to your letters!) and filtering process. It only retains some structural elements (headlines, paragraphs, tables, lists, notes), and doesn't even attempt to derive any topic relations or structure from the navigation list.
- Really just intended for one-time/initial conversion.
- Requires some editing to get pages to validate (though they probably "work" in yelp as is).
- Links and image references certainly require manual cleanup. Nested lists or tables are likely to cause issues.
- API docs are the least convertible (only tested with mkdocstrings; the source dump is omitted, and there's obviously no syntax colorization in yelp; alternatively, try mkgendocs).
- Primarily designed for mkdocs' HTML output, but also contains some cleanup rules for fossil wiki pages (with the github skin) and yelp-build's HTML.
- Conversion doesn't work well for sphinx output (not consistent enough), nor for GitHub wiki pages.
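To illustrate what "regex extraction and filtering" means here, the following is a minimal sketch of the general idea: an ordered list of (pattern, replacement) rules maps a few HTML elements onto Mallard ones, and a final rule strips every remaining tag. The `RULES` list and `convert_fragment()` are hypothetical and are not html2mallard's actual rule set; real Mallard output also needs `<page>`/`<section>` nesting, which this sketch ignores.

```python
import re

# Hypothetical rewrite rules, applied in order. NOT html2mallard's real rules.
RULES = [
    (r"<h[1-6]\b[^>]*>(.*?)</h[1-6]>", r"<title>\1</title>"),   # headlines
    (r"<p(?:\s[^>]*)?>(.*?)</p>", r"<p>\1</p>"),                # paragraphs, attributes dropped
    (r"<ul\b[^>]*>", "<list>"),                                  # bullet lists
    (r"<ol\b[^>]*>", '<list type="numbered">'),                  # numbered lists
    (r"</(?:ul|ol)>", "</list>"),
    (r"<li(?:\s[^>]*)?>(.*?)</li>", r"<item><p>\1</p></item>"),  # list items
    (r"<(?!/?(?:title|p|list|item)\b)[^>]+>", ""),               # filter: drop all other tags
]


def convert_fragment(html):
    """Apply each rule in turn; DOTALL lets matches span line breaks."""
    for pattern, repl in RULES:
        html = re.sub(pattern, repl, html, flags=re.S | re.I)
    return html.strip()


print(convert_fragment('<h2 id="x">Intro</h2><p class="a">Hello <b>world</b></p>'))
```

The non-greedy `(.*?)</li>` stops at the first closing tag it sees, which is exactly why nested lists (and nested tables, by the same logic) trip up this kind of converter.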
Simple command line tool to convert a single .html file:

    html2mallard site/index.html > help/index.page

Add the `--debug` flag after the filename for details on the shortening. Piping the output through xmllint fixes some unmatched tags:

    html2mallard in.html --debug | xmllint - --recover > out.page

It now also accepts http:// URLs for conversion:

    html2mallard http://wiki/index.html > index.page

And it converts markdown files directly:

    html2mallard index.md > index.page
There's basically just one main function in html2mallard:

    import html2mallard
    page = html2mallard.convert(html_file_content, fn)

The filename parameter is only used to deduce the page id and/or title.
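How exactly `convert()` derives the id and title from `fn` isn't documented here. A plausible approach, sketched with a hypothetical helper `id_and_title()`, is to take the file's basename without extension as the id and a humanized version of it as the fallback title:

```python
import os
import re


def id_and_title(fn):
    """Hypothetical helper: derive a page id and a fallback title from a filename or URL."""
    base = os.path.splitext(os.path.basename(fn))[0]       # "site/getting-started.html" -> "getting-started"
    page_id = re.sub(r"[^\w-]+", "_", base)                # replace characters awkward in an id
    title = re.sub(r"[-_]+", " ", base).strip().capitalize()
    return page_id, title


print(id_and_title("site/getting-started.html"))  # ('getting-started', 'Getting started')
```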
As a convenience method there is also

    page = html2mallard.convert_file(fn)

which automatically invokes the markdown conversion for files with a .md extension, and even resolves a URL given as the parameter.
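The input handling described for `convert_file()` could look roughly like the sketch below. `is_url()` and `load_html()` are hypothetical names, and using the third-party Python-Markdown package (`markdown.markdown()`) for the .md step is an assumption; it is imported lazily so plain HTML input doesn't need it.

```python
import re
import urllib.request


def is_url(fn):
    """Treat http:// (and https://) arguments as URLs."""
    return bool(re.match(r"https?://", fn, re.I))


def load_html(fn):
    """Hypothetical loader: fetch URLs, read local files, render .md to HTML."""
    if is_url(fn):
        with urllib.request.urlopen(fn) as resp:
            text = resp.read().decode("utf-8", errors="replace")
    else:
        with open(fn, encoding="utf-8") as f:
            text = f.read()
    if fn.lower().endswith(".md"):
        import markdown  # third-party Python-Markdown, assumed to be installed
        text = markdown.markdown(text, extensions=["tables"])
    return text
```

The resulting HTML string would then go through the same regex conversion as a local .html file.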
mkdocs-mallard converts a list of mkdocs output files to `*.page` files. It requires an extra `mallard_dir` entry in the mkdocs configuration (mkdocs.yml):

    site_name: logfmt1
    docs_dir: docs
    site_dir: html
    mallard_dir: mallard
    use_directory_urls: false
    nav:
      - Intro: index.md
    theme:
      name: readthedocs
      highlightjs: false
    repo_url: https://...
    markdown_extensions:
      - admonition
      - codehilite
      - attr_list
      - def_list
      - tables
      - markdown.extensions.codehilite:
          guess_lang: true
    plugins:
      - mkdocstrings
It also depends on `use_directory_urls: false`, since the script only scans one level of the output directory.
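Why the flat layout matters can be seen in a small sketch: with `use_directory_urls: true`, mkdocs writes `about/index.html` instead of `about.html`, so a non-recursive scan of `site_dir` would miss those pages. `page_targets()` is a hypothetical helper mapping `site_dir/*.html` to `mallard_dir/*.page`, not the script's actual code.

```python
from pathlib import Path


def page_targets(site_dir, mallard_dir):
    """Hypothetical helper: map site_dir/*.html to mallard_dir/*.page.

    Uses glob(), not rglob(), so only the top directory level is scanned;
    pages that mkdocs wrote as about/index.html would be skipped.
    """
    site, out = Path(site_dir), Path(mallard_dir)
    return {src: out / (src.stem + ".page") for src in sorted(site.glob("*.html"))}
```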
The resulting index.page contains a section like:

    <section id="nav" style="2column">
      <subtitle>Topics</subtitle>
    </section>

but not the recursive self-reference `<link type="guide" xref="index#nav"/>`.
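In Mallard, a page whose `<info>` carries `<link type="guide" xref="index#nav"/>` is listed in the `nav` section of index.page. Assuming the other converted pages get such a link, a hypothetical `guide_info()` helper could emit it for every page except index itself, which avoids the self-reference:

```python
def guide_info(page_id, guide="index#nav"):
    """Hypothetical helper: Mallard <info> block linking a page into index#nav.

    Returns an empty string for the guide page itself (no self-reference).
    """
    if page_id == guide.split("#", 1)[0]:
        return ""
    return '<info>\n  <link type="guide" xref="%s"/>\n</info>' % guide
```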
The first two rewrite rules likely require changes for other HTML sources or templates. Specifically, `"^.+?</nav>"` should strip the initial boilerplate; if it doesn't for your template, it might need expanding (either in the GENERAL HTML collection or in a new rewrite collection).
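The effect of the `"^.+?</nav>"` rule can be tried in isolation. This sketch assumes the pattern is applied with `re.S` (DOTALL) so that `.` also crosses line breaks; whether html2mallard compiles it that way is an assumption. The non-greedy `.+?` stops at the first `</nav>`, so any template boilerplate that continues after that point would survive and call for an expanded rule.

```python
import re


def strip_header(html):
    # Remove everything from the start up to and including the first </nav>.
    # re.S (assumed here) lets "." match newlines in multi-line HTML.
    return re.sub(r"^.+?</nav>", "", html, count=1, flags=re.S)


print(strip_header('<html><body><nav>\n<a>Home</a>\n</nav><h1>Intro</h1>'))  # <h1>Intro</h1>
```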
| Key | Value |
|---|---|
| compat | Python ≥3.6, mkdocs 1.x |
| compliancy | !pep8, mallard, manpage, !doap, !xdg |