Artifact 50cdb787c7cba3e2154dca63f1f7bb997bc925f717ddfc63e0165d76e2ca838c:
- File html2mallard/README.md — part of check-in [ada19bd287] at 2021-03-26 12:13:35 on branch trunk — html2mallard update: support direct .md conversion, and http:// url params, doc updates. (user: mario size: 4718)
html2mallard / mkdocs-mallard
Extremely crude HTML to mallard help conversion. Specifically for output from mkdocs with RTD or Material theme.
It's a very basic regex extraction (→I'm looking forward to your letters!) and filtering process. It only retains some structural elements (headlines, paragraphs, tables, lists, notes). It doesn't even attempt to gather any topic relation/structure from the navigation list.
- Really just intended for one-time/initial conversion.
- Requires some editing to get pages to validate. (Though they probably "work" in yelp as is).
- Links and image references certainly require manual cleanup. Nested lists or tables are likely to cause issues.
- And API docs are least convertible (only tested mkdocstrings, source dump is omitted, and there's obviously no syntax colorization in yelp; alternatively try mkgendocs).
- Primarily designed for mkdocs' HTML output. But also contains some cleanup rules for fossil wiki pages (with github skin), and yelp-build's HTML.
- Conversion doesn't work well for sphinx output (not consistent enough), nor GitHub wiki pages.
html2mallard
Simple command line tool to convert a single .html file:
html2mallard site/index.html > help/index.page
Add a -d/--debug flag after the filename for details on the shortening process.
html2mallard in.html --debug | xmllint - --recover > out.page
Piping the output through xmllint helps fix some unmatched tags.
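Since converted pages usually need manual fixes, a quick well-formedness check can flag which files still contain unmatched tags. The following is a minimal sketch using only Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical and not part of html2mallard. It checks XML well-formedness only, not validity against the Mallard schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(page_text):
    """Return (True, None) if page_text parses as XML, else (False, message)."""
    try:
        ET.fromstring(page_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None

# Hypothetical sample pages, for illustration only
good = '<page xmlns="http://projectmallard.org/1.0/" id="index"><title>Intro</title></page>'
bad = '<page id="index"><title>Intro</page>'  # <title> is never closed

print(is_well_formed(good)[0])
print(is_well_formed(bad)[0])
```

A file that fails this check is a candidate for `xmllint --recover` or hand editing.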
Now also supports http:// URLs for conversion:
html2mallard http://wiki/index.html > index.page
And directly converting from markdown:
html2mallard index.md > index.page
API
There's basically just one main function in html2mallard:
import html2mallard
page = html2mallard.convert(html_file_content, fn)
The filename parameter is just used to deduce id and/or title from.
As a convenience, there is also page = html2mallard.convert_file(fn), which automatically invokes markdown conversion for files with such an extension, and can even resolve a URL passed as the parameter.
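For converting several files in one go, a small wrapper around `convert_file` might look like the sketch below. The helper `convert_dir` is hypothetical. The converter is passed in as an argument so the sketch runs without html2mallard installed, and it assumes the converter returns the page as a string.

```python
from pathlib import Path

def convert_dir(src_dir, out_dir, convert_file):
    """Convert every *.html directly in src_dir to a .page file in out_dir.

    convert_file is assumed to behave like html2mallard.convert_file:
    it takes a filename and returns the converted page as a string.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for html in sorted(Path(src_dir).glob("*.html")):
        target = out / (html.stem + ".page")
        target.write_text(convert_file(str(html)), encoding="utf-8")
        written.append(target.name)
    return written
```

With the real module this would presumably be called as `convert_dir("site", "help", html2mallard.convert_file)`.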
mkdocs-mallard
Converts a list of mkdocs output files to *.page files.
mkdocs-mallard
Requires an extra mallard_dir in the mkdocs.yml config:
    site_name: logfmt1
    docs_dir: docs
    site_dir: html
    mallard_dir: mallard
    use_directory_urls: false
    nav:
      - Intro: index.md
    theme:
      name: readthedocs
      highlightjs: false
    repo_url: https://...
    markdown_extensions:
      - admonition
      - codehilite
      - attr_list
      - def_list
      - tables
      - markdown.extensions.codehilite:
          guess_lang: true
    plugins:
      - mkdocstrings
This also depends on use_directory_urls: false, since the script only glob()s one level of *.html files.
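To see why the flat layout matters, the sketch below builds a throwaway site directory and applies a one-level `*.html` glob to it. The function name `top_level_html` and the sample file names are made up for illustration; the actual script's glob call may differ in detail.

```python
import tempfile
from pathlib import Path

def top_level_html(site_dir):
    """Return the names of *.html files directly inside site_dir (no recursion)."""
    return sorted(p.name for p in Path(site_dir).glob("*.html"))

with tempfile.TemporaryDirectory() as site:
    root = Path(site)
    # use_directory_urls: false -> one flat .html file per page
    (root / "index.html").write_text("<html></html>")
    (root / "usage.html").write_text("<html></html>")
    # use_directory_urls: true -> each page nested as <name>/index.html
    (root / "api").mkdir()
    (root / "api" / "index.html").write_text("<html></html>")

    found = top_level_html(root)
    print(found)  # ['index.html', 'usage.html']
```

The nested `api/index.html` is never seen, which is what would happen to every page except the front page with directory URLs enabled.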
Nav links
Ensure the index.page contains a section like:
<section id="nav" style="2column">
<subtitle>Topics</subtitle>
</section>
But not the recursive self-reference <link type="guide" xref="index#nav"/>.
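Both conditions can be checked mechanically. The following is a rough sketch with the standard library; `check_nav` and `local` are hypothetical helpers, and the sample page is a made-up minimal guide page using the Mallard 1.0 namespace.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip an XML namespace prefix, e.g. '{ns}section' -> 'section'."""
    return tag.rsplit("}", 1)[-1]

def check_nav(page_text):
    """Return (has_nav_section, has_self_reference) for an index.page string."""
    root = ET.fromstring(page_text)
    has_nav = any(
        local(el.tag) == "section" and el.get("id") == "nav"
        for el in root.iter()
    )
    self_ref = any(
        local(el.tag) == "link"
        and el.get("type") == "guide"
        and el.get("xref") == "index#nav"
        for el in root.iter()
    )
    return has_nav, self_ref

page = """<page xmlns="http://projectmallard.org/1.0/" type="guide" id="index">
<title>Intro</title>
<section id="nav" style="2column">
<subtitle>Topics</subtitle>
</section>
</page>"""

print(check_nav(page))  # (True, False)
```

The desired result for index.page is `(True, False)`: the nav section exists and the page does not link to itself as a guide.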
Adaptation
The first two rewrite rules likely require changes for other HTML sources or templates. Specifically, "^.+?</nav>" should strip the initial boilerplate; otherwise it might need expansion (either in the GENERAL HTML or a new rewrite collection).
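The effect of that pattern can be tried in isolation. This sketch applies it with `re.sub`; the `re.DOTALL` flag is an assumption (without it `.` would stop at the first newline), and the tool's actual rule format and flags may differ.

```python
import re

# Non-greedy match from the start of the document up to the first </nav>.
# re.DOTALL lets "." span newlines; whether html2mallard uses it is an assumption.
STRIP_HEADER = re.compile(r"^.+?</nav>", re.DOTALL)

def strip_boilerplate(html):
    """Drop everything up to and including the first closing </nav> tag."""
    return STRIP_HEADER.sub("", html, count=1)

# Made-up sample input
sample = (
    "<html><head><title>t</title></head>\n"
    "<body><nav>menu</nav>\n"
    "<h1>Intro</h1><p>Text</p></body></html>"
)
print(strip_boilerplate(sample))
```

If a template has no `</nav>` element, the pattern does not match and the input passes through unchanged, which is the case where a different leading rule would be needed.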
from project import meta
| meta | info |
|---|---|
| depends | - |
| compat | Python ≥3.6, mkdocs 1.x |
| compliancy | !pep8, mallard, manpage, !doap, !xdg |
| system usage | - |
| paths | - |
| testing | - |
| docs | - |
| activity | abandoned |
| state | alpha |
| support | - |
| contrib | - |
| announce | - |