Overview
Artifact ID: | 5a9298e9633e287321907729c53589675e9e3f19eb0ca240378a8a2cb4305d77 |
---|---|
Page Name: | scrape-script |
Date: | 2018-07-12 14:01:08 |
Original User: | mario |
Mimetype: | text/x-markdown |
Parent: | 3cf84fa5c6b322d8d577643aa44870043227b5c64ef3e457dd0f13e9f0b022ef |
Next | b1dc5e007fd1bd0a29b2202b61d8ad58190c810cce92cdb099ceaea837c6615e |
Content
scrape script
- `ls2url.php` first generates a URL list from the IA search API, so that only the interesting content pages are retrieved.
- The `ex.php` extraction script then converts `src/*` to `target/*` and populates an open fossil repo right away.
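
The first step could be sketched roughly as follows. This is not the actual `ls2url.php`. It assumes that "IA search API" means the Wayback Machine CDX endpoint, and the URL pattern, the filters, and the function names `build_query` and `snapshot_urls` are illustrative guesses rather than anything taken from the script.

```python
import json
from urllib.parse import urlencode

# Assumption: the Wayback Machine CDX endpoint stands in for the "IA search API".
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_query(url_pattern):
    """Return a CDX query URL listing successful captures, one per distinct URL."""
    params = {
        "url": url_pattern,          # hypothetical pattern, e.g. "poshcode.org/*"
        "output": "json",            # rows come back as a JSON array of arrays
        "filter": "statuscode:200",  # skip redirects and error pages
        "collapse": "urlkey",        # keep only one capture per URL
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(cdx_json):
    """Turn a CDX JSON response into Wayback URLs, one per capture row."""
    rows = json.loads(cdx_json)
    if not rows:
        return []
    header, captures = rows[0], rows[1:]
    ts, orig = header.index("timestamp"), header.index("original")
    # The "id_" suffix asks for the capture without the Wayback page rewriting.
    return [f"http://web.archive.org/web/{row[ts]}id_/{row[orig]}" for row in captures]
```

The resulting list, written one URL per line, could then be handed to a rate-limited downloader such as `wget -i`.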
Now, I wouldn't recommend doing this again. It takes hours at least. (Mostly due to the wget delay, of course.)
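
To see why the delay dominates, here is a back-of-the-envelope estimate. The page count and the pause length are made-up figures for illustration, not the real numbers from this scrape.

```python
def crawl_hours(url_count, delay_seconds, fetch_seconds=0.0):
    """Estimate the wall-clock hours for a sequential, rate-limited download."""
    return url_count * (delay_seconds + fetch_seconds) / 3600


# Hypothetical figures: 6000 pages with a 3-second pause between requests.
print(crawl_hours(6000, 3))  # 5.0
```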
If you want to set up your own instance, either download the /tarball or just clone the repo:

    fossil clone http://fossil.include-once.org/poshcode/ mypc.fsl
    fossil open mypc.fsl
    fossil ui

(I realize some people might be upset about the embedded metadata comments. But those are clearly easier to remove than to add.)