PoshCode Archive  Artifact [5a9298e963]

Artifact 5a9298e9633e287321907729c53589675e9e3f19eb0ca240378a8a2cb4305d77:

Wiki page [scrape-script] by mario 2018-07-12 14:01:08.
D 2018-07-12T14:01:08.935
L scrape-script
N text/x-markdown
P 3cf84fa5c6b322d8d577643aa44870043227b5c64ef3e457dd0f13e9f0b022ef
U mario
W 749
### scrape script

 * **ls2url.php** first generates a URL list from the IA search API, so that only the interesting content pages get retrieved
 * The **ex.php** extraction script then converts src/* to target/* and populates an open fossil repo right away
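
For illustration only, the URL-list step could look roughly like the sketch below. It is not the actual ls2url.php: it assumes the API returns (timestamp, original URL) pairs for Wayback Machine captures, and the `CONTENT_PAGE` pattern and function names are made up for this example.

```python
import re

# Hypothetical filter for "interesting" content pages (an assumption;
# the real pattern used by ls2url.php is not shown on this page).
CONTENT_PAGE = re.compile(r"^https?://(www\.)?poshcode\.org/\d+$")


def wayback_url(timestamp, original):
    """Build a Wayback Machine replay URL for a single capture."""
    return "https://web.archive.org/web/{}/{}".format(timestamp, original)


def build_url_list(captures):
    """Keep only content pages, using the latest capture per original URL.

    `captures` is an iterable of (timestamp, original_url) pairs, where the
    timestamp is a 14-digit string such as "20160101000000".
    """
    latest = {}
    for timestamp, original in captures:
        if not CONTENT_PAGE.match(original):
            continue  # skip index pages, assets, and similar
        # Fixed-width timestamps compare correctly as plain strings.
        if original not in latest or timestamp > latest[original]:
            latest[original] = timestamp
    return [wayback_url(ts, url) for url, ts in sorted(latest.items())]
```

The resulting list can be written one URL per line and handed to wget, which is where the download delay comes in.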

Now, I wouldn't recommend doing this again. It takes hours at least. (Mostly due to the wget delay, of course.)

If you want to set up your own instance, either download the [/tarball](/tarball) or [/zip](/zip), or just clone the repo:

     fossil clone http://fossil.include-once.org/poshcode/ mypc.fsl
     fossil open mypc.fsl
     fossil ui

(I realize some people might be upset because of the embedded metadata comments. But those are clearly easier to remove than to add.)
Z 2d1e5c009dab5ef06d73db98af3a463c