PoshCode Archive: Artifact Content

Overview

Artifact ID:	5a9298e9633e287321907729c53589675e9e3f19eb0ca240378a8a2cb4305d77
Page Name:	scrape-script
Date:	2018-07-12 14:01:08
Original User:	mario
Mimetype:	text/x-markdown
Parent:	3cf84fa5c6b322d8d577643aa44870043227b5c64ef3e457dd0f13e9f0b022ef (diff)
Next	b1dc5e007fd1bd0a29b2202b61d8ad58190c810cce92cdb099ceaea837c6615e

Content

ls2url.php first generates a URL list from the IA search API, so that only the interesting content pages get retrieved.
The ex.php extraction script then converts src/* to target/*, populating an open fossil repo right away.

Now, I wouldn't recommend doing this again. It takes hours at least. (Mostly due to the wget delay, of course.)
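
For illustration only, here is a rough sketch of what such a delayed download step might look like. It assumes the URL list sits in a file (a hypothetical urls.txt, one URL per line) and that raw pages land in src/. The file name, the two-second delay and the RUN switch are made up for this example and are not the settings actually used.

```shell
#!/bin/sh
# Hypothetical sketch of the download step; urls.txt, src and the delay
# value are assumptions for illustration, not the original settings.
URL_LIST="${URL_LIST:-urls.txt}"   # one URL per line (assumed output of ls2url.php)
DEST_DIR="${DEST_DIR:-src}"        # raw pages are stored here before extraction
DELAY="${DELAY:-2}"                # seconds to pause between two requests

# Build the wget command line as text so it can be inspected before running.
fetch_cmd() {
    printf 'wget --wait=%s --input-file=%s --directory-prefix=%s\n' \
        "$DELAY" "$URL_LIST" "$DEST_DIR"
}

# Only print the command by default; set RUN=1 to actually start the download.
if [ "${RUN:-0}" = "1" ]; then
    wget --wait="$DELAY" --input-file="$URL_LIST" --directory-prefix="$DEST_DIR"
else
    fetch_cmd
fi
```

With a pause like this between requests, the total run time grows linearly with the number of URLs, which is where the hours come from.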

If you want to set up your own instance, either download the /tarball or just clone the repo:

    fossil clone http://fossil.include-once.org/poshcode/ mypc.fsl
    fossil open mypc.fsl
    fossil ui

(I realize some people might be upset because of the embedded metadata comments. But those are clearly easier to remove than to add.)
