PoshCode Archive: Artifact Content

Overview

Artifact ID:	3cf84fa5c6b322d8d577643aa44870043227b5c64ef3e457dd0f13e9f0b022ef
Page Name:	scrape-script
Date:	2018-07-12 14:00:14
Original User:	mario
Mimetype:	text/x-markdown
Parent:	c78295372cb0597f10a64f23da25eb9deeb04fb9828b732d24595be2cd250691
Next:	5a9298e9633e287321907729c53589675e9e3f19eb0ca240378a8a2cb4305d77

Content

This first generates a URL list from the IA search API, so that only the interesting content pages get retrieved.
The extraction script then converts src/* to target/* and populates an open fossil repo right away.

Now, I wouldn't recommend doing this again. It takes hours at least. (Mostly due to the wget delay, of course.)
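
To put a rough number on that, here is a minimal sketch rather than the actual scrape script. The file name `urls.txt`, the page count, and the 2-second delay are all made-up assumptions for illustration; only the `wget` options (`--wait`, `--input-file`, `--directory-prefix`) are standard.

```shell
#!/bin/sh
# Hypothetical inputs: urls.txt (one URL per line) and a src/ download directory.

# Rough lower bound on run time in seconds: wget pauses DELAY seconds
# between retrievals, so COUNT pages cost about COUNT * DELAY seconds
# before any transfer time is added.
min_fetch_seconds() {
    count=$1
    delay=$2
    echo $(( count * delay ))
}

# Fetch every URL from the list into src/, waiting between requests.
# Defined here for illustration only; it is not called below.
fetch_all() {
    list=$1
    delay=$2
    wget --wait="$delay" --input-file="$list" --directory-prefix=src/
}

# Assumed example: 10000 pages with a 2-second wait.
min_fetch_seconds 10000 2
```

With those assumed figures the wait alone comes to 20000 seconds, which is more than five hours.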

If you want to set up your own instance, either download the /tarball or just clone the repo:

 fossil clone http://fossil.include-once.org/poshcode/ mypc.fsl
 fossil open mypc.fsl
 fossil ui

(I realize some people might be upset because of the embedded meta data comments. But those are clearly easier to remove than to add.)

PoshCode Archive Update of "scrape-script"