PoshCode Archive  scrape-script

scrape script

  • ls2url.php first generates a URL list from the IA search API, so that only the interesting content pages are retrieved.
  • ex.php, the extraction script, then converts src/* to target/* and populates an open fossil repo as it goes.
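
For readers who want a feel for the first step without reading ls2url.php, here is a rough Python sketch of turning an archive listing into a download list. It is not the original PHP and makes several assumptions: that "IA search API" can be stood in for by the Wayback Machine CDX endpoint, that "interesting" pages are the ones with purely numeric paths (such as poshcode.org/1234), and that the raw capture is wanted (the `id_` URL modifier). The names `build_query`, `rows_to_urls` and `CONTENT_PAGE` are made up for illustration.

```python
import re
from urllib.parse import urlencode

# Assumption: the Wayback CDX endpoint stands in for the "IA search API";
# the original ls2url.php may well query something else.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"

# Hypothetical definition of an "interesting content page": a numeric path,
# optionally with an explicit port as CDX listings sometimes record it.
CONTENT_PAGE = re.compile(r"^https?://(?:www\.)?poshcode\.org(?::\d+)?/\d+$")


def build_query(site="poshcode.org"):
    """Return a CDX query URL listing successful captures below `site`."""
    params = {
        "url": site + "/*",          # prefix match on everything under the site
        "output": "json",            # JSON array of rows, first row is the header
        "filter": "statuscode:200",  # skip redirects and error pages
        "collapse": "urlkey",        # one row per distinct URL
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def rows_to_urls(rows):
    """Convert decoded CDX JSON rows into raw-capture Wayback URLs."""
    if not rows:
        return []
    header, *records = rows
    ts = header.index("timestamp")
    orig = header.index("original")
    urls = []
    for rec in records:
        if CONTENT_PAGE.match(rec[orig]):
            # "id_" asks the Wayback Machine for the unmodified capture.
            urls.append("http://web.archive.org/web/%sid_/%s" % (rec[ts], rec[orig]))
    return urls
```

The response of `build_query()` would be fetched once, decoded with `json.loads`, and passed to `rows_to_urls`. The resulting list can then be written one URL per line for wget to consume.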

Now, I wouldn't recommend doing this again. It takes hours at least. (Mostly due to the wget delay, of course.)
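
To make the cost of that delay concrete, here is a small Python sketch of a throttled fetch loop plus a back-of-the-envelope runtime estimate. It only illustrates the idea. The original run used wget, and the function names, the default delay of two seconds and the injectable `fetch`/`sleep` parameters are all assumptions made for the example.

```python
import time


def fetch_all(urls, fetch, delay=2.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing `delay` seconds between requests.

    `fetch` is any callable taking a URL and returning its content;
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    pages = {}
    for i, url in enumerate(urls):
        if i:  # no pause before the very first request
            sleep(delay)
        pages[url] = fetch(url)
    return pages


def estimated_hours(n_urls, delay, per_request=0.0):
    """Rough wall-clock estimate: pauses between requests plus transfer time."""
    pauses = max(n_urls - 1, 0) * delay
    return (pauses + n_urls * per_request) / 3600.0
```

With a real downloader, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read()`. The estimate grows linearly with the number of URLs, which is why a polite delay dominates the total runtime for a large archive.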

If you want to set up your own instance, either download the /tarball or /zip, or better yet, clone the repo to get all revisions:

 fossil clone http://fossil.include-once.org/poshcode/ mypc.fsl
 fossil open mypc.fsl
 fossil ui

(I realize some people might be upset about the embedded metadata comments. But those are clearly easier to remove than to add.)


Attachments:

  • finit, added by mario on 2018-07-12 13:56:45
  • block, added by mario on 2018-07-12 13:56:34
  • ex.php, added by mario on 2018-07-12 13:56:25
  • ls2url.php, added by mario on 2018-07-12 13:56:00