[–]zyxzevn (1 child)

Great idea.
I have an HTML reader, but it is written in Lazarus.
There are likely many such libraries in Python.
The text can be extracted if the elements holding it are marked with something like class="Text".
In what format do you want to store it?
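
For illustration, here is a minimal sketch of that class-based extraction using only Python's standard-library `html.parser`. The class name `Text` is just the example from above, and `ClassTextExtractor` and `extract_text` are hypothetical names, not part of any existing tool. It assumes reasonably well-formed HTML; pages with unclosed tags would probably need a more forgiving parser.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class="Text"):
        super().__init__(convert_charrefs=True)
        self.target_class = target_class  # assumption: the class that marks the text
        self.depth = 0      # > 0 while inside a matching element
        self.chunks = []    # text pieces of the element currently being read
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            # Nested tag inside a matching element: just track the nesting.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # Matching element closed: join its text and normalise whitespace.
            self.results.append(" ".join("".join(self.chunks).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_text(html, target_class="Text"):
    """Return the text of each element carrying target_class, in document order."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

Calling `extract_text(page_html)` on a page's HTML string returns a list of strings, one per matching element, which could then be stored in whatever format is chosen.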

[–]JasonCarswell (0 children)

I don't know what this means, which formats are preferable, or why. I just want the maximum data at the highest resolution so that nothing is left behind when folks want it in the future.

Apparently there are open-source archival tools that can scrape or snapshot web pages. That doesn't seem too difficult, but what do I know. The tricky part (to me) is having it automatically add entries to the meta-table-lists on wikis and post on SaidIt (or other forums like Corbett Report, etc.) that the page has been archived, after actually archiving it all and sharing it on IPFS.
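
As a rough sketch of the "nothing left behind" part only, the snippet below keeps the raw page bytes untouched and appends one record per snapshot to a running list. Everything here is an assumption for illustration: the function name `archive_snapshot`, the `manifest.jsonl` file, and the record fields are hypothetical, and fetching the page, adding it to IPFS, updating a wiki table, and posting to SaidIt are not covered.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def archive_snapshot(url, html_bytes, archive_dir):
    """Store raw page bytes unchanged and append a record to a manifest file."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Name the file after the SHA-256 of its content, so identical pages
    # map to the same file and any later corruption is detectable.
    digest = hashlib.sha256(html_bytes).hexdigest()
    snapshot_path = archive_dir / f"{digest}.html"
    snapshot_path.write_bytes(html_bytes)

    record = {
        "url": url,
        "sha256": digest,
        "bytes": len(html_bytes),
        "archived_at": datetime.now(timezone.utc).isoformat(),
        "file": snapshot_path.name,
    }

    # One JSON object per line: easy to append to, and easy to turn into
    # a table of archived pages later.
    with open(archive_dir / "manifest.jsonl", "a", encoding="utf-8") as manifest:
        manifest.write(json.dumps(record) + "\n")
    return record
```

Keeping the original bytes alongside a plain-text manifest means any derived format (extracted text, a wiki table row, a forum post) can be regenerated later without going back to the live page.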