[–]LarrySwinger2 3 insightful - 2 fun -  (5 children)

Note that youtube-dl can scrape entire playlists and channels. Be sure to archive as much as possible while you can. We have significant storage on the Cassandra server, so we can make them available there.
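
For example, a minimal Python sketch using youtube-dl's own API (the playlist URL and output template here are placeholders - adjust to taste):

    import youtube_dl  # pip install youtube-dl

    opts = {
        "format": "bestvideo+bestaudio/best",  # grab the highest quality available
        "outtmpl": "%(uploader)s/%(title)s-%(id)s.%(ext)s",  # one folder per channel
        "ignoreerrors": True,  # skip deleted/private videos instead of aborting
        "download_archive": "archive.txt",  # remember what's done; reruns only fetch new items
    }

    with youtube_dl.YoutubeDL(opts) as ydl:
        # A playlist or channel URL expands to every video in it.
        ydl.download(["https://www.youtube.com/playlist?list=PLxxxxxxxx"])

The download_archive file is what makes it practical to re-run the same script on a schedule and only pull new uploads.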

[–]JasonCarswell 2 insightful - 2 fun -  (4 children)

Actually we don't have "significant" storage, but we do have room for a fair amount: a handful of terabytes, with the ability to get more.

Separately, I've already archived ten 4 TB drives of YouTube content alone, on many topics, including lots of conspolitics - but those drives are not online; they need boxes to go in, a UPS, maybe PeerTube, etc. (I have other drives with other material - books, music, TV, movies, documentaries, etc. - that isn't really suitable for sharing due to copyright tyranny. I doubt my claim would hold up that I'm not so much sharing them as using them for fair-use sampling.)

If you could set up some kind of archival system on Cassandra, or on the sister servers I hope to get, along with a tutorial for dummies, we could all queue stuff up (in moderation) and so on.

/u/zyxzevn and /u/Robin had a great idea here:
/s/CorbettCommenters/comments/6hck/solution_wiki_for_corbett_report_and_others/

It would be nice to have bots that could go through all of SaidIt, CorbettReport, etc., scrape the pages and/or download all the links to an archive, while also building a wiki table list that can be sorted by topic/sub/hashtag, article/media date, shared date, article/media source, shared source, etc. That wiki table list could be mirrored on WikiSpooks, InfoGalactic, GiraffeIdeas.wiki, etc. The archived media could be shared via IPFS so it doesn't clog up the wikis unnecessarily. Of course, the bot(s) would leave a comment behind saying the data has been backed up, with links to the wiki lists, the IPFS info, and of course the archives.
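
As a very rough sketch of the scraping half, in stdlib Python (the URL is just an example; the wiki-table and IPFS steps would bolt on afterwards):

    import urllib.request
    from html.parser import HTMLParser

    class LinkCollector(HTMLParser):
        """Collects every outbound href on a page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value and value.startswith("http"):
                        self.links.append(value)

    url = "https://saidit.net/s/CorbettCommenters/"  # any sub or comment page
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")

    # Snapshot the raw page, then harvest the outbound links for archiving.
    with open("snapshot.html", "w", encoding="utf-8") as f:
        f.write(html)

    collector = LinkCollector()
    collector.feed(html)
    for link in sorted(set(collector.links)):
        print(link)  # feed these to youtube-dl, wget, IPFS, etc.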

[–]zyxzevn 2 insightful - 2 fun -  (1 child)

Great idea.
I have an HTML reader, but it is in Lazarus.
There are likely many such libraries in Python.
The text can be extracted if it has a class="Text" attribute on it, or something similar.
In what format do you want to store it?
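
JSON is one obvious candidate; a quick sketch, assuming BeautifulSoup and that class name (both are guesses):

    import json
    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    with open("page.html", encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")

    # Pull the text out of every element carrying class="Text".
    records = [{"text": el.get_text(strip=True)}
               for el in soup.find_all(class_="Text")]

    with open("page.json", "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)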

[–]JasonCarswell 1 insightful - 2 fun -  (0 children)

I don't know what this means, or which formats are preferable, or why. I just want the maximum data at the highest resolution, so that nothing is left behind when folks want it in the future.

Apparently there are open-source archival tools that can web-scrape/snapshot pages. That doesn't seem too difficult, but what do I know. The tricky part (to me) is having it automatically add entries to the meta-table-lists on the wikis, and then post on SaidIt (or other forums, like the Corbett Report) that the content has been archived - after actually archiving it all and sharing it on IPFS.
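
For illustration, the wiki end might be as simple as having the bot append rows to a sortable MediaWiki table - a sketch in Python, with every field name invented:

    # Sketch: turn one archived item into a row for a sortable MediaWiki table.
    # The table itself would open with: {| class="wikitable sortable"
    def wiki_row(topic, title, source_url, archive_date, ipfs_cid):
        return (f"|-\n| {topic} || {title} || {source_url} "
                f"|| {archive_date} || ipfs://{ipfs_cid}\n")

    print(wiki_row("conspolitics", "Example Talk",
                   "https://example.com/video", "2021-05-01", "QmExampleCID"))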

[–]oakenwheels 2 insightful - 2 fun -  (1 child)

How do you handle those HDDs? Are they external or internal drives? Are they plugged into a PC? If not, how do you keep them safe from damage? Also, don't HDDs deteriorate over time? Do you get new HDDs every few years and rewrite the data, or not? Do you keep only one copy of each YouTube video, with no backups?

[–]JasonCarswell 1 insightful - 2 fun -  (0 children)

They are all internal drives, sitting in their anti-mag bags, away from magnetic vaccinated people.

I wish I were rich. I'd buy and set up a bunch of boxes/servers with FreeNAS, or the thing d3rr keeps telling me about that's even better.

If I could afford backups I'd get them. I'm doing the best I can.