[–][deleted] 2 insightful - 1 fun2 insightful - 0 fun3 insightful - 1 fun -  (4 children)

Cool. I have no idea how to do it either but I wanted to see if it's worthwhile before spending time on it. I'll dig around for a workable approach.

[–]useless_aether[S] 2 insightful - 1 fun2 insightful - 0 fun3 insightful - 1 fun -  (3 children)

idea: maybe ask a mod there, like axolotl_peyotl. he seems to be into this kind of stuff (archiving)

edit: well, i just asked the sub directly.

https://old.reddit.com/r/conspiracy/comments/9fhf90/what_if_rconspiracy_gets_banhammered_we_should/

[–][deleted] 2 insightful - 1 fun2 insightful - 0 fun3 insightful - 1 fun -  (2 children)

Nice! It looks like you didn't get much of a response about the backup part. I know I've seen a few github projects that were about "archiving reddit", so let me see what I can find.

edit:

https://www.reddit.com/r/DataHoarder/comments/8638o2/anyway_to_backup_an_entire_subreddit/

[–]useless_aether[S] 2 insightful - 1 fun2 insightful - 0 fun3 insightful - 1 fun -  (1 child)

literally not a single person! XD

i looked at this thread and tried a few things, not too vigorously, and can't see an easy way to do it atm, but will try again later

[–][deleted] 1 insightful - 1 fun1 insightful - 0 fun2 insightful - 1 fun -  (0 children)

It looks like pushshift has all reddit data and even an API to get at it. I guess it's aimed at academic researchers (or Fed bot writers), and it sounds kind of stable since it has been around for a while. edit: but they are also having funding problems. I don't know how this project exists; it seems like reddit copyright infringement. It could go away someday, but I imagine there are some backups floating around, or there would be if they folded.

https://files.pushshift.io/reddit/submissions/

https://www.reddit.com/r/pushshift/

So the data isn't really usable or browseable right now. I could see writing a little script that gets API data and spits out html pages for each post, and then it could make an index page with titles and links off to the posts. Then it could be hosted and accessed on github like a normal website. So it would be like an offline friendly reddit mirror. But it all kind of falls apart at wanting to search for posts in this archive viewer. If you get into needing a server to use the damn thing it's a huge burden.
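Here's a rough sketch of that script idea, just the "spit out html pages plus an index" half. It assumes the posts have already been pulled down as a list of dicts with `id`, `title`, `selftext` and `author` keys (my guess at the useful fields, not something I've checked against the API), and `build_archive` is a made-up name. Fetching from the API is left out on purpose.

```python
import html
from pathlib import Path


def build_archive(posts, out_dir):
    """Write one HTML page per post plus an index.html linking to them.

    `posts` is assumed to be a list of dicts with an "id" key and optional
    "title", "selftext" and "author" keys (hypothetical field choice).
    Returns the number of pages written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    for post in posts:
        # Assumes ids are plain alphanumeric strings, so they are safe as file names.
        post_id = str(post["id"])
        # Escape everything that came from users so it cannot inject markup.
        title = html.escape(post.get("title") or "(untitled)")
        body = html.escape(post.get("selftext") or "")
        author = html.escape(post.get("author") or "[deleted]")
        page = (
            "<!DOCTYPE html>\n"
            f"<html><head><meta charset=\"utf-8\"><title>{title}</title></head>\n"
            f"<body><h1>{title}</h1>\n"
            f"<p>posted by {author}</p>\n"
            f"<pre>{body}</pre>\n"
            "<p><a href=\"index.html\">back to index</a></p>\n"
            "</body></html>\n"
        )
        (out / f"{post_id}.html").write_text(page, encoding="utf-8")
        links.append(f'<li><a href="{post_id}.html">{title}</a></li>')
    index = (
        "<!DOCTYPE html>\n"
        "<html><head><meta charset=\"utf-8\"><title>Archive</title></head>\n"
        "<body><h1>Archive</h1>\n<ul>\n"
        + "\n".join(links)
        + "\n</ul>\n</body></html>\n"
    )
    (out / "index.html").write_text(index, encoding="utf-8")
    return len(links)
```

The output is just flat files, so it could sit on any static host or be opened straight from disk. It does nothing about the search problem.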

Maybe another idea would be to just download all of the pushshift data for /r/conspiracy only, maybe a couple of others, and make torrents out of them. It would be less work, especially if no one really cares yet.
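For the "one subreddit only" part, something like this might be enough. It's a sketch that assumes the dump files are one JSON object per line with a `subreddit` field, which I haven't verified for every file. `filter_subreddit` is a made-up name.

```python
import json


def filter_subreddit(lines, subreddit):
    """Yield the records from `lines` that belong to `subreddit`.

    `lines` is any iterable of text lines, each assumed to hold one JSON
    object with a "subreddit" key. Blank or unparseable lines are skipped.
    """
    wanted = subreddit.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or corrupt line
        if not isinstance(record, dict):
            continue
        if str(record.get("subreddit", "")).lower() == wanted:
            yield record
```

Since it only needs an iterable of lines, it can read a compressed file without unpacking it first, e.g. by passing in `bz2.open(path, "rt")` or `lzma.open(path, "rt")` from the standard library. Other compression formats would need a third-party package.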