all 9 comments

[–]d3rr 3 insightful - 1 fun - (1 child)

Nice. Let me know if you want to mod here man.

[–]useless_aether[S] 4 insightful - 1 fun - (0 children)

:-) cool!

[–]d3rr 2 insightful - 1 fun - (6 children)

Hey man, this banout stuff has me worried that /r/conspiracy might get the axe sooner rather than later. Do you think it would be worthwhile to try to back it up? I know there are some great posts on there from the last decade that everyone can learn from.

We could at least get the top 1000 posts from all time, and maybe more with keywords using reddit search. Maybe getting self posts only would help thin it out a bit.

Maybe there's something already out there. I don't know how much historical data ceddit and the like store.

[–]useless_aether[S] 2 insightful - 1 fun - (5 children)

i am game, but have no idea how to go about it. also, i won't be near a computer for a few hours today.

[–]d3rr 2 insightful - 1 fun - (4 children)

Cool. I have no idea how to do it either but I wanted to see if it's worthwhile before spending time on it. I'll dig around for a workable approach.

[–]useless_aether[S] 2 insightful - 1 fun - (3 children)

idea: maybe ask a mod there, like axolotl_peyotl. he seems to be into this kind of stuff (archiving)

edit: well, i just asked the sub directly.

[–]d3rr 2 insightful - 1 fun - (2 children)

Nice! It looks like you didn't get much of a response about the backup part. I know I've seen a few github projects that were about "archiving reddit", so let me see what I can find.


[–]useless_aether[S] 2 insightful - 1 fun - (1 child)

literally not a single person! XD

i looked at this thread and tried a few things, not too vigorously, and can't see an easy way to do it atm, but will try later

[–]d3rr 1 insightful - 1 fun - (0 children)

It looks like pushshift has all reddit data and even an API to get at it. I guess it's aimed at academic researchers (or Fed bot writers), and it sounds kind of stable since it has been around for a while. Edit: but they are also having funding problems. I don't know how this project exists; it seems like reddit copyright infringement. It could go away someday, but I imagine there are some backups floating around, or there would be if they folded.

So the data isn't really usable or browsable right now. I could see writing a little script that gets API data and spits out HTML pages for each post, and then it could make an index page with titles and links off to the posts. Then it could be hosted and accessed on github like a normal website. So it would be like an offline-friendly reddit mirror. But it all kind of falls apart at wanting to search for posts in this archive viewer. If you get into needing a server to use the damn thing, it's a huge burden.
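
A rough sketch of the "little script" idea, just to show the shape of it. It assumes the posts have already been fetched somehow and are plain dicts with `id`, `title`, `author` and `selftext` keys (those names are my assumption about what the API data looks like, not something I've verified), and the function names `render_post` and `build_site` are made up for illustration. It does no downloading and no search, only the static pages plus an index:

```python
import html
from pathlib import Path


def render_post(post):
    """Return a minimal standalone HTML page for one post dict."""
    # Escape everything that came from the archive so it cannot inject markup.
    title = html.escape(post.get("title", "(untitled)"))
    author = html.escape(post.get("author", "[unknown]"))
    body = html.escape(post.get("selftext", ""))
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        f"<body><h1>{title}</h1><p>by {author}</p>\n"
        f"<pre>{body}</pre>\n"
        '<p><a href="index.html">back to index</a></p></body></html>\n'
    )


def build_site(posts, out_dir):
    """Write one page per post plus an index.html that links to each page."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    for post in posts:
        # Assumes post ids are plain alphanumeric strings, safe as file names.
        filename = f"{post['id']}.html"
        (out / filename).write_text(render_post(post), encoding="utf-8")
        title = html.escape(post.get("title", "(untitled)"))
        links.append(f'<li><a href="{filename}">{title}</a></li>')
    index = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Archive index</title></head>\n'
        "<body><h1>Archive index</h1>\n<ul>\n"
        + "\n".join(links)
        + "\n</ul></body></html>\n"
    )
    (out / "index.html").write_text(index, encoding="utf-8")
    return len(links)
```

Something like `build_site(posts, "site")` would leave a folder of plain files that any static host can serve. Search is still the missing piece, same as above.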

Maybe another idea would be to just download all of the pushshift data for /r/conspiracy only, maybe a couple of others, and make torrents out of them. It would be less work, especially if no one really cares yet.
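
For the download-and-torrent route, the main work would be cutting the full data down to one subreddit. Here is a minimal sketch, assuming the data arrives as one JSON object per line with a `subreddit` field (I haven't checked the actual file format, so treat that as a guess). The helper names `filter_subreddit` and `write_ndjson` are hypothetical, and decompressing the files and making the torrent itself are left out:

```python
import json


def filter_subreddit(lines, subreddit):
    """Yield parsed records whose 'subreddit' field matches, ignoring case."""
    wanted = subreddit.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip damaged lines rather than aborting a long run.
            continue
        if not isinstance(record, dict):
            continue
        if str(record.get("subreddit", "")).lower() == wanted:
            yield record


def write_ndjson(records, path):
    """Write records back out one JSON object per line; return the count."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Because `filter_subreddit` takes any iterable of lines, it can be fed an open file handle directly, so a large file never has to fit in memory.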