It looks like Pushshift has an archive of all Reddit data, and even an API to get at it. I guess it's aimed at academic researchers (or Fed bot writers), and it sounds fairly stable, having been around for a while. Edit: but they are also having funding problems. I don't know how this project exists; it seems like it would infringe Reddit's copyright. It could go away someday, but I imagine there are some backups floating around, or there would be if they folded.

https://files.pushshift.io/reddit/submissions/

https://www.reddit.com/r/pushshift/

So the data isn't really usable or browsable right now. I could see writing a little script that pulls data from the API and spits out an HTML page for each post, plus an index page with titles linking to the posts. Then it could be hosted on GitHub and accessed like a normal website, so it would be an offline-friendly Reddit mirror. But it all kind of falls apart once you want to search for posts in this archive viewer. If you end up needing a server just to use the damn thing, it's a huge burden.
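A rough sketch of that script, in Python. It assumes the submissions have already been fetched (from the API or a dump) into dicts that use Reddit's usual submission field names (`id`, `title`, `author`, `selftext`, `url`, `created_utc`). The function names and page layout are made up for illustration. It only produces static files, so no server is needed, but there's no search either.

```python
"""Hypothetical static 'offline reddit mirror' generator (illustrative only)."""
import html
import os
from datetime import datetime, timezone


def render_post(post: dict) -> str:
    """Render one submission dict as a standalone HTML page."""
    title = html.escape(str(post.get("title", "(untitled)")))
    author = html.escape(str(post.get("author", "[unknown]")))
    body = html.escape(str(post.get("selftext", "")))
    url = str(post.get("url", ""))
    # created_utc may arrive as int, float, or string depending on the source
    created = datetime.fromtimestamp(int(float(post.get("created_utc", 0))), tz=timezone.utc)
    parts = [
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">',
        f"<title>{title}</title></head><body>\n",
        f"<h1>{title}</h1>\n",
        f"<p>by {author} on {created:%Y-%m-%d %H:%M} UTC</p>\n",
    ]
    if url:  # link posts point somewhere; self posts may not
        safe_url = html.escape(url, quote=True)
        parts.append(f'<p><a href="{safe_url}">{safe_url}</a></p>\n')
    parts.append(f"<pre>{body}</pre>\n")
    parts.append('<p><a href="index.html">back to index</a></p>\n</body></html>\n')
    return "".join(parts)


def render_index(posts: list[dict]) -> str:
    """Render an index page listing every post, newest first."""
    newest_first = sorted(posts, key=lambda p: float(p.get("created_utc", 0)), reverse=True)
    items = "".join(
        f'<li><a href="{html.escape(str(p["id"]), quote=True)}.html">'
        f'{html.escape(str(p.get("title", "(untitled)")))}</a></li>\n'
        for p in newest_first
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8"><title>Archive</title>'
        f"</head><body>\n<h1>Archive</h1>\n<ul>\n{items}</ul>\n</body></html>\n"
    )


def build_site(posts: list[dict]) -> dict[str, str]:
    """Map output filenames to HTML: one page per post plus index.html."""
    pages = {f"{p['id']}.html": render_post(p) for p in posts}
    pages["index.html"] = render_index(posts)
    return pages


def write_site(pages: dict[str, str], out_dir: str) -> None:
    """Write the generated pages into out_dir (created if missing)."""
    os.makedirs(out_dir, exist_ok=True)
    for name, content in pages.items():
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.write(content)
```

Something like `write_site(build_site(posts), "site")` would produce a folder of plain HTML that any static host can serve. Keeping generation separate from writing makes it easy to check the output before pushing it anywhere.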

Maybe another idea would be to just download all of the Pushshift data for /r/conspiracy only, maybe a couple of other subreddits too, and make torrents out of them. It would be less work, especially since no one really cares, yet.
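Pulling a single subreddit out of a full dump could look roughly like this Python sketch. It assumes the dump has already been decompressed to newline-delimited JSON (one submission per line, each with a `subreddit` field). The function names are hypothetical, and the matching is case-insensitive, since subreddit names on Reddit are.

```python
"""Hypothetical filter: keep only submissions from chosen subreddits."""
import json
from typing import Iterable, Iterator, TextIO


def filter_subreddits(lines: Iterable[str], wanted: set[str]) -> Iterator[dict]:
    """Yield parsed records whose 'subreddit' field is in `wanted` (case-insensitive)."""
    wanted_lower = {s.lower() for s in wanted}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip corrupt or truncated lines instead of aborting a long run
        if not isinstance(record, dict):
            continue
        if str(record.get("subreddit", "")).lower() in wanted_lower:
            yield record


def write_ndjson(records: Iterable[dict], out: TextIO) -> int:
    """Write records back out as newline-delimited JSON; return how many were written."""
    count = 0
    for record in records:
        out.write(json.dumps(record) + "\n")
        count += 1
    return count
```

Because it streams line by line, memory use stays flat even on huge monthly files. For example, `write_ndjson(filter_subreddits(src, {"conspiracy"}), dst)` with `src`/`dst` being open text files (paths of your choosing) gives a much smaller file that could then be packaged as a torrent.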