How to backup your favorite banned Reddit content
submitted 2 years ago * by [deleted] from self.MeanwhileOnReddit
Most Reddit data is inexplicably available through this third-party API: https://github.com/pushshift/api
There are pre-written tools to grab content from the Pushshift API such as this one, written by yours truly: https://github.com/libertysoft3/reddit-html-archiver
To see if the content you want is available, construct a URL like the following, where we are checking for /r/darkhumourandmemes posts: https://api.pushshift.io/reddit/search/submission/?subreddit=darkhumourandmemes
In the future it may be possible to import backed up/archived content to sites like SaidIt.
[–]Aureus 10 insightful - 3 fun - 2 years ago (0 children)
Thanks so much! This is really important and a good guide.
[–]PuttItBack 6 insightful - 5 fun - 2 years ago (0 children)
Wish I'd seen this last night when I was backing things up before the ban lol.
[–]holy_goat 6 insightful - 2 fun - 2 years ago (0 children)
this could be useful for moving content over to this and/or ruqqus. keep a good backlog so more people will join.
[–]stupidmechanic 4 insightful - 3 fun - 2 years ago* (5 children)
fetch_links.py fails to run; says there's a syntax error in psaw/PushshiftAPI.py line 251
Traceback (most recent call last):
File "./fetch_links.py", line 10, in <module>
from psaw import PushshiftAPI
File "/usr/lib/python2.7/dist-packages/psaw/__init__.py", line 7, in <module>
from .PushshiftAPI import PushshiftAPI, PushshiftAPIMinimal
File "/usr/lib/python2.7/dist-packages/psaw/PushshiftAPI.py", line 251
SyntaxError: 'return' with argument inside generator
Anyone getting the same problem?
[–][deleted] 4 insightful - 3 fun - 2 years ago (3 children)
or how about running
pip install -U pip
pip install psaw --upgrade
it looks like your line 251 is out of date? https://github.com/dmarx/psaw/blob/master/psaw/PushshiftAPI.py#L251
edit: I think that's it, you have psaw 0.0.10 from 3/10, not psaw 0.0.12 from 3/18. Hopefully they fixed this bug: https://github.com/dmarx/psaw/blob/f8609bcc9dc945c2c1b0e732ac8dba25f19545fc/psaw/PushshiftAPI.py#L251
[–]stupidmechanic 4 insightful - 3 fun - 2 years ago (2 children)
In reply to
"hmmm I haven't seen this one before, and it's not an issue with the archiver code.
output? maybe try running it with python3 like
God, I'm an idiot!
made this error go away. Now going ahead. Need to set python3 as default interpreter instead of python2.7. Besides, I had to manually copy-paste the *.py files into my compilers, maybe that has something to do with this. Thanks man.
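For anyone hitting the same traceback: "'return' with argument inside generator" is the message Python 2.7 gives for a generator function that returns a value. Python 3 accepts that construct, which fits with the switch to python3 making the error disappear. A minimal sketch of the construct follows. The function names `numbered` and `drain` are made up for illustration, and this is not the psaw code itself.

```python
def numbered(items):
    """Yield (index, item) pairs, then return how many pairs were yielded."""
    count = 0
    for item in items:
        yield count, item
        count += 1
    # Valid in Python 3: the value is attached to StopIteration.
    # Under Python 2.7 this line is a SyntaxError at import time.
    return count


def drain(gen):
    """Exhaust a generator and return (yielded_values, return_value)."""
    seen = []
    while True:
        try:
            seen.append(next(gen))
        except StopIteration as stop:
            return seen, stop.value


if __name__ == "__main__":
    print(drain(numbered("ab")))
```

Because the failure happens when the module is parsed, merely importing such a file under python2.7 is enough to trigger it, before any archiver code runs.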
[–][deleted] 5 insightful - 2 fun - 2 years ago (1 child)
cool, i was afraid it was bad advice so i deleted. im not too good at python stuff. I will add this advice to this other troubleshooting advice: https://github.com/libertysoft3/reddit-html-archiver/issues/18
[–]stupidmechanic 4 insightful - 2 fun - 2 years ago (0 children)
Yes indeed, it seems to be a bad psaw installation. 0.0.10
[–]1nvar 4 insightful - 2 fun - 2 years ago (0 children)
Pushshift has a mostly complete copy of the last 5 years of reddit, it's how https://redditsearch.io/ works! Thanks, archivists :-)
[–]sosorreal 3 insightful - 3 fun - 2 years ago (2 children)
is there a way to get our Saved content from subs that are now banned?
[–][deleted] 2 insightful - 2 fun - 2 years ago (1 child)
The tool I linked doesn't interact with reddit directly, so not with that. But using the reddit API you may be able to still get ids of your saved content, and then the tool I linked could download it. So nothing easy exists.
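A rough sketch of the "get the ids first" half of that idea. It assumes you have already fetched one page of your saved listing as JSON from the Reddit API (authentication and paging are left out), and that the page uses the usual listing layout: `data.children`, each child with a `kind` of `t3` for posts or `t1` for comments. The function name `split_saved_ids` is hypothetical, and whether saved items from banned subs are still returned is untested.

```python
def split_saved_ids(listing):
    """Split one saved-listing page into (post_ids, comment_ids)."""
    post_ids, comment_ids = [], []
    for child in listing.get("data", {}).get("children", []):
        kind = child.get("kind")
        item_id = child.get("data", {}).get("id")
        if item_id is None:
            continue  # skip anything without an id
        if kind == "t3":      # t3 = submission
            post_ids.append(item_id)
        elif kind == "t1":    # t1 = comment
            comment_ids.append(item_id)
    return post_ids, comment_ids


if __name__ == "__main__":
    # Made-up sample page, only to show the expected shape.
    sample = {
        "kind": "Listing",
        "data": {"children": [
            {"kind": "t3", "data": {"id": "abc123"}},
            {"kind": "t1", "data": {"id": "def456"}},
        ]},
    }
    print(split_saved_ids(sample))
```

The ids collected this way would then have to be looked up in Pushshift one way or another, which is the part nothing does for you automatically yet.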
[–]sosorreal 4 insightful - 2 fun - 2 years ago (0 children)
hmm okay, thank you for the information
[–]JasonCarswell 3 insightful - 3 fun - 2 years ago (3 children)
I thought your brain was sexy before, but now it's exponentially prodigious.
Most of it is Greek to me, but I have some questions:
Is it possible to reupload old content into SaidIt/NotABug/etc that uses its own old datestamp rather than being "new"? If this were possible, that would be terrific for those who've backed up yet feel like they've lost their communities. If it's not possible then reposting it all anew could be painful for the rest of us unless there were limits. Which brings me to...
Is it possible to make a bot to auto reup or repost an archive? And is it possible for that bot to skip over /s/all and/or use the old datestamp? And is this bot-post-frequency a conversation worth having in /s/SaidItBots regarding the frequency of posts for not only stuff like this but in general and/or specifically about other topics/issues. Of course it's also simply easy to leave it until it becomes an issue - if ever.
Regarding the archive votes, IMO, that's interesting information that's potentially worth saving. As you import the old posts as reups or reposts I'm guessing the SaidIt votes might start from zero. It might be nice for that import-bot to make a note of the votes in the comments - or if you want to get fancy make a new "untouchable" RAI (Reddit Archive Import) vote section that displays the frozen score with a tally that can only be adjusted if more information comes from further import information (ie. if they are importing from a not-yet-banned-sub and/or a more recent archive).
Is there any way to validate the archives and importing? I wouldn't expect forgeries or tampering right away, but I suspect that eventually there could be meddling if there was a motive and a way. Sad but true.
Federating SaidIt is the first big step in what I believe is the most important goal - decentralization. In my limited non-tech savvy opinion, it seems to me that sharing backup(s) of the archive(s) (via torrent?) is the next most important step. I don't know how related this kind of large-scale archiving is comparative to isolated sub-archiving and/or if they can be imported in a similar manner - but it sure would be nice for anyone interested to be able to. I have no interest in creating my own server and instance unless it helps in my small way as backup, but I am very interested in helping to perpetuate archive backup torrents because I know that can help. (People are still asking me to share my old Tigole aggregations and anticipating my new ones that I haven't made in about a year.)
If relevant - maybe repost on /s/DecentralizeAllThings and/or add to that wiki?
There is no 7
[–][deleted] 3 insightful - 2 fun - 2 years ago (2 children)
Well hello to you too good looking. Compliments get replies!
[–]JasonCarswell 2 insightful - 3 fun - 2 years ago* (1 child)
"Good looking"? Maybe once, a dozen years ago. Today I took the first selfie in a dozen years or more, [for reasons] perhaps as a before if there's an after. It's on a camera I have to plug in to see if any are actually good enough to share - so "good looking" remains to be seen - or not.
I didn't expect all answered, nor so quickly, but am glad you did, thanks. Regarding several of them, if demand calls for it, it might be worth supervising any bot development to be sure it's done properly and including the dates, votes, and whatever other metadata might be worth keeping as well as to be sure it doesn't jam up your system. I'm not a Redditor but I know there's some value there, and it seems some folks migrating are keen on it.
4) It occurred to me that there could be some fun creative reasons to forge a thread, though you wouldn't need to go so far as to import it. Specifically, 2 primo examples come to mind - the brilliant epic https://en.wikipedia.org/wiki/Les_Liaisons_dangereuses and clever and beautiful https://en.wikipedia.org/wiki/Griffin_and_Sabine both written as epistolary novels (correspondence letters between people explaining events and thus laying out the story for the reader).
5) I can easily understand why you may have been distracted by everything since the dawn of SaidIt, but IMO, to weaken the target on your backs and to strengthen the future of Saidit (and freedom for humanity) it seems like this one should be kicked up to be among the top priorities - especially since the 2020 gear shift. Who knows what other nonsense bullshit they may be planning. Things are bad but it seems certain they are going to get worse and stay worse forever unless they are outed with something decentralized they can't defeat. I hate to say it but if you or M7 were to be grabbed for some stupid mask infraction or whatever, intentionally or accidentally silencing SaidIt or leaving it rudderless, we'd have no way of knowing or helping or whatever. (I hope you guys have some backup plans, deadman switches, virtual emergency flares, etc.)
6) That is fun. If I recall, Lemmy had an epic fail data loss at one point? What's the other instance called? (Today I saw your Reddit alternatives list got censored.) I don't know what "real Reddit federation" means. I thought you had one. Can they merge? This is exciting (I think). I read the federation news a few weeks ago (congrats) and meant to add it to the SaidIt article on IG, but I didn't understand parts of it and wanted to get more feedback/details from you two. Then the DDoS happened, then I got locked out for several days, then my end got fubar and I took all that as a sign to finally quit my SaidIt addiction which I was slowly getting around to (by making more banners and non-discussion stuff). I'm going to try to only come back at the ends and middles of every month until my project(s) are ready to share. I'll make exceptions and return to SaidIt for 3 things: 1) work on CSS and/or banners stuff (with your help when you're not busy) 2) work on the mobile SaidIt thing you'd mentioned (whenever you've got time for it, or can explain it and leave it with me, or whatever) or 3) if you guys are interested I could design a logo for your new federation under your guidance (something for the inevitable online store).
Also, what's new regarding SaidIt the last few weeks? I feel like I missed some things (I don't recall blue checkmarks and some other newish formatting). (I'm good news-wise, and know about the purges on Reddit and YouTube from elsewhere and I've only seen a bit of it here so far. I plan to catch up a bit while I share some stuff over the next few days before I return to my own non-SaidIt-addicted new normal.) PM if you like, or not, or ignore it if there's nothing grand since the DDoS (of which I'm more than a little curious about).
[–][deleted] 1 insightful - 1 fun - 2 years ago (0 children)
Oh, one called Prismo had a data fail. Here are the first non-Lemmy-owned Lemmy instances:
no one talks about lemmy cuz they're commies, but it's good stuff.
Things have been good here, it's exciting to have some new users. There's nearly been some volunteer coders and designers too!
[–]bug-in-recovery 3 insightful - 2 fun - 2 years ago (7 children)
This guy already archived most good subs.
[–]spezz 4 insightful - 2 fun - 2 years ago* (3 children)
based. what happened to the r/MDE archive
edit: well oy vey that was just too much wrongthink because it just got deleted
edit 2: nah he just put the wrong url, still grab it while you can
[–]bug-in-recovery 4 insightful - 2 fun - 2 years ago (1 child)
I fucked up the link, still there:
[–][deleted] 2 insightful - 2 fun - 2 years ago (0 children)
Github will host these archives for free, so they can be easily viewed without downloading. I'm trying to spread the word.
[–][deleted] 2 insightful - 2 fun - 2 years ago* (0 children)
i tested that myself, pushshift does not have much of that data, not sure what happened. edit: im an idiot, i checked mde not milliondollarextreme
[–]holy_goat 4 insightful - 2 fun - 2 years ago (2 children)
archive.org sniped it real quick.... also archive.org is in some potential legal trouble at the moment (literary publishers suing them) so we can't be confident they'll stay around.
[–]spezz 3 insightful - 2 fun - 2 years ago (0 children)
see if the-eye.eu will take them
[–]bug-in-recovery 3 insightful - 2 fun - 2 years ago (0 children)
It's still there, I fucked up the link:
Apparently, their URLs are case sensitive.
[–]calmbluejay 2 insightful - 2 fun - 2 years ago (3 children)
I just get a blank page
[–][deleted] 2 insightful - 2 fun - 2 years ago (2 children)
a blank page for a link like this? it should at least show '' https://api.pushshift.io/reddit/search/submission/?subreddit=darkhumourandmemes
[–]calmbluejay 4 insightful - 2 fun - 2 years ago (1 child)
This is what I get : https://i.imgur.com/lCZTsMG.png
[–][deleted] 3 insightful - 3 fun - 2 years ago (0 children)
Okay, that looks good. Using this link is just a quick way to tell if the content you want is there in Pushshift. So if the sub you are interested in returns anything in that 'data' array, like it did here, you can use the archive tool and it will work.
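If you would rather script that check than eyeball it in a browser, something like the sketch below should do. It builds the same kind of URL used in this thread and treats a non-empty 'data' array as "the archiver has something to work with". The function names are made up for illustration, and it assumes the Pushshift endpoint still answers unauthenticated requests and still returns a top-level 'data' array.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example link in this thread.
PUSHSHIFT_SEARCH = "https://api.pushshift.io/reddit/search/submission/"


def build_check_url(subreddit):
    """Build the quick-check URL for one subreddit."""
    return PUSHSHIFT_SEARCH + "?" + urlencode({"subreddit": subreddit})


def has_archived_posts(payload):
    """True if the decoded response has anything in its 'data' array."""
    return bool(payload.get("data"))


def check_subreddit(subreddit):
    """Fetch the check URL and report whether any posts came back."""
    with urlopen(build_check_url(subreddit), timeout=30) as response:
        payload = json.load(response)
    return has_archived_posts(payload)


if __name__ == "__main__":
    print(build_check_url("darkhumourandmemes"))
```

Calling `check_subreddit("darkhumourandmemes")` does the actual network request, so it is kept out of the main block; an empty 'data' array or an error there means the archive tool will have nothing to download for that sub.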