d3rr (4 children)

yeah wow, you guys are seriously organized.

pitterpatterwater[S] (3 children)

/u/snallygaster deserves the credit; I only became an approved submitter fairly recently. He's the one who maintains the list and posted most of the linked stuff.

Anyway, I'm thinking a Python script would be sufficient. The problem is that it's nearly 300 posts; I need a method that won't use up my bandwidth downloading them, so I need to get familiar with website scraping and the Reddit API.

d3rr (2 children)

In the Python world I recommend Beautiful Soup for scraping, and I'd put a delay in there or they will block your IP. Sounds like a fun project. I'd help, but I'm overwhelmed with this site already.
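
For anyone following along, here is a rough sketch of what the Beautiful Soup side could look like. It assumes the `beautifulsoup4` package is installed, and `extract_links` is just a name made up for illustration; it only pulls link text and targets out of HTML you have already downloaded, so the actual selectors for a Reddit page would still need working out.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href.

    `extract_links` is a hypothetical helper, not part of Beautiful Soup.
    """
    # "html.parser" is the parser bundled with Python, so no extra
    # dependency beyond beautifulsoup4 itself is needed.
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)  # skip anchors without href
    ]
```

Feeding it the HTML of a saved page gives back a plain list you can write to a file or filter further.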

pitterpatterwater[S] (1 child)

I'm a bit busy myself; I'll post it here once I'm done with it. It can probably be generalised into a Reddit archival tool. Do you know what the delay should be?

d3rr (0 children)

If you are scraping at your leisure, I'd set it high, like a random 30 seconds to 2 minutes between requests.

Yeah man, throw it up on GitHub; it could prove very useful to a lot of people.
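
A minimal sketch of the random 30 second to 2 minute delay, using only the standard library. The names `polite_fetch_all`, `MIN_DELAY`, and `MAX_DELAY` are made up for illustration, and `fetch` is whatever download function you end up writing; it is passed in so the pacing logic can be tried out without touching the network.

```python
import random
import time

# Assumed bounds, taken from the suggestion above: 30 seconds to 2 minutes.
MIN_DELAY = 30.0
MAX_DELAY = 120.0


def polite_fetch_all(urls, fetch, sleep=time.sleep, rng=random.uniform):
    """Call fetch(url) for each URL, pausing a random interval in between.

    `sleep` and `rng` are injectable so the loop can be tested without
    actually waiting.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Wait before every request except the first one.
            sleep(rng(MIN_DELAY, MAX_DELAY))
        results.append(fetch(url))
    return results
```

With these bounds the average pause is about 75 seconds, so nearly 300 posts would take roughly six hours, which is fine for something left running overnight.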