[–]d3rr

Yeah wow, you guys are seriously organized.

[–][deleted]

/u/snallygaster deserves the credit; I only became an approved submitter fairly recently. He's the one who maintains the list and posted most of the linked stuff.

Anyway, I'm thinking a Python script would be sufficient. The problem is that it's nearly 300 posts; I need a method that won't use up my bandwidth downloading them, i.e. I need to get familiar with website scraping and the Reddit API.

[–]d3rr

In the Python world I recommend Beautiful Soup for scraping, and I'd put a delay between requests or they will block your IP. Sounds like a fun project. I'd help, but I'm overwhelmed with this site already.

[–][deleted]

I'm a bit busy myself; I'll post it here once I'm done with it. It can probably be generalised into a Reddit archival tool. Do you know what the delay should be?

[–]d3rr

If you are scraping at your leisure, I'd set it high, like a random 30 seconds to 2 minutes between requests.
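For what it's worth, here's a rough sketch of that kind of randomized delay, standard library only. The names `polite_delay`, `fetch_all` and `http_get` and the User-Agent string are made up for illustration, and the 30 to 120 second range is just the suggestion above, not a documented limit of any site.

```python
import random
import time
import urllib.request


def polite_delay(min_s=30.0, max_s=120.0):
    """Pick a random wait, in seconds, between min_s and max_s."""
    return random.uniform(min_s, max_s)


def fetch_all(urls, fetch, sleep=time.sleep, min_s=30.0, max_s=120.0):
    """Fetch each URL in turn, sleeping a random interval between requests.

    `fetch` is any callable taking a URL and returning the page content;
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            # No wait before the first request, only between requests.
            sleep(polite_delay(min_s, max_s))
        pages.append(fetch(url))
    return pages


def http_get(url):
    """Plain HTTP GET; the User-Agent value is a hypothetical placeholder."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "archive-script/0.1 (hypothetical)"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

You'd call it as `fetch_all(post_urls, http_get)` with your own list of links. At an average wait of 75 seconds, nearly 300 posts works out to roughly six hours, so it's a leave-it-running-overnight kind of job.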

Yeah man, throw it up on GitHub; it could prove very useful to a lot of people.