[–] Netweasel (Continuing the struggle)

> with a fuckton of subs named "u{username}",

Don't suppose you could send a "string counter" program through to tell how many "u/" and "r/" strings there are in the database?

[–] Maniak 🥃😾

9,501,204 matches for lines starting with u_, which leaves 4,074,185 others.

If I remove the lines with a count of 1 (because those are clearly not 'actual' subs): 4,616,932 "u_*" and 3,037,203 others.

With a count of at least 10: 1,023,366 u_*, 1,277,125 others.

The number of 'real subs' drops fast.
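
For anyone wanting to reproduce that kind of tally, here is a minimal sketch. It assumes the text file holds one `name count` pair per line, separated by whitespace; that layout, the function name `split_counts`, and the sample rows are all made up for illustration and are not the actual data.

```python
def split_counts(lines, min_count=1):
    """Count user-profile subs (names starting with "u_") and all others,
    keeping only entries whose count is at least min_count."""
    user_subs = 0
    others = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, count = parts[0], int(parts[1])
        if count < min_count:
            continue
        if name.startswith("u_"):  # literal "u_", not just "u"
            user_subs += 1
        else:
            others += 1
    return user_subs, others


# Hypothetical sample rows, not taken from the real dump.
sample = ["u_alice 1", "u_bob 12", "ukraine 500", "pics 90000", "tinysub 1"]
print(split_counts(sample))                # all entries
print(split_counts(sample, min_count=2))   # drop entries with a count of 1
print(split_counts(sample, min_count=10))  # count of at least 10
```

Raising `min_count` mirrors the three thresholds quoted above (everything, more than 1, at least 10).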

[–] Netweasel (Continuing the struggle)

With that "u*" vs "u_*" mixup, might you need to rerun those numbers?

[–] Maniak 🥃😾

Nah, the mixup was on the SQL side. These numbers came directly from the text file, where I hadn't forgotten to escape the underscore :)
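
For context on why the underscore matters: in a SQL `LIKE` pattern, `_` is a wildcard that matches any single character, so an unescaped `u_%` also matches names such as `ukraine`. The sketch below shows the difference using Python's built-in `sqlite3` module. The thread does not say which database was used, so SQLite, the table name `subs`, and the rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subs (name TEXT, count INTEGER)")
# Hypothetical rows, not taken from the real dump.
conn.executemany(
    "INSERT INTO subs VALUES (?, ?)",
    [("u_alice", 3), ("ukraine", 500), ("pics", 90000)],
)

# Unescaped: "_" matches any one character, so "ukraine" is caught too.
loose = [row[0] for row in conn.execute(
    "SELECT name FROM subs WHERE name LIKE 'u_%' ORDER BY name"
)]

# Escaped: "\_" matches a literal underscore only.
strict = [row[0] for row in conn.execute(
    r"SELECT name FROM subs WHERE name LIKE 'u\_%' ESCAPE '\' ORDER BY name"
)]

print(loose)   # names matched by the unescaped pattern
print(strict)  # names matched by the escaped pattern
```

Other database engines have their own escaping rules, so the exact `ESCAPE` syntax may differ from what is shown here.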

[–] Netweasel (Continuing the struggle)

Cool.

[–] Netweasel (Continuing the struggle)

There's another thing in this, which is much more complicated...

As I understand it, the PushShift numbers are aggregate totals. If a subreddit blew up for a month and then died off two years ago, those huge numbers would still be sitting there.

If you could subtract the February numbers from the March numbers, you could get the March activity alone.
But, as I said, complicated.
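
The subtraction itself is the easy part, assuming the totals really are cumulative and that a snapshot exists for each month (both are assumptions here, as noted above). A rough sketch with made-up numbers:

```python
def monthly_activity(snapshots):
    """Turn chronological (month, running_total) pairs into per-month
    activity by subtracting the previous month's running total."""
    activity = {}
    previous_total = None
    for month, total in snapshots:
        if previous_total is not None:
            activity[month] = total - previous_total
        previous_total = total
    return activity


# Hypothetical cumulative totals for one subreddit.
snapshots = [("Jan", 1000), ("Feb", 1500), ("Mar", 1520)]
print(monthly_activity(snapshots))
```

The first month has no earlier snapshot to subtract, so it is left out of the result. The hard part is getting the monthly snapshots in the first place.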

[–] Maniak 🥃😾

Hence why the API is needed. Because I sure as shit am not going to download multiple multi-TB torrents and process them manually in order to do this :)

[–] Netweasel (Continuing the struggle)

> 4,074,185 others.

Now we're getting toward reasonable numbers......

Tougher database manipulation question: can you delete every "u_" line and export what's left to a different file?

If you can, then follow up by checking https://subredditstats.com/ for the 100,000th "r/" entry.

It's called "chasing the lower bound." If you then check the 50,000th one, then the 20,000th one, then the 10,000th one... you'll probably see a great jump between two of them. The "lower bound" would probably be between those two.

I figure it would have to be below [a higher number than] 5000.
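
A rough sketch of that "chasing the lower bound" idea: sample activity at a few ranks, then look for the adjacent pair with the steepest relative drop. The function name `largest_drop` and the sample figures are hypothetical, and using comments per day as the activity measure is just one possible choice.

```python
def largest_drop(samples):
    """samples: (rank, activity) pairs.
    Returns the adjacent pair of ranks with the largest relative drop
    in activity, or None if fewer than two samples are given."""
    ordered = sorted(samples)  # sort by rank, best rank first
    best_pair = None
    best_ratio = 0.0
    for (rank_a, act_a), (rank_b, act_b) in zip(ordered, ordered[1:]):
        ratio = act_a / max(act_b, 1)  # guard against division by zero
        if ratio > best_ratio:
            best_pair = (rank_a, rank_b)
            best_ratio = ratio
    return best_pair


# Hypothetical comments-per-day figures at four sampled ranks.
samples = [(10000, 60), (20000, 40), (50000, 4), (100000, 2)]
print(largest_drop(samples))
```

Once a pair stands out, sampling more ranks between the two would narrow the bound further.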

[–] Maniak 🥃😾

Name             Count    Rank
growcastle      133361  #10000
menshealth       39784  #20000
panamacitybeach   6835  #50000
boners            1583 #100000

(counting only those that don't start with u_)

growcastle

panamacitybeach

menshealth

r/boners was not found, so I went to #99999:

bourbontrade, same count (1583)

[–] Netweasel (Continuing the struggle)

Off to the stats page!

r/ boners (#100,000): "not found"
r/ panamacitybeach (#50,000): Subscribers -- 3,324 Comments Per Day -- 24 Posts Per Day -- 1
r/ menshealth (#20,000): Subscribers -- 11,170 Comments Per Day -- 12 Posts Per Day -- 5
r/ growcastle (#10,000): Subscribers -- 34,966 Comments Per Day -- 60 Posts Per Day -- 12

For comparison...

r/ WayoftheBern (<1000): Subscribers -- (That's odd. It does not show on that page. No matter.) 87,991.
Comments Per Day -- 93 Posts Per Day -- 17.

Hmm. Perhaps the blackout is skewing the numbers. Maybe this should be checked again next week.

Update: r/ WayoftheBern -- Comments in past 24 hours: Zero. Posts in past 24 hours: Zero.

For comparison, a random archive shows 25 posts in 12 hours, and from memory there are usually 25 comments in less than two hours.

[–] Maniak 🥃😾

Oh wait, I fucked up the query removing the u_, it removed all those starting with u :)

That'll teach me to go too fast.

So:

Name                   Count     Rank
wayofthebern         3492849      713
makeupflatlays        136923    10000
winnipeggonewild       41101    20000
chrisdeliauncensored    7059    50000
hl_women_only           1639    99999

wayofthebern makeupflatlays winnipeggonewild chrisdeliauncensored hl_women_only
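
For reference, a minimal sketch of this kind of rank lookup with the prefix filter done on the literal `u_`. It assumes the data is already loaded as a name-to-count mapping; the function name `names_at_ranks` and the sample entries are hypothetical, and this is not the query that was actually run.

```python
def names_at_ranks(counts, ranks, skip_prefix="u_"):
    """Rank names by count (highest first), ignoring names that start
    with skip_prefix, and return {rank: (name, count)} for each
    requested rank that exists."""
    kept = [
        (name, count)
        for name, count in counts.items()
        if not name.startswith(skip_prefix)
    ]
    # Sort by descending count, breaking ties alphabetically.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return {rank: kept[rank - 1] for rank in ranks if 1 <= rank <= len(kept)}


# Hypothetical counts, not taken from the real dump.
counts = {"u_alice": 900, "pics": 800, "ukraine": 700, "usenet": 50, "tinysub": 5}
print(names_at_ranks(counts, [1, 2, 3]))                   # filters "u_" only
print(names_at_ranks(counts, [1, 2, 3], skip_prefix="u"))  # the too-broad filter
```

The second call shows how filtering on `u` alone silently drops real subs such as `ukraine` and shifts every rank after them.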

[–] Netweasel (Continuing the struggle)

(subscribers / comments per day / posts per day)
hl_women_only (not listed)/(not listed)/4
chrisdeliauncensored 2,139/8/1
winnipeggonewild 9,893/103/9
makeupflatlays 77,408/(not listed)/3

[–] Maniak 🥃😾

I ended up importing it into a quick SQL table because I was getting bored with doing regexes in Notepad++ so... that opens up the queries :)

(then again it's only name + count, so the information is very limited)

[–] Netweasel (Continuing the struggle)

> it's only name + count, so the information is very limited

That's what https://subredditstats.com/ is for. All you need is a name.