[–]Netweasel Continuing the struggle 5 insightful - 1 fun - (29 children)

7978 out of 8304 pledged at the moment, so the numbers may still be increasing.

But that's just the reporting on one Twitch stream. There could be others going dark without registering their intent.

The problem with counting subscribers (which would be a lovely number to have) is overlap.

[–]3andfro 5 insightful - 1 fun - (28 children)

Thanks. I had no idea the blackout covered such a large % of subs. 👍🏽

A call for a general labor strike wouldn't approach that level.

[–]Netweasel Continuing the struggle 4 insightful - 1 fun - (27 children)

> I had no idea the blackout covered such a large % of subs. 👍🏽

Well, "percentage of subs" can mean many things. That is a very high percentage of the subs that pledged to go dark actually doing it, but out of the millions of subreddits that currently exist, not so much.

Of all the subreddits with over 10,000 people in them, we don't know the percentage of that group.

[–]3andfro 4 insightful - 1 fun - (26 children)

That was the Q: how many subs are there in total, even a ballpark figure.

[–]Netweasel Continuing the struggle 3 insightful - 1 fun - (1 child)

The extra problem is this: If you divide subreddits into "functioning" and "non-functioning" subreddits, we do not know how many of the about 8000 dark subreddits are on which side of the line.

You could assume that the 8000 are all in the "only 7,500 actually functioning subreddits," (to arbitrarily choose a number), but you'd be wrong.

[–]3andfro 4 insightful - 2 fun - (0 children)

Seems the same issue would arise for that (inflated?) total number of subs from Maniak's comment.

[–]Netweasel Continuing the struggle 4 insightful - 1 fun - (4 children)

Alternatively, you could go to subredditstats, choose one of their criteria, and scroll waaaay down to the Xth sub on that list and see how "actual" the Xth sub actually is.

If it turns out that the 5000th sub ranked by "comments per day" is pretty dead, well, there ya go.

[–]3andfro 5 insightful - 1 fun - (3 children)

So we don't know, and probably can't know, how many subs and how many users (even with overlapping membership across subs) are participating in the blackout. Not surprising, but a tad frustrating.

[–]Netweasel Continuing the struggle 5 insightful - 1 fun - (2 children)

> Not surprising, but a tad frustrating.

Even more frustrating (maybe): Reddit could know. And probably does.

[–]3andfro 5 insightful - 1 fun - (1 child)

And is probably ~~lying~~ misleading about it.

[–]InumaGaming Socialist 5 insightful - 1 fun - (0 children)

Their history is based in fudging numbers.

So the truth is we should have left a long time ago for greener pastures.

[–]Netweasel Continuing the struggle 6 insightful - 1 fun - (18 children)

> how many subs are there in total, even a ballpark figure.

Well, Maniak grabbed and dove into the Pushshift Data Dump, looking for WayoftheBern, and found it in a "ranked by posts+comments" ranking at #714 out of about 13.5 million alleged subreddits, each with an individual name.

We're on line 714 out of 13575389

Best "ballpark" I can give you. Upper bound of "number of subreddits" : 13,575,389. You could theoretically crawl through the same database and see how many of those 13.5 million would count as "actual" subs, if you could define the term by "total comments+posts."

Simply see what line the smallest "actual" sub is on, and Bob's your uncle.
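To put numbers on how much the choice of denominator matters, here is a quick Python sketch using only figures quoted in this thread (7978 dark, 8304 pledged, 13,575,389 names in the dump). The `percent` helper is made up for illustration.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

dark = 7978                # subs reported dark so far
pledged = 8304             # subs that pledged to go dark
upper_bound = 13_575_389   # every name in the dump, user subs included

share_of_pledged = percent(dark, pledged)
share_of_upper_bound = percent(dark, upper_bound)

print(f"{share_of_pledged:.1f}% of pledged subs")
print(f"{share_of_upper_bound:.4f}% of all names in the dump")
```

Same numerator, wildly different answers. The honest figure sits somewhere in between and depends entirely on where the line for an "actual" sub gets drawn.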

[–]Maniak 🥃😾 5 insightful - 1 fun - (17 children)

> Well, Maniak grabbed and dove into the Pushshift Data Dump, looking for WayoftheBern, and found it in a "ranked by posts+comments" ranking at #714 out of about 13.5 million alleged subreddits, each with an individual name.

Addendum to this, I used a "subreddit_counts.txt" file that was made specifically for this purpose, and those subreddits include the "user subs", with a fuckton of subs named "u_{username}", which explains the 13.5 million.

Going up to the first lines where the count is at least 1000 brings me to r/discountharmony at 1000, still with a lot of user subs above that. Whether or not they should be counted as proper subs is another question.

In any case, and that's without having information about which ones are actually active, the number of subs that have any kind of significance when it comes to usage and traffic is way, way, WAY below the 2.8 million number that was asserted in the thread above.

The 100k mark (for posts + comments) is crossed at line 12075 with r/imagesofflorida at 100001.

On the other end of the list, the #1 sub is r/askreddit with 746,740,850 posts+comments, that one is participating in the blackout.

#2 is r/politics, that one is as usual being an establishment bitch.

Then r/funny, r/pics, r/worldnews, r/memes, r/teenagers, r/nba, the only remaining ones above 100M. Of those, only r/worldnews and r/memes are not participating.

So basically, as far as content is concerned, the top 8 subs have more than 1.5 billion posts+comments. Of those, about 1.2 billion are blacked out.

[–]Netweasel Continuing the struggle 3 insightful - 1 fun - (13 children)

> with a fuckton of subs named "u{username}",

Don't suppose you could send a "string counter" program through to tell how many "u/" and "r/" strings there are in the database?

[–]Maniak 🥃😾 5 insightful - 1 fun - (12 children)

9,501,204 matches for lines starting with u_, which leaves 4,074,185 others.

If I remove the lines with a count of 1 (because those are clearly not 'actual' subs): 4,616,932 "u_*" and 3,037,203 others.

With a count of at least 10: 1,023,366 u_*, 1,277,125 others.

The number of 'real subs' drops fast.
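For anyone who wants to repeat this kind of count on the text file, a rough Python sketch. It assumes each line is a name followed by a count, separated by whitespace, which is a guess at the layout of "subreddit_counts.txt" since the thread never shows it. The function name is hypothetical, and the sample lines marked "invented" are placeholders; the other three names and counts are the ones quoted upthread.

```python
from collections import Counter

def split_counts(lines, min_count=1):
    """Count user subs (names starting with 'u_') vs everything else,
    keeping only entries whose count is at least min_count."""
    tally = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, count = parts[0], int(parts[1])
        if count < min_count:
            continue
        tally["user" if name.lower().startswith("u_") else "other"] += 1
    return tally["user"], tally["other"]

sample = [
    "askreddit 746740850",
    "imagesofflorida 100001",
    "discountharmony 1000",
    "u_example_a 12",      # invented
    "u_example_b 1",       # invented
    "example_tiny_sub 1",  # invented
]

# min_count=2 corresponds to "remove the lines with a count of 1"
for threshold in (1, 2, 10):
    print(threshold, split_counts(sample, threshold))
```

On the real file you would pass the open file object instead of `sample`, assuming the layout guess holds.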

[–]Netweasel Continuing the struggle 3 insightful - 1 fun - (2 children)

With that "u*" vs "u_*" mixup, might you need to rerun those numbers?

[–]Maniak 🥃😾 4 insightful - 1 fun - (1 child)

Nah, the mixup was on the SQL side. These numbers were direct from the text file, where I hadn't forgotten to escape the underscore :)
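For anyone wondering why the underscore matters: in SQL's LIKE, "_" matches any single character, so an unescaped 'u_%' also catches names like "ukraine". Here is a small sketch with Python's built-in sqlite3. The table, column, and sample names are made up for illustration, and the actual database and query used here aren't shown in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subs (name TEXT)")
conn.executemany(
    "INSERT INTO subs VALUES (?)",
    [("u_alice",), ("u_bob",), ("ukraine",), ("usa",), ("askreddit",)],
)

# Unescaped: '_' is a wildcard, so this matches anything that starts
# with 'u' and has at least one more character.
unescaped = conn.execute(
    "SELECT COUNT(*) FROM subs WHERE name LIKE 'u_%'"
).fetchone()[0]

# Escaped: with the ESCAPE clause, '\_' means a literal underscore.
escaped = conn.execute(
    "SELECT COUNT(*) FROM subs WHERE name LIKE ? ESCAPE '\\'",
    (r"u\_%",),
).fetchone()[0]

print(unescaped, escaped)
conn.close()
```

The unescaped pattern overcounts user subs by every regular sub whose name happens to start with "u".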

[–]Netweasel Continuing the struggle 4 insightful - 1 fun - (1 child)

There's another thing in this, which is much more complicated...

As I understand it, the PushShift numbers are aggregate totals. If a subreddit blew up for a month and then died off two years ago, those huge numbers would still be sitting there.

If you could subtract the February numbers from the March numbers, you could get the March activity alone.
But, as I said, complicated.
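The subtraction itself would be the easy part. Here is a sketch in Python, assuming you somehow had two cumulative snapshots as name-to-total mappings. All names and numbers below are invented, and nothing in this thread says the dumps are actually available in this shape.

```python
def activity_between(earlier, later):
    """Per-sub activity between two cumulative snapshots.
    Subs missing from the earlier snapshot are treated as starting at 0."""
    return {name: total - earlier.get(name, 0) for name, total in later.items()}

# Hypothetical cumulative posts+comments totals at the end of each month
end_of_february = {"boomed_then_died": 500_000, "steady_sub": 40_000}
end_of_march = {"boomed_then_died": 500_000, "steady_sub": 43_000, "new_sub": 250}

march_only = activity_between(end_of_february, end_of_march)
print(march_only)
```

The sub with the huge aggregate total shows zero March activity, which is exactly the case the raw totals hide. Getting hold of the two snapshots is the complicated part.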

[–]Maniak 🥃😾 5 insightful - 2 fun - (0 children)

Hence why the API is needed. Because I sure as shit am not going to download multiple multi-TB torrents and process them manually in order to do this :)

[–]Netweasel Continuing the struggle 2 insightful - 1 fun - (6 children)

> 4,074,185 others.

Now we're getting toward reasonable numbers......

Tougher database manipulation question, can you delete every "/u" line and port what's left to a different file?

If you can, then follow up with checking https://subredditstats.com/ on the 100,000th "/r."

It's called "chasing the lower bound." If you then check the 50,000th one, then the 20,000th one, then the 10,000th one... you'll probably see a great jump between two of them. The "lower bound" would probably be between those two.

I figure it would have to be below [a higher number than] 5000.

[–]Maniak 🥃😾 1 insightful - 1 fun - (3 children)

Name             Count    Rank
growcastle      133361  #10000
menshealth       39784  #20000
panamacitybeach   6835  #50000
boners            1583 #100000

(counting only those that don't start with u_)

growcastle

panamacitybeach

menshealth

r/boners not found, so I went to the #99999:

bourbontrade same count (1583)
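Using only the four counts in the table, the "great jump" check can be sketched in Python as below. It uses total posts+comments as a stand-in for how alive a sub is, which is an assumption; the idea upthread was to eyeball each sub on subredditstats instead. The function name is hypothetical.

```python
# (rank, posts+comments) pairs from the table above, non-user subs only
samples = [(10_000, 133_361), (20_000, 39_784), (50_000, 6_835), (100_000, 1_583)]

def biggest_drop(points):
    """Return (rank_a, rank_b, ratio) for the adjacent pair of sampled
    ranks with the largest count ratio between them."""
    best = None
    for (rank_a, count_a), (rank_b, count_b) in zip(points, points[1:]):
        ratio = count_a / count_b
        if best is None or ratio > best[2]:
            best = (rank_a, rank_b, ratio)
    return best

rank_hi, rank_lo, ratio = biggest_drop(samples)
print(f"Largest drop: between #{rank_hi} and #{rank_lo} (about {ratio:.1f}x)")
```

Four unevenly spaced sample points don't prove much; sampling more ranks between the two flagged ones would narrow the bound down.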

[–]Maniak 🥃😾 3 insightful - 1 fun - (1 child)

I ended up importing it into a quick SQL table because I was getting bored with doing regexes in Notepad++ so... that opens up the queries :)

(then again it's only name + count, so the information is very limited)

[–]Netweasel Continuing the struggle 2 insightful - 1 fun - (2 children)

> The 100k mark (for posts + comments) is crossed at line 12075 with r/imagesofflorida at 100001.

In my head I was estimating about 15,000 "actual" subreddits, whatever that term would actually mean.

[–]Maniak 🥃😾 3 insightful - 1 fun - (1 child)

Here's a conspiracy theory: what if getting rid of PushShift was seen as a highly profitable move by the cunts-in-power because without a way to easily query the entirety of Reddit and be able to see just how many subs are actually active, with how many actually active users, it's way easier for the executives to make up bullshit numbers in order to get more money from investors?

[–]Netweasel Continuing the struggle 4 insightful - 1 fun - (0 children)

As I have said for years, while Facebook is "Weaponized Peer Pressure," Reddit is "Weaponized Autism."

They left data lying around for people to analyze. People will. And have, and are.

And when the numbers do not add up, it becomes obvious that they do not.