[–]milkmender11

But to answer your actual point, why on earth would we expect k clusters to align to those attributes?? As I just explained, WE care about that stuff. WE notice that stuff. To a computer, a gene is a gene. A gene that makes a blue eye is no more or less a gene than one that is not expressed. We definitely would not expect k clusters to align to those attributes in ANY case, whether race is pseudoscience or not.

Whoever was using this argument you are responding to, I agree that it isn't a great one. That's why I don't use it.

But AGAINST ALL ODDS, the neutral algorithm returns K clusters that just happen to correspond to our racial groups! This is a wild coincidence.

•what racial groups •what racial groups •what racial groups

Please tell me what these racial groups are. Please tell me what our 'preconceived ideas about race' are. You keep saying this, as if this is some information we all know. It isn't. White people in the 19th century didn't know this. The world doesn't 'know' this. I don't know how many different ways I can ask you to clarify here, but you keep ignoring me. Again, though, I know why. You are hoping I gloss over this. You know that being put on the spot and asked to say precisely what 'our preconceived ideas about race' are opens up a biiiig can of worms, and you do not want to put yourself in an even more defensive position where it will be even easier for me to poke holes in your arguments. I mean, I readily admit that I have the high ground here. You are the one claiming a positive position: "race is. Race is, this. Our ideas are, this." All I have to do is nitpick. I don't need to be right, all I have to do is show that you aren't right.

I could write the clustering algorithm that you're using from scratch, it is nothing special

I'm sure you could. I presume you are a professional or at least a serious hobbyist programmer, and I sure as heck know you aren't a geneticist or a researcher who does cluster analysis.

I don't know if this is a case of low IQ or you just not being familiar with how the k-means algorithm works.
https://youtu.be/HVXime0nQeI
https://youtu.be/4b5d3muPQmA

And there it is. I see this as the moment where I won the debate, actually. You may recall, my first post in this thread was a reply to someone who descended into calling another poster a retard. In my reply, I asked, "Are you going to call me a retard now as well?" And here, you just did. The very first point I made in this thread is that when you opt to just call someone a retard, it can look quite like you are just frustrated because you are losing.

I know you aren't going to change your mind, not tonight. That isn't how these debates work. But I already know that you know I am not stupid. I know you aren't stupid either. I already did get through to you--you are not going to change your mind, I'm not a miracle worker, but you are going to be more careful with your wording in the future. You'll refine your arguments. See, I know so much about you. You feel that your intelligence is not sufficiently appreciated. And it isn't. You want to weigh in on social issues, many of which you have vastly superior answers to than the majority of society, and yet you are silenced on grounds outside of your control. So you throw your lot on with Race and Country to reclaim some of that value that is owed to you. But the truth is, man, you're a mongrel. Europeans were already mostly mongrels in the 20th century. You're no shining beacon of Whiteness. You're a smart guy who works a job he mostly dislikes, desperate to find some secret truths that validate your intelligence, because society refuses to do so. I fucking know you, man. Of course I know you. When was the last time you even wrote back and forth this much with someone? We're basically penpals now. I'm an anthropologist, I've been reading between the lines this whole time. I know exactly who you are.

[–]Dragonerne

But to answer your actual point, why on earth would we expect k clusters to align to those attributes?

Unless the clusters are totally random, we would expect the clusters to align to SOMETHING. Those attributes were just examples of what the clusters DON'T align to. You are free to be creative and find other attributes, phenotypical or genotypical or otherwise, that the clusters DON'T align to.
The interesting part is that AGAINST ALL ODDS, the clusters align to race.
Ask yourself the opposite question (of the quoted): why would we expect to find that the k clusters align to race if race were a bogus manmade concept that had no relation to biology? How crazy is it that the k clusters just happen to correspond perfectly to our bogus manmade concept? That's like 1 in a quadrillion. And not just once? Every time we run the algo and no matter the initial conditions. Definitely a weird coincidence that keeps repeating itself.

Please tell me what these racial groups are.

The idea that we can cluster human beings into clusters of genetic similarity. A Swede won't be more similar to a Gambian than he is to another Swede. You might find examples of "Swedes" being more similar to, say, Danes than other Swedes, but this is due to the color spectrum fallacy.

And there it is. I see this as the moment where I won the debate, actually. You may recall, my first post in this thread was a reply to someone who descended into calling another poster a retard. In my reply, I asked, "Are you going to call me a retard now as well?" And here, you just did. The very first point I made in this thread is that when you opt to just call someone a retard, it can look quite like you are just frustrated because you are losing.

No. You display a repeated misconception of how the kmeans algorithm works when it does its clustering. This is why I sent you two introduction videos so that you may educate yourself. This would help you correct your misconception and help you understand my argument. Because right now you do NOT understand my argument.
This could either be because you do not understand how kmeans works or because you have a low IQ. I assume, and I really hope, that it is because you don't understand how kmeans works. If it's an issue with IQ, then we will be stuck.
You might see this as a winning moment but I am merely responding in kind. If you do not want this kind of response, then cut out your arrogance. I have also told you to do this or I would respond in kind. And I will continue to do this as long as you continue to do this.

Your first response showed me that you lacked basic knowledge of this topic, and to be honest I felt that it was a loss. I was hoping to learn something new that I had not considered, or gain a new insight. I am sure that you have knowledge that I don't have that can help broaden my perspective, but before we reach that, you will have to humble yourself and start listening to the arguments I am putting forward, because you are not listening or not understanding them. If it is a lack of understanding, watching a few videos could help sort out that misunderstanding and we could move on from there onto more interesting insights and angles. This is 101 stuff of cluster analysis that you are not understanding and it's a shame.

But I already know that you know I am not stupid. I know you aren't stupid either.

Yes, but someone can be low IQ and not be stupid. Most people that aren't stupid have a sufficiently high IQ, and I assume you to be high IQ too, and that you just lack knowledge in certain aspects of cluster analysis, which leads you to a misunderstanding of how something works.
You seem to have the misconception that it's a given that the kmeans clustering algo returns k racial clusters, but it's not a given. We didn't ask it to give us RACIAL clusters. We just told it to give us K whatever clusters and it just happened to return whatever=racial. Why didn't it give us K eye color clusters? Because human beings aren't clustered by eye color, but by race.

I know exactly who you are.

You really don't. You were right about one thing. I haven't discussed this topic in a long time, because 1: it's already settled science and 2: it's a banned topic on social media, so literally impossible to have back and forths about. You're an anthropologist? The anthropology subreddit has this as their rule 3:
""Race realism", "human biodiversity", conspiracy theories, and pseudoscience will be removed as will any other content that is incorrect or not supported by reputable scholarship."
So yeah, it's hard to challenge someone with expertise on this subject when it's a banned subject.
Like you told another user, you wouldn't want to dox yourself, because just having a debate with alt righters would put your career in jeopardy. Well, it's the same for me, so you might understand why I don't regularly have a back and forth like this.
I used to, back before social media started mass censoring taboo subjects.

Europeans were already mostly mongrels in the 20th century.

Oh, noo!

[–]milkmender11

But, just so you don't accuse me of trying to obscure the point, I'll answer: your point... it's moot. You debunked yourself already. Yes, the k value is still arbitrary. You admitted that. You admitted that the choice of the number is arbitrary. That proceeds into the analysis. Once one arbitrary distinction is made, the entire analysis remains, at least to that extent, arbitrary. It doesn't necessarily become MORE arbitrary during the application of the algorithm, unless other arbitrary assumptions have been woven into the code. And just because an analysis contains an arbitrary component does not mean it is incorrect or useless. But it at least maintains the arbitrary quality from the first decision of the number, as you already said.

Once an element of arbitrary decision making enters your analysis, the entire analysis remains subject to challenge on those grounds, even if no additional arbitrary decisions are made.

This isn't the night before your stats final in junior year, bro. I'm a geneticist.

I did watch your videos, because I know what's up here. You've gotten desperate. When people get desperate, they mess up. At this point in the debate, I know these videos will back ME up more than you. So let's dig in.

YOUR source says: "In this case the data make three relatively obvious clusters. But rather than rely on our eye, let's see if we can get the computer to identify the same three clusters."

Sounds familiar! I refer you to my previous post:

Its intended function is very much like an ANOVA or MANOVA. It is a confirmation test, a way to say: "Hm, I am pretty sure I see 3 clusters here, at this scale, and I do indeed want to use 3 clusters in my analysis. However, I worry that if I eyeball it like this, the peer reviewers will take issue with that. What I can do instead is use this algorithm to confirm that, at this scale, the computer also sees 3 distinct clusters. It might seem obvious, but this way, my reviewers won't be able to chastise me for eyeballing the chart. It is obvious that I see 3 clusters here, but this is just one little test I can use to not make it seem like I am choosing to see the 3."

Damn. That's exactly what I effin' said. YOUR source is on MY side! This is a CONFIRMATION test designed to help you double down on your assumption! You don't even use it until you have ALREADY decided that you want to see x number of clusters! It literally says it, right here, in YOUR source!

It goes on: "Step 1: select the number of clusters you want to identify in your data." That I want to identify? That doesn't sound very scientific. That sounds like a fancy way to eyeball something. Which is exactly what I said in my last post, before you shared this video. And exactly what the video says. Hell, they even reference eyeballs, too. I could not have asked you to send a source that is more damning to your own position and more supportive of mine

[–]Dragonerne

Once an element of arbitrary decision making enters your analysis, the entire analysis remains subject to challenge on those grounds, even if no additional arbitrary decisions are made.

Well put. It's open to challenge on those grounds. On those grounds. You are free to attack that we have 7 clusters instead of 5, but it's not really relevant to the concept of race.
Common misunderstandings, often intentionally so, consist of 1: Races are genetically distinct and 2: Fixed number of races.
Neither of those opinions is held by any race realist. But in every study that tries to debunk race, they debunk those points or a variant of those points. Points that no one holds.

YOUR source says: "In this case the data make three relatively obvious clusters. But rather than rely on our eye, let's see if we can get the computer to identify the same three clusters."

This is an introductory video. He is showing you how the algo works. He is showing you that it works.
You don't want to actually eyeball it and then determine the number of clusters. Like.... in most data analysis we work with thousands or millions of dimensions and you can't exactly "eyeball" that.
If you ran the elbow test, you would also find that k=2 is worse than k=3 and that k>3 does not provide any meaningful improvement.
The whole point of these algos is to not rely on our eyes but to let the computer cluster high dimensional data (like genetics)
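To make the elbow test mentioned above concrete, here is a minimal sketch on made-up 1-D numbers. Two things are assumptions for illustration only: the data (`clumped` and `even` are invented) and the solver (`best_sse` finds the exact best k-way split of sorted 1-D data by brute force over cut points, rather than running the iterative k-means procedure). It prints the within-cluster sum of squared errors for k = 1..5.

```python
from itertools import combinations


def sse(values):
    """Sum of squared deviations from the mean of one cluster."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_sse(values, k):
    """Lowest total within-cluster SSE over every way of cutting
    the sorted 1-D data into k contiguous runs (brute force)."""
    data = sorted(values)
    n = len(data)
    best = float("inf")
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        total = sum(sse(data[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = min(best, total)
    return best


# Invented example data, not taken from any study.
clumped = [0, 1, 2, 10, 11, 12, 20, 21, 22]  # three tight groups
even = [0, 3, 6, 9, 12, 15, 18, 21, 24]      # evenly spaced, no groups

for name, data in (("clumped", clumped), ("even", even)):
    curve = [round(best_sse(data, k), 2) for k in range(1, 6)]
    print(name, curve)
```

On the clumped data the curve falls steeply until k=3 and then nearly flattens, which is the "elbow". On the evenly spaced data the error also keeps falling as k grows, but without a single sharp bend, so the elbow test gives no clear answer there.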

You don't even use it until you have ALREADY decided that you want to see x number of clusters! It literally says it, right here, in YOUR source!

Yes, but I think maybe you are stuck on the assumption that race realists care about a FIXED number of races? You can put x as 3, 5, 7, 23 if you want. Swedes won't be put into the same cluster as Ethiopians, while Danes are put in another.

I could not have asked you to send a source that is more damning to your own position and more supportive of mine

What I had hoped you would get from the video was that we choose the number of clusters, not HOW it clusters. "HOW" here being that it just so happens to cluster on RACIAL grounds, not any other feature/attribute/whatever.
The algorithm doesn't even know that it is looking at genetic data. It could've been finance data, and it would have created clusters that describe different spending habits. It's just getting numbers and returning some clusters.
To our surprise these k clusters just so happen to be RACES.
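Since the algorithm only returns arbitrary cluster ids, any statement that clusters "correspond" to some labeling has to be checked afterwards against labels the algorithm never saw. A minimal sketch of one common way to do that (a cross-tabulation plus a purity score) follows. The cluster ids and both labelings are invented for illustration, and `crosstab` and `purity` are hypothetical helper names.

```python
from collections import Counter


def crosstab(cluster_ids, labels):
    """Count how often each (cluster id, external label) pair occurs."""
    return Counter(zip(cluster_ids, labels))


def purity(cluster_ids, labels):
    """Fraction of samples whose label is the most common one in their cluster."""
    best = {}
    for (cid, _label), count in crosstab(cluster_ids, labels).items():
        best[cid] = max(best.get(cid, 0), count)
    return sum(best.values()) / len(labels)


# Hypothetical output of a clustering run: ids carry no meaning by themselves.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
# Two hypothetical external labelings of the same ten samples.
labels_a = ["x", "x", "x", "y", "y", "y", "z", "z", "z", "y"]
labels_b = ["p", "q", "p", "q", "p", "q", "p", "q", "p", "q"]

print("agreement with labeling A:", purity(cluster_ids, labels_a))
print("agreement with labeling B:", purity(cluster_ids, labels_b))
```

The same clusters score high against one labeling and close to the floor against the other. Purity can never drop below the share of the largest label within each cluster, so a score has to be read against that baseline.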

This is me repeating the distinction hoping you at some point will start to understand that the distinction is important.

[–]milkmender11

Now, this is medicine. About cancer, specifically. You might not recall since you haven't watched this since junior year before that test, but it's ok to make arbitrary assumptions when we have a purpose. If it helps to cure cancer, what does it matter if we assumed some clusters that might not be totally accurate? We're trying to preserve life here. If you made THAT argument for race, a social, non-scientific one, the anthropologist in me would have more sympathy. Show me that at the end of the day it is best for everyone to see things your way, but don't try to claim the mantle of science when it doesn't fit you.

Of course, this analysis COULD be wrong. What if there is a 4th cancerous cluster, a small one, just forming, that the assumption of 3 misses? The chemo might miss it. The surgeon might miss it. This hypothetical person could literally die due to a false assumption.

This entire video is rooted in the premise that we have a goal: identify cancer cells. If we fail to identify cells, that is bad. A bad-fit model fails to identify the cells. A perfect model identifies all of them. The correct number of clusters depends on our question, in this case, "How many cancerous cells are there, so we can kill them?" If we apply this logic to race, there is no clear goal. You keep talking about 'preconceived ideas' that you refuse to define. But there are so many different kinds of preconceived ideas about race. There either IS or IS NOT a cancer cell. It is not social. But 'preconceived ideas about race' ARE social. In order to find the reality of it, we must ask WHY humans form ideas about race, how they go about doing it, what parameters we use to do it, how those parameters have changed over the development of society, and how our criteria for racial classification evolved in the ancestral environment. You don't seem interested in any of those scientific questions. You just want to stop at 'preconceived ideas about race' because that is what is important to you, not the scientific method.

By the way, StatQuest is run by the genetics department at UCNC. Josh seems like a smart fellow. If you came to Josh and asked him about race, what do you think he would say? I mean, it's an academic genetics department. They tend to be pretty woke. Do you think he would agree with your assessment of race here?

Let's look at your first video now (I watched them out of order): "Step 1: Start with a dataset with known categories."

Well, fuck. Sorry, man. Step 1 knocked you out of the game again. KNOWN CATEGORIES. So we are supposed to come into this KNOWING the categories at STEP ONE, so says your source. And yet you have been speaking this whole time as if the analysis will GIVE us the categories. See, this is also what I said in my last post. I knew you had to try and shoehorn in 'preconceived ideas about race' that you refuse to define, because you know that this analysis requires KNOWN CATEGORIES at STEP ONE. You can't get this stuff past me, man.

So, the point of the video is to try and classify an unknown category based on a known category. So, presumably, you share this to try and argue that if we KNOW person A is African, we can determine the racial category of person B who happens to be a neighbor of person A in the cluster analysis. But don't you see the problem? You already classified person A before you started. You didn't arrive at their race through the analysis, you presumed that you have some other means of knowing that 'African' is their correct racial designation. Which is all well and good--probably, most of us would agree about what an 'African' looks like, as long as we don't include white South Africans, Egyptians, Moroccans, Mauritians, etc. etc. That is a lot of exceptions. But we are walking into this with the categories in-hand. As step 1. Not determined by objective scientific analysis. And I didn't even invoke the k problem, which persists in this video. Specifically, Josh points out that if you chose a k of 1, it will classify based on 1 neighbor. If you choose a k of 11, it will classify based on 11 neighbors. Josh is a great educator! I already knew this, but it is so nice to have him debunk your arguments for me. Wait, did you share these videos, or did I??
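The k-nearest-neighbors procedure described in that video can be sketched in a few lines. This is a minimal illustration under stated assumptions: the training points, the labels "small" and "large", the query point, and the helper name `knn_predict` are all invented. It shows the two properties discussed above: the method needs labeled points up front, and the answer can change with k (here a category with only three samples is outvoted at k=11).

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k labeled training points closest to `query`."""
    by_distance = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )
    votes = Counter(label for _point, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Invented labeled data: a 3-sample category and a 10-sample category.
train = [((0, 0), "small"), ((0, 1), "small"), ((1, 0), "small")]
train += [((x, y), "large") for x in (4, 5, 6) for y in (4, 5, 6)]
train += [((5, 7), "large")]

query = (0.5, 0.5)  # sits right in the middle of the "small" samples
for k in (1, 3, 11):
    print(k, knn_predict(train, query, k))
```

With k=1 or k=3 the query is labeled "small"; with k=11 the eight "large" neighbors outvote the three "small" ones, even though they are much farther away.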

Thanks for the homework. It does come across like you are trying to bully me into not replying by just regurgitating links at me because you can't make your case effectively, but I like refreshers so I enjoyed watching them.

Assuming it's not an issue of low IQ (because then we can keep going back and forth forever),

I know. There are three ways to 'win' online like this. One, someone gets tired of typing and leaves. That isn't much of a victory though. Two, the loser gets frustrated and starts calling the other person a retard with a low IQ. Also not satisfying. Three, and I know I already won here, the loser realizes that they have to refine their arguments. They don't change their mind, not yet, but they are more careful with their words the next time around, or perhaps avoid the debate entirely for fear of getting spanked again.

we don't tell the algorithm to give us the racial clusters. We tell the algorithm to give us K clusters. And these K clusters HAPPEN to be the racial clusters.

That's precisely the problem. You are saying that you know the clusters in advance. You know what the races are in advance. You are starting from the premise, "I am correct. I am right. Let's use this program to find out more about my correct answer." This is the definition of circular logic. You are wrong before you even started! YOU DO NOT KNOW THE CATEGORIES! You won't even tell me them, even though I keep asking!

[–]Dragonerne

Of course, this analysis COULD be wrong. What if there is a 4th cancerous cluster, a small one, just forming, that the assumption of 3 misses? The chemo might miss it. The surgeon might miss it. This hypothetical person could literally die due to a false assumption.

It doesn't really matter to the concept of race if there is a 4th race that we are not getting, because we used k=3

If we apply this logic to race, there is no clear goal.

People have done studies that try to align genetic clusters with self-identified race and they match like 98-99%, with the mistakes being boundary cases, which is to be expected, because the races are not genetically distinct but continuous along a spectrum.

But 'preconceived ideas about race' ARE social.

That's true. It is interesting that our social concept of race aligns so perfectly with the biological reality.
And here I am not talking about 'race' as defined by sociologists, where it's mostly just a political label or a label that society racializes you into.
Because if humanists get to define all words, then sure, I will happily admit that race is a bogus concept right here and now and that race has no biological meaning. With these definitions that humanists use, race as a concept has been HEAVILY debunked by science. Irrefutably so.
The only problem is that no one ever claimed that debunked position.
What's not debunked is the position of race realists.
The simple fact that you can use kmeans on genome wide population data and get racial clusters, 5 clusters, 3 clusters, 7 clusters, x clusters, is proof that race is real. And I will repeat myself again, because you failed to understand the distinction. We can use k-means to get k racial clusters - this proves that race is real, because kmeans does not return k racial clusters, it returns k clusters. The fact that these clusters are RACIAL clusters and not EYE COLOR clusters or SPENDING HABIT clusters or whatever else clusters is the proof.

By the way, StatQuest is run by the genetics department at UCNC. Josh seems like a smart fellow. If you came to Josh and asked him about race, what do you think he would say? I mean, it's an academic genetics department. They tend to be pretty woke. Do you think he would agree with your assessment of race here?

StatQuest is hilarious and he is better at explaining some concepts than a lot of professors or textbooks. I always recommend people to watch his videos because he breaks every concept down in ELI5 formats. Would he agree with me on race? I heavily doubt it

Well, fuck. Sorry, man. Step 1 knocked you out of the game again. KNOWN CATEGORIES.

You really should've watched these videos to learn and not to win a debate. Ok, so the video K-nearest neighbors is one of the simplest clustering algos and you start with known categories and then you see if an unknown person is nearest to whichever cluster. The Kmeans algo is slightly different but builds on the same idea and your program STRUCTURE uses a variant of the Kmeans algorithm.
In Kmeans you don't start with KNOWN CATEGORIES, but rather (usually) random initialization of the "center" of each k cluster.
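A minimal sketch of the k-means loop just described, on invented 2-D points. The input is only coordinates and starting centers, with no category labels. To keep the run reproducible, the "random" starting centers are written out by hand here; `kmeans`, `within_sse`, and both starting configurations are hypothetical names and values chosen for illustration. It shows plain k-means only and makes no claim about how STRUCTURE itself works.

```python
def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    centers = list(centers)
    assign = None
    for _ in range(max_iter):
        new_assign = [
            min(range(len(centers)),
                key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2)
            for p in points
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return assign


def within_sse(points, assign):
    """Total squared distance of each point to the mean of its cluster."""
    total = 0.0
    for j in set(assign):
        members = [p for p, a in zip(points, assign) if a == j]
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total


# Invented data: three tight groups of four points each, no labels attached.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (0, 10), (0, 11), (1, 10), (1, 11)]

start_spread = [(0, 0), (10, 0), (0, 10)]    # one start per group
start_lopsided = [(0, 0), (10, 0), (11, 1)]  # two starts land in the same group

for name, start in (("spread", start_spread), ("lopsided", start_lopsided)):
    result = kmeans(points, start)
    print(name, result, round(within_sse(points, result), 2))
```

The spread start recovers the three groups. The lopsided start converges to a different partition with a much higher error, merging two groups and splitting the third. This is why k-means is usually restarted from several initializations, keeping the run with the lowest error.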

That's precisely the problem. You are saying that you know the clusters in advance. You know what the races are in advance. You are starting from the premise, "I am correct. I am right. Let's use this program to find out more about my correct answer." This is the definition of circular logic. You are wrong before you even started!

No, kmeans doesn't use KNOWN CATEGORIES, so your argument is moot. This also shows me that I was completely right. You DID NOT know the fundamentals of how your program STRUCTURE does the clustering. STRUCTURE does not use k nearest neighbor (knn). I shared that video because I wanted to teach you the basics. Knn is a prerequisite for kmeans. I hoped to build your knowledge up in cluster analysis, because this is basics and it would help you understand my point and help you apply your knowledge of genetics more appropriately. I am certain that once you get the fundamentals of how clustering actually works you can then expand further and provide new insight to me using some knowledge where you have more expertise than I do.
I'm trying to learn and I am humble enough to realize that you likely know some stuff better than me, but you seem unable to humble yourself and the result is that you come off as arrogant and uneducated. Like in this comment you're heavily trying to debunk my arguments about your own program and you don't even know the basics. You confuse knn with kmeans.
I'm not really annoyed because I've come to expect this kind of behaviour from people in your camp of the debate. You've been taught that we are stupid and that you know it all and then your camp has outlawed all discussion on the topic so that your view is never challenged.
When your type then gets into a debate with one of us, you're misrepresenting our views (because you've been lied to by your educators) and you're debunking concepts that no one holds and you're displaying an extreme lack of knowledge on the subject, often citing 50-year-old fallacies. I'm not saying you are doing all those things, but this is what your type usually does and that's why I have come to expect this behaviour and am more tolerant towards your admittedly rather nasty & unmannered behaviour.

With that said I am pleased that you're not resorting to the 50 year old fallacies of "more within than between" that 99% of students are still taught in class. Got to uphold that pseudoscientific narrative, eh?

[–]milkmender11

More from your source: "Low values for k (like k=1 or k=2) can be noisy and subject to the effects of outliers. Large values for k smooth over things, but you don't want k to be so large that a category with only a few samples in it will always be outvoted by other categories."

Well, that's convenient. You keep saying that the clusters match our 'preconceived ideas about race,' but there are quirks in this method of analysis that make it not work well with low or high k values. So there is a sweet spot of values for k, which happens to correspond to the numbers of races that you like (even though you refuse to give me any of those numbers) and do not correspond to the values for k which you dislike. To be clear, there is no fundamental reason why a low or high k value would not be the best fit for the model. With a perfect dataset, with perfect computing power, we would be able to eliminate nearly all noise. It just so happens that the limitations of the method do not work very well with low or high k values, so you are going to reject low or high numbers of races out of hand. You are, as programmers often do, letting the algorithm do the thinking for you. What if your tool is not fit for the job? All you have is a hammer, so your problem looks like a nail.

It returns 7 clusters that nicely correspond to 7 racial groups.

Is this it? Are you finally going to share these mysterious 'preconceived ideas about race' with me? Or are you only using 7 because I used 7 earlier? Please tell me what these known categories are. I certainly don't know them. The racialist thinkers of the past didn't know them. I doubt most people you ask would be able to guess precisely what your preconceived ideas are, nor would you be able to guess theirs.

[–]Dragonerne

Well, that's convenient. You keep saying that the clusters match our 'preconceived ideas about race,' but there are quirks in this method of analysis that make it not work well with low or high k values. So there is a sweet spot of values for k, which happens to correspond to the numbers of races that you like (even though you refuse to give me any of those numbers) and do not correspond to the values for k which you dislike. To be clear, there is no fundamental reason why a low or high k value would not be the best fit for the model. With a perfect dataset, with perfect computing power, we would be able to eliminate nearly all noise. It just so happens that the limitations of the method do not work very well with low or high k values, so you are going to reject low or high numbers of races out of hand. You are, as programmers often do, letting the algorithm do the thinking for you. What if your tool is not fit for the job? All you have is a hammer, so your problem looks like a nail.

See? Building your fundamentals was the right thing to do in order to get us into a higher quality of debate.
I think you might raise a good point here that I would like you to expand upon if you so desire. I get the overall gist of your argument and it might hold some merit that's worth exploring some more.

Is this it? Are you finally going to share these mysterious 'preconceived ideas about race' with me? Or are you only using 7 because I used 7 earlier?

Yes, only using 7 because you used 7 earlier.
You could put 3 races and it would maybe return Europeans, Africans and Asians. You could then put 4 and it would maybe return Europeans, Africans, Asians and Oceanians. Or 5 and it would include Latinos/Hispanics/Indians. As you increase the number of clusters it will fine-tune the races.
If you have 1000 samples and you put k=1000, then it would simply return each sample as a race, which is why it's not very good with high k.
Likewise, if you pick too low a k it will combine "clusters that ought to be" in weird ways. Again it's worth repeating, since you didn't understand why the distinction earlier was important. It's not the number of clusters that is important, it's the fact that the clusters correspond to RACIAL clusters that is important. We didn't tell the algo to find k racial clusters, we told it to find k whatever clusters, and these "whatever clusters" happened to be "racial clusters", Swedes with Swedes, Gambians with Gambians.

[–]milkmender11

The elbow method won't return k=49 after having returned k=7 on the same sample. But I can see some situations where the returned k might differ if we introduce randomized initial configuration of the k-means algo.

No, you completely missed the point. You're tunnelvisioning, as programmers do (a strength and a weakness). I'm not saying that there is a stochastic element in machine learning. We all know that. I'm saying that there is no good reason not to run the analysis again on the 7 clusters and get several more clusters. If we are talking about cancer cells, we don't really have a good reason to do this, unless we have some reason to believe that there is a yet-smaller tumor to find (though I doubt you would simply redo the analysis on returned clusters for that, what with signal degradation). But we have many, MANY more races, historical and contemporary, to identify with repeated cluster analysis of 'known' races. It's not just 'African.' What of all the African subtypes? Dozens more races, all determined with a non-arbitrary k, right? Well, except for that it is arbitrary, like you admitted.
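The idea of running the analysis again on a returned cluster can be sketched with made-up 1-D numbers. This is a hypothetical recursive two-way split (a simple divisive scheme, not the procedure used in any study mentioned in this thread). The names `best_split` and `split_recursively`, the data, and the `depth` parameter are all assumptions for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean of one cluster."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(values):
    """Cut sorted 1-D data into the two runs with the lowest combined SSE."""
    data = sorted(values)
    cut = min(range(1, len(data)), key=lambda i: sse(data[:i]) + sse(data[i:]))
    return data[:cut], data[cut:]


def split_recursively(values, depth):
    """Re-split every returned cluster until `depth` is used up
    or a cluster has a single member."""
    if depth == 0 or len(values) < 2:
        return [sorted(values)]
    left, right = best_split(values)
    return split_recursively(left, depth - 1) + split_recursively(right, depth - 1)


data = [0, 1, 2, 10, 11, 12, 20, 21, 22]  # invented example values

for depth in (1, 2, 4):
    print(depth, split_recursively(data, depth))
```

Any cluster with at least two members can always be split again, including a tight group like 0, 1, 2. Nothing in the procedure itself says when to stop; that comes from the `depth` setting (or some other stopping rule) chosen by whoever runs it.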

[–]Dragonerne Jesus is white (0 children)

But we have many, MANY more races, historical and contemporary, to identify with repeated cluster analysis of 'known' races. It's not just 'African.' What of all the African subtypes? Dozens more races, all determined with a non-arbitrary k, right? Well, except for that it is arbitrary, like you admitted.

Yes! And that's what's so wonderful about using the computers to tell us how to best cluster the racial groups. We can start researching if Swedes really are that different from Danes, or we can see if Finns did cluster with the mongoloid race like some Americans claimed 100 years ago, and so on. We can start seeing how ancient races compare to modern races. Where ancient individuals cluster into modern races. Where modern individuals cluster into ancient races.
Things like this are just very exciting.

[–]milkmender11 (3 children)

Are you of the misconception that race realists believe that there exists a fixed number of races? This is not the case. No one holds that position.

Cop out. You refuse to answer the question because you know you can't. There are as many races as you want to see. Just because a bunch of people have decided that they refuse to answer a question because they can't, that doesn't mean it is a defensible position. It isn't. This is your 'turtles all the way down' moment. Scientists DO have an answer for how many species of sea cucumbers we have documented. They DO have an answer for how many subspecies of grey wolves there are. Why don't YOU have an answer? The rest of the scientific world isn't shy about this when it comes to taxonomic classification, but you have got cold feet all of a sudden.

Are you conflating me with someone else?

Nope. No conflation. I mean exactly what I said. Your first link contains data from studies that were conducted using STRUCTURE. They are landmark studies, often the first cited in these discussions, and cite them first you did.

I simply want to argue that race is real.

Of course race is real. I would never say something so ridiculous as 'race isn't real.' Race is one of the most consequential and painfully real things in the modern world, perhaps the single most consequential. But it isn't a scientific concept. It is a social construct emerging out of the biological reality of our intuitive, cognitive racial-classification modules. In fact, with reference to those modules, in a way, race IS biology. Not in the way people think of it, as a real attribute of human population genetics, but as a little part of our brain that has evolved to see race wherever we look, because so far it has proved adaptive.

The same objections that you're using against race can be used to deconstruct the concept of species.

No, they can't. As I explained before, you are using the color spectrum argument that you already admitted you reject. I say that SPECIES is a legitimate taxonomic classification and SUBSPECIES is not. You say that my same gripes with subspecies can deconstruct species as well. This is identical to someone saying that the gradient of colors shows that there cannot be an actual yellow, an actual orange, an actual green. The existence of intermediaries does not disprove the existence of the discrete categories. Subspecies is an intermediary between 'species' and 'individual.' It is undefined in science, or, rather, it has so many definitions as to render it mostly meaningless outside of very specific bodies of literature. Are there glimmers of inconsistency in species categories? Of course. There are discrepancies between biological, phylogenetic, cladistic species, etc. That does not mean that the vague and undefined intermediary (subspecies) somehow delegitimizes the defined and specific category (species). That is the color spectrum argument. You said you disagreed with it (even though I never brought it up until you did), and then you used it to try and delegitimize the species concept.

No, it didn't.

Yes, it did. I have an excellent source for this. YOUR video. Didn't quite remember Josh saying that one bit, eh? ;)

I have been very generous in that I have willingly gone into the territory you chose, stats and ML, just to show that you will lose even on your home turf. But we have hardly even explored the anthropological assumptions in your argument. What are our preconceived ideas about race? How do you KNOW what the clusters are in advance? What is this information that you refuse to talk about? It is absolutely imperative to your argument. You keep saying it over and over, so obviously it is important. What are our preconceived ideas about race?

[–]Dragonerne Jesus is white (0 children)

Cop out. You refuse to answer the question because you know you can't. There are as many races as you want to see. Just because a bunch of people have decided that they refuse to answer a question because they can't, that doesn't mean it is a defensible position. It isn't. This is your 'turtles all the way down' moment. Scientists DO have an answer for how many species of sea cucumbers we have documented. They DO have an answer for how many subspecies of grey wolves there are. Why don't YOU have an answer? The rest of the scientific world isn't shy about this when it comes to taxonomic classification, but you have got cold feet all of a sudden.

I don't think it's a cop out. You seem to be of the idea that we believe in a FIXED number of human races. Can you point to any modern geneticist who believes in race and says there is a FIXED number of human races? We can talk about the big ones like Europeans, Asians, Africans, Americans, Oceanians, but we can break each of these down into smaller races too.
The funny part about this is that, just for entertainment, let's say I said we have 6 human races and then you run k=7 and show me that we now have 7 human races LOL, it wouldn't disprove the 6 human races. I will repeat it until you understand your misunderstanding: it's not the number of races that's important, it's that the algo returns k RACIAL clusters. We didn't ask it to return RACIAL clusters. We asked it to return whatever clusters and it somehow chose to return RACIAL clusters. Out of all the millions of ways it could cluster human beings, it chose to cluster along racial lines. What a coincidence.

Nope. No conflation. I mean exactly what I said. Your first link contains data from studies that were conducted using STRUCTURE. They are landmark studies, often the first cited in these discussions, and cite them first you did.

My first link was the Medium article about choosing the optimal k...

Of course race is real. I would never say something so ridiculous as 'race isn't real.' Race is one of the most consequential and painfully real things in the modern world, perhaps the single most consequential. But it isn't a scientific concept. It is a social construct emerging out of the biological reality of our intuitive, cognitive racial-classification modules. In fact, with reference to those modules, in a way, race IS biology. Not in the way people think of it, as a real attribute of human population genetics, but as a little part of our brain that has evolved to see race wherever we look, because so far it has proved adaptive.

I hope you're not referring to the sociology concept of race, where blacks = people with black skin from Africa, southern India, Australia, South America, because black skin != same race.
If this is not what you meant and I understood you correctly, then this is actually perfect, because it lets us translate from your paradigm into my paradigm. It opens a communication channel where we speak the same language.
"But it isn't a scientific concept. It is a social construct emerging out of the biological reality of our intuitive, cognitive racial-classification modules."
This is how we humans classify races. These are my preconceived ideas about racial groups. (Obviously a normie on the street won't have as good a classification as someone who works with different human populations. The race realists before DNA was discovered made a lot of different classifications of human races.)

We can compare this with the biological reality.
"Not in the way people think of it, as a real attribute of human population genetics"

The clustering algorithms can help us see if our preconceived classifications match the clustering of genetic population data. If they do, then our preconceived classifications were correct, in the sense that they had a biological/genetic basis. If they do not, then that is evidence to support the hypothesis that the preconceived classifications do not have a biological/genetic basis.
We have found that they DO match with the biological reality.
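
For what "match" could mean in practice: one common way to score agreement between a prior labelling and a clustering is the Rand index, the fraction of sample pairs that both labellings treat the same way (together in both, or apart in both). The sketch below is only an assumption about how such a comparison might be set up; `rand_index` is a hypothetical helper and the label lists are placeholders, not results from any study. The plain Rand index is not corrected for chance agreement, so real analyses tend to use an adjusted variant.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two labellings agree.

    A pair counts as agreement when both labellings put the two samples
    in the same group, or both put them in different groups.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labellings must cover the same samples")
    if len(labels_a) < 2:
        raise ValueError("need at least two samples to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    prior = ["a", "a", "b", "b"]   # placeholder prior classification
    found = [1, 1, 0, 0]           # placeholder cluster ids (names don't matter)
    crossed = [0, 1, 0, 1]         # a clustering that cuts across the prior
    print(rand_index(prior, found))
    print(rand_index(prior, crossed))
```

Cluster ids are arbitrary, so the score only looks at which samples are grouped together, not at what the groups are called.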

Subspecies is an intermediary between 'species' and 'individual.'

'species' is an intermediary between 'animal' and 'individual'

Are there glimmers of inconsistency in species categories? Of course. There are discrepancies between biological, phylogenetic, cladistic species, etc.

Yes, and how do you choose which type of "species" to use? That is a very arbitrary choice! Oh no...
Are a tiger and a lion even different species if they can produce offspring together? Wow, time to eliminate the entire concept of 'species'. Of course not.
Just so I'm not misunderstanding you: you don't reject 'subspecies' as a concept, right? You're just contesting whether human beings have races or not.

Yes, it did. I have an excellent source for this. YOUR video. Didn't quite remember Josh saying that one bit, eh? ;)

Please be able to have an attention span of more than 1 comment back. If you use an algorithm to pick the optimal k, then you did not pick k. This algorithm could be using the elbow method for simplicity.
And if you remember, I explained how the optimal k might change from one run of the algorithm to the next, because of the randomized initial conditions of the k-means algo. Well, we can use other data analysis tools to increase the chance of finding the optimal k to any arbitrarily high percentage. So if you want to be 99.9999% sure that you are using the optimal k, then you can run the "optimal k algorithm" as a Monte Carlo algorithm.
There will be a risk of 0.0001%, or whatever percentage risk you tolerate, that the MC algo will return, say, 8 clusters instead of 7 clusters (which was actually the optimal k).
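
The arithmetic behind "repeat it until the risk is as small as you like" can be sketched as a majority vote over independent runs. This is a simplified model, not the procedure of any particular tool: it assumes each run is either right or wrong about k, that runs are independent, and that a single run is right more often than not. `majority_error` and `runs_needed` are hypothetical names, and the 0.9 single-run success rate is a made-up number.

```python
from math import comb


def majority_error(p_correct, runs):
    """Chance that the right k is NOT the strict majority answer over `runs` runs."""
    if runs % 2 == 0:
        raise ValueError("use an odd number of runs to avoid ties")
    # Sum the binomial probabilities of at most half the runs being right.
    return sum(
        comb(runs, i) * p_correct**i * (1 - p_correct) ** (runs - i)
        for i in range(runs // 2 + 1)
    )


def runs_needed(p_correct, tolerance):
    """Smallest odd number of runs whose majority-vote error is within tolerance."""
    if p_correct <= 0.5:
        raise ValueError("voting only helps if a single run is right over half the time")
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    runs = 1
    while majority_error(p_correct, runs) > tolerance:
        runs += 2
    return runs


if __name__ == "__main__":
    # Assumed single-run success rate of 0.9; tolerated risk 0.0001% = 1e-6.
    n = runs_needed(0.9, 1e-6)
    print(n, majority_error(0.9, n))
```

The guard on `p_correct` matters: if a single run picks the right k less than half the time, voting over more runs makes the majority answer worse, not better.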

I have been very generous in that I have willingly gone into the territory you chose, stats and ML, just to show that you will lose even on your home turf.

I went here for two reasons. 1: you mentioned k in an earlier post (which I know is a bogus argument) and 2: in your first reply to me, you displayed that you lacked an understanding of how the clustering actually works in the program that you're using, which led you to reach some false conclusions and derive some misconceptions about the clustering.
It seemed to me that you either didn't understand these fundamentals (which seemed plausible considering it's not your area of expertise) or you understood the fundamentals but had too low an IQ to reason about the implications. You took it as a "win", lol. Instead you should've taken it as an invitation to learn the fundamentals and correct your misconception so that we may reach a higher level of debate and gain new insights. I am absolutely certain that you too hold knowledge where you can school me, probably also 1st year undergraduate stuff that I just OUGHT to know, but simply don't because we don't have the same background.
Engaging with you is an opportunity for me to learn, hopefully, and also an opportunity for you to learn, unless you're closed-minded and think you know it all, despite that clearly not being the case.

But we have hardly even explored the anthropological assumptions in your argument. What are our preconceived ideas about race? How do you KNOW what the clusters are in advance? What is this information that you refuse to talk about? It is absolutely imperative to your argument. You keep saying it over and over, so obviously it is important. What are our preconceived ideas about race?

As a starting point we could use some of the race realist classifications from 100-200 years ago, knowing that they're outdated, wars have happened, genocides have happened and so on, but still we'd expect them to be mostly correct if we account for some historical changes and admixture events over this time period.
We will find that many of the ideas will be wrong but that the overall idea was correct.

[–]milkmender11 (1 child)

This is against all laws of data analysis.

It isn't. The program will always give you the number of clusters you ask for. Your misunderstanding is that you are mistaking BEST FIT for TRUTH. The model is only ever there as a tool to help you answer your question. It doesn't represent reality. Actually, the bad fits are just as important as the best fits. They are negative results, scientifically speaking. None of this method ever leaves the realm of the experiment. It is always a hypothetical approximation of reality which presents a picture that is either more or less useful to answering your question. Science trumps data analysis.

I wrote an email to Josh. Seems like a good professional contact to have. I copied your arguments here (username redacted) and asked him if he agrees with your argument about racial clusters. I'll be sure to share his reply with you when he gets back to me.

This will keep going until someone stops replying or the mods decide to step in, but, for what it's worth, there are hints of truth in the race realist narrative. It isn't scientific, but it doesn't need to be. Here, I'll make a better version of your argument for you:

"Science exists in service of human longevity and well-being. There is a truth that trumps scientific consensus, and that is the truth of which ideas work in the real world and which don't. Sure, you can poke holes in my attempt to scientifically classify races all day, but that won't change the fact that race is immensely important to people, guides their actions, motivates them to kill and hurt and riot. If, one day, there are people banging down your door because you are or aren't one race or another, you'll regret all of this obfuscation you're engaging in here. You'll regret playing science-games to catch me on technicalities, because no amount of scientific reasoning is going to persuade those people to stop crushing your door. At that point, the only 'truth' that will matter to you is the truth of your arsenal and your allies. And we have SEEN this happen, recently. By attacking the people trying to bring attention to the importance of race, you only make it that much more likely that we are overwhelmed by what we do not understand, because you refused to hear us."

[–]Dragonerne Jesus is white (0 children)

It isn't. The program will always give you the number of clusters you ask for.

That's not what I'm saying. Yes, it gives you the number of clusters you ask for.
The point is that you should not BIAS your choice of k by having inspected the data previously. The point of unsupervised learning is exactly that: being unsupervised. In this case it's not the 'worst' mistake a researcher can make, but it's a bad practice.
You can run into situations where it's perfectly acceptable to do, but that's not that relevant for what we're talking about here.

Your misunderstanding is that you are mistaking BEST FIT for TRUTH

Is this some epistemology argument? If so, I'm not really interested in opening that can of worms. We can make a note of it and take this subject up again when we're done with this debate.

Science trumps data analysis.

Science is data analysis.

I wrote an email to Josh. Seems like a good professional contact to have. I copied your arguments here (username redacted) and asked him if he agrees with your argument about racial clusters. I'll be sure to share his reply with you when he gets back to me.

It will be very interesting to hear his response. Just don't spam him. A better approach would be for us to go back and forth, condense some points, figure out say 10 core arguments where we disagree, put it in a proper format and let him return back on those.

This will keep going until someone stops replying or the mods decide to step in, but, for what it's worth, there are hints of truth in the race realist narrative. It isn't scientific, but it doesn't need to be. Here, I'll make a better version of your argument for you:

"Science exists in service of human longevity and well-being. There is a truth that trumps scientific consensus, and that is the truth of which ideas work in the real world and which don't. Sure, you can poke holes in my attempt to scientifically classify races all day, but that won't change the fact that race is immensely important to people, guides their actions, motivates them to kill and hurt and riot. If, one day, there are people banging down your door because you are or aren't one race or another, you'll regret all of this obfuscation you're engaging in here. You'll regret playing science-games to catch me on technicalities, because no amount of scientific reasoning is going to persuade those people to stop crushing your door. At that point, the only 'truth' that will matter to you is the truth of your arsenal and your allies. And we have SEEN this happen, recently. By attacking the people trying to bring attention to the importance of race, you only make it that much more likely that we are overwhelmed by what we do not understand, because you refused to hear us."

This is the sociology perspective, where race isn't biological but instead individuals are racialized by society into "politically convenient allyships". In a way, you can say that "one human race" is exactly that: a politically convenient allyship to usher in multiracial societies. But I'm sorry for getting political, so let's leave it at that and keep ourselves grounded in biology and genetics and data analysis, not sociology.
I somewhat agree with the quote though, and I know that many in the alt right sphere definitely agree with the quote.