[–] Dragonerne [Jesus is white] (19 children)

I will try to teach you some basics of how clustering works. Now, I've never used your program STRUCTURE (I use Python myself), but I've spent the last few minutes reading up on the documentation of your program, and it's no surprise to find that your program uses the "k-means clustering algo". Unsurprisingly, it's also the algorithm used in the link that I sent you.
"The K-Means algorithm needs no introduction. It is simple and perhaps the most commonly used algorithm for clustering."

It is not an advanced clustering algorithm, but that's fine. Here, simple is better. With k-means you have to specify K before running the algorithm. You pick, say, 7 clusters, run the algo, and the algo returns the 7 best-fit clusters, exactly as you specified.

You can pick any number. If you want 10 clusters you set k=10 and the algo will output 10 clusters.
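
To make "you set k and the algo outputs k clusters" concrete, here is a minimal from-scratch sketch of plain k-means in Python. Assumptions, none of which come from this thread: the helper names `kmeans` and `squared_distance` are made up for illustration, the data is 200 made-up 2-D points scattered uniformly at random, and this is generic k-means rather than STRUCTURE itself.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iters=100):
    """Plain k-means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                updated.append(centroids[j])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # nothing moved, so the run has converged
        centroids = updated
    return centroids, labels


# Made-up data (assumption): 200 points with no grouping built in.
data_rng = random.Random(42)
points = [(data_rng.random(), data_rng.random()) for _ in range(200)]

for k in (2, 7, 10):
    centroids, labels = kmeans(points, k)
    print(f"k={k}: got {len(centroids)} centroids")
```

Whatever k you pass in, you get k centroids back, even though these points were generated with no grouping at all. That is all this sketch shows; it says nothing about whether a given k is a good choice.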

In order to get ANY clusters from this program, we must put the desired number of clusters in ourself.

Yes, that goes without saying. Did you read the article I gave you that described how we can estimate the best K to choose?

Your clusters, literally, are arbitrary opinions. They are not in the data.

See I think this is where your lack of understanding of this subject starts. The clusters are not arbitrary opinions. The number of clusters is arbitrary and must be chosen somewhat subjectively, although we can pick an optimal k using 1st year undergraduate methods.
If you pick 2 clusters, the algo will not give you "arbitrary opinions" as you wrote. Instead it will provide the 2 clusters that best split the data.
0. Preconceived ideas about racial groups
1. Choose K
2. Algo returns K clusters
3. These K clusters that our algorithm returned describe the same groups that we had in our preconceived ideas about racial groups.

Please pay attention here, because you seem to have missed this point several times now. We are NOT telling the algorithm to create K racial groups!!! We are telling the algorithm to create K clusters from the genetic data. This is a VERY important distinction.
Why? Because if racial groups were pseudoscience, then we would NOT expect the algorithm to return K clusters that align almost perfectly with our preconceived ideas about racial groups!!
If racial groups were pseudoscience, we might find that the algorithm would return K clusters that happen to correspond to hair type groups, or eye color groups, or nose length, or height, or IQ, or whatever random group you might think of. But AGAINST ALL ODDS, the neutral algorithm returns K clusters that just happen to correspond to our racial groups! This is a wild coincidence.

Hell, you could use data that includes multiple samples from single individuals and ask STRUCTURE to give you more clusters than there are individual humans in your sample!! And it WOULD! You are only talking about a program, STRUCTURE, that you are only just learning about from me. I have been using this program for years.

Please keep the arrogance down. I could write the clustering algorithm that you're using from scratch, it is nothing special, and I think you might want to read an introduction to k-means clustering algorithms, because you seem to have some very basic misunderstandings about the algorithms that you're using.

Here is what you are doing: without realizing it, you are choosing to use a number of clusters that corresponds to the 'races' you want to see. You say they 'correspond.' They do not. You have to first choose an arbitrary number of clusters (a k value) for STRUCTURE to give you.

I don't know if this is a case of low IQ or you just not being familiar with how the k-means algorithm works.
https://youtu.be/HVXime0nQeI
https://youtu.be/4b5d3muPQmA

Here are some videos for you to watch, which I would advise you to go through. Especially if you've been using that program for years and still haven't taken the time to understand the fundamentals of how it works.

Assuming it's not an issue of low IQ (because then we can keep going back and forth forever), we don't tell the algorithm to give us the racial clusters. We tell the algorithm to give us K clusters. And these K clusters HAPPEN to be the racial clusters. You are saying the opposite: We tell the algorithm to give us K racial clusters and then the algorithm gives us K clusters that of course correspond to the K racial clusters that we told it to give us.
What you are saying is NOT what we are doing. We tell it to pick, say, 7 clusters. The algorithm could decide to give us 7 clusters that correspond to red hair, brown hair, black hair, yellow hair, blonde hair, golden hair, orange hair, but that's not what the algorithm returns.
It returns 7 clusters that nicely correspond to 7 racial groups.

It is so great, let's run it on the same sample again!

Whoops. 49. Get it?

The elbow method won't return k=49 after having returned k=7 on the same sample. But I can see some situations where the returned k might differ if we introduce a randomized initial configuration of the k-means algo. However, setting the random_seed to a fixed number solves that "problem" (it's not a problem, it just introduces some randomness into the data analysis, which is not even a bad thing imo).
One time the algo gives you optimal value of k = 7 and other times it gives you k = 10.
This is not arbitrary, and it's not a problem with the concept of race either; it's likely due to how the k-means algo is set up. Randomness does not suddenly introduce any human element to it either.
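
The seed point can be sketched in a few lines. This is a rough illustration, not anyone's actual pipeline: `kmeans` and `wcss` are hypothetical helper names, the three-blob data is made up, and `seed` plays the role of the `random_seed` mentioned above (the parameter name varies by library). The elbow method plots the within-cluster sum of squares against k and looks for the bend.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, max_iters=100):
    """Plain k-means; the seed only decides which data points start as centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iters):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Move every centroid to the mean of its members (empty clusters stay put).
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def wcss(points, k, seed):
    """Within-cluster sum of squares, the quantity the elbow method plots against k."""
    centroids, labels = kmeans(points, k, seed)
    return sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))


# Made-up data (assumption): three blobs of 40 points each.
data_rng = random.Random(0)
points = [
    (data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
    for cx, cy in ((0, 0), (10, 0), (0, 10))
    for _ in range(40)
]

for seed in (1, 2):
    curve = {k: round(wcss(points, k, seed), 1) for k in range(1, 8)}
    print(f"seed={seed}: {curve}")

# Same seed, same starting centroids, same result.
print(wcss(points, 3, seed=1) == wcss(points, 3, seed=1))
```

With the seed fixed, the curve is identical on every run. With a different seed the starting centroids change, so the curve and the apparent elbow can shift, because a single run can settle into a different local optimum.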

Are you of the misconception that race realists believe that there exists a fixed number of races? This is not the case. No one holds that position.

That does not change the fact that your data, the data you cited, with your first link, mostly came from studies which used STRUCTURE.

Are you conflating me with someone else?
I simply want to argue that race is real. You have been failing so far to deal with any argument that I've put forth and you have come to this debate underprepared, showcasing poor knowledge/fundamentals of the underlying algorithms and possibly a mental barrier where you conflate "k clusters correspond to k racial groups" with the incorrect view that we "choose k racial clusters and algo returns our chosen k racial clusters" which is not what is happening. This could also be a simple misunderstanding that you have of how the algo works.

You didn't address the issue of the taxonomic nonexistence of the race concept.

Please reformulate it then, because I fail to see how I haven't dealt with this. The same objections that you're using against race can be used to deconstruct the concept of species.

You then again claim that k is not arbitrary, but fail to recognize that a human told STRUCTURE, or your analysis package in Python, how many clusters to find.

No, it didn't.

Could you explain what you mean by "level of magnification"? Is that a STRUCTURE-specific term?

At best, you can arbitrarily choose a specific scale where you see 3 clusters, and run an algorithm that will show you 3 clusters.

This is against all laws of data analysis.
Please watch this introduction video:
https://youtu.be/fSytzGwwBVw
And then this video:
https://youtu.be/EuBBz3bI-aA

If you view your data and then decide your parameters based on it, then you don't get an unbiased estimator. In this case your estimator will be very biased, and we say that it's "overfitting". You're choosing your model based on your data... can't do that.

[–] milkmender11 (18 children)

Before I start, let us clarify, YOU brought STRUCTURE into this discussion, not me. It's your citation, not mine. The first link you shared uses data from studies that used STRUCTURE as their analysis package. I'm surprised that you freely admit that you didn't know about it and are only reading about it now, because it was the very first thing that you brought into this debate.

Did you read the article I gave you that described how we can estimate the best K to choose?

Yes, but you weren't the first one to show it to me. That's why I asked you to link to it in my first post :)

The clusters are not arbitrary opinions. The number of clusters is arbitrary and must be chosen somewhat subjectively, although we can pick an optimal k using 1st year undergraduate methods.

Well of course the computer doesn't come up with the clusters arbitrarily. It is done through machine learning. Yes, the number of clusters is arbitrary, as you acknowledge here. But there is no such thing as an 'optimal k' outside of a specific question. Again, clusters do not exist in reality. They are scientific tools that allow us to answer specific questions and test specific hypotheses. You are seeing 'optimal' and making the mistake of assuming that 'optimal' means 'correct.' It doesn't mean that. It means 'optimal for the parameters of our research question.' There are as many optimal values for k as there are different ways you can meaningfully analyze the data with different k values. This is how clusters work.

I am noticing a pattern here--you use your space to explain something, then sneak in a (willful?) misinterpretation of what I said, and hope that I let it go unnoticed. It hasn't worked so far, and it isn't going to.

If you pick 2 clusters, the algo will not give you "arbitrary opinions" as you wrote. Instead it will provide the 2 clusters that best split the data.

Here's your problem. You say that the algorithm does not give you arbitrary opinions. However, I have a very good friend who can debunk you right now. I will quote him directly: "The number of clusters is arbitrary." My friend is very smart and you stand no chance of defeating him. In his deep wisdom, he acknowledges that the number of clusters is arbitrary. That assumption, that arbitrary nature, follows the rest of the analysis. It is rooted in something arbitrary. Try to tell a peer reviewer, "Ok, yes, I know I chose the initial value arbitrarily, but I promise, the analysis which proceeded from that arbitrary value is NOT arbitrary!" It is by definition arbitrary. If you want to escape that, you MUST find a non-arbitrary way to determine your original number.

Please pay attention here, because you seem to have missed this point several times now. We are NOT telling the algorithm to create K racial groups!!! We are telling the algorithm to create K clusters from the genetic data. This is a VERY important distinction.

It is a completely unimportant distinction. I am pretty sure that you are again going to try to cross the boundary from science into your own personal opinion of how many races there ought to be, fail to justify why that number is correct, and hope that I don't notice that you just spouted a bunch of 101 cluster analysis stuff that you found just now on Google, all so that it would seem more legitimate when the science suddenly vanishes out the window like a stale fart.

Why? Because if racial groups were pseudoscience, then we would NOT expect the algorithm to return K clusters that align almost perfectly with our preconceived ideas about racial groups!!

Lol. Thanks. Really, I'm not psychic, I just know you already. I like you! Always have. What are our preconceived ideas about racial groups? You keep talking about these preconceived ideas over and over again. Preconceived ideas. Preconceived ideas. We have preconceived ideas. What ideas?? I asked you in my last post and you ignored the question. WHAT ARE YOUR PRECONCEIVED IDEAS ABOUT RACE? WHAT ARE OUR PRECONCEIVED IDEAS ABOUT RACE? Your preconceived ideas are not likely to be the same as mine. There are dozens, scores, hundreds of preconceived ideas of how many races there are. Sure, we tend to make small lists, but we are humans. We make small lists of everything. Small lists of gods, small lists of types of foods, small lists of animals, small lists of races.

But I know the answer already. By 'preconceived ideas,' you mean, quite specifically, the ideas of racialist thinkers of the 19th and 20th centuries, and their intellectual descendants--or, rather, you THINK you mean that. You don't know what they actually said. And hoo boy, buddy, I'll tell you what--you know I'm strong on genetics and cluster analysis. I know you are smart enough to recognize that no matter how much you pretend to call me uninformed. But I'm equally informed on racialist thought in the 19th and 20th centuries. That's where a lot of the anthropology comes in.

What preconceived ideas? The preconceived ideas of Thomas Huxley, of E.B. Tylor, of Blumenbach or Linnaeus or Meiners? Let's talk about their preconceived ideas about race. Let's name some races. Anglo-Saxon (dark & white variety), Teuton, Laplander, Finn, Sarmatian/Slav, Hindu, Celtic, Nord, Assyrian, Chaldean, Mede, Scythian, Parthian, Philistine, Phoenician, Jew (Jesus is white), Georgian, Circassian, Mingrelian, Armenian, Turk, Persian, Arabian, Afghan, Egyptian, Abyssinian, Guanche. Whew!! We have barely even covered any geography, and we have a score or more races!!

British physiologist William Lawrence, one of the most important of the racialist thinkers, wrote: "The Caucasian variety encompasses numerous races." Is this the preconceived idea of race you had in mind? I have a feeling that you disagree with Lawrence. Will you classify Caucasian as a race? No? European, then? Preconceived ideas, indeed! I think Lawrence is on to something.

The list goes on. There has NEVER been cohesion about 'preconceived racial ideas.' There is a snapshot in time, right now, where you believe there is some kind of unity of thought on this subject. There is not, and what scant unity you might try to point to unravels completely just 100 years back. Nevermind 200. In fact, for most of Western history, 'race' was primarily wielded as a proxy for distinct kingdoms, a form of crude propaganda which tries to invoke phenotypic differences to drum up nationalistic fervor amongst a populace that was usually closely related to the enemy. This is exactly what your flair means. Your enemy is the Jews. Jesus was white. Jesus was a Jew. Your enemy is your fellow whites, just the ones who have more money and won't share it with you.

[–] Dragonerne [Jesus is white] (0 children)

Before I start, let us clarify, YOU brought STRUCTURE into this discussion, not me. It's your citation, not mine. The first link you shared uses data from studies that used STRUCTURE as their analysis package. I'm surprised that you freely admit that you didn't know about it and are only reading about it now, because it was the very first thing that you brought into this debate.

No I didn't.

Again, clusters do not exist in reality. They are scientific tools that allow us to answer specific questions and test specific hypotheses. You are seeing 'optimal' and making the mistake of assuming that 'optimal' means 'correct.' It doesn't mean that. It means 'optimal for the parameters of our research question.' There are as many optimal values for k as there are different ways you can meaningfully analyze the data with different k values. This is how clusters work.

"It means 'optimal for the parameters of our research question.'" Yes, and when our research question is how to best group people based on their genetics, and it spits out 7 clusters that just happen to correspond to racial groups, then that answers our research question pretty well.

Here's your problem. You say that the algorithm does not give you arbitrary opinions. However, I have a very good friend who can debunk you right now. I will quote him directly: "The number of clusters is arbitrary"

Yes, the algorithm does not give you arbitrary opinions. As I said in my previous post, the number of clusters is arbitrary. Your friend is right about that.

That assumption, that arbitrary nature, follows the rest of the analysis. It is rooted in something arbitrary. Try to tell a peer reviewer, "Ok, yes, I know I chose the initial value arbitrarily, but I promise, the analysis which proceeded from that arbitrary value is NOT arbitrary!" It is by definition arbitrary. If you want to escape that, you MUST find a non-arbitrary way to determine your original number.

So first of all, as a general thing, this is factually wrong, because if you wanted to, you could train a simple deep layered network to optimize over an arbitrary input configuration. But that's more technical and not really relevant for our topic.
The arbitrary part of k-means is choosing k; how the clusters are made is not arbitrary. The clustering won't suddenly put Khoisan next to Swedes or cluster English/Aboriginals together + French/Paraguayans together. It is the number of clusters that is arbitrary, not the clusters themselves.
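
As a toy check of the "k is chosen, the grouping is computed" point, the sketch below runs plain k-means with k=2 from five different starting seeds and compares the resulting groupings. Everything here is an assumption for illustration: the helper names (`kmeans`, `as_partition`) are hypothetical, and the data is 100 made-up one-dimensional values in two tight, far-apart bunches, which is far cleaner than any real dataset.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, max_iters=100):
    """Plain k-means (Lloyd's algorithm). Returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random data points as starting centroids
    labels = []
    for _ in range(max_iters):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Move every centroid to the mean of its members (empty clusters stay put).
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


def as_partition(labels):
    """Group point indices by label, so the label numbers themselves do not matter."""
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, set()).add(index)
    return {frozenset(group) for group in groups.values()}


# Made-up 1-D data (assumption): 50 values near 0, then 50 values near 100.
data_rng = random.Random(7)
points = [
    (center + data_rng.uniform(-1.0, 1.0),)
    for center in (0.0, 100.0)
    for _ in range(50)
]

partitions = [as_partition(kmeans(points, 2, seed)) for seed in range(5)]
print(all(partition == partitions[0] for partition in partitions))
```

On data this cleanly separated, every seed ends up with the same two groups; only the label numbers can differ. Messier data can get stuck in different local optima, so this shows what k-means does in the easy case, not a guarantee.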

Try to tell a peer reviewer, "Ok, yes, I know I chose the initial value arbitrarily, but I promise, the analysis which proceeded from that arbitrary value is NOT arbitrary!" It is by definition arbitrary

That's standard practice in data analysis.

It is a completely unimportant distinction.

No, it's not. Until you understand this distinction you will never understand why you are mistaken. This is why I told you to pay close attention. This distinction is the basis of your misconception. It either stems from you not knowing how k-means clustering works or from you having a very low IQ and having trouble understanding how the distinction matters. In the post down below you write "I've barely made any reference to hair, eyes, nose, etc. You are the one who keeps bringing that up, because you are used to people raising those attributes to try and attack the idea of race. But I'm not doing that. These aren't the arguments I made." but this is exactly why this distinction is important.
Why do the k-means clusters align with our concept of race (putting English, Swedes, French together in 1 cluster, Khoisan, Bantu in another, Chinese, Koreans in another)? If race were a bogus concept, we would expect the clusters to put individual Swedes in different clusters, the French mixed with the Koreans, other Koreans mixed into a cluster with a subset of some Africans, and so on.
But it is not randomly throwing individuals together into clusters, it neatly puts them into racial clusters. This is the killer argument and before you understand the distinction, you will forever be mistaken.

What are our preconceived ideas about racial groups?

Things like Swedes are genetically similar to other Swedes. Whites are genetically similar to other whites and so on. You could break up Sweden and find that some subset of Swedes are closer together than other subsets of Swedes.

And hoo boy, buddy, I'll tell you what--you know I'm strong on genetics and cluster analysis

You are definitely not strong on cluster analysis. I've found no faults with your understanding of genetics so far. However with cluster analysis you give off the vibe of someone very uneducated.

There are dozens, scores, hundreds of preconceived ideas of how many races there are.

You seem to be stuck on the misunderstanding that any race realist thinks there is a FIXED set of races. You're debunking a position that literally no one has. We don't care if you set k=3 or k=7 or k=21. They all show a clustering of racial groups and that's what race is. Swedes together, French together, Bantus together etc.

Let's name some races. Anglo-Saxon (dark & white variety), Teuton, Laplander, Finn, Sarmatian/Slav, Hindu, Celtic, Nord, Assyrian, Chaldean, Mede, Scythian, Parthian, Philistine, Phoenician, Jew (Jesus is white), Georgian, Circassian, Mingrelian, Armenian, Turk, Persian, Arabian, Afghan, Egyptian, Abyssinian, Guanche. Whew!! We have barely even covered any geography, and we have a score or more races!!

The brilliant part is that if we cluster individuals from these groups, they will likely cluster together in the same manner as we humans cluster them. Laplanders together, Teutons together, Chaldeans together etc. If some groups cluster with other groups, it's likely due to close proximity / racial mixing between the groups or a shared ancestry.

"The Caucasian variety encompasses numerous races." Is this the preconceived idea of race you had in mind?

Yes

This is exactly what your flair means. Your enemy is the Jews. Jesus was white. Jesus was a Jew. Your enemy is your fellow whites, just the ones who have more money and won't share it with you.

Let's keep the discussion on topic.

[–] milkmender11 (1 child)

If racial groups were pseudoscience, we might find that the algorithm would return K clusters that happens to correspond to hair type groups, or eye color groups, or nose length, or height, or IQ, or whatever random group you might think of.

I've barely made any reference to hair, eyes, nose, etc. You are the one who keeps bringing that up, because you are used to people raising those attributes to try and attack the idea of race. But I'm not doing that. These aren't the arguments I made. Again, you are doing what you know. You are responding to the arguments you know how to deal with, not the ones that I am raising. I'm not thinking of any random groups. That's what you're thinking of. I never said race does or should correspond to these attributes, in fact I mocked that idea in my last post. I said that caring about those things first is monkeybrain stuff, the obvious externalities that we evolved to notice so we could make quick and dirty approximations of who we are likely related to and who we are likely not related to.

[–] Dragonerne [Jesus is white] (0 children)

This post is exactly why you are mistaken. You misread what I wrote.

We DON'T find that the K clusters correspond to hair type clusters
We DON'T find that the K clusters correspond to eye color groups
We DON'T find that the K clusters correspond to nose length
We DON'T find that the K clusters correspond to whatever

We DO find that the K clusters correspond to RACIAL GROUPS

[–] milkmender11 (14 children)

But to answer your actual point, why on earth would we expect k clusters to align to those attributes?? As I just explained, WE care about that stuff. WE notice that stuff. To a computer, a gene is a gene. A gene that makes a blue eye is no more or less a gene than one that is not expressed. We definitely would not expect k clusters to align to those attributes in ANY case, whether race is pseudoscience or not.

Whoever was using this argument you are responding to, I agree that it isn't a great one. That's why I don't use it.

But AGAINST ALL ODDS, the neutral algorithm returns K clusters that just happens to correspond to our racial groups! This is a wild coincidence.

• what racial groups
• what racial groups
• what racial groups

Please tell me what these racial groups are. Please tell me what our 'preconceived ideas about race' are. You keep saying this, as if this is some information we all know. It isn't. White people in the 19th century didn't know this. The world doesn't 'know' this. I don't know how many different ways I can ask you to clarify here, but you keep ignoring me. Again, though, I know why. You are hoping I gloss over this. You know that being put on the spot and asked to say precisely what 'our preconceived ideas about race' are opens up a biiiig can of worms, and you do not want to put yourself in an even more defensive position where it will be even easier for me to poke holes in your arguments. I mean, I readily admit that I have the high ground here. You are the one claiming a positive position: "race is. Race is, this. Our ideas are, this." All I have to do is nitpick. I don't need to be right, all I have to do is show that you aren't right.

I could write the clustering algorithm that you're using from scratch, it is nothing special

I'm sure you could. I presume you are a professional or at least a serious hobbyist programmer, and I sure as heck know you aren't a geneticist or a researcher who does cluster analysis.

I don't know if this is a case of low IQ or you just not being familiar with how the k-means algorithm works.
https://youtu.be/HVXime0nQeI
https://youtu.be/4b5d3muPQmA

And there it is. I see this as the moment where I won the debate, actually. You may recall, my first post in this thread was a reply to someone who descended into calling another poster a retard. In my reply, I asked, "Are you going to call me a retard now as well?" And here, you just did. The very first point I made in this thread is that when you opt to just call someone a retard, it can look quite like you are just frustrated because you are losing.

I know you aren't going to change your mind, not tonight. That isn't how these debates work. But I already know that you know I am not stupid. I know you aren't stupid either. I already did get through to you--you are not going to change your mind, I'm not a miracle worker, but you are going to be more careful with your wording in the future. You'll refine your arguments. See, I know so much about you. You feel that your intelligence is not sufficiently appreciated. And it isn't. You want to weigh in on social issues, many of which you have vastly superior answers to than the majority of society, and yet you are silenced on grounds outside of your control. So you throw your lot on with Race and Country to reclaim some of that value that is owed to you. But the truth is, man, you're a mongrel. Europeans were already mostly mongrels in the 20th century. You're no shining beacon of Whiteness. You're a smart guy who works a job he mostly dislikes, desperate to find some secret truths that validate your intelligence, because society refuses to do so. I fucking know you, man. Of course I know you. When was the last time you even wrote back and forth this much with someone? We're basically penpals now. I'm an anthropologist, I've been reading between the lines this whole time. I know exactly who you are.

[–] Dragonerne [Jesus is white] (0 children)

But to answer your actual point, why on earth would we expect k clusters to align to those attributes?

Unless the clusters are totally random, we would expect the clusters to align to SOMETHING. Those attributes were just examples of what the clusters DON'T align to. You are free to be creative and find other attributes, phenotypical or genotypical or otherwise, that the clusters DON'T align to.
The interesting part is that AGAINST ALL ODDS, the clusters align to race.
Ask yourself the opposite question (of the quoted): why would we expect to find that the k clusters align to race if race was a bogus manmade concept that had no relation to biology? How crazy is it that the k clusters just happen to correspond perfectly to our bogus manmade concept? That's like 1 in a quadrillion. And not just once? Every time we run the algo and no matter the initial conditions. Definitely a weird coincidence that keeps repeating itself.

Please tell me what these racial groups are.

The idea that we can cluster human beings into clusters of genetic similarity. A Swede won't be more similar to a Gambian than he is to another Swede. You might find examples of "Swedes" being more similar to, say, Danes than to other Swedes, but this is due to the color spectrum fallacy.

And there it is. I see this as the moment where I won the debate, actually. You may recall, my first post in this thread was a reply to someone who descended into calling another poster a retard. In my reply, I asked, "Are you going to call me a retard now as well?" And here, you just did. The very first point I made in this thread is that when you opt to just call someone a retard, it can look quite like you are just frustrated because you are losing.

No. You display a repeated misconception of how the k-means algorithm works when it does its clustering. This is why I sent you two introduction videos so that you may educate yourself. This would help you correct your misconception and help you understand my argument. Because right now you do NOT understand my argument.
This could either be because you do not understand how k-means works or because you have a low IQ. I assume, and I really hope, that it is because you don't understand how k-means works. If it's an issue with IQ, then we will be stuck.
You might see this as a winning moment, but I am merely responding in kind. If you do not want this kind of response, then cut out your arrogance. I have also told you to do this or I would respond in kind. And I will continue to do this as long as you continue to do this.

Your first response showed me that you lacked basic knowledge of this topic, and to be honest I felt that it was a loss. I was hoping to learn something new that I had not considered, or gain a new insight. I am sure that you have knowledge that I don't have that can help broaden my perspective, but before we reach that, you will have to humble yourself and start listening to the arguments I am putting forward, because you are not listening or not understanding them. If it is a lack of understanding, watching a few videos could help sort out that misunderstanding and we could move on from there onto more interesting insights and angles. This is 101 stuff of cluster analysis that you are not understanding, and it's a shame.

But I already know that you know I am not stupid. I know you aren't stupid either.

Yes, but someone can be low IQ and not be stupid. Most people who aren't stupid have a sufficiently high IQ, and I assume you to be high IQ too, and that you just lack knowledge in certain aspects of cluster analysis, which leads you to a misunderstanding of how something works.
You seem to have the misconception that it's a given that the k-means clustering algo returns k racial clusters, but it's not a given. We didn't ask it to give us RACIAL clusters. We just told it to give us K whatever clusters and it just happened to return whatever=racial. Why didn't it give us K eye color clusters? Because human beings aren't clustered by eye color, but by race.

I know exactly who you are.

You really don't. You were right about one thing. I haven't discussed this topic in a long time, because 1: it's already settled science and 2: it's a banned topic on social media, so it's literally impossible to have back and forths about. You're an anthropologist? The anthropology subreddit has this as their rule 3:
""Race realism", "human biodiversity", conspiracy theories, and pseudoscience will be removed as will any other content that is incorrect or not supported by reputable scholarship."
So yeah, it's hard to challenge someone with expertise on this subject when it's a banned subject.
Like you told another user, you wouldn't want to dox yourself, because just having a debate with alt righters would put your career in jeopardy. Well, it's the same for me, so you might understand why I don't regularly have a back and forth like this.
I used to, back before social media started mass censoring taboo subjects.

Europeans were already mostly mongrels in the 20th century.

Oh, noo!

[–] milkmender11 (12 children)

But, just so you don't accuse me of trying to obscure the point, I'll answer: your point... it's moot. You debunked yourself already. Yes, the k value is still arbitrary. You admitted that. You admitted that the choice of the number is arbitrary. That proceeds into the analysis. Once one arbitrary distinction is made, the entire analysis remains, at least to that extent, arbitrary. It doesn't necessarily become MORE arbitrary during the application of the algorithm, unless other arbitrary assumptions have been woven into the code. And just because an analysis contains an arbitrary component does not mean it is incorrect or useless. But it at least maintains the arbitrary quality from the first decision of the number, as you already said.

Once an element of arbitrary decision making enters your analysis, the entire analysis remains subject to challenge on those grounds, even if no additional arbitrary decisions are made.

This isn't the night before your stats final in junior year, bro. I'm a geneticist.

I did watch your videos, because I know what's up here. You've gotten desperate. When people get desperate, they mess up. At this point in the debate, I know these videos will back ME up more than you. So let's dig in.

YOUR source says: "In this case the data make three relatively obvious clusters. But rather than rely on our eye, let's see if we can get the computer to identify the same three clusters."

Sounds familiar! I refer you to my previous post:

Its intended function is very much like an ANOVA or MANOVA. It is a confirmation test, a way to say: "Hm, I am pretty sure I see 3 clusters here, at this scale, and I do indeed want to use 3 clusters in my analysis. However, I worry that if I eyeball it like this, the peer reviewers will take issue with that. What I can do instead is use this algorithm to confirm that, at this scale, the computer also sees 3 distinct clusters. It might seem obvious, but this way, my reviewers won't be able to chastise me for eyeballing the chart. It is obvious that I see 3 clusters here, but this is just one little test I can use to not make it seem like I am choosing to see the 3."

Damn. That's exactly what I effin' said. YOUR source is on MY side! This is a CONFIRMATION test designed to help you double down on your assumption! You don't even use it until you have ALREADY decided that you want to see x number of clusters! It literally says it, right here, in YOUR source!

It goes on: "Step 1: select the number of clusters you want to identify in your data." That I want to identify? That doesn't sound very scientific. That sounds like a fancy way to eyeball something. Which is exactly what I said in my last post, before you shared this video. And exactly what the video says. Hell, they even reference eyeballs, too. I could not have asked you to send a source that is more damning to your own position and more supportive of mine

[–]DragonerneJesus is white 3 insightful - 1 fun3 insightful - 0 fun4 insightful - 1 fun -  (0 children)

Once an element of arbitrary decision making enters your analysis, the entire analysis remains subject to challenge on those grounds, even if no additional arbitrary decisions are made.

Well put. Its open to challenge on those grounds. On those grounds. You are free to attack that we have 7 clusters instead of 5 but its not really relevant to the concept of race.
Common misunderstandings, often intentionally so, consist of 1: Races are genetically distinct and 2: Fixed number of races
Neither of those opinions are held by any race realist. But in every study that tries to debunk race, they debunk those points or a variant of those points. Points that no one holds.

YOUR source says: "In this case the data make three relatively obvious clusters. But rather than rely on our eye, let's see if we can get the computer to identify the same three clusters."

This is an introductory video. He is showing you how the algo works. He is showing you that it works.
You don't want to actually eye ball it and then determine the number of clusters. Like.... in most data analysis we work with thousands or millions of dimensions and you can't exactly "eye ball" that.
If you ran the elbow test, you would also find that k=2 is worse than k=3 and that k>3 does not provide any meaningful improvement.
The whole point of these algos is to not rely on our eyes but to let the computer cluster high dimensional data (like genetics)
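A minimal sketch of the elbow check mentioned above, run on synthetic 2-D points rather than any real dataset. The `kmeans` helper, the three blob centers, and the 30 restarts are all assumptions made for illustration; this is plain Lloyd's algorithm with random initialization, not the code from the video or from any particular package.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]

def kmeans(points, k, seed, iters=100):
    """Lloyd's algorithm; returns (centers, labels, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization from the data
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # keep the old center if a cluster ends up empty
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, inertia

# Synthetic data: three well-separated blobs (hypothetical values).
data_rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
          for cx, cy in blob_centers for _ in range(40)]

# Best (lowest) within-cluster sum of squares over 30 random restarts, per k.
inertia_by_k = {k: min(kmeans(points, k, seed)[2] for seed in range(30))
                for k in range(1, 7)}

for k, w in inertia_by_k.items():
    print(k, round(w, 1))
```

On data like this the printed values should fall steeply up to k=3 and flatten afterwards, which is the "elbow". The restarts matter because a single random start can get stuck in a poor local optimum.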

You don't even use it until you have ALREADY decided that you want to see x number of clusters! It literally says it, right here, in YOUR source!

Yes, but I think maybe you are stuck on the assumption that race realists care about a FIXED number of races? You can put x as 3, 5, 7, 23 if you want. Swedes wont be put into the same cluster as Ethiopians, while Danes are put in another.

I could not have asked you to send a source that is more damning to your own position and more supportive of mine

What I had hoped you got from the video was that we choose the number of clusters, not HOW it clusters. "HOW" here being that it just so happens to cluster on RACIAL grounds, not any other feature/attribute/whatever.
The algorithm doesn't even know that it is looking at genetic data. It could've been about finance and it thought that it created clusters that describe different spending habits. Its just getting numbers and returning some clusters.
To our surprise these k clusters just so happen to be RACES.

This is me repeating the distinction hoping you at some point will start to understand that the distinction is important.

[–]milkmender11 1 insightful - 1 fun1 insightful - 0 fun2 insightful - 1 fun -  (10 children)

Now, this is medicine. About cancer, specifically. You might not recall since you haven't watched this since junior year before that test, but it's ok to make arbitrary assumptions when we have a purpose. If it helps to cure cancer, what does it matter if we assumed some clusters that might not be totally accurate? We're trying to preserve life here. If you made THAT argument for race, a social, non-scientific one, the anthropologist in me would have more sympathy. Show me that at the end of the day it is best for everyone to see things your way, but don't try to claim the mantle of science when it doesn't fit you.

Of course, this analysis COULD be wrong. What if there is a 4th cancerous cluster, a small one, just forming, that the assumption of 3 misses? The chemo might miss it. The surgeon might miss it. This hypothetical person could literally die due to a false assumption.

This entire video is rooted in the premise that we have a goal: identify cancer cells. If we fail to identify cells, that is bad. A bad-fit model fails to identify the cells. A perfect model identifies all of them. The correct number of clusters depends on our question, in this case, "How many cancerous cells are there, so we can kill them?" If we apply this logic to race, there is no clear goal. You keep talking about 'preconceived ideas' that you refuse to define. But there are so many different kinds of preconceived ideas about race. There either IS or IS NOT a cancer cell. It is not social. But 'preconceived ideas about race' ARE social. In order to find the reality of it, we must ask WHY humans form ideas about race, how they go about doing it, what parameters we use to do it, how those parameters have changed over the development of society, and how our criteria for racial classification evolved in the ancestral environment. You don't seem interested in any of those scientific questions. You just want to stop at 'preconceived ideas about race' because that is what is important to you, not the scientific method.

By the way, StatQuest is run by the genetics department at UCNC. Josh seems like a smart fellow. If you came to Josh and asked him about race, what do you think he would say? I mean, it's an academic genetics department. They tend to be pretty woke. Do you think he would agree with your assessment of race here?

Let's look at your first video now (I watched them out of order): "Step 1: Start with a dataset with known categories."

Well, fuck. Sorry, man. Step 1 knocked you out of the game again. KNOWN CATEGORIES. So we are supposed to come into this KNOWING the categories at STEP ONE, so says your source. And yet you have been speaking this whole time as if the analysis will GIVE us the categories. See, this is also what I said in my last post. I knew you had to try and shoehorn in 'preconceived ideas about race' that you refuse to define, because you know that this analysis requires KNOWN CATEGORIES at STEP ONE. You can't get this stuff past me, man.

So, the point of the video is to try and classify an unknown category based on a known category. So, presumably, you share this to try and argue that if we KNOW person A is African, we can determine the racial category of person B who happens to be a neighbor of person A in the cluster analysis. But don't you see the problem? You already classified person A before you started. You didn't arrive at their race through the analysis, you presumed that you have some other means of knowing that 'African' is their correct racial designation. Which is all well and good--probably, most of us would agree about what an 'African' looks like, as long as we don't include white South Africans, Egyptians, Moroccans, Mauritians, etc. etc. That is a lot of exceptions. But we are walking into this with the categories in-hand. As step 1. Not determined by objective scientific analysis. And I didn't even invoke the k problem, which persists in this video. Specifically, Josh points out that if you choose a k of 1, it will classify based on 1 neighbor. If you choose a k of 11, it will classify based on 11 neighbors. Josh is a great educator! I already knew this, but it is so nice to have him debunk your arguments for me. Wait, did you share these videos, or did I??

Thanks for the homework. It does come across like you are trying to bully me into not replying by just regurgitating links at me because you can't make your case effectively, but I like refreshers so I enjoyed watching them.

Assuming its not an issue of low IQ (because then we can keep going back and forth forever),

I know. There are three ways to 'win' online like this. One, someone gets tired of typing and leaves. That isn't much of a victory though. Two, the loser gets frustrated and starts calling the other person a retard with a low IQ. Also not satisfying. Three, and I know I already won here, the loser realizes that they have to refine their arguments. They don't change their mind, not yet, but they are more careful with their words the next time around, or perhaps avoid the debate entirely for fear of getting spanked again.

we don't tell the algorithm to give us the racial clusters. We tell the algorithm to give us K clusters. And these K clusters HAPPEN to be the racial clusters.

That's precisely the problem. You are saying that you know the clusters in advance. You know what the races are in advance. You are starting from the premise, "I am correct. I am right. Let's use this program to find out more about my correct answer." This is the definition of circular logic. You are wrong before you even started! YOU DO NOT KNOW THE CATEGORIES! You won't even tell me them, even though I keep asking!

[–]DragonerneJesus is white 3 insightful - 1 fun3 insightful - 0 fun4 insightful - 1 fun -  (0 children)

Of course, this analysis COULD be wrong. What if there is a 4th cancerous cluster, a small one, just forming, that the assumption of 3 misses? The chemo might miss it. The surgeon might miss it. This hypothetical person could literally die due to a false assumption.

It doesn't really matter to the concept of race if there is a 4th race that we are not getting, because we used k=3

If we apply this logic to race, there is no clear goal.

People have done studies that try to align genetic clusters with self-identified race and they match like 98-99%, with the mistakes being boundary cases, which is to be expected, because the races are not genetically distinct but continuous along a spectrum.

But 'preconceived ideas about race' ARE social.

That's true. It is interesting that our social concept of race align so perfectly with the biological reality.
And here I am not talking about 'race' as defined by sociologists where its mostly just a political label or a label that society racializes you into.
Because if humanists get to define all words, then sure, I will happily admit that race is a bogus concept right here and now and that race has no biological meaning. With these definitions that humanists use, race as a concept has been HEAVILY debunked by science. Irrefutably so.
The only problem is that no one ever claimed that debunked position.
Whats not debunked is the position of race realists.
The simple fact that you can use kmeans on genome wide population data and get racial clusters, 5 clusters, 3 clusters, 7 clusters, x clusters, is proof that race is real. And I will repeat myself again, because you failed to understand the distinction. We can use k-means to get k racial clusters - this proves that race is real, because kmeans does not return k racial clusters, it returns k clusters. The fact that these clusters are RACIAL clusters and not EYE COLOR clusters or SPENDING HABIT clusters or whatever else clusters is the proof.

By the way, StatQuest is run by the genetics department at UCNC. Josh seems like a smart fellow. If you came to Josh and asked him about race, what do you think he would say? I mean, it's an academic genetics department. They tend to be pretty woke. Do you think he would agree with your assessment of race here?

StatQuest is hilarious and he is better at explaining some concepts than a lot of professors or textbooks. I always recommend people to watch his videos because he breaks every concept down in ELI5 formats. Would he agree with me on race? I heavily doubt it

Well, fuck. Sorry, man. Step 1 knocked you out of the game again. KNOWN CATEGORIES.

You really should've watched these videos to learn and not to win a debate. Ok, so the video is about K-nearest neighbors, one of the simplest clustering algos: you start with known categories and then you see which cluster an unknown person is nearest to. The Kmeans algo is slightly different but builds on the same idea, and your program STRUCTURE uses a variant of the Kmeans algorithm.
In Kmeans you don't start with KNOWN CATEGORIES, but rather (usually) random initialization of the "center" of each k cluster.
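To make the difference concrete, here is a minimal sketch on toy 2-D points. The data, the "A"/"B" labels, and the helper names are hypothetical. K-nearest neighbors (usually described as a supervised classifier) cannot run without the labels, while k-means only ever sees the coordinates and starts from randomly chosen centers.

```python
import random
from collections import Counter

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def knn_predict(train_points, train_labels, query, k):
    """k-nearest neighbors: majority vote among the k closest LABELED points."""
    nearest = sorted(range(len(train_points)),
                     key=lambda i: dist2(train_points[i], query))[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

def kmeans(points, k, seed, iters=100):
    """k-means: no labels at all, only k and a random initialization.
    Returns (labels, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)

    def assign():
        return [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]

    for _ in range(iters):
        labels = assign()
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign()
    inertia = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia

points = [(0, 0), (1, 0), (0, 1), (1, 2), (8, 8), (9, 8), (8, 9), (10, 9)]
known_labels = ["A"] * 4 + ["B"] * 4  # required by k-NN, never shown to k-means

# k-NN: classify a new point from its 3 nearest labeled neighbors.
predicted = knn_predict(points, known_labels, (0.5, 0.5), k=3)

# k-means: ten random starts, keep the run with the lowest inertia.
cluster_ids, best_inertia = min((kmeans(points, 2, seed) for seed in range(10)),
                                key=lambda run: run[1])

print(predicted, cluster_ids)
```

The cluster ids that k-means returns are arbitrary integers. Which id ends up on which group depends on the random start, and nothing in the output names the groups.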

That's precisely the problem. You are saying that you know the clusters in advance. You know what the races are in advance. You are starting from the premise, "I am correct. I am right. Let's use this program to find out more about my correct answer." This is the definition of circular logic. You are wrong before you even started!

No, kmeans doesn't use KNOWN CATEGORIES, so your argument is moot. This also shows me that I was completely right. You DID NOT know the fundamentals of how your program STRUCTURE does the clustering. STRUCTURE does not use k nearest neighbor (knn). I shared that video because I wanted to teach you the basics. Knn is a prerequisite for kmeans. I hoped to build up your knowledge of cluster analysis, because this is basics and it would help you understand my point and help you apply your knowledge of genetics more appropriately. I am certain that once you get the fundamentals of how clustering actually works, you can then expand further and provide new insight to me using some knowledge where you have more expertise than I do.
I'm trying to learn and I am humble enough to realize that you likely know some stuff better than me, but you seem unable to humble yourself and the result is that you come off as arrogant and uneducated. Like in this comment you're heavily trying to debunk my arguments about your own program and you don't even know the basics. You confuse knn with kmeans.
I'm not really annoyed because I've come to expect this kind of behaviour from people in your camp of the debate. You've been taught that we are stupid and that you know it all and then your camp has outlawed all discussion on the topic so that your view is never challenged.
When your type then gets into a debate with one of us, you're misrepresenting our views (because you've been lied to by your educators) and you're debunking concepts that no one holds and you're displaying an extreme lack of knowledge on the subject, often citing 50-year-old fallacies. I'm not saying you are doing all those things, but this is what your type usually does and that's why I have come to expect this behaviour and am more tolerant towards your admittedly rather nasty & unmannered behaviour.

With that said I am pleased that you're not resorting to the 50 year old fallacies of "more within than between" that 99% of students are still taught in class. Got to uphold that pseudoscientific narrative, eh?

[–]milkmender11 1 insightful - 1 fun1 insightful - 0 fun2 insightful - 1 fun -  (7 children)

More from your source: "Low values for k (like k=1 or k=2) can be noisy and subject to the effects of outliers. Large values for k smooth over things, but you don't want k to be so large that a category with only a few samples in it will always be outvoted by other categories."

Well, that's convenient. You keep saying that the clusters match our 'preconceived ideas about race,' but there are quirks in this method of analysis that make it not work well with low or high k values. So there is a sweet spot of values for k, which happens to correspond to the numbers of races that you like (even though you refuse to give me any of those numbers) and do not correspond to the values for k which you dislike. To be clear, there is no fundamental reason why a low or high k value would not be the best fit for the model. With a perfect dataset, with perfect computing power, we would be able to eliminate nearly all noise. It just so happens that the limitations of the method do not work very well with low or high k values, so you are going to reject low or high numbers of races out of hand. You are, as programmers often do, letting the algorithm do the thinking for you. What if your tool is not fit for the job? All you have is a hammer, so your problem looks like a nail.

It returns 7 clusters that nicely correspond to 7 racial groups.

Is this it? Are you finally going to share these mysterious 'preconceived ideas about race' with me? Or are you only using 7 because I used 7 earlier? Please tell me what these known categories are. I certainly don't know them. The racialist thinkers of the past didn't know them. I doubt most people you ask would be able to guess precisely what your preconceived ideas are, nor would you be able to guess theirs.

[–]DragonerneJesus is white 3 insightful - 1 fun3 insightful - 0 fun4 insightful - 1 fun -  (0 children)

Well, that's convenient. You keep saying that the clusters match our 'preconceived ideas about race,' but there are quirks in this method of analysis that make it not work well with low or high k values. So there is a sweet spot of values for k, which happens to correspond to the numbers of races that you like (even though you refuse to give me any of those numbers) and do not correspond to the values for k which you dislike. To be clear, there is no fundamental reason why a low or high k value would not be the best fit for the model. With a perfect dataset, with perfect computing power, we would be able to eliminate nearly all noise. It just so happens that the limitations of the method do not work very well with low or high k values, so you are going to reject low or high numbers of races out of hand. You are, as programmers often do, letting the algorithm do the thinking for you. What if your tool is not fit for the job? All you have is a hammer, so your problem looks like a nail.

See? Building your fundamentals was the right thing to do in order to get us into a higher quality of debate.
I think you might raise a good point here that I would like you to expand upon if you so desire. I get the overall gist of your argument and it might hold some merit thats worth exploring some more.

Is this it? Are you finally going to share these mysterious 'preconceived ideas about race' with me? Or are you only using 7 because I used 7 earlier?

Yes, only using 7 because you used 7 earlier.
You could put 3 races and it would maybe return europeans, africans and asians. You could then put 4 and it would maybe return europeans, africans, asians and oceanians. Or 5 and it would include latinos/hispanics/indians. As you increase the number of clusters it will fine tune the races.
If you have 1000 samples and you put k=1000, then it would simply return each sample as a race, which is why its not very good with high k.
Likewise, if you pick too low a k, it will combine "clusters that ought to be" in weird ways. Again, it's worth repeating, since you didn't understand why the distinction earlier was important. It's not the number of clusters that is important, it's the fact that the clusters correspond to RACIAL clusters that is important. We didn't tell the algo to find k racial clusters, we told it to find k whatever clusters, and these "whatever clusters" happened to be "racial clusters", swedes with swedes, gambians with gambians.
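The k = number-of-samples case above is easy to check on toy data. A minimal sketch, assuming six made-up, distinct 2-D points and a hand-rolled `kmeans` helper (both hypothetical): with k equal to the sample count, every point becomes its own center and the within-cluster sum of squares drops to zero, so the "fit" is perfect and tells you nothing.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0, iters=100):
    """Lloyd's algorithm with random initialization; returns (labels, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)

    def assign():
        return [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]

    for _ in range(iters):
        labels = assign()
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign()
    inertia = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia

# Six distinct toy points (made up for the example).
points = [(0, 0), (1, 3), (4, 1), (6, 5), (9, 2), (7, 8)]
n = len(points)

labels_1, inertia_1 = kmeans(points, k=1)  # everything lumped together
labels_n, inertia_n = kmeans(points, k=n)  # one cluster per sample

print(inertia_1, inertia_n, sorted(labels_n))
```

This is why a falling error alone cannot pick k: the error keeps shrinking all the way down to the trivial one-sample-per-cluster answer.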

[–]milkmender11 1 insightful - 1 fun1 insightful - 0 fun2 insightful - 1 fun -  (5 children)

The elbow method won't return k=49 after having returned k=7 on the same sample. But I can see some situations where the returned k might differ if we introduce a randomized initial configuration of the k-means algo.

No, you completely missed the point. You're tunnelvisioning, as programmers do (a strength and a weakness). I'm not saying that there is a stochastic element in machine learning. We all know that. I'm saying that there is no good reason not to run the analysis again on the 7 clusters and get several more clusters. If we are talking about cancer cells, we don't really have a good reason to do this, unless we have some reason to believe that there is a yet-smaller tumor to find (though I doubt you would simply redo the analysis on returned clusters for that, what with signal degradation). But we have many, MANY more races, historical and contemporary, to identify with repeated cluster analysis of 'known' races. It's not just 'African.' What of all the African subtypes? Dozens more races, all determined with a non-arbitrary k, right? Well, except for that it is arbitrary, like you admitted.

[–]DragonerneJesus is white 3 insightful - 1 fun3 insightful - 0 fun4 insightful - 1 fun -  (0 children)

But we have many, MANY more races, historical and contemporary, to identify with repeated cluster analysis of 'known' races. It's not just 'African.' What of all the African subtypes? Dozens more races, all determined with a non-arbitrary k, right? Well, except for that it is arbitrary, like you admitted.

Yes! And that's what's so wonderful about using the computers to tell us how to best cluster the racial groups. We can start researching if Swedes really are that different from Danes or we can see if Finns did cluster with the mongoloid race like some Americans claimed 100 years ago and so on. We can start seeing how ancient races compare to modern races. Where ancient individuals cluster into modern races. Where modern individuals cluster into ancient races.
Things like this are just very exciting.

[–]milkmender11 1 insightful - 1 fun1 insightful - 0 fun2 insightful - 1 fun -  (3 children)

Are you of the misconception that race realists believe that there exists a fixed number of races? This is not the case. No one holds that position.

Cop out. You refuse to answer the question because you know you can't. There are as many races as you want to see. Just because a bunch of people have decided that they refuse to answer a question because they can't, that doesn't mean it is a defensible position. It isn't. This is your 'turtles all the way down' moment. Scientists DO have an answer for how many species of sea cucumbers we have documented. They DO have an answer for how many subspecies of grey wolves there are. Why don't YOU have an answer? The rest of the scientific world isn't shy about this when it comes to taxonomic classification, but you have got cold feet all of a sudden.

Are you conflating me with someone else?

Nope. No conflation. I mean exactly what I said. Your first link contains data from studies that were conducted using STRUCTURE. They are landmark studies, often the first cited in these discussions, and cite them first you did.

I simply want to argue that race is real.

Of course race is real. I would never say something so ridiculous as 'race isn't real.' Race is one of the most consequential and painfully real things in the modern world, perhaps the single most consequential. But it isn't a scientific concept. It is a social construct emerging out of the biological reality of our intuitive, cognitive racial-classification modules. In fact, with reference to those modules, in a way, race IS biology. Not in the way people think of it, as a real attribute of human population genetics, but as a little part of our brain that has evolved to see race wherever we look, because so far it has proved adaptive.

The same objections that you're using against race can be used to deconstruct the concept of species.

No, they can't. As I explained before, you are using the color spectrum argument that you already admitted you reject. I say that SPECIES is a legitimate taxonomic classification and SUBSPECIES is not. You say that my same gripes with subspecies can deconstruct species as well. This is identical to someone saying that the gradient of colors shows that there cannot be an actual yellow, an actual orange, an actual green. The existence of intermediaries does not disprove the existence of the discrete categories. Subspecies is an intermediary between 'species' and 'individual.' It is undefined in science, or, rather, it has so many definitions as to render it mostly meaningless outside of very specific bodies of literature. Are there glimmers of inconsistency in species categories? Of course. There are discrepancies between biological, phylogenetic, cladistic species, etc. That does not mean that the vague and undefined intermediary (subspecies) somehow delegitimizes the defined and specific category (species). That is the color spectrum argument. You said you disagreed with it (even though I never brought it up until you did), and then you used it to try and delegitimize the species concept.

No, it didn't.

Yes, it did. I have an excellent source for this. YOUR video. Didn't quite remember Josh saying that one bit, eh? ;)

I have been very generous in that I have willingly gone into the territory you chose, stats and ML, just to show that you will lose even on your home turf. But we have hardly even explored the anthropological assumptions in your argument. What are our preconceived ideas about race? How do you KNOW what the clusters are in advance? What is this information that you refuse to talk about? It is absolutely imperative to your argument. You keep saying it over and over, so obviously it is important. What are our preconceived ideas about race?

[–]DragonerneJesus is white 2 insightful - 1 fun2 insightful - 0 fun3 insightful - 1 fun -  (0 children)

Cop out. You refuse to answer the question because you know you can't. There are as many races as you want to see. Just because a bunch of people have decided that they refuse to answer a question because they can't, that doesn't mean it is a defensible position. It isn't. This is your 'turtles all the way down' moment. Scientists DO have an answer for how many species of sea cucumbers we have documented. They DO have an answer for how many subspecies of grey wolves there are. Why don't YOU have an answer? The rest of the scientific world isn't shy about this when it comes to taxonomic classification, but you have got cold feet all of a sudden.

I don't think its a cop out. You seem to be of the idea that we believe in a FIXED number of human races. Can you point to any modern geneticist that believe in race that says there is a FIXED number of human races? We can talk about the big ones like europeans, asians, africans, americans, oceanians but we can break each of these down into smaller races too.
The funny part about this is that just for entertainment, lets say I said we have 6 human races and then you run k=7 and show me that we now have 7 human races LOL, it wouldn't disprove the 6 human races. I will repeat it until you understand your misunderstanding: Its not the number of races thats important, its that the algo returns k RACIAL clusters. We didn't ask it to return RACIAL clusters. We asked it to return whatever clusters and it somehow chose to return RACIAL clusters. Out of all the millions of ways it could cluster human beings, it chose down racial lines. What a coincidence.

Nope. No conflation. I mean exactly what I said. Your first link contains data from studies that were conducted using STRUCTURE. They are landmark studies, often the first cited in these discussions, and cite them first you did.

My first link was the medium article about choosing the optimal k....

Of course race is real. I would never say something so ridiculous as 'race isn't real.' Race is one of the most consequential and painfully real things in the modern world, perhaps the single most consequential. But it isn't a scientific concept. It is a social construct emerging out of the biological reality of our intuitive, cognitive racial-classification modules. In fact, with reference to those modules, in a way, race IS biology. Not in the way people think of it, as a real attribute of human population genetics, but as a little part of our brain that has evolved to see race wherever we look, because so far it has proved adaptive.

I hope you're not referring to the sociology concept of race, where blacks = people with black skin from africa, southern india, australia, south america, because black skin != same race.
If this is not what you meant and I understood you correctly, then this is actually perfect, because this makes us able to translate from your paradigm into my paradigm. It opens a communication channel where we speak the same language.
"But it isn't a scientific concept. It is a social construct emerging out of the biological reality of our intuitive, cognitive racial-classification modules."
This is how we humans classify races. These are my preconceived ideas about racial groups. (Obviously a normie on the street won't have as good a classification as someone who works with different human populations. The race realists before DNA was discovered made a lot of different classifications of human races.)

We can compare this with the biological reality.
"Not in the way people think of it, as a real attribute of human population genetics"

The clustering algorithms can help us see if our preconceived classifications match the clustering of genetic population data. If it does, then our preconceived classifications were correct, in the sense that it had a biological/genetic basis. If it does not, then it is evidence to support the hypothesis that the preconceived classifications do not have a biological/genetic basis.
We have found that they DO match with the biological reality.

Subspecies is an intermediary between 'species' and 'individual.'

'species' is an intermediary between 'animal' and 'individual'

Are there glimmers of inconsistency in species categories? Of course. There are discrepancies between biological, phylogenetic, cladistic species, etc.

Yes and how do you choose which type of "species" to use? That is a very arbitrary choice! Oh no...
Is a Tiger and a Lion even different species if they can produce offspring together? Wow, time to eliminate the entire concept of 'species'. Of course not.
Just so I'm not misunderstanding you; you don't reject 'subspecies' as a concept, right? You're just contesting if human beings have races or not.

Yes, it did. I have an excellent source for this. YOUR video. Didn't quite remember Josh saying that one bit, eh? ;)

Please be able to have an attention span of more than 1 comment back. If you use an algorithm to pick the optimal k, then you did not pick k. This algorithm could be using the elbow method for simplicity.
And if you remember, I explained how the optimal k might change from one run of the algorithm to the next, because of the randomized initial conditions of the k-means algo. Well, we can use other data analysis tools to increase the chance of finding the optimal k to any arbitrarily high percentage. So if you want to be 99.9999% sure that you are using the optimal k, then you can run the "optimal k algorithm" as a Monte Carlo algorithm.
There will be a risk of 0.0001%, or whatever percentage risk you tolerate, that the MC algo will return, say, 8 clusters instead of 7 clusters (which was actually the optimal k).
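Here is a rough sketch of that idea in plain Python, assuming a toy 2D dataset with three well-separated blobs instead of real genetic data. The elbow rule used here (largest second difference of the inertia curve) is just one simple way to automate the elbow method, and `kmeans_inertia`, `elbow_k`, the restart count and the number of repeats are all choices made for this illustration.

```python
import random
from collections import Counter

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_inertia(points, k, rng, max_iter=50):
    """One run of Lloyd's algorithm from a random start; returns the inertia."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the old center if empty
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(dist2(p, c) for c in centers) for p in points)

def elbow_k(points, k_max, rng, restarts=10):
    """Return the k with the sharpest bend in the best-of-restarts inertia curve."""
    inertia = {k: min(kmeans_inertia(points, k, rng) for _ in range(restarts))
               for k in range(1, k_max + 1)}
    return max(range(2, k_max),
               key=lambda k: inertia[k - 1] - 2 * inertia[k] + inertia[k + 1])

# Hypothetical toy data: three well-separated blobs of 25 points each.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(rng.gauss(cx, 0.7), rng.gauss(cy, 0.7))
          for cx, cy in blob_centers for _ in range(25)]

# Monte Carlo step: repeat the whole "pick k" procedure and take a vote.
votes = Counter(elbow_k(points, k_max=5, rng=rng) for _ in range(11))
best_k, n_votes = votes.most_common(1)[0]
print(votes, best_k)
```

A single run can land in a poor local optimum and suggest a different k, which is why the procedure is repeated and voted on. More repeats push the chance of a wrong majority down, at the cost of compute time.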

I have been very generous in that I have willingly gone into the territory you chose, stats and ML, just to show that you will lose even on your home turf.

I went here for two reasons. 1: you mentioned k in an earlier post (which I know is a bogus argument) and 2: in your first reply to me, you showed that you lacked an understanding of how the clustering actually works in the program that you're using, which led you to reach some false conclusions and derive some misconceptions about the clustering.
It seemed to me that you either didn't understand these fundamentals (which seemed plausible considering it's not your area of expertise) or you understood the fundamentals but had too low an IQ to reason about the implications. You took it as a "win", lol. Instead you should've taken it as an invitation to learn the fundamentals and correct your misconception so that we may reach a higher level of debate and gain new insights. I am absolutely certain that you too hold knowledge where you can school me, probably also 1st year undergraduate stuff that I just OUGHT to know, but simply don't because we don't have the same background.
Engaging with you is an opportunity for me to learn, hopefully, and also an opportunity for you to learn, unless you're closed-minded and think you know it all, despite that clearly not being the case.

But we have hardly even explored the anthropological assumptions in your argument. What are our preconceived ideas about race? How do you KNOW what the clusters are in advance? What is this information that you refuse to talk about? It is absolutely imperative to your argument. You keep saying it over and over, so obviously it is important. What are our preconceived ideas about race?

As a starting point we could use some of the race realist classifications from 200-100 years ago, knowing that they're outdated, wars have happened, genocides have happened and so on, but still we'll expect them to be mostly correct if we account for some historical changes and admixture events over this time period.
We will find that many of the ideas will be wrong but that the overall idea was correct.

[–]milkmender11

This is against all laws of data analysis.

It isn't. The program will always give you the number of clusters you ask for. Your misunderstanding is that you are mistaking BEST FIT for TRUTH. The model is only ever there as a tool to help you answer your question. It doesn't represent reality. Actually, the bad fits are just as important as the best fits. They are negative results, scientifically speaking. None of this method ever leaves the realm of the experiment. It is always a hypothetical approximation of reality which presents a picture that is either more or less useful to answering your question. Science trumps data analysis.
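The "always gives you the number of clusters you ask for" part is easy to check on data that has no groups at all. The sketch below runs a bare-bones 1D k-means on evenly spaced values, where by construction no gap separates one group from another. `kmeans_1d` and the chosen values of k are only for illustration.

```python
import random

def kmeans_1d(values, k, rng, max_iter=100):
    """Lloyd's algorithm on a list of floats; returns one label per value."""
    centers = rng.sample(values, k)  # k distinct starting points
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: (v - centers[c]) ** 2) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old center if a cluster ever ends up empty.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return [min(range(k), key=lambda c: (v - centers[c]) ** 2) for v in values]

# Evenly spaced values: there is no natural grouping to find.
values = [float(i) for i in range(60)]
rng = random.Random(0)
clusters_found = {k: len(set(kmeans_1d(values, k, rng))) for k in (2, 3, 5, 8)}
print(clusters_found)  # {2: 2, 3: 3, 5: 5, 8: 8}
```

Every requested k comes back as k non-empty clusters, so the count alone says nothing about structure. Whether a given k fits better than its neighbours has to be judged separately, for example from the inertia curve.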

I wrote an email to Josh. Seems like a good professional contact to have. I copied your arguments here (username redacted) and asked him if he agrees with your argument about racial clusters. I'll be sure to share his reply with you when he gets back to me.

This will keep going until someone stops replying or the mods decide to step in, but, for what it's worth, there are hints of truth in the race realist narrative. It isn't scientific, but it doesn't need to be. Here, I'll make a better version of your argument for you:

"Science exists in service of human longevity and well-being. There is a truth that trumps scientific consensus, and that is the truth of which ideas work in the real world and which don't. Sure, you can poke holes in my attempt to scientifically classify races all day, but that won't change the fact that race is immensely important to people, guides their actions, motivates them to kill and hurt and riot. If, one day, there are people banging down your door because you are or aren't one race or another, you'll regret all of this obfuscation you're engaging in here. You'll regret playing science-games to catch me on technicalities, because no amount of scientific reasoning is going to persuade those people to stop crushing your door. At that point, the only 'truth' that will matter to you is the truth of your arsenal and your allies. And we have SEEN this happen, recently. By attacking the people trying to bring attention to the importance of race, you only make it that much more likely that we are overwhelmed by what we do not understand, because you refused to hear us."

[–]DragonerneJesus is white

It isn't. The program will always give you the number of clusters you ask for.

That's not what I'm saying. Yes, it gives you the number of clusters you ask for.
The point is that you should not BIAS your choice of k by having inspected the data previously. The point of unsupervised learning is exactly that: being unsupervised. In this case it's not the 'worst' mistake a researcher can make, but it's bad practice.
You can run into situations where it's perfectly acceptable to do, but that's not that relevant for what we're talking about here.

Your misunderstanding is that you are mistaking BEST FIT for TRUTH

Is this some epistemology argument? If so I'm not really interested in opening that can of worms. We can put it down on a note and take this subject up again when we're done with this debate.

Science trumps data analysis.

Science is data analysis.

I wrote an email to Josh. Seems like a good professional contact to have. I copied your arguments here (username redacted) and asked him if he agrees with your argument about racial clusters. I'll be sure to share his reply with you when he gets back to me.

It will be very interesting to hear his response. Just don't spam him. A better approach would be for us to go back and forth, condense some points, figure out say 10 core arguments where we disagree, put them in a proper format and let him respond to those.

This will keep going until someone stops replying or the mods decide to step in, but, for what it's worth, there are hints of truth in the race realist narrative. It isn't scientific, but it doesn't need to be. Here, I'll make a better version of your argument for you:

"Science exists in service of human longevity and well-being. There is a truth that trumps scientific consensus, and that is the truth of which ideas work in the real world and which don't. Sure, you can poke holes in my attempt to scientifically classify races all day, but that won't change the fact that race is immensely important to people, guides their actions, motivates them to kill and hurt and riot. If, one day, there are people banging down your door because you are or aren't one race or another, you'll regret all of this obfuscation you're engaging in here. You'll regret playing science-games to catch me on technicalities, because no amount of scientific reasoning is going to persuade those people to stop crushing your door. At that point, the only 'truth' that will matter to you is the truth of your arsenal and your allies. And we have SEEN this happen, recently. By attacking the people trying to bring attention to the importance of race, you only make it that much more likely that we are overwhelmed by what we do not understand, because you refused to hear us."

This is the sociology perspective, where race isn't biological but instead individuals are racialized by society into "politically convenient allyships". In a way, you can say that "one human race" is exactly that: a politically convenient allyship to usher in multiracial societies. But I'm sorry for getting political, so let's leave it at that and keep ourselves grounded in biology and genetics and data analysis, not sociology.
I somewhat agree with the quote though, and I know that many in the alt right sphere definitely agree with the quote.