[–]milkmender11 (4 children)

More from your source: "Low values for k (like k=1 or k=2) can be noisy and subject to the effects of outliers. Large values for k smooth over things, but you don't want k to be so large that a category with only a few samples in it will always be outvoted by other categories."

Well, that's convenient. You keep saying that the clusters match our 'preconceived ideas about race,' but there are quirks in this method of analysis that make it work poorly with low or high k values. So there is a sweet spot of values for k, which happens to correspond to the numbers of races that you like (even though you refuse to give me any of those numbers) and does not correspond to the values for k which you dislike. To be clear, there is no fundamental reason why a low or high k value would not be the best fit for the model. With a perfect dataset, with perfect computing power, we would be able to eliminate nearly all noise. It just so happens that the method's limitations make it perform badly at low or high k values, so you are going to reject low or high numbers of races out of hand. You are, as programmers often do, letting the algorithm do the thinking for you. What if your tool is not fit for the job? All you have is a hammer, so your problem looks like a nail.

It returns 7 clusters that nicely correspond to 7 racial groups.

Is this it? Are you finally going to share these mysterious 'preconceived ideas about race' with me? Or are you only using 7 because I used 7 earlier? Please tell me what these known categories are. I certainly don't know them. The racialist thinkers of the past didn't know them. I doubt most people you ask would be able to guess precisely what your preconceived ideas are, nor would you be able to guess theirs.

[–]milkmender11 (3 children)

The elbow method won't return k=49 after having returned k=7 on the same sample. But I can see some situations where the returned k might differ if we introduce a randomized initial configuration of the k-means algo.

No, you completely missed the point. You're tunnelvisioning, as programmers do (a strength and a weakness). I'm not saying that there is a stochastic element in machine learning. We all know that. I'm saying that there is no good reason not to run the analysis again on each of the 7 clusters and get several more clusters. If we are talking about cancer cells, we don't really have a good reason to do this, unless we have some reason to believe that there is a yet-smaller tumor to find (though I doubt you would simply redo the analysis on returned clusters for that, what with signal degradation). But we have many, MANY more races, historical and contemporary, to identify with repeated cluster analysis of 'known' races. It's not just 'African.' What of all the African subtypes? Dozens more races, all determined with a non-arbitrary k, right? Well, except that it is arbitrary, like you admitted.
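
To make the k question concrete, here is a rough sketch, not anyone's actual pipeline. It assumes plain k-means (Lloyd's algorithm) on made-up 2-D points; the `kmeans` and `inertia` helpers, the three synthetic blobs, and the seeds are all hypothetical. It prints the within-cluster sum of squares that an elbow plot would show for several k, then feeds the largest returned cluster back in and splits it again.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iters=30, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points

    def assign():
        # Assignment step: each point takes the index of its nearest centroid.
        return [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]

    for _ in range(iters):
        labels = assign()
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, assign()


def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances (the y-axis of an elbow plot)."""
    return sum(dist2(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical data: three Gaussian blobs of 40 points each.
rng = random.Random(42)
centers = [(0, 0), (5, 5), (0, 6)]
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for cx, cy in centers for _ in range(40)]

# Inertia for a range of k, up to one cluster per point.
curve = {}
for k in [1, 2, 3, 4, 5, 6, len(points)]:
    cents, labs = kmeans(points, k)
    curve[k] = inertia(points, cents, labs)
    print(k, round(curve[k], 2))

# Feed the largest cluster from the k=3 run back in and split it in two.
cents3, labs3 = kmeans(points, 3)
biggest = max(range(3), key=lambda j: labs3.count(j))
subset = [p for p, lab in zip(points, labs3) if lab == biggest]
sub_centroids, sub_labels = kmeans(subset, 2)
print(len(subset), "points re-split into", len(set(sub_labels)), "sub-clusters")
```

The inertia reaches zero at one cluster per point, so where the 'elbow' sits is a judgment about the plot, and the second run will split whatever cluster it is handed.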

[–]milkmender11 (2 children)

Are you of the misconception that race realists believe that there exists a fixed number of races? This is not the case. No one holds that position.

Cop out. You refuse to answer the question because you know you can't. There are as many races as you want to see. Just because a bunch of people have decided that they refuse to answer a question because they can't, that doesn't mean it is a defensible position. It isn't. This is your 'turtles all the way down' moment. Scientists DO have an answer for how many species of sea cucumbers we have documented. They DO have an answer for how many subspecies of grey wolves there are. Why don't YOU have an answer? The rest of the scientific world isn't shy about this when it comes to taxonomic classification, but you have got cold feet all of a sudden.

Are you conflating me with someone else?

Nope. No conflation. I mean exactly what I said. Your first link contains data from studies that were conducted using STRUCTURE. They are landmark studies, often the first cited in these discussions, and cite them first you did.

I simply want to argue that race is real.

Of course race is real. I would never say something so ridiculous as 'race isn't real.' Race is one of the most consequential and painfully real things in the modern world, perhaps the single most consequential. But it isn't a scientific concept. It is a social construct emerging out of the biological reality of our intuitive, cognitive racial-classification modules. In fact, with reference to those modules, in a way, race IS biology. Not in the way people think of it, as a real attribute of human population genetics, but as a little part of our brain that has evolved to see race wherever we look, because so far it has proved adaptive.

The same objections that you're using against race can be used to deconstruct the concept of species.

No, they can't. As I explained before, you are using the color spectrum argument that you already admitted you reject. I say that SPECIES is a legitimate taxonomic classification and SUBSPECIES is not. You say that my same gripes with subspecies can deconstruct species as well. This is identical to someone saying that the gradient of colors shows that there cannot be an actual yellow, an actual orange, an actual green. The existence of intermediaries does not disprove the existence of the discrete categories. Subspecies is an intermediary between 'species' and 'individual.' It is undefined in science, or, rather, it has so many definitions as to render it mostly meaningless outside of very specific bodies of literature. Are there glimmers of inconsistency in species categories? Of course. There are discrepancies between biological, phylogenetic, cladistic species, etc. That does not mean that the vague and undefined intermediary (subspecies) somehow delegitimizes the defined and specific category (species). That is the color spectrum argument. You said you disagreed with it (even though I never brought it up until you did), and then you used it to try to delegitimize the species concept.

No, it didn't.

Yes, it did. I have an excellent source for this. YOUR video. Didn't quite remember Josh saying that one bit, eh? ;)

I have been very generous in that I have willingly gone into the territory you chose, stats and ML, just to show that you will lose even on your home turf. But we have hardly even explored the anthropological assumptions in your argument. What are our preconceived ideas about race? How do you KNOW what the clusters are in advance? What is this information that you refuse to talk about? It is absolutely imperative to your argument. You keep saying it over and over, so obviously it is important. What are our preconceived ideas about race?

[–]milkmender11 (1 child)

This is against all laws of data analysis.

It isn't. The program will always give you the number of clusters you ask for. Your misunderstanding is that you are mistaking BEST FIT for TRUTH. The model is only ever there as a tool to help you answer your question. It doesn't represent reality. Actually, the bad fits are just as important as the best fits. They are negative results, scientifically speaking. None of this method ever leaves the realm of the experiment. It is always a hypothetical approximation of reality which presents a picture that is either more or less useful for answering your question. Science trumps data analysis.
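
A small sketch of that first claim, assuming plain k-means written from scratch (the `kmeans` helper, the uniform noise data, and the seeds are hypothetical, chosen only for illustration). The points are uniform noise with no groups built in, and the routine still hands back k centroids for every k requested.

```python
import random


def kmeans(points, k, iters=25, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k data points as starting centroids

    def nearest(p):
        # Index of the centroid closest to point p (squared Euclidean distance).
        return min(
            range(k),
            key=lambda j: (p[0] - centroids[j][0]) ** 2 + (p[1] - centroids[j][1]) ** 2,
        )

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, [nearest(p) for p in points]


# Uniform noise on the unit square: no group structure by construction.
rng = random.Random(7)
noise = [(rng.random(), rng.random()) for _ in range(200)]

results = {}
for k in (2, 5, 9):
    centroids, labels = kmeans(noise, k)
    results[k] = (len(centroids), len(set(labels)))
    print(f"asked for k={k}: got {len(centroids)} centroids, "
          f"{len(set(labels))} non-empty clusters")
```

Whether any of those partitions is useful is a separate question that the algorithm itself does not answer.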

I wrote an email to Josh. Seems like a good professional contact to have. I copied your arguments here (username redacted) and asked him if he agrees with your argument about racial clusters. I'll be sure to share his reply with you when he gets back to me.

This will keep going until someone stops replying or the mods decide to step in, but, for what it's worth, there are hints of truth in the race realist narrative. It isn't scientific, but it doesn't need to be. Here, I'll make a better version of your argument for you:

"Science exists in service of human longevity and well-being. There is a truth that trumps scientific consensus, and that is the truth of which ideas work in the real world and which don't. Sure, you can poke holes in my attempt to scientifically classify races all day, but that won't change the fact that race is immensely important to people, guides their actions, motivates them to kill and hurt and riot. If, one day, there are people banging down your door because you are or aren't one race or another, you'll regret all of this obfuscation you're engaging in here. You'll regret playing science-games to catch me on technicalities, because no amount of scientific reasoning is going to persuade those people to stop crushing your door. At that point, the only 'truth' that will matter to you is the truth of your arsenal and your allies. And we have SEEN this happen, recently. By attacking the people trying to bring attention to the importance of race, you only make it that much more likely that we are overwhelmed by what we do not understand, because you refused to hear us."

[–]Dragonerne [Jesus is white] (0 children)

It isn't. The program will always give you the number of clusters you ask for.

That's not what I'm saying. Yes, it gives you the number of clusters you ask for.
The point is that you should not BIAS your choice of k by having inspected the data previously. The point of unsupervised learning is exactly that: being unsupervised. In this case it's not the 'worst' mistake a researcher can make, but it's bad practice.
You can run into situations where it's perfectly acceptable to do, but that's not really relevant to what we're talking about here.

Your misunderstanding is that you are mistaking BEST FIT for TRUTH

Is this some epistemology argument? If so, I'm not really interested in opening that can of worms. We can make a note of it and take this subject up again when we're done with this debate.

Science trumps data analysis.

Science is data analysis.

I wrote an email to Josh. Seems like a good professional contact to have. I copied your arguments here (username redacted) and asked him if he agrees with your argument about racial clusters. I'll be sure to share his reply with you when he gets back to me.

It will be very interesting to hear his response. Just don't spam him. A better approach would be for us to go back and forth, condense some points, figure out, say, 10 core arguments where we disagree, put them in a proper format and let him respond to those.

This will keep going until someone stops replying or the mods decide to step in, but, for what it's worth, there are hints of truth in the race realist narrative. It isn't scientific, but it doesn't need to be. Here, I'll make a better version of your argument for you:

"Science exists in service of human longevity and well-being. There is a truth that trumps scientific consensus, and that is the truth of which ideas work in the real world and which don't. Sure, you can poke holes in my attempt to scientifically classify races all day, but that won't change the fact that race is immensely important to people, guides their actions, motivates them to kill and hurt and riot. If, one day, there are people banging down your door because you are or aren't one race or another, you'll regret all of this obfuscation you're engaging in here. You'll regret playing science-games to catch me on technicalities, because no amount of scientific reasoning is going to persuade those people to stop crushing your door. At that point, the only 'truth' that will matter to you is the truth of your arsenal and your allies. And we have SEEN this happen, recently. By attacking the people trying to bring attention to the importance of race, you only make it that much more likely that we are overwhelmed by what we do not understand, because you refused to hear us."

This is the sociology perspective, where race isn't biological but instead individuals are racialized by society into "politically convenient allyships". In a way, you can say that "one human race" is exactly that: a politically convenient allyship to usher in multiracial societies. But I'm sorry for getting political, so let's leave it at that and keep ourselves grounded in biology, genetics and data analysis, not sociology.
I somewhat agree with the quote though, and I know that many in the alt-right sphere definitely agree with it.