The Pain of Labeling Things

Labeling things is hard, but labeling groups of things is harder! At CWTS we automatically group publications and label them with an algorithm, but these labels can be puzzling for human minds. In this post, I find out how the same group of publications can have the labels "queer theory" and "home".

This week, as I was browsing the CWTS fields of science (as used for the Leiden Ranking) just for fun, I found a field with the following labels:

  • Feminism
  • Politic
  • Queer theory
  • Space
  • Home

There is something weird about these labels, I thought. Feminism, Politic and Queer theory seem to have nothing to do with Space and Home. You see, the CWTS science fields are created by an algorithm that clusters papers that cite each other. To label the fields, the algorithm uses the most representative terms from the titles and abstracts of these papers. The details of this process are explained in Waltman & Van Eck (2012). The question is, then, why did the algorithm choose these labels in particular?

To discover the reason, I knew I had to read the papers of this field. But the field contains 4154 papers! I didn’t feel like reading them all, so I tried other approaches.

My first approach was to get the most frequent journals of the papers, which were:

  • GLQ-A Journal of lesbian and gay studies
  • Sexualities
  • Journal of homosexuality
  • Gender place and culture
  • Journal of the history of sexuality
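For readers who want to try this step on their own data, here is a minimal sketch. It is not the code used for this post: it assumes the papers are available as a list of dictionaries with a hypothetical `journal` key, and the sample records are made-up placeholders, not the real field.

```python
from collections import Counter

# Hypothetical records standing in for the papers of a field.
# The keys and the titles are assumptions for illustration only.
papers = [
    {"title": "Paper A", "journal": "Sexualities"},
    {"title": "Paper B", "journal": "Sexualities"},
    {"title": "Paper C", "journal": "Journal of homosexuality"},
    {"title": "Paper D", "journal": "Gender place and culture"},
    {"title": "Paper E", "journal": "sexualities "},
]


def top_journals(papers, n=5):
    """Return the n most frequent journal names with their counts."""
    # Normalize whitespace and case so spelling variants are counted together.
    counts = Counter(p["journal"].strip().lower() for p in papers)
    return counts.most_common(n)


for journal, count in top_journals(papers):
    print(f"{count:4d}  {journal}")
```

Normalizing the journal names before counting matters in practice, because the same journal often appears with slightly different capitalization or spacing.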

Okay, I thought again: this field is about sexuality. But then why does it have the labels Space and Home?

My second approach was to get the titles of the most cited papers, and there I saw that the label Space actually refers to Queer space. Now the only mystery left was the word Home.
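A rough sketch of this second step, again under assumptions: each paper is a dictionary with hypothetical `title` and `citations` keys, and the records below are invented placeholders rather than real papers or real citation counts.

```python
# Hypothetical records; titles and citation counts are made up for illustration.
papers = [
    {"title": "Placeholder title about queer space", "citations": 120},
    {"title": "Placeholder title about feminism", "citations": 45},
    {"title": "Placeholder title about politics", "citations": 300},
    {"title": "Placeholder title about home", "citations": 80},
]


def most_cited(papers, n=10):
    """Return the n papers with the highest citation counts, highest first."""
    return sorted(papers, key=lambda p: p["citations"], reverse=True)[:n]


for paper in most_cited(papers, n=3):
    print(f"{paper['citations']:5d}  {paper['title']}")
```

Reading only the top titles is a shortcut: it shows what the most visible papers are about, not necessarily what the whole field is about.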

My third approach was to search for the titles and abstracts that contained the word Home, and there I saw that many of the papers are about queer sexuality at home.
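This kind of search can be sketched in a few lines. The version below is only an illustration: it assumes each paper is a dictionary with hypothetical `title` and `abstract` keys, and the records are invented. It matches whole words only, so that "homeless" or "homework" does not count as a hit for "home".

```python
import re

# Hypothetical records; titles and abstracts are made up for illustration.
papers = [
    {"title": "Queer lives at home", "abstract": "Domestic space and identity."},
    {"title": "Urban nightlife", "abstract": "Leaving the family home behind."},
    {"title": "Homeless youth", "abstract": "Shelters and street life."},
    {"title": "Parades and politics", "abstract": "Public space and protest."},
]


def mentions(papers, term):
    """Return the papers whose title or abstract contains term as a whole word."""
    # \b restricts the match to whole words; re.escape guards special characters.
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [
        p for p in papers
        if pattern.search(p["title"]) or pattern.search(p["abstract"])
    ]


for paper in mentions(papers, "home"):
    print(paper["title"])
```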

I did it, I solved the mystery! But still I was left with an uneasy feeling about the labels. Clearly, the topic of the field was queer sexuality, but the labels were so confusing! I dream of a day when the algorithm will be smart enough to create unambiguous labels. Until then, I will have to take every label with a grain of salt.

