Five key facts to consider when studying science on Wikipedia

The presence of science on Wikipedia is a recurrent research topic in the scientometric community. However, its potential for the study of science-society relations has not yet been fully explored. These are some of the key facts to consider when studying it.

Since its very beginnings, Wikipedia has been the target of criticism. The early, unfavourable comparisons of its contents with those of other encyclopaedias are long gone, while perceptions within academia were more optimistic. In education, however, the field in which this platform is most valuable, the controversy is greater. Its widespread use among students clashes with the scepticism of some teachers. Even so, a growing number of voices favour its use, and more and more educational projects integrate it into the classroom. This conflict has yet to be resolved, although the general perception has progressively improved over time.

In scientometrics, the community has been studying the presence of science on Wikipedia since before the formal birth of ‘altmetrics’. In most cases, however, these studies have focused on the scientific works cited on Wikipedia rather than taking Wikipedia itself as their main research object. This science-centric focus typically overlooks the potential of exploring the different relationships that Wikipedia has (or doesn’t have) with science. In this post I reflect on this potential by presenting five key facts about the nature of Wikipedia and its possibilities as a research source for the study of science-society interactions.

1) Why are scientific publications cited on Wikipedia?

The most common critique of Wikipedia concerns the reliability of its contents, a problem that Wikipedia itself acknowledges with complete transparency. In its quest for reliability, Wikipedia places great importance on verifiability, which is one of its core content policies.

Several points in these content policy guidelines cannot be overlooked when studying Wikipedia citations to scientific publications. Firstly, Wikipedia is an encyclopaedia. It may seem obvious, but as its content policy guidelines state, "Wikipedia does not publish original research". Moreover, Wikipedia only publishes information of encyclopaedic relevance. Secondly, not all sources are valid as citations on Wikipedia. Peer-reviewed scientific publications are at the top of the list of source types recommended by Wikipedia, and books are among the most relevant materials. The importance of books for Wikipedia has even led publishers to offer Wikipedia editors free access to their collections through initiatives such as The Wikipedia Library.
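
Studies of citations to scientific publications on Wikipedia often start by identifying which references carry a DOI (typical of journal articles) or an ISBN (typical of books). Below is a minimal sketch, assuming raw wikitext that uses citation templates such as {{cite journal}} and {{cite book}} with `doi=` and `isbn=` parameters. The helper name and the sample text are hypothetical, and the regular expression is a deliberately rough stand-in for a proper wikitext parser.

```python
import re

# Hypothetical helper: a rough regex scan of raw wikitext for the |doi= and
# |isbn= parameters of citation templates. Nested templates, comments and
# identifiers written in other ways are not handled.
PARAM_RE = re.compile(r"\|\s*(doi|isbn)\s*=\s*([^|}]+)", re.IGNORECASE)

def extract_identifiers(wikitext: str) -> dict[str, list[str]]:
    """Return the DOIs and ISBNs found in citation-template parameters."""
    found: dict[str, list[str]] = {"doi": [], "isbn": []}
    for name, value in PARAM_RE.findall(wikitext):
        value = value.strip()  # drop spaces before the closing "}}" or next "|"
        if value:
            found[name.lower()].append(value)
    return found

# Made-up sample wikitext with one journal citation and one book citation.
sample = (
    "Text.<ref>{{cite journal |title=A study |doi=10.1000/xyz123 }}</ref> "
    "More text.<ref>{{cite book |title=A book |isbn=978-0-00-000000-2}}</ref>"
)
print(extract_identifiers(sample))
# {'doi': ['10.1000/xyz123'], 'isbn': ['978-0-00-000000-2']}
```

References added without templates, or with identifiers in other formats, would be missed, so counts obtained this way are likely to be underestimates.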

The fundamental difference between Wikipedia citations and scientific citations cannot be ignored, as they are interpreted very differently. In particular, the dynamic nature of Wikipedia must be clearly understood. Unlike citations to scientific papers, which are static and, in principle, can never decrease, the references in a Wikipedia article can disappear, and even reappear later. Analysing this phenomenon through a snapshot, in which only the resources cited at a specific moment in time appear, is useful, but it may obscure these fluctuations and other specificities of Wikipedia citations.
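
One way to capture these fluctuations is to follow each reference across an article's revision history instead of reading only the latest version. The sketch below assumes the history has already been reduced to an ordered list of sets of reference identifiers (for example, DOIs and ISBNs), one set per revision; the function and field names are illustrative and not part of any Wikipedia tool.

```python
from dataclasses import dataclass

@dataclass
class RefHistory:
    first_seen: int            # index of the revision where the reference first appears
    removals: int = 0          # times it disappeared from one revision to the next
    present_now: bool = False  # cited in the latest revision (the "snapshot")

def trace_references(revisions: list[set[str]]) -> dict[str, RefHistory]:
    """Follow each cited identifier through an ordered list of revisions."""
    history: dict[str, RefHistory] = {}
    previous: set[str] = set()
    for i, refs in enumerate(revisions):
        for ref in refs - previous:      # added (or re-added) in this revision
            if ref not in history:
                history[ref] = RefHistory(first_seen=i)
        for ref in previous - refs:      # removed in this revision
            history[ref].removals += 1
        previous = refs
    for ref, h in history.items():
        h.present_now = ref in previous
    return history

# Made-up history of five revisions.
revisions = [{"doi:A"}, {"doi:A", "doi:B"}, {"doi:B"}, {"doi:A", "doi:B"}, {"doi:B", "isbn:C"}]
for ref, h in sorted(trace_references(revisions).items()):
    print(ref, h)
```

In this made-up history, a snapshot of the latest revision shows only `doi:B` and `isbn:C`, while the full history reveals that `doi:A` was cited, removed, restored and removed again.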

2) Linguistic and cultural multiverses

Wikipedia is a decentralised medium managed by its community of editors, also known as wikipedians, who dictate (many of) its policies, which must therefore have the support of the community. Nothing on Wikipedia is immutable. This is an important feature, and it has resulted in more than 300 language editions, which are far from being mere translations of one another. The community of wikipedians of each edition (wikipedistas in Spanish, wikipédiens in French or wikipedianen in Dutch) establishes its own policies and manages its contents. A quick look at the main pages of the Spanish, French and Dutch Wikipedias is enough to observe clear differences. Even the design or the name itself can vary, as in the case of the Catalan Viquipèdia or the Galician Galipedia. This obviously has an impact on the contents, which may introduce cultural biases.

Although edits to Wikipedia articles can come from users who contribute independently, there are also communities organised around topics: the so-called WikiProjects. Each of them focuses on a specific topic, for example astronomy, cats or Lady Gaga. Just as each language edition has complete autonomy, so do the WikiProjects. Each one establishes its own guidelines for developing and improving the articles within its scope. They can provide recommendations, such as following a specific structure, or even offer suggested literature, as in the case of the Lepidoptera WikiProject. Some of these activities can thus affect the contents of an entire block of articles. In addition, especially in the English Wikipedia, WikiProjects organise articles in a very remarkable way. Articles are classified according to two criteria: the quality of the article and its importance or priority for the WikiProject in question. The use of references plays a key role in determining which category an article receives. It should be noted that this assignment is made freely by wikipedians, although the highest categories (Featured Article and Good Article) depend on a more centralised and standardised system with its own nomination and voting process.

Average length (in bytes) and referenced publications of Wikipedia articles by quality level

3) Life beyond Wikipedia articles

On Wikipedia, the contents of articles are the result of consensus. Consensus is not always possible, however, and its absence can lead to a high number of edits in which editors repeatedly override one another to impose their respective points of view. Wikipedia refers to these conflicts as edit wars, and some of the most regrettable ones have been documented. They are frequent in articles on sensitive and topical issues. When an edit war breaks out, the community tries to reach a consensus on its own or with the intervention of a committee formed to help resolve it.
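
Edit wars leave a recognisable trace in revision histories in the form of repeated reverts. A heuristic commonly used in research (an assumption here, not a description of how Wikipedia itself flags edit wars) is the identity revert: a revision whose text is identical to that of an earlier revision, detected by comparing content hashes such as the SHA-1 digests MediaWiki can report for each revision. The function name is hypothetical, and the window of 15 revisions is an assumed parameter.

```python
import hashlib

def find_identity_reverts(texts: list[str], radius: int = 15) -> list[tuple[int, int]]:
    """Return (reverting, restored) revision index pairs for identity reverts.

    A revision is treated as a revert when its text exactly matches the text of
    an earlier revision at most `radius` revisions back, with at least one
    different revision in between (so unchanged, "null" edits are ignored).
    """
    reverts: list[tuple[int, int]] = []
    last_seen: dict[str, int] = {}  # text digest -> latest revision index with that text
    for i, text in enumerate(texts):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        if j is not None and 1 < i - j <= radius:
            reverts.append((i, j))
        last_seen[digest] = i
    return reverts

history = ["v1", "v2", "v1", "v2", "v1"]  # two versions alternating
print(find_identity_reverts(history))  # [(2, 0), (3, 1), (4, 2)]
```

A back-and-forth between two versions yields three reverts in five revisions, the kind of rapid alternation associated with edit wars rather than with ordinary incremental editing.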

Furthermore, wikipedians can discuss the contents of articles openly with the rest of the community. Something that often goes unnoticed on Wikipedia is the talk page (linked next to the article title), where editors not only leave messages about changes made or proposed, but also discuss the contents in order to improve them. The scientific literature also has a place in these discussions, for example when editors comment on publications worth citing in the article or use them to support the statements they make in the discussion.

Top 10 English Wikipedia articles with the highest number of edits on their talk pages (talks) and of unique users discussing (talkers). Articles appearing in both lists: Donald Trump, Barack Obama, Climate change, United States. Other articles listed: George W. Bush, Intelligent design, Adolf Hitler, Sarah Palin, Michael Jackson, Gamergate controversy, Race and intelligence, September 11 attacks.

4) How are the contents of Wikipedia classified by topics?

The way content is classified by topic on Wikipedia has its ups and downs. Wikipedia's main system is its categories (not to be confused with Wikidata Concepts), a folksonomy that includes 2 million categories in the English Wikipedia alone. As an illustration of their uneven usefulness and representativeness, the Wikipedia article Bibliometrics has only one category (Bibliometrics), while Derek J. de Solla Price's article has 16, including some such as ‘1922 births’ and ‘1983 deaths’. This problem is compounded by the hierarchical relationships established between categories: because a category may have more than one parent category, it is difficult to establish a single broad topic for each Wikipedia article.
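
The multiple-parent problem becomes visible when walking up the category graph from an article's categories until reaching a set of broad, top-level topics. The graph below is a toy example: its parent links are invented for illustration and do not reproduce the real English Wikipedia category tree, and the choice of top-level topics is an assumption. A set of visited categories is needed because category graphs can contain cycles, as the deliberate loop between Bibliometrics and Scientometrics shows.

```python
from collections import deque

# Toy category graph (child -> parents). Links are invented for illustration.
PARENTS: dict[str, list[str]] = {
    "Bibliometrics": ["Library science", "Scientometrics"],
    "Library science": ["Information science"],
    "Information science": ["Society"],
    "Scientometrics": ["Statistics", "Bibliometrics"],  # deliberate cycle
    "Statistics": ["Mathematics"],
    "1922 births": ["People"],
}
TOP_LEVEL = {"Society", "Mathematics", "People"}  # assumed broad topics

def top_level_topics(categories: list[str], parents: dict[str, list[str]],
                     top_level: set[str]) -> set[str]:
    """Breadth-first walk up the graph, collecting every top-level topic reached."""
    found: set[str] = set()
    seen: set[str] = set()
    queue = deque(categories)
    while queue:
        cat = queue.popleft()
        if cat in seen:          # guards against cycles and repeated paths
            continue
        seen.add(cat)
        if cat in top_level:
            found.add(cat)
            continue             # stop climbing once a broad topic is reached
        queue.extend(parents.get(cat, []))
    return found

print(sorted(top_level_topics(["Bibliometrics"], PARENTS, TOP_LEVEL)))
# ['Mathematics', 'Society']
```

Starting from a single category, the walk reaches two unrelated broad topics, so assigning one topic per article requires some further, necessarily arbitrary, rule.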

In addition, Wikipedia has other systems that organise its contents by topic and make browsing easier, such as overview articles, lists or portals. WikiProjects can also be used for this purpose, as they delimit the set of articles related to a topic. There is also no shortage of machine learning applications, such as the ORES article topic model, which predicts the topic of an article.

Interactive map of WikiProjects of the English Wikipedia with overlays of the average number of references (total, DOI and ISBN) of its articles

5) Wikipedia as the ultimate social media for measuring social attention

Finally, a wide range of metrics can be obtained from Wikipedia to understand the different interactions taking place on it. It is worth recalling that Wikipedia is one of the websites with the highest traffic worldwide: its articles often appear at the top of web search engine results, attracting millions of visits. Furthermore, besides the English Wikipedia, which is the largest edition and can be used as a proxy for international attention, the different language editions can be used to capture local attention. All things considered, what we have is a perfect social thermometer, whose usefulness has already been noted, for example, for monitoring disease outbreaks.
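
Page views are the most direct measure of this attention, and the Wikimedia Pageviews REST API exposes them per article and per language edition. Below is a minimal sketch that builds request URLs and sums the returned counts. The helper names are hypothetical, the Spanish title is an assumed example (titles differ between editions, so each edition's own title must be supplied), and `agent="user"` is chosen to exclude known crawlers. Wikimedia asks API clients to send a descriptive User-Agent header.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base endpoint of the Wikimedia Pageviews API (per-article metrics).
API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def pageviews_url(project: str, title: str, start: str, end: str,
                  access: str = "all-access", agent: str = "user",
                  granularity: str = "daily") -> str:
    """Build the request URL for one article in one language edition.

    project: e.g. "en.wikipedia" or "es.wikipedia"; start/end: YYYYMMDD.
    """
    # Titles use underscores instead of spaces and must be percent-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return f"{API}/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}"

def fetch_json(url: str, user_agent: str) -> dict:
    """GET a URL and decode the JSON body (performs a network request)."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)

def total_views(payload: dict) -> int:
    """Sum the 'views' field over all items of a Pageviews API response."""
    return sum(item["views"] for item in payload.get("items", []))

# Hypothetical usage (requires network access):
# url = pageviews_url("es.wikipedia", "Cambio climático", "20200101", "20200131")
# print(total_views(fetch_json(url, "my-research-script/0.1 (contact@example.org)")))
```

Comparing `total_views` for `en.wikipedia` with those of other editions over the same period is one simple way to contrast international and local attention.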

The number of times an article has been edited, as well as the number of unique editors involved, can shed light on which articles are most active and attract the most engagement from the Wikipedia community. Discussion activity can even be used as a proxy for identifying controversial content. Meanwhile, the age of an article and its length help to characterise it, while the number of references to scientific articles reflects its scientific orientation or interest. The possibilities are numerous, go far beyond these general approaches, and many have yet to be explored.
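
These indicators are straightforward to compute once the revision histories of an article and of its talk page are available. Below is a minimal sketch over a hypothetical record structure (only a username and a date per revision); the article's length and its number of scholarly references are assumed to have been measured separately, for instance from its latest version. The `talks` and `talkers` names follow the terminology used earlier in this post.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Revision:
    user: str  # username (or IP address for anonymous edits)
    day: date  # date of the revision

def article_indicators(article_revs: list[Revision], talk_revs: list[Revision],
                       length_bytes: int, scholarly_refs: int, today: date) -> dict:
    """Summarise activity, discussion and scientific orientation of one article."""
    created = min(r.day for r in article_revs)  # first revision taken as creation date
    return {
        "edits": len(article_revs),
        "editors": len({r.user for r in article_revs}),
        "talks": len(talk_revs),                 # edits on the talk page
        "talkers": len({r.user for r in talk_revs}),
        "age_years": (today - created).days / 365.25,  # approximate years
        "length_bytes": length_bytes,
        "scholarly_refs": scholarly_refs,
    }

# Made-up example data.
revs = [Revision("Ann", date(2010, 1, 1)), Revision("Bob", date(2012, 5, 3)),
        Revision("Ann", date(2015, 7, 9))]
talk = [Revision("Bob", date(2012, 5, 4)), Revision("Cy", date(2012, 5, 5)),
        Revision("Bob", date(2012, 5, 6))]
print(article_indicators(revs, talk, 48000, 12, date(2020, 1, 1)))
```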

What is certain is that only by paying attention to these aspects when analysing science in this social medium will it be possible to understand the role that science plays on Wikipedia, beyond its greater or lesser presence, as well as the implications and reach of these resources within the community of editors and in society in general.

