Mapping science using Microsoft Academic data

This blog post discusses the emergence of new data sources in the field of bibliometrics, and how to use them to map science.

One of the most exciting developments in the past few years in the field of bibliometrics is the emergence of a number of important new data sources. Dimensions, created by Digital Science and made openly available for research purposes, is a prominent example. Other examples are Crossref and OpenCitations, which provide data that is fully open. The launch of Microsoft Academic in 2016 also represents a significant development. In this blog post, we discuss the data made available by Microsoft Academic and we show how the most recent version of our VOSviewer software can be used to create science maps based on this data.

Microsoft Academic

Like Google Scholar, Microsoft Academic combines data obtained from scholarly publishers with data retrieved by indexing web pages. However, unlike Google Scholar, Microsoft Academic makes its data available at a large scale, both through an API and through the Microsoft Azure platform. Moreover, the data is released under an ODC-BY open data license, which allows the data to be used with minimal restrictions. Microsoft Academic data is used, for instance, by the Lens, an increasingly popular website for searching and analyzing scholarly literature and patents.

At the moment, the bibliometric community has only limited knowledge of the coverage of Microsoft Academic and of the completeness and accuracy of its data. A study by Anne-Wil Harzing published earlier this year reports that in the field of business and economics Microsoft Academic has broader coverage than Web of Science, Scopus, and Dimensions. Likewise, a recent study by a research team at Curtin University finds that Microsoft Academic outperforms Web of Science and Scopus in terms of coverage. However, this study also reports that Microsoft Academic has less complete affiliation data. Other issues with the quality of Microsoft Academic data have also been reported, for instance incorrect publication years and incorrect journal names (e.g., see this recent presentation by one of us).

At CWTS, we are currently working on a large-scale comparison of the coverage of bibliometric data sources, including Microsoft Academic. Our colleague Martijn Visser has developed an algorithm for matching publications in Microsoft Academic with the corresponding publications in Scopus. Provisional results for the period 2014–2017 show that Microsoft Academic covers a much larger number of publications than Scopus (see the figure below). However, Scopus also covers a substantial number of publications that seem to be missing in Microsoft Academic. We also found that the scholarly nature of some content covered by Microsoft Academic but not by Scopus can be questioned. Microsoft Academic, for instance, covers wedding reports like this one.
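The matching algorithm itself is not described in this post. Purely to illustrate what matching publications across two data sources involves, here is a minimal Python sketch that matches on DOI first and falls back on normalized title plus publication year. The record fields (`id`, `doi`, `title`, `year`) and the function names are hypothetical, and the approach is an assumption for illustration, not the algorithm used at CWTS.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase a title, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def match_publications(source_a, source_b):
    """Return (id_a, id_b) pairs: match on DOI first, then on title and year.

    Both arguments are lists of dicts with the hypothetical keys
    'id', 'doi' (may be None), 'title', and 'year'.
    """
    by_doi = {}
    by_title_year = {}
    for rec in source_b:
        if rec.get("doi"):
            # DOIs are case-insensitive, so compare them in lowercase.
            by_doi.setdefault(rec["doi"].lower(), rec["id"])
        key = (normalize_title(rec["title"]), rec["year"])
        by_title_year.setdefault(key, rec["id"])

    matches = []
    for rec in source_a:
        doi = (rec.get("doi") or "").lower()
        if doi and doi in by_doi:
            matches.append((rec["id"], by_doi[doi]))
            continue
        key = (normalize_title(rec["title"]), rec["year"])
        if key in by_title_year:
            matches.append((rec["id"], by_title_year[key]))
    return matches
```

Exact matching of this kind will miss publications whose metadata differs between sources (for instance an incorrect publication year), which is one reason why real matching procedures tend to be considerably more elaborate.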

Mapping science

Because we see Microsoft Academic as a promising data source for bibliometric analysis, the most recent version of VOSviewer, our software for creating and visualizing bibliometric maps of science, now supports Microsoft Academic data. After obtaining an API key, users of VOSviewer are able to query Microsoft Academic, with the data being retrieved through the Microsoft Academic API. An important feature of this API is its speed: it is much faster than the APIs of many other data sources.

VOSviewer’s support for Microsoft Academic data was used in a recent VOSviewer tutorial organized as part of the workshop Open Citations: Opportunities and Ongoing Developments at the ISSI2019 conference in Rome. In this tutorial, participants used Microsoft Academic data to create, for instance, the following term co-occurrence map based on titles and abstracts of publications in Journal of Informetrics.

[Figure: term co-occurrence map based on titles and abstracts of publications in Journal of Informetrics]
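To give an idea of the data underlying a term co-occurrence map, the following Python sketch counts, for each pair of terms, the number of documents in which both terms occur. It assumes a fixed list of terms and simple substring matching, and it counts each pair at most once per document. The function name and the toy input are hypothetical. VOSviewer extracts terms from titles and abstracts itself, so this is only an illustration of the idea, not of how the software works internally.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, terms):
    """Count, for each pair of terms, the number of documents containing both.

    'documents' is a list of strings (e.g. title plus abstract) and
    'terms' a list of lowercase terms. Matching is by plain substring,
    which is a simplifying assumption.
    """
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        # Sort so that each pair always appears in the same order.
        present = sorted(term for term in terms if term in lowered)
        counts.update(combinations(present, 2))
    return counts


documents = [
    "Citation analysis of journals",
    "Journal impact and citation counts",
    "Peer review",
]
pairs = cooccurrence_counts(documents, ["citation", "journal", "peer review"])
```

In a map, pairs with higher counts would be represented by stronger links between the corresponding terms.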

Participants also created a map of the citation network of publications in Journal of Informetrics.

[Figure: map of the citation network of publications in Journal of Informetrics]
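A citation network of this kind can be derived from reference lists. The sketch below, in Python, keeps only citations between publications within the selected set and counts how often each publication is cited within that set. The record layout (`id`, `references`) and the function names are hypothetical and do not reflect the data format of Microsoft Academic or VOSviewer.

```python
def build_citation_network(publications):
    """Return directed (citing_id, cited_id) edges within the given set.

    'publications' is a list of dicts with the hypothetical keys 'id'
    and 'references' (a list of cited publication ids). References to
    publications outside the set are ignored.
    """
    ids = {pub["id"] for pub in publications}
    edges = []
    for pub in publications:
        for ref in pub.get("references", []):
            if ref in ids and ref != pub["id"]:
                edges.append((pub["id"], ref))
    return edges


def citation_counts(edges):
    """Count how often each publication is cited within the network."""
    counts = {}
    for _, cited in edges:
        counts[cited] = counts.get(cited, 0) + 1
    return counts


publications = [
    {"id": 1, "references": [2, 3, 99]},  # 99 lies outside the set
    {"id": 2, "references": [3]},
    {"id": 3, "references": []},
]
edges = build_citation_network(publications)
counts = citation_counts(edges)
```

Restricting the edges to the selected set means that the counts are citations within the network, not total citation counts.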

Interestingly, the above two maps cannot be created based on data from Crossref, another open data source supported by VOSviewer. Elsevier, the publisher of Journal of Informetrics, does not make abstracts available in Crossref, while abstracts of publications in Elsevier journals are made available in Microsoft Academic. Likewise, Elsevier is unwilling to support the Initiative for Open Citations, and reference lists of publications in Elsevier journals are therefore not made openly available in Crossref. Microsoft Academic does make these reference lists available. This illustrates some of the advantages of Microsoft Academic over other open data sources.

For further illustrations of science maps created using VOSviewer based on data from Microsoft Academic, we refer to a recent blog post by Aaron Tay.

Next steps

Over the past few years, we have invested considerable effort in extending the range of bibliometric data sources supported by VOSviewer. The software now offers support for all major data sources. Next steps in the development of VOSviewer include opening the source code of the software and releasing a web-based edition of the software.


Ruth Pagell

With emphasis on quantity rather than quality, what is being done to screen out possibly predatory journals and citations from these journals?

Nees Jan van Eck

My understanding is that Microsoft Academic does not attempt to filter out predatory journals. This is a responsibility of users working with data from Microsoft Academic. Users may decide themselves which content they consider to be ‘predatory’ and they may then filter out that content.
