COVID-19 research in the news: Visualizing the sentiment and topics in science news about the pandemic
Every day, news outlets around the world play a central role in disseminating the latest COVID-19 research. In this post, we explore how COVID-19 findings are received in the news by applying state-of-the-art sentiment analysis, and we present some interesting preliminary results. Stay tuned!
There are many reasons to be concerned with how science is portrayed in the news media, particularly given the ‘infodemic’ related to COVID-19. For example, over-hyped research results can lead to misinterpretation, which may contribute, among other things, to public skepticism and distrust towards science. This made us wonder how we could start exploring the reception in the news of the science related to the pandemic. More specifically, we decided to explore the potential of natural language processing (NLP), and in particular sentiment analysis, as an indicator of the sentiment that news media express about COVID-19 findings. As a disclaimer, the analysis presented in this blog post should be seen as a preliminary exploration of how sentiment approaches can be applied to study the reception of scientific content in social and news media outlets.
In our experiment, we used an existing dataset of scientific publications related to research on COVID-19, updated through April 24th, 2020, and matched it with data from Altmetric.com (Figure 1). From this dataset, we selected publications related to the pandemic as indicated by the WHO or Dimensions. Since our analysis focused on texts, we filtered out publications without an abstract. From the data obtained from Altmetric.com, we also removed news articles that did not come with a summary text (this summary typically contains about the first 250 characters of the news media text). We ended up with a dataset of 1,910 publications with an abstract and mentions in 38,611 different news media posts.
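The two filtering steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout and the field names (`doi`, `abstract`, `summary`) are hypothetical and do not reflect the actual structure of the dataset or of the Altmetric.com export.

```python
def filter_corpus(publications, news_items):
    """Keep publications with an abstract and news items with a summary.

    Field names ('doi', 'abstract', 'summary') are hypothetical.
    """
    # Step 1: drop publications that have no (non-empty) abstract.
    pubs = {
        p["doi"]: p
        for p in publications
        if (p.get("abstract") or "").strip()
    }
    # Step 2: drop news items without summary text, and items that
    # point to a publication removed in step 1.
    news = [
        n
        for n in news_items
        if (n.get("summary") or "").strip() and n.get("doi") in pubs
    ]
    # Step 3: keep only publications still mentioned by some news item.
    mentioned = {n["doi"] for n in news}
    pubs = {doi: p for doi, p in pubs.items() if doi in mentioned}
    return pubs, news


# Toy records, invented purely for illustration.
publications = [
    {"doi": "10.0000/a", "abstract": "We study transmission."},
    {"doi": "10.0000/b", "abstract": ""},
    {"doi": "10.0000/c", "abstract": "A vaccine candidate."},
]
news_items = [
    {"doi": "10.0000/a", "summary": "Researchers report that..."},
    {"doi": "10.0000/a", "summary": None},
    {"doi": "10.0000/b", "summary": "A new study says..."},
]
pubs, news = filter_corpus(publications, news_items)
```

In this toy example only one publication and one news item survive: the second publication has no abstract, the third has no news mention, and one news item has no summary.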
The Sentiment Analysis
To obtain the sentiments expressed in the news articles, we used a sentiment classification model built on top of BERT (Bidirectional Encoder Representations from Transformers) (see Vaswani et al., 2017). We use the bert-base-multilingual-uncased-sentiment model, which covers six languages (English, Dutch, German, French, Spanish and Italian) and is fine-tuned on a set of 500,000 product reviews with sentiment labels ranging from 0 to 4, where 0 is a bad review and 4 is a good review (the pretrained model can be accessed here). Thus, sentiment scores range between 0 and 4 and can be interpreted as follows: 0=‘very negative’, 1=‘negative’, 2=‘neutral’, 3=‘positive’, 4=‘very positive’.
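For readers who want to try this themselves, here is a minimal sketch of how such scores could be obtained with the Hugging Face transformers library. It is not our exact pipeline. It assumes that the model is published on the Hugging Face Hub under the identifier `nlptown/bert-base-multilingual-uncased-sentiment` and that it returns labels of the form "1 star" to "5 stars", which we shift to the 0 to 4 scale used in this post. The helper names are ours.

```python
# Assumed Hub identifier of the model named in the text.
MODEL_NAME = "nlptown/bert-base-multilingual-uncased-sentiment"

# Interpretation of the 0-4 scale used in this post.
LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]


def stars_to_score(label):
    """Map a label such as '4 stars' (assumed format) to a 0-4 score."""
    stars = int(label.split()[0])
    if not 1 <= stars <= 5:
        raise ValueError(f"unexpected label: {label!r}")
    return stars - 1


def score_summaries(texts):
    """Return one 0-4 sentiment score per text.

    Requires the 'transformers' package and downloads the model on
    first use, so the import is kept inside the function.
    """
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis", model=MODEL_NAME)
    # truncation=True guards against texts longer than the model limit.
    results = classifier(list(texts), truncation=True)
    return [stars_to_score(r["label"]) for r in results]
```

Calling `score_summaries(["A promising vaccine trial.", "Cases keep rising."])` would return a list of two integers between 0 and 4, and `LABELS[score]` gives the matching verbal category.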
How has the science around COVID-19 been received in the news?
To get a sense of how well BERT deals with topics related to COVID-19 research in the news, we plotted a term map of the most commonly co-occurring terms in the scientific articles. We then overlaid the average BERT sentiment score of the news items mentioning each paper in the dataset, in order to represent the sentiment of news items around COVID-19 research. As we can see, news pieces about papers on solutions such as vaccines and treatments are scored by BERT as more neutral or slightly positive. On the other hand, papers on symptoms (such as fever and hypertension) and on policy measures to control the virus are reported more negatively in the news.
We also analyzed the temporal dynamics of the news items by averaging the sentiment of the sentences of all the news items published on a given day (Figures 2 and 3). The number of news items around COVID-19-related scientific publications has increased over time, particularly from mid-March onwards, a pattern that has also been observed for Twitter, other social media sources, and in The Conversation. During the period of higher news activity (March-April), the mean sentiment scores oscillate between slight negativity (1.5) and neutrality (2) (Figure 2).
In Figure 3, we aggregate the sentiment scores at the month level, which shows the overall increase in the sentiment inferred from the news items from the early months to the more recent ones.
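The daily and monthly aggregation behind Figures 2 and 3 boils down to grouping scores by date and taking the mean. The sketch below illustrates the idea with the Python standard library. The function name and the toy scores are invented for illustration and are not taken from our data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_by_period(items, period="day"):
    """Average 0-4 sentiment scores per day or per month.

    'items' is an iterable of (publication date, score) pairs.
    """
    buckets = defaultdict(list)
    for posted_on, score in items:
        if period == "day":
            key = posted_on.isoformat()        # e.g. '2020-03-16'
        else:
            key = posted_on.strftime("%Y-%m")  # e.g. '2020-03'
        buckets[key].append(score)
    # Sort keys so the result reads chronologically.
    return {key: mean(scores) for key, scores in sorted(buckets.items())}


# Toy scores, invented purely for illustration.
scored_items = [
    (date(2020, 3, 16), 1),
    (date(2020, 3, 16), 2),
    (date(2020, 3, 20), 1),
    (date(2020, 4, 2), 2),
    (date(2020, 4, 2), 2),
]
daily = mean_by_period(scored_items, period="day")
monthly = mean_by_period(scored_items, period="month")
```

Note that a monthly mean computed this way weights every news item equally, so days with many news items influence it more than quiet days.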
Another interesting piece of information recorded by Altmetric.com is the source of each news item. This enables us to study the sentiment expressed by the different news providers (Figure 4).
Interestingly, some of the most popular news outlets related to medical research (e.g. MedicalXpress, The Conversation or Medscape) exhibit values very close to 2, suggesting a high degree of neutrality in their dissemination of COVID-19 science-related news. In contrast, business-related news outlets (Business Insider - Malaysia, Singapore, Australia, India or the Netherlands) tend to have a more negative sentiment in their news items, perhaps due to the negativity around the critical economic situation caused by the pandemic. News aggregators such as Yahoo! News, MSN, or Google News also exhibit rather negative sentiments, in line with news media such as the New York Times, CNN News or The Guardian. An interesting exception is the conservative channel Fox News, with fairly positive coverage of the research around the pandemic.
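A comparison of this kind can be sketched as follows. The outlet names and scores are invented placeholders, and the `min_items` threshold is our own assumption, added so that outlets with very few news items do not dominate the ranking. It is not a setting taken from our analysis.

```python
from collections import defaultdict
from statistics import mean


def outlet_sentiment(items, min_items=2):
    """Rank outlets by mean 0-4 sentiment, most negative first.

    'items' is an iterable of (outlet name, score) pairs. Outlets
    with fewer than 'min_items' scored items are left out.
    """
    by_outlet = defaultdict(list)
    for outlet, score in items:
        by_outlet[outlet].append(score)
    rows = [
        (outlet, mean(scores), len(scores))
        for outlet, scores in by_outlet.items()
        if len(scores) >= min_items
    ]
    return sorted(rows, key=lambda row: row[1])


# Placeholder outlets and scores, invented purely for illustration.
scored_news = [
    ("outlet_a", 2), ("outlet_a", 2),
    ("outlet_b", 1), ("outlet_b", 0), ("outlet_b", 2),
    ("outlet_c", 4),  # a single item, dropped by min_items=2
]
ranking = outlet_sentiment(scored_news)
```

Each row holds the outlet, its mean score, and the number of items behind that mean, which helps to judge how much weight a given average deserves.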
What did we learn from this exercise?
This is a first analysis of the sentiment of news items covering scientific articles about COVID-19. Overall, we observe a slight increase in the neutrality of the news coverage of scientific findings, which moves from a slightly negative sentiment in the early months to a more neutral one. On average, papers on solutions like vaccines and treatments tend to be treated more neutrally or positively in the news, while papers about transmission and control measures are disseminated more negatively. Medical news sources tend to present more neutral views, while general-interest and business-related news outlets write more negatively about scientific research related to the virus.
However, this exercise is by no means in its final stages. Given the lack of abstracts in many of the publications, and occasionally of summary text in the news items, we could only study a limited selection of publications and news media items. In the future, we will consider larger sets of publications and news media items. Another concern is that we took an already trained BERT model, fine-tuned for sentiment analysis on product reviews, and used it to classify news items about research. While models like BERT can be generalized to different contexts (especially social media), we would probably have obtained better classification by fine-tuning the model on a corpus of research articles about COVID-19 and related news items instead.
Nevertheless, BERT reveals interesting findings that we think are worth sharing in this blog post. It also shows the potential of machine learning techniques such as text classification for further studying and characterizing the online and social media reception of scientific outputs. Tips for improvement would be greatly appreciated!