Structuring Natural Language Processing Contributions in the Open Research Knowledge Graph
Next-generation digital libraries like the Open Research Knowledge Graph are here! Catering to which, we announce a Shared Task that builds scholarly contributions-focused graphs over Natural Language Processing (NLP) articles. Want to build a machine learner, we provide the data--join us!
Search has long been revolutionized by knowledge-graph-powered services such as the Amazon Marketplace in e-commerce, or Open Street Maps in the cartography and navigation services domains, to name just two examples. Inspired by such knowledge graph (KG) success stories in the general domain, such technology is now being realized over scholarly knowledge as well. In this vein, we highlight the TIB-led project Open Research Knowledge Graph (ORKG) that advocates for representing scholarly articles’ contributions in knowledge graphs and that, as a next-generation digital library platform, stores and publishes such graphs as persistent knowledge items. You can browse this digital library and its scholarly knowledge here!
Since scientific literature is growing at a rapid rate and researchers today are faced with a publications deluge, it is increasingly tedious, if not practically impossible to keep up with the research progress even within one's own narrow discipline. The ORKG then is posited as a solution to the problem of keeping track of research progress minus the cognitive overload that reading dozens of full papers impose. It aims to build a comprehensive knowledge graph that publishes just the research contributions of scholarly publications per paper where the framework can then intelligently compute paperwise or aggregated scholarly knowledge highlights for researchers.
Naturally, then, one wonders what information should be captured in such scholarly contributions’ knowledge-focused graphs? Within the SemEval 2021: NLPContributionGraph (NCG) Shared Task, we seek both to answer and to discover better answers to this question. We have formalized a scholarly contributions-focused graph model over NLP scholarly articles that will be applied to annotate hundreds of NLP articles for their contributions. The corpus will be freely released to the NCG task participants, based on which they will be able to train and test automated machine learners. In essence, such systems will read “contributions” information in a subject-predicate-object structured format to be integrable within Knowledge Graph infrastructures such as the ORKG. The corpus annotation data elements will include: (1) contribution sentences - a set of sentences about the contribution in the article; (2) scientific terms and relations - a set of scientific terms and relational cue phrases extracted from the contribution sentences; and (3) triples - semantic statements that pair scientific terms with a relation, modeled toward subject-predicate-object Resource Description Framework (a standard model for data interchange on the Web, commonly referred to as RDF) statements for KG building. The task is to automatically extract these elements given a new NLP article. Have a look at our pilot annotation task description paper published in the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents co-located with the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020).
NCG is organized under the umbrella of the well-known Semantic Evaluation (SemEval) series that have been running since 1998. SemEval tasks bring together researchers with similar text mining and machine learning interests, and facilitate the collaborative building of computational semantic analysis systems. As depicted in Figure 1, our task will conform to the standard SemEval framework for tasks where the gold standards will be released by the organizers and the NLP systems will be developed by the task participants.
Figure 1: SemEval Framework. Source: Wikimedia Commons under the CC-BY-SA 3.0 License
In an earlier blogpost we raised a few questions: What if scholarly knowledge communicated in the scholarly literature would be FAIR, also for machines? What if the global scholarly knowledge base would be more than a repository of digital documents? How would this change the global access to as well as the reuse of scholarly knowledge?
NLPContributionGraph seeks to concretely find the answers. We invite the scholarly communication, information science and related research communities to contribute to the vision and the ORKG, specifically, and help to shape the future of scholarly communication. You may find detailed participation information and the task timeline here.