About TREC-COVID

TREC-COVID was a collaboration among the Allen Institute for Artificial Intelligence (AI2), the National Institute of Standards and Technology (NIST), the National Library of Medicine (NLM), Oregon Health & Science University (OHSU), and the University of Texas Health Science Center at Houston (UTHealth). Following the TREC model, it built a set of Information Retrieval (IR) test collections from the CORD-19 data sets.

The twin goals of the challenge were:

to evaluate search algorithms and systems for helping scientists, clinicians, policy makers, and others manage the existing and rapidly growing corpus of scientific literature related to COVID-19, and
to discover methods that will assist with managing scientific information in future global biomedical crises.

TREC-COVID was structured as a series of five rounds, which are now complete. For each round, organizers designated a specific version of the CORD-19 data set to be used in the round and released a set of information need statements, called topics. After the submission deadline for a round, NIST used the submitted runs to select, for each topic, a set of documents for human annotators to assess for relevance to that topic. The annotators were drawn from NLM, OHSU, and UTHealth to ensure they had the necessary biomedical expertise. The resulting relevance judgments are used to compute effectiveness scores for retrieval runs. Topics, runs, scores, and relevance judgments are archived on this TREC-COVID website, where they are freely accessible to everyone.
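As an illustration of how submitted runs can yield a set of documents to judge, the sketch below uses depth-k pooling, a common approach in TREC-style evaluations. The text above does not say which selection method or depth NIST used in each round, so the method, the function name `depth_k_pool`, the value of `k`, and the document IDs here are all assumptions made for the example.

```python
def depth_k_pool(runs, k):
    """Union the top-k documents of every run, per topic.

    `runs` is a list of dicts mapping topic ID -> ranked list of doc IDs.
    This is an illustrative sketch, not the procedure NIST actually used.
    """
    pool = {}
    for run in runs:
        for topic, ranking in run.items():
            pool.setdefault(topic, set()).update(ranking[:k])
    return pool


# Two hypothetical runs for a single topic.
run_a = {"1": ["d1", "d2", "d3"]}
run_b = {"1": ["d2", "d4", "d5"]}

# Documents that would be sent to annotators for topic "1".
pool = depth_k_pool([run_a, run_b], k=2)
```

With `k=2`, the pool for topic "1" contains `d1`, `d2`, and `d4`: the top two documents from each run, with the shared document `d2` counted once.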

The CORD-19 data set is released and maintained by AI2, which updates it regularly. Each release is generally larger than the previous one, and later rounds of TREC-COVID used later releases. TREC-COVID organizers also added new topics in each round, so each round used the cumulative set of topics. As a result, in every round after the first, relevance judgments from earlier rounds were already available for the majority of the topics, and participants were free to use those judgments in constructing their runs. To evaluate runs fairly when some relevance judgments are already known, runs were scored against a reduced collection consisting of that round's CORD-19 version with all previously judged documents removed (this is known as residual collection evaluation in the IR literature). The final cumulative collection, TREC-COVID Complete, is available for use as a standard ad hoc retrieval test collection.
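Residual collection evaluation, as described above, can be sketched in a few lines: documents judged in earlier rounds are removed from each ranking before scoring. This is a minimal illustration only. The function names, the in-memory data layout, the document IDs, and the choice of precision at k as the measure are assumptions for the example, not the official TREC-COVID scoring code.

```python
def residual_run(run, previously_judged):
    """Remove previously judged documents from each topic's ranking.

    `run` maps topic ID -> ranked list of doc IDs.
    `previously_judged` maps topic ID -> set of doc IDs judged in earlier rounds.
    The relative order of the remaining documents is preserved.
    """
    return {
        topic: [doc for doc in ranking
                if doc not in previously_judged.get(topic, set())]
        for topic, ranking in run.items()
    }


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Hypothetical data for one topic.
run = {"1": ["d1", "d2", "d3", "d4", "d5"]}
previously_judged = {"1": {"d1", "d3"}}   # judged in an earlier round
new_relevant = {"1": {"d2", "d5"}}        # judged relevant in this round

residual = residual_run(run, previously_judged)
score = precision_at_k(residual["1"], new_relevant["1"], k=2)
```

Here the residual ranking for topic "1" is `d2, d4, d5`, so precision at 2 is 0.5. A run gains nothing by ranking `d1` or `d3` highly, whatever their earlier judgments were, which is the point of scoring on the residual collection.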

Organizers

Steven Bedrick, Oregon Health & Science University
Aaron Cohen, Oregon Health & Science University
Dina Demner-Fushman, National Library of Medicine
William Hersh, Oregon Health & Science University
Kyle Lo, Allen Institute for Artificial Intelligence
Kirk Roberts, University of Texas Health Science Center at Houston
Ian Soboroff, National Institute of Standards and Technology
Ellen Voorhees, National Institute of Standards and Technology
Lucy Lu Wang, Allen Institute for Artificial Intelligence
