Researchers, clinicians, and policy makers involved in the response to COVID-19 are constantly searching for reliable information on the virus and its impact. This presents a unique opportunity for the information retrieval (IR) and text processing communities to contribute to the pandemic response, and to study methods for quickly standing up information systems for similar events in the future. The results of the TREC-COVID Challenge will identify answers to some of today's questions while building infrastructure to improve tomorrow's search systems.
TREC-COVID will follow the TREC model for building IR test collections through community evaluations of search systems. The document set to be used in the challenge is the COVID-19 Open Research Dataset (CORD-19). This is a collection of biomedical literature articles that will be updated weekly. Accordingly, TREC-COVID will consist of a series of rounds, with each round using a later version of the document set and a larger set of COVID-related topics. Participants in a round will create ranked lists of documents for each topic ("runs") and submit their runs to NIST. Based on the collective set of participants' runs, NIST will create small sets of documents to be assessed for relevance by human annotators with biomedical expertise. The results of the human annotation, known as relevance judgments, will then be used to score the submitted runs. After all rounds are complete, the final document and topic sets together with the cumulative relevance judgments will comprise a COVID test collection. The incremental nature of the collection will support research on search systems for dynamic environments.
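The round structure described above (participants submit ranked runs, NIST pools documents from those runs for human assessment, and the resulting judgments are used to score each run) can be illustrated with a small sketch. It assumes a simple depth-k pooling strategy and binary relevance, and uses precision at k (P@k) purely as an easy-to-follow example metric. The pool depth, the judgment scale, the metric, and the names `build_pools` and `precision_at_k` are all illustrative assumptions, not the official TREC-COVID procedure or tooling.

```python
from collections import defaultdict

# Hypothetical run representation: topic_id -> list of doc_ids, best-ranked first.
Run = dict[str, list[str]]


def build_pools(runs: list[Run], depth: int) -> dict[str, set[str]]:
    """Depth-k pooling (assumed strategy): union of each run's top `depth` docs per topic."""
    pools: dict[str, set[str]] = defaultdict(set)
    for run in runs:
        for topic, ranking in run.items():
            pools[topic].update(ranking[:depth])
    return dict(pools)


def precision_at_k(run: Run, qrels: dict[str, set[str]], k: int) -> float:
    """Mean P@k over the run's topics; unjudged documents count as non-relevant."""
    scores = []
    for topic, ranking in run.items():
        relevant = qrels.get(topic, set())
        hits = sum(1 for doc in ranking[:k] if doc in relevant)
        scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with two made-up runs over two topics.
run_a = {"1": ["d1", "d2", "d3"], "2": ["d7", "d8"]}
run_b = {"1": ["d2", "d4", "d5"], "2": ["d8", "d9"]}

# Documents that would be sent to assessors for judgment.
pools = build_pools([run_a, run_b], depth=2)
print(pools)

# Made-up binary judgments standing in for the assessors' relevance judgments.
qrels = {"1": {"d2"}, "2": {"d8", "d9"}}
print(precision_at_k(run_a, qrels, k=2))
print(precision_at_k(run_b, qrels, k=2))
```

Pooling keeps assessment effort manageable: only documents that some run ranked highly are judged, and documents outside the pool are treated as non-relevant when scoring. In an incremental setting like the one described, judgments from earlier rounds could be carried forward so that previously assessed documents need not be judged again.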
The TREC-COVID Challenge is being organized by the
Allen Institute for Artificial Intelligence (AI2),
the National Institute of Standards and Technology (NIST),
the National Library of Medicine (NLM),
Oregon Health and Science University (OHSU), and
the University of Texas Health Science Center at Houston (UTHealth).
See also the NIST press release.
Participation in TREC-COVID is open to anyone, subject to the conditions of participation listed on the registration form. The COVID-19 Open Research Dataset (CORD-19) is a free resource of scholarly articles about COVID-19 and the coronavirus family of viruses. Topic sets, as well as relevance judgments from previous rounds, are freely available on the data page. Retrieval results from prior rounds are stored in the open archive of submissions.
The Round 3 submission deadline has now passed, but you can join Round 4, which we expect to kick off on June 23.
You can also join the trec-covid Google group to discuss the challenge, follow #COVIDSearch on Twitter, or contact the TREC group at NIST for more information. See also the companion COVIDSearch page.