TREC-COVID is a collaboration among the Allen Institute for Artificial Intelligence (AI2), the National Institute of Standards and Technology (NIST), the National Library of Medicine (NLM), Oregon Health & Science University (OHSU), and the University of Texas Health Science Center at Houston (UTHealth). Following the TREC model, TREC-COVID is building a set of Information Retrieval (IR) test collections from the CORD-19 data sets.
The twin goals of the challenge are
TREC-COVID is structured as a series of rounds; we anticipate about five rounds in total. The Round 3 submission deadline was June 3, and Round 4 will likely kick off on June 23. For each round, the organizers will designate a specific version of the CORD-19 data set to be used and will release a set of information need statements, called topics. Participants then have about one week to construct a run, which consists of a ranked list of documents for each topic, and submit it to NIST.
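As an illustration of what a run looks like on disk, the sketch below writes per-topic rankings in the six-column format that TREC tools such as trec_eval read: topic ID, the literal "Q0", document ID, rank, score, and a run tag. It assumes that format; the exact submission requirements for each round are set by the organizers, and the function name, document IDs, scores, and run tag here are hypothetical placeholders.

```python
from typing import Dict, List, Tuple


def format_run(rankings: Dict[str, List[Tuple[str, float]]], run_tag: str) -> List[str]:
    """Turn per-topic (doc_id, score) lists into six-column TREC run lines.

    Hypothetical helper: topic IDs are assumed to be numeric strings.
    """
    lines = []
    for topic_id in sorted(rankings, key=int):
        # Sort by descending score so that rank 1 is the highest-scoring document.
        ranked = sorted(rankings[topic_id], key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}")
    return lines


if __name__ == "__main__":
    # Hypothetical document IDs and retrieval scores for two topics.
    example = {
        "1": [("doc_b", 0.42), ("doc_a", 0.91)],
        "2": [("doc_c", 0.77)],
    }
    for line in format_run(example, "myrun"):
        print(line)
```

Writing the rank explicitly, rather than relying on line order, keeps the file unambiguous for scoring tools.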
After the submission deadline for a round, NIST will use the submitted runs to select, for each topic, a set of documents to be assessed for relevance by human annotators. The annotators will be drawn from NLM, OHSU, and UTHealth, and all will have biomedical expertise. The relevance judgments will be used to compute effectiveness scores for the runs. Topics, runs, scores, and relevance judgments are archived on the TREC-COVID website, where they are freely accessible to everyone.
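The text does not spell out how NIST selects the documents to judge or which measures it reports. A common TREC approach is depth-k pooling: take the union of the top k documents from every submitted run for each topic. The minimal sketch below assumes that approach (the depth is a hypothetical parameter) and then scores a run with precision at k, one illustrative effectiveness measure, treating any relevance grade above 0 as relevant and unjudged documents as non-relevant. The function and type names are hypothetical.

```python
from typing import Dict, List, Set

Run = Dict[str, List[str]]          # topic_id -> doc_ids, best first
Qrels = Dict[str, Dict[str, int]]   # topic_id -> {doc_id: relevance grade}


def build_pools(runs: List[Run], depth: int) -> Dict[str, Set[str]]:
    """Depth-k pooling (assumed method): union of each run's top `depth` docs per topic."""
    pools: Dict[str, Set[str]] = {}
    for run in runs:
        for topic_id, ranked in run.items():
            pools.setdefault(topic_id, set()).update(ranked[:depth])
    return pools


def precision_at_k(run: Run, qrels: Qrels, k: int) -> Dict[str, float]:
    """Fraction of the top k documents judged relevant (grade > 0).

    Unjudged documents count as non-relevant, and the denominator is always k,
    even if the run retrieved fewer than k documents for a topic.
    """
    scores: Dict[str, float] = {}
    for topic_id, ranked in run.items():
        judged = qrels.get(topic_id, {})
        hits = sum(1 for doc_id in ranked[:k] if judged.get(doc_id, 0) > 0)
        scores[topic_id] = hits / k
    return scores


if __name__ == "__main__":
    # Two hypothetical runs for a single topic.
    run_a = {"1": ["d1", "d2", "d3"]}
    run_b = {"1": ["d3", "d4", "d5"]}
    print(build_pools([run_a, run_b], depth=2))
    # Hypothetical judgments for the pooled documents.
    qrels = {"1": {"d1": 2, "d2": 0, "d3": 1, "d4": 0}}
    print(precision_at_k(run_a, qrels, k=3))
```

Pooling keeps the assessment effort bounded: annotators judge only documents that at least one run ranked highly, rather than the whole collection.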
The CORD-19 data set is released and maintained by AI2, which updates it once a week, growing the collection each time. Later rounds of TREC-COVID will use the later versions of the data set. The organizers will also add new topics in each round, and each round uses the cumulative set of topics. Relevance judgments from earlier rounds are therefore already available for the majority of the topics, and participants are free to use them in constructing their runs. To evaluate runs fairly when some relevance judgments are already known, runs are scored against a reduced collection: the round's CORD-19 version with all previously judged documents removed (this is known as residual collection evaluation in the IR literature).
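Residual collection evaluation can be pictured as a filtering step applied before scoring. The minimal sketch below assumes the removal is done per topic (a document judged for one topic is removed only from that topic's ranking and judgments); the function names and sample IDs are hypothetical. It drops previously judged documents from a run while preserving the order of what remains, and drops the same documents from the judgments used for scoring.

```python
from typing import Dict, List, Set


def residual_run(run: Dict[str, List[str]],
                 judged_before: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Remove documents judged in earlier rounds from each topic's ranking.

    The remaining documents keep their relative order, so ranks effectively
    shift up to fill the gaps left by removed documents.
    """
    residual: Dict[str, List[str]] = {}
    for topic_id, ranked in run.items():
        seen = judged_before.get(topic_id, set())
        residual[topic_id] = [doc_id for doc_id in ranked if doc_id not in seen]
    return residual


def residual_qrels(qrels: Dict[str, Dict[str, int]],
                   judged_before: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Keep only judgments for documents not judged for that topic before."""
    return {
        topic_id: {doc_id: grade for doc_id, grade in judgments.items()
                   if doc_id not in judged_before.get(topic_id, set())}
        for topic_id, judgments in qrels.items()
    }


if __name__ == "__main__":
    # Hypothetical: d1 and d2 were judged for topic 1 in an earlier round.
    judged_before = {"1": {"d1", "d2"}}
    run = {"1": ["d1", "d5", "d2", "d6"]}
    qrels = {"1": {"d1": 2, "d5": 1, "d6": 0}}
    print(residual_run(run, judged_before))
    print(residual_qrels(qrels, judged_before))
```

Filtering both sides prevents a participant who trained on earlier judgments from being credited simply for re-retrieving documents already known to be relevant.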
Steven Bedrick, Oregon Health & Science University
Aaron Cohen, Oregon Health & Science University
Dina Demner-Fushman, National Library of Medicine
William Hersh, Oregon Health & Science University
Kyle Lo, Allen Institute for Artificial Intelligence
Kirk Roberts, University of Texas Health Science Center at Houston
Ian Soboroff, National Institute of Standards and Technology
Ellen Voorhees, National Institute of Standards and Technology
Lucy Lu Wang, Allen Institute for Artificial Intelligence