The BETTER program aims to dramatically compress the information discovery cycle by designing systems that extract personalized, semantic information from text and leverage this information to substantially improve search capabilities.


The BETTER program created datasets for information extraction and cross-language information retrieval. These datasets were built using content from the CommonCrawl news collection. The linguistic annotations were provided by MITRE and ARLIS, and relevance assessment annotations by NIST. The datasets were structured around the following six evaluation tasks:

Abstract IE

Identify events in a sentence, and mark them as material or verbal, helpful or harmful.

Basic IE

Basic events have a type, an agent, a patient, and possibly related events. You might think of Basic as stripped-down MUC events.

Granular IE

Granular events are templates for events that consist of several component Basic events.

Automatic IR

Cross-language retrieval by example, fully automatic given a small number of example documents and passages in place of a query.


HITL IR

Retrieval with a human in the loop. The user is shown a small number of requests with narrative descriptions, and allowed to tune the system for future requests on the same topic.


Automatic HITL IR

"Automatic" HITL, where systems are shown the small number of requests with narrative descriptions, and automatically adapt the system to future requests, for example by creating background queries.


Abstract events consist of an agent, a patient, an event anchor, and a quad-class. The quad-class is a two-dimensional event type: the first dimension is Material, Verbal, or both, and the second is Helpful, Harmful, or Neutral.

As an example, consider the sentence below and the abstract events identified in the table below it. This sentence mentions four events that would be captured in the abstract extraction task.

“According to several witnesses, the boiler explosion in the factory injured three people, but did not melt any of the nearby fuel lines.”
Event ID | Agent(s)            | Anchor(s)   | Patient(s)              | Quad-class
1        | "several witnesses" | "according" | "injure", "melt"        | Verbal-Neutral
2        | "boiler"            | "explosion" | "boiler"                | Material-Harmful
3        | "explosion"         | "injure"    | "three people"          | Material-Harmful
4        | "explosion"         | "melt"      | "the nearby fuel lines" | Material-Harmful
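
To make the structure of an abstract event concrete, the table above can be written out as a small Python sketch. This is only an illustration: the class name `AbstractEvent`, its field names, and the helper `harmful_anchors` are assumptions made for this example, and they do not reflect the layout of the released bp.json files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractEvent:
    """Illustrative container for one abstract event (not the bp.json schema)."""
    event_id: int
    agents: tuple
    anchors: tuple
    patients: tuple
    kind: str    # first quad-class dimension, e.g. "Material" or "Verbal"
    effect: str  # second quad-class dimension, e.g. "Harmful" or "Neutral"

    @property
    def quad_class(self) -> str:
        # Join the two dimensions the way the table writes them.
        return f"{self.kind}-{self.effect}"


# The four events from the example sentence, copied from the table.
EVENTS = [
    AbstractEvent(1, ("several witnesses",), ("according",), ("injure", "melt"), "Verbal", "Neutral"),
    AbstractEvent(2, ("boiler",), ("explosion",), ("boiler",), "Material", "Harmful"),
    AbstractEvent(3, ("explosion",), ("injure",), ("three people",), "Material", "Harmful"),
    AbstractEvent(4, ("explosion",), ("melt",), ("the nearby fuel lines",), "Material", "Harmful"),
]


def harmful_anchors(events):
    """Return the anchors of every event whose effect dimension is Harmful."""
    return [anchor for e in events if e.effect == "Harmful" for anchor in e.anchors]


if __name__ == "__main__":
    for e in EVENTS:
        print(e.event_id, e.anchors, e.quad_class)
```

Keeping the two quad-class dimensions as separate fields makes it easy to filter on one of them, as `harmful_anchors` does, while still printing the combined label used in the table.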

Abstract Data available

Abstract documentation (pdf format)
abstract-eng.bp.json: English abstract data. This data was hidden in the BETTER evaluation.
abstract-arb.bp.json: Arabic abstract data. This was the phase 1 evaluation test set.
abstract-fas.bp.json: Farsi abstract data. This was the phase 2 evaluation test data. There is also a README file.


Basic events are more traditional events with an event type, an anchor, agent(s), patient(s), and possibly referred events. As an example, consider the sentence below and the Basic events identified in the table below it. This sentence mentions four events.

According to several witnesses, the boiler explosion in the factory injured three people, but did not melt any of the nearby fuel lines.
Event type      | Anchor      | Agent               | Patient                 | Referred events
Violence-damage | "explosion" | "boiler"            | "boiler"                |
Violence-wound  | "injure"    | "explosion"         | "three people"          |
Violence-damage | "melt"      | "explosion"         | "the nearby fuel lines" |
Communicate     | "according" | "several witnesses" |                         | "explosion", "injure", "melt"
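
The "referred events" column links one Basic event to others in the same sentence. A minimal sketch of that linkage follows, assuming the referred events are identified by their anchor strings, as in the table above. The class `BasicEvent`, its field names, and `resolve_referred` are hypothetical names for this example and are not taken from the BETTER data format or from BP_LIB.

```python
from dataclasses import dataclass, field


@dataclass
class BasicEvent:
    """Illustrative Basic event (assumed field names, not the bp.json schema)."""
    event_type: str
    anchor: str
    agents: list = field(default_factory=list)
    patients: list = field(default_factory=list)
    referred_anchors: list = field(default_factory=list)


# The four Basic events from the example sentence.
EVENTS = [
    BasicEvent("Violence-damage", "explosion", ["boiler"], ["boiler"]),
    BasicEvent("Violence-wound", "injure", ["explosion"], ["three people"]),
    BasicEvent("Violence-damage", "melt", ["explosion"], ["the nearby fuel lines"]),
    BasicEvent("Communicate", "according", ["several witnesses"],
               referred_anchors=["explosion", "injure", "melt"]),
]


def resolve_referred(event, events):
    """Look up the events that `event` refers to, matching on anchor text."""
    by_anchor = {e.anchor: e for e in events}
    return [by_anchor[a] for a in event.referred_anchors if a in by_anchor]


if __name__ == "__main__":
    communicate = EVENTS[3]
    for target in resolve_referred(communicate, EVENTS):
        print(communicate.anchor, "->", target.anchor, f"({target.event_type})")
```

Matching on anchor text works here because each anchor is unique within the sentence; real data would need a stable event identifier instead.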

Basic Data available

Basic documentation (pdf format)
basic-eng.full.bp.json: English Basic data (full). In the BETTER evaluation, this set was split into train, devtest, analysis, and hidden subsets.
basic-arb.bp.json: Arabic Basic data.
basic-fas.bp.json: Farsi Basic data.


Granular events are similar to MUC scenario templates, where the constituents of the template are entities and Basic events specific to the scenario, such as which people were involved and where the event took place. The templates are (phase 1) protests, instances of government corruption, terrorist attacks, disease outbreaks; (phase 2) natural disasters, large-scale human displacement; (phase 3) energy, transportation, and infrastructure projects; and cybercrime activities.

Granular Data available

Granular documentation (pdf format)
granular-eng-p1.full.bp.json: English Granular data, phase 1 (full). In the BETTER evaluation, this set was split into train, devtest, analysis, and hidden subsets.
granular-eng-p2.full.bp.json: English Granular data, phase 2.
granular-arb.bp.json: Arabic Granular data.
granular-fas.bp.json: Farsi Granular data.


The BETTER information retrieval task was cross-language retrieval from English into Arabic (phase 1), Farsi (phase 2), and Russian, Chinese, and Korean (phase 3). There were automatic systems as well as systems with a user in the loop. The search topics are structured into a series of individual requests. In the program, search was done "by example", given only passages in English annotated for Basic events; the topics also have TREC-style descriptions for traditional IR experiments.

IR Data available

The IR task is cross-language search from English into a target language. Search needs are structured into high-level analytic tasks that set the context for three or more requests within each task. Systems retrieve a ranked list of documents for each request.

The search tasks include long need statements similar to TREC topics as well as English passages provided as examples of relevant information. The requests have a single sentence description of the specific need as well as example passages. In the BETTER program, automatic systems were only allowed to see some of the examples, but the larger topic statements were available to enable human-in-the-loop search scenarios.

The document collections for IR have 750k-1m documents in the target language. The three collections are (1) English -> Arabic, (2) English -> Farsi, and (3) English -> Russian, Chinese, and Korean.

In the Arabic and Farsi collections, a subset of the relevant documents are annotated for Basic. In the Russian/Chinese/Korean collection, passages from relevant documents are annotated. These annotations supported a scoring metric that integrated relevance ranking with boosting passages that mentioned relevant Basic events.

IR documentation (pdf format)
BETTER-Phase1-IR-HITL-package.tar.gz: Phase 1 IR collection. This includes an English training corpus, English queries with Basic annotations, and an Arabic target corpus with relevance judgments and Basic annotations.
BETTER-Phase2-IR-HITL-package.tar.gz: Phase 2 IR collection. This includes an English training corpus, English queries with Basic annotations, and a Farsi target corpus with relevance judgments and Basic annotations.
BETTER-Phase3-IR-HITL-package.tar.gz: Phase 3 IR collection. This includes an English training corpus, English queries with Basic annotations, and a multilingual Russian/Chinese/Korean target corpus with relevance judgments and Basic annotations.


These are tools for scoring IE and IR outputs, checking data, and converting to and from the BETTER program data formats:

BP_LIB: a toolkit developed for the BETTER program that includes the IE scorer and utilities for manipulating the various data formats used in the program.
The evaluation script for BETTER IR, for runs that include both document rankings and Basic extractions.
A script to convert a BETTER IR evaluation file to the TREC run format, for use with trec_eval.
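
For readers unfamiliar with the TREC run format that trec_eval reads, the sketch below writes ranked lists in that six-column layout (query ID, the literal `Q0`, document ID, rank, score, run tag). It is not the conversion script mentioned above. The layout of the BETTER IR evaluation file is not described here, so the input is a hypothetical in-memory dictionary mapping request IDs to (document ID, score) pairs, and the function name `to_trec_run` is likewise an assumption.

```python
def to_trec_run(rankings, run_tag="example-run"):
    """Format per-request rankings as TREC run lines.

    `rankings` is a hypothetical structure for this example:
    {request_id: [(doc_id, score), ...]}. Documents are ranked by
    descending score, and ranks start at 1.
    """
    lines = []
    for request_id in sorted(rankings):
        ranked = sorted(rankings[request_id], key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(f"{request_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}")
    return lines


if __name__ == "__main__":
    # Made-up request and document IDs, for illustration only.
    example = {"REQ-1": [("doc-b", 0.5), ("doc-a", 0.9)]}
    for line in to_trec_run(example, run_tag="demo"):
        print(line)
```

A file written this way can be passed to trec_eval together with a relevance judgments (qrels) file.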


For more information, contact Ian Soboroff