The BETTER program aims to dramatically compress the information discovery cycle by designing systems that extract personalized, semantic information from text and leverage this information to substantially improve search capabilities.
The BETTER program created datasets for information extraction and cross-language information retrieval. These datasets were built using content from the CommonCrawl news collection. The linguistic annotations were provided by MITRE and ARLIS, and relevance assessment annotations by NIST. The datasets were structured around the following six evaluation tasks:
1. Abstract events: identify events in a sentence, and mark them as material or verbal, helpful or harmful.
2. Basic events: events with a type, an agent, a patient, and possibly related events. You might think of Basic as stripped-down MUC events.
3. Granular events: templates for events that consist of several component Basic events.
4. Cross-language retrieval by example: fully automatic, given a small number of example documents and passages in place of a query.
5. Retrieval with a human in the loop (HITL): the user is shown a small number of requests with narrative descriptions, and is allowed to tune the system for future requests on the same topic.
6. "Automatic" HITL: systems are shown the small number of requests with narrative descriptions, and automatically adapt to future requests, for example by creating background queries.
Abstract events consist of an agent, a patient, an event anchor, and a quad-class. The quad-class is a two-dimensional event type: Material and/or Verbal on one axis, and Helpful, Harmful, or Neutral on the other.
As an example, consider the sentence below and the abstract events identified in the table below it. This sentence mentions four events that would be captured in the abstract extraction task.
“According to several witnesses, the boiler explosion in the factory injured three people, but did not melt any of the nearby fuel lines.”
| Event ID | Agent(s) | Anchor(s) | Patient(s) | Quad-class |
|----------|----------|-----------|------------|------------|
| 1 | "several witnesses" | "according" | "injure", "melt" | Verbal-Neutral |
| 2 | "boiler" | "explosion" | "boiler" | Material-Harmful |
| 3 | "explosion" | "injure" | "three people" | Material-Harmful |
| 4 | "explosion" | "melt" | "the nearby fuel lines" | Material-Harmful |
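The structure above can be mirrored in a small data container. This is an illustrative sketch only; the field names are hypothetical and do not follow the official BP JSON schema, which is defined in the abstract documentation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical container for the abstract-event fields described above.
@dataclass
class AbstractEvent:
    event_id: int
    agents: List[str]
    anchors: List[str]
    patients: List[str]
    quad_class: str  # e.g. "Material-Harmful" or "Verbal-Neutral"

# The four events from the example sentence.
events = [
    AbstractEvent(1, ["several witnesses"], ["according"],
                  ["injure", "melt"], "Verbal-Neutral"),
    AbstractEvent(2, ["boiler"], ["explosion"],
                  ["boiler"], "Material-Harmful"),
    AbstractEvent(3, ["explosion"], ["injure"],
                  ["three people"], "Material-Harmful"),
    AbstractEvent(4, ["explosion"], ["melt"],
                  ["the nearby fuel lines"], "Material-Harmful"),
]

# Select events whose quad-class falls on the Harmful side.
harmful = [e.event_id for e in events if e.quad_class.endswith("Harmful")]
print(harmful)  # [2, 3, 4]
```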
| File | Description |
|------|-------------|
| Abstract documentation | (PDF format) |
| abstract-eng.bp.json | English abstract data. This data was hidden in the BETTER evaluation. |
| abstract-arb.bp.json | Arabic abstract data. This was the phase 1 evaluation test set. |
| abstract-fas.bp.json | Farsi abstract data. This was the phase 2 evaluation test set. There is also a README file. |
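A BP JSON file is ordinary JSON and can be walked with the standard library. The key names below ("entries", "annotation-sets", "abstract-events", "events") are one plausible reading of the layout and are assumptions here; consult the abstract documentation PDF (or BP_LIB, listed below) for the authoritative schema.

```python
import json

def iter_abstract_events(path):
    """Yield (entry_id, event) pairs from a BP JSON file.

    ASSUMPTION: a top-level "entries" mapping, each entry carrying
    "annotation-sets" -> "abstract-events" -> "events". Verify these
    key names against the official documentation before relying on them.
    """
    with open(path, encoding="utf-8") as f:
        corpus = json.load(f)
    for entry_id, entry in corpus.get("entries", {}).items():
        events = (entry.get("annotation-sets", {})
                       .get("abstract-events", {})
                       .get("events", {}))
        for event in events.values():
            yield entry_id, event
```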
Basic events are more traditional events with an event type, an anchor, agent(s), patient(s), and possibly referred events. As an example, consider the sentence below and the Basic events identified in the table below it. This sentence mentions four events.
“According to several witnesses, the boiler explosion in the factory injured three people, but did not melt any of the nearby fuel lines.”
| Event type | Anchor | Agent | Patient | Referred events |
|------------|--------|-------|---------|-----------------|
| Violence-damage | "explosion" | "boiler" | "boiler" | |
| Violence-wound | "injure" | "explosion" | "three people" | |
| Violence-damage | "melt" | "explosion" | "the nearby fuel lines" | |
| Communicate | "according" | "several witnesses" | | "explosion", "injure", "melt" |
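The referred-events column can be read as links between events: the Communicate event points at the three events that were reported. A minimal sketch of resolving such links, with illustrative field names rather than the actual BP JSON schema:

```python
# Basic events from the example sentence, keyed by hypothetical event IDs.
# Field names here are illustrative, not the official format.
events = {
    "e1": {"type": "Violence-damage", "anchor": "explosion",
           "agents": ["boiler"], "patients": ["boiler"], "ref-events": []},
    "e2": {"type": "Violence-wound", "anchor": "injure",
           "agents": ["explosion"], "patients": ["three people"],
           "ref-events": []},
    "e3": {"type": "Violence-damage", "anchor": "melt",
           "agents": ["explosion"], "patients": ["the nearby fuel lines"],
           "ref-events": []},
    "e4": {"type": "Communicate", "anchor": "according",
           "agents": ["several witnesses"], "patients": [],
           "ref-events": ["e1", "e2", "e3"]},
}

# Resolve the Communicate event's referred events to their anchors.
referred = [events[r]["anchor"] for r in events["e4"]["ref-events"]]
print(referred)  # ['explosion', 'injure', 'melt']
```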
| File | Description |
|------|-------------|
| Basic documentation | (PDF format) |
| basic-eng.full.bp.json | English Basic data (full). In the BETTER evaluation, this set was split into train, devtest, analysis, and hidden subsets. |
| basic-arb.bp.json | Arabic Basic data. |
| basic-fas.bp.json | Farsi Basic data. |
Granular events are similar to MUC scenario templates: the constituents of a template are entities and Basic events specific to the scenario, such as which people were involved and where the event took place. The templates cover (phase 1) protests, instances of government corruption, terrorist attacks, and disease outbreaks; (phase 2) natural disasters and large-scale human displacement; and (phase 3) energy, transportation, and infrastructure projects, and cybercrime activities.
| File | Description |
|------|-------------|
| Granular documentation | (PDF format) |
| granular-eng-p1.full.bp.json | English Granular data, phase 1 (full). In the BETTER evaluation, this set was split into train, devtest, analysis, and hidden subsets. |
| granular-eng-p2.full.bp.json | English Granular data, phase 2. |
| granular-arb.bp.json | Arabic Granular data. |
| granular-fas.bp.json | Farsi Granular data. |
The BETTER information retrieval task was cross-language retrieval from English into Arabic (phase 1), Farsi (phase 2), and Russian, Chinese, and Korean (phase 3). There were fully automatic systems as well as systems with a user in the loop. The search topics are structured as a series of individual requests. In the program, search was done "by example": systems were given only English passages annotated for Basic events. The topics also have TREC-style descriptions for traditional IR experiments.
The IR task is cross-language search from English into a target language. Search needs are structured into high-level analytic tasks that set the context for three or more requests within each task. Systems retrieve a ranked list of documents for each request.
The search tasks include long need statements, similar to TREC topic descriptions, as well as English passages provided as examples of relevant information. Each request has a single-sentence description of the specific need as well as its own example passages. In the BETTER program, automatic systems were allowed to see only some of the examples, while the larger topic statements were available to enable human-in-the-loop search scenarios.
The document collections for IR each contain 750,000 to 1 million documents in the target language. The three collections are (1) English -> Arabic, (2) English -> Farsi, and (3) English -> Russian, Chinese, and Korean.
In the Arabic and Farsi collections, a subset of the relevant documents is annotated for Basic events. In the Russian/Chinese/Korean collection, passages from relevant documents are annotated. These annotations supported a scoring metric that combined relevance ranking with a boost for documents whose passages mentioned relevant Basic events.
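To make the idea of event-aware boosting concrete, here is a toy rank-discounted gain that rewards retrieved documents whose annotated passages mention relevant Basic events. This is NOT the official BETTER metric (that is implemented in eval-better-ir.py, listed under tools); it is only an illustration of combining relevance with an event-match bonus.

```python
import math

def boosted_gain(rank, relevance, n_matched_events, boost=0.5):
    """Toy discounted gain for one retrieved document.

    rank: 1-based rank in the result list.
    relevance: graded relevance judgment (0 = not relevant).
    n_matched_events: count of relevant Basic events found in the
        document's annotated passages.
    boost: illustrative weight for each matched event (an assumption,
        not a program-specified constant).
    """
    gain = relevance + boost * n_matched_events
    return gain / math.log2(rank + 1)

# Two equally relevant documents at rank 1: the one whose passages
# mention two relevant Basic events earns a higher gain.
print(boosted_gain(1, relevance=1, n_matched_events=2))  # 2.0
print(boosted_gain(1, relevance=1, n_matched_events=0))  # 1.0
```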
| File | Description |
|------|-------------|
| IR documentation | (PDF format) |
| BETTER-Phase1-IR-HITL-package.tar.gz | Phase 1 IR collection. Includes an English training corpus, English queries with Basic annotations, and an Arabic target corpus with relevance judgments and Basic annotations. |
| BETTER-Phase2-IR-HITL-package.tar.gz | Phase 2 IR collection. Includes an English training corpus, English queries with Basic annotations, and a Farsi target corpus with relevance judgments and Basic annotations. |
| BETTER-Phase3-IR-HITL-package.tar.gz | Phase 3 IR collection. Includes an English training corpus, English queries with Basic annotations, and a multilingual Russian/Chinese/Korean target corpus with relevance judgments and annotated passages. |
These are tools for scoring IE and IR outputs, checking data, and converting to and from the BETTER program data formats.
| File | Description |
|------|-------------|
| bp_lib.zip | BP_LIB, a toolkit developed for the BETTER program that includes the IE scorer and utilities for manipulating the various data formats used in the program. |
| eval-better-ir.py | The evaluation script for BETTER IR, for runs that include both document rankings and Basic extractions. |
| better2trec.py | A script to convert a BETTER IR evaluation file to the TREC run format, for use with trec_eval. |
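The TREC run format that better2trec.py targets is six whitespace-separated columns per line: topic ID, the literal "Q0", document ID, rank, score, and a run tag. The sketch below serializes a hypothetical ranking by hand; it is independent of better2trec.py's actual implementation, and the IDs and run tag are made up.

```python
def to_trec_lines(topic_id, ranking, run_tag="example-run"):
    """Serialize a ranking to TREC run-format lines.

    ranking: list of (doc_id, score) pairs, best first. Ranks are
    assigned from position; scores should be non-increasing.
    """
    return [
        f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}"
        for rank, (doc_id, score) in enumerate(ranking, start=1)
    ]

# Hypothetical topic and document IDs for illustration.
lines = to_trec_lines("T1-R1", [("doc42", 12.3), ("doc7", 9.81)])
print(lines[0])  # T1-R1 Q0 doc42 1 12.3000 example-run
```

A file of such lines, one ranking per request, is what trec_eval consumes alongside the relevance judgments.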
For more information, contact Ian Soboroff