Background

The BBC Land Girls TV series has three seasons, each with five episodes of about 45 minutes. The TRECVID group at NIST worked with the BBC Corp. to release the dataset to the research community for work on video understanding tasks. Unfortunately, the hosting arrangement for the dataset was not successful, and the video dataset could not be released. We are therefore releasing the annotations produced by NIST, without any video data, so that researchers interested in knowledge graph understanding and natural language analysis can take advantage of them.

Web Resources

Here are some available web resources for the dataset:
  • Wikipedia page of Land Girls (TV series)
  • Land Girls at BBC Online
  • Land Girls at IMDb

Annotations

    A dedicated human annotator was hired by NIST to generate the Land Girls annotations as follows:

    A static knowledge graph

    One knowledge graph was generated for each episode after watching the whole episode and extracting the main entities, such as key persons, locations, and relationships. The yEd graph editor was used to build the static knowledge graphs. Each episode's static knowledge graph can be found under the static.kg folder. The knowledge graphs are grouped by season, and the file names (.xgml) end with the episode ID (EP1, EP2, EP3, EP4, EP5).
    Note: all xgml files can be re-saved from the yEd editor as tgf (Trivial Graph Format) files, which can be easier to parse and interpret.
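
    For readers who export the graphs to tgf, the following is a minimal Python sketch of a parser. It assumes the usual Trivial Graph Format layout: one "id label" line per node, a line containing only "#", then one "source target label" line per edge. The function name parse_tgf and the sample labels are made up for illustration and do not come from the dataset.

```python
def parse_tgf(text):
    """Parse Trivial Graph Format text into (nodes, edges).

    nodes maps each node id to its label; edges is a list of
    (source id, target id, label) tuples. Ids are kept as strings.
    """
    nodes = {}
    edges = []
    in_edges = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "#":
            # A lone "#" separates the node list from the edge list.
            in_edges = True
            continue
        if in_edges:
            parts = line.split(None, 2)
            if len(parts) < 2:
                continue  # skip malformed edge lines
            label = parts[2] if len(parts) > 2 else ""
            edges.append((parts[0], parts[1], label))
        else:
            parts = line.split(None, 1)
            nodes[parts[0]] = parts[1] if len(parts) > 1 else ""
    return nodes, edges


# Hypothetical sample content, not taken from the Land Girls annotations.
sample = """1 Person A
2 Person B
3 Some Farm
#
1 2 friend of
1 3 works at
"""

nodes, edges = parse_tgf(sample)
for source, target, label in edges:
    print(nodes[source], "-[" + label + "]->", nodes[target])
```

    To parse an exported file, read it as text first and pass the contents to parse_tgf.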

    A sample of a static movie knowledge graph

    Dynamic (scene-based) knowledge graphs

    Each episode scene has a corresponding JSON file that represents the scene knowledge graph (KG). All scene-level knowledge graphs are available in the folder dynamic.kgs. There are two main components (keys) in each JSON (knowledge graph) file: "nodes" and "links":
    1. nodes : The nodes value is an array of objects, where each object represents one node in the knowledge graph. A node object consists of 3 key-value pairs:

    2. links : The links value is an array of objects, where each object represents one link (edge) between two nodes. A link object consists of the following key-value pairs:

    Please see below a sample scene KG as drawn by the annotators, together with the corresponding JSON output:
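
    Separately from the annotators' sample, a scene file can be inspected programmatically. The following minimal Python sketch relies only on the two top-level keys named above, "nodes" and "links", and reports which keys actually occur inside the node and link objects rather than assuming their names. The function names and the placeholder keys k1, k2, and k3 are hypothetical and are not part of the dataset.

```python
import json
from collections import Counter


def load_scene_kg(path):
    """Load one scene knowledge graph from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_scene_kg(kg):
    """Count nodes and links and tally the keys used by their objects."""
    nodes = kg.get("nodes", [])
    links = kg.get("links", [])
    node_keys = Counter(key for node in nodes for key in node)
    link_keys = Counter(key for link in links for key in link)
    return {
        "num_nodes": len(nodes),
        "num_links": len(links),
        "node_keys": dict(node_keys),
        "link_keys": dict(link_keys),
    }


# Placeholder graph: k1, k2, and k3 are stand-ins, not real annotation keys.
example = json.loads(
    '{"nodes": [{"k1": 1, "k2": 2}, {"k1": 3}], "links": [{"k3": 0}]}'
)
summary = summarize_scene_kg(example)
print(summary)
```

    On a real file, summarize_scene_kg(load_scene_kg(path)) shows the key names used by the annotations before any further processing is written against them.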


    Important annotation guidelines and notes:

    Ontology

    The vocab.dvu.json file contains the vocabulary used in the scene annotations (JSON graph files). Specifically, it has a set of:
  • Emotional states [used by annotators to describe actors' non-neutral emotions when observed]
  • Interactions [used by annotators to describe the type of interaction that may have happened between at least two actors in a scene]
  • Relationships [used by annotators to establish a relationship between any two actors when it became apparent to them]
  • Sentiments [used by annotators to assign at least one sentiment to each scene]
  • Locations [used by annotators to describe the location type where the scene happened]

    Scene language descriptions

    Each scene was described in one or two English sentences. These descriptions are not a substitute for subtitles and should rather be understood as audio descriptions, short summaries, or video captions. All summaries are located under the scene.summaries folder (one txt file per scene per episode).
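
    As a minimal sketch of how the summaries could be collected for natural language analysis, the Python function below gathers every txt file found under a given folder. The internal layout and file naming of scene.summaries are not assumed: files are keyed by their path relative to the folder. The function name load_scene_summaries is hypothetical.

```python
from pathlib import Path


def load_scene_summaries(root):
    """Return a dict mapping each relative .txt path to its stripped text."""
    root = Path(root)
    summaries = {}
    # rglob walks all subfolders, so any season/episode nesting is handled.
    for path in sorted(root.rglob("*.txt")):
        key = path.relative_to(root).as_posix()
        summaries[key] = path.read_text(encoding="utf-8").strip()
    return summaries
```

    For example, load_scene_summaries("scene.summaries") would return one entry per scene description, provided the folder has been downloaded to the working directory.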

    Possible applications using the dataset

    The provided dataset can support many applications and uses. The following are a couple of examples that the authors imagine might be useful:

    Contact us

    The authors (from the information retrieval group at NIST) can be contacted by email:
    George Awad (gawad at nist.gov)
    Keith Curtis (keith.curtis at nist.gov)
    Shahzad Rajput (shahzad.rajput at nist.gov)
    Ian Soboroff (ian.soboroff at nist.gov)