The Deep Video Understanding Task/Challenge in 2023 runs 2 subtasks:
1- Main task (using the original movies and scenes)
2- Robustness task (using the same movies/scenes as in the main task, but after adding real-world perturbations in the audio, video, or audio+video channels)

Thus the robustness task is composed of 3 variants of the dataset, simulating 3 different kinds of noise. Teams are encouraged to submit results against one or more of the dataset variants.

****
Teams should process each dataset and task (main and robustness) independently, without using or applying any output from one task to answer queries related to another task. The goal of the robustness in multimedia task is to measure the effect of noise and perturbations on systems.
***

Each of the main and robustness subtasks has the following testing dataset directory structure:

: This folder contains the 5 testing movies as mp4 files.
: This folder contains 1 CSV file per movie with its scene segmentation boundaries, as follows: the start and end times of each scene, one scene per line, separated by a comma: ,
: This folder contains the segmented movie scenes as webm files (based on the CSV file scene segments). Each scene file is named as -.webm
: This folder contains a vtt transcript file for each movie. Transcripts were generated using whisper:
    $whisper filename.mp4 --model medium --language English
: This file contains the vocabulary (ontology) used in the training scene annotations (json graph files) and in the testing queries.
Specifically, it has a set of:
- Emotional states [used by annotators to describe actors' non-neutral emotions when observed]
- Interactions [used by annotators to describe the type of interaction that may have happened between at least two actors in a scene]
- Relationships [used by annotators to establish a relationship between any two actors when it became apparent to them]
- Sentiments [used by annotators to assign at least one sentiment to each scene]
- Locations [used by annotators to describe the type of location where the scene happened]

NOTES:
1- The movie scenes will form the basis for scene-level queries.
2- The official testing queries will include image snapshots of the key characters and locations that queries may reference.
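
The scene segmentation CSV files described above (one scene per line, start and end times separated by a comma) could be read with a minimal sketch like the one below. It is an illustration, not part of the official tooling: the function name read_scene_boundaries is hypothetical, the files are assumed to have no header row, and the times are kept as plain strings because their exact format is not stated here.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def read_scene_boundaries(csv_path: str) -> List[Tuple[str, str]]:
    """Return one (start, end) pair per scene, in file order.

    Assumptions (not confirmed by the dataset description):
    - the file has no header row;
    - each non-blank line holds exactly two comma-separated values.
    Times are returned as strings, since their format is not specified.
    """
    scenes: List[Tuple[str, str]] = []
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # Skip blank or whitespace-only lines.
            if not row or all(not cell.strip() for cell in row):
                continue
            if len(row) != 2:
                raise ValueError(f"expected 'start,end', got {row!r}")
            start, end = (cell.strip() for cell in row)
            scenes.append((start, end))
    return scenes
```

Calling read_scene_boundaries on a movie's CSV file returns the scenes in the order they are listed, so the position of a pair in the list identifies the scene. Converting the strings to numbers or timestamps is left to the caller once the time format has been checked against the actual files.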