Towards understanding situations in videos

VidSitu is a large-scale dataset of diverse 10-second movie clips depicting complex situations, where a situation is a collection of related events. Events in each video are richly annotated at 2-second intervals with verbs, semantic roles, entity co-references, and event relations.
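To make the annotation layout concrete, here is a minimal Python sketch of how one annotated clip could be represented. It is not the official VidSitu file format: the class name `Event`, the field names, the role labels, and the example verbs and entities are all hypothetical, and co-reference is approximated by reusing the same entity string across events. Only the 10-second clip length and the 2-second annotation interval come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

CLIP_SECONDS = 10   # clip length stated in the dataset description
EVENT_SECONDS = 2   # annotation interval stated in the dataset description


@dataclass
class Event:
    """One annotated 2-second segment (hypothetical layout)."""
    start: int
    end: int
    verb: str
    # Maps a role name to an entity description; names are illustrative only.
    roles: Dict[str, str] = field(default_factory=dict)
    # Relation of this event to another event in the clip (assumed field).
    relation: Optional[str] = None


def event_windows(clip_seconds: int = CLIP_SECONDS,
                  step: int = EVENT_SECONDS) -> List[Tuple[int, int]]:
    """Split a clip into consecutive, non-overlapping annotation windows."""
    return [(s, s + step) for s in range(0, clip_seconds, step)]


def shared_entities(events: List[Event]) -> Dict[str, List[int]]:
    """Return entities that fill a role in more than one event.

    This stands in for entity co-reference: the same description
    appearing in several events is treated as the same entity.
    """
    seen: Dict[str, set] = {}
    for index, event in enumerate(events):
        for entity in event.roles.values():
            seen.setdefault(entity, set()).add(index)
    return {e: sorted(ix) for e, ix in seen.items() if len(ix) > 1}


# Hypothetical annotations for the first two windows of a clip.
windows = event_windows()
events = [
    Event(*windows[0], verb="run", roles={"agent": "man in red shirt"}),
    Event(*windows[1], verb="jump",
          roles={"agent": "man in red shirt", "destination": "roof"},
          relation="enabled by previous event"),
]
linked = shared_entities(events)
```

With these defaults, `event_windows()` yields five windows per clip, and `linked` maps the shared entity to the indices of the events it appears in.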

Dataset Statistics

VidSRL Task

Annotations in VidSitu support the Video Semantic Role Labeling (VidSRL) task, which consists of three subtasks.


If you find our work helpful, please cite the following paper:

          author = {Sadhu, Arka and Gupta, Tanmay and Yatskar, Mark and Nevatia, Ram and Kembhavi, Aniruddha},
          title = {Visual Semantic Role Labeling for Video Understanding},
          booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
          month = {June},
          year = {2021}}


Download VidSitu

Detailed instructions for downloading VidSitu are provided in the accompanying GitHub repo.



All code is available in the accompanying GitHub repo.



Please reach out to Arka Sadhu with any queries.