XS-VID: An Extra Small Object Video Detection Dataset
XS-VID is a comprehensive dataset for Extra Small Object Video Detection, including diverse day and night scenes such as rivers, forests, skyscrapers, and streets.
Update
- [20240811] Annotation in YOLO format released!
- [20240530] Quantitative results of several mainstream methods on the XS-VID test set are reported!
- [20240530] Visualizations of XS-VID images added.
- [20240528] The homepage for the XS-VID benchmark is live!
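For readers picking up the YOLO-format annotations, here is a minimal Python sketch of reading one label file. It assumes the common YOLO text layout, one object per line as `class cx cy w h` with coordinates normalized to [0, 1]; whether the XS-VID files follow exactly this layout is an assumption, and `PixelBox`, `parse_yolo_line`, and `parse_yolo_labels` are hypothetical helper names, not part of the released code.

```python
from dataclasses import dataclass


@dataclass
class PixelBox:
    class_id: int
    x_min: float
    y_min: float
    width: float
    height: float


def parse_yolo_line(line: str, img_w: int, img_h: int) -> PixelBox:
    """Convert one 'class cx cy w h' line (normalized) to a pixel-space box.

    Assumption: the standard YOLO layout; extra trailing fields are ignored.
    """
    parts = line.split()
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])
    box_w, box_h = w * img_w, h * img_h
    # YOLO stores the box center; shift by half the size to get the top-left corner.
    return PixelBox(class_id, cx * img_w - box_w / 2, cy * img_h - box_h / 2, box_w, box_h)


def parse_yolo_labels(text: str, img_w: int, img_h: int) -> list:
    """Parse every non-empty line of a label file's contents."""
    return [parse_yolo_line(ln, img_w, img_h) for ln in text.splitlines() if ln.strip()]
```

The image width and height must come from the frame itself, since normalized labels do not carry them.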
XS-VID
XS-VID contains a diverse array of scenes featuring multiple categories and sizes of targets. Notably, XS-VID achieves unprecedented breadth and depth in covering and quantifying minuscule targets (< $32^2$ pixels). Some example images are shown below.
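To make the size bound concrete, the sketch below checks whether a box falls under the $32^2$ limit. It reads "< $32^2$ pixels" as bounding-box area in pixels, which is an assumption about the intended definition; `XS_AREA_LIMIT`, `is_minuscule`, and `minuscule_fraction` are hypothetical names used only for illustration.

```python
XS_AREA_LIMIT = 32 ** 2  # 1024 px^2, the "< 32^2 pixels" bound (area interpretation assumed)


def is_minuscule(width_px: float, height_px: float) -> bool:
    """True when the box area is strictly below the 32^2 pixel limit."""
    return width_px * height_px < XS_AREA_LIMIT


def minuscule_fraction(boxes) -> float:
    """Share of (width, height) pairs that fall under the limit; 0.0 for no boxes."""
    boxes = list(boxes)
    if not boxes:
        return 0.0
    return sum(is_minuscule(w, h) for w, h in boxes) / len(boxes)
```

Note that the comparison is strict, so a box of exactly 32 x 32 pixels is not counted as minuscule under this reading.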
Here is a statistical comparison of our dataset with other related datasets.
Results
We report quantitative results of several representative methods on the XS-VID test set and the VisDrone2019 VID test-dev set as follows.
Download
The dataset can be downloaded from the following sources.
- [Google Drive]: annotations; images (0-3); images (4-5)
- [BaiduNetDisk]: annotations and images
Please choose a download method to download the annotations and all images. Make sure all the split archive files (e.g., images.zip, images.z01, images.z02, etc.) are in the same directory. Use the following commands to extract them:
unzip images.zip
unzip annotations.zip
If you get an error while unpacking, check the repository issues for help.
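One common cause of unpacking errors is a missing or misplaced part of the split archive. The following Python sketch checks a directory for the main archive and for gaps in the numbered parts before extraction. It assumes the parts are named `images.z01`, `images.z02`, and so on, as in the example above; the total number of parts is not stated here, so the check can only detect gaps below the highest part found. `check_split_archive` is a hypothetical helper, not part of the released tools.

```python
import re
from pathlib import Path


def check_split_archive(directory, stem="images"):
    """Report which pieces of a split zip archive are present in `directory`.

    Returns a dict with:
      has_main - whether <stem>.zip exists
      parts    - sorted numbers of the <stem>.zNN files found
      missing  - part numbers absent between 1 and the highest part found
    """
    directory = Path(directory)
    pattern = re.compile(rf"^{re.escape(stem)}\.z(\d+)$")
    parts = []
    for path in directory.iterdir():
        match = pattern.match(path.name)
        if match:
            parts.append(int(match.group(1)))
    parts.sort()
    expected = set(range(1, parts[-1] + 1)) if parts else set()
    return {
        "has_main": (directory / f"{stem}.zip").is_file(),
        "parts": parts,
        "missing": sorted(expected - set(parts)),
    }
```

For example, `check_split_archive(".")` run in the download directory returns an empty `missing` list and `has_main` set to `True` when every part up to the highest-numbered one sits next to `images.zip`.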
Codes
The official code for our benchmark, which mainly covers data preparation and evaluation, is released below.
- Our XS-VID baseline: YOLOFT
- VOD Method: MMTracking; DiffusionVID
- GOD Method: MMDetection
- SOD Method: CFINet; CEASC
- YOLO Method: Ultralytics; StreamYOLO
- Eval Tools: Eval code
Support or Contact
If you have any questions about the XS-VID benchmark, please feel free to contact us at gjh_hust@hust.edu.cn.