📊 Backend Performance Comparison for Different VCodecs

🛠 Dataset: lerobot/aloha_sim_insertion_human_image

Backend     VCodec     MSE ↓     PSNR (dB) ↑  SSIM ↑
PyAV        libsvtav1  5.15E-05  43.22        0.9948
TorchCodec  libsvtav1  5.10E-05  43.27        0.9949
PyAV        libx264    1.59E-04  40.96        0.9784
TorchCodec  libx264    1.58E-04  40.99        0.9785
PyAV        libx265    1.85E-04  39.84        0.9802
TorchCodec  libx265    1.42E-04  40.74        0.9815

🛠 Dataset: lerobot/aloha_sim_transfer_cube_human_image

Backend     VCodec     MSE ↓     PSNR (dB) ↑  SSIM ↑
PyAV        libsvtav1  5.47E-05  44.62        0.9950
TorchCodec  libsvtav1  5.18E-05  44.68        0.9950
PyAV        libx264    1.71E-04  41.84        0.9795
TorchCodec  libx264    1.68E-04  41.92        0.9793
PyAV        libx265    2.23E-04  40.21        0.9805
TorchCodec  libx265    1.46E-04  41.60        0.9826

🛠 Dataset: lerobot/pusht_image

Backend     VCodec     MSE ↓     PSNR (dB) ↑  SSIM ↑
PyAV        libsvtav1  1.77E-04  37.79        0.9894
TorchCodec  libsvtav1  1.82E-04  37.70        0.9891
PyAV        libx264    2.88E-04  37.23        0.9826
TorchCodec  libx264    2.88E-04  37.21        0.9826
PyAV        libx265    4.34E-04  35.59        0.9782
TorchCodec  libx265    3.34E-04  36.45        0.9802
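
For readers unfamiliar with the metric columns, here is a minimal sketch of how MSE and PSNR can be computed for one decoded frame against its reference image. It assumes frames are float arrays scaled to [0, 1], which is consistent with the magnitudes in the tables but is not confirmed here. The helper names `mse` and `psnr` are illustrative and are not taken from the benchmark script.

```python
import numpy as np


def mse(reference: np.ndarray, decoded: np.ndarray) -> float:
    """Mean squared error between two frames of identical shape."""
    ref = reference.astype(np.float64)
    dec = decoded.astype(np.float64)
    return float(np.mean((ref - dec) ** 2))


def psnr(reference: np.ndarray, decoded: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB.

    data_range is the maximum possible pixel value (assumed 1.0 for
    frames normalized to [0, 1]; use 255 for uint8 frames).
    """
    err = mse(reference, decoded)
    if err == 0.0:
        # Identical frames: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(data_range**2 / err))


if __name__ == "__main__":
    reference = np.zeros((4, 4, 3))
    decoded = reference + 0.01  # uniform error of 0.01 per pixel
    print(f"MSE:  {mse(reference, decoded):.2E}")
    print(f"PSNR: {psnr(reference, decoded):.2f} dB")
```

If the reported values are averages over many sampled frames (an assumption), the mean PSNR will generally differ from the PSNR computed from the mean MSE, because the logarithm is nonlinear. This could explain why the two columns do not convert into each other exactly.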

To reproduce the full results, you can run:

python benchmarks/video/run_video_benchmark.py \
    --output-dir outputs/video_benchmark \
    --repo-ids \
        lerobot/aloha_sim_transfer_cube_human_image \
        lerobot/pusht_image \
        lerobot/aloha_sim_insertion_human_image \
    --vcodec libsvtav1 libx265 libx264 \
    --pix-fmt yuv420p \
    --g 1 2 3 4 5 6 10 15 20 40 None \
    --crf 0 5 10 15 20 25 30 40 50 None \
    --timestamps-modes 1_frame 2_frames 6_frames \
    --backends torchcodec-cpu pyav \
    --num-samples 50 \
    --num-workers 4 \
    --save-frames 1

Alternatively, the full CSV file is available here:

https://drive.google.com/file/d/1AErjcDxi-DdLuBxD5DIHUAxdbCl_Gskv/view?usp=sharing
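
Once the CSV is downloaded, one possible way to condense its rows into a per-backend, per-codec view is sketched below, using only the standard library. The column names (`backend`, `vcodec`, `avg_mse`, `avg_psnr`, `avg_ssim`) are assumptions and should be checked against the actual CSV header. A plain mean over rows is only one plausible aggregation, not necessarily the one used for the tables above.

```python
import csv
from collections import defaultdict
from statistics import mean

# Assumed column names; adjust them to match the real CSV header.
GROUP_COLUMNS = ("backend", "vcodec")
METRIC_COLUMNS = ("avg_mse", "avg_psnr", "avg_ssim")


def summarize(rows, group_columns=GROUP_COLUMNS, metric_columns=METRIC_COLUMNS):
    """Average each metric over all rows sharing the same group key.

    rows is any iterable of dict-like records, e.g. a csv.DictReader.
    Returns {group_key_tuple: {metric_name: mean_value}}.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in group_columns)
        groups[key].append(row)

    return {
        key: {
            metric: mean(float(member[metric]) for member in members)
            for metric in metric_columns
        }
        for key, members in groups.items()
    }


def summarize_csv(path, **kwargs):
    """Read a CSV file from path and summarize it with summarize()."""
    with open(path, newline="") as handle:
        return summarize(csv.DictReader(handle), **kwargs)
```

Calling `summarize_csv("results.csv")` (a hypothetical local file name) returns a dictionary keyed by `(backend, vcodec)` pairs. Pass `group_columns` or `metric_columns` explicitly if the file uses different names.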