🆕 [2023-10-26] Added DINOv2 backbones with registers, following Vision Transformers Need Registers.

DINOv2: Learning Robust Visual Features without Supervision

Meta AI Research, FAIR

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Patrick Labatut, Armand Joulin, Piotr Bojanowski

[Paper #1] [Paper #2] [Blog] [Demo] [BibTeX]

PyTorch implementation and pretrained models for DINOv2. For details, see the papers: DINOv2: Learning Robust Visual Features without Supervision and Vision Transformers Need Registers.

DINOv2 models produce high-performance visual features that can be used directly with classifiers as simple as linear layers on a variety of computer vision tasks. These features are robust and perform well across domains without fine-tuning. The models were pretrained on a dataset of 142M images without any labels or annotations.
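To make the "classifiers as simple as linear layers" idea concrete, here is a minimal sketch of a linear probe: the backbone is frozen and only a single linear layer is trained on its features. The helper names `make_linear_probe` and `probe_step`, the random data, and the tiny stand-in backbone are illustrative assumptions, not part of this repository; the stand-in is there only so the sketch runs offline. The commented `torch.hub.load` line shows how a pretrained ViT-S/14 backbone (384-dimensional features) could be swapped in.

```python
import torch
from torch import nn


def make_linear_probe(backbone: nn.Module, embed_dim: int, num_classes: int) -> nn.Linear:
    """Freeze the backbone and return a trainable linear head for its features."""
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad_(False)
    return nn.Linear(embed_dim, num_classes)


def probe_step(backbone, head, optimizer, images, labels) -> float:
    """One training step: features come from the frozen backbone, only the head is updated."""
    with torch.no_grad():
        feats = backbone(images)  # shape (batch, embed_dim)
    loss = nn.functional.cross_entropy(head(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
embed_dim, num_classes = 384, 10  # 384 matches ViT-S/14; 10 classes is arbitrary

# Stand-in backbone (assumption, for offline runs). To use a pretrained model instead:
# backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 28 * 28, embed_dim))

head = make_linear_probe(backbone, embed_dim, num_classes)
optimizer = torch.optim.SGD(head.parameters(), lr=0.05)

# Random placeholder batch; side lengths are multiples of the 14-pixel patch size.
images = torch.randn(8, 3, 28, 28)
labels = torch.randint(0, num_classes, (8,))

losses = [probe_step(backbone, head, optimizer, images, labels) for _ in range(20)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the backbone's parameters never receive gradients, its features could also be computed once and cached, which keeps the cost of training the head low.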