πŸ‘οΈViT - ImageNet

Presentation

The Vision Transformer (ViT) model was pre-trained by Google on ImageNet-21k (14 million images, 21,843 classes) and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes). The model is hosted on Hugging Face, where its complete model card is available.

ViT - ImageNet performs image classification: it adds a tag to the image corresponding to the detected object. If the detected object class does not yet exist in the dataset, the AI Assistant creates it.
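The tagging behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AI Assistant's actual implementation: the `Dataset` class, the `tag_image` function, the image identifier, and the score values are all hypothetical, and the scores stand in for the per-class outputs of the classifier.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset: known class names and per-image tags."""
    classes: set[str] = field(default_factory=set)
    tags: dict[str, list[str]] = field(default_factory=dict)


def tag_image(dataset: Dataset, image_id: str, scores: dict[str, float]) -> str:
    """Tag an image with its highest-scoring class, creating the class if needed."""
    if not scores:
        raise ValueError("scores must contain at least one class")
    # Keep the class with the highest classification score (top-1 prediction).
    label = max(scores, key=scores.get)
    # Create the class when the dataset does not contain it yet.
    if label not in dataset.classes:
        dataset.classes.add(label)
    # Attach the tag to the image without duplicating an existing tag.
    image_tags = dataset.tags.setdefault(image_id, [])
    if label not in image_tags:
        image_tags.append(label)
    return label


# Made-up scores for illustration; the dataset initially only knows "tabby".
dataset = Dataset(classes={"tabby"})
label = tag_image(
    dataset,
    "img_001.jpg",
    {"ox": 0.91, "water buffalo": 0.06, "tabby": 0.03},
)
print(label, sorted(dataset.classes))
```

In this sketch only the single best class is kept as a tag; whether the real assistant applies a confidence threshold or keeps several classes is not stated here.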

An image automatically tagged as "Ox" by the 'ViT - ImageNet' AI Assistant

Classes

The ViT - ImageNet AI Assistant can detect 1,000 different object classes, although some are recognized more reliably than others.
