From the course: The AI Ecosystem for Developers: Models, Datasets, and APIs
Computer vision architectures: CNNs and vision transformers
- [Instructor] Computer vision is a subfield of AI focused on enabling machines to interpret and understand visual data, such as images and videos. AI architectures in this domain are designed to recognize patterns, detect objects, and analyze spatial relationships, making them essential for tasks like image classification, object detection, and facial recognition. Some of the most popular model architectures used in computer vision include convolutional neural networks, or CNNs. CNNs are a class of deep learning models designed specifically for analyzing visual data. They are highly effective at capturing spatial hierarchies and patterns within images. Components of CNNs include convolutional layers, which extract features such as edges, textures, and shapes from an image. Activation functions, for example ReLU, introduce non-linearity to help the network learn complex patterns. Pooling layers reduce the spatial dimensions of feature maps, improving computational efficiency…
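To make the three components named above concrete, here is a minimal sketch in plain Python of a convolution, a ReLU activation, and a max-pooling step applied to a toy image. It is an illustration under stated assumptions, not the course's own code: the function names (`conv2d`, `relu`, `max_pool2d`), the 5x5 image, and the hand-picked edge kernel are all hypothetical, and a real CNN would learn its kernel values during training rather than have them set by hand.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative values with zero, leaving positive values unchanged."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 5x5 grayscale image (assumed data): a bright vertical stripe on a dark background.
image = [[0, 1, 1, 0, 0] for _ in range(5)]

# Hand-picked kernel (an assumption; real CNNs learn these weights).
# It responds positively to dark-to-bright transitions from left to right.
kernel = [[-1, 1],
          [-1, 1]]

edges = conv2d(image, kernel)      # 4x4 map: +2 at the left edge, -2 at the right edge
activated = relu(edges)            # the negative (bright-to-dark) responses become 0
features = max_pool2d(activated)   # 4x4 map reduced to 2x2

print(features)
```

The convolution highlights where the stripe begins and ends, ReLU keeps only the dark-to-bright response, and pooling halves each spatial dimension while preserving the strongest activation in each window.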
Contents
- Introduction to AI models and architecture (5m 11s)
- NLP architectures: RNNs and transformers (5m 49s)
- Computer vision architectures: CNNs and vision transformers (6m 25s)
- Generative architectures: Diffusion and GANs (6m 10s)
- Multimodal architectures: CLIP and Flamingo (5m 29s)
- Efficient architectures (7m 32s)