Learning to Learn

Computer Vision Foundations and Model Architectures

Foundations of Computer Vision and Model Architectures

Computer Vision (CV) focuses on enabling machines to understand visual data. Modern CV systems rely on deep neural networks that perform tasks such as image classification, object detection, and image segmentation.

This blog provides a structured overview of these tasks and the most commonly used architectures behind them. Rather than treating models as black boxes, we focus on why each architecture was introduced, what problems it solved, and where it is used today.

1. Core Vision Tasks

Image Classification

Assigns a single label (or multiple labels) to an entire image. In the single-label case, the prediction is the most probable class $y$ given the image $x$:

$$ \hat{y} = \arg\max_y p(y \mid x) $$

Object Detection

Predicts both what objects are present and where they are. Each detected object is described by a class label and a bounding box with position $(x, y)$ and size $(w, h)$:

$$ (\text{class}, x, y, w, h) $$

Segmentation

Assigns a class label to each pixel. The model outputs one class distribution per pixel $i$:

$$ p(y_i \mid x) $$

Classification answers what, detection ...
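As a rough illustration of the three output formats defined above, the sketch below uses NumPy on hand-made scores. It is not taken from the post: the helper names (`softmax`, `classify`, `segment`), the array shapes, and the example values are all assumptions chosen to make the formulas concrete.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Turn raw scores into probabilities along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def classify(logits):
    """Classification: logits of shape (num_classes,) -> one label, argmax_y p(y | x)."""
    return int(np.argmax(softmax(logits)))

def segment(logits):
    """Segmentation: logits of shape (H, W, num_classes) -> (H, W) map of per-pixel labels."""
    return np.argmax(softmax(logits, axis=-1), axis=-1)

# Hypothetical scores for a 3-class problem (assumed values, not from any real model).
image_logits = np.array([0.5, 2.0, -1.0])

pixel_logits = np.zeros((2, 2, 3))
pixel_logits[0, 0, 2] = 5.0  # top-left pixel strongly prefers class 2
pixel_logits[1, 1, 1] = 5.0  # bottom-right pixel strongly prefers class 1

# Detection: one (class, x, y, w, h) tuple per object; the box convention is assumed.
detection = ("cat", 10.0, 20.0, 50.0, 40.0)

label = classify(image_logits)
label_map = segment(pixel_logits)
```

Here `label` is a single integer for the whole image, `label_map` has one integer per pixel, and `detection` pairs a class with a box. Real models produce these scores from a neural network; only the final decoding step is shown.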
