# Computer Vision Foundations and Model Architectures

Computer Vision (CV) focuses on enabling machines to understand visual data. Modern CV systems rely on deep neural networks that perform tasks such as image classification, object detection, and image segmentation.

This post provides a structured overview of these tasks and the most commonly used architectures behind them. Rather than treating models as black boxes, we focus on why each architecture was introduced, what problems it solved, and where it is used today.

## 1. Core Vision Tasks

### Image Classification

Assigns a single label (or, in the multi-label setting, several labels) to an entire image. In the single-label case, the prediction for an input image $x$ is the most probable class:

$$ \hat{y} = \arg\max_y p(y \mid x) $$

### Object Detection

Predicts both what objects are present and where they are. Each detected object is described by a class label plus a bounding box with position $(x, y)$, width $w$, and height $h$:

$$ (\text{class}, x, y, w, h) $$

### Segmentation

Assigns a class label to each pixel. The model outputs one class distribution per pixel, where $y_i$ is the label of pixel $i$:

$$ p(y_i \mid x) $$

Classification answers what, detection ...
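
To make the three output formats defined above concrete, here is a minimal pure-Python sketch. The class names, scores, and box values are hypothetical and exist only for illustration. The helpers `softmax`, `argmax`, `classify`, and `segment` are names introduced here, not part of any library. Taking the per-pixel argmax as the segmentation prediction is an assumption that mirrors the classification rule, and the box coordinates carry no particular convention (corner or center).

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities p(y | x) that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)

def classify(logits):
    """Classification: one label for the whole image, argmax_y p(y | x)."""
    return argmax(softmax(logits))

def segment(logit_map):
    """Segmentation: one label per pixel i (assumed: argmax of p(y_i | x))."""
    return [[classify(pixel) for pixel in row] for row in logit_map]

# Hypothetical class set and scores, for illustration only.
CLASSES = ["cat", "dog", "background"]

# Classification: a single score vector for the whole image.
image_logits = [2.0, 0.5, -1.0]
label = CLASSES[classify(image_logits)]

# Detection: a list of (class, x, y, w, h) tuples, one per object.
detections = [("cat", 40, 30, 120, 90)]

# Segmentation: a score vector at every pixel of a tiny 2x2 image.
pixel_logits = [
    [[3.0, 0.1, 0.2], [0.2, 0.1, 2.5]],
    [[0.3, 2.2, 0.1], [0.0, 0.1, 4.0]],
]
mask = segment(pixel_logits)

print(label)       # cat
print(detections)  # [('cat', 40, 30, 120, 90)]
print(mask)        # [[0, 2], [1, 2]]
```

The output shape grows with each task: classification returns a single index, detection returns a variable-length list of labeled boxes, and segmentation returns a label grid with the same spatial layout as the input.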