Classical Computer Vision Methods


This article gives a concise, mathematically grounded overview of the major classical computer vision techniques: edge and corner detection, feature descriptors, matching and tracking, filters and transforms, segmentation, object detection, stereo vision, and motion analysis.

1. Edge, Corner & Keypoint Detectors

1.1 Sobel, Prewitt, Roberts Operators

These operators detect edges by convolving the image with small horizontal and vertical gradient kernels and combining the two responses.

$$ G_x = I * S_x, \qquad G_y = I * S_y $$
$$ |G| = \sqrt{G_x^2 + G_y^2} $$
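
A minimal OpenCV sketch of Sobel gradients and their magnitude; the file name image.png and the 3×3 kernel size are placeholders.

```python
import cv2
import numpy as np

# Load a grayscale image (the path is a placeholder).
img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Horizontal and vertical gradient responses G_x and G_y.
gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)

# Gradient magnitude |G| = sqrt(G_x^2 + G_y^2).
magnitude = np.hypot(gx, gy)
```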

1.2 Laplacian of Gaussian (LoG)

Detects edges via second derivatives and zero-crossings.

$$ \text{LoG}(x) = \nabla^2 (G_\sigma * I) $$

1.3 Difference of Gaussian (DoG)

$$ \text{DoG} = G_{\sigma_1} - G_{\sigma_2} $$

1.4 Canny Edge Detector

Involves four stages: Gaussian smoothing, gradient computation, non-maximum suppression, and hysteresis thresholding.
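
In OpenCV the whole Canny pipeline is a single call; this is a minimal sketch, and the two hysteresis thresholds are illustrative values.

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

# Smoothing, gradients, non-maximum suppression, and hysteresis
# are all performed internally by cv2.Canny.
edges = cv2.Canny(img, threshold1=100, threshold2=200)
```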

1.5 Harris Corner Detector

$$ M= \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} $$
where $w(x,y)$ is a (typically Gaussian) window.
$$ R = \det(M) - k(\text{trace}(M))^2 $$
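
A minimal sketch of the Harris detector using cv2.cornerHarris; the window size, Sobel aperture, k, and the 0.01 response threshold are illustrative.

```python
import cv2
import numpy as np

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
gray = np.float32(img)

# blockSize: window over which M is accumulated, ksize: Sobel aperture,
# k: the constant in R = det(M) - k * trace(M)^2.
R = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Keep locations whose response is a large fraction of the maximum.
corners = np.argwhere(R > 0.01 * R.max())
```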

1.6 Shi–Tomasi Corner Detector

$$ R = \min(\lambda_1,\lambda_2) $$
where $\lambda_1, \lambda_2$ are the eigenvalues of the structure tensor $M$.

This figure demonstrates classical edge and corner detection techniques applied to the same grayscale image. It includes Sobel and Laplacian of Gaussian (LoG) for gradient-based edge detection, Difference of Gaussian (DoG) for multi-scale edge detection, and Canny for robust edge extraction. Corner detection is shown with Harris, Shi-Tomasi, and FAST, highlighting distinctive points in the image. Each method emphasizes different structural details, helping visualize edges, corners, and key features for image analysis tasks.
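
A minimal sketch of Shi–Tomasi corner selection via cv2.goodFeaturesToTrack, which uses the min-eigenvalue criterion by default; the parameter values are illustrative.

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

# Set useHarrisDetector=True to switch to the Harris response instead.
corners = cv2.goodFeaturesToTrack(img, maxCorners=200,
                                  qualityLevel=0.01, minDistance=10)
corners = corners.reshape(-1, 2)  # (N, 2) array of (x, y) corner positions
```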

1.7 SUSAN (Smallest Univalue Segment Assimilating Nucleus)

Counts neighborhood pixels whose intensity is similar to the nucleus (center pixel); a corner is detected when this USAN area becomes small.

1.8 FAST (Features from Accelerated Segment Test)

Declares a corner when at least N contiguous pixels on a 16-pixel Bresenham circle are all brighter or all darker than the center by a threshold.

1.9 AGAST (Adaptive and Generic Accelerated Segment Test)

An improvement of FAST that replaces its fixed decision tree with adaptive, generic decision trees for the segment test.

2. Feature Descriptors

2.1 SIFT (Scale-Invariant Feature Transform)

Uses DoG keypoints, orientation histograms, and 128-D descriptors.
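
A minimal sketch of SIFT keypoint detection and description; SIFT is available in the main OpenCV package from version 4.4 onward.

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors has shape (number_of_keypoints, 128)
```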

2.2 SURF (Speeded Up Robust Features)

Efficient LoG approximation using box filters and integral images.

2.3 BRIEF (Binary Robust Independent Elementary Features)

$$ d_i = \begin{cases} 1 & I(p_i) < I(q_i) \\ 0 & \text{otherwise} \end{cases} $$

2.4 ORB (Oriented FAST and Rotated BRIEF)

Combines FAST keypoints with rotation-corrected BRIEF descriptors.
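
A minimal sketch of ORB extraction and brute-force matching; the file names are placeholders.

```python
import cv2

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance is the natural metric for binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
```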

2.5 BRISK (Binary Robust Invariant Scalable Keypoints)

Scale-space sampling and binary intensity comparisons.

2.6 FREAK (Fast Retina Keypoint)

Binary descriptor based on retina-inspired sampling.

2.7 HOG (Histogram of Oriented Gradients)

Histograms of gradient orientations inside cells.
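
A minimal sketch of computing a HOG feature vector with OpenCV's default pedestrian window (64×128 pixels, 8×8 cells, 16×16 blocks, 9 orientation bins).

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

# The default descriptor expects a 64x128 window.
window = cv2.resize(img, (64, 128))
hog = cv2.HOGDescriptor()
features = hog.compute(window)  # 3780-dimensional feature vector
```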

This figure illustrates various classical feature detection and description techniques applied to the same grayscale image. It includes SIFT, ORB, BRISK for keypoint detection, highlighting distinctive points in the image; HOG for gradient-based texture representation; LBP for local texture patterns; and an approximate GIST descriptor using multiple Gabor filters to capture global scene structure. Each method visualizes different aspects of the image, helping in feature extraction, texture analysis, and object recognition tasks.

2.8 LBP (Local Binary Patterns)

$$ \text{LBP}(x_c)=\sum_{p=0}^{P-1}s(I_p-I_c)\,2^p, \qquad s(z)=\begin{cases}1 & z \ge 0\\ 0 & z < 0\end{cases} $$
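
A small NumPy sketch of the basic 8-neighbor LBP code, mirroring the formula above; lbp_8neighbors is an illustrative helper, not a library function, and image borders are simply excluded.

```python
import numpy as np

def lbp_8neighbors(img):
    """Basic 8-neighbor LBP over a 2-D array (border pixels excluded)."""
    c = img[1:-1, 1:-1]                       # center intensities I_c
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for p, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy: img.shape[0] - 1 + dy,
                       1 + dx: img.shape[1] - 1 + dx]
        # s(I_p - I_c) contributes bit p of the code.
        code += (neighbor >= c).astype(np.uint8) << p
    return code
```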

2.9 Shape Context

Histogram describing relative spatial distribution of points.

2.10 GIST Descriptor

Global scene representation using multi-scale Gabor filters.

3. Feature Matching & Tracking

3.1 RANSAC (Random Sample Consensus)

Fits a robust model by sampling minimal sets and counting inliers.
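
A small NumPy sketch of RANSAC for robust 2-D line fitting; ransac_line is an illustrative helper, and the iteration count and inlier threshold are placeholders to be tuned per problem.

```python
import numpy as np

def ransac_line(points, n_iters=1000, inlier_thresh=1.0):
    """Robustly fit a line a*x + b*y + c = 0 to an (N, 2) point array."""
    rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Minimal sample: two distinct points define a candidate line.
        p1, p2 = points[rng.choice(len(points), size=2, replace=False)]
        a, b = p2[1] - p1[1], p1[0] - p2[0]   # normal to the segment
        norm = np.hypot(a, b)
        if norm == 0:
            continue
        a, b = a / norm, b / norm
        c = -(a * p1[0] + b * p1[1])
        # Inliers are points within the distance threshold of the line.
        inliers = np.abs(points @ np.array([a, b]) + c) < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```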

3.2 Lucas–Kanade Optical Flow

$$ I_x u + I_y v + I_t = 0 $$
This single constraint is under-determined, so Lucas–Kanade solves for $(u, v)$ by least squares over a small window around each pixel.

3.3 Horn–Schunck Optical Flow

$$ E = \iint (I_x u + I_y v + I_t)^2 + \lambda(|\nabla u|^2 + |\nabla v|^2)\,dx\,dy $$

3.4 KLT (Kanade–Lucas–Tomasi) Tracker

Tracks Shi–Tomasi features using Lucas–Kanade optical flow.
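
A minimal sketch of KLT tracking with OpenCV: Shi–Tomasi corners from the first frame are tracked into the second with pyramidal Lucas–Kanade; the file names and window parameters are placeholders.

```python
import cv2

prev = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# Shi-Tomasi corners in the first frame...
pts0 = cv2.goodFeaturesToTrack(prev, maxCorners=200,
                               qualityLevel=0.01, minDistance=10)

# ...tracked into the second frame with pyramidal Lucas-Kanade.
pts1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, pts0, None,
                                             winSize=(21, 21), maxLevel=3)
tracked = pts1[status.flatten() == 1]   # successfully tracked points
```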

4. Filters & Transform Methods

4.1 Gaussian Filter

$$ G_\sigma(x,y)=\frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}} $$

4.2 Median Filter

Replaces each pixel with the neighborhood median.

4.3 Bilateral Filter

$$ I'(x)=\frac{1}{W_x}\sum_p I(p)\, e^{-\frac{\|x-p\|^2}{2\sigma_s^2}} e^{-\frac{|I(x)-I(p)|^2}{2\sigma_r^2}} $$
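
A minimal sketch of the three smoothing filters from sections 4.1–4.3 in OpenCV; the kernel sizes and sigma values are illustrative.

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

gauss = cv2.GaussianBlur(img, ksize=(5, 5), sigmaX=1.5)
median = cv2.medianBlur(img, ksize=5)
# d: neighborhood diameter; sigmaColor ~ sigma_r, sigmaSpace ~ sigma_s.
bilateral = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
```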

4.4 Anisotropic Diffusion (Perona–Malik)

$$ \frac{\partial I}{\partial t}=\nabla \cdot (c(\|\nabla I\|)\nabla I) $$
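
A small NumPy sketch of Perona–Malik diffusion with the exponential conductance g(s) = exp(-(s/κ)²); anisotropic_diffusion is an illustrative helper, np.roll gives periodic borders purely for brevity, and κ, λ, and the iteration count are placeholders.

```python
import numpy as np

def anisotropic_diffusion(img, n_iters=20, kappa=30.0, lam=0.2):
    """Perona-Malik diffusion with an exponential edge-stopping function."""
    I = img.astype(np.float64)
    for _ in range(n_iters):
        # Finite differences toward the four neighbors.
        dN = np.roll(I, -1, axis=0) - I
        dS = np.roll(I,  1, axis=0) - I
        dE = np.roll(I, -1, axis=1) - I
        dW = np.roll(I,  1, axis=1) - I
        # Conductance is small where the gradient is large (edges preserved).
        cN, cS = np.exp(-(dN / kappa) ** 2), np.exp(-(dS / kappa) ** 2)
        cE, cW = np.exp(-(dE / kappa) ** 2), np.exp(-(dW / kappa) ** 2)
        I += lam * (cN * dN + cS * dS + cE * dE + cW * dW)
    return I
```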

4.5 Fourier Transform

$$ F(u,v)=\sum_x\sum_y I(x,y)e^{-j2\pi(ux/M+vy/N)} $$
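
A minimal sketch of the 2-D discrete Fourier transform with NumPy, producing the log-magnitude spectrum commonly visualized for images.

```python
import cv2
import numpy as np

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

F = np.fft.fft2(img)                    # 2-D DFT
F_shifted = np.fft.fftshift(F)          # move zero frequency to the center
magnitude = 20 * np.log(np.abs(F_shifted) + 1)   # log-magnitude spectrum
```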

4.6 Discrete Cosine Transform (DCT)

Expresses an image block as a weighted sum of cosine basis functions; its strong energy compaction makes it the core transform of JPEG compression.

4.7 Wavelet Transform

Multi-resolution analysis using scalable basis functions.

This figure shows the same grayscale image processed with different techniques. It includes smoothing filters (Gaussian, Median, Bilateral), edge-preserving denoising (Anisotropic Diffusion), frequency analysis (Fourier Transform), multi-scale approximation (Wavelet), and line detection (Hough Transform). Each method highlights different aspects of the image, helping visualize noise reduction, structure, and key features.

4.8 Radon Transform

$$ R(\rho,\theta)=\int I(x,y)\delta(\rho - x\cos\theta - y\sin\theta)\,dx\,dy $$

4.9 Hough Transform

Votes in parameter space to detect lines and shapes.
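
A minimal sketch of the standard Hough line transform on a Canny edge map; the vote threshold and edge thresholds are illustrative.

```python
import cv2
import numpy as np

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 100, 200)

# Each returned (rho, theta) pair is a line with enough accumulator votes.
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=150)
```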

5. Segmentation Methods

5.1 K-means Segmentation

$$ \arg\min \sum_i\|x_i - \mu_{c_i}\|^2 $$
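
A minimal sketch of K-means color segmentation with cv2.kmeans; K, the termination criteria, and the number of attempts are illustrative.

```python
import cv2
import numpy as np

img = cv2.imread("image.png")                     # BGR color image
pixels = img.reshape(-1, 3).astype(np.float32)    # one row per pixel

K = 4
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(pixels, K, None, criteria,
                                attempts=5, flags=cv2.KMEANS_RANDOM_CENTERS)

# Replace each pixel with its cluster center to visualize the segmentation.
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)
```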

5.2 Graph Cut

$$ E(L)=U(L)+V(L) $$
where $U(L)$ is the data term and $V(L)$ the smoothness term; for binary labels the energy is minimized exactly via min-cut/max-flow.

5.3 GrabCut

Iterates Graph Cut with Gaussian Mixture Models of the foreground and background colors, initialized from a user-supplied rectangle.
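
A minimal sketch of GrabCut in OpenCV, initialized from a bounding rectangle; the rectangle coordinates and iteration count are placeholders.

```python
import cv2
import numpy as np

img = cv2.imread("image.png")                    # BGR color image
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)        # background GMM state
fgd_model = np.zeros((1, 65), np.float64)        # foreground GMM state

rect = (50, 50, 200, 200)                        # (x, y, w, h), placeholder
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Definite or probable foreground pixels form the extracted object.
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
result = img * fg[:, :, None]
```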

5.4 Watershed

Treats gradient magnitude as a topographic map.

5.5 Mean Shift Segmentation

Clusters pixels by iteratively shifting them toward local maxima (modes) of the joint color–spatial density.

5.6 Felzenszwalb–Huttenlocher Algorithm

Graph-based region merging based on internal variation.

5.7 SLIC (Simple Linear Iterative Clustering) Superpixels

Clusters pixels in the 5-dimensional space of CIELAB color (L, a, b) and position (x, y), with a compactness term that keeps superpixels spatially coherent.
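
A minimal sketch of SLIC superpixels, assuming scikit-image is available; the segment count and compactness are illustrative values.

```python
from skimage import io, segmentation, color

img = io.imread("image.png")   # RGB image

# compactness trades color similarity against spatial proximity
# in the 5-D (L, a, b, x, y) clustering space.
labels = segmentation.slic(img, n_segments=250, compactness=10, start_label=1)
superpixel_means = color.label2rgb(labels, img, kind="avg")
```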

This figure shows several classical image segmentation techniques applied to the same input image. It includes K-means clustering, GrabCut foreground extraction, Watershed based on image gradients, Felzenszwalb graph-based segmentation, and SLIC superpixels. Each method splits the image into meaningful regions in different ways, highlighting boundaries, objects, and structural elements.

5.8 Active Contours (Snakes)

$$ E = \int_0^1 \Big( \alpha |\mathbf{v}'(s)|^2 + \beta |\mathbf{v}''(s)|^2 + E_{\text{image}}(\mathbf{v}(s)) \Big)\, ds $$

5.9 Level Set Methods

$$ \frac{\partial \phi}{\partial t} = F|\nabla\phi| $$

6. Classical Object Detection

6.1 Viola–Jones (Haar Cascade)

Uses Haar features, integral images, AdaBoost, and cascaded classifiers.
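
A minimal sketch of Viola–Jones face detection using the pre-trained frontal-face cascade shipped with OpenCV; scaleFactor and minNeighbors are illustrative.

```python
import cv2

# OpenCV ships trained Haar cascades alongside the package.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
faces = face_cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=5)
# faces is an array of (x, y, w, h) bounding boxes
```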

6.2 HOG + SVM (Support Vector Machine)

$$ \min_w\|w\|^2 \quad \text{s.t. } y_i(w^\top x_i + b) \ge 1 $$
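
A minimal sketch of pedestrian detection with OpenCV's default HOG descriptor and its pre-trained linear SVM; the stride, scale step, and file name are illustrative.

```python
import cv2

hog = cv2.HOGDescriptor()
# Pre-trained linear SVM weights for the 64x128 pedestrian window.
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("street.png")   # placeholder file name
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)
```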

6.3 Deformable Part Models (DPM)

$$ S = w_0\phi(\text{root}) + \sum_i (w_i\phi(\text{part}_i) - d_i) $$
This figure demonstrates traditional object detection and tracking techniques. It includes Viola–Jones face detection, HOG+SVM pedestrian detection, and Template Matching for locating repeated patterns. Motion-related methods include Optical Flow, Background Subtraction, and Mean Shift tracking, showcasing how classical algorithms analyze movement and detect objects in images.

6.4 Template Matching

$$ R(x,y)=\sum_{u,v} I(x+u,y+v)T(u,v) $$
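
A minimal sketch of template matching in OpenCV; normalized cross-correlation (TM_CCORR_NORMED) is used here instead of the raw sum above because it is more robust to lighting changes.

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

response = cv2.matchTemplate(img, template, cv2.TM_CCORR_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(response)   # max_loc = best (x, y) match
```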

7. Stereo Vision & 3D

7.1 Block Matching

$$ \text{SAD}(d)=\sum |I_L(x,y)-I_R(x-d,y)| $$
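
A minimal sketch of block-matching stereo with cv2.StereoBM on a rectified image pair; numDisparities (a multiple of 16) and blockSize are illustrative.

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
# StereoBM returns fixed-point disparities scaled by 16.
disparity = stereo.compute(left, right).astype(float) / 16.0
```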

7.2 Semi-Global Matching (SGM)

Aggregates matching cost across multiple directions.

7.3 Epipolar Geometry

$$ x_2^\top F x_1 = 0 $$

7.4 Essential Matrix

$$ E = [t]_\times R $$

7.5 Triangulation

$$ X = \arg\min_X \sum_i \|x_i - P_i X\|^2 $$

7.6 Structure from Motion (SfM)

Estimates camera poses and 3D structure from multiple views.

7.7 Bundle Adjustment

$$ \arg\min \sum_{i,j} \|x_{ij} - P_i X_j\|^2 $$

7.8 Visual Odometry

Estimates camera motion using sequential feature correspondences.

8. Motion Analysis & Tracking

8.1 Background Subtraction (Mixture of Gaussians - MOG/MOG2)

$$ p(x)=\sum_k w_k \mathcal{N}(\mu_k,\Sigma_k) $$
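
A minimal sketch of MOG2 background subtraction over a video stream; the file name, history length, and variance threshold are placeholders.

```python
import cv2

cap = cv2.VideoCapture("video.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Pixels poorly explained by the per-pixel Gaussian mixture become foreground.
    fg_mask = subtractor.apply(frame)
cap.release()
```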

8.2 Kalman Filter

$$ x_k = A x_{k-1}+w,\qquad z_k = Hx_k+v $$
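
A small NumPy sketch of a 1-D constant-velocity Kalman filter matching the model above; the noise covariances and the toy measurements are placeholders.

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition (position, velocity)
H = np.array([[1.0, 0.0]])               # we observe position only
Q = 1e-3 * np.eye(2)                     # process noise covariance
R = np.array([[0.5]])                    # measurement noise covariance

x = np.zeros((2, 1))                     # initial state estimate
P = np.eye(2)                            # initial state covariance
for z in [1.1, 2.0, 2.9, 4.2, 5.1]:      # toy position measurements
    # Predict
    x = A @ x
    P = A @ P @ A.T + Q
    # Update with the Kalman gain K
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (np.array([[z]]) - H @ x)
    P = (np.eye(2) - K @ H) @ P
```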

8.3 Particle Filter

Approximates posterior distribution using weighted particles.

8.4 Mean Shift Tracking

Tracks objects by iteratively shifting kernel windows.

8.5 CAMShift (Continuously Adaptive Mean Shift)

Enhances Mean Shift with adaptive window size.

