Posts

Computer Vision Foundations and Model Architectures

Recent posts

Conditional Diffusion Models in Computational Microscopy

Conditional Diffusion for Brightfield to Fluorescence Image Translation Fluorescence microscopy provides critical biological insights, but acquiring fluorescence images is often time-consuming, expensive, and phototoxic. This post describes a conditional diffusion model that translates brightfield (BF) images into corresponding fluorescence channels (red or green) using a unified, probabilistic generative framework. Instead of predicting fluorescence directly, the model learns to iteratively denoise fluorescence images conditioned on brightfield structure and channel identity. Brightfield (BF): structural cell morphology captured without fluorescence labeling. Green fluorescence: a complementary fluorescence channel with distinct biological specificity. ...
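The "iteratively denoise, conditioned on brightfield and channel" idea can be sketched as a single reverse-diffusion step. This is a minimal DDPM-style sketch, not the post's actual implementation; `predict_noise` stands in for a trained conditional noise predictor (e.g. a U-Net), and all names are illustrative.

```python
import numpy as np

def ddpm_step(x_t, t, bf_image, channel_id, predict_noise, betas, rng=None):
    """One denoising step x_t -> x_{t-1} under the standard DDPM update,
    with the noise predictor conditioned on the BF image and channel id."""
    rng = rng if rng is not None else np.random.default_rng()
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)                     # cumulative product of alphas
    eps = predict_noise(x_t, t, bf_image, channel_id)  # predicted noise
    # Posterior mean of x_{t-1} given x_t
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:  # inject noise on every step except the last
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return mean
```

Running this step repeatedly from pure Gaussian noise, with the same brightfield conditioning at every step, yields the sampled fluorescence image.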

LeJEPA: Predictive Learning With Isotropic Latent Spaces

LeJEPA: Predictive Learning With Isotropic Latent Spaces Self-supervised learning methods such as MAE, SimCLR, BYOL, DINO, and iBOT all attempt to learn useful representations by predicting missing information. Most of them reconstruct pixels or perform contrastive matching, which forces models to learn low-level details that are irrelevant for semantic understanding. LeJEPA approaches representation learning differently: instead of reconstructing pixels, the model predicts latent representations of the input, and those representations are regularized to live in a well-conditioned, isotropic space. These animations demonstrate LeJEPA’s ability to predict future latent representations for different types of motion. The first animation shows a dog moving through a scene, highlighting semantic dynamics and object consisten...
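The two ingredients above, latent-space prediction plus an isotropy constraint, can be combined in a simple loss. This is a simplified stand-in for LeJEPA's objective (the penalty pushing the batch covariance toward the identity is my illustrative choice, not the paper's exact regularizer):

```python
import numpy as np

def latent_prediction_loss(z_pred, z_target, lam=0.1):
    """Squared error between predicted and target latents, plus a penalty
    that pushes the batch covariance of the predictions toward identity.

    z_pred, z_target: (batch, dim) arrays of latent vectors.
    """
    pred_loss = np.mean((z_pred - z_target) ** 2)
    zc = z_pred - z_pred.mean(axis=0, keepdims=True)        # center the batch
    cov = zc.T @ zc / max(len(z_pred) - 1, 1)               # sample covariance
    iso_loss = np.mean((cov - np.eye(cov.shape[0])) ** 2)   # isotropy penalty
    return pred_loss + lam * iso_loss
```

The isotropy term is what keeps the latent space "well-conditioned": without it, latent-prediction objectives can collapse to a low-rank or constant representation that trivially minimizes the prediction error.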

Classical Computer Vision Methods

Classical Computer Vision Methods This article provides a complete, concise, mathematically supported explanation of all important classical computer vision techniques: edge detection, feature descriptors, tracking, segmentation, transforms, stereo, motion analysis, and more. 1. Edge, Corner & Keypoint Detectors 1.1 Sobel, Prewitt, Roberts Operators These detect edges by convolving the image with horizontal and vertical gradient kernels. $$ G_x = I * S_x, \qquad G_y = I * S_y $$ $$ |G| = \sqrt{G_x^2 + G_y^2} $$ 1.2 Laplacian of Gaussian (LoG) Detects edges via second derivatives and zero-crossings. $$ \text{LoG}(x) = \nabla^2 (G_\sigma * I) $$ 1.3 Difference of Gaussian (DoG) $$ \text{DoG} = G_{\sigma_1} - G_{\sigma_2} $$ 1.4 Canny Edge Detector Involves Gaussian smoothing, gradient computation, non-max su...
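The Sobel formulas above translate directly into code; a minimal sketch using `scipy.ndimage.convolve`:

```python
import numpy as np
from scipy.ndimage import convolve

def sobel_magnitude(image):
    """Gradient magnitude |G| = sqrt(Gx^2 + Gy^2) via Sobel kernels."""
    Sx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # horizontal gradient kernel
    Sy = Sx.T                                 # vertical gradient kernel
    Gx = convolve(image.astype(float), Sx)    # Gx = I * Sx
    Gy = convolve(image.astype(float), Sy)    # Gy = I * Sy
    return np.sqrt(Gx ** 2 + Gy ** 2)
```

On a constant image the magnitude is zero everywhere; on a vertical step edge it peaks at the edge columns, which is exactly the behavior the equations describe.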

DINOv3

DINOv3: Unified Global & Local Self-Supervision DINOv3 extends the DINOv2 framework by combining global self-distillation with masked patch prediction. This allows the model to learn both image-level and dense spatial representations within a single self-supervised pipeline. This image shows cosine similarity maps of DINOv3 output features, illustrating the relationships between the patch marked with a red cross and all other patches (as reported in the DINOv3 GitHub repository). If you find DINOv3 useful, consider giving it a star ⭐. Citation for this work is provided in the References section. 1. Student–Teacher Architecture As in DINOv2, DINOv3 uses a student–teacher setup: a student network with parameters \( \theta \) and a teacher network with parameters \( \xi \). Both networks receive different augmented views of the inpu...
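In DINO-family student-teacher setups, the teacher is not trained by gradient descent; its parameters \( \xi \) track an exponential moving average (EMA) of the student's \( \theta \). A minimal sketch of that update (function name and list-of-arrays layout are illustrative, not DINOv3's actual code):

```python
import numpy as np

def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher EMA update: xi <- m * xi + (1 - m) * theta.

    Both arguments are lists of numpy arrays (one per layer);
    returns the updated teacher parameters.
    """
    return [momentum * xi + (1.0 - momentum) * theta
            for xi, theta in zip(teacher_params, student_params)]
```

With momentum close to 1, the teacher changes slowly and provides stable targets for the student's global and masked-patch predictions.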

Deep Learning: From Simple Linear Pieces to Powerful Models

Deep Learning: From Local Linearity to Compact Architectures At its heart, deep learning is a function-approximation engine. The central intuition is simple: zoom in on a complex curve and it looks almost linear. Neural networks exploit this by composing many linear transformations with nonlinear activations, producing a highly expressive, piecewise-linear (or smooth) approximation of the target function. 1. Local Linearity: the basic building block The basic operation in a neural network is a linear transformation followed by a nonlinearity: $$ \mathbf{z} = W \mathbf{x} + b, \quad \mathbf{a} = \sigma(\mathbf{z}) $$ Repeated over layers: $$ f(\mathbf{x}) = \sigma_n(W_n(\sigma_{n-1}(W_{n-1}(\cdots\sigma_1(W_1 \mathbf{x} + b_1)\cdots)+b_{n-1}) ) + b_n) $$ Notation: \( W_i \), \( b_i \): learnable weights and biases; \( \sigma_i \): nonlinear activation (ReLU, GELU, etc.); \( f(\mathbf...
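The composed formula above is just a loop over (W, b) pairs with a nonlinearity between them; a minimal forward-pass sketch (the last layer is left linear, a common convention I'm assuming here):

```python
import numpy as np

def relu(z):
    """Elementwise nonlinearity sigma(z) = max(z, 0)."""
    return np.maximum(z, 0.0)

def mlp_forward(x, layers):
    """Compose a = sigma(W @ a + b) layer by layer.

    layers: list of (W, b) pairs; all hidden layers use ReLU,
    and the final layer is purely linear.
    """
    a = x
    for W, b in layers[:-1]:
        a = relu(W @ a + b)   # z = W a + b, then a = sigma(z)
    W, b = layers[-1]
    return W @ a + b
```

Each ReLU folds the input space along a hyperplane, so stacking layers carves it into many regions, each with its own linear map, which is exactly the piecewise-linear approximation the text describes.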