Conditional Diffusion for Brightfield to Fluorescence Image Translation
Fluorescence microscopy provides critical biological insights, but acquiring fluorescence images is often time-consuming, expensive, and phototoxic. This blog post describes a conditional diffusion model that translates Brightfield (BF) images into the corresponding fluorescence channels (red or green), using a unified, probabilistic generative framework.
1. Problem Setup
Given:
- Brightfield image \( x_{\text{BF}} \)
- Fluorescence image \( x_0 \) (Red or Green)
The goal is to learn the conditional distribution
\[ p_\theta\!\left(x_0 \mid x_{\text{BF}}, c\right) \]
where \( c \) is a condition vector indicating the desired fluorescence channel:
- Red channel → \( c = [1, 0] \)
- Green channel → \( c = [0, 1] \)
2. Data Preparation
Each training sample consists of a triplet:
- Brightfield image
- Red fluorescence
- Green fluorescence
Preprocessing:
- All images are resized to 256 × 256
- Pixel values are normalized to [-1, 1]
- One fluorescence channel is selected per training iteration
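A minimal preprocessing sketch in Python (PIL + NumPy); the file paths and helper name are illustrative, not the repository's actual API:

```python
import numpy as np
from PIL import Image

def load_training_sample(bf_path, red_path, green_path, channel):
    """Load one (BF, fluorescence) pair; `channel` is "red" or "green".
    Paths and this helper are illustrative only."""
    def prep(path, mode):
        img = Image.open(path).convert(mode).resize((256, 256), Image.BILINEAR)
        return np.asarray(img, dtype=np.float32) / 127.5 - 1.0  # scale to [-1, 1]

    bf = prep(bf_path, "RGB")                                  # (256, 256, 3)
    fluo_path = red_path if channel == "red" else green_path
    fluo = prep(fluo_path, "L")[..., None]                     # (256, 256, 1)
    cond = np.array([1.0, 0.0] if channel == "red" else [0.0, 1.0],
                    dtype=np.float32)                          # one-hot channel code
    return bf, fluo, cond
```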
3. Input Construction
At each diffusion step, the model receives a 6-channel input:
- Noisy fluorescence image \( x_t \) (1 channel)
- Brightfield RGB image (3 channels)
- Condition map \( c \) broadcast spatially (2 channels)
This produces a 6-channel input tensor:
\[ x_{\text{in}} = \big[\, x_t \,\|\, x_{\text{BF}} \,\|\, c \,\big] \in \mathbb{R}^{6 \times 256 \times 256} \]
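As a sketch, the 6-channel input can be assembled in PyTorch as follows; the tensor shapes and helper name are assumptions for illustration:

```python
import torch

def build_model_input(x_t, x_bf, cond):
    """Concatenate the noisy fluorescence image, the BF image, and a spatially
    broadcast condition map into one 6-channel tensor.
    Assumed shapes: x_t (B,1,H,W), x_bf (B,3,H,W), cond (B,2)."""
    b, _, h, w = x_t.shape
    cond_map = cond.view(b, 2, 1, 1).expand(b, 2, h, w)  # broadcast one-hot over space
    return torch.cat([x_t, x_bf, cond_map], dim=1)        # (B, 6, H, W)
```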
4. Forward Diffusion Process
Noise is gradually added to the clean fluorescence image:
\[ x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon \]
where:
- \( \epsilon \sim \mathcal{N}(0, I) \)
- \( t \) is sampled uniformly
- \( \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s \), with \( \alpha_t = 1 - \beta_t \) and \( \beta_t \) given by a predefined noise schedule
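A minimal PyTorch sketch of this forward (noising) step, assuming `alpha_bar` holds the cumulative products of the noise schedule:

```python
import torch

def q_sample(x0, t, alpha_bar, noise=None):
    """Forward diffusion: corrupt a clean fluorescence image x0 at timestep t.
    `alpha_bar` is a 1-D tensor of cumulative products of (1 - beta_t).
    Returns the noised image x_t and the noise that was added."""
    if noise is None:
        noise = torch.randn_like(x0)
    a_bar = alpha_bar[t].view(-1, 1, 1, 1)  # per-sample \bar{alpha}_t
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return x_t, noise
```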
5. Conditional UNet Denoiser
A UNet receives the 6-channel input \( x_{\text{in}} \) together with an embedding of the timestep \( t \), and predicts the noise component \( \hat{\epsilon} = \epsilon_\theta(x_t, t, x_{\text{BF}}, c) \).
From this, a clean fluorescence estimate is reconstructed:
\[ \hat{x}_0 = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\, \hat{\epsilon}}{\sqrt{\bar{\alpha}_t}} \]
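A sketch of the denoising call and the \( \hat{x}_0 \) reconstruction; the UNet signature is assumed, and `build_model_input` refers to the earlier sketch:

```python
import torch

def predict_x0(unet, x_t, t, x_bf, cond, alpha_bar):
    """Run the conditional denoiser and recover a clean-image estimate.
    The UNet is assumed to accept the 6-channel tensor plus timesteps;
    this mirrors the epsilon-prediction setup above, not the repo's exact API."""
    x_in = build_model_input(x_t, x_bf, cond)   # (B, 6, H, W), see earlier sketch
    eps_hat = unet(x_in, t)                     # predicted noise
    a_bar = alpha_bar[t].view(-1, 1, 1, 1)
    x0_hat = (x_t - (1.0 - a_bar).sqrt() * eps_hat) / a_bar.sqrt()
    return eps_hat, x0_hat
```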
6. Training Losses
Two complementary losses guide training:
6.1 Noise Prediction Loss
An L1 loss between the true and predicted noise:
\[ \mathcal{L}_{\text{noise}} = \left\lVert \epsilon - \hat{\epsilon} \right\rVert_1 \]
6.2 Perceptual Loss (VGG16)
The clean estimate \( \hat{x}_0 \) and the ground-truth fluorescence \( x_0 \) are compared in the feature space \( \phi(\cdot) \) of a pre-trained VGG16:
\[ \mathcal{L}_{\text{perc}} = \left\lVert \phi(\hat{x}_0) - \phi(x_0) \right\rVert \]
Total Loss
\[ \mathcal{L} = \mathcal{L}_{\text{noise}} + \lambda\, \mathcal{L}_{\text{perc}} \]
where \( \lambda \) weights the perceptual term.
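A hedged sketch of how these losses could be combined in PyTorch; the VGG16 layer cut-off and the weight `lambda_perc` are illustrative choices, not the values used in the repository:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Feature extractor for the perceptual term (early VGG16 conv blocks).
_vgg_features = vgg16(weights="DEFAULT").features[:16].eval()
for p in _vgg_features.parameters():
    p.requires_grad_(False)

def diffusion_loss(eps, eps_hat, x0, x0_hat, lambda_perc=0.1):
    """L1 noise-prediction loss plus a weighted VGG16 perceptual loss."""
    l_noise = F.l1_loss(eps_hat, eps)
    # VGG expects 3-channel inputs roughly in [0, 1]; fluorescence is 1-channel in [-1, 1].
    def to_vgg(x):
        return (x.repeat(1, 3, 1, 1) + 1.0) / 2.0
    l_perc = F.l1_loss(_vgg_features(to_vgg(x0_hat)), _vgg_features(to_vgg(x0)))
    return l_noise + lambda_perc * l_perc
```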
7. Optimization & EMA
Training details:
- Optimizer: AdamW
- Exponential Moving Average (EMA) of weights
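A minimal EMA update sketch; the decay value is illustrative:

```python
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    """Exponential moving average of model weights; the EMA copy is the one
    used at inference for more stable outputs."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

# Typical setup: ema_model = copy.deepcopy(model); call ema_update(...) after each optimizer step.
```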
8. Inference: Reverse Diffusion
At inference time, fluorescence images are generated by running the reverse diffusion process, starting from pure Gaussian noise.
For each timestep \( t = T, T-1, \ldots, 1 \), the model predicts the noise component \( \hat{\epsilon}_t = \epsilon_\theta(x_t, t, x_{\text{BF}}, c) \), conditioned on the Brightfield image and the desired fluorescence channel.
The reverse DDPM update is given by:
\[ x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \hat{\epsilon}_t \right) + \sigma_t z \]
where:
- \( z \sim \mathcal{N}(0, I) \) if \( t > 1 \), and \( z = 0 \) if \( t = 1 \)
- \( \alpha_t = 1 - \beta_t \)
- \( \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s \)
- \( \sigma_t = \sqrt{\beta_t} \) (or an equivalent variance schedule)
As \( t \) decreases, noise is gradually removed and biologically meaningful fluorescence structures emerge, guided by Brightfield morphology and the specified fluorescence modality.
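A compact sketch of this sampling loop, reusing the `build_model_input` helper from the earlier sketch; the UNet signature and schedule handling are assumptions:

```python
import torch

@torch.no_grad()
def sample_fluorescence(unet, x_bf, cond, betas):
    """Reverse DDPM sampling conditioned on a BF image and a channel one-hot.
    `betas` is the noise schedule (1-D tensor of length T); names are illustrative."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    b, _, h, w = x_bf.shape
    x_t = torch.randn(b, 1, h, w, device=x_bf.device)        # start from pure noise
    for t in reversed(range(len(betas))):
        t_batch = torch.full((b,), t, device=x_bf.device, dtype=torch.long)
        eps_hat = unet(build_model_input(x_t, x_bf, cond), t_batch)
        coef = (1.0 - alphas[t]) / (1.0 - alpha_bar[t]).sqrt()
        mean = (x_t - coef * eps_hat) / alphas[t].sqrt()
        z = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mean + betas[t].sqrt() * z                      # sigma_t = sqrt(beta_t)
    return x_t                                                # predicted fluorescence in [-1, 1]
```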
Important Losses
- Pixel-wise reconstruction loss (intensity matching)
  Pros: Enforces accurate intensity matching and preserves overall fluorescence levels.
  Cons: Sensitive to misalignment and produces overly smooth (blurry) outputs.
- Perceptual loss (structure matching)
  Pros: Preserves tissue morphology and high-level structural features.
  Cons: Depends on pre-trained features and may miss fine biological details.
- Structural similarity (SSIM) loss (visual similarity matching)
  Pros: Maintains cellular structure and contrast consistent with human perception.
  Cons: Weak at enforcing absolute intensity accuracy.
- Laplacian loss (edge/high-frequency, fine-detail matching)
  Pros: Enhances sharp edges and fine cellular boundaries.
  Cons: Amplifies noise and is sensitive to registration errors.
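For illustration, the Laplacian (edge) loss listed above could be sketched as follows for single-channel fluorescence images; it is not part of the current training setup:

```python
import torch
import torch.nn.functional as F

def laplacian_loss(pred, target):
    """Edge/high-frequency loss: compare Laplacian-filtered maps of the
    predicted and ground-truth fluorescence images (minimal sketch, 1-channel inputs)."""
    kernel = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]], device=pred.device).view(1, 1, 3, 3)
    edges_pred = F.conv2d(pred, kernel, padding=1)
    edges_target = F.conv2d(target, kernel, padding=1)
    return F.l1_loss(edges_pred, edges_target)
```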
My Work: Conditional Diffusion Framework
- Conditional Diffusion Framework used to translate BF → Red/Green fluorescence
- UNet backbone for epsilon prediction, conditioned on:
  - Noisy fluorescence image
  - BF RGB image
  - One-hot fluorescence type (red/green)
- 6-channel input enabling multi-modal feature learning
- DDPM noise schedule applied during forward diffusion to corrupt fluorescence targets
- Model learns to predict noise (𝜖-prediction) at each timestep
- Reconstruction of clean fluorescence from predicted 𝜖
- Loss functions:
  - L1 denoising loss
  - VGG16 perceptual loss (weighted)
- EMA (Exponential Moving Average) of weights for stable inference
- Reverse diffusion process generates final fluorescence output from pure noise at inference
Dataset
Dataset: 4 sets (8 folders) from different environments.
Training/Validation: 3 sets, split 80/20 per folder.
Testing: 1 held-out set.
Sample counts: Train: 159 | Val: 42 | Test: 51.
Training epochs: 100.
Dataset source: Kaggle - Brightfield vs Fluorescent Staining Dataset
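A small sketch of the per-folder 80/20 train/validation split described above; the folder layout and file extension are assumptions:

```python
import random
from pathlib import Path

def split_per_folder(root, train_frac=0.8, seed=0):
    """Split samples 80/20 within each folder under `root` (layout assumed)."""
    rng = random.Random(seed)
    train, val = [], []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        files = sorted(folder.glob("*.png"))
        rng.shuffle(files)
        cut = int(len(files) * train_frac)
        train += files[:cut]
        val += files[cut:]
    return train, val
```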
Results
Training and validation losses over 100 epochs. The model converges steadily, showing good generalization and stable learning for both training and validation sets.
Each result shows 5 images from left to right: BF input, Red GT, Red Pred, Green GT, Green Pred.
Future Improvements
Due to GPU limitations, our current results are limited to 256×256 resolution and a moderate UNet size. For better results, the following improvements can be considered:
- Increase image size (256 → 512+) – captures finer cellular details.
- Use more training data – improves generalization and robustness.
- Deeper/wider UNet – enhances feature extraction and captures complex structures.
- Diffusion + GAN loss – generates sharper outputs and preserves high-frequency features.
- Additional loss functions – e.g., SSIM and Laplacian loss can further improve structural similarity and edge fidelity.
GitHub Repository
This repository contains the complete code for training and evaluating the conditional diffusion model that translates Brightfield (BF) images into Red and Green fluorescence channels. It includes data preprocessing, model architecture, training scripts, and inference examples.
