Research
Weak Supervision for White Matter Lesion Detection in Adolescents
IN PREPARATION
The Big Picture
Leukoaraiosis, the rarefaction of cerebral white matter, is a biomarker of cerebrovascular disease typically studied in adults. Its detection in adolescence could reveal early developmental risk trajectories before clinical onset. The ABCD Study provides the largest pediatric neuroimaging dataset in existence (11,868 participants, 7 longitudinal sessions, 21 acquisition sites), yet only a coarse radiological quality-control score is available as supervision. No voxel-level annotations exist, and fewer than 5% of scans are positive. Multi-site scanner heterogeneity introduces systematic intensity biases. The dataset also lacks FLAIR sequences, the clinical gold standard for white matter hyperintensities, so the model must infer lesion presence from T1w hypointensities and T2w hyperintensities alone, where lesions are easily confused with CSF signal. Weak labels, extreme class imbalance, site effects, and missing FLAIR together make this a particularly hard setting for medical image analysis.
What I did
I built a five-phase weakly-supervised pipeline centered on a Swin UNETR backbone — a shifted-window 3D transformer that captures long-range dependencies across white matter regions where CNN receptive fields fall short. The model takes paired T1w and T2w volumes as a dual-channel input (2 × 96³ voxels, 62M parameters), encodes them through four hierarchical stages, and outputs a binary lesion classification score via Global Average Pooling and an MLP head. To handle the extreme class imbalance, I replaced cross-entropy with APLoss (LibAUC), a differentiable surrogate that directly optimizes the precision-recall AUC. Cross-validation uses StratifiedGroupKFold, grouped by subject ID to prevent anatomical leakage across the 7 longitudinal sessions per participant. Phase 4 extracts 3D Grad-CAM saliency maps to localize discriminative regions post-hoc. Phase 5 applies Otsu thresholding to generate pseudo-masks for self-training a segmentation decoder with DiceFocalLoss. The full pipeline runs on Wynton HPC (NVIDIA A100 80 GB, SLURM) with gradient checkpointing reducing VRAM by ~40%.
Key results
- End-to-end pipeline implemented across 5 phases: manifest construction, MONAI preprocessing, Swin UNETR classification, Grad-CAM localization, and pseudo-mask segmentation fine-tuning
- Preprocessing handles 21-site heterogeneity via RAS reorientation, isotropic 1 mm resampling, per-channel z-score normalization, and intensity augmentation (noise, blur, scaling)
- APLoss converges on fold-balanced splits; the Grad-CAM phase is gated on an AUPRC > 0.60 threshold, ensuring pseudo-masks are derived from a converged classifier
- Inference script produces NIfTI heatmaps + 3×5 PNG report grid + volumetric CSV per subject (estimated lesion volume = voxel count × 1 mm³)
- Pipeline ready for deployment on Wynton; awaiting finalized clfind_score labels CSV from clinical team
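The pseudo-mask and volume steps listed above can be sketched as follows. This is a hedged illustration: the saliency map is synthetic (a bright 8×8×8 cube on a noisy background stands in for a Grad-CAM output), `otsu_threshold` is a hypothetical helper written here in plain NumPy rather than the pipeline's own function, and the 256-bin histogram is an assumption.

```python
import numpy as np

def otsu_threshold(values, n_bins=256):
    """Threshold maximizing between-class variance (Otsu's method)."""
    hist, edges = np.histogram(values, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weights = hist / hist.sum()
    w0 = np.cumsum(weights)[:-1]           # mass at or below each candidate cut
    w1 = 1.0 - w0
    cum_mean = np.cumsum(weights * centers)
    total_mean = cum_mean[-1]
    valid = (w0 > 0) & (w1 > 0)            # skip cuts that leave a class empty
    between = np.zeros_like(w0)
    between[valid] = (total_mean * w0[valid] - cum_mean[:-1][valid]) ** 2 / (
        w0[valid] * w1[valid]
    )
    return float(edges[1:-1][np.argmax(between)])

# Synthetic stand-in for a Grad-CAM map on a 1 mm isotropic grid.
rng = np.random.default_rng(0)
saliency = rng.uniform(0.0, 0.2, size=(32, 32, 32))
saliency[10:18, 10:18, 10:18] = rng.uniform(0.8, 1.0, size=(8, 8, 8))

threshold = otsu_threshold(saliency.ravel())
pseudo_mask = saliency > threshold

# After 1 mm isotropic resampling each voxel covers 1 mm^3.
voxel_volume_mm3 = 1.0
lesion_voxels = int(pseudo_mask.sum())
lesion_volume_mm3 = lesion_voxels * voxel_volume_mm3
print(f"threshold={threshold:.3f}, voxels={lesion_voxels}, "
      f"volume={lesion_volume_mm3:.0f} mm^3")
```

Otsu's method is known to degrade when the foreground is a very small fraction of the volume, which is one reason to gate this step on a converged classifier and to inspect the resulting masks.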
Connectivity Fingerprint Analysis of DBS Stimulation Sites
TECHNICAL REPORT, APRIL 2026
The Big Picture
Deep Brain Stimulation is an established treatment for refractory depression and OCD, yet electrode placement remains largely empirical. Clinicians choose among several anatomical targets — subgenual cingulate (SGC), ventral capsule/striatum (VC/VS), bed nucleus of the stria terminalis (BNST), orbitofrontal cortex (OFC) — without a principled way to predict which site or amplitude will produce a therapeutic response. The 'destination network convergence' hypothesis proposes that all effective DBS, regardless of stimulation site, engages a common downstream brain network. If true, structural connectivity fingerprints derived from Volumes of Tissue Activated (VATs) could serve as biomarkers for DBS programming optimization — enabling data-driven targeting in future implantations.
What I did
Using diffusion MRI tractography from 5 patients in the Presidio neuromodulation cohort (UCSF), I computed SIFT2-weighted connectivity fingerprints for 116 unique VAT configurations. Each fingerprint is a 98-dimensional vector encoding structural connectivity strength to each FreeSurfer parcellation region (68 cortical Desikan-Killiany + 16 subcortical + 14 brainstem). I tested the convergence hypothesis via Pearson cross-target correlation matrices and permutation tests (n=10,000). I then ran a comprehensive ML validation: OLS regression with patient fixed effects (FDR-corrected), Lasso and Ridge logistic regression under leave-one-VAT-out (LOO, 116 folds) and leave-one-patient-out (LOPO, 5 folds) cross-validation, nonlinear SVM with RBF kernel, Gaussian Process classification on 20 principal components, Sparse PCA, and FastICA. A hemisphere mirror test controlled for the imbalance in effective VATs, 73% of which are left-sided.
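A permutation test of this kind can be sketched as below. The exact statistic used in the report is not spelled out here, so the definition of `delta_r` in this example is an assumption: for each pair of targets, the Pearson correlation between mean effective fingerprints minus the correlation between mean non-effective fingerprints, averaged over pairs. The data are random, the target and efficacy labels are hypothetical, and the permutation count is reduced from the report's 10,000 to keep the example fast.

```python
import itertools
import numpy as np

def delta_r(fingerprints, targets, effective):
    """Mean over target pairs of r(effective means) - r(non-effective means)."""
    deltas = []
    for a, b in itertools.combinations(np.unique(targets), 2):
        r = {}
        for flag in (True, False):
            mean_a = fingerprints[(targets == a) & (effective == flag)].mean(axis=0)
            mean_b = fingerprints[(targets == b) & (effective == flag)].mean(axis=0)
            r[flag] = np.corrcoef(mean_a, mean_b)[0, 1]
        deltas.append(r[True] - r[False])
    return float(np.mean(deltas))

def permutation_p(fingerprints, targets, effective, n_perm=500, seed=0):
    """One-sided permutation p-value, shuffling efficacy within each target."""
    rng = np.random.default_rng(seed)
    observed = delta_r(fingerprints, targets, effective)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = effective.copy()
        for t in np.unique(targets):
            idx = np.flatnonzero(targets == t)
            shuffled[idx] = rng.permutation(effective[idx])
        null[i] = delta_r(fingerprints, targets, shuffled)
    # The +1 correction keeps the p-value strictly positive.
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p)

# Random fingerprints: 116 VATs x 98 regions, 4 hypothetical target labels.
rng = np.random.default_rng(1)
fingerprints = rng.standard_normal((116, 98))
targets = np.arange(116) % 4
effective = (np.arange(116) // 4) % 2 == 0

observed, p_value = permutation_p(fingerprints, targets, effective)
print(f"delta-r = {observed:+.3f}, permutation p = {p_value:.3f}")
```

Shuffling labels within each target preserves the per-target counts of effective and non-effective VATs, so the null distribution reflects only the assignment of efficacy, not differences in group size.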
Key results
- Convergence hypothesis NOT supported: delta-r = +0.030 (permutation p = 0.114, bootstrap 95% CI [−0.139, +0.173]); 2 of 6 target pairs show negative delta-r, meaning that for those pairs non-therapeutic fingerprints are more correlated than therapeutic ones
- VC_VS is the only target with clear efficacy separation (cosine distance = 0.690, p < 0.001; within-group similarity = 0.53; GP classification accuracy = 0.952 on VC_VS VATs)
- Left ventral striato-pallido-thalamic circuit identified as key signature across methods: L_Pallidum (100% Lasso selection, FDR-significant OLS β = +0.209, permutation importance rank #2), L_Thalamus (86% Lasso selection, permutation rank #1), L_Caudate (FDR-significant)
- Best cross-patient generalization: Lasso LOPO-CV BACC = 0.667, AUC = 0.682 (sparse features generalize; Ridge BACC = 0.538 barely above chance); best LOO: Gaussian Process BACC = 0.854, AUC = 0.886
- Patient identity dominates variance (~95% of fingerprint variance); efficacy contributes ~5% — cross-patient analyses require substantially larger cohorts
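One plausible way to quantify how much fingerprint variance is tied to patient identity versus efficacy is a between-group sum-of-squares ratio, sketched below. The report does not state how its ~95% and ~5% figures were computed, so this method is an assumption, and the data are synthetic with an arbitrarily large patient effect and a small efficacy shift.

```python
import numpy as np

def explained_fraction(X, groups):
    """Share of total variance (summed over features) explained by group means."""
    grand_mean = X.mean(axis=0)
    ss_total = ((X - grand_mean) ** 2).sum()
    ss_between = 0.0
    for g in np.unique(groups):
        rows = X[groups == g]
        ss_between += rows.shape[0] * ((rows.mean(axis=0) - grand_mean) ** 2).sum()
    return float(ss_between / ss_total)

# Synthetic fingerprints: 116 VATs x 98 regions from 5 hypothetical patients.
rng = np.random.default_rng(0)
n_vats, n_regions, n_patients = 116, 98, 5
patient = np.arange(n_vats) % n_patients
effective = (np.arange(n_vats) // n_patients) % 2 == 0

# Assumed effect sizes: large patient offsets, small efficacy shift, mild noise.
patient_offsets = rng.normal(0.0, 3.0, size=(n_patients, n_regions))
X = patient_offsets[patient] + rng.normal(0.0, 0.5, size=(n_vats, n_regions))
X[effective] += 0.3

patient_share = explained_fraction(X, patient)
efficacy_share = explained_fraction(X, effective)
print(f"patient share = {patient_share:.3f}, efficacy share = {efficacy_share:.3f}")
```

When patient identity explains most of the variance, a classifier evaluated with leave-one-VAT-out folds can exploit patient-specific structure, which is consistent with the gap between the LOO and LOPO scores reported above.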
Multi-Scale Temporal Masking and Distribution Regularization for Self-Supervised fMRI Learning
PREPRINT
The Big Picture
Resting-state fMRI captures spontaneous brain activity with temporal structure spanning multiple timescales: fast cortical fluctuations (0.05–0.1 Hz, < 20 s), slow default-mode oscillations (0.01–0.05 Hz, ~20–100 s), and infra-slow vigilance drifts (< 0.01 Hz, > 100 s). Brain-JEPA adapts I-JEPA to rs-fMRI, but its single-scale masking generates target patches spanning at most 3 consecutive temporal windows, analogous to asking a language model to predict only the next token. This local-only prediction objective cannot force the encoder to build long-range temporal abstractions. Separately, JEPA-style predictive architectures are theoretically susceptible to representational collapse: the encoder can minimize the prediction loss with trivial constant embeddings, which provide no signal for downstream tasks.
What I did
I extended Brain-JEPA with two complementary modifications, theoretically motivated by LeJEPA's proof that isotropic Gaussian embeddings minimize worst-case downstream prediction risk. First, multi-scale temporal masking: a modified collator that simultaneously generates target patches at three temporal proportions of the input grid (0.03, 0.15, and 0.45), forcing the predictor to handle both short-range interpolation and long-range extrapolation of brain dynamics within a single forward pass. Second, VICReg covariance regularization: an auxiliary loss (λ = 0.04) that penalizes off-diagonal embedding covariances and variance deviations from unity. Both modifications were evaluated in a systematic 2×2 ablation on the UCLA Consortium for Neuropsychiatric Phenomics dataset (ds000030, N=261) using a frozen linear probe predicting biological sex, with 5-fold stratified cross-validation.
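The regularizer described above can be sketched in NumPy as follows. This follows the standard VICReg formulation (a hinge on the per-dimension standard deviation plus a squared off-diagonal covariance penalty); whether the project uses the hinge or a two-sided deviation from unit variance, and whether λ = 0.04 weights both terms equally, are assumptions of this example. The three embedding matrices are synthetic.

```python
import numpy as np

def vicreg_regularizer(z, gamma=1.0, eps=1e-4):
    """Variance hinge and off-diagonal covariance penalty for z of shape (batch, dim)."""
    n, d = z.shape
    z = z - z.mean(axis=0)
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    var_loss = np.maximum(0.0, gamma - std).mean()   # penalize std below gamma
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    cov_loss = (off_diag ** 2).sum() / d             # penalize correlated dimensions
    return float(var_loss), float(cov_loss)

rng = np.random.default_rng(0)
collapsed = np.ones((2048, 32))                      # constant embeddings
isotropic = rng.standard_normal((2048, 32))          # unit variance, decorrelated
shared = rng.standard_normal((2048, 1))
redundant = shared + 0.1 * rng.standard_normal((2048, 32))  # near-duplicate dims

lam = 0.04  # auxiliary loss weight quoted in the text
for name, z in [("collapsed", collapsed), ("isotropic", isotropic), ("redundant", redundant)]:
    var_loss, cov_loss = vicreg_regularizer(z)
    print(f"{name}: var={var_loss:.3f}, cov={cov_loss:.3f}, "
          f"weighted={lam * (var_loss + cov_loss):.4f}")
```

The collapsed case triggers the variance term and the redundant case triggers the covariance term, while roughly isotropic embeddings incur almost no penalty, which is the geometry the regularizer is meant to encourage.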
Key results
- Best AUC: S+M combined = 0.567 ± 0.068 vs. baseline 0.542 ± 0.097 (+4.6% relative improvement)
- VICReg alone (AUC = 0.556 ± 0.053) most notably nearly halves the inter-fold standard deviation (0.053 vs. 0.097 for the baseline), consistent with more stable representation geometry and reduced sensitivity to data splits
- Multi-scale masking alone does not improve mean AUC on this dataset: short scan duration (W=10 patches) collapses the short and medium scales to identical effective windows (1 patch = 32 s); three truly distinct scales require longer acquisitions (UK Biobank W≈30, ABCD W≈60+)
- VICReg partially mitigates the degraded Fold 4 (AUC 0.455 vs. 0.353 for baseline), consistent with more robust representations across data subsets
- Codebase and experimental framework established as foundation for scaling to ABCD (N>10,000) with downstream prediction of CBCL depression/anxiety scores — clinically meaningful targets where multi-scale temporal representations may provide stronger gains
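The scale collapse noted in the results above can be checked with a few lines of arithmetic. The rounding rule here (floor, with a minimum of one patch per target) is an assumption about the collator, chosen because it reproduces the reported behavior at W=10; the three proportions come from the text.

```python
import math

SCALES = (0.03, 0.15, 0.45)  # temporal proportions of the input grid

def effective_patches(n_windows, scales=SCALES):
    """Target length in patches per scale (assumed rule: floor, minimum 1)."""
    return [max(1, math.floor(s * n_windows)) for s in scales]

# W=10 matches the UCLA CNP scans; 30 and 60 are the approximate
# values quoted for UK Biobank and ABCD.
for n_windows in (10, 30, 60):
    sizes = effective_patches(n_windows)
    print(f"W={n_windows}: patches per scale={sizes}, "
          f"distinct scales={len(set(sizes))}")
```

Under this rule the short and medium scales both resolve to a single patch at W=10, so only two distinct target sizes remain, whereas W=30 already separates all three.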