Case Study · Research · Medical AI

MMIBC: Explainable Multimodal Vision Transformer for Breast Cancer Diagnosis

Testimony Adekoya · Submitted MIWAI 2026 · DSN AI+ Bootcamp 2025
PyTorch · Vision Transformers · Multimodal ML · XAI · Medical Imaging · Best Poster Award
Best Poster Award — DSN AI+ Bootcamp 2025 · Under review at MIWAI 2026
84% · Overall Classification Accuracy
2 · Imaging Modalities Fused
ViT · Parallel Transformer Backbones
Grad-CAM · Explainability Method
01 /

The Problem

Breast cancer remains one of the leading causes of death among women globally. In Sub-Saharan Africa, the burden is disproportionately high — late-stage diagnosis is common, trained radiologists are scarce, and advanced imaging infrastructure is expensive. The modalities that are accessible — mammography and ultrasonography — are typically used in isolation, limiting their diagnostic potential.

The AI systems that have shown promise in breast cancer screening are largely unimodal and suffer from a critical limitation: they are black boxes. A model that achieves expert-level accuracy but cannot explain its reasoning erodes clinical trust. Practitioners understandably resist basing patient care decisions on systems they cannot understand or verify.

The research gap: No prior framework fused strictly non-invasive, accessible modalities (mammography + ultrasound) with first-class explainability designed specifically for resource-limited settings in Africa — without requiring biopsy data, genomics, or paired multimodal datasets.

02 /

Approach

MMIBC (Multimodal Medical Imaging for Breast Cancer) is an explainable multimodal Vision Transformer framework that fuses mammography and ultrasonography for unified breast cancer diagnosis. The design addresses three constraints simultaneously: low-resource deployment, unpaired dataset availability, and clinical interpretability.
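In sketch form, the fusion design looks like the following. The toy backbones, layer sizes, and class names here are illustrative stand-ins, not the paper's exact configuration; in practice each backbone would be a pretrained ViT.

```python
import torch
import torch.nn as nn

class MMIBCFusion(nn.Module):
    """Sketch of the MMIBC design: two modality-specific backbones,
    feature-level concatenation, and a lightweight MLP classifier."""
    def __init__(self, mammo_backbone, us_backbone, embed_dim=768, n_classes=3):
        super().__init__()
        self.mammo_backbone = mammo_backbone   # pretrained ViT in practice
        self.us_backbone = us_backbone         # pretrained ViT in practice
        self.classifier = nn.Sequential(       # lightweight MLP head
            nn.Linear(2 * embed_dim, 256),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(256, n_classes),
        )

    def forward(self, mammo_img, us_img):
        f_m = self.mammo_backbone(mammo_img)   # (B, embed_dim)
        f_u = self.us_backbone(us_img)         # (B, embed_dim)
        fused = torch.cat([f_m, f_u], dim=-1)  # feature-level fusion
        return self.classifier(fused)          # (B, n_classes) logits

# Toy stand-in encoders: flatten a small image and project to embed_dim.
def toy_backbone():
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 768))

model = MMIBCFusion(toy_backbone(), toy_backbone())
logits = model(torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32))
```

Concatenation keeps each modality's representation intact before the head, which is what lets the later Grad-CAM analysis attribute evidence to anatomical regions in either input.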

A key challenge was the absence of natively paired multimodal datasets — mammography and ultrasound images of the same patient taken simultaneously. To address this, MMIBC introduces a label-consistent programmatic pairing strategy: samples from the mammography and ultrasound datasets are paired by diagnostic label during training, enabling multimodal learning under realistic data constraints without requiring simultaneous dual-modality imaging.
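The pairing step reduces to a simple procedure: index one dataset by diagnostic label, then draw a label-matched partner for each sample of the other. A minimal sketch, with hypothetical file names standing in for real dataset entries:

```python
import random
from collections import defaultdict

def pair_by_label(mammo_samples, us_samples, seed=0):
    """Label-consistent programmatic pairing: for each mammogram,
    draw an ultrasound image that shares its diagnostic label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in us_samples:
        by_label[label].append(path)
    pairs = []
    for path, label in mammo_samples:
        if by_label[label]:                      # skip labels with no partner
            partner = rng.choice(by_label[label])
            pairs.append((path, partner, label))
    return pairs

# Hypothetical entries; real inputs would come from VinDr-Mammo and BUSI.
mammo = [("m0.png", "benign"), ("m1.png", "malignant"), ("m2.png", "normal")]
us = [("u0.png", "benign"), ("u1.png", "malignant"),
      ("u2.png", "normal"), ("u3.png", "benign")]
pairs = pair_by_label(mammo, us)
```

Re-drawing partners each epoch (rather than fixing them once) would additionally act as a form of cross-modal augmentation.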

03 /

Datasets

MMIBC uses two publicly available datasets representing diverse African and Asian populations — deliberately chosen to reflect real-world deployment contexts where proprietary hospital datasets are unavailable.

VinDr-Mammo
Mammography — X-ray
Large-scale mammography dataset from Vietnamese hospitals. Provides X-ray images with radiologist-annotated BI-RADS assessments across benign and malignant findings.
Modality: Mammography
Origin: Vietnam
BUSI
Breast Ultrasound Images
Breast Ultrasound Images dataset collected from Egyptian women aged 25–75. Covers normal, benign, and malignant cases with expert segmentation masks.
Modality: Ultrasonography
Origin: Egypt
04 /

Key Contributions

I
Explainable Multimodal Architecture
Parallel pretrained ViT backbones for modality-specific representation learning, followed by feature-level fusion and a lightweight MLP classification head. Designed to run on accessible imaging hardware without high-compute infrastructure.
II
Programmatic Pairing Strategy
Label-consistent pairing of mammography and ultrasound samples enables multimodal training from publicly available but natively unpaired datasets — removing the dependency on simultaneous dual-modality imaging sessions.
III
Grad-CAM Clinical Transparency
Integration of Gradient-weighted Class Activation Mapping produces visual heatmaps highlighting the anatomical regions that drove each diagnostic decision — providing interpretable evidence that frontline clinicians can evaluate and trust.
IV
Context-Specific Benchmarking
Detailed evaluation of performance trade-offs arising from dataset class imbalance in low-resource settings — surfacing real deployment constraints rather than reporting only best-case accuracy.
05 /

Results

MMIBC achieved 84% overall classification accuracy across three diagnostic classes (normal, benign, malignant) on the VinDr-Mammo and BUSI test sets. Qualitative explainability results confirmed that Grad-CAM attention maps align with clinically relevant anatomical regions — microcalcification clusters and lesion margins — rather than image artifacts or background features.

Class      Precision  Recall     F1 Score   Notes
Normal     High       High       High       Well-represented class
Benign     Moderate   Moderate   Moderate   Boundary cases with malignant
Malignant  Variable   Variable   Variable   Impacted by class imbalance
Overall    84% weighted accuracy across all classes

Explainability finding: Grad-CAM heatmaps consistently highlighted regions corresponding to known clinical indicators — microcalcifications in mammograms, hypoechoic lesion margins in ultrasound — confirming that the model's predictions are grounded in clinically relevant evidence, not spurious correlations.
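The Grad-CAM computation behind those heatmaps is compact: each feature map is weighted by its spatially averaged gradient, the weighted maps are summed, and only positive evidence is kept. A numpy sketch on synthetic arrays (in the real pipeline both activations and gradients come from backpropagating the class score through the model, and ViT token maps are first reshaped to a spatial grid):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM: weight each feature map by its spatially averaged
    gradient, sum, then keep positive evidence (ReLU) and normalize.
    activations, gradients: arrays of shape (C, H, W)."""
    weights = gradients.mean(axis=(1, 2))               # alpha_c, shape (C,)
    cam = np.einsum("c,chw->hw", weights, activations)  # weighted sum
    cam = np.maximum(cam, 0.0)                          # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                           # scale to [0, 1]
    return cam

rng = np.random.default_rng(0)
A = rng.random((8, 14, 14))            # stand-in feature maps
G = rng.standard_normal((8, 14, 14))   # stand-in gradients
heatmap = grad_cam(A, G)               # (14, 14), upsampled for overlay
```

The normalized map is then upsampled to the input resolution and overlaid on the mammogram or ultrasound image for clinician review.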

Dataset imbalance — particularly for malignant cases — highlighted a critical real-world constraint: deployment in low-resource settings means working with skewed class distributions. The paper analyzes these trade-offs explicitly, which influenced the design of the evaluation protocol to go beyond aggregate accuracy.
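Why aggregate accuracy alone is insufficient can be shown with a small worked example. The confusion matrix below is hypothetical, not the paper's results; it illustrates how a skewed class distribution lets overall accuracy stay high while minority-class (malignant) recall collapses:

```python
def per_class_recall(confusion):
    """Recall per class from a confusion matrix (rows = true class)."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]

# Hypothetical skewed test split: many normal/benign, few malignant.
#              predicted: normal  benign  malignant
confusion = [ [90,  8,  2],    # true normal
              [10, 70, 10],    # true benign
              [ 5, 10, 15] ]   # true malignant

recalls = per_class_recall(confusion)
accuracy = sum(confusion[i][i] for i in range(3)) / sum(map(sum, confusion))
macro_recall = sum(recalls) / len(recalls)
```

Here overall accuracy is about 0.80 while malignant recall is only 0.50, so the macro average drops well below the aggregate figure; reporting both is what surfaces the deployment constraint.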

06 /

Limitations & Future Work

The programmatic pairing strategy, while necessary and effective, is a proxy for true paired multimodal imaging. Future work should incorporate natively paired datasets as they become available. The class imbalance problem in malignant case detection warrants continued attention — oversampling strategies, focal loss formulations, and synthetic augmentation using diffusion models are all candidate approaches.
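Of those candidates, the focal loss is the most self-contained to illustrate: it multiplies cross-entropy by (1 - p)^gamma so that confidently classified (usually majority-class) examples contribute almost nothing, shifting gradient signal toward hard minority cases. A per-sample sketch with the commonly used defaults (gamma = 2, alpha = 0.25):

```python
import math

def focal_loss(p_true, gamma=2.0, alpha=0.25):
    """Focal loss for one sample, where p_true is the predicted
    probability of the correct class. The (1 - p)^gamma factor
    down-weights easy examples so training focuses on hard ones."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

easy = focal_loss(0.95)  # confident correct prediction: near-zero loss
hard = focal_loss(0.30)  # hard (often minority-class) case: much larger
```

With gamma = 0 this reduces to alpha-weighted cross-entropy, so the modulation strength is tunable against the severity of the imbalance.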

The framework was validated on African and Asian population data (Egypt, Vietnam). Extension to broader demographic datasets, particularly with West African imaging data where breast tissue density patterns differ, would strengthen clinical generalizability. Finally, prospective clinical validation with radiologist feedback loops is the essential next step before real-world deployment.