A multi-omic machine learning model can accurately predict response to breast cancer treatment, according to a study published in Nature.
Researchers found that a model integrating clinical, molecular, digital pathology, and treatment data could predict pathological complete response (pCR) to neoadjuvant treatment with “high accuracy.”
For this study, the researchers prospectively enrolled 180 women with early and locally advanced breast cancer who received chemotherapy, with or without HER2-targeted therapy, before surgery.
Pretreatment biopsies from 168 patients were profiled using DNA and RNA sequencing and digital pathology analysis.
The researchers performed response assessment at surgery using the residual cancer burden (RCB) classification in 161 patients who completed neoadjuvant treatment. The pCR rate was 26%; 16% of patients were classified as RCB-I, 40% as RCB-II, and 18% as RCB-III.
The clinical features associated with pCR in a univariable analysis were tumor grade, estrogen receptor (ER) status, and the absence of lymph node involvement at diagnosis. In a multivariable analysis, only ER status was associated with pCR (odds ratio [OR], 3.8; 95% CI, 1.6-9.2; false discovery rate [FDR] =.009).
Whole-exome sequencing of 168 tumors revealed 16,134 somatic mutations. The most commonly mutated driver genes were TP53 (57%), PIK3CA (26%), GATA3 (10%), and MAP3K1 (8%).
TP53 mutations were associated with pCR (OR, 2.9; 95% CI, 1.3-6.6; P =.01), and PIK3CA mutations were associated with residual disease (OR, 2.1; 95% CI, 1.3-3.4; P =.002).
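As a reading aid, the reported odds ratios and confidence intervals can be checked for internal consistency. The sketch below assumes the intervals are Wald-type, meaning symmetric around the odds ratio on the log scale; the article does not state how they were computed, so this is an assumption. Under it, the geometric mean of the two bounds should land close to the reported odds ratio. The function names are illustrative only.

```python
import math

# (odds ratio, lower 95% bound, upper 95% bound) as reported in the article.
REPORTED = {
    "ER status (multivariable)": (3.8, 1.6, 9.2),
    "TP53 mutation": (2.9, 1.3, 6.6),
    "PIK3CA mutation": (2.1, 1.3, 3.4),
}


def log_scale_midpoint(lower, upper):
    """Geometric mean of the bounds, i.e. their midpoint on the log scale."""
    return math.sqrt(lower * upper)


def implied_standard_error(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a symmetric 95% interval.

    Assumption: the interval is log(OR) +/- z * SE (Wald-type).
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


for name, (odds_ratio, lower, upper) in REPORTED.items():
    midpoint = log_scale_midpoint(lower, upper)
    se = implied_standard_error(lower, upper)
    print(f"{name}: reported OR {odds_ratio}, "
          f"log-scale midpoint {midpoint:.2f}, implied SE {se:.2f}")
```

For all three associations the midpoint falls within rounding distance of the reported odds ratio, which is what would be expected if the intervals were built on the log scale.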
The researchers also found that tumor mutation burden was higher in samples with a pCR than in those with residual disease. The median mutations per megabase were 2.3 and 1.4, respectively (P =.0005).
Other features associated with pCR were homologous recombination deficiency (OR, 1.1; P =.006) and APOBEC signature (OR, 1.1; P =.02). Tumors with a pCR had more copy number alterations, and chromosomal instability was monotonically associated with RCB class (P =.0002), the researchers noted.
The team also identified 2071 genes that were underexpressed and 2439 genes that were overexpressed in tumors with pCR (FDR <.05). pCR was associated with overexpression of CDKN2A, EGFR, CCNE1, and MYC, and underexpression of CCND1, ZNF703, and ESR1.
Taking these findings together, the researchers developed 6 machine-learning models that integrated the features associated with pCR. The different models integrated (1) clinical features only, (2) clinical features plus DNA, (3) clinical features plus RNA, (4) clinical features plus DNA and RNA, (5) clinical features plus DNA, RNA, and digital pathology, and (6) clinical features, DNA, RNA, digital pathology, and treatment.
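The six input combinations described above can be laid out as a small lookup table, which makes it easier to see what each model adds to the clinical baseline. This is only a sketch of the design as the article describes it: the labels and the helper function are hypothetical and do not come from the study's code.

```python
# Hypothetical labels for the five data types named in the article.
CLINICAL = "clinical"
DNA = "DNA"
RNA = "RNA"
PATHOLOGY = "digital pathology"
TREATMENT = "treatment"

# Model number (as listed in the article) -> data types it integrates.
MODEL_INPUTS = {
    1: (CLINICAL,),
    2: (CLINICAL, DNA),
    3: (CLINICAL, RNA),
    4: (CLINICAL, DNA, RNA),
    5: (CLINICAL, DNA, RNA, PATHOLOGY),
    6: (CLINICAL, DNA, RNA, PATHOLOGY, TREATMENT),
}


def added_over_clinical(model_id):
    """Return the data types a model uses beyond clinical features."""
    return tuple(x for x in MODEL_INPUTS[model_id] if x != CLINICAL)


for model_id in sorted(MODEL_INPUTS):
    extra = added_over_clinical(model_id) or ("nothing",)
    print(f"Model {model_id}: clinical + {', '.join(extra)}")
```

Every model includes the clinical features, so each one can be compared directly against the clinical-only baseline.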
The researchers validated these models in an independent cohort of 75 patients who received neoadjuvant therapy. The area under the curve was 0.70 for the clinical-only model, 0.80 for the clinical-DNA model, 0.86 for the clinical-RNA model, 0.86 for the clinical-DNA-RNA model, 0.85 for the clinical-DNA-RNA-digital pathology model, and 0.87 for the model integrating all factors.
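For readers unfamiliar with the metric, the area under the curve can be read as the probability that a randomly chosen patient with a pCR receives a higher predicted score than a randomly chosen patient with residual disease. The sketch below computes it that way, by comparing every such pair. The labels and scores are made-up toy values for illustration and are not data from the study.

```python
def area_under_curve(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for pCR, 0 for residual disease.
    scores: predicted scores, higher meaning pCR is more likely.
    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one patient in each class")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example (hypothetical values, not study data).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(area_under_curve(toy_labels, toy_scores), 3))
```

In the toy example, 8 of the 9 positive-negative pairs are ranked correctly. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.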
The researchers said these results suggest that multi-omic machine learning models can outperform models based on clinical variables alone.
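The size of that improvement can be read directly from the validation results reported in the article. The short sketch below subtracts the clinical-only value from each of the others. The dictionary keys are shorthand labels chosen here, and the differences are simple point estimates; the article does not report whether they were tested for statistical significance.

```python
# Area under the curve in the independent validation cohort, as reported.
REPORTED_AUC = {
    "clinical": 0.70,
    "clinical-DNA": 0.80,
    "clinical-RNA": 0.86,
    "clinical-DNA-RNA": 0.86,
    "clinical-DNA-RNA-digital pathology": 0.85,
    "all factors": 0.87,
}

baseline = REPORTED_AUC["clinical"]

# Gain over the clinical-only model, rounded to the reported precision.
gains = {
    name: round(value - baseline, 2)
    for name, value in REPORTED_AUC.items()
    if name != "clinical"
}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} over clinical only")
```

By this measure the largest single step comes from adding RNA data, and the fully integrated model sits 0.17 above the clinical-only baseline.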
“The high accuracy obtained in external validation suggests that the models are robust and may enable using molecular and digital pathology to determine therapy choice in future clinical trials, including in the adjuvant therapy setting,” the researchers wrote.
“More generally, the framework highlights the importance of data integration in machine learning models for response prediction and could be used to generate similar predictors for other cancers,” they concluded.
Sammut SJ, Crispin-Ortuzar M, Chin SF, et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature. Published online December 7, 2021. doi:10.1038/s41586-021-04278-5