The Challenges With Advancing Radiomics and Pathomics

The biggest challenge in these fields, Dr Rubin and Dr Prior both said, is getting enough data. Just 400 samples “doesn’t generalize to anything,” Dr Prior said. Hundreds of thousands or millions of images are needed, yet, for example, only approximately 15% to 20% of pathology practices digitize their data, Dr Rubin said.

Data digitization is more common in radiology, but lack of data sharing remains a problem. The Cancer Imaging Archive (TCIA), a National Cancer Institute of the National Institutes of Health initiative created and operated by Dr Prior and his team, is attempting to fulfill this need, but its current collections remain relatively small for what the field requires. And access to data is just one challenge of many.

“The second problem is you need labeled data for most machine-learning algorithms, so it’s not just data, but labeled data,” Dr Prior told Cancer Therapy Advisor. Without access to the algorithms researchers use, it’s impossible to replicate their work, Dr Rubin added, so lack of reproducibility has become another issue.

Another hurdle is the dearth of comparison images derived from people without cancer. “We don’t collect normals, so that makes it difficult to look at the overall variability because we’re only looking at people who develop the variants and people who develop the cancer,” Dr Prior said.

There are exceptions, however: the TCIA has brain imaging data from the Human Connectome Project and other neuroscience projects involving large numbers of healthy volunteers, whose images can be compared with those from patients with glioblastoma. Similarly, the repository has about 30,000 computed tomography (CT) scans of lungs from heavy smokers (most of which are from individuals who did not have cancer) that were captured and entered as part of the National Lung Screening Trial.

“That’s why those are the most highly used data sets from TCIA, and there’s a whole lot of work in lung screening and radiomics — because we have that kind of data,” Dr Prior said. “We don’t have that data in volume and in both normal and pathological examples for any other cancer type.”

The TCIA has sizable imaging collections for the breast and prostate, as well as a decent collection of scans of head and neck cancers, Dr Prior said, but it has fewer “normals” for those cancers. What it has often depends on what clinical trials are running, as trials are a major source of the database’s images.

But there still aren’t nearly enough images, nor are the images diverse enough. It’s unknown whether any of the large imaging companies have their own data sets. A European analogue to TCIA recently shut down, according to Dr Prior, although researchers are trying to get it back online, and TCIA representatives have spoken with counterparts in China and India about creating repositories in those countries.

“We’re seeing a growing movement around the world, and we’re trying to see how we might link those together,” Dr Prior said. “If we’re trying to link repository information across a planet where the rules and laws are different, we need mechanisms for data sharing that don’t necessarily mean that you move it all into one big repository.” An active area of research is exploring how to “accumulate cancer imaging information on a global scale and make it easily accessible to researchers.”

Another significant obstacle is, again, heterogeneity: not heterogeneity among patient populations, but heterogeneity in how images are acquired.

“Imaging technology evolves very quickly,” Dr Rubin said. “Pathology is more standardized,” he added, but with radiology, the different vendors of different equipment and different parameters and algorithms used in the equipment can all “affect image quality enough to impact the results of computerized analysis.”

In short, if the input isn’t consistent, the output can’t be trusted.

“The machines themselves, in general, are less of a problem than the way the machines are used,” Dr Prior said. “There’s very little quality control in the acquisition protocol” because radiologists have different preferences, which introduces variability into the acquisition process. “You can’t have reliable radiomics unless you have standardized imaging protocols,” Dr Rubin said. The Radiological Society of North America (RSNA) has been working on this problem through its Quantitative Imaging Biomarkers Alliance (QIBA), but, yet again, more challenges remain.

“Even if you’re controlling protocols, standards change over time,” as do software and calibrations, Dr Prior said. “It’s very difficult to maintain a scanner at the exact software level [over] the 4 or 5 years of a clinical trial,” he said. Nonetheless, he remains optimistic about how radiomics and pathomics will ultimately contribute to precision medicine. The challenges are substantial, but they aren’t insurmountable, and cancer research has always been a long game.

