
Advancing Digital Pathology through Novel Machine Learning Methodologies

Funding Source

National Library of Medicine, R01LM013833

Project Period

9/01/22 – 5/31/27

Principal Investigator

Saeed Hassanpour, PhD (Geisel School of Medicine at Dartmouth)

Other Project Staff

Arief A. Suriawinata, Lorenzo Torresani, Todd A. MacKenzie, Soroush Vosoughi, Naofumi Tomita, Weiyi Wu

Project Summary

Pathology focuses on providing medical diagnoses and prognoses, based on laboratory methods, to guide patient treatment and management. Microscopy is fundamental to how pathologists examine tissues and cells, yet despite numerous technical advances, the way microscopy images are used in pathology has changed little over the last century. The current approach in anatomic pathology lacks standardization and places a heavy cognitive burden on pathologists, who must manually evaluate millions of cells across hundreds of slides in a typical workday. Deep learning-based methods have recently shown encouraging results for analyzing microscopy images. However, they rely on standard computer vision architectures and pipelines, which are limited by the time and cost of slide digitization and by the computational constraints of analyzing very large, high-resolution images. Furthermore, developing accurate deep learning models requires access to large databases of labeled microscopy images, which are difficult to obtain. In this application, new methodologies are proposed that take advantage of the unique characteristics of histopathology datasets and the range of features in histology microscopy images to address these limitations. This project presents a novel approach based on generative adversarial networks for difficulty translation, which generates augmented data with realistic, rare, and hard-to-classify histopathological patterns. This approach will mitigate data imbalances in annotated histology datasets and improve the performance of deep learning models for histological classification, particularly for uncommon and difficult-to-classify cases. A novel curriculum learning approach for histology image classification will also be developed, based on the range of classification difficulty among histopathological patterns and on multi-annotator labeled datasets.
This approach trains models on progressively harder-to-classify images, with difficulty determined by annotator agreement, and significantly improves the performance of the resulting deep learning models without requiring additional data or computational resources. In addition, a self-supervised knowledge distillation method will be developed to make histology image classification more efficient. Because large labeled datasets are scarce, this method uses unlabeled datasets in a self-supervised manner to distill the feature-extraction capabilities of a model operating at high resolution into a student model operating at a lower resolution. The resulting student models can achieve high classification accuracy on low-resolution histology images while substantially reducing the time and resources required for slide digitization and computation. The proposed methods remove current bottlenecks in deep learning applications for digital pathology. Therefore, the results of this project could open major new opportunities for using deep learning technology in clinical workflows and for integrating histopathological information with other clinical and molecular data to improve patients’ diagnoses, prognoses, and treatments.
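The curriculum learning and distillation ideas described above can be illustrated with a few lines of plain Python. This is a minimal sketch under stated assumptions, not the project's implementation: the helper names (`agreement_score`, `curriculum_stages`, `downsample_2x`, `feature_distillation_loss`) are hypothetical. Difficulty is assumed to be the fraction of annotators who chose the majority label. The low-resolution student input is assumed to be a 2x2 average-pooled tile. The distillation objective is assumed to be one minus the cosine similarity between the high-resolution teacher's embedding and the low-resolution student's embedding. A real pipeline would compute these embeddings with neural networks and update only the student's weights.

```python
from collections import Counter
import math


def agreement_score(labels):
    """Fraction of annotators who chose the majority label (1.0 = unanimous).

    Assumption: the exact difficulty measure used by the project is not specified.
    """
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(labels)


def curriculum_stages(dataset, num_stages=3):
    """Split (image_id, annotator_labels) pairs into cumulative training stages.

    Images are sorted from highest to lowest agreement (easy to hard); stage k
    holds the easiest k/num_stages fraction, so each stage adds harder images.
    """
    ranked = sorted(dataset, key=lambda item: agreement_score(item[1]), reverse=True)
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = math.ceil(len(ranked) * k / num_stages)
        stages.append([image_id for image_id, _ in ranked[:cutoff]])
    return stages


def downsample_2x(image):
    """2x2 average pooling of a grayscale image given as a list of rows.

    Hypothetical stand-in for producing the student's low-resolution input.
    """
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, len(image[0]) - 1, 2)
        ]
        for r in range(0, len(image) - 1, 2)
    ]


def feature_distillation_loss(teacher_features, student_features):
    """1 - cosine similarity between teacher (high-res) and student (low-res) embeddings.

    No labels are used, so the loss can be computed on unlabeled tiles.
    Assumes both feature vectors are nonzero.
    """
    dot = sum(t * s for t, s in zip(teacher_features, student_features))
    norm_t = math.sqrt(sum(t * t for t in teacher_features))
    norm_s = math.sqrt(sum(s * s for s in student_features))
    return 1.0 - dot / (norm_t * norm_s)
```

For example, given three tiles whose three annotators agree unanimously, two out of three, and not at all, `curriculum_stages(..., num_stages=3)` yields the unanimous tile first, then adds the two-out-of-three tile, and finally adds the contested tile. Cumulative stages, rather than disjoint ones, are one common curriculum design because they keep the easy examples in view while harder ones are introduced.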

Public Health Relevance

This project will provide timely methodological advances for analyzing histopathological information efficiently and accurately with deep learning models. The results of this study will remove current methodological and computational bottlenecks in the application of deep learning technologies to digital pathology. Therefore, this project will have a significant impact on the current standard of patient care by effectively assisting pathologists in clinical practice and by integrating histopathological information with other clinical and molecular data to improve patients’ diagnoses, prognoses, and treatments.