Background Histologic evaluation of the mucosal changes associated with celiac disease is important for establishing an accurate diagnosis and monitoring the impact of investigational therapies. While the Marsh-Oberhuber classification has been used to categorize the histologic findings into discrete stages (i.e., Type 0–3c), significant variability has been documented between observers using this ordinal scoring system. Therefore, we evaluated whether pathologist-trained machine learning classifiers can be developed to objectively quantitate the pathological changes of villus blunting, intraepithelial lymphocytosis, and crypt hyperplasia in small intestine endoscopic biopsies.
Methods A convolutional neural network (CNN) was trained and combined with a secondary algorithm to quantitate intraepithelial lymphocytes (IEL) with 5 classes on CD3 immunohistochemistry whole slide images (WSI). Feature outputs were then correlated with ground truth modified Marsh scores across a total of 116 small intestine biopsies.
Results Across all samples, median %CD3 counts (positive cells/enterocytes) from villous epithelium (VE) increased with higher Marsh scores (Type 0 %CD3 VE = 13.4; Type 1–3 %CD3 VE = 41.9, p < 0.0001). Indicators of villus blunting (Type 0–2 villous epithelium/lamina propria area ratio = 0.81; Type 3a–3c villous epithelium/lamina propria area ratio = 0.29, p < 0.0001) and crypt hyperplasia (Type 0–1 crypt/villous epithelial area ratio = 0.59; Type 2–3 crypt/villous epithelial area ratio = 1.64, p < 0.0001) were also observed. Using these individual features, a combined feature machine learning score (MLS) was created to evaluate a set of 28 matched pre- and post-intervention biopsies captured before and after dietary gluten restriction. The disposition of the continuous MLS result for each biopsy pair aligned with the Marsh score in 96.4% (27/28) of the cohort.
Conclusions Machine learning classifiers can be developed to objectively quantify histologic features and capture additional data not achievable with manual scoring. Such approaches should be further investigated to improve biopsy evaluation, especially for clinical trials.
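The three features reported in the Results are simple ratios, and the paired-biopsy agreement is a proportion. The sketch below illustrates that arithmetic only; it is not the authors' pipeline. All names (`BiopsyFeatures`, `pct_cd3_ve`, `paired_agreement`, and so on) are hypothetical, the inputs are assumed to be per-biopsy cell counts and tissue areas already produced by a classifier, and "disposition" is assumed to mean the direction of change between the pre- and post-intervention biopsy. The formula combining the features into the MLS is not given in the abstract, so the MLS values are treated as inputs, with higher values assumed to mean more severe disease.

```python
from dataclasses import dataclass

# Ordinal ranks for the modified Marsh types named in the abstract.
MARSH_RANK = {"0": 0, "1": 1, "2": 2, "3a": 3, "3b": 4, "3c": 5}


@dataclass
class BiopsyFeatures:
    """Hypothetical per-biopsy classifier outputs (counts and areas)."""
    cd3_positive_ve: int   # CD3-positive cells in villous epithelium
    enterocytes_ve: int    # enterocytes in villous epithelium
    area_ve: float         # villous epithelium area
    area_lp: float         # lamina propria area
    area_crypt: float      # crypt epithelium area


def pct_cd3_ve(f: BiopsyFeatures) -> float:
    """%CD3 VE: CD3-positive cells per 100 enterocytes (assumed scaling)."""
    return 100.0 * f.cd3_positive_ve / f.enterocytes_ve


def ve_lp_ratio(f: BiopsyFeatures) -> float:
    """Villous epithelium / lamina propria area ratio (villus blunting)."""
    return f.area_ve / f.area_lp


def crypt_ve_ratio(f: BiopsyFeatures) -> float:
    """Crypt / villous epithelial area ratio (crypt hyperplasia)."""
    return f.area_crypt / f.area_ve


def direction(pre: float, post: float) -> int:
    """Return -1 for a decrease, 0 for no change, +1 for an increase."""
    return (post > pre) - (post < pre)


def paired_agreement(pairs) -> float:
    """Fraction of pairs whose MLS and Marsh score change in the same direction.

    Each pair is (mls_pre, mls_post, marsh_pre, marsh_post), where the Marsh
    values are keys of MARSH_RANK.
    """
    agree = sum(
        direction(mls_pre, mls_post)
        == direction(MARSH_RANK[m_pre], MARSH_RANK[m_post])
        for mls_pre, mls_post, m_pre, m_post in pairs
    )
    return agree / len(pairs)
```

With 27 of 28 pairs in agreement, `paired_agreement` returns 27/28, which rounds to the 96.4% reported above.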