Unraveling the deep learning gearbox in optical coherence tomography image segmentation towards explainable artificial intelligence

  • 1.

    Samuel, A. L. in Computer Games I (ed. Levy D.N.L.) 366–400 (Springer New York, 1988).

  • 2.

    Fletcher, K. H. Matter with a mind; a neurological research robot. Research 4, 305–307 (1951).

  • 3.

    Kononenko, I. Machine learning for medical diagnosis: history, state of the art and perspective. Artif. Intell. Med. 23, 89–109 (2001).

  • 4.

    Kugelman, J. et al. Automatic choroidal segmentation in OCT images using supervised deep learning methods. Sci. Rep. 9, 13298 (2019).

  • 5.

    Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation 234–241 (Springer International Publishing, Cham, 2015).

  • 6.

    LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  • 7.

    Müller, P. L. et al. in High Resolution Imaging in Microscopy and Ophthalmology: New Frontiers in Biomedical Optics (ed. Bille, J. F.) 87–106 (Springer International Publishing, 2019).

  • 8.

    Huang, D. et al. Optical coherence tomography. Science 254, 1178–1181 (1991).

  • 9.

    Mrejen, S. & Spaide, R. F. Optical coherence tomography: imaging of the choroid and beyond. Surv. Ophthalmol. 58, 387–429 (2013).

  • 10.

    Staurenghi, G., Sadda, S., Chakravarthy, U. & Spaide, R. F. Proposed lexicon for anatomic landmarks in normal posterior segment spectral-domain optical coherence tomography. Ophthalmology 121, 1572–1578 (2014).

  • 11.

    von der Emde, L. et al. Artificial intelligence for morphology-based function prediction in neovascular age-related macular degeneration. Sci. Rep. 9, 11132 (2019).

  • 12.

    Lee, C. S., Baughman, D. M. & Lee, A. Y. Deep learning is effective for classifying normal versus age-related macular degeneration OCT images. Ophthalmol. Retina 1, 322–327 (2017).

  • 13.

    Motozawa, N. et al. Optical coherence tomography-based deep-learning models for classifying normal and age-related macular degeneration and exudative and non-exudative age-related macular degeneration changes. Ophthalmol. Ther. 8, 527–539 (2019).

  • 14.

    Keel, S. et al. Feasibility and patient acceptability of a novel artificial intelligence-based screening model for diabetic retinopathy at endocrinology outpatient services: a pilot study. Sci. Rep. 8, 4330 (2018).

  • 15.

    Bellemo, V. et al. Artificial intelligence screening for diabetic retinopathy: the real-world emerging application. Curr. Diab. Rep. 19, 72 (2019).

  • 16.

    Grzybowski, A. et al. Artificial intelligence for diabetic retinopathy screening: a review. Eye 34, 451–460 (2020).

  • 17.

    Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316, 2402–2410 (2016).

  • 18.

    Arcadu, F. et al. Deep learning algorithm predicts diabetic retinopathy progression in individual patients. NPJ Digit. Med. 2, 92 (2019).

  • 19.

    Waldstein, S. M. et al. Evaluating the impact of vitreomacular adhesion on anti-VEGF therapy for retinal vein occlusion using machine learning. Sci. Rep. 7, 2928 (2017).

  • 20.

    Schlegl, T. et al. Fully automated detection and quantification of macular fluid in OCT using deep learning. Ophthalmology 125, 549–558 (2018).

  • 21.

    Zutis, K. et al. Towards automatic detection of abnormal retinal capillaries in ultra-widefield-of-view retinal angiographic exams. In Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. Vol. 2013, 7372–7375 (Osaka, Japan, 2013).

  • 22.

    Müller, P. L. et al. Prediction of function in ABCA4-related retinopathy using Ensemble machine learning. J. Clin. Med. 9, 2428 (2020).

  • 23.

    De Fauw, J. et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 24, 1342–1350 (2018).

  • 24.

    Maloca, P. M. et al. Validation of automated artificial intelligence segmentation of optical coherence tomography images. PLoS ONE 14, e0220063 (2019).

  • 25.

    Quellec, G. et al. Feasibility of support vector machine learning in age-related macular degeneration using small sample yielding sparse optical coherence tomography data. Acta Ophthalmol. 97, e719–e728 (2019).

  • 26.

    Darcy, A. M., Louie, A. K. & Roberts, L. W. Machine learning and the profession of medicine. JAMA 315, 551–552 (2016).

  • 27.

    Ching, T. et al. Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15, 1–47 (2018).

  • 28.

    Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25, 44–56 (2019).

  • 29.

    King, B. F. Artificial intelligence and radiology: what will the future hold? J. Am. Coll. Radiol. 15, 501–503 (2018).

  • 30.

    Coiera, E. The fate of medicine in the time of AI. Lancet 392, 2331–2332 (2018).

  • 31.

    Jha, S. & Topol, E. J. Adapting to artificial intelligence: radiologists and pathologists as information specialists. JAMA 316, 2353–2354 (2016).

  • 32.

    Makridakis, S. The forthcoming artificial intelligence (AI) revolution: its impact on society and firms. Futures 90, 46–60 (2017).

  • 33.

    Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  • 34.

    Chan, S. & Siegel, E. L. Will machine learning end the viability of radiology as a thriving medical specialty? Br. J. Radiol. 92, 20180416 (2019).

  • 35.

    Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  • 36.

    Ferrucci, D., Levas, A., Bagchi, S., Gondek, D. & Mueller, E. T. Watson: beyond jeopardy! Artif. Intell. 199–200, 93–105 (2013).

  • 37.

    Liu, X. et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit. Health 1, e271–e297 (2019).

  • 38.

    Bouwmeester, W. et al. Reporting and methods in clinical prediction research: a systematic review. PLoS Med. 9, 1–12 (2012).

  • 39.

    Collins, G. S. & Moons, K. G. M. Reporting of artificial intelligence prediction models. Lancet 393, 1577–1579 (2019).

  • 40.

    Schulz, K. F., Altman, D. G., Moher, D. & CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ 340, c332 (2010).

  • 41.

    Calvert, M. et al. Guidelines for inclusion of patient-reported outcomes in clinical trial protocols: the SPIRIT-PRO Extension. JAMA 319, 483–494 (2018).

  • 42.

    CONSORT-AI and SPIRIT-AI Steering Group. Reporting guidelines for clinical trials evaluating artificial intelligence interventions are needed. Nat. Med. 25, 1467–1468 (2019).

  • 43.

    Liu, X., Faes, L., Calvert, M. J., Denniston, A. K. & CONSORT/SPIRIT-AI Extension Group. Extension of the CONSORT and SPIRIT statements. Lancet 394, 1225 (2019).

  • 44.

    Kaiser, T. M. & Burger, P. B. Error tolerance of machine learning algorithms across contemporary biological targets. Molecules 24, https://doi.org/10.3390/molecules24112115 (2019).

  • 45.

    Beam, A. L., Manrai, A. K. & Ghassemi, M. Challenges to the reproducibility of machine learning models in health care. JAMA 323, 305–306 (2020).

  • 46.

    Ting, D. S. W. et al. Artificial intelligence and deep learning in ophthalmology. Br. J. Ophthalmol. 103, 167–175 (2019).

  • 47.

    Schmidhuber, J. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015).

  • 48.

    Castelvecchi, D. Can we open the black box of AI? Nature 538, 20–23 (2016).

  • 49.

    Guidotti, R. et al. A survey of methods for explaining black box models. ACM Comput. Surv. 51, 1–42 (2019).

  • 50.

    Lipton, Z. C. The mythos of model interpretability. Queue 16, 31–57 (2018).

  • 51.

    Gunning, D. & Aha, D. DARPA’s explainable artificial intelligence (XAI) program. AI Mag. 40, 44–58 (2019).

  • 52.

    Holzinger, A., Kieseberg, P., Weippl, E. & Tjoa, A. M. Current advances, trends and challenges of machine learning and knowledge extraction: from machine learning to explainable AI. In Machine Learning and Knowledge Extraction. CD-MAKE 2018. Lecture Notes in Computer Science, Vol 11015, 1–8 (eds. Holzinger, A. et al.) (Springer, Cham., 2018). https://doi.org/10.1007/978-3-319-99740-7_1.

  • 53.

    Barredo Arrieta, A. et al. Explainable Artificial Intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115 (2020).

  • 54.

    Montavon, G., Samek, W. & Müller, K.-R. Methods for interpreting and understanding deep neural networks. Digit. Signal Process. 73, 1–15 (2018).

  • 55.

    Holzinger, A., Langs, G., Denk, H., Zatloukal, K. & Müller, H. Causability and explainability of artificial intelligence in medicine. WIREs Data Min. Knowl. Discov. 9, e1312. https://doi.org/10.1002/widm.1312 (2019).

  • 56.

    Holzinger, A., Carrington, A. & Müller, H. Measuring the quality of explanations: The System Causability Scale (SCS). KI Künstliche Intell. 34, 193–198 (2020).

  • 57.

    Ribeiro, M. T., Singh, S. & Guestrin, C. Why Should I Trust You? In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1135–1144 (ACM, 2016).

  • 58.

    Lakkaraju, H., Kamar, E., Caruana, R. & Leskovec, J. Interpretable & explorable approximations of black box models. Preprint at https://arxiv.org/abs/1707.01154 (2017).

  • 59.

    Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. https://doi.org/10.1007/s11263-019-01228-7 (2016).

  • 60.

    Wickstrom, K., Kampffmeyer, M. & Jenssen, R. Uncertainty modeling and interpretability in convolutional neural networks for polyp segmentation. In 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP) 1–6 (IEEE, 2018).

  • 61.

    Vinogradova, K., Dibrov, A. & Myers, G. Towards Interpretable semantic segmentation via gradient-weighted class activation mapping. Preprint at https://arxiv.org/abs/2002.11434 (2020).

  • 62.

    Bach, S. et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10, e0130140 (2015).

  • 63.

    Seegerer, P. et al. Interpretable deep neural network to predict estrogen receptor status from haematoxylin-eosin images. In Artificial Intelligence and Machine Learning for Digital Pathology (eds. Holzinger, A. et al.) 16–37 (Springer, Cham, 2020).

  • 64.

    Montavon, G., Lapuschkin, S., Binder, A., Samek, W. & Müller, K.-R. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognit. 65, 211–222 (2017).

  • 65.

    Kim, B. et al. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). In Proceedings of the 35th International Conference on Machine Learning, Vol. 80 (eds. Dy, J. & Krause, A.) 2668–2677 (PMLR, Proceedings of Machine Learning Research, 2018).

  • 66.

    Moussa, M. et al. Grading of macular perfusion in retinal vein occlusion using en-face swept-source optical coherence tomography angiography: a retrospective observational case series. BMC Ophthalmol. 19, 127 (2019).

  • 67.

    Swanson, E. A. & Fujimoto, J. G. The ecosystem that powered the translation of OCT from fundamental research to clinical and commercial impact [Invited]. Biomed. Opt. Express 8, 1638 (2017).

  • 68.

    Holz, F. G. et al. Multi-country real-life experience of anti-vascular endothelial growth factor therapy for wet age-related macular degeneration. Br. J. Ophthalmol. 99, 220–226 (2015).

  • 69.

    Alshareef, R. A. et al. Segmentation errors in macular ganglion cell analysis as determined by optical coherence tomography in eyes with macular pathology. Int. J. Retina Vitreous 3, 25 (2017).

  • 70.

    Al-Sheikh, M., Ghasemi Falavarjani, K., Akil, H. & Sadda, S. R. Impact of image quality on OCT angiography based quantitative measurements. Int. J. Retina Vitreous 3, 13 (2017).

  • 71.

    Sadda, S. R. et al. Errors in retinal thickness measurements obtained by optical coherence tomography. Ophthalmology 113, 285–293 (2006).

  • 72.

    Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2015).

  • 73.

    Sinz, F. H., Pitkow, X., Reimer, J., Bethge, M. & Tolias, A. S. Engineering a less artificial intelligence. Neuron 103, 967–979 (2019).

  • 74.

    Zador, A. M. A critique of pure learning and what artificial neural networks can learn from animal brains. Nat. Commun. 10, 3770 (2019).

  • 75.

    Tajmir, S. H. et al. Artificial intelligence-assisted interpretation of bone age radiographs improves accuracy and decreases variability. Skelet. Radiol. 48, 275–283 (2019).

  • 76.

    Kellner-Weldon, F. et al. Comparison of perioperative automated versus manual two-dimensional tumor analysis in glioblastoma patients. Eur. J. Radiol. 95, 75–81 (2017).

  • 77.

    Ma, Z., Turrigiano, G. G., Wessel, R. & Hengen, K. B. Cortical circuit dynamics are homeostatically tuned to criticality in vivo. Neuron 104, 655–664.e4 (2019).

  • 78.

    Shibayama, S. & Wang, J. Measuring originality in science. Scientometrics 122, 409–427 (2020).

  • 79.

    Dirk, L. A measure of originality. Soc. Stud. Sci. 29, 765–776 (1999).

  • 80.

    Hägele, M. et al. Resolving challenges in deep learning-based analyses of histopathological images using explanation methods. Sci. Rep. 10, 6423 (2020).

  • 81.

    Panwar, H. et al. A deep learning and grad-CAM based color visualization approach for fast detection of COVID-19 cases using chest X-ray and CT-Scan images. Chaos Solitons Fractals 140, 110190 (2020).

  • 82.

    Anger, E. M. et al. Ultrahigh resolution optical coherence tomography of the monkey fovea. Identification of retinal sublayers by correlation with semithin histology sections. Exp. Eye Res. 78, 1117–1125 (2004).

  • 83.

    Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. J. Mach. Learn. Res. 9, 249–256 (2010).


  • 84.

    Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. CoRR abs/1412.6980 (2015).

  • 85.

    Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. https://doi.org/10.1016/j.media.2017.07.005 (2017).

  • 86.

    Kosub, S. A note on the triangle inequality for the Jaccard distance. Preprint at https://arxiv.org/abs/1612.02696 (2016).

  • 87.

    Borg, I. & Groenen, P. Modern Multidimensional Scaling (Springer New York, 1997).

  • 88.

    R Core Team. R: A Language and Environment for Statistical Computing (2019).

  • 89.

    Fay, M. P. & Shaw, P. A. Exact and asymptotic weighted logrank tests for interval censored data: the interval R package. J. Stat. Softw. 36 (2010).

  • 90.

    Maloca, M. P. et al. Unraveling the deep learning gearbox in optical coherence tomography image segmentation towards explainable artificial intelligence. Code/software v1.0. https://doi.org/10.5281/zenodo.4380269 (2020).