Medical Image Retrieval based on ensemble learning using Convolutional Neural Networks and Vision Transformers

  • Yahya Ahmed Yahya
  • Dalya Khaled
  • Waleed Khaild Al-Azzawi
  • Tawfeeq Alghazali
  • Huda Sabah Jabr
  • Rusul Madhat Abdulla
  • Mohammed Kadhim Abbas Al-Maeeni
  • Nathera Hussin Alwan
  • Salma Saad Najeeb
  • Khaldoon T. Falih
Keywords: content-based image retrieval, medical image retrieval, ensemble learning, convolutional neural networks, vision transformers, deep learning, similarity-based visual search


The rapid growth of medical image repositories has made managing and retrieving medical visual data increasingly difficult. This underscores the need for Content-Based Image Retrieval (CBIR) systems that facilitate the search and analysis of such imagery. A key challenge that requires special attention is the representational quality of the embeddings produced by the retrieval pipeline: these embeddings should capture both global and local features so that they retain as much useful information from the input as possible. To address this gap, we propose a CBIR framework that uses deep neural networks to classify a query image and efficiently retrieve the most relevant medical images. The proposed model combines Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) and learns to capture both the local and the global structure of high-level feature maps. The model encodes the images in the database and, for a given query, outputs a ranked list ordered from the most similar image to the least similar. For our experiments, an intermodal dataset containing ten classes across five imaging modalities is used to train and evaluate the framework. The results show an average classification accuracy of 95.32% and a mean average precision (mAP) of 0.61. These results indicate that the proposed framework can effectively retrieve multimodal medical images depicting different organs of the body.
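The retrieval step described above (encode every database image, then rank the database by similarity to the query) and the mean average precision metric can be illustrated with a small sketch. It assumes, purely for illustration, that the CNN and ViT embeddings are fused by L2-normalizing each branch and concatenating them, and that similarity is cosine similarity; the abstract does not state the actual fusion rule or similarity measure. The helper names (`fuse_embeddings`, `rank_database`, and so on), the toy two-dimensional vectors, and the image IDs are hypothetical, and average precision here is computed over the full ranked list rather than at a fixed cutoff.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_embeddings(cnn_vec: Sequence[float], vit_vec: Sequence[float]) -> List[float]:
    """Hypothetical fusion: normalize each branch so neither dominates,
    concatenate, then renormalize the joint vector."""
    return l2_normalize(l2_normalize(cnn_vec) + l2_normalize(vit_vec))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity because inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))


def rank_database(query: Sequence[float],
                  database: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (image_id, score) pairs from most to least similar to the query."""
    scores = [(image_id, cosine(query, emb)) for image_id, emb in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def average_precision(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k taken at each rank k where a relevant image appears."""
    hits, precision_sum = 0, 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(ap_values: Sequence[float]) -> float:
    """mAP: the average of per-query average precision values."""
    return sum(ap_values) / len(ap_values) if ap_values else 0.0


# Toy usage with made-up 2-D branch embeddings (illustrative only).
database = {
    "ct_brain_1": fuse_embeddings([0.9, 0.1], [0.8, 0.2]),
    "ct_brain_2": fuse_embeddings([0.85, 0.2], [0.7, 0.3]),
    "xray_chest_1": fuse_embeddings([0.1, 0.9], [0.2, 0.8]),
}
query = fuse_embeddings([0.88, 0.15], [0.75, 0.25])
ranking = rank_database(query, database)
ap = average_precision([image_id for image_id, _ in ranking], {"ct_brain_1", "ct_brain_2"})
print(ranking)
print("AP for this query:", ap)
```

Because each branch is normalized before concatenation, the fused cosine score is simply the average of the CNN and ViT cosine scores, which is one simple way to let local (CNN) and global (ViT) features contribute equally; a learned weighting would be an equally plausible alternative.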


How to Cite
Yahya, Y., Khaled, D., Al-Azzawi, W., Alghazali, T., Jabr, H., Abdulla, R., Al-Maeeni, M. K., Alwan, N., Najeeb, S., & Falih, K. (2022). Medical Image Retrieval based on ensemble learning using Convolutional Neural Networks and Vision Transformers. Majlesi Journal of Electrical Engineering, 16(3). Retrieved from
