Public Medical Imaging Datasets For Artificial Intelligence Models

Abdulkader Helwan
Feb 1, 2023

Gathering imaging data is a fundamental part of creating artificial intelligence models for diagnostic radiology. These datasets can be used for training and testing machine learning algorithms, segmentation, classification, and other purposes. While many convolutional neural networks for image recognition tasks require at least thousands of images for training, smaller amounts of data can still be useful for texture analysis, transfer learning, fine-tuning, and other techniques.

Because patient data is privacy-sensitive, many commercial artificial intelligence models are built on proprietary data sets or on data from individual hospitals that are not publicly available. Even so, a number of collections of radiological images and/or reports are publicly accessible. In this post, we list some of the best available medical imaging and healthcare-related datasets.

Public Medical Imaging Datasets

  • 1000 Functional Connectomes Project: over 1000 functional MRI exams collected from sites across the globe
  • ACR Data Science: list of ~20 data sets
  • CANDID-PTX dataset: 19,237 chest X-ray DICOM images from New Zealand with segmentation labels for pneumothorax, rib fractures, and chest tubes, plus the corresponding free-text reports
  • CheXpert: 224,316 chest radiographs
  • Computed Tomography Emphysema Database: small images specifically for texture analysis
  • COVID-19 Open Annotated Radiology Database (RICORD): expert-annotated COVID-19 imaging dataset with 1,000 chest X-rays and 240 thoracic CT exams
  • Johns Hopkins University Data Archive: contains a data set of head CT scans
  • The Medical Image Bank of Valencia
  • MD.ai: a collection of public projects
  • NIH CXR8: 112,120 frontal chest radiographs
  • OpenI (The Open Access Biomedical Image Search Engine): data set search engine with an application programming interface (API) for creating customized data sets, available at MedPix
  • OpenNeuro: list of over 200 neuro data sets
  • OASIS: open access neuro data sets
  • Spineweb: 16 spinal imaging data sets
  • UCLH Stroke EIT Dataset
  • MRNet: 1,370 annotated knee MRI examinations
  • MURA: a large dataset of musculoskeletal radiographs
  • MIMIC-CXR Database: 377,110 chest radiographs with free-text radiology reports
  • PADCHEST: 160,000 chest X-rays with multiple labels on images
  • RSNA Pulmonary Embolism CT (RSPECT) dataset: 12,000 CT studies
  • RSNA 2019 Brain CT Hemorrhage dataset: 25,312 CT studies
  • TB Portals
  • UC Irvine Machine Learning Repository: various radiological and nuclear medicine data sets among other types of data sets
  • York Cardiac MRI Dataset: cardiac MRIs
  • The Visible Human Project Dataset: CT, MRI and cryosectional images of complete cadavers
  • Zenodo: searchable projects
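
Since the required data volume depends on the technique (thousands of images for training a convolutional network, far fewer for fine-tuning or texture analysis), it can help to shortlist candidates by size before downloading anything. The following is a minimal sketch in Python that uses only the counts quoted in the list above. The `Dataset` class, the `shortlist` helper, the modality labels, and the 100,000 threshold are illustrative assumptions, not part of any dataset's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    modality: str  # simplified label chosen for this sketch (assumption)
    count: int     # images or studies, as quoted in the list above


# Only entries whose size is stated in the list are included.
CATALOGUE = [
    Dataset("CANDID-PTX", "chest X-ray", 19_237),
    Dataset("CheXpert", "chest X-ray", 224_316),
    Dataset("NIH CXR8", "chest X-ray", 112_120),
    Dataset("MIMIC-CXR", "chest X-ray", 377_110),
    Dataset("PADCHEST", "chest X-ray", 160_000),
    Dataset("MRNet", "knee MRI", 1_370),
    Dataset("RSPECT", "CT", 12_000),
    Dataset("RSNA 2019 Brain CT Hemorrhage", "CT", 25_312),
]


def shortlist(catalogue, modality, min_count=0):
    """Return datasets of one modality with at least min_count items, largest first."""
    matches = [d for d in catalogue if d.modality == modality and d.count >= min_count]
    return sorted(matches, key=lambda d: d.count, reverse=True)


if __name__ == "__main__":
    # Hypothetical threshold: large chest X-ray sets for training from scratch.
    for d in shortlist(CATALOGUE, "chest X-ray", min_count=100_000):
        print(f"{d.name}: {d.count:,}")
```

Lowering `min_count` brings in smaller sets such as CANDID-PTX or MRNet, which may be enough when starting from a pretrained model.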

The Cancer Imaging Archive

Other datasets can be found on The Cancer Imaging Archive, which links to many open radiology data sets, such as:

  • 4D-Lung
  • ACRIN-FLT-Breast
  • ACRIN-FMISO-Brain
  • ACRIN-NSCLC-FDG-PET
  • Anti-PD-1 Immunotherapy Lung (Anti-PD-1_Lung)
  • Anti-PD-1 Immunotherapy Melanoma (Anti-PD-1_MELANOMA)
  • APOLLO-1-VA
  • APOLLO-5-ESCA
  • Brain-Tumor-Progression
  • BREAST-DIAGNOSIS
  • Breast-MRI-NACT-Pilot
  • CBIS-DDSM
  • CPTAC-AML
  • CPTAC-CCRCC
  • CPTAC-CM
  • CPTAC-GBM
  • CPTAC-HNSCC
  • CPTAC-LSCC
  • CPTAC-LUAD
  • CPTAC-PDA
  • CPTAC-SAR
  • CPTAC-UCEC
  • Credence Cartridge Radiomics Phantom CT Scans
  • Credence Cartridge Radiomics Phantom CT Scans with Controlled Scanning Approach (CC-Radiomics-Phantom-2)
  • CT COLONOGRAPHY
  • CT Lymph Nodes
  • Head-and-neck squamous cell carcinoma patients with CT taken during pre-treatment, mid-treatment, and post-treatment (HNSCC-3DCT-RT)
  • Head-Neck Cetuximab
  • Head-Neck-PET-CT
  • ISPY1
  • Ivy GAP
  • LGG-1p19qDeletion
  • LIDC-IDRI
  • LungCT-Diagnosis
  • Lung CT Segmentation Challenge 2017
  • Lung Phantom
  • Mouse-Astrocytoma
  • Mouse-Mammary
  • NaF Prostate
  • NRG-1308
  • NSCLC-Cetuximab
  • NSCLC Radiogenomics
  • NSCLC-Radiomics
  • NSCLC-Radiomics-Genomics
  • Osteosarcoma data from UT Southwestern/UT Dallas for Viable and Necrotic Tumor Assessment
  • Pancreas-CT
  • Phantom FDA
  • Prostate-3T
  • PROSTATE-DIAGNOSIS
  • Prostate Fused-MRI-Pathology
  • PROSTATE-MRI
  • QIBA CT-1C
  • QIN-BRAIN-DSC-MRI
  • QIN-Breast
  • QIN Breast DCE-MRI
  • QIN GBM Treatment Response
  • QIN-HEADNECK
  • QIN LUNG CT
  • QIN PET Phantom
  • QIN PROSTATE
  • QIN-PROSTATE-Repeatability
  • QIN-SARCOMA
  • Quantitative Imaging Network Collections
  • REMBRANDT
  • RIDER Breast MRI
  • RIDER Collections
  • RIDER Lung CT
  • RIDER Lung PET-CT
  • RIDER NEURO MRI
  • RIDER PHANTOM MRI
  • RIDER Phantom PET-CT
  • Soft-tissue-Sarcoma
  • SPIE-AAPM Lung CT Challenge
  • SPIE-AAPM-NCI PROSTATEx Challenges
  • Synthetic and Phantom MR Images for Determining Deformable Image Registration Accuracy (MRI-DIR)
  • The Cancer Genome Atlas Breast Invasive Carcinoma Collection (TCGA-BRCA)
  • The Cancer Genome Atlas Kidney Renal Papillary Cell Carcinoma Collection (TCGA-KIRP)
  • The Cancer Genome Atlas Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma Collection (TCGA-CESC)
  • The Cancer Genome Atlas Collections
  • The Cancer Genome Atlas Colon Adenocarcinoma Collection (TCGA-COAD)
  • The Cancer Genome Atlas Esophageal Carcinoma Collection (TCGA-ESCA)
  • The Cancer Genome Atlas Glioblastoma Multiforme Collection (TCGA-GBM)
  • The Cancer Genome Atlas Head-Neck Squamous Cell Carcinoma Collection (TCGA-HNSC)
  • The Cancer Genome Atlas Kidney Chromophobe Collection (TCGA-KICH)
  • The Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma Collection (TCGA-KIRC)
  • The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection (TCGA-LIHC)
  • The Cancer Genome Atlas Low-Grade Glioma Collection (TCGA-LGG)
  • The Cancer Genome Atlas Lung Adenocarcinoma Collection (TCGA-LUAD)
  • The Cancer Genome Atlas Lung Squamous Cell Carcinoma Collection (TCGA-LUSC)
  • The Cancer Genome Atlas Ovarian Cancer Collection (TCGA-OV)
  • The Cancer Genome Atlas Prostate Adenocarcinoma Collection (TCGA-PRAD)
  • The Cancer Genome Atlas Rectum Adenocarcinoma Collection (TCGA-READ)
  • The Cancer Genome Atlas Sarcoma Collection (TCGA-SARC)
  • The Cancer Genome Atlas Stomach Adenocarcinoma Collection (TCGA-STAD)
  • The Cancer Genome Atlas Thyroid Cancer Collection (TCGA-THCA)
  • The Cancer Genome Atlas Urothelial Bladder Carcinoma Collection (TCGA-BLCA)
  • The Cancer Genome Atlas Uterine Corpus Endometrial Carcinoma Collection (TCGA-UCEC)

This article was originally published on AI-ContentLab:

https://www.ai-contentlab.com/2023/01/gathering-imaging-data-is-fundamental.html
