Additional Data Sets

Additional Curated and Standardized Research Datasets available from the Georgetown Data Bridge Team:


Below is a list of specialized datasets that were co-developed by ICBI investigators and collaborators. These datasets are currently managed by the AIM-AHEAD team from the ICBI Center and are available to investigators for research use with proper .

Description: The REMBRANDT dataset was originally created at the National Cancer Institute and funded by the Glioma Molecular Diagnostic Initiative. The data were collected from 2004 to 2006. In 2015, the NCI transferred this dataset to Georgetown, and it is now physically located on the Georgetown Database of Cancer (G-DOC), a cancer data integration and sharing platform, where it is hosted alongside other cancer studies. REMBRANDT includes genomic data from 261 samples of glioblastoma, 170 of astrocytoma, 86 of oligodendroglioma, and a number that are mixed or of an unknown subclass. Outcomes data include more than 13,000 data points. More information

Description: The Georgetown Pediatric Cancer Outcomes Database was co-developed by investigators at the Georgetown University Medical Center (GUMC) Lombardi Comprehensive Cancer Center (LCCC), the Pediatric Hematology Oncology and Bone Marrow Transplantation Program and ICBI. More information

Description: A centralized research data warehouse for ImmunoOncology that enables novel hypothesis generation and retrospective outcomes research across the 10 MedStar Health network hospitals in the DC-Baltimore area. Data sources include hospital medical records, labs, pathology, radiology, and Cancer Registry data. More information

Description: Molecular profiling of a patient’s tumor identifies its disease biomarker pattern, which allows the patient’s medical team to select personalized treatment options that they may not have previously considered. The data set includes results for around 5,000 patients from the Caris Molecular Intelligence™ service, with the ultimate goal of better informing treatment decisions. More information

Description: Colorectal cancer (CRC) patient biospecimens with extensive clinical and follow-up data were selected from the Indivumed GmbH biobank for 40 patients (20 relapse and 20 no-relapse). Of these patients, 12 had late stage I disease and 28 had stage II disease. More information

Description: Protein structure simulation results generated by the SNP2SIM workflow and used to develop protein-specific models of variant function. Contains metadata on simulation configuration and parameterization, and output from Nanoscale Molecular Dynamics (NAMD) and AutoDock Vina. More information

Description: Oncology-specific electronic health record (EHR) data in the Observational Medical Outcomes Partnership common data model (OMOP CDM), one of the leading standard data models used in nationwide research and information sharing initiatives. OMOP was developed as a shared analytics model and has been adopted by the Observational Health Data Sciences and Informatics (OHDSI) Consortium. More information
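As an illustration of what a shared data model makes possible, the sketch below runs a cohort-style count against two OMOP CDM tables, person and condition_occurrence, reduced to a handful of their standard columns. It is a minimal example under stated assumptions, not the schema or tooling used for this dataset: the rows are synthetic, the condition concept IDs 1001 and 1002 are hypothetical placeholders, and the helper patients_per_condition is introduced here only for illustration.

```python
import sqlite3

# In-memory database holding two OMOP CDM tables, reduced to a few columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        year_of_birth INTEGER,
        gender_concept_id INTEGER
    );
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(person_id),
        condition_concept_id INTEGER,
        condition_start_date TEXT
    );
    """
)

# Synthetic rows for illustration only. 8507 and 8532 are the standard OMOP
# gender concepts (male, female); 1001 and 1002 are hypothetical placeholder
# condition concepts, not real vocabulary entries.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, 1960, 8532), (2, 1975, 8507), (3, 1982, 8532)],
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1001, "2019-03-01"),
        (2, 1, 1001, "2020-06-15"),  # same patient, same condition again
        (3, 2, 1001, "2021-01-10"),
        (4, 3, 1002, "2018-11-30"),
    ],
)


def patients_per_condition(connection):
    """Return {condition_concept_id: number of distinct patients}."""
    rows = connection.execute(
        """
        SELECT condition_concept_id, COUNT(DISTINCT person_id)
        FROM condition_occurrence
        GROUP BY condition_concept_id
        ORDER BY condition_concept_id
        """
    ).fetchall()
    return dict(rows)


print(patients_per_condition(conn))
```

Because table and column names are fixed by the common data model, a query written this way could in principle run unchanged at any site that has mapped its EHR data to OMOP.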

Description: This is a public gene expression dataset of bladder tissue samples profiled by microarray. It includes 165 primary bladder cancer samples, 23 recurrent non-muscle invasive tumor tissues, 58 samples of normal-looking bladder mucosa surrounding cancer, and 10 samples of normal bladder mucosa. Available in NCBI GEO as Series GSE13507. More information
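The group sizes above can be tallied to check the overall cohort size before planning an analysis. The short sketch below assumes the four groups are disjoint; the dictionary keys are descriptive labels chosen here, not field names from the GEO record.

```python
# Sample counts per tissue group, as listed in the description above.
sample_counts = {
    "primary bladder cancer": 165,
    "recurrent non-muscle invasive tumor": 23,
    "normal-looking mucosa surrounding cancer": 58,
    "normal bladder mucosa": 10,
}

# Total number of samples, assuming the groups do not overlap.
total_samples = sum(sample_counts.values())

# Share of each group in the full set, useful for spotting class imbalance.
group_share = {name: count / total_samples for name, count in sample_counts.items()}

print(total_samples)  # 256
for name, share in group_share.items():
    print(f"{name}: {share:.1%}")
```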

Description: Comprehensive clinical pathway system to increase the understanding and use of Next Generation Sequencing (NGS) in Non-Small Cell Lung Cancer (NSCLC) by general oncologists, pathologists, and oncology nurse navigators. This pathway training system is intended to make decision-making easier for clinicians who are trying to identify the appropriate tests and treatment algorithms for their patients. More information