Participant Selection

The Texas Cancer Research Biobank (TCRB) recognized that the goal of benefiting cancer research must be balanced against the need to protect individual privacy, and protecting the privacy of participants was a primary concern in this project. TCRB specimens were collected from hospitals and clinics across the state of Texas, providing a large cohort from which specimens for Open Access release could be selected.


Conditions of Data Use

Last Updated 10/15/2015

These tumor and normal specimen sequence files (FASTQ and BAM), somatic variant calls (VCF and MAF) and germline MAF files are available for each consented patient as described on the TCRB Open Access Privacy page.

By downloading or utilizing any part of this dataset, end users must agree to the following conditions of use:

  • No attempt to identify any specific individual represented by these data or any derivatives of these data will be made.
  • No attempt will be made to compare and/or link this public data set or derivatives in part or in whole to private health information.
  • These data in part or in whole may be freely downloaded, used in analyses and repackaged in databases.
  • Redistribution of any part of these data or any material derived from the data will include a copy of this notice.
  • The data are intended for use as learning and/or research tools only.
  • This data set is not intended for direct profit of anyone who receives it and may not be resold.
  • Users are free to use the data in scientific publications if the providers of the data (Texas Cancer Research Biobank and Baylor College of Medicine Human Genome Sequencing Center) are properly acknowledged.

Criteria Considered to Select Data for Release

Rare Cancers

To help protect patient confidentiality, all rare tumors were excluded from Open Access release. Here, "rare" refers to cancers that occur in the general population at a rate of fewer than six per 100,000, per SEER guidelines. Cancers deemed potentially rare within the population of Texas or among the contributing clinical collection sites were also excluded.
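The rarity rule above can be sketched as a simple threshold check. This is a minimal illustration, not the TCRB's actual procedure: the function names and the idea of passing (cases, population) pairs for each reference population (general population, Texas, collection sites) are assumptions made for the example.

```python
# Threshold taken from the text: fewer than six cases per 100,000 counts as rare.
RARE_THRESHOLD_PER_100K = 6


def is_rare(cases: int, population: int) -> bool:
    """Return True when the rate is below six cases per 100,000 people.

    Integer arithmetic is used so that a rate of exactly six per 100,000
    is never misclassified by floating-point rounding.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 < RARE_THRESHOLD_PER_100K * population


def excluded_from_release(counts) -> bool:
    """Hypothetical helper: exclude a cancer type if it is rare in ANY of the
    reference populations, each given as a (cases, population) pair."""
    return any(is_rare(cases, population) for cases, population in counts)
```

A cancer type would be passed through `excluded_from_release` once per type, with one pair per reference population; a single rare result is enough to exclude it.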

Unusual Demographic Profiles Within the Participant Pool

Participants who reported a unique racial or ethnic profile within the Texas populace were excluded. These demographics include, but are not limited to, Pacific Islanders and Ashkenazi Jews.

Age

No participants were selected who were under the age of 18 at the time of specimen collection. All participant ages were binned to further protect confidentiality.
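As a rough illustration of age binning, the sketch below maps an exact age to a ten-year range. The bin edges (41-50, 51-60, 61-70) are inferred from the labels in the clinical annotation table on this page; the function name, the bin width parameter, and the handling of the youngest bin are assumptions, not a description of the TCRB's own tooling.

```python
def bin_age(age: int, width: int = 10) -> str:
    """Return a range label such as '51-60' for an adult participant's age.

    Bins are assumed to run from (10k + 1) to (10k + 10), matching the
    labels seen in the released annotations.
    """
    if age < 18:
        # The text states that no participants under 18 were selected.
        raise ValueError("participants under 18 were not selected")
    lower = ((age - 1) // width) * width + 1
    return f"{lower}-{lower + width - 1}"
```

For example, ages 51 through 60 all map to the same label, so the released annotation does not reveal the exact age.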

Data Annotation Release

The annotations within the headers of each sequence file were chosen to provide optimal research value while protecting patients from possible re-identification. The following categories of data were excluded from annotations: a) all HIPAA identifiers; b) all associated participant and collection dates (including years); c) collection location more specific than the state of Texas; and d) identification of the contributing researcher within the TCRB.

Excluded Data Elements

  • All 18 HIPAA identifiers
  • Specific participant age at time of collection
  • Representation of location more specific than the state of Texas
  • Identification of the contributing researcher/physician/hospital

Specimen Labels

Specimens used in this project were double-encoded for privacy protection. Specimen labels were encoded at the collection sites prior to shipment to the Human Genome Sequencing Center at Baylor College of Medicine to remove references to the participant. Once at the Center, the specimens were relabeled with an independent labeling system, breaking the link between the specimen label released with the genomic data and the collection site encoding.
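The double-encoding idea can be sketched as two independent relabeling stages. This is a minimal sketch under stated assumptions: the label formats, the `encode` helper, and the sample identifiers are all hypothetical, and the text does not say how the actual labels were generated or what happened to the intermediate mappings.

```python
import secrets


def encode(labels, prefix):
    """Map each incoming label to a fresh random label unrelated to its input."""
    mapping = {}
    used = set()
    for label in labels:
        code = f"{prefix}-{secrets.token_hex(4)}"
        while code in used:  # regenerate on the rare chance of a collision
            code = f"{prefix}-{secrets.token_hex(4)}"
        used.add(code)
        mapping[label] = code
    return mapping


# Stage 1 (collection site): participant references -> site codes.
participants = ["participant-A", "participant-B"]  # hypothetical identifiers
site_map = encode(participants, "SITE")

# Stage 2 (sequencing center): site codes -> independent release labels.
center_map = encode(site_map.values(), "REL")

# Only the release labels accompany the genomic data.
released = sorted(center_map.values())
```

Because each stage draws labels at random, a release label carries no information about the site code or the participant reference it replaced.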


Access the Clinical Data Annotations by Specimen Label

The table below contains a list of the data available through the BCM-HGSC SFTP server.

To access this data, you must first register for an account and confirm that you have read the Conditions of Data Use.

Once you have registered, you can download the data through the web interface or SFTP. Please refer to the download instructions for more information.

| Case # | Sex / Age / Race / Ethnicity | Prior treatment | Tumor % cellularity / TNM | Disease Morph. / Anatomic Site | Tumor Grade |
|---|---|---|---|---|---|
| 1 | M / 51-60 / White / Not Hispanic or Latino | No | 10% / T3 N1 M0 | 8500/3: infiltrating duct adenocarcinoma / Head of pancreas | II |
| 2 | F / 61-70 / White / Not Hispanic or Latino | Yes | 60% / T3 N1 M0 | 8500/3: infiltrating duct adenocarcinoma / Head of pancreas | II |
| 3 | M / 51-60 / White / Not Hispanic or Latino | Yes | 20% / T3 N1 M0 | 8500/3: infiltrating duct adenocarcinoma / Head of pancreas | II |
| 4 | F / 41-50 / White / Not Hispanic or Latino | No | 20% / T2 N1 M0 | 8246/3: neuroendocrine carcinoma, nos / Tail of pancreas | II |
| 5 | M / 51-60 / White / Not Hispanic or Latino | No | 5% / T3 N1 MX | 8500/3: infiltrating duct adenocarcinoma / Head of pancreas | II |
| 6 | F / 61-70 / White / Not Hispanic or Latino | No | 80% / T3 N0 N0 | 8246/3: neuroendocrine carcinoma, nos / Pancreas | Low, IIA |
| 7 | M / 61-70 / White / Not Hispanic or Latino | No | 90% / N/A N/A N/A | B-cell follicular lymphoma | I-II |