FLOSSIES

The FLOSSIES sequencing project was inspired by the need for frequencies of genomic variants from appropriate controls for breast cancer genetics studies. Existing public databases are large but not stratified by gender or age. The FLOSSIES project includes allele frequencies of all classes of variants at known and candidate genes for breast cancer susceptibility among women who are cancer free and older than age 70. The participants are Fabulous Ladies Over Seventy. We hope this data will be useful as controls for breast cancer genetics studies by many groups.

Participants

Participants in this sequencing project are women who joined the Women’s Health Initiative (WHI) between 1993 and 2005. All participants were 50-79 years old and without any history of cancer at enrollment. More than 160,000 women enrolled in WHI, contributed DNA at enrollment, and have been followed ever since. From the WHI participants who are now older than 70 years and have remained cancer-free, approximately 10,000 women were selected at random for this sequencing project: approximately 7,000 women who self-identified as European American and approximately 3,000 women who self-identified as African American. Ancestries were estimated with more than 500 ancestry-informative markers.

Genes

Genes analyzed include established breast cancer genes, both high and moderate penetrance, and genes that have been suggested to predispose to breast cancer when mutant. Strength of evidence varies for these candidate genes. The genes are:

BRCA1, BRCA2, ATM, ATR, BAP1, BARD1, BRIP1, CDH1, CHEK1, CHEK2, CTNNA1, FAM175A, FANCM, GEN1, MRE11A, NBN, PALB2, PTEN, RAD51B, RAD51C, RAD51D, RECQL, RINT1, SLX4, STK11, TP53, XRCC2

For each gene, rare and common variants in coding regions, 5’UTR, 3’UTR, donor splice sites with 6 flanking bp, and acceptor splice sites with 20 flanking bp are included. Point mutations, small indels, and CNVs are included (although CNV calls are still being updated).
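As a rough illustration of the splice-site padding described above, the sketch below extends an exon's coordinates by 20 bp on the acceptor side and 6 bp on the donor side. It assumes that "flanking bp" means intronic bases adjacent to the exon, that coordinates are 1-based and inclusive, and that the exon is internal (both splice sites present). The `Exon` type and `padded_interval` function are hypothetical names used only for this example; they are not part of the FLOSSIES pipeline.

```python
from typing import NamedTuple, Tuple

DONOR_FLANK_BP = 6      # flanking bases kept at donor splice sites
ACCEPTOR_FLANK_BP = 20  # flanking bases kept at acceptor splice sites


class Exon(NamedTuple):
    """Hypothetical exon record with 1-based, inclusive genomic coordinates."""
    start: int
    end: int
    strand: str  # "+" or "-"


def padded_interval(exon: Exon) -> Tuple[int, int]:
    """Return the exon interval extended into the flanking introns.

    Assumption: the flank is intronic sequence next to the exon. On the
    plus strand the acceptor site lies before the exon start and the donor
    site after the exon end; on the minus strand the sides are swapped.
    """
    if exon.strand == "+":
        return exon.start - ACCEPTOR_FLANK_BP, exon.end + DONOR_FLANK_BP
    if exon.strand == "-":
        return exon.start - DONOR_FLANK_BP, exon.end + ACCEPTOR_FLANK_BP
    raise ValueError("strand must be '+' or '-'")
```

First and last exons of a transcript have only one splice site, so a fuller version would pad only the side that borders an intron.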

Sequencing

Sequencing was carried out in the King Lab at the University of Washington, Seattle, on an Illumina HiSeq with 2x100 bp paired-end reads, and by Color Genomics on an Illumina NextSeq with 2x150 bp paired-end reads. Both groups used modified versions of the BROCA panel¹. Median coverage per sample was >250x.

Paired-end sequence reads were aligned to the human reference genome (hg19) using Burrows-Wheeler Aligner 0.7.9a. Removal of PCR duplicates, sorting, and indexing were carried out with SAMtools v0.1.19. Data was excluded for 3 samples with low coverage and 2 samples with low quality. For quality control, WHI included 105 duplicates and 20 triplicates, blinded to the sequencing teams. These duplicates and triplicates were all identified and removed.

Indel realignment and base quality score recalibration were performed with the Genome Analysis Toolkit (GATK v3.0) using recommended parameters. Variants were detected with GATK UnifiedGenotyper. Variants were included if the variant fraction was at least 0.25. Copy number variants were identified using our in-house read-depth-based pipeline.
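The variant fraction threshold can be sketched as follows. This example assumes that variant fraction means the number of reads supporting the variant allele divided by the total number of reads at that position; the function names are hypothetical and are not taken from GATK or from the pipeline used here.

```python
MIN_VARIANT_FRACTION = 0.25  # inclusion threshold stated in the text


def variant_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele.

    Assumption: 'variant fraction' is alt-supporting reads / total reads.
    Sites with no coverage are given a fraction of 0.0.
    """
    if total_reads <= 0:
        return 0.0
    return alt_reads / total_reads


def passes_variant_fraction(alt_reads: int, total_reads: int,
                            threshold: float = MIN_VARIANT_FRACTION) -> bool:
    """True if the call meets the minimum variant fraction."""
    return variant_fraction(alt_reads, total_reads) >= threshold
```

For example, a call with 25 variant reads out of 100 passes, while one with 24 out of 100 does not.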

Ancestry

Ancestry of each sample was estimated using more than 500 ancestry-informative markers and principal components analysis², calibrating to genotypes from the Human Genome Diversity Project³.

Each sample was assigned to the population with highest estimated likelihood of ancestry. Ancestry analysis yielded estimates of 7,325 women of European American ancestry and 2,559 women of African American ancestry.
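The assignment rule amounts to taking the population with the largest estimated likelihood for each sample. The sketch below shows only that final step; the likelihood values and population labels are made up for illustration, and `assign_population` is a hypothetical helper, not the ancestry software cited above.

```python
from typing import Dict


def assign_population(likelihoods: Dict[str, float]) -> str:
    """Return the population label with the highest estimated likelihood.

    `likelihoods` maps a population label to that sample's estimated
    likelihood of ancestry. Ties are resolved in favor of the label
    that appears first in the mapping.
    """
    if not likelihoods:
        raise ValueError("no ancestry likelihoods provided")
    return max(likelihoods, key=likelihoods.get)


# Illustrative values only, not real data.
example = {"European American": 0.91, "African American": 0.09}
assigned = assign_population(example)
```

Here `assigned` is "European American", since that label has the larger value.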

Open source

This site uses a modified version of the open source ExAC Browser, developed by The Broad Institute (Konrad Karczewski, Daniel MacArthur, Brett Thomas, Ben Weisburd) and distributed under the MIT License.

REFERENCES

1. Walsh T, Lee MK, Casadei S, et al. Detection of inherited mutations for breast and ovarian cancer using genomic capture and massively parallel sequencing. Proc Natl Acad Sci USA. 2010;107(28):12629-33.
2. Wang C, Zhan X, Liang L, Abecasis GR, Lin X. Improved ancestry estimation for both genotyping and sequencing data using projection procrustes analysis and genotype imputation. Am J Hum Genet. 2015;96(6):926-37.
3. Li JZ, Absher DM, Tang H, et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008;319(5866):1100-4.