Lecture Notes
Hong Zheng / 2017-12-04
10X genomics workshop
- Sequencing depth for typical samples: 50,000 raw reads per cell is the minimal recommended sequencing depth
- Cell Ranger
- mkfastq, from bcl to fastq
- count, transcriptome alignment with STAR
- aggr
- reanalyze
Count pipeline
- BAM
- MEX-gene/barcode matrix, market exchange format
- HTML
- .cloupe files: 2D projections, clustering, DE
Filterred gene-barcode matrix: select GEMs that likely contain cells:
- sum UMI counts for each barcode
- select barccode with total UMI count >=10% of 99th percentile of the expected recoverred cells
How do we learn what works? a two-step algorithm for causal inference from real world data
Miguel Hernan, DrPH, Professor of Biostatistics and of Epidemiology
4/24/2018
Decisions must be made now, in clinical practice and publich health.
How do we estimate causal effects? Randomized trials.
If randomized trials are not feasible, then we use observational data.
The target trial concept leads to a simple two-step algorithm for causal inference
- Ask a causal question
Specify the protocol of the target trial - Answer the causal question
Conduct the target trial, or
Use observational data to explictly simulate the target trial
Example: postmenopausal hormone theray and coronary heart disease
- Observational epidermiologic studies
- >30% lower risk in current users vs. never users. Hazard ratio 0.68 in Nurse’s health study (https://www.ncbi.nlm.nih.gov/pubmed/16417416)
- Randomized trial
- >20% higher risk in initiators vs. noninitiators. Hazard ratio 1.24 in Women’s health Initiative (http://www.nejm.org/doi/full/10.1056/NEJMoa030808)
AACR 2018
Why do lung adenocarcinomas respond to kinase inhibitors while glioblastomas don’t? Contrasting patterns of tumor evolution
Matthew L. Meyerson. Dana-Farber Cancer Inst., Boston, MA
- GBM, uniformaly fatal, poor responses to therapeutics
- EGFR mutations in GBM, point mutations in 14% GBM patients
- Lung: kiniase domain
- GBM: c-terminal
- EGFR amplified in 45% GBM
The making of the pre-cancer atlas: Opportunities and challenges
Sudhir Srivastava. National Cancer Inst., Rockville, MD
Examples of precancerous lesions
- Barret’s Esophagus
- Ductal Carcinoma in situ
- Oral Submucous Fibrosis
- Leukoplakia
- Vervical dysplasia
Single-cell genomics for the analysis of premalignancy
Nicholas E. Navin. UT MD Anderson Cancer Ctr., Bellaire, TX
- Multiclonal Invasion in Breast Tumors Identified by Topographic Single Cell Sequencing http://www.cell.com/cell/fulltext/S0092-8674(17)31449-6
Genomic analysis of precancers in the GI tract
Eduardo Vilar-Sanchez. UT MD Anderson Cancer Ctr., Houston, TX
- what pre-cancers to study
- TCGA->PCA
- what populations to study from
- what type of samples to study
- what technologies to deploy
- what should be deliverables
Cancer evolution measured at single cell resolution
Sohrab Shah. BC Cancer Agency Vancouver Ctr., Vancouver, BC, Canada
- Scalable whole-genome single-cell library preparation without preamplification
https://www.nature.com/articles/nmeth.4140 - Inference of clonal gene expression through integration of single cell DNA & RNA sequencing
Can we infer clone specific gene expression?
Clonealign (Kieran Campbell)
https://github.com/kieranrcampbell/clonealign
Quantifying patient-specific evolutionary dynamics
Christina Curtis. Stanford University, Stanford, CA
- Intra-tumour diversification in colorectal cancer at the single-cell level
https://www.nature.com/articles/d41586-018-03841-x
https://www.nature.com/articles/s41586-018-0024-3
Most mutations were acquired during the final dominant clonal expansion of the cancer and resulted from mutational processes that are absent from normal colorectal cells
Highly Sensitive Tumor Detection and Classification using Methylome Analysis of Plasma cfDNA
Daniel Diniz De Carvalho. Princess Margaret Cancer Centre, Toronto, ON, Canada
- DNA methylation patterns are tissue-specific and conserved across individuals
- Deconvolutiing the tumor microenviroment form DNA methylation data (to be published)
Multiplatform Analysis of 12 Cancer Types Reveals Molecular Classification within and across Tissues of Origin
http://www.cell.com/action/showImagesData?pii=S0092-8674%2814%2900876-9
Recent genomic analyses of pathologically defined tumor types identify “within-a-tissue” disease subtypes. However, the extent to which genomic signatures are shared across tissues is still unclear. We performed an integrative analysis using five genome-wide platforms and one proteomic platform on 3,527 specimens from 12 cancer types, revealing a unified classification into 11 major subtypes. Five subtypes were nearly identical to their tissue-of-origin counterparts, but several distinct cancer types were found to converge into common subtypes. Lung squamous, head and neck, and a subset of bladder cancers coalesced into one subtype typified by TP53 alterations, TP63 amplifications, and high expression of immune and proliferation pathway genes. Of note, bladder cancers split into three pan-cancer subtypes. The multiplatform classification, while correlated with tissue-of-origin, provides independent information for predicting clinical outcomes. All data sets are available for data-mining from a unified resource to support further biological discoveries and insights into novel therapeutic strategies.Analysis of Plasma Epstein–Barr Virus DNA to Screen for Nasopharyngeal Cancer
http://www.nejm.org/doi/full/10.1056/NEJMoa1701717- pros: 100-1000 copies per cell; high sensentivity
- cons: Low specificity; presense of EBV doesn’t mean cancer
Direct detection of early-stage cancers using circulating tumor DNA
http://stm.sciencemag.org/content/9/403/eaan2415
Early detection and intervention are likely to be the most effective means for reducing morbidity and mortality of human cancer. However, development of methods for noninvasive detection of early-stage tumors has remained a challenge. We have developed an approach called targeted error correction sequencing (TEC-Seq) that allows ultrasensitive direct evaluation of sequence changes in circulating cell-free DNA using massively parallel sequencing. We have used this approach to examine 58 cancer-related genes encompassing 81 kb. Analysis of plasma from 44 healthy individuals identified genomic changes related to clonal hematopoiesis in 16% of asymptomatic individuals but no alterations in driver genes related to solid cancers. Evaluation of 200 patients with colorectal, breast, lung, or ovarian cancer detected somatic mutations in the plasma of 71, 59, 59, and 68%, respectively, of patients with stage I or II disease. Analyses of mutations in the circulation revealed high concordance with alterations in the tumors of these patients. In patients with resectable colorectal cancers, higher amounts of preoperative circulating tumor DNA were associated with disease recurrence and decreased overall survival. These analyses provide a broadly applicable approach for noninvasive detection of early-stage tumors that may be useful for screening and management of patients with cancer.- pros: presense of driver mutations strongly associated with cancer (but not always); higher specificity
- cons: one or two somatic variations per cell; not many recurrent mutations to moniter; low sensentivity
Capture-based approach
cfMeDIP-seq
1 ng of DNA input Evaluation on pancreatic cancer, RRBS vs. cfMeDIP-seq (what is the difference?)
machine-learning classifier
Applicaiton: discriminate different cancer types
Non-invasive detection of cancers in plasma with DNA methylation haplotypes
Kun Zhang. University of California, San Diego, La Jolla, CA
- Methylation haplotype analysis of bisulfite sequencing data https://github.com/dinhdiep/MONOD2
- Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374016/
Detecting early-stage CRC with Guoxiang Cai, Fudan
specificy sensentivity
healthy 93.3 -
adenoma - 75
CRC plasma, stage 1 - 73.4
CRC plasma, stage 2 - 78.7
CRC plasma, stage 3 - 88
CRC plasma, stage 4 - 100
Cancer locator: Harnessing the diagnostic potential of cell-free DNA methylation
Jasmine Xianghong Zhou. University of California at Los Angeles, Los Angeles, CA
Epigenetic traces in plasma DNA
Michael R. Speicher. Medical University of Graz, Graz, Austria
- Methylation patterns in serum DNA for early identification of disseminated breast cancer
https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-017-0499-9
Sequence based DNA methylation analysis
Martin Hirst. The University of British Columbia, Vancouver, BC, Canada (http://hirstlab.msl.ubc.ca/)
How and why look for clusters of cis-regulatory elements (COREs, aka super-enhancers) in cancer
Mathieu Lupien. Princess Margaret Cancer Ctr., Toronto, ON, Canada
CREAM: Clustering of genomic REgions Analysis Method (https://www.biorxiv.org/content/early/2018/03/20/222562)
N-of-1 networks to personalized cancer treatment
Josh Stuart, 11/18/2017
Cancers come in several forms according to the organ and tissue of origin, the type of mutagen and the impacted genetic pathways that contribute to oncogenic progression. Pan-Cancer analyses across multiple types of cancers, using multiple types of omics data, have identified molecular-based subtypes of clinical importance. Even so, patients may not respond to the usual treatment regime and carry their own unique alterations. I will discuss network integration strategies for building patient-specific networks to model the aberrant wiring in a single person’s tumor. The goal is to then strategies treatment for the person based on critical nodes in the uncovered network.
Identify closest cancer form
Identify impactful variants
Identify an explanatory network model
Multiplatform Analysis of 12 Cancer Types Reveals Molecular Classification within and across Tissues of Origin
https://linkinghub.elsevier.com/retrieve/pii/S0092-8674(14)00876-9
tissue-of-origin signals, 10-20% reclassified associated w/ survial
Personal regulome navigation
Howard Chang, 11/2/2017
Regulome but not genome mutation pattern predicts clinical response to HDACi
http://www.cell.com/cancer-cell/pdf/S1535-6108(17)30204-0.pdf
T-ATAC: Transcript-indexed ATAC pairs single cell RNA and ATAC-Seq
https://www.nature.com/nature/journal/v547/n7661/abs/nature22976.html
Precision leukemia diagnosis with T-ATAC
Enhancer connectome in primary T cells
http://www.nature.com/ng/journal/v49/n11/full/ng.3963.html
http://www.nature.com/nmeth/journal/v13/n11/abs/nmeth.3999.html
Discoveries and opportunities for translation using Vanderbilt’s Gene X Medical Phenome Catalog
Nancy Cox, 11/2/2017
PrediXcan
https://github.com/hakyimlab/PrediXcan
Epigenomics signatures
Joseph Ecker
Organisms with identical genomes can exhibits distinct phenotypes, such as plants (fwa-1 vs. wt), insects, mammals.
DNA methylation dynamics
https://www.ncbi.nlm.nih.gov/pubmed/27203113
http://science.sciencemag.org/content/356/6337/eaaj2239
From genomic variation to molecular mechanism
Jan Korbel
Germline determinants of the somatic mutation landscape in 2,642 cancer genomes https://www.biorxiv.org/content/biorxiv/early/2017/11/01/208330.full.pdf
Structure variant discovery by paired-end sequencing
SVs associated with repetitive DNA: model for inversion information in the human genome. They are extremely difficult to detect, and are overlooked using main-stream NGS.
Strand-Seq
- Strand specific single cell DNA sequencing
- Inverstion mapping and haplotype phasing of whole genomes
- Detecting SVs >50kb
- Illumina, Matepairs, Pacbio, Stran-Seq comparision
Big data in biology
Ewan Birney
Human: the new model organism
- Similar to most other life forms on earch
- Outbred organisms with pretty good genetics
- Huge cohorts, millions of people
- Big (lots of cells)
- Willing participants – they take themselves to hospitals to be phenotyped, and genotyped
- Popular organisms – research into them attracts a lot of funding
- A great model organism for understanding biology – including human disease
- Human is a very impressive organism for discovery
- Massive cohort size
- Well-established phenotyping schemes
- Spill-over effect from healthcare studies and practice
But,
- You can’t fix environment
- You can’t sample its development
- You can’t do whole-organism perturbation experiements
Medaka fish
- Well established model (from 1910s)
- Highly inbred strains
- Excellent genetics
- Genome sequenced
- CRISPR, GFP
- Can inbreed from the wild
Advances in Cancer Immunotheray
Edgar Engleman, 11/1/2017
Bariers to effective immunotherapy
- Tumors contain normal component (self Ag) and immune cells are reluctant to attack self due to:
- “Central tolerance”, deletion of T and B lymphocytes with receptors for self
- “Peripheral tolerance” = checkpoint molecules, targets
- Tumor microenvironment is highly immunosuppressive (TAM, Treg, MDSC, DC, IDO, adenosine, TGF-beta, IL-10, VEGF)
- Tumors not only escape most therapies that target a single tumor component, they hijack the immune system to promote their growth
Early approach: tumor-binding mAbs
- mAbs recognizing tumor associated antigens or molecules that support their growth
- Each mAbs recognizes only a single target
- Trastuzumab (Herceptin) for breast cancer. Cetuximab (Erbitux) for NSCLC. Combined with chemotherapy, only limited improvement.
- cost effecive manufacturing
- widely applicable
- safe and well toleraged
- limited efficacy as monotheray
- mAb-drug conjuages
T cells can recognize and kill tumor cells
- Tumoral CD8+ T and Th1 cells are usually positive prognostic markers for survial
Another early approach: dentritic cells to stimulate T cell mediated anti-tumor immunity
- DC based immunotherapy Circa 1992
- high cost and complexity
- moderate efficacy
Chimeric antigen receptor (CAR) T cells
- recognize highly expressed antigenes independent of MHC
- High potency against hematological malignancies
- Not restricted HLA antigens
- High cost and complexity
- Can be toxic
- No demonstrated efficacy yet towards solid tumors
Immue checkpoint blocker
- PD1, expressed in APC and tumor cells. CTLA-4, only in APC
- Sensitive cancers: melanoma, renal, bladder, lung, gastric, HNSC, ovarian. Other solid tumors are mostly resistent
- Most drastic evidence up to now: Pembrolizumab (anti-PD-1) vs. chemotherapy for PD-L1 positive NSLSC. Reck M. et al. NEHJ 2016
- cost-effective manufacturing
- well-tolerated
- durable response against diverse tumors
- many patients are tumor types are resistent
- tumors with low mutational burdens
- tumors lacking PD-1
- tumors lacking T cell infiltrate (“cold” tumors)
- Autoimmunity
Circulating tumor DNA analysis for cancer detection and monitoring
Max Diehn, 10/19/2017
Introduction
- 175 bp (one nucleosome wrap)
- 5ng/ml of plasma in healthy adults
- In cancer, but also non-malignant conditons, such as inflammation
- mostly from hematopoietic cells (80-90%)
- Half-life: 0.5-2 hours
- dynamic range: 1pg/ml-10ng/ml
Detection methods (Detection limit)
- Sanger sequencing (>10%)
- Pyrosequencing (10%)
- WES (5%)
- WGS (1%)
- Allele-specific qPCR (0.1%)
- Digital PCR & Targeted NGS (0.05%)
- CAPP-Seq (0.00025%)
Development of CAPP-seq
Challenges: limited input molecules,low fractional abundance, inter-patient heterogeneity
- ~300 genes recurrently mutated in NSCLC and SCLC
- Ultra-deep sequencing (>20000x), pre-deduping
- Analytical sensitivity: 0.02%
Improve:
- Increasing number of cfDNA molecules recovered
- Increasing number of mutations covered
- Decreasing background error rare
Molecular barcoding for error repression
Stereotyped errors in cfDNA NGS data
Decreasing sequencing errors in cfDNA sequencing, iDES-enhanced CAPP-Seq
- Barcoding
- Polising
10cc blood 5cc plasma, 30ng cfDNA, 5000 hGEs Analytic sensitivy : generalized 0.002%, personalized 0.00025%
Minimal residual disease (MRD)
Small volumns of tumor cells remaining after treatment in patients who have no clinical evidence of disease
ctDNA MRD detection has been demonstrated in breast and colon cancers. How about lung cancer?
ctDNA is prognostic in node-negative patients. Patients with non-detection MRD cfDNA has longer survial.
ctDNA detections precedes clinical relapse.
Prospect for personalization of postradiotheray adjuvant treatment
- cfDNA negative: none
- ctDNA positive: personlized adjuvant therapy
- Tyrosine kiniase inh
- Immunotherapy Chemotherapy