An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics
For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program leads to the generation of TCGA Clinical Data Resource, which provides recommendations of clinical outcome endpoint usage for 33 cancer types
Oxygen abundance in local disk and bulge: chemical evolution with a strictly universal IMF
The empirical differential oxygen abundance distribution (EDOD) is deduced
from subsamples related to two different samples involving solar neighbourhood
(SN) thick disk, thin disk, halo, and bulge stars. The EDOD of the SN thick +
thin disk is determined by weighting the mass, for an assumed SN thick to thin
disk mass ratio within the range 0.1-0.9. Inhomogeneous models of chemical
evolution for the SN thick disk, the SN thin disk, the SN thick + thin disk,
the SN halo, and the bulge, are computed assuming the instantaneous recycling
approximation. The EDOD data are fitted, to an acceptable extent, by their theoretical (TDOD)
counterparts provided (i) still undetected, low-oxygen abundance thin disk
stars exist, and (ii) a single oxygen overabundant star is removed from a thin
disk subsample. In any case, the (assumed power-law) stellar initial mass
function (IMF) is universal but gas can be inhibited from, or enhanced in,
forming stars at different rates with respect to a selected reference case.
Models involving a strictly universal IMF (i.e. gas neither inhibited from, nor
enhanced in, forming stars with respect to a selected reference case) can also
reproduce the data. A strictly universal IMF implies similar
chemical enrichment within active (i.e. undergoing star formation) regions
placed in different environments, but an increasing probability of a region being
active in passing from the SN halo to the SN thick + thin disk, SN thin disk, SN thick
disk, and bulge. On the basis of the results, it is realized that the chemical
evolution of the SN thick + thin disk as a whole cannot be excluded.
Comment: 26 pages, 10 tables, and 5 figures; tables too large for one page are split
into two parts in Appendix B; Sects. 4 and 5 rewritten for better understanding
of the results; further references added. Accepted for publication in
Astrophysics & Space Science
Quantitative analysis of performance on a progressive-ratio schedule: effects of reinforcer type, food deprivation and acute treatment with Δ⁹-tetrahydrocannabinol (THC)
Rats’ performance on a progressive-ratio schedule maintained by sucrose (0.6 M, 50 μl) and corn oil (100%, 25 μl) reinforcers was assessed using a model derived from Killeen’s (1994) theory of schedule-controlled behaviour, ‘Mathematical Principles of Reinforcement’. When the rats were maintained at 80% of their free-feeding body weights, the parameter expressing incentive value, a, was greater for the corn oil than for the sucrose reinforcer; the response-time parameter, δ, did not differ between the reinforcer types, but a parameter derived from the linear waiting principle (Tₒ) indicated that the minimum post-reinforcement pause was longer for corn oil than for sucrose. When the rats were maintained under free-feeding conditions, a was reduced, indicating a reduction of incentive value, but δ was unaltered. Under the food-deprived condition, the CB1 cannabinoid receptor agonist Δ⁹-tetrahydrocannabinol (THC: 0.3, 1 and 3 mg kg⁻¹) increased the value of sucrose; none of the other parameters was affected by THC. The results provide new information about the sensitivity of the model’s parameters to deprivation and reinforcer quality, and suggest that THC selectively enhances the incentive value of sucrose
Driver Fusions and Their Implications in the Development and Treatment of Human Cancers.
Gene fusions represent an important class of somatic alterations in cancer. We systematically investigated fusions in 9,624 tumors across 33 cancer types using multiple fusion calling tools. We identified a total of 25,664 fusions, with a 63% validation rate. Integration of gene expression, copy number, and fusion annotation data revealed that fusions involving oncogenes tend to exhibit increased expression, whereas fusions involving tumor suppressors have the opposite effect. For fusions involving kinases, we found 1,275 with an intact kinase domain, the proportion of which varied significantly across cancer types. Our study suggests that fusions drive the development of 16.5% of cancer cases and function as the sole driver in more than 1% of them. Finally, we identified druggable fusions involving genes such as TMPRSS2, RET, FGFR3, ALK, and ESR1 in 6.0% of cases, and we predicted immunogenic peptides, suggesting that fusions may provide leads for targeted drug and immune therapy
A draft human pangenome reference
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample
Biodiversity recovery of Neotropical secondary forests
Old-growth tropical forests harbor an immense diversity of tree species but are rapidly being cleared, while secondary forests that regrow on abandoned agricultural lands increase in extent. We assess how tree species richness and composition recover during secondary succession across gradients in environmental conditions and anthropogenic disturbance in an unprecedented multisite analysis for the Neotropics. Secondary forests recover remarkably fast in species richness but slowly in species composition. Secondary forests take a median time of five decades to recover the species richness of old-growth forest (80% recovery after 20 years) based on rarefaction analysis. Full recovery of species composition takes centuries (only 34% recovery after 20 years). A dual strategy that maintains both old-growth forests and species-rich secondary forests is therefore crucial for biodiversity conservation in human-modified tropical landscapes. Copyright © 2019 The Authors, some rights reserved
Mapping and characterization of structural variation in 17,795 human genomes
A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0–11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing
Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines
The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects