92,101 research outputs found

Assessing the functional structure of genomic data

Author: Alfarano
Ashburner
Brauer
Brauer
Brem
Brem
Bro
Bulik
C. Huttenhower
Charikar
Chitikila
David
Davierwala
Druzdzel
Eisen
Franke
Gansner
Gasch
Gavin
Giaever
Harbison
Helliwell
Hibbs
Ho
Hughes
Huh
Huttenhower
Huttenhower
Ideker
Jansen
Jelinsky
Karaoz
Kloster
Krogan
Krogan
Lee
Martin
Myers
Myers
Myers
Neapolitan
O'Rourke
O.G. Troyanskaya
Pitkanen
Schawalder
Segal
Spellman
Stark
Tong
Troyanskaya
Yvert
Zhao
Zhu
Publication venue: Oxford University Press
Publication date: 01/01/2008

Motivation: The availability of genome-scale data has enabled an abundance of novel analysis techniques for investigating a variety of systems-level biological relationships. As thousands of such datasets become available, they provide an opportunity to study high-level associations between cellular pathways and processes. This also allows the exploration of shared functional enrichments between diverse biological datasets, and it serves to direct experimenters to areas of low data coverage or with high probability of new discoveries.

CiteSeerX

Crossref

PubMed Central

Evolutionary constraints on the complexity of genetic regulatory networks allow predictions of the total number of genetic interactions

Author: Campos-González Adrian I.
Freyre-González Julio A.
Publication venue: Springer Science and Business Media LLC
Publication date: 21/01/2019

Genetic regulatory networks (GRNs) have been widely studied, yet the final size and properties of these networks remain poorly understood, mainly because no network is currently complete. In this study, we analyzed the distribution of GRN structural properties across a large set of distinct prokaryotic organisms and found a set of constrained characteristics such as network density and number of regulators. Our results allowed us to estimate the number of interactions that complete networks would have, a valuable insight that could aid in the daunting task of network curation, prediction, and validation. Using state-of-the-art statistical approaches, we also provided new evidence to settle a previously stated controversy that raised the possibility of complete biological networks being random, which would attribute the observed scale-free properties to an artifact emerging from the sampling process during network discovery. Furthermore, we identified a set of properties that enabled us to assess the consistency of the connectivity distribution for various GRNs against different alternative statistical distributions. Our results favor the hypothesis that highly connected nodes (hubs) are not a consequence of network incompleteness. Finally, an interaction coverage computed for the GRNs as a proxy for completeness revealed that high-throughput based reconstructions of GRNs could yield biased networks with a low average clustering coefficient, showing that classical targeted discovery of interactions is still needed.
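
The abstract above refers to network density and the average clustering coefficient without defining them. The following is a minimal sketch of the standard textbook definitions, not the authors' pipeline. The function names and the toy edge list are hypothetical, and treating the network as undirected for clustering is an assumption made here for simplicity.

```python
from itertools import combinations


def network_density(n_nodes, n_edges, directed=True):
    """Fraction of possible edges that are present (self-loops excluded)."""
    possible = n_nodes * (n_nodes - 1)
    if not directed:
        possible //= 2
    return n_edges / possible if possible else 0.0


def average_clustering(edges):
    """Mean local clustering coefficient of the undirected version of a graph.

    Nodes with fewer than two neighbors contribute 0 to the average.
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops (e.g. autoregulation)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    total = 0.0
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count neighbor pairs that are themselves connected.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        total += 2 * links / (k * (k - 1))
    return total / len(neighbors) if neighbors else 0.0


# Hypothetical toy network: regulator -> target pairs.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
toy_density = network_density(n_nodes=4, n_edges=len(toy_edges))
toy_clustering = average_clustering(toy_edges)
```

In this toy network, the A-B-C triangle gives A and B a local coefficient of 1, C a coefficient of 1/3, and D a coefficient of 0.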

arXiv.org e-Print Archive

Directory of Open Access Journals

University of Queensland eSpace

A Quadratically Regularized Functional Canonical Correlation Analysis for Identifying the Global Structure of Pleiotropy with NGS Data

Author: Fan Ruzong
Lin Nan
Xiong Momiao
Zhu Yun
Publication venue: Public Library of Science (PLoS)
Publication date: 15/09/2016

Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotypes and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representations and features from high dimensional genotype and phenotype data. To explore multiple levels of representations of genetic variants, learn their internal patterns involved in disease development, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we proposed a new framework referred to as a quadratically regularized functional CCA (QRFCCA) for association analysis, which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has a much higher power than that of the nine competing statistics while retaining appropriate type 1 error rates. To further evaluate performance, the QRFCCA and nine other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that the QRFCCA substantially outperforms the nine other statistics.

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

The how and why of lncRNA function: An innate immune perspective.

Author: Carpenter Susan
Covarrubias Sergio
Robinson Elektra K
Publication venue: eScholarship, University of California
Publication date: 01/04/2020

Next-generation sequencing has provided a more complete picture of the composition of the human transcriptome, indicating that much of the "blueprint" is a vastness of poorly understood non-protein-coding transcripts. This includes a newly identified class of genes called long noncoding RNAs (lncRNAs). The lack of sequence conservation for lncRNAs across species meant that their biological importance was initially met with some skepticism. LncRNAs mediate their functions through interactions with proteins, RNA, DNA, or a combination of these. Their functions can often be dictated by their localization, sequence, and/or secondary structure. Here we provide a review of the approaches typically adopted to study the complexity of these genes, with an emphasis on recent discoveries within the innate immune field. Finally, we discuss the challenges, as well as the emergence of new technologies that will continue to move this field forward and provide greater insight into the biological importance of this class of genes. This article is part of a Special Issue entitled: ncRNA in control of gene expression, edited by Kotb Abdelmohsen.

eScholarship - University of California

Gene silencing and large-scale domain structure of the E. coli genome

Author: Lagomarsino Marco Cosentino
Sclavi Bianca
Zarei Mina
Publication date: 28/09/2012

The H-NS chromosome-organizing protein in E. coli can stabilize genomic DNA loops and form oligomeric structures connected to repression of gene expression. Motivated by the link between chromosome organization, protein binding and gene expression, we analyzed publicly available genomic data sets of various origins, from genome-wide protein binding profiles to evolutionary information, exploring the connections between chromosomal organization, gene silencing, pseudo-gene localization and horizontal gene transfer. We report the existence of transcriptionally silent contiguous areas corresponding to large regions of H-NS protein binding along the genome; their position indicates a possible relationship with the known large-scale features of chromosome organization.

arXiv.org e-Print Archive

Predicting taxonomic and functional structure of microbial communities in acid mine drainage.

Author: Chen Linxing
He Zhili
Hua Zhengshuang
Huang Linan
Jia Pu
Kuang Jialiang
Li Jintian
Li Shengjin
Liu Jun
Shu Wensheng
Zhou Jizhong
Publication venue: eScholarship, University of California
Publication date: 01/06/2016

Predicting the dynamics of community composition and functional attributes responding to environmental changes is an essential goal in community ecology but remains a major challenge, particularly in microbial ecology. Here, by targeting a model system with low species richness, we explore the spatial distribution of taxonomic and functional structure of 40 acid mine drainage (AMD) microbial communities across Southeast China profiled by 16S ribosomal RNA pyrosequencing and a comprehensive microarray (GeoChip). Similar environmentally dependent patterns of dominant microbial lineages and key functional genes were observed regardless of the large-scale geographical isolation. Functional and phylogenetic β-diversities were significantly correlated, whereas functional metabolic potentials were strongly influenced by environmental conditions and community taxonomic structure. Using advanced modeling approaches based on artificial neural networks, we successfully predicted the taxonomic and functional dynamics with significantly higher prediction accuracies of metabolic potentials (average Bray-Curtis similarity 87.8) as compared with relative microbial abundances (similarity 66.8), implying that natural AMD microbial assemblages may be better predicted at the functional gene level than at the taxonomic level. Furthermore, relative metabolic potentials of genes involved in many key ecological functions (for example, nitrogen and phosphate utilization, metal resistance and stress response) were extrapolated to increase under more acidic and metal-rich conditions, indicating a critical strategy of stress adaptation in these extraordinary communities.
Collectively, our findings indicate that natural selection rather than geographic distance has a more crucial role in shaping the taxonomic and functional patterns of the AMD microbial community, that these patterns are readily predicted by modeling methods, and that the model-based approach is essential to better understand natural acidophilic microbial communities.
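
The prediction accuracies above are reported as Bray-Curtis similarities. As a minimal sketch of how such a score is commonly computed between a predicted and an observed abundance profile, the standard definition is one minus the Bray-Curtis dissimilarity. The function name and example vectors are hypothetical, and the assumption that the abstract's values (87.8, 66.8) are on a percentage scale is not confirmed by the source.

```python
def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity between two non-negative abundance vectors.

    similarity = 1 - sum(|x_i - y_i|) / sum(x_i + y_i)
    Returns a value in [0, 1]; multiply by 100 for a percentage.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = sum(x) + sum(y)
    if total == 0:
        return 1.0  # two empty profiles are treated as identical
    diff = sum(abs(a - b) for a, b in zip(x, y))
    return 1.0 - diff / total


# Hypothetical observed vs. predicted abundances for two taxa.
observed = [6, 4]
predicted = [4, 6]
similarity = bray_curtis_similarity(observed, predicted)
```

For these vectors the summed absolute difference is 4 and the combined total is 20, so the similarity is 0.8.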

eScholarship - University of California

Missense-depleted regions in population exomes implicate ras superfamily nucleotide-binding protein alteration in patients with brain malformation.

Author: Dumas Kevin
Ge Xiaoyan
Gong Henry
Grody Wayne W
Hendriks Yvonne
Kwok Pui-Yan
Lee Hane
Litwin Jessica
Nelson Stanley F
Phillips Joanna J
Shieh Joseph Tc
Stuurman Kyra E
Waisfisz Quinten
Weiss Marjan M
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

Genomic sequence interpretation can miss clinically relevant missense variants for several reasons. Rare missense variants are numerous in the exome and difficult to prioritise. Affected genes may also lack an existing disease association. To improve variant prioritisation, we leverage population exome data to identify intragenic missense-depleted regions (MDRs) genome-wide that may be important in disease. We then use missense depletion analyses to help prioritise undiagnosed disease exome variants. We demonstrate application of this strategy to identify a novel gene association for human brain malformation. We identified de novo missense variants that affect the GDP/GTP-binding site of ARF1 in three unrelated patients. Corresponding functional analysis suggests ARF1 GDP/GTP-activation is affected by the specific missense mutations associated with heterotopia. These findings expand the genetic pathway underpinning neurologic disease that classically includes FLNA. ARF1 along with ARFGEF2 add further evidence implicating ARF/GEFs in the brain. Using functional ontology, top MDR-containing genes were highly enriched for nucleotide-binding function, suggesting these may be candidates for human disease. Routine consideration of MDRs in the interpretation of exome data for rare diseases may help identify strong genetic factors for many severe conditions, infertility/reduction in reproductive capability, and embryonic conditions contributing to preterm loss.

eScholarship - University of California

Principles for the post-GWAS functional characterisation of risk loci

Author: Alvaro N. A. Monteiro
Angela Risch
Chris Carlson
Christoph Plass
Dave Duggan
Gerhard A. Coetzee
Graham Casey
Haris G. Vikis
Ian G. Mills
Jay W. Tichelaar
Mariella De Biasi
Matthew L. Freedman
Michael James
Ming You
Pengyuan Liu
Simon A. Gayther
Publication date: 01/11/2010

Several challenges lie ahead in assigning functionality to susceptibility SNPs. For example, most effect sizes are small relative to effects seen in monogenic diseases, with per allele odds ratios usually ranging from 1.15 to 1.3. It is unclear whether current molecular biology methods have enough resolution to differentiate such small effects. Our objective here is therefore to provide a set of recommendations to optimize the allocation of effort and resources in order to maximize the chances of elucidating the functional contribution of specific loci to the disease phenotype. It has been estimated that 88% of currently identified disease-associated SNPs are intronic or intergenic. Thus, in this paper we will focus our attention on the analysis of non-coding variants and outline a hierarchical approach for post-GWAS functional studies.
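
To make the scale of a per allele odds ratio concrete, the sketch below computes one from a 2x2 table of allele counts in cases and controls, using the standard cross-product formula. It is illustrative only: the function name and all counts are hypothetical and are not taken from the paper.

```python
def per_allele_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Allelic odds ratio from a 2x2 table of allele counts.

    OR = (case_risk / case_other) / (control_risk / control_other)
    """
    if min(case_other, control_risk, control_other) <= 0 or case_risk < 0:
        raise ValueError("counts must be positive (case_risk may be zero)")
    return (case_risk * control_other) / (case_other * control_risk)


# Hypothetical counts: 1,000 cases and 1,000 controls, two alleles each.
odds_ratio = per_allele_odds_ratio(
    case_risk=600, case_other=1400,
    control_risk=500, control_other=1500,
)
```

Here the risk allele frequency is 30% in cases and 25% in controls, which gives an odds ratio of about 1.29, inside the 1.15 to 1.3 range quoted above.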

Crossref

Nature Precedings

Automated data integration for developmental biological research

Author: Sternberg Paul W.
Zhong Weiwei
Publication venue: The Company of Biologists
Publication date: 15/09/2007

In an era exploding with genome-scale data, a major challenge for developmental biologists is how to extract significant clues from these publicly available data to benefit our studies of individual genes, and how to use them to improve our understanding of development at a systems level. Several studies have successfully demonstrated new approaches to classic developmental questions by computationally integrating various genome-wide data sets. Such computational approaches have shown great potential for facilitating research: instead of testing 20,000 genes, researchers might test 200 to the same effect. We discuss the nature and state of this art as it applies to developmental research.

Caltech Authors