92,101 research outputs found
Assessing the functional structure of genomic data
Motivation: The availability of genome-scale data has enabled an abundance of novel analysis techniques for investigating a variety of systems-level biological relationships. As thousands of such datasets become available, they provide an opportunity to study high-level associations between cellular pathways and processes. This also allows the exploration of shared functional enrichments between diverse biological datasets, and it serves to direct experimenters to areas of low data coverage or with high probability of new discoveries
Evolutionary constraints on the complexity of genetic regulatory networks allow predictions of the total number of genetic interactions
Genetic regulatory networks (GRNs) have been widely studied, yet there is a
lack of understanding with regards to the final size and properties of these
networks, mainly due to no network currently being complete. In this study, we
analyzed the distribution of GRN structural properties across a large set of
distinct prokaryotic organisms and found a set of constrained characteristics
such as network density and number of regulators. Our results allowed us to
estimate the number of interactions that complete networks would have, a
valuable insight that could aid in the daunting task of network curation,
prediction, and validation. Using state-of-the-art statistical approaches, we
also provided new evidence to settle a previously stated controversy that
raised the possibility of complete biological networks being random and
therefore attributing the observed scale-free properties to an artifact
emerging from the sampling process during network discovery. Furthermore, we
identified a set of properties that enabled us to assess the consistency of the
connectivity distribution for various GRNs against different alternative
statistical distributions. Our results favor the hypothesis that highly
connected nodes (hubs) are not a consequence of network incompleteness.
Finally, an interaction coverage computed for the GRNs as a proxy for
completeness revealed that high-throughput based reconstructions of GRNs could
yield biased networks with a low average clustering coefficient, showing that
classical targeted discovery of interactions is still needed.Comment: 28 pages, 5 figures, 12 pages supplementary informatio
A Quadratically Regularized Functional Canonical Correlation Analysis for Identifying the Global Structure of Pleiotropy with NGS Data
Investigating the pleiotropic effects of genetic variants can increase
statistical power, provide important information to achieve deep understanding
of the complex genetic structures of disease, and offer powerful tools for
designing effective treatments with fewer side effects. However, the current
multiple phenotype association analysis paradigm lacks breadth (number of
phenotypes and genetic variants jointly analyzed at the same time) and depth
(hierarchical structure of phenotype and genotypes). A key issue for high
dimensional pleiotropic analysis is to effectively extract informative internal
representation and features from high dimensional genotype and phenotype data.
To explore multiple levels of representations of genetic variants, learn their
internal patterns involved in the disease development, and overcome critical
barriers in advancing the development of novel statistical methods and
computational algorithms for genetic pleiotropic analysis, we proposed a new
framework referred to as a quadratically regularized functional CCA (QRFCCA)
for association analysis which combines three approaches: (1) quadratically
regularized matrix factorization, (2) functional data analysis and (3)
canonical correlation analysis (CCA). Large-scale simulations show that the
QRFCCA has a much higher power than that of the nine competing statistics while
retaining the appropriate type 1 errors. To further evaluate performance, the
QRFCCA and nine other statistics are applied to the whole genome sequencing
dataset from the TwinsUK study. We identify a total of 79 genes with rare
variants and 67 genes with common variants significantly associated with the 46
traits using QRFCCA. The results show that the QRFCCA substantially outperforms
the nine other statistics.Comment: 64 pages including 12 figure
Recommended from our members
The how and why of lncRNA function: An innate immune perspective.
Next-generation sequencing has provided a more complete picture of the composition of the human transcriptome indicating that much of the "blueprint" is a vastness of poorly understood non-protein-coding transcripts. This includes a newly identified class of genes called long noncoding RNAs (lncRNAs). The lack of sequence conservation for lncRNAs across species meant that their biological importance was initially met with some skepticism. LncRNAs mediate their functions through interactions with proteins, RNA, DNA, or a combination of these. Their functions can often be dictated by their localization, sequence, and/or secondary structure. Here we provide a review of the approaches typically adopted to study the complexity of these genes with an emphasis on recent discoveries within the innate immune field. Finally, we discuss the challenges, as well as the emergence of new technologies that will continue to move this field forward and provide greater insight into the biological importance of this class of genes. This article is part of a Special Issue entitled: ncRNA in control of gene expression edited by Kotb Abdelmohsen
Gene silencing and large-scale domain structure of the E. coli genome
The H-NS chromosome-organizing protein in E. coli can stabilize genomic DNA
loops, and form oligomeric structures connected to repression of gene
expression. Motivated by the link between chromosome organization, protein
binding and gene expression, we analyzed publicly available genomic data sets
of various origins, from genome-wide protein binding profiles to evolutionary
information, exploring the connections between chromosomal organization,
genesilencing, pseudo-gene localization and horizontal gene transfer. We report
the existence of transcriptionally silent contiguous areas corresponding to
large regions of H-NS protein binding along the genome, their position
indicates a possible relationship with the known large-scale features of
chromosome organization
Recommended from our members
Predicting taxonomic and functional structure of microbial communities in acid mine drainage.
Predicting the dynamics of community composition and functional attributes responding to environmental changes is an essential goal in community ecology but remains a major challenge, particularly in microbial ecology. Here, by targeting a model system with low species richness, we explore the spatial distribution of taxonomic and functional structure of 40 acid mine drainage (AMD) microbial communities across Southeast China profiled by 16S ribosomal RNA pyrosequencing and a comprehensive microarray (GeoChip). Similar environmentally dependent patterns of dominant microbial lineages and key functional genes were observed regardless of the large-scale geographical isolation. Functional and phylogenetic β-diversities were significantly correlated, whereas functional metabolic potentials were strongly influenced by environmental conditions and community taxonomic structure. Using advanced modeling approaches based on artificial neural networks, we successfully predicted the taxonomic and functional dynamics with significantly higher prediction accuracies of metabolic potentials (average Bray-Curtis similarity 87.8) as compared with relative microbial abundances (similarity 66.8), implying that natural AMD microbial assemblages may be better predicted at the functional genes level rather than at taxonomic level. Furthermore, relative metabolic potentials of genes involved in many key ecological functions (for example, nitrogen and phosphate utilization, metals resistance and stress response) were extrapolated to increase under more acidic and metal-rich conditions, indicating a critical strategy of stress adaptation in these extraordinary communities. Collectively, our findings indicate that natural selection rather than geographic distance has a more crucial role in shaping the taxonomic and functional patterns of AMD microbial community that readily predicted by modeling methods and suggest that the model-based approach is essential to better understand natural acidophilic microbial communities
Missense-depleted regions in population exomes implicate ras superfamily nucleotide-binding protein alteration in patients with brain malformation.
Genomic sequence interpretation can miss clinically relevant missense variants for several reasons. Rare missense variants are numerous in the exome and difficult to prioritise. Affected genes may also not have existing disease association. To improve variant prioritisation, we leverage population exome data to identify intragenic missense-depleted regions (MDRs) genome-wide that may be important in disease. We then use missense depletion analyses to help prioritise undiagnosed disease exome variants. We demonstrate application of this strategy to identify a novel gene association for human brain malformation. We identified de novo missense variants that affect the GDP/GTP-binding site of ARF1 in three unrelated patients. Corresponding functional analysis suggests ARF1 GDP/GTP-activation is affected by the specific missense mutations associated with heterotopia. These findings expand the genetic pathway underpinning neurologic disease that classically includes FLNA. ARF1 along with ARFGEF2 add further evidence implicating ARF/GEFs in the brain. Using functional ontology, top MDR-containing genes were highly enriched for nucleotide-binding function, suggesting these may be candidates for human disease. Routine consideration of MDR in the interpretation of exome data for rare diseases may help identify strong genetic factors for many severe conditions, infertility/reduction in reproductive capability, and embryonic conditions contributing to preterm loss
Principles for the post-GWAS functional characterisation of risk loci
Several challenges lie ahead in assigning functionality to susceptibility SNPs. For example, most effect sizes are small relative to effects seen in monogenic diseases, with per allele odds ratios usually ranging from 1.15 to 1.3. It is unclear whether current molecular biology methods have enough resolution to differentiate such small effects. Our objective here is therefore to provide a set of recommendations to optimize the allocation of effort and resources in order maximize the chances of elucidating the functional contribution of specific loci to the disease phenotype. It has been estimated that 88% of currently identified disease-associated SNP are intronic or intergenic. Thus, in this paper we will focus our attention on the analysis of non-coding variants and outline a hierarchical approach for post-GWAS functional studies
Automated data integration for developmental biological research
In an era exploding with genome-scale data, a major challenge for developmental biologists is how to extract significant clues from these publicly available data to benefit our studies of individual genes, and how to use them to improve our understanding of development at a systems level. Several studies have successfully demonstrated new approaches to classic developmental questions by computationally integrating various genome-wide data sets. Such computational approaches have shown great potential for facilitating research: instead of testing 20,000 genes, researchers might test 200 to the same effect. We discuss the nature and state of this art as it applies to developmental research
- …