    ISMB/ECCB 2015

    Allele specific expression analysis in bovine muscle tissue.

    Imprinted genes have been the target of many studies, mainly in human and mouse, and lately in bovines, owing to interest in understanding the epigenetic mechanisms underlying important meat quality phenotypes and the possibility of applying this knowledge in animal breeding programs in the future. Genomic DNA from 146 steers was genotyped using the Illumina BovineHD BeadChip in order to identify heterozygous individuals with known allele origin. ISMB/ECCB 2015. Poster D17

    SNPs and INDELs in genes involved in lipid metabolism of mammary gland of Zebu breeds identified by whole genome sequencing.

    In this context, the objective of this study was to sequence and map the genomes of three Guzerá bulls and three Gir bulls in order to identify zebu-specific variations involved in the lipid metabolism of the mammary gland. ISMB/ECCB 2015. Poster G28

    Identification of causal genes for complex traits.

    Motivation: Although genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases and complex traits, only a handful of these variants are validated to be causal. We consider 'causal variants' as variants which are responsible for the association signal at a locus. As opposed to association studies that benefit from linkage disequilibrium (LD), the main challenge in identifying causal variants at associated loci lies in distinguishing among the many closely correlated variants due to LD. This is particularly important for model organisms such as inbred mice, where LD extends much further than in human populations, resulting in large stretches of the genome with significantly associated variants. Furthermore, these model organisms are highly structured and require correction for population structure to remove potential spurious associations. Results: In this work, we propose CAVIAR-Gene (CAusal Variants Identification in Associated Regions), a novel method that is able to operate across large LD regions of the genome while also correcting for population structure. A key feature of our approach is that it provides as output a minimally sized set of genes that captures the genes which harbor causal variants with probability ρ. Through extensive simulations, we demonstrate that our method not only speeds up computation, but also has an average of 10% higher recall rate compared with the existing approaches. We validate our method using real mouse high-density lipoprotein (HDL) data and show that CAVIAR-Gene is able to identify Apoa2 (a gene known to harbor causal variants for HDL), while reducing the number of genes that need to be tested for functionality by a factor of 2. Availability and implementation: Software is freely available for download at genetics.cs.ucla.edu/caviar
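
The idea of a minimally sized gene set that captures the causal genes with probability ρ can be illustrated with a small sketch. This is not the CAVIAR-Gene algorithm itself: it assumes, as a simplification, that each gene already has a posterior probability of being the single causal gene at the locus (probabilities are mutually exclusive and sum to 1). The function name `minimal_gene_set` and all gene names other than Apoa2, as well as all probability values, are hypothetical.

```python
def minimal_gene_set(gene_posteriors, rho):
    """Return the smallest set of genes whose total posterior mass is >= rho.

    Assumption (not from the abstract): gene_posteriors maps each gene to the
    probability that it is the single causal gene, so the values sum to 1.
    Under that assumption, taking genes in descending order of probability
    yields a set of minimal size.
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must be in (0, 1]")

    # Rank genes from most to least probable.
    ranked = sorted(gene_posteriors.items(), key=lambda kv: kv[1], reverse=True)

    selected, total = [], 0.0
    for gene, prob in ranked:
        if total >= rho:
            break  # the set already captures the requested probability mass
        selected.append(gene)
        total += prob
    return selected, total


if __name__ == "__main__":
    # Hypothetical posteriors for four genes in one associated region.
    posteriors = {"Apoa2": 0.55, "GeneB": 0.25, "GeneC": 0.15, "GeneD": 0.05}
    genes, mass = minimal_gene_set(posteriors, rho=0.9)
    print(genes, round(mass, 2))
```

A larger ρ gives a larger set with more confidence that no causal gene is missed; a smaller ρ gives fewer genes to follow up experimentally.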

    Gene network inference by fusing data from diverse distributions

    Markov networks are undirected graphical models that are widely used to infer relations between genes from experimental data. Their state-of-the-art inference procedures assume the data arise from a Gaussian distribution. High-throughput omics data, such as those from next-generation sequencing, often violate this assumption. Furthermore, when collected data arise from multiple related but otherwise nonidentical distributions, their underlying networks are likely to have common features. New principled statistical approaches are needed that can deal with different data distributions and jointly consider collections of datasets. We present FuseNet, a Markov network formulation that infers networks from a collection of nonidentically distributed datasets. Our approach is computationally efficient and general: given any number of distributions from an exponential family, FuseNet represents model parameters through shared latent factors that define neighborhoods of network nodes. In a simulation study, we demonstrate good predictive performance of FuseNet in comparison to several popular graphical models. We show its effectiveness in an application to breast cancer RNA-sequencing and somatic mutation data, a novel application of graphical models. Fusion of datasets offers substantial gains relative to inference of separate networks for each dataset. Our results demonstrate that network inference methods for non-Gaussian data can help in accurate modeling of the data generated by emergent high-throughput technologies.
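
For context, the Gaussian baseline that the abstract contrasts with can be sketched as follows. Under a multivariate Gaussian, two variables are conditionally independent given the rest exactly when the corresponding entry of the precision (inverse covariance) matrix is zero, so edges can be read off from partial correlations. This is a minimal sketch of that classical approach, not of FuseNet; the function names and the `threshold` parameter are hypothetical choices for illustration, and the plain matrix inverse assumes more samples than variables.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation matrix from samples (rows) by variables (columns)."""
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)  # assumes cov is well conditioned
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def gaussian_markov_edges(data, threshold=0.2):
    """List undirected edges (i, j), i < j, with |partial correlation| > threshold.

    The hard threshold is an illustrative stand-in for a proper sparsity
    penalty or significance test.
    """
    pcor = partial_correlations(data)
    n = pcor.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(pcor[i, j]) > threshold]


if __name__ == "__main__":
    # Simulated chain x0 -> x1 -> x2: x0 and x2 are independent given x1.
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=5000)
    x1 = x0 + rng.normal(size=5000)
    x2 = x1 + rng.normal(size=5000)
    print(gaussian_markov_edges(np.column_stack([x0, x1, x2])))
```

Count data or binary mutation calls do not follow this model, which is the limitation the abstract addresses with exponential-family distributions.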

    Comparing genomes with rearrangements and segmental duplications

    Motivation: Large-scale evolutionary events such as genomic rearrangements and segmental duplications form an important part of the evolution of genomes and are widely studied from both biological and computational perspectives. A basic computational problem is to infer these events in the evolutionary history for given modern genomes, a task for which many algorithms have been proposed under various constraints. Algorithms that can handle both rearrangements and content-modifying events such as duplications and losses remain few and limited in their applicability. Results: We study the comparison of two genomes under a model including general rearrangements (through double-cut-and-join) and segmental duplications. We formulate the comparison as an optimization problem and describe an exact algorithm to solve it by using an integer linear program. We also devise a sufficient condition and an efficient algorithm to identify optimal substructures, which can simplify the problem while preserving optimality. Using the optimal substructures with the integer linear program (ILP) formulation yields a practical and exact algorithm to solve the problem. We then apply our algorithm to assign in-paralogs and orthologs (a necessary step in handling duplications) and compare its performance with that of the state-of-the-art method MSOAR, using both simulations and real data. On simulated datasets, our method outperforms MSOAR by a significant margin. On five well-annotated species, MSOAR achieves high accuracy, yet our method performs slightly better on each of the 10 pairwise comparisons.
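
As background for the double-cut-and-join (DCJ) model, the sketch below computes the DCJ distance in the simplest setting only: two circular, single-chromosome genomes with the same genes and no duplicates, where the distance is N − C (N genes, C cycles in the adjacency graph). It does not handle segmental duplications and is not the ILP formulation described in the abstract. Genomes are given as lists of signed integers; this encoding and the function names are assumptions made for illustration.

```python
def extremities(gene):
    """Return (left, right) extremities of a signed gene as it is read."""
    g = abs(gene)
    # A gene read forward runs tail -> head; read in reverse, head -> tail.
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacency_map(genome):
    """Map each extremity to its neighbor across an adjacency (circular genome)."""
    neighbor = {}
    n = len(genome)
    for i in range(n):
        _, right = extremities(genome[i])
        left, _ = extremities(genome[(i + 1) % n])
        neighbor[right] = left
        neighbor[left] = right
    return neighbor


def dcj_distance(genome_a, genome_b):
    """DCJ distance N - C for circular genomes with identical, unique genes."""
    genes_a = sorted(abs(g) for g in genome_a)
    genes_b = sorted(abs(g) for g in genome_b)
    if genes_a != genes_b or len(set(genes_a)) != len(genes_a):
        raise ValueError("genomes must contain the same genes, each exactly once")

    adj_a, adj_b = adjacency_map(genome_a), adjacency_map(genome_b)
    seen, cycles = set(), 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between adjacencies of A and of B until the cycle closes.
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return len(genome_a) - cycles


if __name__ == "__main__":
    print(dcj_distance([1, 2, 3], [1, -2, 3]))  # one inversion
```

Allowing duplicated genes makes the matching between gene copies part of the optimization, which is why the abstract turns to an exact ILP instead of a closed-form count.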

    Modeling Functional Modules Using Statistical and Machine Learning Methods

    Understanding the aspects of cell functionality that account for disease or drug action mechanisms is the main challenge for precision medicine. In spite of the increasing availability of genomic and transcriptomic data, there is still a gap between the detection of perturbations in gene expression and the understanding of their contribution to the molecular mechanisms that ultimately account for the phenotype studied. Over the last decade, different computational and mathematical models have been proposed for pathway analysis. However, they do not take into account the dynamic mechanisms within pathways, as represented by their layout and the interactions between genes and proteins. In this thesis, I present two slightly different mathematical models that integrate human transcriptomic data with prior knowledge of signalling and metabolic pathways to estimate Mechanistic Pathway Activities (MPAs). MPAs are continuous, individual-level values that can be used with machine learning and statistical methods to determine biomarkers for the early diagnosis and subtype classification of diseases, and also to suggest potential therapeutic targets for individualized therapeutic interventions. The overall objective is to develop new and advanced systems biology approaches that propose functional hypotheses to help us understand and interpret the complex mechanisms of diseases. Understanding these mechanisms is crucial for robust personalized drug treatments and for predicting clinical outcomes. First, I contributed to the development of a method designed to extract elementary sub-pathways from a signalling pathway and to estimate their activity. Second, this algorithm was adapted to metabolic modules and implemented as a web tool. Third, the method was used to reveal a pan-cancer metabolic landscape.
    In this study, I analyzed the metabolic module profile of 25 different cancer types, and the method was also validated using different computational and experimental approaches. Each method developed in this thesis was benchmarked against existing similar methods, evaluated for sensitivity and specificity, experimentally validated where possible, and used to predict clinical outcomes of different cancer types. The research described in this thesis and the results obtained were published in different systems biology and cancer-related peer-reviewed journals and also in national newspapers.

    Machine learning for large and small data biomedical discovery

    In modern biomedicine, the role of computation is becoming more crucial in light of the ever-increasing growth of biological data, which requires effective computational methods to integrate the data in a meaningful way and unveil previously undiscovered biological insights. In this dissertation, we introduce a series of machine learning algorithms for biomedical discovery. Focused on protein functions in the context of systems biology, these machine learning algorithms learn representations of protein sequences, structures, and networks in both the small- and large-data scenarios. First, we present a deep learning model that learns representations of protein sequences integrating evolutionary context and helps discover protein variants with enhanced functions in protein engineering. Second, we describe a geometric deep learning model that learns representations of protein and compound structures to inform the prediction of protein-compound binding affinity. Third, we introduce a machine learning algorithm that integrates heterogeneous networks by learning compact network representations and achieves drug repurposing by predicting novel drug-target interactions. We also present new scientific discoveries enabled by these machine learning algorithms. Taken together, this dissertation demonstrates the potential of machine learning to address the small- and large-data challenges of biomedical data and transform data into actionable insights and new discoveries.