A Methodological Framework for the Reconstruction of Contiguous Regions of Ancestral Genomes and Its Application to Mammalian Genomes
The reconstruction of ancestral genome architectures and gene orders from homologies between extant species is a long-standing problem, considered by both cytogeneticists and bioinformaticians. A comparison of the two approaches was recently investigated and discussed in a series of papers, sometimes with diverging points of view regarding the performance of these two approaches. We describe a general methodological framework for reconstructing ancestral genome segments from conserved syntenies in extant genomes. We show that this problem, from a computational point of view, is naturally related to physical mapping of chromosomes and benefits from combinatorial tools developed for that purpose. We develop this framework into a new reconstruction method considering conserved gene clusters with similar gene content, mimicking principles used in most cytogenetic studies, although on a different kind of data. We implement and apply it to datasets of mammalian genomes. We perform intensive theoretical and experimental comparisons with other bioinformatics methods for ancestral genome segment reconstruction. We show that the method we propose is stable and reliable: it gives convergent results using several kinds of data at different levels of resolution, and all predicted ancestral regions are well supported. The results eventually come very close to those of cytogenetic studies. This suggests that the comparison of methods for ancestral genome reconstruction should include the algorithmic aspects of the methods as well as the disciplinary differences in data acquisition.
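The connection to physical mapping that the abstract mentions rests on the consecutive-ones property (C1P): a binary marker-by-group matrix admits a valid ancestral segment ordering if some column permutation makes every row's 1-entries contiguous. The sketch below is a deliberately naive brute-force check on toy data (the data and names are illustrative, not from the paper; real instances use linear-time PQ-tree algorithms):

```python
from itertools import permutations

def _row_contiguous(row):
    """True if the 1-entries of a binary row occupy consecutive positions."""
    ones = [i for i, v in enumerate(row) if v == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)

def has_consecutive_ones_property(matrix):
    """Brute-force C1P test: try every column order. Exponential in the
    number of columns, so only suitable for tiny illustrative matrices."""
    n_cols = len(matrix[0])
    return any(
        all(_row_contiguous([row[c] for c in order]) for row in matrix)
        for order in permutations(range(n_cols))
    )

# Rows = hypothetical syntenic groups, columns = genomic markers.
m_good = [[1, 1, 0], [0, 1, 1]]              # orderable: C1P holds
m_bad = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]    # three pairwise adjacencies can't fit on a line
print(has_consecutive_ones_property(m_good))  # True
print(has_consecutive_ones_property(m_bad))   # False
```

The failing matrix illustrates why conflicting syntenies force an ancestral region prediction to be discarded or split.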
Explainable AI for Bioinformatics: Methods, Tools, and Applications
Artificial intelligence (AI) systems based on deep neural networks (DNNs) and
machine learning (ML) algorithms are increasingly used to solve critical
problems in bioinformatics, biomedical informatics, and precision medicine.
However, complex DNN and ML models are unavoidably opaque and perceived as
black-box methods; they may not be able to explain why and how they make certain
decisions. Such black-box models are difficult to comprehend not only for
targeted users and decision-makers but also for AI developers. Besides, in
sensitive areas like healthcare, explainability and accountability are not only
desirable properties of AI but also legal requirements -- especially when AI
may have significant impacts on human lives. Explainable artificial
intelligence (XAI) is an emerging field that aims to mitigate the opaqueness of
black-box models and make it possible to interpret how AI systems make their
decisions with transparency. An interpretable ML model can explain how it makes
predictions and which factors affect the model's outcomes. The majority of
state-of-the-art interpretable ML methods have been developed in a
domain-agnostic way and originate from computer vision, automated reasoning, or
even statistics. Many of these methods cannot be directly applied to
bioinformatics problems without prior customization, extension, and domain
adaptation. In this paper, we discuss the importance of explainability with a
focus on bioinformatics. We analyse and comprehensively review
model-specific and model-agnostic interpretable ML methods and tools. Via
several case studies covering bioimaging, cancer genomics, and biomedical text
mining, we show how bioinformatics research could benefit from XAI methods and
how they could help improve decision fairness.
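Permutation importance is one of the simplest model-agnostic interpretability methods surveyed in this space: shuffle a feature's values and measure how much a model's performance degrades. The sketch below is a minimal stdlib-only illustration (function names and the toy data are hypothetical, not from the paper):

```python
import random

def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, col, metric, n_repeats=10, seed=0):
    """Model-agnostic importance of feature `col`: mean drop in the metric
    when that column is shuffled. The model is used only as a black box."""
    rng = random.Random(seed)
    base = metric(y, [predict(row) for row in X])
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(base - metric(y, [predict(row) for row in X_perm]))
    return sum(drops) / n_repeats

# Toy "model" that only looks at feature 0, so feature 1 should score 0.
model = lambda row: int(row[0] > 0.5)
X = [[0.1, 0.9], [0.9, 0.2], [0.8, 0.8], [0.2, 0.1]]
y = [0, 1, 1, 0]
print(permutation_importance(model, X, y, col=0, metric=accuracy))
print(permutation_importance(model, X, y, col=1, metric=accuracy))  # 0.0
```

Because the method never inspects model internals, the same code works for a DNN, a random forest, or any other predictor.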
05361 Abstracts Collection -- Algorithmic Aspects of Large and Complex Networks
From 04.09.05 to 09.09.05, the Dagstuhl Seminar 05361 "Algorithmic Aspects of Large and Complex Networks" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided where available.
Family classification without domain chaining
Motivation: Classification of gene and protein sequences into homologous families, i.e. sets of sequences that share common ancestry, is an essential step in comparative genomic analyses. This is typically achieved by construction of a sequence homology network, followed by clustering to identify dense subgraphs corresponding to families. Accurate classification of single-domain families is now within reach due to major algorithmic advances in remote homology detection and graph clustering. However, classification of multidomain families remains a significant challenge. The presence of the same domain in sequences that do not share common ancestry introduces false edges in the homology network that link unrelated families and stymie clustering algorithms.
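The simplest clustering of a homology network treats each connected component as one family, and it is exactly this approach that domain chaining breaks: one spurious shared-domain edge merges two unrelated families. A minimal union-find sketch on hypothetical edges (real pipelines use density-based clustering such as MCL for precisely this reason):

```python
from collections import defaultdict

def cluster_families(edges):
    """Group sequences into families as connected components of the
    homology network, via union-find with path halving."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    families = defaultdict(set)
    for node in parent:
        families[find(node)].add(node)
    return sorted(sorted(f) for f in families.values())

# Hypothetical homology edges between protein sequences.
edges = [("seqA", "seqB"), ("seqB", "seqC"), ("seqD", "seqE")]
print(cluster_families(edges))  # [['seqA', 'seqB', 'seqC'], ['seqD', 'seqE']]
```

Adding a single false edge such as `("seqC", "seqD")` collapses both families into one component, which is the failure mode the abstract describes.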
Genomic data analysis using grid-based computing
Microarray experiments generate a plethora of genomic data; therefore we need techniques and architectures to analyze these data more quickly. This thesis presents a solution for reducing the computation time of a highly computation-intensive data-analysis component of a genomic application. The application used is the Stanford Microarray Database (SMD). SMD's implementation, operation, and analysis features are described. The reasons for choosing the computationally intensive problems of the SMD, and the background and importance of these problems, are presented. This thesis presents an effective parallel solution to the computational problem, including the difficulties faced in parallelizing it and the results achieved. Finally, future research directions for achieving even greater speedups are presented.
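The thesis's actual decomposition is not given in the abstract, but per-gene microarray analyses are typically embarrassingly parallel: each gene's expression profile can be processed independently. A hedged sketch of that pattern using Python's `multiprocessing` (the analysis function here is a trivial stand-in, not SMD's real computation):

```python
from multiprocessing import Pool

def analyze_gene(expression_values):
    """Stand-in for one heavy per-gene analysis step (here just a mean;
    a real pipeline would fit a model or compute a statistic per gene)."""
    return sum(expression_values) / len(expression_values)

def analyze_all(rows, workers=4):
    # Each row (one gene's expression profile) is independent, so the
    # work distributes trivially across worker processes.
    with Pool(workers) as pool:
        return pool.map(analyze_gene, rows)

if __name__ == "__main__":
    rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(analyze_all(rows, workers=2))  # [2.0, 5.0]
```

The same map-over-independent-rows structure is what grid middleware scales out across machines rather than local processes.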