69 research outputs found

    Bioinformática y biomedicina

    Scientists rely on the computing power of computers to develop fast, inexpensive methods that will in the future allow individuals to sequence their own genome. The resulting enormous volumes of data are a new goal for science, and now also for the UMA: the Big Data problem.

    Pairwise and incremental multi-stage alignment of metagenomes: A new proposal

    Traditional comparisons between metagenomes are often performed using reference databases as intermediary templates from which to obtain distance metrics. However, to fully exploit the information contained within metagenomes, it is desirable to remove any intermediate agent that is prone to introducing errors or biased results. In this work, we analyse the state-of-the-art methods and conclude that fine-grained methods are necessary to assess similarity between metagenomes. In addition, we propose a method for accurate and fast matching of reads. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Magallanes: a web services discovery and automatic workflow composition tool

    Background: To aid in bioinformatics data processing and analysis, an increasing number of web-based applications are being deployed. Although this is positive in general, the proliferation of tools makes it difficult to find the right tool or, more importantly, the right set of tools that can work together to solve real, complex problems. Results: Magallanes (Magellan) is a versatile, platform-independent Java library of algorithms aimed at discovering bioinformatics web services and associated data types. A second important feature of Magallanes is its ability to connect available and compatible web services into workflows that can process data sequentially to reach a desired output given a particular input. Magallanes' capabilities can be exploited either through an API or directly through a graphical user interface. The Magallanes API is freely available for academic use and, together with the Magallanes application, has been tested on MS-Windows™ XP and Unix-like operating systems. Detailed implementation information, including user manuals and tutorials, is available at http://www.bitlab-es.com/magallanes. Conclusion: Different implementations of the same client (web page, desktop applications, web services, etc.) have been deployed and are currently in use in real installations such as the National Institute of Bioinformatics (Spain) and the ACGT-EU project. This shows the potential utility and versatility of the software library, including the integration of novel tools in the domain, and provides strong evidence that it facilitates the automatic discovery and composition of workflows.
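
    The idea of chaining compatible web services from a given input to a desired output can be illustrated with a minimal sketch. This is not the Magallanes algorithm or API; it assumes a hypothetical registry in which each service has exactly one input and one output data type, and uses a plain breadth-first search to find the shortest chain. All service and type names are invented for illustration.

```python
from collections import deque

# Hypothetical registry (not from Magallanes): service name -> (input type, output type).
SERVICES = {
    "fetch_sequence": ("AccessionID", "FASTA"),
    "run_blast": ("FASTA", "BlastReport"),
    "extract_hits": ("BlastReport", "FASTA"),
    "align_sequences": ("FASTA", "MultipleAlignment"),
    "build_tree": ("MultipleAlignment", "NewickTree"),
}


def compose_workflow(services, source_type, target_type):
    """Return the shortest chain of service names that turns source_type
    into target_type, or None when no chain exists."""
    queue = deque([(source_type, [])])
    visited = {source_type}
    while queue:
        current_type, path = queue.popleft()
        if current_type == target_type:
            return path
        # Try every service that accepts the data type we currently hold.
        for name, (in_type, out_type) in services.items():
            if in_type == current_type and out_type not in visited:
                visited.add(out_type)
                queue.append((out_type, path + [name]))
    return None


if __name__ == "__main__":
    print(compose_workflow(SERVICES, "AccessionID", "NewickTree"))
```

    Breadth-first search is used here only because it returns a shortest chain; a real registry with multi-input services and type hierarchies would need a richer model.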

    MAPI: towards the integrated exploitation of bioinformatics Web Services

    Background: Bioinformatics is commonly presented as a well-assorted list of available web resources. Although diversity of services is positive in general, the proliferation of tools, their dispersion and their heterogeneity complicate the integrated exploitation of this data processing capacity. Results: To facilitate the construction of software clients and make integrated use of this variety of tools, we present a modular programmatic application interface (MAPI) that provides the necessary functionality for a uniform representation of Web Services metadata descriptors, including their management and the invocation protocols of the services they represent. This document describes the main functionality of the framework and how it can be used to facilitate the deployment of new software under a unified structure of bioinformatics Web Services. A notable feature of MAPI is the organization of its functionality into modules associated with specific tasks. This means that only the modules needed by the client have to be installed, and that module functionality can be extended without rewriting the software client. Conclusions: The potential utility and versatility of the software library has been demonstrated by the implementation of several currently available clients that cover different aspects of integrated data processing, ranging from service discovery to service invocation, with advanced features such as workflow composition and asynchronous service calls to multiple types of Web Services, including those registered in repositories (e.g. GRID-based, SOAP, BioMOBY, R-bioconductor, and others).

    Analyzing the differences between reads and contigs when performing a taxonomic assignment comparison in metagenomics

    Metagenomics is an inherently complex field in which one of the primary goals is to determine which organisms are present in an environmental sample. To this end, diverse tools have been developed that are based on the similarity search results obtained from comparing a set of sequences against a database. However, there are still issues to solve, such as dealing with genomic variants and detecting repeated sequences that could belong to different species in a mixture of uneven and unknown representation of organisms in a sample. Hence, the question arises of whether analyzing a sample with reads provides a better understanding of the metagenome than analyzing it with contigs. Assembly yields larger genomic fragments but bears the risk of producing chimeric contigs. Reads, on the other hand, are shorter, so their statistical significance is harder to assess, but there are more of them. Consequently, we have developed a workflow to assess and compare the quality of each of these alternatives. Synthetic read datasets belonging to previously identified organisms are generated in order to validate the results. Afterwards, we assemble these into a set of contigs and perform a taxonomic analysis on both datasets. The tools we have developed show that analyzing with reads provides a more trustworthy representation of the species in a sample than analyzing with contigs, especially in cases with high genomic variability. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Workflows and service discovery: a mobile device approach

    Bioinformatics has moved from command-line standalone programs to web-service based environments. This trend has resulted in an enormous number of online resources which can be hard to find and identify, let alone execute and exploit. Furthermore, these resources are generally aimed at solving specific tasks, and these tasks usually need to be combined in order to achieve the desired results. Finding the appropriate set of tools to build a workflow that solves a problem with the services available in a repository is therefore itself a complex exercise, raising issues such as service discovery, composition and representation. On the technological side, mobile devices have experienced strong growth in both number of users and technical capabilities. Starting from this reality, in the present paper we propose a solution for service discovery and workflow generation, and we review and discuss distinct approaches to representing workflows in a mobile environment. As a proof of concept, a specific use case has been developed: we have embedded an expanded version of our Magallanes search engine into mORCA, our mobile client for bioinformatics. This combination delivers a powerful and ubiquitous solution that provides the user with a handy tool not only for generating and representing workflows, but also for discovering services, data types, operations and service types. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Irregular alignment of arbitrarily long DNA sequences on GPU

    The use of Graphics Processing Units to accelerate computational applications is increasingly being adopted due to its affordability, flexibility and performance. However, achieving top performance comes at the price of restricted data-parallelism models. In the case of sequence alignment, most GPU-based approaches focus on accelerating the Smith-Waterman dynamic programming algorithm due to its regularity. Nevertheless, because of its quadratic complexity, it becomes impractical when comparing long sequences, and therefore heuristic methods are required to reduce the search space. We present GPUGECKO, a CUDA implementation of the sequential, seed-and-extend sequence-comparison algorithm GECKO. Our proposal includes optimized kernels based on collective operations capable of producing arbitrarily long alignments while dealing with heterogeneous and unpredictable load. Contrary to other state-of-the-art methods, GPUGECKO employs a batching mechanism that prevents memory exhaustion by not requiring all alignments to fit into device memory at once, which makes it possible to run massive comparisons exhaustively with improved sensitivity while also providing up to 6x average speedup w.r.t. the CUDA acceleration of BLASTN. Funding for open access publishing: Universidad Málaga/CBUA /// This work has been partially supported by the European project ELIXIR-EXCELERATE (grant no. 676559), the Spanish national project Plataforma de Recursos Biomoleculares y Bioinformáticos (ISCIII-PT13.0001.0012 and ISCIII-PT17.0009.0022), the Fondo Europeo de Desarrollo Regional (UMA18-FEDERJA-156, UMA20-FEDERJA-059), the Junta de Andalucía (P18-FR-3130), the Instituto de Investigación Biomédica de Málaga IBIMA and the University of Málaga
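
    The general seed-and-extend idea mentioned in the abstract above can be sketched on the CPU as follows. This is a deliberately simplified illustration and not the GECKO or GPUGECKO algorithm: it assumes exact k-mer seeds and extends each seed only while bases match exactly, whereas real tools typically score mismatches and stop on a drop-off threshold. The function name and the parameter `k` are hypothetical.

```python
from collections import defaultdict


def seed_and_extend(query, reference, k=4):
    """Find exact k-mer seeds shared by both sequences and extend each one
    left and right without gaps while the bases keep matching.

    Returns a sorted list of (query_start, reference_start, length)."""
    # Index every k-mer of the reference by its start positions.
    index = defaultdict(list)
    for j in range(len(reference) - k + 1):
        index[reference[j:j + k]].append(j)

    hits = set()
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            q_start, r_start = i, j
            # Extend to the left while the preceding bases match.
            while q_start > 0 and r_start > 0 and query[q_start - 1] == reference[r_start - 1]:
                q_start -= 1
                r_start -= 1
            q_end, r_end = i + k, j + k
            # Extend to the right while the following bases match.
            while q_end < len(query) and r_end < len(reference) and query[q_end] == reference[r_end]:
                q_end += 1
                r_end += 1
            # Overlapping seeds on the same diagonal collapse into one hit.
            hits.add((q_start, r_start, q_end - q_start))
    return sorted(hits)


if __name__ == "__main__":
    print(seed_and_extend("GGACGTTCA", "TTACGTTCC"))
```

    Only positions that share a seed are ever extended, which is how this family of heuristics avoids filling the full quadratic dynamic programming matrix.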

    PreP+07: improvements of a user friendly tool to preprocess and analyse microarray data

    Background: Microarray gene expression analysis is a widely used technology that scientists handle but whose final interpretation usually requires the participation of a specialist. This is because the analysis requires some background in statistics that most users lack or have only a vague notion of. Moreover, programming skills may also be essential to analyse these data. An interactive, easy-to-use application therefore seems necessary to help researchers extract full information from the data and analyse them in a simple, powerful and confident way. Results: PreP+07 is a standalone Windows XP application that presents a friendly interface for spot filtration, inter- and intra-slide normalization, duplicate resolution, dye-swapping, error removal and statistical analyses. Additionally, it contains two unique implementations of procedures (double scan and Supervised Lowess) and a complete set of graphical representations (MA plot, RG plot, QQ plot, PP plot, PN plot), and it can deal with many data formats, such as tabulated text, GenePix GPR and ArrayPRO. PreP+07 performance has been compared with the equivalent functions in Bioconductor using a tomato chip with 13056 spots. The numbers of differentially expressed genes, considering p-values from the PreP+07 and Bioconductor Limma packages, were statistically identical when the data set was only normalized; however, slight variability was observed when the data were both normalized and scaled. Conclusion: The PreP+07 implementation provides a high degree of freedom in selecting and organizing a small set of widely used data processing protocols, and can handle many data formats. Its reliability has been demonstrated, so a laboratory researcher can carry out statistical pre-processing of their microarray results and obtain a list of differentially expressed genes using PreP+07 without any programming skills. All of this supports the scientists who have been using PreP releases since the first version in 2003.
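
    For readers unfamiliar with the MA plot listed among the graphical representations above, the following minimal sketch computes the two plotted quantities for one two-colour spot using the standard definitions M = log2(R/G) and A = ½·log2(R·G). It is not taken from PreP+07; the function name and the example intensities are assumptions for illustration.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from its red and green channel intensities.

    M is the log2 ratio of the two channels and A is their mean log2
    intensity; both intensities must be positive."""
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a


if __name__ == "__main__":
    # Hypothetical intensities: red is four times brighter than green.
    print(ma_values(8.0, 2.0))
```

    Plotting M against A for every spot shows whether the log ratio depends on intensity, which is the trend that Lowess-style normalization aims to remove.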

    Sma3s: A three-step modular annotator for large sequence datasets

    This is an Open Access article distributed under the terms of the Creative Commons Attribution License. Automatic sequence annotation is an essential component of modern 'omics' studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ∼85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms the accuracy of other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes. This work has been partially financed by the National Institute for Bioinformatics (www.inab.org), a platform of Genoma España and the EC project ‘Advancing Clinico-Genomic Trials on Cancer’ (contract no. 026996). Peer Reviewed

    Towards the intelligent diagnosis of hematological diseases

    In traditional medicine, diagnosing a patient usually involves an in-depth study of the patient's state and symptoms, carried out by a specialist. The adaptation and customization of medical treatment to the individual characteristics of each patient is what we know as Precision Medicine. Furthermore, in multidisciplinary fields such as haematology, identifying several diseases usually requires complex analyses in order to reach a high degree of certainty in the diagnosis. A better understanding of the clinical tests and the relationships between them, together with the discovery of new patterns among them, will make it possible to avoid a significant number of such tests by supporting the specialist with new information. In this line, Artificial Intelligence has proven to be a useful methodology for data analytics in general, its main drawback being the need for huge amounts of data to achieve high accuracy. Clinical data in particular are widely generated in hospitals, but the lack of standardization and the difficulties of availability require complex preprocessing. We have therefore collected 100,000 complete blood counts (CBCs) and developed methods to (1) automatically label textual diagnoses using deep neural networks with long short-term memory cells, for which a group of specialists manually labelled 1,000 CBCs through a mobile application and these were then used to train the network to interpret the diagnoses, and (2) make an intelligent diagnosis of new samples, for which a subset of 10,000 CBCs was used as input to a Support Vector Machine. In summary, in this work we present two prototype architectures in order to define methods for the collection, preprocessing and intelligent classification of clinical data, focusing on haematological disease. Our proposal presents encouraging results, with accuracies greater than 90% in both cases. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech