6 research outputs found
Bioinformatics and biomedicine
Scientists are relying on the computing power of computers to develop fast, inexpensive methods that will, in the future, allow individuals to sequence their own genomes. The resulting enormous volumes of data are a new goal for science, and now also for the UMA: the Big-Data problem.
Pairwise and incremental multi-stage alignment of metagenomes: A new proposal
Traditional comparisons between metagenomes are often performed using reference databases as intermediary templates from which to obtain distance metrics. However, to fully exploit the information contained within metagenomes, it is of interest to remove any intermediate agent that is prone to introducing errors or biased results. In this work, we analyze state-of-the-art methods and conclude that fine-grained methods are needed to assess similarity between metagenomes. In addition, we propose a method of our own for accurate and fast matching of reads. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Irregular alignment of arbitrarily long DNA sequences on GPU
Graphics Processing Units are increasingly used to accelerate computational applications thanks to their affordability, flexibility and performance. However, achieving top performance comes at the price of restricted data-parallelism models. In the case of sequence alignment, most GPU-based approaches focus on accelerating the Smith-Waterman dynamic programming algorithm due to its regularity. Nevertheless, because of its quadratic complexity, it becomes impractical when comparing long sequences, and heuristic methods are therefore required to reduce the search space. We present GPUGECKO, a CUDA implementation of the sequential seed-and-extend sequence-comparison algorithm GECKO. Our proposal includes optimized kernels based on collective operations that are capable of producing arbitrarily long alignments while dealing with heterogeneous and unpredictable load. Contrary to other state-of-the-art methods, GPUGECKO employs a batching mechanism that prevents memory exhaustion by not requiring all alignments to fit into device memory at once, thereby enabling massive comparisons to be run exhaustively with improved sensitivity while also providing up to 6x average speedup with respect to the CUDA acceleration of BLASTN. Funding for open access publishing: Universidad Málaga/CBUA. This work has been partially supported by the European project ELIXIR-EXCELERATE (grant no. 676559), the Spanish national project Plataforma de Recursos Biomoleculares y Bioinformáticos (ISCIII-PT13.0001.0012 and ISCIII-PT17.0009.0022), the Fondo Europeo de Desarrollo Regional (UMA18-FEDERJA-156, UMA20-FEDERJA-059), the Junta de Andalucía (P18-FR-3130), the Instituto de Investigación Biomédica de Málaga IBIMA and the University of Málaga
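The seed-and-extend idea mentioned in the abstract above can be illustrated with a minimal CPU sketch. It is not the GECKO or GPUGECKO implementation: the function names (`build_kmer_index`, `extend_seed`, `seed_and_extend`), the scoring values and the X-drop stopping rule are assumptions chosen for illustration. Exact k-mer matches serve as seeds, and each seed is extended without gaps in both directions until the score drops too far below its best value.

```python
from collections import defaultdict


def build_kmer_index(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def extend_seed(query, ref, q_pos, r_pos, k, match=1, mismatch=-1, x_drop=3):
    """Ungapped extension of an exact k-mer seed (hypothetical scoring).

    Extension in each direction stops once the running score is x_drop or
    more below the best score seen so far; the best endpoints are kept.
    Returns (query_start, ref_start, length, score).
    """
    best = score = k * match
    right = 0  # number of columns kept to the right of the seed
    i = 0
    while q_pos + k + i < len(query) and r_pos + k + i < len(ref):
        same = query[q_pos + k + i] == ref[r_pos + k + i]
        score += match if same else mismatch
        i += 1
        if score > best:
            best, right = score, i
        elif best - score >= x_drop:
            break

    score = best
    left = 0  # number of columns kept to the left of the seed
    i = 0
    while q_pos - i - 1 >= 0 and r_pos - i - 1 >= 0:
        same = query[q_pos - i - 1] == ref[r_pos - i - 1]
        score += match if same else mismatch
        i += 1
        if score > best:
            best, left = score, i
        elif best - score >= x_drop:
            break

    return q_pos - left, r_pos - left, left + k + right, best


def seed_and_extend(query, ref, k=4):
    """Find exact k-mer seeds and extend each one; best-scoring hits first."""
    index = build_kmer_index(ref, k)
    hits = set()  # seeds on the same diagonal often extend to the same hit
    for q_pos in range(len(query) - k + 1):
        for r_pos in index.get(query[q_pos:q_pos + k], ()):
            hits.add(extend_seed(query, ref, q_pos, r_pos, k))
    return sorted(hits, key=lambda h: -h[3])
```

Because every seed extends independently and for an unpredictable number of steps, the amount of work per seed varies widely, which is the kind of irregular load the abstract refers to.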
Analyzing the differences between reads and contigs when performing a taxonomic assignment comparison in metagenomics
Metagenomics is an inherently complex field in which one of the primary goals is to determine which organisms are present in an environmental sample. To this end, diverse tools have been developed that are based on the similarity search results obtained from comparing a set of sequences against a database. However, achieving this goal still requires solving issues such as dealing with genomic variants and detecting repeated sequences that could belong to different species in a mixture of uneven and unknown representation of organisms in a sample. Hence, the question arises of whether analyzing a sample with reads provides further understanding of the metagenome than analyzing it with contigs. Assembly yields larger genomic fragments but bears the risk of producing chimeric contigs. Reads, on the other hand, are shorter, so their statistical significance is harder to assess, but there are many more of them. Consequently, we have developed a workflow to assess and compare the quality of each of these alternatives. Synthetic read datasets belonging to previously identified organisms are generated in order to validate the results. Afterwards, we assemble these into a set of contigs and perform a taxonomic analysis on both datasets. The tools we have developed demonstrate that analyzing with reads provides a more trustworthy representation of the species in a sample than analyzing with contigs, especially in cases that present high genomic variability. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Workflows and service discovery: a mobile device approach
Bioinformatics has moved from standalone command-line programs to web-service-based environments. This trend has resulted in an enormous number of online resources that can be hard to find and identify, let alone execute and exploit. Furthermore, these resources are generally aimed at solving specific tasks, and these tasks usually need to be combined in order to achieve the desired results. In this line, finding the appropriate set of tools to build a workflow that solves a problem with the services available in a repository is itself a complex exercise. Issues such as service discovery, composition and representation appear. On the technological side, mobile devices have experienced remarkable growth in both number of users and technical capabilities. Starting from this reality, in the present paper we propose a solution for service discovery and workflow generation, and we review and discuss distinct approaches to representing workflows in a mobile environment. As a proof of concept, a specific use case has been developed: we have embedded an expanded version of our Magallanes search engine into mORCA, our mobile client for bioinformatics. This composition delivers a powerful and ubiquitous solution that provides the user with a handy tool not only for generating and representing workflows, but also for discovering services, data types, operations and service types. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Towards the intelligent diagnosis of hematological diseases
In traditional medicine, patient diagnosis usually involves an in-depth study, carried out by a specialist, of the patient's condition and symptoms. The adaptation and customization of medical treatment to the individual characteristics of each patient is what we know as Precision Medicine.
Furthermore, in multidisciplinary fields such as haematology, identifying several diseases usually requires complex analyses in order to reach a high degree of certainty in the diagnosis. A better understanding of the clinical tests and the relationships between them, together with the discovery of new patterns among them, will make it possible to avoid a significant number of such tests by supporting the specialist with new information.
In this line, Artificial Intelligence has proven to be a useful methodology for data analytics in general, although its main drawback is the need for huge amounts of data to achieve high accuracy. Clinical data in particular are widely generated in hospitals, but the lack of standardization and the difficulties of availability require complex preprocessing. Therefore, we have collected 100,000 complete blood counts (CBCs) and developed a method to 1) automatically label textual diagnoses using deep neural networks with Long Short-Term Memory cells, for which a group of specialists manually labelled 1,000 CBCs through a mobile application and these were then used to train the network to interpret the diagnosis; and 2) make an intelligent diagnosis of new samples, for which a subset of 10,000 CBCs has been used as input to a Support Vector Machine.
In summary, in this work we present two prototype architectures in order to define methods for the collection, preprocessing and intelligent classification of clinical data, focusing on haematological disease. Our proposal presents encouraging results, with accuracies greater than 90% in both cases. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech