Optimization of Input/Output through parallel libraries and languages
One of the great challenges of HPC (High Performance Computing) is optimizing the
Input/Output (I/O) subsystem. Ken Batcher summarized this fact in the following
sentence: "A supercomputer is a device for turning compute-bound problems into
I/O-bound problems." In other words, the bottleneck no longer lies so much in
processing the data as in making the data available. Moreover, this problem will
be exacerbated by the arrival of Exascale computing and the growing popularity of
Big Data applications.
In this context, this thesis contributes to improving the performance and ease of
use of the I/O subsystem of supercomputing systems. It makes two main
contributions: (i) an I/O interface developed for the Chapel language that
improves programmer productivity when coding I/O operations; and (ii) an
optimized implementation of genetic sequence data storage.
In more detail, the first contribution studies and analyzes different I/O
optimizations in Chapel, while providing users with a simple interface for
parallel and distributed access to data stored in files. We therefore contribute
both to increasing developer productivity and to making the implementation as
efficient as possible.
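As a rough illustration of what "parallel access to data stored in files" involves, the sketch below splits a file into contiguous byte ranges and lets several workers read them concurrently. It is written in Python rather than Chapel and is not the thesis's interface; the function names `byte_ranges`, `read_range` and `parallel_read` are hypothetical, and the even split by bytes is an assumption (real record-oriented formats usually need range boundaries aligned to records).

```python
import os
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(file_size: int, n_workers: int) -> list[tuple[int, int]]:
    """Split [0, file_size) into n_workers contiguous (offset, length) ranges."""
    base, extra = divmod(file_size, n_workers)
    ranges = []
    offset = 0
    for i in range(n_workers):
        # The first `extra` workers take one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def read_range(path: str, offset: int, length: int) -> bytes:
    # Each worker opens its own file handle so seeks do not interfere.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def parallel_read(path: str, n_workers: int = 4) -> list[bytes]:
    """Read a file as n_workers chunks concurrently, returned in file order."""
    size = os.path.getsize(path)
    ranges = byte_ranges(size, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map preserves input order, so chunks come back in file order.
        return list(pool.map(lambda r: read_range(path, *r), ranges))
```

Threads are used because CPython releases the GIL during blocking file reads; on a distributed system the same range decomposition would instead be spread across processes or nodes.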
The second contribution also addresses I/O problems, but in this case it focuses
on improving the storage of genetic sequence data, including its compression, and
on enabling existing applications to use those data efficiently, with efficient
retrieval in both sequential and random-access patterns. Additionally, we propose
a parallel implementation based on Chapel.
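One common way to combine compression with efficient random access is to compress the data in independent fixed-size blocks and keep an index of block offsets; the same idea underlies the BGZF format used for BAM files. The Python sketch below shows that general technique only; it is an assumption, not the storage scheme proposed in the thesis, and the names `compress_blocks` and `random_access` are hypothetical.

```python
import zlib


def compress_blocks(seq: bytes, block_size: int = 4096) -> tuple[bytes, list[int]]:
    """Compress seq in independent blocks; return the blob and block start offsets."""
    blob = bytearray()
    offsets = []
    for start in range(0, len(seq), block_size):
        offsets.append(len(blob))
        blob += zlib.compress(seq[start:start + block_size])
    offsets.append(len(blob))  # sentinel: end of the last block
    return bytes(blob), offsets


def random_access(blob: bytes, offsets: list[int], block_size: int,
                  pos: int, length: int) -> bytes:
    """Return seq[pos:pos+length], decompressing only the blocks it overlaps."""
    first = pos // block_size
    last = (pos + length - 1) // block_size
    n_blocks = len(offsets) - 1
    out = bytearray()
    for b in range(first, min(last, n_blocks - 1) + 1):
        out += zlib.decompress(blob[offsets[b]:offsets[b + 1]])
    start = pos - first * block_size
    return bytes(out[start:start + length])
```

Sequential reads simply decompress blocks in order, while a random query touches only one or two blocks; smaller blocks make random access cheaper at the cost of a somewhat lower compression ratio.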
miRNA as biomarker in lung cancer
Lung cancer has a high prevalence and mortality due to its late diagnosis and limited treatment options, so it is essential to find biomarkers that allow faster diagnosis and improve patient survival. In this regard, miRNA-based biomarkers represent a considerable advance. miRNAs, which are small RNA sequences, can regulate gene expression, so they play an essential role not only as diagnostic biomarkers but also as therapeutic and prognostic ones. In addition, miRNA biomarkers can be obtained from liquid biopsies, which are less invasive than lung biopsies and offer better accessibility, safety and repeatability, allowing these biomarkers to be used for both diagnosis and patient monitoring. In this review, we highlight the importance of miRNAs and collect the existing evidence of their relationship with lung cancer.
Funding for open access charge: Universidad de Málaga / CBUA
Advancements in long-read genome sequencing technologies and algorithms
The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford
Nanopore Technologies (ONT), has led to substantial improvements in accuracy and computational cost in
sequencing genomes. However, de novo whole-genome assembly remains a formidable challenge, both in its
computational demands and in the quality of the results. As sequencing accuracy and throughput steadily
advance, a continuous stream of new assembly tools is entering the field. Navigating this dynamic landscape
requires a well-reasoned choice of sequencing platform, sequencing depth and assembly tools to achieve
high-quality genome reconstructions. This comprehensive review examines the interplay between cutting-edge
long-read sequencing technologies, assembly methodologies and the evolving field of genomics. Focusing on the
key challenges and the opportunities presented by these advances, we provide an in-depth exploration of the
factors that influence the selection of optimal strategies for achieving robust and insightful genome assemblies.
Funding for open access charge: Universidad de Málaga / CBU
Biomarker potential of repetitive-element transcriptome in lung cancer
Since repetitive elements (REs) account for nearly 53% of the human genome, profiling their transcription after an oncogenic change might help in the search for new biomarkers. Lung cancer was selected as the target because it is the most frequent cause of cancer death. A bioinformatic workflow based on well-established tools (such as RepEnrich, RepBase, SAMtools, edgeR and DESeq2) was developed to identify differentially expressed RNAs from REs. It was trained and tested with public RNA-seq data from matched sequencing of tumour and healthy lung tissue from the same patients to reveal differential expression within the RE transcriptome. Healthy lung tissues express a specific set of REs whose expression is strictly and specifically changed after an oncogenic process. Discrete sets of differentially expressed REs were found for lung adenocarcinoma, for small-cell lung cancer, and for both cancers. Differential expression affects more HERV- than LINE-derived REs and seems biased towards down-regulation in cancer cells. REs behaving consistently in all patients were tested in a different patient cohort to validate the proposed biomarkers. Down-regulation of AluYg6 and LTR18B was confirmed as a potential lung cancer biomarker, while up-regulation of HERVK11D-Int is specific to lung adenocarcinoma and up-regulation of UCON88 is specific to small-cell lung cancer. Hence, the RE transcriptome might be considered another research target in cancer, making REs a promising source of lung cancer biomarkers.
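To make the idea of "REs behaving consistently in all patients" concrete, the Python sketch below computes a per-patient log2 fold change between matched tumour and healthy samples and keeps only elements that move in the same direction in every pair. It is a deliberately naive illustration under stated assumptions (counts-per-million normalization, a pseudocount of 1, a fixed |log2FC| threshold); it is not the study's workflow and does not reproduce the negative-binomial statistics of edgeR or DESeq2. The function names and the element names in the usage data are hypothetical.

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: scale each RE's read count by the library size."""
    total = sum(counts.values())
    return {name: c * 1e6 / total for name, c in counts.items()}


def consistent_res(tumour: list[dict[str, int]], normal: list[dict[str, int]],
                   min_abs_lfc: float = 1.0, pseudo: float = 1.0):
    """Return (up, down): REs whose per-patient log2 fold change has the same
    sign and magnitude >= min_abs_lfc in every matched tumour/normal pair."""
    lfcs: dict[str, list[float]] = {}
    for t_counts, n_counts in zip(tumour, normal):  # one pair per patient
        t, n = cpm(t_counts), cpm(n_counts)
        for name in t.keys() & n.keys():
            lfcs.setdefault(name, []).append(
                math.log2((t[name] + pseudo) / (n[name] + pseudo)))
    n_pairs = min(len(tumour), len(normal))
    # Only consider REs observed in every matched pair.
    complete = {k: v for k, v in lfcs.items() if len(v) == n_pairs}
    up = sorted(k for k, v in complete.items() if all(x >= min_abs_lfc for x in v))
    down = sorted(k for k, v in complete.items() if all(x <= -min_abs_lfc for x in v))
    return up, down
```

With two hypothetical patients where `RE1` drops and `RE2` rises in both tumours, `consistent_res` returns `(["RE2"], ["RE1"])`; an element that is unchanged in one patient is excluded.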
Whole-Genome Assembly: An Experimental Study of Computational Costs and Architectural Opportunities
Whole-genome sequencing (WGS) provides a huge number of reads from which a complete genome can be assembled. The recent advent of long-read sequencing technologies, such as PacBio and Oxford Nanopore, and the subsequent appearance of high-quality long reads (single-molecule high-fidelity, or HiFi) have improved genome scaffolding. However, both the biology and computing communities still face great challenges in terms of computational cost. Thus, a high-precision characterization of the methods is essential to correctly identify the main computing bottlenecks. This study will allow us to design new methods that mitigate computational costs without losing accuracy and to adapt such methods to fully exploit new architectures that support handling large amounts of data. In this paper, we experimentally study and characterize the most widely used whole-genome assemblers in order to design new approaches in this field.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Comparing assembly strategies for third-generation sequencing technologies across different genomes
The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has led to substantial improvements in accuracy and computational cost. However, de novo whole-genome assembly still presents significant challenges related to computational cost and the quality of the results. Meanwhile, sequencing accuracy and throughput continue to improve, and new tools keep emerging. Therefore, selecting the right sequencing platform, an appropriate sequencing depth and suitable assembly tools is necessary to obtain a high-quality assembly. This paper evaluates the primary assembly reconstructions produced by recent hybrid and non-hybrid pipelines on different genomes. We find that PacBio high-fidelity long reads (HiFi) play an essential role in haplotype construction compared with ONT reads. However, we observe a substantial improvement in assembly correctness when using high-fidelity ONT datasets and when combining them with HiFi or short reads.
This work has been partially supported by the Spanish MINECO PID2019-105396RB-I00, Junta de Andalucia JA2018 P18-FR-3433, and UMA18-FEDERJA-197 projects. Funding for open access charge: Universidad de Málaga/CBUA.
Peer reviewed. Postprint (published version).
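The abstract does not name the metrics used to judge assembly quality; as an assumption, the Python sketch below shows N50, a widely used contiguity statistic for comparing assemblies: the contig length L such that contigs of length at least L cover at least half of the total assembled bases. N50 measures contiguity only, not correctness, so it complements rather than replaces the correctness evaluation described above.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of an assembly given its contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # First contig at which the cumulative length reaches half the total.
        if running >= half:
            return length
    return 0
```

For contig lengths 2 through 10 (total 54), the cumulative sum of the longest contigs reaches 27 at the contig of length 8, so `n50(list(range(2, 11)))` returns 8.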
CARB-ES-19 Multicenter Study of Carbapenemase-Producing Klebsiella pneumoniae and Escherichia coli From All Spanish Provinces Reveals Interregional Spread of High-Risk Clones Such as ST307/OXA-48 and ST512/KPC-3
Objectives: CARB-ES-19 is a comprehensive, multicenter, nationwide study integrating whole-genome sequencing (WGS) in the surveillance of carbapenemase-producing K. pneumoniae (CP-Kpn) and E. coli (CP-Eco) to determine their incidence, geographical distribution, phylogeny, and resistance mechanisms in Spain.
Methods: In total, 71 hospitals, representing all 50 Spanish provinces, collected the first 10 isolates per hospital (February to May 2019); CPE isolates were first identified according to EUCAST (meropenem MIC > 0.12 mg/L with immunochromatography, colorimetric tests, carbapenem inactivation, or carbapenem hydrolysis with MALDI-TOF). Prevalence and incidence were calculated according to population denominators. Antibiotic susceptibility testing was performed using the microdilution method (EUCAST). All 403 isolates collected were sequenced for high-resolution single-nucleotide polymorphism (SNP) typing, core genome multilocus sequence typing (cgMLST), and resistome analysis.
Results: In total, 377 (93.5%) CP-Kpn and 26 (6.5%) CP-Eco isolates were collected from 62 (87.3%) hospitals in 46 (92%) provinces. CP-Kpn was more prevalent in the blood (5.8%, 50/853) than in the urine (1.4%, 201/14,464). The cumulative incidence for both CP-Kpn and CP-Eco was 0.05 per 100 admitted patients. The main carbapenemase genes identified in CP-Kpn were blaOXA-48 (263/377), blaKPC-3 (62/377), blaVIM-1 (28/377), and blaNDM-1 (12/377). All isolates were susceptible to at least two antibiotics. Interregional dissemination of eight high-risk CP-Kpn clones was detected, mainly ST307/OXA-48 (16.4%), ST11/OXA-48 (16.4%), and ST512-ST258/KPC (13.8%). ST512/KPC and ST15/OXA-48 were the most frequent bacteremia-causative clones. The average number of acquired resistance genes was higher in CP-Kpn (7.9) than in CP-Eco (5.5).
Conclusion: This study serves as a first step toward WGS integration in the surveillance of carbapenemase-producing Enterobacterales in Spain. We detected important epidemiological changes, including increased CP-Kpn and CP-Eco prevalence and incidence compared to previous studies, wide interregional dissemination, and increased dissemination of high-risk clones, such as ST307/OXA-48 and ST512/KPC-3.
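The percentages in the Results are simple proportions of the counts reported alongside them. The short Python sketch below recomputes several of them from those counts (rounded to one decimal place) as a consistency check; the `share` helper and the dictionary labels are illustrative and not part of the study.

```python
def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# (numerator, denominator, percentage reported in the abstract)
REPORTED = {
    "CP-Kpn among all isolates": (377, 403, 93.5),
    "CP-Eco among all isolates": (26, 403, 6.5),
    "hospitals that contributed isolates": (62, 71, 87.3),
    "provinces with isolates": (46, 50, 92.0),
    "CP-Kpn prevalence in urine samples": (201, 14_464, 1.4),
}

for label, (part, whole, reported) in REPORTED.items():
    print(f"{label}: {share(part, whole)}% (reported {reported}%)")
```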
MOESM6 of Automated identification of reference genes based on RNA-seq data
Additional file 6. Best candidate reference genes (RGs) for normal and malignant lung samples according to Fig. 6b, ranked by coefficient of variation (CV). They were obtained with CV < 20% and a minimum of 10,000 counted reads. Transcript_id: human transcript identifiers in the Ensembl database.
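The selection criteria above (CV < 20%, at least 10,000 counted reads, ranking by CV) can be sketched as a simple filter. This Python sketch rests on two assumptions not stated in the file description: that the read threshold applies to every sample, and that CV uses the sample standard deviation (n − 1). It is not the method's actual implementation, and the transcript IDs in the usage data are hypothetical.

```python
import statistics


def candidate_reference_genes(expr: dict[str, list[float]],
                              max_cv: float = 20.0,
                              min_reads: float = 10_000) -> list[tuple[str, float]]:
    """Keep transcripts with CV (%) below max_cv and at least min_reads in every
    sample, ranked by ascending CV (most stable expression first)."""
    ranked = []
    for transcript_id, counts in expr.items():
        if min(counts) < min_reads:  # assumption: threshold applies per sample
            continue
        cv = 100 * statistics.stdev(counts) / statistics.mean(counts)
        if cv < max_cv:
            ranked.append((transcript_id, round(cv, 2)))
    return sorted(ranked, key=lambda item: item[1])
```

For example, a transcript counted at 20,000, 22,000 and 18,000 reads has a CV of 10%, so it passes, while one with counts of 12,000, 30,000 and 12,000 (CV of about 58%) is discarded.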