4,184 research outputs found

    ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems

    [Abstract]: Biclustering techniques are gaining attention in the analysis of large-scale datasets as they identify two-dimensional submatrices where both rows and columns are correlated. In this work we present ParBiBit, a parallel tool to accelerate the search for interesting biclusters on binary datasets, which are very popular in different fields such as genetics, marketing or text mining. It is based on the state-of-the-art sequential Java tool BiBit, which has been proven accurate by several studies, especially in scenarios that result in many large biclusters. ParBiBit uses the same methodology as BiBit (grouping the binary information into patterns) and provides the same results. Nevertheless, our tool significantly improves performance thanks to an efficient implementation based on C++11 that includes support for threads and MPI processes in order to exploit the compute capabilities of modern distributed-memory systems, which provide several multicore CPU nodes interconnected through a network. Our performance evaluation with 18 representative input datasets on two different eight-node systems shows that our tool is significantly faster than the original BiBit. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at https://sourceforge.net/projects/parbibit/. This work was supported by the Ministry of Economy, Industry and Competitiveness of Spain and FEDER funds of the European Union [grant TIN2016-75845-P (AEI/FEDER/UE)], as well as by Xunta de Galicia (Centro Singular de Investigacion de Galicia accreditation 2016-2019, ref. EDG431G/01).

    Accelerating binary biclustering on platforms with CUDA-enabled GPUs

    © 2018 Elsevier B.V. All rights reserved. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article has been accepted for publication in Information Sciences. The Version of Record is available online at https://doi.org/10.1016/j.ins.2018.05.025. This is a version of: J. González-Domínguez and R. R. Expósito, "Accelerating binary biclustering on platforms with CUDA-enabled GPUs", Information Sciences, Vol. 496, Sept. 2019, pp. 317-325, https://doi.org/10.1016/j.ins.2018.05.025. [Abstract]: Data mining is nowadays essential in many scientific fields to extract valuable information from large input datasets and transform it into an understandable structure. For instance, biclustering techniques are very useful in identifying subsets of two-dimensional data where both rows and columns are correlated. However, some biclustering techniques have become extremely time-consuming when processing very large datasets, which nowadays prevents their use in many areas of research and industry (such as bioinformatics) that have experienced an explosive growth in the amount of available data. In this work we present CUBiBit, a tool that accelerates the search for relevant biclusters on binary data by exploiting the computational capabilities of CUDA-enabled GPUs as well as the several CPU cores available in most current systems. The experimental evaluation has shown that CUBiBit is up to 116 times faster than the fastest state-of-the-art tool, BiBit, in a system with two Intel Sandy Bridge processors (16 CPU cores) and three NVIDIA K20 GPUs.
CUBiBit is publicly available to download from https://sourceforge.net/projects/cubibit. This work was supported by the Ministry of Economy, Industry and Competitiveness of Spain and FEDER funds of the European Union [grant TIN2016-75845-P (AEI/FEDER/UE)], as well as by Xunta de Galicia (Centro Singular de Investigacion de Galicia accreditation 2016-2019, ref. EDG431G/01).

    MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud

    This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of record Roberto R. Expósito, Jorge Veiga, Jorge González-Domínguez, Juan Touriño; MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud, Bioinformatics, Volume 33, Issue 17, 1 September 2017, Pages 2762–2764 is available online at: https://doi.org/10.1093/bioinformatics/btx307. [Abstract] This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud-based infrastructures. Written in Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing. On a 16-node cluster deployed on the Amazon EC2 cloud platform, MarDRe is up to 8.52 times faster than a representative state-of-the-art tool. Ministerio de Economia y Competitividad; TIN2016-75845-P. Ministerio de Educación; FPU014/0280

    SMusket: Spark-based DNA error correction on distributed-memory systems

    ©2020 Elsevier B.V. All rights reserved. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article has been accepted for publication in Future Generation Computer Systems. The Version of Record is available online at https://doi.org/10.1016/j.future.2019.10.038. This is the accepted version of: R. R. Expósito, J. González-Domínguez, and J. Touriño, "SMusket: Spark-based DNA error correction on distributed-memory systems", Future Generation Computer Systems, vol. 111, pp. 698-713, 2020, https://doi.org/10.1016/j.future.2019.10.038. [Abstract]: Next-Generation Sequencing (NGS) technologies have revolutionized genomics research over the last decade, bringing new opportunities for scientists to perform groundbreaking biological studies. Error correction in NGS datasets is considered an important preprocessing step in many workflows as sequencing errors can severely affect the quality of downstream analysis. Although current error correction approaches provide reasonably high accuracies, their computational cost can still be unacceptable when processing large datasets. In this paper we propose SparkMusket (SMusket), a Big Data tool built upon the open-source Apache Spark cluster computing framework to boost the performance of Musket, one of the most widely adopted and top-performing multithreaded correctors. Our tool efficiently exploits Spark features to implement a scalable error correction algorithm intended for distributed-memory systems built using commodity hardware. The experimental evaluation on a 16-node cluster using four publicly available datasets has shown that SMusket is up to 15.3 times faster than previous state-of-the-art MPI-based tools, also providing a maximum speedup of 29.8 over its multithreaded counterpart.
SMusket is publicly available under an open-source license at https://github.com/rreye/smusket. This work was supported by the Ministry of Economy, Industry and Competitiveness of Spain and FEDER, Spain funds of the European Union (project TIN2016-75845-P, AEI/FEDER/EU); and by Xunta de Galicia, Spain (projects ED431G/01 and ED431C 2017/04).

    Fast local search for fuzzy job shop scheduling

    We propose a new neighbourhood structure for local search for the fuzzy job shop scheduling problem. This is a variant of the well-known job shop problem, with uncertainty in task durations modelled using fuzzy numbers and where the goal is to minimise the expected makespan of the resulting schedule. The new neighbourhood structure is based on changing the relative order of subsequences of tasks within critical blocks. We study its theoretical properties and provide a makespan estimate which allows selecting only feasible neighbours while covering a greater portion of the search space than a previous neighbourhood from the literature. Despite its larger search domain, experimental results show that this new structure notably reduces the computational load of local search with respect to the previous neighbourhood while maintaining or even improving solution quality.

    Colored semi-transparent Cu-Si oxide thin films prepared by magnetron sputtering

    Colored semi-transparent Cu-Si oxide thin films have been prepared by reactive magnetron sputtering from a single cathode of copper-silicon composition. Thin films of different composition and optical response were obtained by changing process parameters such as the relative amount of copper in the target and the O2/Ar mixture of the reactive plasma gas. The film characteristics were analyzed by several techniques. Their optical properties (refractive index, absorption coefficient, color) have been correlated with the process parameters used in the film preparation as well as with the film stoichiometry and chemistry. Ministerio de Ciencia e Innovación CEN-20072014, MAT2010-18447, MAT2010-21228, CSD2008–00023. Junta de Andalucía P09-TEP5283, CTS-518

    SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets

    [Abstract] This paper presents SeQual, a scalable tool to efficiently perform quality control of large genomic datasets. Our tool currently supports more than 30 different operations (e.g., filtering, trimming, formatting) that can be applied to DNA/RNA reads in FASTQ/FASTA formats to improve subsequent downstream analyses, while providing a simple and user-friendly graphical interface for non-expert users. Furthermore, SeQual takes full advantage of Big Data technologies to process massive datasets on distributed-memory systems such as clusters by relying on the open-source Apache Spark cluster computing framework. Our scalable Spark-based implementation reduces the runtime from more than three hours to less than 20 minutes when processing a paired-end dataset with 251 million reads per input file on an 8-node multi-core cluster. 10.13039/501100004837-Ministry of Science and Innovation of Spain (Grant Number: TIN2016-75845-P and PID2019-104184RB-I00); 10.13039/501100004837-AEI/FEDER/EU (Grant Number: 10.13039/501100011033); 10.13039/501100010801-Xunta de Galicia and FEDER funds (Centro de Investigación de Galicia accreditation 2019–2022 and the Consolidation Program of Competitive Reference Groups) (Grant Number: ED431G 2019/01 and ED431C 2017/04).

    CUDA-JMI: Acceleration of feature selection on heterogeneous systems

    ©2019 Elsevier B.V. All rights reserved. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article has been accepted for publication in Future Generation Computer Systems. The Version of Record is available online at https://doi.org/10.1016/j.future.2019.08.031. Final accepted version of: J. González-Domínguez, R. R. Expósito, and V. Bolón-Canedo, "CUDA-JMI: Acceleration of feature selection on heterogeneous systems", Future Generation Computer Systems, Vol. 102, pp. 426-436, Jan. 2020, https://doi.org/10.1016/j.future.2019.08.031. [Abstract]: Feature selection is a crucial step nowadays in machine learning and data analytics to remove irrelevant and redundant characteristics and thus to provide fast and reliable analyses. Many research works have focused on developing new methods that increase the global relevance of the subset of selected features while reducing the redundancy of information. However, those methods that select features with high relevance and low redundancy are extremely time-consuming when processing large datasets. In this work we present CUDA-JMI, a tool based on Joint Mutual Information that accelerates feature selection by exploiting the computational capabilities of modern heterogeneous systems that contain several CPU cores and GPU devices. The experimental evaluation has been carried out on three systems with different types and numbers of CPUs and GPUs using five publicly available datasets from different fields. These results show that CUDA-JMI is significantly faster than its original sequential counterpart for all systems and input datasets. For instance, CUDA-JMI is up to 52 times faster than an existing sequential JMI-based implementation on a machine with 24 CPU cores and two NVIDIA M60 boards (four GPUs).
CUDA-JMI is publicly available to download from https://sourceforge.net/projects/cuda-jmi. This research has been partially funded by projects TIN2016-75845-P and TIN-2015-65069-C2-1-R of the Ministry of Economy, Industry and Competitiveness of Spain, as well as by Xunta de Galicia, Spain projects ED431D R2016/045, ED431G/01 and GRC2014/035, all of them partially funded by FEDER, Spain funds of the European Union.

    Effects of Post-Fire Deadwood Management on Soil Macroarthropod Communities

    Dead wood comprises a vast amount of biological legacies that set the scene for ecological regeneration after wildfires, yet its removal is the most frequent management strategy worldwide. Soil-dwelling organisms are conspicuous, and they provide essential ecosystem functions, but how they are affected by different post-fire management strategies has so far been neglected. We analyzed the abundance, richness, and composition of belowground macroarthropod communities under two contrasting dead-wood management regimes after a large wildfire in the Sierra Nevada Natural and National Park (Southeast Spain). Two plots at different elevations were established, each containing three replicates of two experimental treatments: partial cut, where trees were cut and their branches lopped off and left on the ground, and salvage logging, where all the trees were cut, logs were piled, branches were mechanically masticated, and slash was spread on the ground. Ten years after the application of the treatments, soil cores were extracted from two types of microhabitat created by these treatments: bare-soil (in both treatments) and under-logs (in the partial cut treatment only). Soil macroarthropod assemblages were dominated by Hemiptera and Hymenoptera (mostly ants) and were more abundant and richer in the lower plot. The differences between dead-wood treatments were most evident at the scale of management interventions: abundance and richness were lowest after salvage logging, even under similar microhabitats (bare-soil). However, there were no significant differences between microhabitat types in abundance and richness within the partial cut treatment.
Higher abundance and richness in the partial cut treatment likely resulted from higher resource availability and higher plant diversity after natural regeneration. This study was supported by Project 10/2005 from the Organismo Autónomo de Parques Nacionales (Spanish Government), CGL2008–01671 from the Spanish Ministerio de Ciencia e Innovación, and P12-RNM-2705 from Junta de Andalucía. Programa Nacional de incentivo a investigadores (PRONII) and Programa de vinculación de científicos y tecnólogos from Comisión Nacional de Ciencia y Tecnología (Paraguay). A.B.L. acknowledges a postdoctoral grant from the Alexander von Humboldt Foundation. C.R.M-G. had a Ph.D. grant from the National University of Asunción (Paraguay) and Carolina Foundation (Spain).

    Ansiedad pre-competitiva y conductas de autocontrol en jugadores de futbol [Pre-competitive anxiety and self-control behaviors in soccer players]

    Starting from the conceptualization of anxiety within trait-state theory, this study examines the pre-competitive anxiety of a third-division soccer team. Over five matches, the CSAI-2 questionnaire was administered moments before competition, recording the players' pre-competitive cognitive anxiety, somatic anxiety and self-confidence. The matches were filmed and later analyzed for rule-breaking behaviors such as physical aggression, protests to the referee, protests to teammates, responses to an aggression, yellow cards, fouls called against the player and serious errors committed in the area. In this way, the study attempts to predict whether players with high CSAI-2 scores would show aggressive behaviors during competition. The results show no statistically significant correlations between CSAI-2 scores and the occurrence of unsportsmanlike behaviors. Finally, different aspects of the predictive capacity of Martens' questionnaire are discussed. Using the State-Trait Anxiety Theory, a study of pre-competitive anxiety in the players of a third-division soccer team is presented. During five soccer games, the CSAI-2 was individually administered a few minutes before the start, and the scores for cognitive anxiety, somatic anxiety and self-confidence were recorded. All of the games were videotaped, and several impartial observers then transcribed illicit and/or erroneous sport behaviors during the games: aggression, complaints to the referee, serious sport mistakes, penalties, etc. In this analysis we try to assess whether the score on a pre-competitive anxiety inventory predicts some of these behaviors during competition. No statistically significant correlation between CSAI-2 scores and illicit sport behaviors was found.