
    RUbioSeq+: A multiplatform application that executes parallelized pipelines to analyse next-generation sequencing data

    This is the peer-reviewed version of the following article: Computer Methods and Programs in Biomedicine 138 (2016): 73-81, which has been published in final form at http://dx.doi.org/10.1016/j.cmpb.2016.10.008
    Background and objective: To facilitate routine analysis and to improve the reproducibility of results, next-generation sequencing (NGS) analysis requires intuitive, efficient and integrated data processing pipelines.
    Methods: We have selected well-established software to construct a suite of automated and parallelized workflows to analyse NGS data for DNA-seq (single-nucleotide variants (SNVs) and indels), CNA-seq, bisulfite-seq and ChIP-seq experiments.
    Results: Here, we present RUbioSeq+, an updated and extended version of RUbioSeq, a multiplatform application that incorporates a suite of automated and parallelized workflows to analyse NGS data. This new version includes: (i) an interactive graphical user interface (GUI) that facilitates its use by both biomedical researchers and bioinformaticians, (ii) a new pipeline for ChIP-seq experiments, (iii) pair-wise comparisons (case–control analyses) for DNA-seq experiments, and (iv) improvements in the parallelized and multithreaded execution options. Results generated by our software have been experimentally validated and accepted for publication.
    Conclusions: RUbioSeq+ is free and open to all users at http://rubioseq.bioinfo.cnio.es/.
    M.R-C is funded by the BLUEPRINT Consortium (FP7/ 2007-2013) under grant agreement number 282510. J.M.F is funded by the INB Node 2 - CNIO, a member of Proteored - PRB2-ISCIII, and is supported by grant PT13/0001 of the PE I+D+i 2013-2016, funded by ISCIII and FEDER. H.L-F is funded by a postdoctoral fellowship from the Xunta de Galicia. F.F-R and D.G-P are funded by the European Union's Seventh Framework Programme FP7/REGPOT 2012 2013.1 under grant agreement n° 316265 (BIOCAPS) and the "Platform of integration of intelligent techniques for analysis of biomedical information" project (TIN2013-47153-C3-3-R) financed by the Spanish Ministry of Economy and Competitiveness. C.FT is funded by the "Spanish National Youth Guarantee Implementation Plan" (2013/2016) financed by the Spanish Ministry of Economy and Competitiveness.
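    The abstract does not describe how RUbioSeq+ parallelizes its workflows, so the following is only a minimal Python sketch of the general idea, assuming that independent samples can be processed concurrently and that case–control pairs are compared after per-sample processing. The functions `run_sample_pipeline`, `run_all` and `case_control_pairs`, and the sample names, are hypothetical and are not part of RUbioSeq+.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sample_pipeline(sample: str) -> str:
    """Hypothetical stand-in for the per-sample stages (e.g. alignment, variant calling).

    A real pipeline would launch external tools here; this placeholder only
    returns a label for the processed output.
    """
    return f"{sample}.processed"


def run_all(samples: list[str], max_workers: int = 4) -> dict[str, str]:
    """Process independent samples concurrently; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(run_sample_pipeline, samples))
    return dict(zip(samples, outputs))


def case_control_pairs(
    processed: dict[str, str], pairs: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Match each (case, control) pair to its processed outputs for a pair-wise step."""
    return [(processed[case], processed[control]) for case, control in pairs]


if __name__ == "__main__":
    processed = run_all(["tumour_1", "normal_1"])
    print(case_control_pairs(processed, [("tumour_1", "normal_1")]))
```

    Threads are a reasonable choice in this sketch when each stage mostly waits on external processes; CPU-bound work done in Python itself would call for a process pool instead.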

    A Survey paper on Cloud Environment for Backup and Data Storage

    Using the disks of the nodes of a cluster as a global storage system is a reasonable solution for a cloud environment. The need for data to be available from anywhere is increasing; this represents a problem for many users of applications such as databases, media, personal documents and records. The I/O demands of these applications grow as the applications get larger, and parallel file systems can be used to improve their performance. PVFS2 is a free parallel file system developed by a multi-institution team of parallel I/O, networking and storage experts. This survey covers the design of an implementation of a cloud environment able to store and back up data using remote servers that can be accessed through the Internet. The implementation aims to increase data availability and reduce data loss. DOI: 10.17762/ijritcc2321-8169.16047
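    Parallel file systems such as PVFS2 raise I/O throughput partly by striping each file round-robin across several I/O servers, so that a large request is served by many disks at once. The Python below is a minimal sketch of that generic mapping from a logical file offset to a server; the strip size, server count and function names are illustrative assumptions, not PVFS2 defaults or APIs.

```python
def locate(offset: int, strip_size: int, num_servers: int) -> tuple[int, int]:
    """Map a logical file offset to (server index, offset within that server's data)."""
    strip_index = offset // strip_size   # which strip the byte falls in
    server = strip_index % num_servers   # strips are dealt out round-robin
    local_offset = (strip_index // num_servers) * strip_size + offset % strip_size
    return server, local_offset


def split_request(
    offset: int, length: int, strip_size: int, num_servers: int
) -> list[tuple[int, int, int]]:
    """Split a contiguous request into (server, local_offset, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        server, local = locate(offset, strip_size, num_servers)
        # A single piece never crosses a strip boundary.
        nbytes = min(strip_size - offset % strip_size, end - offset)
        pieces.append((server, local, nbytes))
        offset += nbytes
    return pieces


if __name__ == "__main__":
    # Hypothetical layout: 64-byte strips over 4 servers.
    print(split_request(60, 10, strip_size=64, num_servers=4))
    # → [(0, 60, 4), (1, 0, 6)]
```

    Here a 10-byte request starting at offset 60 straddles a strip boundary, so it is split between server 0 and server 1, which can then serve their parts in parallel.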

    Real time web-based toolbox for computer vision

    The last few years have been strongly marked by the presence of multimedia data (images and videos) in our everyday lives. These data are created and shared at a high rate, since images and videos can come from many devices such as cameras, smartphones or drones. They typically depict objects in different settings (airports, hospitals, public areas, sports events, etc.). As a result, image and video processing algorithms have become increasingly important for computer vision applications such as motion tracking, event detection and recognition, multimedia indexing and computer-aided medical diagnosis. In this paper, we propose a real-time, cloud-based toolbox (platform) for computer vision applications. This platform integrates a toolbox of image and video processing algorithms that can be run in real time and securely. The related libraries and hardware drivers are automatically integrated and configured, giving users access to the different algorithms without the need to download, install and configure software or hardware. Moreover, the platform allows multiple users to access the integrated applications through Docker (Merkel, 2014) containers and images. Experiments were conducted with three kinds of algorithms: (1) an image processing toolbox, (2) a video processing toolbox, and (3) 3D medical methods such as computer-aided diagnosis for scoliosis and osteoporosis. These experiments demonstrated the value of our platform for sharing our scientific contributions in the computer vision domain. Researchers can develop and share their applications quickly, easily and securely.

    I/O Burst Prediction for HPC Clusters using Darshan Logs

    Understanding cluster-wide I/O patterns of large-scale HPC clusters is essential to minimize the occurrence and impact of I/O interference. Yet, most previous work in this area focused on monitoring and predicting task- and node-level I/O burst events. This paper analyzes Darshan reports from three supercomputers to extract system-level read and write I/O rates in five-minute intervals. We observe significant (over 100x) fluctuations in read and write I/O rates in all three clusters. We then train machine learning models to estimate the occurrence of system-level I/O bursts 5 to 120 minutes ahead. Evaluation results show that we can predict I/O bursts with more than 90% accuracy (F1 score) five minutes ahead and more than 87% accuracy two hours ahead. We also show that the ML models attain more than 70% accuracy when estimating the degree of the I/O burst. We believe that high-accuracy predictions of I/O bursts can be used in multiple ways, such as postponing delay-tolerant I/O operations (e.g., checkpointing), pausing nonessential applications (e.g., file system scrubbers), and devising I/O-aware job scheduling methods. To validate this claim, we simulated a burst-aware job scheduler that can postpone the start time of applications to avoid I/O bursts. We show that burst-aware job scheduling can lead to up to a 5x decrease in application runtime.
    Comment: 10 pages, 11 figures, 2 tables
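    The abstract does not give the paper's exact preprocessing or burst definition. As a minimal Python sketch, assume each Darshan-derived record has already been reduced to a (timestamp in seconds, bytes transferred) pair; system-level rates are then per-interval byte sums divided by the 300-second interval length, and a horizon of 5 to 120 minutes corresponds to 1 to 24 intervals ahead. The burst rule (rate above 10 times the median) and all function names are hypothetical; the F1 score is the standard harmonic mean of precision and recall.

```python
from collections import defaultdict
from statistics import median

INTERVAL_S = 300  # five-minute intervals, as in the abstract


def system_rates(events: list[tuple[float, int]], interval_s: int = INTERVAL_S) -> list[float]:
    """Sum bytes from all jobs per interval and convert to a rate in bytes/s."""
    totals: dict[int, int] = defaultdict(int)
    for ts, nbytes in events:
        totals[int(ts // interval_s)] += nbytes
    if not totals:
        return []
    first, last = min(totals), max(totals)
    # Include empty intervals so the series is contiguous in time.
    return [totals.get(i, 0) / interval_s for i in range(first, last + 1)]


def label_bursts(rates: list[float], factor: float = 10.0) -> list[bool]:
    """Hypothetical rule: an interval is a burst if its rate exceeds factor * median rate."""
    threshold = factor * median(rates)
    return [r > threshold for r in rates]


def f1_score(y_true: list[bool], y_pred: list[bool]) -> float:
    """F1 = harmonic mean of precision and recall for the positive (burst) class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum(p and not t for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    events = [(0, 300), (10, 300), (300, 600), (600, 60_000), (900, 300)]
    rates = system_rates(events)           # [2.0, 2.0, 200.0, 1.0]
    truth = label_bursts(rates)            # [False, False, True, False]
    predicted = [False, True, True, False]
    print(round(f1_score(truth, predicted), 3))  # → 0.667
```

    If most intervals are idle the median can be zero, in which case this rule flags every nonzero interval as a burst; a real study would need a more carefully chosen threshold.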

    Enabling portable I/O analysis of commercially sensitive HPC applications through workload replication

    Benchmarking and analyzing I/O performance across high performance computing (HPC) platforms is necessary to identify performance bottlenecks and guide effective use of new and existing storage systems. Doing this with large production applications, which can often be commercially sensitive and lack portability, is not a straightforward task, and the availability of a representative proxy for I/O workloads can help to provide a solution. We use Darshan I/O characterization and the MACSio proxy application to replicate five production workloads, showing how these can be used effectively to investigate I/O performance when migrating between HPC systems ranging from small local clusters to leadership-scale machines. Preliminary results indicate that it is possible to generate datasets that match the target application with a good degree of accuracy. This enables a predictive performance analysis study of a representative workload to be conducted on five different systems. The results of this analysis are used to identify how workloads exhibit different I/O footprints on a file system and what effect file system configuration can have on performance.
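    As a minimal sketch of how a replicated workload might be checked against its target, the Python below compares a few summary I/O metrics, assumed to have been extracted beforehand from the two applications' Darshan logs, using the relative error of each metric. The metric names, the 10% tolerance and the helper functions are hypothetical and do not reflect the paper's actual accuracy criterion.

```python
import math


def relative_errors(target: dict[str, float], replica: dict[str, float]) -> dict[str, float]:
    """Relative error |replica - target| / |target| for every metric in the target."""
    errors = {}
    for name, expected in target.items():
        observed = replica.get(name, 0.0)  # a missing metric counts as zero
        if expected == 0:
            # A zero target is matched only by a zero replica value.
            errors[name] = 0.0 if observed == 0 else math.inf
        else:
            errors[name] = abs(observed - expected) / abs(expected)
    return errors


def footprint_matches(
    target: dict[str, float], replica: dict[str, float], tolerance: float = 0.10
) -> bool:
    """True if every target metric is reproduced within `tolerance` relative error."""
    return all(err <= tolerance for err in relative_errors(target, replica).values())


if __name__ == "__main__":
    # Hypothetical metrics for a target application and its proxy replica.
    target = {"bytes_read": 1_000_000, "bytes_written": 4_000_000, "files_opened": 64}
    replica = {"bytes_read": 1_050_000, "bytes_written": 3_800_000, "files_opened": 64}
    print(relative_errors(target, replica))
    print(footprint_matches(target, replica))  # → True
```

    Relative rather than absolute error keeps large byte counts and small file counts on a comparable scale, so one tolerance can be applied to all metrics.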