    SEAL: a distributed short read mapping and duplicate removal tool

    Summary: SEAL is a scalable tool for short read pair mapping and duplicate removal. It computes mappings that are consistent with those produced by BWA and removes duplicates according to the same criteria employed by Picard MarkDuplicates. On a 16-node Hadoop cluster, it processes about 13 GB per hour in map+rmdup mode and reaches a throughput of 19 GB per hour in mapping-only mode.
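
    The duplicate-removal criterion described above (reads sharing the same unclipped 5' alignment coordinates and orientation are duplicates, and the one with the highest base-quality sum is kept) can be illustrated with a minimal Python sketch. The record fields below are hypothetical, simplified to the single-end case, and omit the per-library and read-pair handling that SEAL and Picard perform; this is not SEAL's actual Hadoop implementation.

        from collections import defaultdict

        def remove_duplicates(reads):
            # Group reads by (chromosome, unclipped 5' position, strand) and keep,
            # for each group, the read with the highest sum of base qualities --
            # the criterion Picard MarkDuplicates applies to single-end reads.
            groups = defaultdict(list)
            for r in reads:
                groups[(r['chrom'], r['pos5'], r['strand'])].append(r)
            return [max(members, key=lambda r: r['qualsum']) for members in groups.values()]

        reads = [
            {'chrom': 'chr1', 'pos5': 1000, 'strand': '+', 'qualsum': 310},
            {'chrom': 'chr1', 'pos5': 1000, 'strand': '+', 'qualsum': 295},  # duplicate, dropped
            {'chrom': 'chr1', 'pos5': 2048, 'strand': '-', 'qualsum': 280},
        ]
        print(len(remove_duplicates(reads)))  # prints 2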

    Is right angular gyrus involved in the metric component of the mental body representation in touch and vision? A tDCS study

    Several studies have identified touch as a well-suited sensory modality for studying body representation. Here, we address the “metric component of body representation”, a specific function devoted to the discrimination of tactile distances on the body. The literature suggests the involvement of the right angular gyrus (rAG) in processing metric judgements of touch on the body. This study asks whether the rAG is also responsible for the visual metric component of body representation. We used tDCS (anodal and sham) in 20 subjects who were administered an on-body distance discrimination task with both tactile and visual stimuli. They were also asked to perform the same task in a near-body condition. The results confirm the role of the rAG in the estimation of tactile distances. Further, we showed that the rAG may be involved in the discrimination of distances on the body not only in the tactile but also in the visual modality. Finally, based on the significant effects of anodal stimulation even in a near-body visual discrimination task, we propose a higher-order function of the AG as a supramodal comparator of quantities.

    Cohort profile: the Turin Prostate Cancer Prognostication (TPCP) cohort

    Introduction: Prostate cancer (PCa) is the most frequent tumor among men in Europe and has both indolent and aggressive forms. There are several treatment options, the choice of which depends on multiple factors. To further improve current prognostication models, we established the Turin Prostate Cancer Prognostication (TPCP) cohort, an Italian retrospective biopsy cohort of patients with PCa and long-term follow-up. This work presents this new cohort with its main characteristics and the distributions of some of its core variables, along with its potential contributions to PCa research. Methods: The TPCP cohort includes consecutive non-metastatic patients with a first positive biopsy for PCa performed between 2008 and 2013 at the main hospital in Turin, Italy. The follow-up ended on December 31st, 2021. The primary outcome is the occurrence of metastasis; death from PCa and overall mortality are the secondary outcomes. In addition to numerous clinical variables, the study’s prognostic variables include histopathologic information assigned by a centralized uropathology review using a digital pathology software system specialized for the study of PCa, tumor DNA methylation in candidate genes, and features extracted from digitized slide images via deep neural networks. Results: The cohort includes 891 patients followed up for a median of 10 years. During this period, 97 patients progressed to metastatic disease and 301 died; of these, 56 died from PCa. In total, 65.3% of the cohort has a Gleason score less than or equal to 3 + 4, and 44.5% has clinical stage cT1. Consistent with previous studies, age and clinical stage at diagnosis are important prognostic factors: the crude cumulative incidence of metastatic disease during the 14 years of follow-up increases from 9.1% among patients younger than 64 to 16.2% for patients aged 75-84, and from 6.1% for stage cT1 to 27.9% for stage cT3. Discussion: This study stands to be an important resource for updating existing prognostic models for PCa in an Italian cohort. In addition, the integrated collection of multi-modal data will allow development and/or validation of new models including new histopathological, digital, and molecular markers, with the goal of better directing clinical decisions in the management of patients with PCa.
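
    As a worked illustration of the crude cumulative incidence figures reported above, the sketch below estimates the cumulative incidence of metastasis in the presence of the competing risk of death (an Aalen-Johansen-style estimator). The data are invented and the code is illustrative only; it is not the cohort's analysis code.

        import numpy as np

        def cumulative_incidence(times, events, cause, horizon):
            # events: 0 = censored, 1 = metastasis, 2 = death without metastasis.
            order = np.argsort(times)
            t, e = np.asarray(times)[order], np.asarray(events)[order]
            at_risk, surv, cif, i = len(t), 1.0, 0.0, 0
            while i < len(t) and t[i] <= horizon:
                j, d_any, d_cause = i, 0, 0
                while j < len(t) and t[j] == t[i]:   # handle tied event times together
                    if e[j] != 0:
                        d_any += 1
                        d_cause += int(e[j] == cause)
                    j += 1
                cif += surv * d_cause / at_risk      # hazard of the cause of interest
                surv *= 1.0 - d_any / at_risk        # overall event-free survival
                at_risk -= (j - i)                   # events and censorings leave the risk set
                i = j
            return cif

        # Toy data: follow-up time in years and event indicator per patient
        times  = [2.0, 3.5, 5.0, 7.0, 9.0, 12.0, 14.0, 14.0]
        events = [1,   0,   2,   1,   0,   2,    0,    0  ]
        print(cumulative_incidence(times, events, cause=1, horizon=14))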

    A Markov classification model for metabolic pathways

    Background: This paper considers the problem of identifying pathways through metabolic networks that relate to a specific biological response. Our proposed model, HME3M, first identifies frequently traversed network paths using a Markov mixture model. Then, by employing a hierarchical mixture of experts, separate classifiers are built using information specific to each path and combined into an ensemble prediction for the response. Results: We compared the performance of HME3M with logistic regression and support vector machines (SVM) for both simulated pathways and two metabolic networks, glycolysis and the pentose phosphate pathway, for Arabidopsis thaliana. We use AltGenExpress microarray data and focus on the pathway differences in the developmental stages and stress responses of Arabidopsis. The results clearly show that HME3M outperformed the comparison methods in the presence of increasing network complexity and pathway noise. Furthermore, an analysis of the paths identified by HME3M for each metabolic network confirmed known biological responses of Arabidopsis. Conclusions: This paper shows HME3M to be an accurate and robust method for classifying metabolic pathways. HME3M outperforms all comparison methods and is further capable of identifying known biologically active pathways within microarray data.
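
    The ensemble step described above (per-path classifiers combined according to the path probabilities from the Markov mixture) can be sketched as follows. The logistic experts and mixing weights are toy values chosen for illustration; this is not the authors' HME3M code, which also estimates the paths and experts from data.

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def hme_predict(x, path_weights, experts):
            # Each path k contributes its logistic classifier's probability,
            # weighted by the path's mixture probability pi_k (weights sum to 1).
            probs = [sigmoid(np.dot(w, x) + b) for (w, b) in experts]
            return float(np.dot(path_weights, probs))

        # Toy example with two paths and three features
        x = np.array([0.2, 1.5, -0.3])
        experts = [(np.array([1.0, 0.5, -0.2]), 0.1),
                   (np.array([-0.4, 0.9, 0.3]), -0.2)]
        print(hme_predict(x, np.array([0.7, 0.3]), experts))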

    PPCAS: Implementation of a Probabilistic Pairwise Model for Consistency-Based Multiple Alignment in Apache Spark

    Large-scale data processing techniques, currently known as Big Data, are used to manage the huge amount of data generated by sequencers. Although these techniques have significant advantages, few biological applications have adopted them. In the bioinformatics field, Multiple Sequence Alignment (MSA) tools are widely applied for evolutionary and phylogenetic analysis, homology and domain structure prediction. Highly rated MSA tools, such as MAFFT, ProbCons and T-Coffee (TC), use probabilistic consistency as a step prior to the progressive alignment stage in order to improve the final accuracy. In this paper, a novel approach named PPCAS (Probabilistic Pairwise model for Consistency-based multiple alignment in Apache Spark) is presented. PPCAS is based on the MapReduce processing paradigm in order to enable large datasets to be processed, with the aim of improving the performance and scalability of the original algorithm. This work was supported by the MEyC-Spain [contract TIN2014-53234-C2-2-R].
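
    The parallelization idea (one Spark task per sequence pair, with the expensive pairwise consistency computation done in the map phase) can be sketched in PySpark as follows. The pair-scoring function is a toy stand-in for the pair-HMM posterior computation used by consistency-based aligners, and the sequences are invented; this is not the PPCAS source code.

        from itertools import combinations
        from pyspark import SparkContext

        def pairwise_posterior(pair):
            # Toy stand-in for the expensive pair-HMM / posterior-probability
            # computation that consistency-based aligners perform for each pair.
            (i, si), (j, sj) = pair
            score = sum(a == b for a, b in zip(si, sj)) / max(len(si), len(sj))
            return (i, j), score

        if __name__ == "__main__":
            sc = SparkContext(appName="pairwise-consistency-sketch")
            seqs = list(enumerate(["MKVLA", "MKILA", "MRVLG"]))      # toy sequences
            pairs = sc.parallelize(list(combinations(seqs, 2)))      # one task per pair
            library = dict(pairs.map(pairwise_posterior).collect())  # the consistency "library"
            print(library)
            sc.stop()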

    Molecular-biology-driven treatment for metastatic colorectal cancer

    Background: Metastatic CRC (mCRC) is a molecularly heterogeneous disease. The aim of this review is to give an overview of molecular-biology-driven treatment of mCRC patients. Methods: A review of clinical trials, retrospective studies and case reports was performed regarding molecular biomarkers with therapeutic implications. Results: RAS wild-type status was confirmed as being crucial for anti-epidermal growth factor receptor (EGFR) monoclonal antibodies and for the rechallenge strategy. Antiangiogenic therapies improve survival in first- and second-line settings, irrespective of RAS status, while tyrosine kinase inhibitors (TKIs) remain promising in refractory mCRC. Promising results emerged from trials of anti-HER2 drugs in HER2-positive mCRC. Targeted inhibitors were successful in BRAF V600E-mutant mCRC patients, while immunotherapy was successful in microsatellite instability-high/defective mismatch repair (MSI-H/dMMR) or DNA polymerase epsilon catalytic subunit (POLE-1) mutant patients. Data are still lacking on NTRK, RET, MGMT, and TGF-β, which require further research. Conclusion: Several molecular biomarkers have been identified for the tailored treatment of mCRC patients, and multiple efforts are currently ongoing to increase the therapeutic options. In the era of precision medicine, molecular-biology-driven treatment is the key to improving patient selection and patient outcomes. Further research and large phase III trials are required to ameliorate the therapeutic management of these patients.

    Interoperable and scalable data analysis with microservices: applications in metabolomics.

    Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator. We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and the development of scalable and interoperable workflows for metabolomics data analysis. The environment can be launched on demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study. We show that the method scales dynamically with increasing availability of computational resources. We demonstrate that the method facilitates interoperability by integrating the major software suites, resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics, including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens up new types of large-scale integrative science. The PhenoMeNal consortium maintains a web portal (https://portal.phenomenal-h2020.eu) providing a GUI for launching the Virtual Research Environment. The GitHub repository https://github.com/phnmnl/ hosts the source code of all projects. Supplementary data are available at Bioinformatics online.
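
    The basic microservice idea (each analysis step runs in its own container, exchanging files through a shared volume) can be illustrated with the short sketch below. The image names, commands and file names are hypothetical placeholders, and the sketch drives containers directly with docker run; the PhenoMeNal VRE instead schedules such containers through Kubernetes.

        import subprocess
        from pathlib import Path

        def run_step(image, command, workdir):
            # Run one containerized workflow step, mounting a shared data directory.
            subprocess.run(
                ["docker", "run", "--rm", "-v", f"{workdir}:/data", image] + command,
                check=True,
            )

        if __name__ == "__main__":
            shared = Path("work").resolve()
            shared.mkdir(exist_ok=True)
            # Hypothetical images and commands; each step reads and writes /data.
            run_step("example/preprocess:latest",
                     ["preprocess", "/data/raw.mzML", "/data/peaks.csv"], shared)
            run_step("example/statistics:latest",
                     ["analyse", "/data/peaks.csv", "/data/results.csv"], shared)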

    ComPath: comparative enzyme analysis and annotation in pathway/subsystem contexts

    Background: Once a new genome is sequenced, one of the important questions is to determine the presence and absence of biological pathways. Analysis of biological pathways in a genome is a complicated task, since a number of biological entities are involved and biological pathways are not identical across organisms. Computational pathway identification and analysis thus involve a number of computational tools and databases, and are typically done in comparison with pathways in other organisms. This computational requirement is well beyond the capability of most biologists, so information systems for reconstructing, annotating, and analyzing biological pathways are much needed. We introduce a new comparative pathway analysis workbench, ComPath, which integrates various resources and computational tools using an interactive spreadsheet-style web interface for reliable pathway analyses. Results: ComPath allows users to compare biological pathways in multiple genomes using a spreadsheet-style web interface where various sequence-based analyses can be performed, to compare enzymes (e.g. sequence clustering) and pathways (e.g. pathway hole identification), to search a genome for de novo prediction of enzymes, or to annotate a genome in comparison with reference genomes of choice. To fill in pathway holes or make de novo enzyme predictions, multiple computational methods such as FASTA, Whole-HMM, CSR-HMM (a method of our own introduced in this paper), and PDB-domain search are integrated in ComPath. Our experiments show that the FASTA and CSR-HMM search methods generally outperform the Whole-HMM and PDB-domain search methods in terms of sensitivity, but FASTA search performs poorly in terms of specificity, detecting more false positives as the E-value cutoff increases. Overall, the CSR-HMM search method performs best in terms of both sensitivity and specificity. Gene neighborhood and pathway neighborhood (global network) visualization tools can be used to obtain context information that is complementary to the conventional KEGG map representation. Conclusion: ComPath is an interactive workbench for pathway reconstruction, annotation, and analysis where experts can perform various sequence, domain, and context analyses using an intuitive and interactive spreadsheet-style interface.
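
    The pathway hole identification mentioned above reduces, in its simplest form, to comparing the enzymes a reference pathway requires with the enzymes annotated in a genome; the missing ones are the holes that the FASTA, Whole-HMM, CSR-HMM and PDB-domain searches then try to fill. A minimal sketch with illustrative EC numbers (not taken from KEGG) follows.

        def pathway_holes(pathway_ec, genome_ec):
            # EC numbers required by the pathway but absent from the genome's
            # annotation are the "pathway holes" to be filled by candidate search.
            return sorted(set(pathway_ec) - set(genome_ec))

        glycolysis_ec = ["2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13", "2.7.2.3"]
        annotated_ec  = ["2.7.1.1", "5.3.1.9", "2.7.2.3"]
        print(pathway_holes(glycolysis_ec, annotated_ec))  # ['2.7.1.11', '4.1.2.13']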

    HSRA: Hadoop-based spliced read aligner for RNA sequencing data

    Nowadays, the analysis of transcriptome sequencing (RNA-seq) data has become the standard method for quantifying levels of gene expression. In RNA-seq experiments, the mapping of short reads to a reference genome or transcriptome is considered a crucial step that remains one of the most time-consuming. With the steady development of Next Generation Sequencing (NGS) technologies, unprecedented amounts of genomic data introduce significant challenges in terms of storage, processing and downstream analysis. As cost and throughput continue to improve, there is a growing need for new software solutions that minimize the impact of increasing data volume on RNA read alignment. In this work we introduce HSRA, a Big Data tool that takes advantage of the MapReduce programming model to extend the multithreading capabilities of a state-of-the-art spliced read aligner for RNA-seq data (HISAT2) to distributed memory systems such as multi-core clusters or cloud platforms. HSRA has been built upon the Hadoop MapReduce framework and supports both single- and paired-end reads from FASTQ/FASTA datasets, providing output alignments in SAM format. The design of HSRA has been carefully optimized to avoid the main limitations and major causes of inefficiency found in previous Big Data mapping tools, which cannot fully exploit the raw performance of the underlying aligner. On a 16-node multi-core cluster, HSRA is on average 2.3 times faster than previous Hadoop-based tools. Source code in Java as well as a user’s guide are publicly available for download at http://hsra.dec.udc.es. Funding: Ministerio de Economía, Industria y Competitividad, TIN2016-75845-P; Xunta de Galicia, ED431G/0.
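
    The split-align-merge strategy that HSRA distributes with Hadoop MapReduce can be illustrated locally with the sketch below: the input FASTQ is assumed to be pre-split into chunks, each chunk is aligned with HISAT2 in a separate worker, and the per-chunk SAM files would then be merged. The index and file names are hypothetical, and a local process pool stands in for the Hadoop map tasks.

        import subprocess
        from multiprocessing import Pool

        def align_chunk(args):
            # Align one FASTQ chunk with HISAT2 and return the chunk's SAM path.
            index, chunk_fastq, out_sam = args
            subprocess.run(["hisat2", "-x", index, "-U", chunk_fastq, "-S", out_sam],
                           check=True)
            return out_sam

        if __name__ == "__main__":
            chunks = [("genome_index", f"reads_part{i}.fastq", f"part{i}.sam")
                      for i in range(4)]
            with Pool(4) as pool:
                sam_parts = pool.map(align_chunk, chunks)
            print("aligned chunks:", sam_parts)  # parts would then be merged into one SAM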