53 research outputs found

    Empowering precision medicine through high performance computing clusters

    The role of High Performance Computing (HPC) in medicine has greatly increased in recent years, moving from basic research to the clinic. With the advent of Next Generation Sequencing (NGS) technologies, diverse areas of human health have been investigated through different omics techniques. The extensive use of these NGS platforms to profile human health issues in a high-throughput, cost-efficient manner is generating huge amounts of sequencing data, pushing bioinformatic research into the big-data field. The speed, accuracy and reproducibility of massive sequencing analysis have allowed molecular biology knowledge to be transferred into precision medicine. Furthermore, Molecular Dynamics (MD) has gained great importance in aiding genome research. Sequencing studies of cancer have made it possible to detect and characterize mutated genes that drive tumorigenesis. As a complementary approach, from a biophysical perspective, MD simulations executed on HPC architectures have made it possible to investigate the role played by pathological mutations in the molecular mechanism of activation

    Massive NGS data analysis reveals hundreds of potential novel gene fusions in human cell lines

    Background: Gene fusions derive from chromosomal rearrangements and the resulting chimeric transcripts are often endowed with oncogenic potential. Furthermore, they serve as diagnostic tools for the clinical classification of cancer subgroups with different prognosis and, in some cases, they can provide specific drug targets. So far, many efforts have been made to study gene fusion events occurring in tumor samples. In recent years, the availability of a comprehensive Next Generation Sequencing dataset for all the existing human tumor cell lines has provided the opportunity to further investigate these data in order to identify novel and still uncharacterized gene fusion events. Results: In our work, we have extensively reanalyzed 935 paired-end RNA-seq experiments downloaded from "The Cancer Cell Line Encyclopedia" repository, with the aim of identifying novel putative cell-line specific gene fusion events in human malignancies. The bioinformatics analysis was performed by running four different gene fusion detection algorithms. The results were further prioritized by running a Bayesian classifier that performs an in silico validation. The collection of fusion events supported by all of the predictive software tools results in a robust set of ∼ 1,700 in silico predicted novel candidates suitable for downstream analyses. Given the huge amount of data and information produced, computational results have been systematized in a database named LiGeA. The database can be browsed through a dynamic and interactive web portal, further integrated with validated data from other well known repositories. Taking advantage of the intuitive query forms, users can easily access, navigate, filter and select the putative gene fusions for further validations and studies. They can also find suitable experimental models for a given fusion of interest.
Conclusions: We believe that the LiGeA resource can represent not only the first compendium of both known and putative novel gene fusion events in the catalog of all of the human malignant cell lines, but it can also become a handy starting point for wet-lab biologists who wish to investigate novel cancer biomarkers and specific drug targets
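The consensus step described in the Results (keeping only fusion events supported by all detection tools) can be illustrated with a minimal sketch. This is not the LiGeA pipeline: the tool names, the input layout and the toy gene pairs below are assumptions made purely for illustration.

```python
def consensus_fusions(calls_by_tool):
    """Return the fusions reported by every tool.

    calls_by_tool maps a (hypothetical) tool name to a list of
    (five_prime_gene, three_prime_gene) tuples.
    """
    # Normalise gene symbols to upper case so that trivial spelling
    # differences between tools do not hide an agreement.
    call_sets = [
        {(five.upper(), three.upper()) for five, three in calls}
        for calls in calls_by_tool.values()
    ]
    if not call_sets:
        return set()
    # A fusion is kept only if it appears in the output of all tools.
    return set.intersection(*call_sets)


# Toy input, invented for illustration (not results from the paper).
example_calls = {
    "tool_a": [("BCR", "ABL1"), ("EML4", "ALK")],
    "tool_b": [("bcr", "abl1"), ("TMPRSS2", "ERG")],
    "tool_c": [("BCR", "ABL1"), ("EML4", "ALK")],
    "tool_d": [("BCR", "ABL1")],
}
print(consensus_fusions(example_calls))
```

Requiring agreement among all callers trades sensitivity for specificity, which matches the abstract's description of the resulting set as "robust".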

    ExpEdit: a webserver to explore human RNA editing in RNA-Seq experiments.

    Summary: ExpEdit is a web application for assessing RNA editing in humans at known or user-specified sites supported by transcript data obtained by RNA-Seq experiments. Mapping data (in SAM/BAM format) or sequence reads directly [in FASTQ/short read archive (SRA) format] can be provided as input to carry out a comparative analysis against a large collection of known editing sites collected in the DARNED database as well as other user-provided potentially edited positions. Results are shown as dynamic tables containing University of California, Santa Cruz (UCSC) links for a quick examination of the genomic context. Availability: ExpEdit is freely available on the web at http://www.caspur.it/ExpEdit/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online

    The PMDB Protein Model Database

    The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74 000 models for ∼240 proteins. The system is accessible at and allows predictors to submit models along with related supporting evidence and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data

    The MEPS server for identifying protein conformational epitopes

    Background: One of the most interesting problems in molecular immunology is epitope mapping, i.e. the identification of the regions of interaction between an antigen and an antibody. The solution to this problem, even if approximate, would help in designing experiments to precisely map the residues involved in the interaction and could be instrumental both in designing peptides able to mimic the interacting surface of the antigen and in understanding where immunologically important regions are located in its three-dimensional structure. From an experimental point of view, both genetically encoded and chemically synthesised peptide libraries can be used to identify sequences recognized by a given antibody. The problem then arises of which region of a folded protein the selected peptides correspond to. Results: We have developed a method able to find the surface region of a protein that can be effectively mimicked by a peptide, given the structure of the protein and the maximum number of side chains deemed to be required for recognition. The method is implemented as a publicly available server. It can also find and report all peptide sequences of a specified length that can mimic the surface of a given protein and store them in a database. The immediate application of the server is the mapping of antibody epitopes, however the system is sufficiently flexible for allowing other questions to be asked, for example one can compare the peptides representing the surface of two proteins known to interact with the same macromolecule to find which is the most likely interacting region. Conclusion: We believe that the MEPS server, available at http://www.caspur.it/meps, will be a useful tool for immunologists and structural and computational biologists. We plan to use it ourselves to implement a database of "surface mimicking peptides" for all proteins of known structure and proteins that can be reliably modelled by comparative modelling.

    Novel cDNAs encoding salivary proteins from the malaria vector Anopheles gambiae

    Several genes encoding salivary components of the mosquito Anopheles gambiae were identified using a selective trapping approach. Among these, five corresponded to genes expressed specifically in female glands and their role may possibly be linked to blood-feeding. Our collection included a fourth member of the D7 protein family and two polypeptides that showed weak similarity to anti-coagulants from distantly related species. Moreover, we identified two additional members of a novel group of proteins that we named glandins. The isolation of tissue-specific genes represents a first step toward a deeper molecular analysis of mosquito salivary secretions

    ASPIC: a web resource for alternative splicing prediction and transcript isoforms characterization

    Alternative splicing (AS) is now emerging as a major mechanism contributing to the expansion of the transcriptome and proteome complexity of multicellular organisms. The fact that a single gene locus may give rise to multiple mRNAs and protein isoforms, showing both major and subtle structural variations, is an exceptionally versatile tool in the optimization of the coding capacity of the eukaryotic genome. The huge and continuously increasing number of genome and transcript sequences provides an essential information source for the computational detection of genes AS pattern. However, much of this information is not optimally or comprehensively used in gene annotation by current genome annotation pipelines. We present here a web resource implementing the ASPIC algorithm which we developed previously for the investigation of AS of user submitted genes, based on comparative analysis of available transcript and genome data from a variety of species. The ASPIC web resource provides graphical and tabular views of the splicing patterns of all full-length mRNA isoforms compatible with the detected splice sites of genes under investigation as well as relevant structural and functional annotation. The ASPIC web resource—available at —is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility

    Missense mutations of NCPAG gene affect calving ease in Piedmontese cattle: preliminary evidences

    A previous genome scan on 323 Piedmontese individuals identified a cluster of 13 SNPs significantly associated with direct calving ease and centred on the three genes LAP3, LCORL and NCAPG in chromosome 6. We investigated missense mutations affecting calving ease in Piedmontese cattle in the identified region using sequences from the whole exome in eight Piedmontese individuals chosen from the extremes of the direct calving ease estimated breeding values distribution for this trait. The present study has not found missense variants in LAP3 and LCORL, while two were identified on NCAPG by three different variant calling methods. Other gene candidates in the same region harbour missense mutations, such as PPM1K, PKD2, SPP1 and MEPE, but both SIFT analysis and chi-square test on frequency of alleles make us hypothesise that NCAPG is the single gene responsible for the trait variation. The two SNPs on NCAPG are in complete linkage disequilibrium in our samples; therefore, further investigations are needed in order to discriminate their role

    REDIportal: millions of novel A-to-I RNA editing events from thousands of RNAseq experiments

    RNA editing is a relevant epitranscriptome phenomenon able to increase the transcriptome and proteome diversity of eukaryotic organisms. ADAR-mediated RNA editing is widespread in humans, in which millions of A-to-I changes modify thousands of primary transcripts. RNA editing has pivotal roles in the regulation of gene expression, the modulation of the innate immune response and the functioning of several neurotransmitter receptors. Massive transcriptome sequencing has fostered research in this field. Nonetheless, different aspects of RNA editing biology are still unknown and need to be elucidated. To support the study of A-to-I RNA editing we have updated our REDIportal catalogue, raising its content to about 16 million events detected in 9642 human RNAseq samples from the GTEx project by using a dedicated pipeline based on the HPC version of the REDItools software. REDIportal now allows searches at sample level, provides overviews of RNA editing profiles for each RNAseq experiment, implements a Gene View module to look at individual events in their genic context and hosts the CLAIRE database. Starting from this novel version, REDIportal will start collecting non-human RNA editing changes for comparative genomics investigations. The database is freely available at http://srv00.recas.ba.infn.it/atlas/index.html

    parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants.

    BACKGROUND: Several prediction problems in computational biology and genomic medicine are characterized by both big data and a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge amount of neutral variants or undergo severe restrictions in managing big genomic data. RESULTS: To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version.
CONCLUSIONS: parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF
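The combination of partitioning, oversampling and undersampling mentioned in the Results can be sketched roughly as follows. This is an illustrative toy, not parSMURF's code: the function name, its parameters and the use of plain random duplication as the oversampling step are all assumptions, and no classifier is trained here.

```python
import random


def balanced_partitions(positives, negatives, n_parts,
                        over_factor=2, neg_per_pos=1, seed=0):
    """Build one balanced training set per partition of the majority class.

    Hypothetical helper: the negatives (majority class) are split into
    n_parts disjoint partitions; for each partition the positives are
    oversampled and the negatives are undersampled.
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    # Disjoint partitions of the majority class, one per ensemble member.
    parts = [shuffled[i::n_parts] for i in range(n_parts)]

    training_sets = []
    for part in parts:
        # Oversampling by random duplication (an assumed stand-in for a
        # synthetic-example generator).
        extra = [rng.choice(positives)
                 for _ in range(len(positives) * (over_factor - 1))]
        pos = list(positives) + extra
        # Undersampling: keep at most neg_per_pos negatives per positive.
        n_neg = min(len(part), neg_per_pos * len(pos))
        neg = rng.sample(part, n_neg)
        training_sets.append((pos, neg))
    return training_sets
```

In a hyper-ensemble of this kind, one base learner would typically be trained on each returned set and the predictions averaged; because the partitions are independent, each could be assigned to a different process or cluster node.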