37 research outputs found

    A fuzzy method for RNA-Seq differential expression analysis in presence of multireads

    Get PDF
    Background: When the reads obtained from high-throughput RNA sequencing are mapped against a reference database, a significant proportion of them - known as multireads - can map to more than one reference sequence. These multireads originate from gene duplications, repetitive regions or overlapping genes. Removing the multireads from the mapping results, in RNA-Seq analyses, causes an underestimation of the read counts, while estimating the real read count can lead to false positives during the detection of differentially expressed sequences. Results: We present an innovative approach to deal with multireads and evaluate differential expression events, entirely based on fuzzy set theory. Since multireads cause uncertainty in the estimation of read counts during gene expression computation, they can also influence the reliability of differential expression analysis results, by producing false positives. Our method manages the uncertainty in gene expression estimation by defining the fuzzy read counts and evaluates the possibility of a gene to be differentially expressed with three fuzzy concepts: over-expression, same-expression and under-expression. The output of the method is a list of differentially expressed genes enriched with information about the uncertainty of the results due to the multiread presence. We have tested the method on RNA-Seq data designed for case-control studies and we have compared the obtained results with other existing tools for read count estimation and differential expression analysis. Conclusions: The management of multireads with the use of fuzzy sets allows to obtain a list of differential expression events which takes in account the uncertainty in the results caused by the presence of multireads. Such additional information can be used by the biologists when they have to select the most relevant differential expression events to validate with laboratory assays. Our method can be used to compute reliable differential expression events and to highlight possible false positives in the lists of differentially expressed genes computed with other tools

    Explaining Ovarian Cancer Gene Expression Profiles with Fuzzy Rules and Genetic Algorithms

    Get PDF
    The analysis of gene expression data is a complex task, and many tools and pipelines are available to handle big sequencing datasets for case-control (bivariate) studies. In some cases, such as pilot or exploratory studies, the researcher needs to compare more than two groups of samples consisting of a few replicates. Both standard statistical bioinformatic pipelines and innovative deep learning models are unsuitable for extracting interpretable patterns and information from such datasets. In this work, we apply a combination of fuzzy rule systems and genetic algorithms to analyze a dataset composed of 21 samples and 6 classes, useful for approaching the study of expression profiles in ovarian cancer, compared to other ovarian diseases. The proposed method is capable of performing a feature selection among genes that is guided by the genetic algorithm, and of building a set of if-then rules that explain how classes can be distinguished by observing changes in the expression of selected genes. After testing several parameters, the final model consists of 10 genes involved in the molecular pathways of cancer and 10 rules that correctly classify all samples

    Interactome-Seq: A Protocol for Domainome Library Construction, Validation and Selection by Phage Display and Next Generation Sequencing

    Get PDF
    Folding reporters are proteins with easily identifiable phenotypes, such as antibiotic resistance, whose folding and function is compromised when fused to poorly folding proteins or random open reading frames. We have developed a strategy where, by using TEM-1 \u3b2-lactamase (the enzyme conferring ampicillin resistance) on a genomic scale, we can select collections of correctly folded protein domains from the coding portion of the DNA of any intronless genome. The protein fragments obtained by this approach, the so called "domainome", will be well expressed and soluble, making them suitable for structural/functional studies. By cloning and displaying the "domainome" directly in a phage display system, we have showed that it is possible to select specific protein domains with the desired binding properties (e.g., to other proteins or to antibodies), thus providing essential experimental information for gene annotation or antigen identification. The identification of the most enriched clones in a selected polyclonal population can be achieved by using novel next-generation sequencing technologies (NGS). For these reasons, we introduce deep sequencing analysis of the library itself and the selection outputs to provide complete information on diversity, abundance and precise mapping of each of the selected fragment. The protocols presented here show the key steps for library construction, characterization, and validation

    BEAT: Bioinformatics Exon Array Tool to store, analyze and visualize Affymetrix GeneChip Human Exon Array data from disease experiments

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>It is known from recent studies that more than 90% of human multi-exon genes are subject to Alternative Splicing (AS), a key molecular mechanism in which multiple transcripts may be generated from a single gene. It is widely recognized that a breakdown in AS mechanisms plays an important role in cellular differentiation and pathologies. Polymerase Chain Reactions, microarrays and sequencing technologies have been applied to the study of transcript diversity arising from alternative expression. Last generation Affymetrix GeneChip Human Exon 1.0 ST Arrays offer a more detailed view of the gene expression profile providing information on the AS patterns. The exon array technology, with more than five million data points, can detect approximately one million exons, and it allows performing analyses at both gene and exon level. In this paper we describe BEAT, an integrated user-friendly bioinformatics framework to store, analyze and visualize exon arrays datasets. It combines a data warehouse approach with some rigorous statistical methods for assessing the AS of genes involved in diseases. Meta statistics are proposed as a novel approach to explore the analysis results. BEAT is available at <url>http://beat.ba.itb.cnr.it</url>.</p> <p>Results</p> <p>BEAT is a web tool which allows uploading and analyzing exon array datasets using standard statistical methods and an easy-to-use graphical web front-end. BEAT has been tested on a dataset with 173 samples and tuned using new datasets of exon array experiments from 28 colorectal cancer and 26 renal cell cancer samples produced at the Medical Genetics Unit of IRCCS Casa Sollievo della Sofferenza.</p> <p>To highlight all possible AS events, alternative names, accession Ids, Gene Ontology terms and biochemical pathways annotations are integrated with exon and gene level expression plots. The user can customize the results choosing custom thresholds for the statistical parameters and exploiting the available clinical data of the samples for a multivariate AS analysis.</p> <p>Conclusions</p> <p>Despite exon array chips being widely used for transcriptomics studies, there is a lack of analysis tools offering advanced statistical features and requiring no programming knowledge. BEAT provides a user-friendly platform for a comprehensive study of AS events in human diseases, displaying the analysis results with easily interpretable and interactive tables and graphics.</p

    InteractomeSeq: a web server for the identification and profiling of domains and epitopes from phage display and next generation sequencing data

    Get PDF
    High-Throughput Sequencing technologies are transforming many research fields, including the analysis of phage display libraries. The phage display technology coupled with deep sequencing was introduced more than a decade ago and holds the potential to circumvent the traditional laborious picking and testing of individual phage rescued clones. However, from a bioinformatics point of view, the analysis of this kind of data was always performed by adapting tools designed for other purposes, thus not considering the noise background typical of the 'interactome sequencing' approach and the heterogeneity of the data. InteractomeSeq is a web server allowing data analysis of protein domains ('domainome') or epitopes ('epitome') from either Eukaryotic or Prokaryotic genomic phage libraries generated and selected by following an Interactome sequencing approach. InteractomeSeq allows users to upload raw sequencing data and to obtain an accurate characterization of domainome/epitome profiles after setting the parameters required to tune the analysis. The release of this tool is relevant for the scientific and clinical community, because InteractomeSeq will fill an existing gap in the field of large-scale biomarkers profiling, reverse vaccinology, and structural/functional studies, thus contributing essential information for gene annotation or antigen identification. InteractomeSeq is freely available at https://InteractomeSeq.ba.itb.cnr.it/

    Dysregulation of MicroRNAs and Target Genes Networks in Peripheral Blood of Patients With Sporadic Amyotrophic Lateral Sclerosis

    Get PDF
    Amyotrophic lateral sclerosis (ALS) is a progressive and fatal neurodegenerative disease. While genetics and other factors contribute to ALS pathogenesis, critical knowledge is still missing and validated biomarkers for monitoring the disease activity have not yet been identified. To address those aspects we carried out this study with the primary aim of identifying possible miRNAs/mRNAs dysregulation associated with the sporadic form of the disease (sALS). Additionally, we explored miRNAs as modulating factors of the observed clinical features. Study included 56 sALS and 20 healthy controls (HCs). We analyzed the peripheral blood samples of sALS patients and HCs with a high-throughput next-generation sequencing followed by an integrated bioinformatics/biostatistics analysis. Results showed that 38 miRNAs (let-7a-5p, let-7d-5p, let-7f-5p, let-7g-5p, let-7i-5p, miR-103a-3p, miR-106b-3p, miR-128-3p, miR-130a-3p, miR-130b-3p, miR-144-5p, miR-148a-3p, miR-148b-3p, miR-15a-5p, miR-15b-5p, miR-151a-5p, miR-151b, miR-16-5p, miR-182-5p, miR-183-5p, miR-186-5p, miR-22-3p, miR-221-3p, miR-223-3p, miR-23a-3p, miR-26a-5p, miR-26b-5p, miR-27b-3p, miR-28-3p, miR-30b-5p, miR-30c-5p, miR-342-3p, miR-425-5p, miR-451a, miR-532-5p, miR-550a-3p, miR-584-5p, miR-93-5p) were significantly downregulated in sALS. We also found that different miRNAs profiles characterized the bulbar/spinal onset and the progression rate. This observation supports the hypothesis that miRNAs may impact the phenotypic expression of the disease. Genes known to be associated with ALS (e.g., PARK7, C9orf72, ALS2, MATR3, SPG11, ATXN2) were confirmed to be dysregulated in our study. We also identified other potential candidate genes like LGALS3 (implicated in neuroinflammation) and PRKCD (activated in mitochondrial-induced apoptosis). Some of the downregulated genes are involved in molecular bindings to ions (i.e., metals, zinc, magnesium) and in ions-related functions. The genes that we found upregulated were involved in the immune response, oxidation–reduction, and apoptosis. These findings may have important implication for the monitoring, e.g., of sALS progression and therefore represent a significant advance in the elucidation of the disease’s underlying molecular mechanisms. The extensive multidisciplinary approach we applied in this study was critically important for its success, especially in complex disorders such as sALS, wherein access to genetic background is a major limitation

    IPSC‐based modeling of THD recapitulates disease phenotypes and reveals neuronal malformation

    Get PDF
    Tyrosine hydroxylase deficiency (THD) is a rare genetic disorder leading to dopaminergic depletion and early-onset Parkinsonism. Affected children present with either a severe form that does not respond to L-Dopa treatment (THD-B) or a milder L-Dopa responsive form (THD-A). We generated induced pluripotent stem cells (iPSCs) from THD patients that were differentiated into dopaminergic neurons (DAn) and compared with control-DAn from healthy individuals and gene-corrected isogenic controls. Consistent with patients, THD iPSC-DAn displayed lower levels of DA metabolites and reduced TH expression, when compared to controls. Moreover, THD iPSC-DAn showed abnormal morphology, including reduced total neurite length and neurite arborization defects, which were not evident in DAn differentiated from control-iPSC. Treatment of THD-iPSC-DAn with L-Dopa rescued the neuronal defects and disease phenotype only in THDA-DAn. Interestingly, L-Dopa treatment at the stage of neuronal precursors could prevent the alterations in THDB-iPSC-DAn, thus suggesting the existence of a critical developmental window in THD. Our iPSC-based model recapitulates THD disease phenotypes and response to treatment, representing a promising tool for investigating pathogenic mechanisms, drug screening, and personalized management

    21_1.21-22.mi_and_pl.bam

    No full text
    Alignment of smallRNA data on Yeast genome</p
    corecore