155 research outputs found

    A variational piecewise smooth model for identification of chromosomal imbalances in cancer

    Monitoring of changes at the DNA level enables the characterization of the underlying structure of genetic diseases. In particular, copy number alterations (CNAs) are increasingly being recognized as an important component of genetic variation in cancer: oncogenes may be enhanced by DNA amplification and tumor suppressor genes may be inactivated by physical deletion. Encouraged by the advent of array comparative genomic hybridization (aCGH) technology, several biological studies have been designed to look for chromosomal aberrations involved in cancer. Hence, the development of algorithms aimed at the identification of CNAs is a current challenge in bioinformatics. Despite the number of proposed approaches, identification of CNAs is still an open problem. Here we propose a new approach for detection of CNAs that extends a previously published algorithm in which a popular variational model for image segmentation was used. The proposed algorithm, called Vega Multi-Channel (VegaMC), starts from the assumption that copy number profiles are piecewise constant and finds the optimal segmentation by minimizing an energy functional that represents a compromise between accuracy and parsimony of the boundaries. We applied VegaMC to a published gastrointestinal stromal tumor aCGH dataset, showing the ability of the proposed approach to identify well-known cytogenetic mutations and, possibly, to discover new ones.
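The trade-off between accuracy and parsimony described in this abstract can be illustrated with a generic one-dimensional piecewise constant fit. The sketch below is not the VegaMC implementation: it assumes a single profile, a squared-error fidelity term, and a fixed penalty `lam` charged per segment, and it solves that simplified problem exactly by dynamic programming. The function name and parameters are hypothetical.

```python
def segment_piecewise_constant(signal, lam):
    """Return the start index of each segment in a fit that minimizes
    (sum of squared errors) + lam * (number of segments).

    Illustrative stand-in for a piecewise constant energy model,
    not the algorithm used by VegaMC.
    """
    n = len(signal)
    # Prefix sums let us compute the squared error of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i, j):
        # Squared error of signal[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] * (n + 1)  # best[j]: minimal energy of signal[:j]
    prev = [0] * (n + 1)    # prev[j]: start of the last segment in that optimum
    for j in range(1, n + 1):
        best[j] = float("inf")
        for i in range(j):
            cost = best[i] + sse(i, j) + lam
            if cost < best[j]:
                best[j], prev[j] = cost, i

    # Walk back through the stored segment starts.
    starts = []
    j = n
    while j > 0:
        starts.append(prev[j])
        j = prev[j]
    return starts[::-1]


# Toy log-ratio profile: a gain of five probes between two normal regions.
profile = [0.0] * 5 + [1.0] * 5 + [0.0] * 5
print(segment_piecewise_constant(profile, lam=0.5))  # [0, 5, 10]
```

A larger `lam` favors fewer boundaries: with `lam=10.0` the same profile is fitted as one segment, which is the parsimony side of the compromise.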

    IRIS: a method for reverse engineering of regulatory relations in gene networks

    Background: The ultimate aim of systems biology is to understand and describe how molecular components interact to manifest collective behaviour that is the sum of the single parts. Building a network of molecular interactions is the basic step in modelling a complex entity such as the cell. Even if gene-gene interactions only partially describe real networks because of post-transcriptional modifications and protein regulation, using microarray technology it is possible to combine measurements for thousands of genes into a single analysis step that provides a picture of the cell's gene expression. Several databases provide information about known molecular interactions and various methods have been developed to infer gene networks from expression data. However, network topology alone is not enough to perform simulations and predictions of how a molecular system will respond to perturbations. Rules for interactions among the single parts are needed for a complete definition of the network behaviour. Another interesting question is how to integrate information carried by the network topology, which can be derived from the literature, with large-scale experimental data. Results: Here we propose an algorithm, called inference of regulatory interaction schema (IRIS), that uses an iterative approach to map gene expression profile values (both steady-state and time-course) into discrete states and a simple probabilistic method to infer the regulatory functions of the network. These interaction rules are integrated into a factor graph model. We test IRIS on two synthetic networks to determine its accuracy and compare it to other methods. We also apply IRIS to gene expression microarray data for the Saccharomyces cerevisiae cell cycle and for human B-cells and compare the results to literature findings. Conclusions: IRIS is a rapid and efficient tool for the inference of regulatory relations in gene networks. A topological description of the network and a matrix of gene expression profiles are required as input to the algorithm. IRIS maps gene expression data onto discrete values and then computes regulatory functions as conditional probability tables. The suitability of the method is demonstrated for synthetic data and microarray data. The resulting network can also be embedded in a factor graph model.
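The two steps named in this abstract, mapping expression values onto discrete states and summarizing a regulatory function as a conditional probability table, can be sketched as follows. This is a simplified illustration, not IRIS itself: it assumes a median threshold in place of the iterative discretization the abstract mentions, and plain frequency counts in place of its probabilistic method. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median


def discretize(profile):
    """Map an expression profile to binary states (1 = above the median).

    Assumption: a median threshold stands in for IRIS's iterative mapping.
    """
    m = median(profile)
    return [1 if x > m else 0 for x in profile]


def conditional_probability_table(child, parents):
    """Estimate P(child state | parent states) by counting co-occurrences.

    child   -- list of discrete states, one per sample
    parents -- list of state lists, one per regulator in the given topology
    """
    counts = defaultdict(Counter)
    for t, state in enumerate(child):
        key = tuple(p[t] for p in parents)
        counts[key][state] += 1
    return {
        key: {state: n / sum(ctr.values()) for state, n in ctr.items()}
        for key, ctr in counts.items()
    }


# Toy example: the target is low whenever its single regulator is high.
regulator = discretize([0.1, 0.2, 0.9, 1.0])
target = discretize([1.0, 0.9, 0.2, 0.1])
print(conditional_probability_table(target, [regulator]))
# {(0,): {1: 1.0}, (1,): {0: 1.0}}
```

The table reads as a repressor rule: when the regulator is in state 0 the target is always in state 1, and vice versa.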

    VegaMC: a R/bioconductor package for fast downstream analysis of large array comparative genomic hybridization datasets

    Summary: Identification of genetic alterations of tumor cells has become a common method to detect the genes involved in the development and progression of cancer. In order to detect driver genes, several samples need to be analyzed simultaneously. The Cancer Genome Atlas (TCGA) project provides access to a large amount of data for several cancer types. TCGA is an invaluable source of information, but analysis of this huge dataset poses important computational problems in terms of memory and execution time. Here, we present an R package, called VegaMC (Vega multi-channel), that enables fast and efficient detection of significant recurrent copy number alterations in very large datasets. VegaMC is integrated with the output of the common tools that convert allele signal intensities into log R ratio and B allele frequency. It also enables the detection of loss of heterozygosity and produces as output two web pages allowing rapid and easy navigation of the aberrant genes. Synthetic data and real datasets are used for quantitative and qualitative evaluation purposes. In particular, we demonstrate the ability of VegaMC on two large TCGA datasets: colon adenocarcinoma and glioblastoma multiforme. For both datasets, we provide the list of aberrant genes, which contains previously validated genes and can be used as a basis for further investigations. Availability: VegaMC is an R/Bioconductor package, available at http://bioconductor.org/packages/release/bioc/html/VegaMC.html. Contact: [email protected] Supplementary Information: Supplementary data are available at Bioinformatics online.

    Finding recurrent copy number alterations preserving within-sample homogeneity

    Motivation: Copy number alterations (CNAs) represent an important component of genetic variation and play a significant role in many human diseases. The development of array comparative genomic hybridization (aCGH) technology has made it possible to identify CNAs. Identification of recurrent CNAs represents the first fundamental step to provide a list of genomic regions which form the basis for further biological investigations. The main problem in recurrent CNA discovery is the need to distinguish between functional changes and random events without pathological relevance. Within-sample homogeneity is a common feature of copy number profiles in cancer, so it can be used as an additional source of information to increase the accuracy of the results. Although several algorithms aimed at the identification of recurrent CNAs have been proposed, no comprehensive comparison of the different approaches has yet been published. Results: We propose a new approach, called Genomic Analysis of Important Alterations (GAIA), to find recurrent CNAs, in which a statistical hypothesis framework is extended to take into account within-sample homogeneity. Statistical significance and within-sample homogeneity are combined into an iterative procedure to extract the regions that are likely to be involved in functional changes. Results show that GAIA represents a valid alternative to other proposed approaches. In addition, we perform an accurate comparison by using two real aCGH datasets and a carefully planned simulation study. Availability: GAIA has been implemented as an R/Bioconductor package. It can be downloaded from the following page: http://bioinformatics.biogem.it/download/gaia Contact: [email protected]; [email protected] Supplementary Information: Supplementary data are available at Bioinformatics online.

    TimeDelay-ARACNE: Reverse engineering of gene networks from time-course data by an information theoretic approach

    Background: One of the main aims of molecular biology is to understand how molecular components interact with each other and how gene function is regulated. Using microarray technology, it is possible to measure thousands of genes in a single analysis step, obtaining a picture of the cell's gene expression. Several methods have been developed to infer gene networks from steady-state data, but much less literature exists on time-course data, so the development of algorithms to infer gene networks from time-series measurements is a current challenge in bioinformatics research. In order to detect dependencies between genes at different time delays, we propose an approach to infer gene regulatory networks from time-series measurements, starting from a well-known algorithm based on information theory. Results: In this paper we show how the ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) algorithm can be used for gene regulatory network inference in the case of time-course expression profiles. The resulting method is called TimeDelay-ARACNE. It extracts dependencies between two genes at different time delays, providing a measure of these dependencies in terms of mutual information. The basic idea of the proposed algorithm is to detect time-delayed dependencies between the expression profiles by assuming a stationary Markov random field as the underlying probabilistic model. Less informative dependencies are filtered out using an automatically calculated threshold, retaining the most reliable connections. TimeDelay-ARACNE can infer small local networks of time-regulated gene-gene interactions, detecting their direction and discovering cyclic interactions, even when only a small-to-medium number of measurements is available. We test the algorithm both on synthetic networks and on microarray expression profiles. The microarray measurements concern the S. cerevisiae cell cycle, the E. coli SOS pathways and a recently developed network for in vivo assessment of reverse engineering algorithms. Our results are compared with those of ARACNE itself and of two previously published algorithms, dynamic Bayesian networks and systems of ODEs, showing that TimeDelay-ARACNE has good accuracy, recall and F-score for the network reconstruction task. Conclusions: Here we report the adaptation of the ARACNE algorithm to infer gene regulatory networks from time-course data, so that the resulting network is represented as a directed graph. The proposed algorithm is expected to be useful in the reconstruction of small directed biological networks from time-course data.
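The core idea of scoring a dependency between two genes at different time delays with mutual information can be sketched in a few lines. This is only an illustration under simplifying assumptions: the series are already discretized, mutual information is estimated from plain frequency counts, and no thresholding or pruning step is applied. It does not reproduce the estimator or filtering used by TimeDelay-ARACNE, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def best_delay(source, target, max_delay):
    """Score source[t] against target[t + d] for d = 1..max_delay.

    Returns the delay with the highest mutual information and all scores.
    A dependency found at a positive delay suggests the direction
    source -> target.
    """
    scores = {
        d: mutual_information(source[:-d], target[d:])
        for d in range(1, max_delay + 1)
    }
    return max(scores, key=scores.get), scores


# Toy example: the target copies the source with a lag of two time points.
source = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
target = [0, 0] + source[:-2]
delay, scores = best_delay(source, target, max_delay=3)
print(delay)  # 2
```

Because the score is computed separately for each delay, the same pair of genes can be tested in both directions by swapping `source` and `target`.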

    Short inverted repeats contribute to localized mutability in human somatic cells.

    Selected repetitive sequences termed short inverted repeats (SIRs) have the propensity to form secondary DNA structures called hairpins. SIRs comprise palindromic arm sequences separated by short spacer sequences that form the hairpin stem and loop, respectively. Here, we show that SIRs confer an increase in localized mutability in breast cancer, which is domain-dependent, with the greatest mutability observed within spacer sequences (∼1.35-fold above background). Mutability is influenced by factors that increase the likelihood of hairpin formation, such as loop lengths (of 4-5 bp) and stem lengths (of 7-15 bp). Increased mutability is an intrinsic property of SIRs, as evidenced by the fact that almost all mutational processes demonstrate a higher rate of mutagenesis of spacer sequences. We further identified 88 spacer sequences showing enrichment of local mutability from 1.8- to 90-fold, distributed across 283 sites in the genome, that, intriguingly, can be used to inform the biological status of a tumor.
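The structure described here, two arms that are reverse complements of each other separated by a short spacer, can be located with a direct scan. The sketch below is a naive illustration and not the authors' pipeline: it assumes perfect arm complementarity, an uppercase A/C/G/T sequence, and the stem and loop length ranges quoted in the abstract as defaults. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_short_inverted_repeats(seq, stem_lengths=range(7, 16), loop_lengths=(4, 5)):
    """Return (start, stem_length, loop_length) for every window whose
    right arm is the exact reverse complement of its left arm.

    Assumption: perfect stems only; mismatches and bulges are ignored.
    """
    hits = []
    for stem in stem_lengths:
        for loop in loop_lengths:
            span = 2 * stem + loop
            for start in range(len(seq) - span + 1):
                left = seq[start:start + stem]
                right = seq[start + stem + loop:start + span]
                if right == reverse_complement(left):
                    hits.append((start, stem, loop))
    return hits


# Toy sequence: a 7 bp arm, a 4 bp spacer, and the matching arm.
arm = "GATTACA"
sequence = "TT" + arm + "AAAA" + reverse_complement(arm) + "CC"
print(find_short_inverted_repeats(sequence))  # [(2, 7, 4)]
```

The spacer of a hit occupies positions `start + stem_length` up to, but not including, `start + stem_length + loop_length`, which is the region where mutations would be counted in an analysis like the one described.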

    VEGAWES: variational segmentation on whole exome sequencing for copy number detection

    Background: Copy number variations are important in the detection and progression of significant tumors and diseases. Recently, whole exome sequencing (WES) has been gaining popularity for copy number variation detection due to its low cost and better efficiency. In this work, we developed VEGAWES for accurate and robust detection of copy number variations on WES data. VEGAWES is an extension of a variational segmentation algorithm, VEGA (Variational Estimator for Genomic Aberrations), which has previously outperformed several algorithms on segmenting array comparative genomic hybridization data. Results: We tested this algorithm on synthetic data and 100 glioblastoma multiforme primary tumor samples. The results on the real data were analyzed using the segmentation obtained from single-nucleotide polymorphism data as ground truth. We compared our results with two other segmentation algorithms and assessed the performance based on accuracy and time. Conclusions: In terms of both accuracy and time, VEGAWES provided better results on the synthetic data and tumor samples, demonstrating its potential for robust detection of aberrant regions in the genome.

    GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals.

    Loci discovered by genome-wide association studies predominantly map outside protein-coding genes. The interpretation of the functional consequences of non-coding variants can be greatly enhanced by catalogs of regulatory genomic regions in cell lines and primary tissues. However, robust and readily applicable methods are still lacking by which to systematically evaluate the contribution of these regions to genetic variation implicated in diseases or quantitative traits. Here we propose a novel approach that leverages genome-wide association studies' findings with regulatory or functional annotations to classify features relevant to a phenotype of interest. Within our framework, we account for major sources of confounding not offered by current methods. We further assess enrichment of genome-wide association studies for 19 traits within Encyclopedia of DNA Elements- and Roadmap-derived regulatory regions. We characterize unique enrichment patterns for traits and annotations driving novel biological insights. The method is implemented in standalone software and an R package, to facilitate its application by the research community

    The genome as a record of environmental exposure.

    Whole genome sequencing of human tumours has revealed distinct patterns of mutation that hint at the causative origins of cancer. Experimental investigations of the mutations and mutation spectra induced by environmental mutagens have traditionally focused on single genes. With the advent of faster, cheaper sequencing platforms, it is now possible to assess mutation spectra in experimental models across the whole genome. As a proof of principle, we have examined the whole genome mutation profiles of mouse embryo fibroblasts immortalised following exposure to benzo[a]pyrene (BaP), ultraviolet light (UV) and aristolochic acid (AA). The results reveal that each mutagen induces a characteristic mutation signature: predominantly G→T mutations for BaP, C→T and CC→TT for UV and A→T for AA. The data are not only consistent with existing knowledge but also provide additional information at higher levels of genomic organisation. The approach holds promise for identifying agents responsible for mutations in human tumours and for shedding light on the aetiology of human cancer.

    Visualization of Genomic Changes by Segmented Smoothing Using an L0 Penalty

    Copy number variations (CNV) and allelic imbalance in tumor tissue can show strong segmentation. Their graphical presentation can be enhanced by appropriate smoothing. Existing signal and scatterplot smoothers do not respect segmentation well. We present novel algorithms that use a penalty on the norm of differences of neighboring values. Visualization is our main goal, but we compare classification performance to that of VEGA