16 research outputs found

Discriminative Topological Features Reveal Biological Network Mechanisms

Author: Adams Carter
Chen Linda
Hom Jen
Koytcheff Robin
Levovitz Chaya
Middendorf Manuel
Wiggins Chris
Woods Gregory
Ziv Etay
Publication date: 01/01/2004

Recent genomic and bioinformatic advances have motivated the development of numerous random network models purporting to describe graphs of biological, technological, and sociological origin. The success of a model has been evaluated by how well it reproduces a few key features of the real-world data, such as degree distributions, mean geodesic lengths, and clustering coefficients. Often pairs of models can reproduce these features with indistinguishable fidelity despite being generated by vastly different mechanisms. In such cases, these few target features are insufficient to distinguish which of the different models best describes real-world networks of interest; moreover, it is not clear a priori that any of the presently existing algorithms for network generation offers a predictive description of the networks inspiring them. To derive discriminative classifiers, we construct a mapping from the set of all graphs to a high-dimensional (in principle infinite-dimensional) "word space." This map defines an input space for classification schemes that allow us, for the first time, to state unambiguously which models are most descriptive of the networks they purport to describe. Our training sets include networks generated from 17 models either drawn from the literature or introduced in this work, source code for which is freely available. We anticipate that this new approach to network analysis will have a broad impact on a number of communities.
Comment: supplemental website: http://www.columbia.edu/itc/applied/wiggins/netclass

arXiv.org e-Print Archive

CiteSeerX

Columbia University Academic Commons

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
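The abstract above describes mapping every graph to a high-dimensional "word space" whose coordinates serve as classifier inputs. The sketch below is a hypothetical illustration of that idea, not the paper's actual word definitions: it assumes each word is a string over two letters, `A` (step along a directed edge) and `T` (step against one, i.e. use the transposed adjacency matrix), and that a word's value is the total number of walks matching that pattern. The names `word_features`, `matmul` and `transpose` are introduced here for illustration only.

```python
from itertools import product

Matrix = list[list[int]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Integer product of two square matrices of equal size."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(x: Matrix) -> Matrix:
    """Reverse every edge of the graph."""
    return [list(row) for row in zip(*x)]


def word_features(adj: Matrix, max_len: int = 3) -> dict[str, int]:
    """Map a directed graph to walk counts for every word over {A, T}.

    Hypothetical word space (an assumption, not the paper's exact map):
    'A' steps along an edge, 'T' steps against one, and the feature for a
    word is the sum of all entries of the matching matrix product.
    """
    letters = {"A": adj, "T": transpose(adj)}
    features = {}
    for length in range(1, max_len + 1):
        for word in product("AT", repeat=length):
            m = letters[word[0]]
            for letter in word[1:]:
                m = matmul(m, letters[letter])
            features["".join(word)] = sum(map(sum, m))
    return features


# Feed-forward loop 0 -> 1, 0 -> 2, 1 -> 2.
ffl = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
feats = word_features(ffl, max_len=2)
print(feats)  # {'A': 3, 'T': 3, 'AA': 1, 'AT': 5, 'TA': 5, 'TT': 1}
```

In this toy map, `AT` counts pairs of walks that share a target node and `TA` counts pairs that share a source node, so even two-letter words already separate structural patterns that an edge count alone cannot; a richer word space could add further matrix operations and longer words.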

Geoseq: a tool for dissecting deep-sequencing datasets

Author: Cancio Anthony
George Ajish
Gurtowski James
Homann Robert
Levovitz Chaya
Sachidanandam Ravi
Shah Hardik
Publication venue: BioMed Central
Publication date: 01/01/2010

Gurtowski J, Cancio A, Shah H, et al. Geoseq: a tool for dissecting deep-sequencing datasets. BMC Bioinformatics. 2010;11(1):506.

Background: Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have seen little use because of the difficulty of locating and analyzing datasets of interest.

Results: Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep-sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search, and receives results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment.

Conclusions: Geoseq simplifies the analysis of small sets of sequences against deep-sequencing datasets, as well as the identification of public datasets of interest. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries, along with their mature and star sequences, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.

Crossref

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University
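The Results paragraph above says Geoseq reduces the input sequence to tiles and measures each tile's coverage in a sequence library using suffix arrays. The sketch below illustrates that general idea on a toy library; it is not Geoseq's code. The tile length, the step between tiles, the `$` read separator and the naive suffix-array construction are all assumptions chosen for illustration, and the function names are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive suffix array: start positions sorted by suffix (toy sizes only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text: str, sa: list[int], pattern: str) -> int:
    """Count matches of `pattern` with two binary searches over the suffix array."""
    k = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose k-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose k-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


def tile_coverage(query: str, reads: list[str], tile_len: int = 4,
                  step: int = 1) -> list[tuple[int, int]]:
    """Return (tile offset, library count) for each tile of `query`."""
    # '$' separates reads so that no tile can match across two reads.
    text = "$".join(reads) + "$"
    sa = build_suffix_array(text)
    return [(i, count_occurrences(text, sa, query[i:i + tile_len]))
            for i in range(0, len(query) - tile_len + 1, step)]


reads = ["ACGTAC", "GTACGT"]
print(tile_coverage("ACGTA", reads))  # [(0, 2), (1, 1)]
```

Because the suffix array is built once per library, each additional tile costs only two binary searches, which is one reason a reference-against-library design can stay cheap when the query is small and the library is large.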

Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA.

Author: Chaya Levovitz
Claudia Haferlach
Constance Baer
Filippo Utro
Kahn Rhrissorrakrai
Laxmi Parida
Manja Meggendorfer
Niroshan Nadarajah
Stephan Hutter
Sven Twardziok
Torsten Haferlach
Wencke Walter
Wolfgang Kern
Publication venue: Public Library of Science (PLoS)
Publication date: 01/08/2019

The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confounds the best of machine learning (ML) algorithms. Here we set out to determine whether the dark matter of the genome encompasses signals that can distinguish fine subtypes of disease that are otherwise genomically indistinguishable. We introduce a novel stochastic regularization, ReVeaL, that empowers ML to discriminate subtle cancer subtypes even from the same 'cell of origin'. Analogous to heritability, which is implicitly defined on the whole genome, we use predictability (F1 score), which can be defined on portions of the genome. To distinguish cancer subtypes using dark-matter DNA, we applied ReVeaL to a new WGS dataset from 727 patient samples with seven forms of hematological cancers and assessed the predictability over several genomic regions, including genic, non-dark, non-coding, non-genic, and dark. ReVeaL enabled improved discrimination of cancer subtypes for all segments of the genome. The non-genic, non-coding and dark-matter regions had the highest F1 scores, with dark matter having the highest level of predictability. Based on ReVeaL's predictability across genomic regions, dark matter contains enough signal to significantly discriminate fine subtypes of disease. Hence, the agglomeration of rare variants, even in the hitherto unannotated and ill-understood regions of the genome, may play a substantial role in the disease etiology and deserves much more attention.

Directory of Open Access Journals
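The abstract above measures predictability as an F1 score and compares it across genomic regions. ReVeaL's stochastic regularization is not described in enough detail to sketch, so the example below covers only the scoring step: per-subtype F1 from true and predicted labels, plus a macro average. Macro-averaging is an assumption (the abstract does not say how per-subtype scores are combined), the function names are hypothetical, and the labels in the usage example are placeholders, not data from the study.

```python
from collections import Counter


def f1_per_class(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Per-label F1 = 2TP / (2TP + FP + FN) over every label seen."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted label was wrong
            fn[truth] += 1  # the true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of the per-label F1 scores (assumed averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Placeholder labels, not study data.
truth = ["subtype_1", "subtype_1", "subtype_2", "subtype_2"]
pred = ["subtype_1", "subtype_2", "subtype_2", "subtype_2"]
print(f1_per_class(truth, pred))
print(round(macro_f1(truth, pred), 3))  # 0.733
```

Comparing regions would then amount to training one classifier per region's features and reporting `macro_f1` on held-out samples for each; a macro average weights rare subtypes equally with common ones, which matters when subtype sizes are unbalanced.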

Discriminative topological features reveal biological network mechanisms-0

Author: Carter Adams
Chaya Levovitz
Chris Wiggins
Etay Ziv
Gregory Woods
Jen Hom
Linda Chen
Manuel Middendorf
Robin Koytcheff

Copyright information: Taken from "Discriminative topological features reveal biological network mechanisms", BMC Bioinformatics 2004;5:181. Published online 22 Nov 2004. PMCID: PMC535926. Copyright © 2004 Middendorf et al; licensee BioMed Central Ltd.

…and the Grindrod [17] model. … is robustly classified as a Middendorf-Ziv network. The Grindrod model is the runner-up. We here show data for a word that especially … the Middendorf-Ziv model over the Grindrod model. The histograms of the word over the training data are shown along with their associated densities, calculated from the data by Gaussian kernel density estimation. The densities give the following log-…-values at the word value for the network: log(…) = -376, log(…) = -6.23

The Francis Crick Institute
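The caption above compares two candidate models by estimating, for each model, a density of one word's values over its training networks with Gaussian kernel density estimation, then evaluating the logarithm of each density at the word value of the real network. The sketch below reproduces that kind of comparison on made-up numbers. The bandwidth rule (Silverman's rule of thumb), the helper names and the training values are assumptions for illustration; the caption does not say how the bandwidth was chosen. The log density is computed with the log-sum-exp trick so that very small densities do not underflow to zero before the logarithm is taken.

```python
import math
from typing import Optional


def silverman_bandwidth(samples: list[float]) -> float:
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) (assumed; needs n >= 2)."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return 1.06 * sd * n ** (-1 / 5)


def kde_log_density(samples: list[float], x: float,
                    bandwidth: Optional[float] = None) -> float:
    """log of (1 / (n h)) * sum_i phi((x - x_i) / h), phi = standard normal pdf."""
    h = silverman_bandwidth(samples) if bandwidth is None else bandwidth
    exponents = [-0.5 * ((x - xi) / h) ** 2 for xi in samples]
    top = max(exponents)  # log-sum-exp: factor out the largest term
    log_sum = top + math.log(sum(math.exp(e - top) for e in exponents))
    return log_sum - math.log(len(samples) * h * math.sqrt(2 * math.pi))


def best_model(training: dict[str, list[float]], observed: float) -> str:
    """Return the model whose word-value density is highest at `observed`."""
    return max(training, key=lambda name: kde_log_density(training[name], observed))


# Made-up word values for two hypothetical models.
training = {
    "model_a": [0.9, 1.1, 1.0, 1.2, 0.8],
    "model_b": [3.0, 3.4, 2.9, 3.1, 3.3],
}
for name, values in training.items():
    print(name, round(kde_log_density(values, 1.05), 2))
print(best_model(training, 1.05))  # model_a
```

Working in log space keeps the comparison meaningful even when the observed word value lies far in the tail of one model's training distribution, where the raw density would round to zero.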

Discriminative topological features reveal biological network mechanisms-1

Author: Carter Adams
Chaya Levovitz
Chris Wiggins
Etay Ziv
Gregory Woods
Jen Hom
Linda Chen
Manuel Middendorf
Robin Koytcheff

Copyright information: Taken from "Discriminative topological features reveal biological network mechanisms", BMC Bioinformatics 2004;5:181. Published online 22 Nov 2004. PMCID: PMC535926. Copyright © 2004 Middendorf et al; licensee BioMed Central Ltd.

…Krapivsky-Bianconi [18, 14] model. … is robustly classified as a Kumar network. The Krapivsky-Bianconi model is the runner-up. We here show data for a word that especially … the Kumar model over the Krapivsky-Bianconi model. The histograms of the word over the training data are shown along with their associated densities, calculated from the data by Gaussian kernel density estimation. The densities give the following log-…-values at the word value for the network: log(…) = -4.22, log(…) = -12.0

The Francis Crick Institute

TGF

Author: Andrew G. Sikora
Bhairavabhotla
Chaya Levovitz
Dan Chen
Diaz-Sanchez
Emma Ivansson
Eric E. Schadt
Eric M. Genden
French
Holzer
Iancu
John P. Finnigan
Liu
Marshal R. Posner
Paolo Boffetta
Perez-Vera
Sara Alshawish
Ulf Gyllensten
Ushijima
Weijia Zhang
Publication venue: American Association for Cancer Research (AACR)

Crossref