Refining interaction search through signed iterative Random Forests

Author: Basu Sumanta
Brown James B.
Celniker Susan
Kumbier Karl
Yu Bin
Publication date: 16/10/2018

Advances in supervised learning have enabled accurate prediction in biological systems governed by complex interactions among biomolecules. However, state-of-the-art predictive algorithms are typically black boxes, learning statistical interactions that are difficult to translate into testable hypotheses. The iterative Random Forest (iRF) algorithm took a step towards bridging this gap by providing a computationally tractable procedure to identify the stable, high-order feature interactions that drive the predictive accuracy of Random Forests (RF). Here we refine the interactions identified by iRF to explicitly map responses as a function of interacting features. Our method, signed iRF (s-iRF), describes subsets of rules that frequently occur on RF decision paths. We refer to these rule subsets as signed interactions. Signed interactions share not only the same set of interacting features but also exhibit similar thresholding behavior, and thus describe a consistent functional relationship between interacting features and responses. We describe stable and predictive importance metrics (SPIMs) to rank signed interactions. For each SPIM, we define null importance metrics that characterize its expected behavior under known structure. We evaluate our proposed approach in biologically inspired simulations and two case studies: predicting enhancer activity and spatial gene expression patterns. In the case of enhancer activity, s-iRF recovers one of the few experimentally validated high-order interactions and suggests novel enhancer elements where this interaction may be active. In the case of spatial gene expression patterns, s-iRF recovers all 11 reported links in the gap gene network. By refining the process of interaction recovery, our approach has the potential to guide mechanistic inquiry into systems whose scale and complexity are beyond human comprehension.
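
As a rough illustration of the idea of a signed interaction, the sketch below counts how often sets of (feature, sign) pairs co-occur on decision paths. It assumes each path has already been reduced to a list of (feature, sign) tuples, with "+" meaning the path followed the above-threshold branch and "-" the below-threshold branch. The function name, the input format, and the brute-force counting are all illustrative assumptions; this is not the published s-iRF procedure, which the abstract does not spell out.

```python
from collections import Counter
from itertools import combinations


def signed_interaction_counts(paths, order=2):
    """Count co-occurring signed features across decision paths.

    paths: iterable of paths, each a list of (feature, sign) tuples
           (hypothetical format; sign is "+" or "-").
    order: size of the signed feature sets to count.
    Brute-force enumeration is used here purely for illustration.
    """
    counts = Counter()
    for path in paths:
        # De-duplicate and sort so each set has one canonical key.
        signed = sorted(set(path))
        for combo in combinations(signed, order):
            counts[combo] += 1
    return counts


# Toy paths over made-up feature names.
paths = [
    [("kr", "+"), ("hb", "-")],
    [("hb", "-"), ("kr", "+"), ("gt", "+")],
    [("kr", "-"), ("hb", "-")],
]
counts = signed_interaction_counts(paths, order=2)
```

In this toy input the same feature pair appears with two different sign patterns, and the counts keep them apart, which is the distinction between a signed and an unsigned interaction.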

arXiv.org e-Print Archive

Drosophila by the dozen

Author: Celniker Susan E
Hoskins Roger A
Publication venue: BioMed Central
Publication date: 01/01/2007

A report of the 48th Annual Drosophila Research Conference, Philadelphia, USA, 7-11 March 2007

Crossref

PubMed Central

eScholarship - University of California

UNT Digital Library

Sequence analysis of the cis-regulatory regions of the bithorax complex of Drosophila

Author: Celniker Susan E.
Knafels John D.
Lewis Edward B.
Mathog David R.
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 29/08/1995

The bithorax complex (BX-C) of Drosophila, one of two complexes that act as master regulators of the body plan of the fly, has now been entirely sequenced and comprises approximately 315,000 bp, only 1.4% of which codes for protein. Analysis of this sequence reveals significantly overrepresented DNA motifs of unknown, as well as known, functions in the nonprotein-coding portion of the sequence. The following types of motifs in that portion are analyzed: (i) concatamers of mono-, di-, and trinucleotides; (ii) tightly clustered hexanucleotides (spaced less than or equal to 5 bases apart); (iii) direct and reverse repeats longer than 20 bp; and (iv) a number of motifs known from biochemical studies to play a role in the regulation of the BX-C. The hexanucleotide AGATAC is remarkably overrepresented and is surmised to play a role in chromosome pairing. The positions of sites of highly overrepresented motifs are plotted for motifs that occur at more than five sites in the sequence when fewer than 0.5 occurrences are expected. Expected values are based on a third-order Markov chain, which is the optimal order for representing the BXCALL sequence.
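
To make the notion of an expected value under a third-order Markov chain concrete, here is a minimal sketch. It assumes the common maximum-likelihood estimator, in which the expected count of a word is built from the observed counts of its overlapping 4-mers divided by the counts of the 3-mers they share. The abstract does not state which estimator the authors used, so the formula, the function names, and the toy sequence are assumptions.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(seq, word, order=3):
    """Approximate expected occurrences of `word` in `seq` under a
    Markov chain of the given order, estimated from `seq` itself.

    For order 3 and a hexanucleotide w1..w6 this evaluates
    N(w1..w4) * N(w2..w5) / N(w2..w4) * N(w3..w6) / N(w3..w5).
    """
    if len(word) <= order:
        raise ValueError("word must be longer than the Markov order")
    big = count_kmers(seq, order + 1)
    small = count_kmers(seq, order)
    expected = float(big[word[:order + 1]])
    for i in range(1, len(word) - order):
        context = word[i:i + order]
        if small[context] == 0:
            return 0.0
        expected *= big[word[i:i + order + 1]] / small[context]
    return expected


def observed_count(seq, word):
    """Observed number of (overlapping) occurrences of `word`."""
    return count_kmers(seq, len(word))[word]


# Toy sequence, not BX-C data: a perfectly periodic repeat.
toy = "AGATAC" * 50
ratio_inputs = (observed_count(toy, "AGATAC"), expected_count(toy, "AGATAC"))
```

A motif would be flagged as overrepresented when its observed count is much larger than this expected count. In the periodic toy sequence the two agree, because every 3-mer context fully determines the next base.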

Caltech Authors

Complete Genome Sequence of the Citrobacter freundii Type Strain.

Author: Booth Benjamin W
Celniker Susan E
Hess Becky M
Neff Michael J
Park Soo
Wan Kenneth H
Publication venue: eScholarship, University of California
Publication date: 01/05/2020

Citrobacter freundii is a species of facultative anaerobic Gram-negative bacteria of the family Enterobacteriaceae. The complete genome is composed of a single chromosomal circle of 4,957,773 bp with a G+C content of 52%.

eScholarship - University of California

Characterization of MtnE, the fifth metallothionein member in Drosophila

Author: Atanesyan Lilit
Celniker Susan
Georgiev Oleg
Günther Viola
Schaffner Walter
Publication date: 18/06/2018

Metallothioneins (MTs) constitute a family of cysteine-rich, low molecular weight metal-binding proteins which occur in almost all forms of life. They bind physiological metals, such as zinc and copper, as well as nonessential, toxic heavy metals, such as cadmium, mercury, and silver. MT expression is regulated at the transcriptional level by metal-regulatory transcription factor 1 (MTF-1), which binds to the metal-response elements (MREs) in the enhancer/promoter regions of MT genes. Drosophila was thought to have four MT genes, namely, MtnA, MtnB, MtnC, and MtnD. Here we characterize a new fifth member of the Drosophila MT gene family, coding for metallothionein E (MtnE). The MtnE transcription unit is located head-to-head with that of MtnD. The intervening sequence contains four MREs which bind, with different affinities, to MTF-1. Both of the divergently transcribed MT genes are completely dependent on MTF-1, whereby MtnE is consistently more strongly transcribed. MtnE expression is induced in response to heavy metals, notably copper, mercury, and silver, and is upregulated in a genetic background where the other four MTs are missing.

RERO DOC Digital Library

KAAS: an automatic genome annotation and pathway reconstruction server

Author: Booth Benjamin W
Celniker Susan E
Hammonds Ann S
Park Soo
Wan Kenneth H
Yu Charles
Publication venue: Oxford University Press
Publication date: 01/01/2007

The number of complete and draft genomes has grown rapidly in recent years, and it has become increasingly important to automate the identification of functional properties and biological roles of genes in these genomes. In the KEGG database, genes in complete genomes are annotated with the KEGG orthology (KO) identifiers, or the K numbers, based on the best hit information using Smith–Waterman scores as well as manual curation. Each K number represents an ortholog group of genes, and it is directly linked to an object in the KEGG pathway map or the BRITE functional hierarchy. Here, we have developed a web-based server called KAAS (KEGG Automatic Annotation Server: http://www.genome.jp/kegg/kaas/), an implementation of a rapid method to automatically assign K numbers to genes in the genome, enabling reconstruction of KEGG pathways and BRITE hierarchies. The method is based on sequence similarities, bi-directional best hit information and some heuristics, and has achieved a high degree of accuracy when compared with the manually curated KEGG GENES database.
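
The bi-directional best hit idea mentioned above can be sketched in a few lines. The example assumes similarity scores are already available as dictionaries keyed by (query, target); the function names, gene labels, and scores are hypothetical, and the additional heuristics used by KAAS are not reproduced here.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict mapping (query, target) -> similarity score.
    Ties keep the first target encountered.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(forward, reverse):
    """Return sorted (query, target) pairs that are each other's best hit.

    forward: scores for query genes searched against reference genes.
    reverse: scores for reference genes searched against query genes.
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, t) for q, t in fwd.items() if rev.get(t) == q)


# Made-up scores: g1..g3 are query genes, r1 and r2 are reference genes.
forward = {("g1", "r1"): 100, ("g1", "r2"): 80,
           ("g2", "r2"): 90, ("g3", "r1"): 95}
reverse = {("r1", "g1"): 100, ("r1", "g3"): 95,
           ("r2", "g2"): 90, ("r2", "g1"): 80}
pairs = bidirectional_best_hits(forward, reverse)
```

Here g3 finds r1 as its best hit, but r1 prefers g1, so only g1 and g2 would inherit an annotation from their reference partners in this simplified scheme.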

CiteSeerX

Crossref

PubMed Central

eScholarship - University of California

Global analysis of patterns of gene expression during Drosophila embryogenesis

Author: Beaton Amy
Berman Benjamin P
Celniker Susan E
Hartenstein Volker
Kwan Elaine
Rubin Gerald M
Tomancak Pavel
Weiszmann Richard
Publication venue: BioMed Central
Publication date: 01/01/2007

Embryonic expression patterns for 6,003 (44%) of the 13,659 protein-coding genes identified in the Drosophila melanogaster genome were documented, of which 40% show tissue-restricted expression

Crossref

PubMed Central

MPG.PuRe

Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura

Author: Berman Benjamin P
Celniker Susan E
Eisen Michael B
Laverty Todd R
Pfeiffer Barret D
Rubin Gerald M
Salzberg Steven L
Publication venue: BioMed Central
Publication date: 20/08/2004

BACKGROUND: The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. RESULTS: We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. CONCLUSIONS: Measuring conservation of sequence features closely linked to function - such as binding-site clustering - makes better use of comparative sequence data than commonly used methods that examine only sequence identity
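
The search for clusters of predicted binding sites can be pictured as a sliding-window density scan. The sketch below assumes the predicted sites are given as genomic coordinates and reports every window, anchored at a site, that contains at least a minimum number of sites. The window width, the threshold, and the function name are illustrative assumptions, not the parameters used in the study.

```python
def dense_windows(positions, width, min_sites):
    """Find windows [start, start + width) holding at least min_sites sites.

    positions: iterable of predicted binding-site coordinates.
    Returns a list of (start, n_sites), one entry per qualifying
    window anchored at a site position.
    """
    pos = sorted(positions)
    hits = []
    right = 0
    for left, start in enumerate(pos):
        # Advance the right pointer to the first site outside the window.
        while right < len(pos) and pos[right] < start + width:
            right += 1
        n_sites = right - left
        if n_sites >= min_sites:
            hits.append((start, n_sites))
    return hits


# Made-up coordinates with one tight group of four sites.
sites = [100, 150, 180, 220, 5000, 5100, 9000]
clusters = dense_windows(sites, width=700, min_sites=4)
```

The two-pointer scan visits each site once, so it runs in linear time after sorting. Comparing cluster conservation between species, as the abstract describes, would require running such a scan on aligned regions of both genomes.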

Springer - Publisher Connector

PubMed Central

An Extracellular Interactome of Immunoglobulin and LRR Proteins Reveals Receptor-Ligand Networks

Author: Carrillo Robert A.
Celniker Susan E.
Eastman Catharine L.
Garcia K. Christopher
Johnson Karl G.
Waghray Deepa
Weiszmann Richard
Zinn Kai
Özkan Engin
Publication venue: Elsevier BV
Publication date: 03/07/2013

Extracellular domains of cell surface receptors and ligands mediate cell-cell communication, adhesion, and initiation of signaling events, but most existing protein-protein “interactome” data sets lack information for extracellular interactions. We probed interactions between receptor extracellular domains, focusing on a set of 202 proteins composed of the Drosophila melanogaster immunoglobulin superfamily (IgSF), fibronectin type III (FnIII), and leucine-rich repeat (LRR) families, which are known to be important in neuronal and developmental functions. Out of 20,503 candidate protein pairs tested, we observed 106 interactions, 83 of which were previously unknown. We “deorphanized” the 20 member subfamily of defective-in-proboscis-response IgSF proteins, showing that they selectively interact with an 11 member subfamily of previously uncharacterized IgSF proteins. Both subfamilies interact with a single common “orphan” LRR protein. We also observed interactions between Hedgehog and EGFR pathway components. Several of these interactions could be visualized in live-dissected embryos, demonstrating that this approach can identify physiologically relevant receptor-ligand pairs

Elsevier - Publisher Connector

Caltech Authors

Systematic image-driven analysis of the spatial Drosophila embryonic expression landscape

Author: Ann S Hammonds
Broihier HT
Erwin Frise
Hartenstein V
Ju T
Kumar S
Peng H
Reuter R
Su MT
Susan E Celniker
Publication venue: Nature Publishing Group
Publication date: 01/01/2010

We created an innovative virtual representation for our large-scale Drosophila in situ expression dataset. We aligned an elliptically shaped mesh composed of small triangular regions to the outline of each embryo. Each triangle defines a unique location in the embryo, and comparing corresponding triangles allows easy identification of similar expression patterns. The virtual representation was used to organize the expression landscape at stages 4-6. We identified regions with similar expression in the embryo and clustered genes with similar expression patterns. We created algorithms to mine the dataset for adjacent non-overlapping patterns and anti-correlated patterns. We were able to mine the dataset to identify co-expressed and putative interacting genes. Using co-expression, we were able to assign putative functions to unknown genes.
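
One simple way to compare corresponding triangles is to treat each gene's pattern as a vector with one expression value per triangle and compute a correlation between vectors. The sketch below uses Pearson correlation as a stand-in similarity measure; the abstract does not say which measure the authors used, and the function name and toy values are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation between two per-triangle expression vectors."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant pattern")
    return cov / math.sqrt(var_a * var_b)


# Toy patterns over four triangles, ordered from anterior to posterior.
anterior = [1.0, 0.8, 0.1, 0.0]
posterior = [0.0, 0.2, 0.9, 1.0]
r_same = pearson(anterior, anterior)
r_opposite = pearson(anterior, posterior)
```

Values near 1 would indicate co-expressed patterns and values near -1 anti-correlated patterns, as in the mirrored toy vectors above.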

Crossref

Directory of Open Access Journals

PubMed Central