170 research outputs found
Refining interaction search through signed iterative Random Forests
Advances in supervised learning have enabled accurate prediction in
biological systems governed by complex interactions among biomolecules.
However, state-of-the-art predictive algorithms are typically black-boxes,
learning statistical interactions that are difficult to translate into testable
hypotheses. The iterative Random Forest algorithm took a step towards bridging
this gap by providing a computationally tractable procedure to identify the
stable, high-order feature interactions that drive the predictive accuracy of
Random Forests (RF). Here we refine the interactions identified by iRF to
explicitly map responses as a function of interacting features. Our method,
signed iRF, describes subsets of rules that frequently occur on RF decision
paths. We refer to these rule subsets as signed interactions. Signed
interactions share not only the same set of interacting features but also
exhibit similar thresholding behavior, and thus describe a consistent
functional relationship between interacting features and responses. We describe
stable and predictive importance metrics (SPIMs) to rank signed interactions. For each
SPIM, we define null importance metrics that characterize its expected behavior
under known structure. We evaluate our proposed approach in biologically
inspired simulations and two case studies: predicting enhancer activity and
spatial gene expression patterns. In the case of enhancer activity, s-iRF
recovers one of the few experimentally validated high-order interactions and
suggests novel enhancer elements where this interaction may be active. In the
case of spatial gene expression patterns, s-iRF recovers all 11 reported links
in the gap gene network. By refining the process of interaction recovery, our
approach has the potential to guide mechanistic inquiry into systems whose
scale and complexity are beyond human comprehension.
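As a rough illustration of what a signed interaction is, the sketch below reduces each decision path to a set of (feature, sign) pairs and measures how often a candidate signed interaction occurs on those paths. This is a simplified reading of the abstract, not the s-iRF implementation: the path representation, the function names `signed_path` and `prevalence`, and the feature names are all assumptions made for the example.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical representation: a split is (feature, threshold, went_right),
# where went_right=True means the sample satisfied "feature > threshold".
Split = Tuple[str, float, bool]
SignedFeature = Tuple[str, int]  # (feature, +1 for "high", -1 for "low")


def signed_path(path: Iterable[Split]) -> FrozenSet[SignedFeature]:
    """Collapse a decision path into its set of signed features."""
    return frozenset(
        (feature, 1 if went_right else -1)
        for feature, _threshold, went_right in path
    )


def prevalence(paths: List[List[Split]],
               interaction: Iterable[SignedFeature]) -> float:
    """Fraction of decision paths that contain every signed feature
    of the candidate interaction."""
    if not paths:
        return 0.0
    target = frozenset(interaction)
    hits = sum(target <= signed_path(p) for p in paths)
    return hits / len(paths)


if __name__ == "__main__":
    # Three toy decision paths over hypothetical features x1, x2, x3.
    toy_paths = [
        [("x1", 0.5, True), ("x2", 0.2, True)],
        [("x1", 0.6, True), ("x2", 0.3, False)],
        [("x1", 0.4, True), ("x2", 0.25, True), ("x3", 0.1, False)],
    ]
    print(prevalence(toy_paths, [("x1", 1), ("x2", 1)]))
```

Two paths that split on the same features but in opposite directions count as different signed interactions here, which is the distinction the abstract draws between unsigned and signed interactions.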
Drosophila by the dozen
A report of the 48th Annual Drosophila Research Conference, Philadelphia, USA, 7-11 March 2007
Sequence analysis of the cis-regulatory regions of the bithorax complex of Drosophila
The bithorax complex (BX-C) of Drosophila, one of two complexes that act as master regulators of the body plan of the fly, has now been entirely sequenced and comprises approximately 315,000 bp, only 1.4% of which codes for protein. Analysis of this sequence reveals significantly overrepresented DNA motifs of unknown, as well as known, functions in the nonprotein-coding portion of the sequence. The following types of motifs in that portion are analyzed: (i) concatamers of mono-, di-, and trinucleotides; (ii) tightly clustered hexanucleotides (spaced less than or equal to 5 bases apart); (iii) direct and reverse repeats longer than 20 bp; and (iv) a number of motifs known from biochemical studies to play a role in the regulation of the BX-C. The hexanucleotide AGATAC is remarkably overrepresented and is surmised to play a role in chromosome pairing. The positions of sites of highly overrepresented motifs are plotted for those that occur at more than five sites in the sequence when fewer than 0.5 occurrences are expected. Expected values are based on a third-order Markov chain, which is the optimal order for representing the BXCALL sequence.
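The expected-count calculation mentioned above can be sketched as follows. This is a minimal illustration, assuming the Markov chain is fit by simple frequency counts on the same sequence that is being scanned; the function name `expected_count` and the toy sequence are made up for the example, and the authors' actual estimation procedure may differ.

```python
from collections import Counter


def expected_count(seq: str, motif: str, order: int = 3) -> float:
    """Expected number of occurrences of `motif` in `seq` under a
    Markov chain of the given order fit to `seq` by frequency counts."""
    n, k = len(seq), len(motif)
    if k <= order or n < k:
        raise ValueError("motif must be longer than the order and fit in seq")

    # Contexts of length `order` that are followed by at least one base.
    context = Counter(seq[i:i + order] for i in range(n - order))
    # The same contexts together with the base that follows them.
    transition = Counter(seq[i:i + order + 1] for i in range(n - order))
    # All words of length `order`, used for the motif's starting probability.
    start = Counter(seq[i:i + order] for i in range(n - order + 1))

    prob = start[motif[:order]] / (n - order + 1)
    for i in range(order, k):
        ctx = motif[i - order:i]
        if context[ctx] == 0:
            return 0.0  # context never observed, so the motif cannot be generated
        prob *= transition[motif[i - order:i + 1]] / context[ctx]

    # There are n - k + 1 positions where the motif could start.
    return prob * (n - k + 1)


if __name__ == "__main__":
    toy = "ACGT" * 50  # toy sequence, not BX-C data
    print(round(expected_count(toy, "ACGTAC"), 3))
```

A motif would be flagged as overrepresented when its observed count is well above this expectation, for example more than five sites observed where fewer than 0.5 are expected.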
Complete Genome Sequence of the Citrobacter freundii Type Strain.
Citrobacter freundii is a species of facultative anaerobic Gram-negative bacteria of the family Enterobacteriaceae. The complete genome is composed of a single chromosomal circle of 4,957,773 bp with a G+C content of 52%.
Characterization of MtnE, the fifth metallothionein member in Drosophila
Metallothioneins (MTs) constitute a family of cysteine-rich, low molecular weight metal-binding proteins which occur in almost all forms of life. They bind physiological metals, such as zinc and copper, as well as nonessential, toxic heavy metals, such as cadmium, mercury, and silver. MT expression is regulated at the transcriptional level by metal-regulatory transcription factor 1 (MTF-1), which binds to the metal-response elements (MREs) in the enhancer/promoter regions of MT genes. Drosophila was thought to have four MT genes, namely, MtnA, MtnB, MtnC, and MtnD. Here we characterize a new, fifth member of the Drosophila MT gene family, coding for metallothionein E (MtnE). The MtnE transcription unit is located head-to-head with that of MtnD. The intervening sequence contains four MREs which bind, with different affinities, to MTF-1. Both of the divergently transcribed MT genes are completely dependent on MTF-1, whereby MtnE is consistently more strongly transcribed. MtnE expression is induced in response to heavy metals, notably copper, mercury, and silver, and is upregulated in a genetic background where the other four MTs are missing.
KAAS: an automatic genome annotation and pathway reconstruction server
The number of complete and draft genomes has grown rapidly in recent years, and it has become increasingly important to automate the identification of functional properties and biological roles of genes in these genomes. In the KEGG database, genes in complete genomes are annotated with the KEGG orthology (KO) identifiers, or the K numbers, based on the best hit information using Smith–Waterman scores as well as manual curation. Each K number represents an ortholog group of genes, and it is directly linked to an object in the KEGG pathway map or the BRITE functional hierarchy. Here, we have developed a web-based server called KAAS (KEGG Automatic Annotation Server: http://www.genome.jp/kegg/kaas/), an implementation of a rapid method to automatically assign K numbers to genes in the genome, enabling reconstruction of KEGG pathways and BRITE hierarchies. The method is based on sequence similarities, bi-directional best hit information and some heuristics, and has achieved a high degree of accuracy when compared with the manually curated KEGG GENES database.
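The bi-directional best hit idea that the method builds on can be sketched as below. This covers only the core reciprocity check, assuming similarity scores are already available as dictionaries; the function names, gene identifiers, and scores are invented for the example, and the additional heuristics and score thresholds that KAAS applies are not modelled.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query gene, target gene) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its highest-scoring target gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _score) in best.items()}


def bidirectional_best_hits(forward: Scores, reverse: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is the best hit of a in the forward search
    and a is the best hit of b in the reverse search."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores: query genome genes g1, g2 against reference genes r1, r2.
    forward = {("g1", "r1"): 250.0, ("g1", "r2"): 90.0,
               ("g2", "r1"): 120.0, ("g2", "r2"): 80.0}
    reverse = {("r1", "g1"): 245.0, ("r1", "g2"): 118.0,
               ("r2", "g1"): 95.0, ("r2", "g2"): 60.0}
    print(bidirectional_best_hits(forward, reverse))
```

In this toy input only g1 and r1 choose each other, so g2 would receive no assignment from the reciprocity check alone.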
Global analysis of patterns of gene expression during Drosophila embryogenesis
Embryonic expression patterns for 6,003 (44%) of the 13,659 protein-coding genes identified in the Drosophila melanogaster genome were documented, of which 40% show tissue-restricted expression.
Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura
BACKGROUND: The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. RESULTS: We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. CONCLUSIONS: Measuring conservation of sequence features closely linked to function - such as binding-site clustering - makes better use of comparative sequence data than commonly used methods that examine only sequence identity.
An Extracellular Interactome of Immunoglobulin and LRR Proteins Reveals Receptor-Ligand Networks
Extracellular domains of cell surface receptors and ligands mediate cell-cell communication, adhesion, and initiation of signaling events, but most existing protein-protein “interactome” data sets lack information for extracellular interactions. We probed interactions between receptor extracellular domains, focusing on a set of 202 proteins composed of the Drosophila melanogaster immunoglobulin superfamily (IgSF), fibronectin type III (FnIII), and leucine-rich repeat (LRR) families, which are known to be important in neuronal and developmental functions. Out of 20,503 candidate protein pairs tested, we observed 106 interactions, 83 of which were previously unknown. We “deorphanized” the 20-member subfamily of defective-in-proboscis-response IgSF proteins, showing that they selectively interact with an 11-member subfamily of previously uncharacterized IgSF proteins. Both subfamilies interact with a single common “orphan” LRR protein. We also observed interactions between Hedgehog and EGFR pathway components. Several of these interactions could be visualized in live-dissected embryos, demonstrating that this approach can identify physiologically relevant receptor-ligand pairs.
Systematic image-driven analysis of the spatial Drosophila embryonic expression landscape
We created an innovative virtual representation for our large-scale Drosophila in situ expression dataset. We aligned an elliptically shaped mesh composed of small triangular regions to the outline of each embryo. Each triangle defines a unique location in the embryo, and comparing corresponding triangles allows easy identification of similar expression patterns. The virtual representation was used to organize the expression landscape at stages 4-6. We identified regions with similar expression in the embryo and clustered genes with similar expression patterns. We created algorithms to mine the dataset for adjacent non-overlapping patterns and anti-correlated patterns. We were able to mine the dataset to identify co-expressed and putative interacting genes. Using co-expression, we were able to assign putative functions to unknown genes.