58,556 research outputs found

Maximal Extraction of Biological Information from Genetic Interaction Data

Author: AH Tong
AM Dudley
BL Drees
D Segre
David J. Galas
DJ Galas
DR Shook
Gregory W. Carter
GW Carter
H Sinha
HD Madhani
JM Gancedo
Joel S. Bader
KB Lengeler
KD Entian
L Avery
LM Steinmetz
LV Zhang
M Ashburner
M Li
M Schuldiner
O Carlborg
PD Grunwald
R Kelley
R Milo
RJ Taylor
RP Onge
S Jana
SR Collins
T Ideker
Timothy Galitski
W Zhong
Publication venue: Public Library of Science
Publication date: 01/01/2009

Targeted genetic perturbation is a powerful tool for inferring gene function in model organisms. Functional relationships between genes can be inferred by observing the effects of multiple genetic perturbations in a single strain. The study of these relationships, generally referred to as genetic interactions, is a classic technique for ordering genes in pathways, thereby revealing genetic organization and gene-to-gene information flow. Genetic interaction screens are now being carried out in high-throughput experiments involving tens or hundreds of genes. These data sets have the potential to reveal genetic organization on a large scale, and require computational techniques that best reveal this organization. In this paper, we use a complexity metric based on information theory to determine the maximally informative network given a set of genetic interaction data. We find that networks with high complexity scores yield the most biological information in terms of (i) specific associations between genes and biological functions, and (ii) mapping modules of co-functional genes. This information-based approach is an automated, unsupervised classification of the biological rules underlying observed genetic interactions. It might have particular potential in genetic studies in which interactions are complex and prior gene annotation data are sparse.

Crossref

The Jackson Laboratory: The Mouseion at the JAXlibrary

Directory of Open Access Journals

PubMed Central

Coding limits on the number of transcription factors

Author: Alon Uri
Itzkovitz Shalev
Tlusty Tsvi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

Transcription factor proteins bind specific DNA sequences to control the expression of genes. They contain DNA binding domains which belong to several super-families, each with a specific mechanism of DNA binding. The total number of transcription factors encoded in a genome increases with the number of genes in the genome. Here, we examined the number of transcription factors from each super-family in diverse organisms. We find that the number of transcription factors from most super-families appears to be bounded. For example, the number of winged helix factors does not generally exceed 300, even in very large genomes. The magnitude of the maximal number of transcription factors from each super-family seems to correlate with the number of DNA bases effectively recognized by the binding mechanism of that super-family. Coding theory predicts that such upper bounds on the number of transcription factors should exist, in order to minimize cross-binding errors between transcription factors. This theory further predicts that factors with similar binding sequences should tend to have similar biological effect, so that errors based on mis-recognition are minimal. We present evidence that transcription factors with similar binding sequences tend to regulate genes with similar biological functions, supporting this prediction. The present study suggests limits on the transcription factor repertoire of cells, and suggests coding constraints that might apply more generally to the mapping between binding sites and biological function.

Comment: http://www.weizmann.ac.il/complex/tlusty/papers/BMCGenomics2006.pdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1590034/ http://www.biomedcentral.com/1471-2164/7/23

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarWorks@UNIST

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

Chi-square-based scoring function for categorization of MEDLINE citations

Author: Hristovski Dimitar
Kastrin Andrej
Peterlin Borut
Publication venue: Georg Thieme Verlag KG
Publication date: 01/01/2010

Objectives: Text categorization has been used in biomedical informatics for identifying documents containing relevant topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood that a MEDLINE citation contains a genetically relevant topic. Methods: Our procedure requires construction of a genetic and a nongenetic domain document corpus. We used MeSH descriptors assigned to MEDLINE citations for this categorization task. We compared frequencies of MeSH descriptors between the two corpora using the chi-square test. A MeSH descriptor was considered to be a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all the citations, with the highest score given to those citations containing MeSH descriptors typical for the genetic domain. Results: Validation was done on a set of 734 manually annotated MEDLINE citations. The method achieved a predictive accuracy of 0.87 with 0.69 recall and 0.64 precision. We evaluated the method by comparing it to three machine learning algorithms (support vector machines, decision trees, naïve Bayes). Although the differences were not statistically significant, results showed that our chi-square scoring performs as well as the compared machine learning algorithms. Conclusions: We suggest that the chi-square scoring is an effective solution to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for the gene symbol disambiguation process.

Comment: 34 pages, 2 figures
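The chi-square scoring described in this abstract can be illustrated with a minimal sketch. The abstract does not say how descriptor-level statistics are combined into a citation score, so the summation used here is an assumption, as are all function names and the toy corpora; this is not the BITOLA implementation.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def descriptor_weights(genetic_docs, nongenetic_docs):
    """Signed chi-square weight per MeSH descriptor (hypothetical helper).

    Each document is a list of descriptor strings. A weight is positive when
    the descriptor is relatively more frequent in the genetic corpus.
    """
    n_gen, n_non = len(genetic_docs), len(nongenetic_docs)
    if n_gen == 0 or n_non == 0:
        raise ValueError("both corpora must be non-empty")
    # Count documents containing each descriptor (document frequency).
    gen_counts = Counter(d for doc in genetic_docs for d in set(doc))
    non_counts = Counter(d for doc in nongenetic_docs for d in set(doc))
    weights = {}
    for desc in set(gen_counts) | set(non_counts):
        a, c = gen_counts[desc], non_counts[desc]
        chi2 = chi_square_2x2(a, n_gen - a, c, n_non - c)
        sign = 1 if a / n_gen > c / n_non else -1
        weights[desc] = sign * chi2
    return weights


def score_citation(descriptors, weights):
    """Assumed aggregation: sum of signed weights of the citation's descriptors."""
    return sum(weights.get(d, 0.0) for d in set(descriptors))


# Toy corpora, invented purely for illustration.
genetic = [["Genes", "Mutation"], ["Genes", "Humans"], ["Mutation", "Genes"]]
nongenetic = [["Humans", "Surgery"], ["Surgery"], ["Humans", "Nursing"]]
weights = descriptor_weights(genetic, nongenetic)
genetic_like = score_citation(["Genes", "Mutation"], weights)
nongenetic_like = score_citation(["Surgery", "Humans"], weights)
```

Ranking citations by this score and choosing a cutoff would turn the score list into a classifier; how the authors chose their cutoff is not stated in the abstract.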

arXiv.org e-Print Archive

Crossref

Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes

Author: Patil Ashwini
Srihari Sriganesh
Wong Limsoon
Yong Chern Han
Publication venue: Elsevier BV
Publication date: 01/01/2015

Complexes of physically interacting proteins constitute fundamental functional units responsible for driving biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organization of cells. In this review, we discuss the key contributions of computational methods developed to date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast, and highlight challenges faced by these methods, in particular detecting sparse and small or sub-complexes and discerning overlapping complexes. We describe methods for integrating diverse information including expression profiles and 3D structures of proteins with PPI networks to understand the dynamics of complex formation, for instance, time-based assembly of complex subunits and formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable to understand disease mechanisms and to discover novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area.

Comment: 1 Table

arXiv.org e-Print Archive

Elsevier - Publisher Connector

University of Queensland eSpace

Virus isolation studies suggest short-term variations in abundance in natural cyanophage populations of the Indian Ocean

Author: Clokie Martha R. J.
Mann Nicholas H.
Mehta Jaytry Y.
Millard Andrew D.
Publication venue: Cambridge University Press (CUP)
Publication date: 10/04/2006

Cyanophage abundance has been shown to fluctuate over long timescales and with depth, but little is known about how it varies over short timescales. Previous short-term studies have relied on counting total virus numbers, and therefore the phages which infect cyanobacteria cannot be distinguished from the total count. In this study, an isolation-based approach was used to determine cyanophage abundance from water samples collected over a depth profile for a 24 h period from the Indian Ocean. Samples were used to infect Synechococcus sp. WH7803 and the number of plaque forming units (pfu) at each time point and depth was counted. At 10 m phage numbers were similar for most time points, but there was a distinct peak in abundance at 0100 hours. Phage numbers were lower at 25 m and 50 m and did not show such strong temporal variation. No phages were found below this depth. Therefore, we conclude that only the abundance of phages in surface waters showed a clear temporal pattern over a short timescale. Fifty phages from a range of depths and time points were isolated and purified. The molecular diversity of these phages was estimated using a section of the phage-encoded psbD gene and the results from a phylogenetic analysis do not suggest that phages from the deeper waters form a distinct subgroup.

Crossref

Warwick Research Archives Portal Repository

Combinatorial CRISPR-Cas9 screens for de novo mapping of genetic interactions.

Author: A Baryshnikova
A DeLean
Aaron N Chang
AE Briner
Alex N Beckett
Alex Thomas
AM Kabadi
Amanda Birmingham
Ana Bojorquez-Gomez
ASL Wong
Assen Roguev
B Vogelstein
C Laufer
Chih-Chung Kuo
CJ Lord
Dan Du
Daniel Pekin
Dongxin Zhao
F Lori
H Xu
J Shi
Jason F Kreisberg
JD Storey
Jens Luebeck
JG Doench
John Paul Shen
Katherine Licon
Kristin Klepper
KS Pollard
Kyle Salinas Sanchez
L Avery
LA Gilbert
Lei Qi
M Costanzo
M Martin
MC Bassik
MS Neshat
Nathan E Lewis
NE Sanjana
Nevan Krogan
NJ Krogan
P Mali
Prashant Mali
R Chari
R Mani
R Srivas
Roman Sasik
S Bandyopadhyay
SR Collins
T Hart
T Horn
Trey Ideker
VG Tusher
Y Benjamini
Y Dang
Publication venue: eScholarship, University of California
Publication date: 01/06/2017

We developed a systematic approach to map human genetic networks by combinatorial CRISPR-Cas9 perturbations coupled to robust analysis of growth kinetics. We targeted all pairs of 73 cancer genes with dual guide RNAs in three cell lines, comprising 141,912 tests of interaction. Numerous therapeutically relevant interactions were identified, and these patterns replicated with combinatorial drugs at 75% precision. From these results, we anticipate that cellular context will be critical to synthetic-lethal therapies.
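The scale of the screen can be checked with a short calculation from the figures quoted in the abstract. The variable names are illustrative, and the abstract does not explain what makes up the tests behind each gene pair (for example, multiple guide RNA combinations), so only the counts are computed.

```python
from math import comb

n_genes = 73             # cancer genes, from the abstract
n_cell_lines = 3         # cell lines, from the abstract
total_tests = 141_912    # tests of interaction, from the abstract

# Number of unordered gene pairs among 73 genes.
gene_pairs = comb(n_genes, 2)

# Implied number of tests per gene pair, overall and per cell line.
tests_per_pair = total_tests / gene_pairs
tests_per_pair_per_line = tests_per_pair / n_cell_lines

print(gene_pairs, tests_per_pair, tests_per_pair_per_line)
```

The total divides evenly by the number of gene pairs and by the number of cell lines, which suggests a fixed number of tests per pair in each cell line.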

Crossref

eScholarship - University of California