129 research outputs found

Choosing negative examples for the prediction of protein-protein interactions

Author: Ben-Hur Asa
Noble William Stafford
Publication venue: BioMed Central
Publication date: 01/01/2006

The protein-protein interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions. This need has prompted the development of a number of methods for predicting protein-protein interactions based on various sources of data and methodologies. The common method for choosing negative examples for training a predictor of protein-protein interactions is based on annotations of cellular localization, and the observation that pairs of proteins that have different localization patterns are unlikely to interact. While this method leads to high-quality sets of non-interacting proteins, we find that this choice can lead to biased estimates of prediction accuracy, because the constraints placed on the distribution of the negative examples make the task easier. The effects of this bias are demonstrated in the context of both sequence-based and non-sequence-based features used for predicting protein-protein interactions.
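A minimal sketch of the localization-based selection described above, contrasted with unconstrained uniform sampling. The function name `sample_negatives`, the toy protein identifiers, and the compartment labels are all hypothetical; the abstract does not specify an implementation, and uniform sampling appears here only as a contrast.

```python
import random
from itertools import combinations


def sample_negatives(proteins, positives, k, localization=None, seed=0):
    """Sample up to k protein pairs that are not known to interact.

    If `localization` (protein -> compartment) is given, only pairs whose
    compartments differ are eligible, mimicking the localization-based rule.
    """
    known = {frozenset(pair) for pair in positives}
    candidates = [
        (a, b)
        for a, b in combinations(sorted(proteins), 2)
        if frozenset((a, b)) not in known
        and (localization is None or localization[a] != localization[b])
    ]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(candidates, min(k, len(candidates)))


# Toy data (hypothetical, for illustration only).
proteins = ["P1", "P2", "P3", "P4"]
positives = [("P1", "P2")]
localization = {"P1": "nucleus", "P2": "nucleus", "P3": "cytoplasm", "P4": "nucleus"}

uniform = sample_negatives(proteins, positives, k=3)
constrained = sample_negatives(proteins, positives, k=3, localization=localization)
```

In the constrained case every negative pair straddles two compartments, so the negatives are drawn from a narrower distribution than the uniform ones. That restriction is the kind of constraint the abstract identifies as a source of bias.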

Crossref

Springer - Publisher Connector

PubMed Central

Probabilistic analysis of a differential equation for linear programming

Author: Ben-Hur Asa
Feinberg Joshua
Fishman Shmuel
Siegelmann Hava T.
Publication date: 01/01/2003

In this paper we address the complexity of solving linear programming problems with a set of differential equations that converge to a fixed point that represents the optimal solution. Assuming a probabilistic model, where the inputs are i.i.d. Gaussian variables, we compute the distribution of the convergence rate to the attracting fixed point. Using the framework of Random Matrix Theory, we derive a simple expression for this distribution in the asymptotic limit of large problem size. In this limit, we find that the distribution of the convergence rate is a scaling function, namely it is a function of one variable that is a combination of three parameters: the number of variables, the number of constraints and the convergence rate, rather than a function of these parameters separately. We also estimate numerically the distribution of computation times, namely the time required to reach a vicinity of the attracting fixed point, and find that it is also a scaling function. Using the problem size dependence of the distribution functions, we derive high probability bounds on the convergence rates and on the computation times.

Comment: 1+37 pages, latex, 5 eps figures. Version accepted for publication in the Journal of Complexity. Changes made: Presentation reorganized for clarity, expanded discussion of measure of complexity in the non-asymptotic regime (added a new section

arXiv.org e-Print Archive

Elsevier - Publisher Connector

ScholarWorks@UMass Amherst

Amino acid composition predicts prion activity

Author: Ben-Hur Asa
Minhas Fayyaz ul Amir Afsar
Ross Eric D.
Publication venue: Public Library of Science
Publication date: 01/04/2017

Many prion-forming proteins contain glutamine/asparagine (Q/N) rich domains, and there are conflicting opinions as to the role of primary sequence in their conversion to the prion form: is this phenomenon driven primarily by amino acid composition, or, as a recent computational analysis suggested, is it dependent on the presence of short sequence elements with high amyloid-forming potential? The argument for the importance of short sequence elements hinged on the relatively high accuracy obtained using a method that utilizes a collection of length-six sequence elements with known amyloid-forming potential. We weigh in on this question and demonstrate that when those sequence elements are permuted, even higher accuracy is obtained; we also propose a novel multiple-instance machine learning method that uses sequence composition alone, and achieves better accuracy than all existing prion prediction approaches. While we expect there to be elements of primary sequence that affect the process, our experiments suggest that sequence composition alone is sufficient for predicting protein sequences that are likely to form prions. A web-server for the proposed method is available at http://faculty.pieas.edu.pk/fayyaz/prank.html, and the code for reproducing our experiments is available at http://doi.org/10.5281/zenodo.167136
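To make "sequence composition alone" concrete, here is a small sketch of a composition feature vector. The helper names `composition` and `window_compositions` are hypothetical. Treating sliding windows as the instances of a multiple-instance learner is an assumption of this sketch, not a description of the authors' published method.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the fraction of each standard amino acid in `seq` (length-20 list)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def window_compositions(seq, width):
    """Composition vectors of all sliding windows of `width` residues.

    Assumption of this sketch: each window is one 'instance' and the
    whole protein is the 'bag' in a multiple-instance setting.
    """
    n_windows = max(1, len(seq) - width + 1)
    return [composition(seq[i:i + width]) for i in range(n_windows)]


vec = composition("QNQN")
bag = window_compositions("QNQNQ", 4)
```

By construction the vector ignores residue order, so `composition("QNQN")` and `composition("NNQQ")` are identical. Any predictor built only on these features cannot depend on the arrangement of residues within a window.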

Crossref

Directory of Open Access Journals

Warwick Research Archives Portal Repository

FigShare

SpliceGrapher: detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data

Author: Ben-Hur Asa
Reddy Anireddy SN
Rogers Mark F
Thomas Julie
Publication venue: BioMed Central
Publication date: 01/01/2012

We propose a method for predicting splice graphs that enhances curated gene models using evidence from RNA-Seq and EST alignments. Results obtained using RNA-Seq experiments in Arabidopsis thaliana show that predictions made by our SpliceGrapher method are more consistent with current gene models than predictions made by TAU and Cufflinks. Furthermore, analysis of plant and human data indicates that the machine learning approach used by SpliceGrapher is useful for discriminating between real and spurious splice sites, and can improve the reliability of detection of alternative splicing. SpliceGrapher is available for download at http://SpliceGrapher.sf.net

Springer - Publisher Connector

PubMed Central

Genome-wide analysis of alternative splicing in Chlamydomonas reinhardtii

Author: Ben-Hur Asa
Labadorf Adam
Link Alicia
Reddy Anireddy SN
Rogers Mark F
Thomas Julie
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Genome-wide computational analysis of alternative splicing (AS) in several flowering plants has revealed that pre-mRNAs from about 30% of genes undergo AS. Chlamydomonas, a simple unicellular green alga, is part of the lineage that includes land plants. However, it diverged from land plants about one billion years ago. Hence, it serves as a good model system to study alternative splicing in early photosynthetic eukaryotes, to obtain insights into the evolution of this process in plants, and to compare splicing in simple unicellular photosynthetic and non-photosynthetic eukaryotes. We performed a global analysis of alternative splicing in Chlamydomonas reinhardtii using its recently completed genome sequence and all available ESTs and cDNAs.

Results: Our analysis of AS using BLAT and a modified version of the Sircah tool revealed AS of 498 transcriptional units with 611 events, representing about 3% of the total number of genes. As in land plants, intron retention is the most prevalent form of AS. Retained introns and skipped exons tend to be shorter than their counterparts in constitutively spliced genes. The splice site signals in all types of AS events are weaker than those in constitutively spliced genes. Furthermore, in alternatively spliced genes, the prevalent splice form has a stronger splice site signal than the non-prevalent form. Analysis of constitutively spliced introns revealed an over-abundance of motifs with simple repetitive elements in comparison to introns involved in intron retention. In almost all cases, AS results in a truncated ORF, leading to a coding sequence that is around 50% shorter than the prevalent splice form. Using RT-PCR we verified AS of two genes and show that they produce more isoforms than indicated by EST data. All cDNA/EST alignments and splice graphs are provided in a website at http://combi.cs.colostate.edu/as/chlamy.

Conclusions: The extent of AS in Chlamydomonas that we observed is much smaller than observed in land plants, but is much higher than in simple unicellular heterotrophic eukaryotes. The percentage of different alternative splicing events is similar to flowering plants. Prevalence of constitutive and alternative splicing in Chlamydomonas, together with its simplicity, many available public resources, and well developed genetic and molecular tools for this organism make it an excellent model system to elucidate the mechanisms involved in regulated splicing in photosynthetic eukaryotes.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Deciphering the Plant Splicing Code: Experimental and Computational Approaches for Predicting Alternative Splicing and Splicing Regulatory Elements

Author: Ben-Hur Asa
Hamilton Michael
Reddy Anireddy S. N.
Richardson Dale N.
Rogers Mark F.
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2012

Extensive alternative splicing (AS) of precursor mRNAs (pre-mRNAs) in multicellular eukaryotes increases the protein-coding capacity of a genome and allows novel ways to regulate gene expression. In flowering plants, up to 48% of intron-containing genes exhibit AS. However, the full extent of AS in plants is not yet known, as only a few high-throughput RNA-Seq studies have been performed. As the cost of obtaining RNA-Seq reads continues to fall, it is anticipated that huge amounts of plant sequence data will accumulate and help in obtaining a more complete picture of AS in plants. Although it is not an onerous task to obtain hundreds of millions of reads using high-throughput sequencing technologies, computational tools to accurately predict and visualize AS are still being developed and refined. This review will discuss the tools to predict and visualize transcriptome-wide AS in plants using short reads and highlight their limitations. Comparative studies of AS events between plants and animals have revealed that there are major differences in the most prevalent types of AS events, suggesting that plants and animals differ in the way they recognize exons and introns. Extensive studies have been performed in animals to identify cis-elements involved in regulating AS, especially in exon skipping. However, few such studies have been carried out in plants. Here, we review the current state of research on splicing regulatory elements (SREs) and briefly discuss emerging experimental and computational tools to identify cis-elements involved in regulation of AS in plants. The availability of curated alternative splice forms in plants makes it possible to use computational tools to predict SREs involved in AS regulation, which can then be verified experimentally. Such studies will permit identification of plant-specific features involved in AS regulation and contribute to deciphering the splicing code in plants.

Directory of Open Access Journals

PubMed Central

Frontiers - Publisher Connector

Probabilistic analysis of the phase space flow for linear programming

Author: Ben-Hur Asa
Feinberg Joshua
Fishman Shmuel
Siegelmann Hava T.
Publication venue: Elsevier BV
Publication date: 01/01/2004

The phase space flow of a dynamical system leading to the solution of Linear Programming (LP) problems is explored as an example of complexity analysis in an analog computation framework. An ensemble of LP problems with $n$ variables and $m$ constraints ($n>m$), where all elements of the vectors and matrices are normally distributed, is studied. The convergence time of a flow to the fixed point representing the optimal solution is computed. The cumulative distribution $\mathcal{F}^{(n,m)}(\Delta)$ of the convergence rate $\Delta_{\min}$ to this point is calculated analytically, in the asymptotic limit of large $(n,m)$, in the framework of Random Matrix Theory. In this limit $\mathcal{F}^{(n,m)}(\Delta)$ is found to be a scaling function, namely it is a function of one variable that is a combination of $n$, $m$ and $\Delta$ rather than a function of these three variables separately. The distribution of the computation times is also calculated from numerical simulations and found to be a scaling function as well.

Comment: 8 pages, latex, 2 eps figures; final published version
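Schematically, the scaling property described in this abstract can be written as below. The symbols $x$ and $g$ are placeholders introduced here for illustration; the abstract does not state which combination of $n$, $m$ and $\Delta$ is involved, so $g$ is left unspecified.

```latex
\mathcal{F}^{(n,m)}(\Delta) \;\approx\; \mathcal{F}(x),
\qquad x = g(n, m, \Delta),
\qquad \text{for large } (n, m).
```

In words, distributions for different problem sizes collapse onto a single curve when plotted against $x$, instead of needing a separate curve for each pair $(n, m)$.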

arXiv.org e-Print Archive

CiteSeerX

Crossref

ScholarWorks@UMass Amherst