
    Explore Biological Pathways from Noisy Array Data by Directed Acyclic Boolean Networks

    We consider the structure of directed acyclic Boolean (DAB) networks as a tool for exploring biological pathways. In a DAB network, the basic objects are binary elements and their Boolean duals. A DAB is characterized by two kinds of pairwise relations: similarity and prerequisite. The latter is a partial order relation, namely, the on-status of one element is necessary for the on-status of another element. A DAB network is uniquely determined by the state space of its elements. We arrange samples from the state space of a DAB network in a binary array and introduce a random mechanism of measurement error. Our inference strategy consists of two stages. First, we consider each pair of elements and try to identify their most likely relation. At the same time, we assign a score, the s-p-score, to this relation. Second, we rank the s-p-scores obtained from the first stage. We expect that relations with smaller s-p-scores are more likely to be true, and those with larger s-p-scores are more likely to be false. The key idea is the definition of s-scores (referring to similarity), p-scores (referring to prerequisite), and s-p-scores. As with classical statistical tests, control of false negatives and false positives is our primary concern. We illustrate the method with a simulated example and the classical arginine biosynthetic pathway, and show some exploratory results on a published microarray expression dataset of the yeast Saccharomyces cerevisiae obtained from experiments with activation and genetic perturbation of the pheromone response MAPK pathway.
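
    The two-stage strategy described in this abstract can be illustrated with a toy sketch. The abstract does not define the s-scores, p-scores, or s-p-scores, so the score used below (the fraction of samples that contradict a relation) is only a stand-in assumption, as are the function names and the tolerance `tol`. Boolean duals are not handled.

```python
from itertools import combinations

def best_relation(a, b, tol=0.05):
    """Pick the most plausible relation between two binary columns.

    Assumption: the score is the fraction of samples contradicting the
    relation. This is a stand-in, not the s-p-score from the abstract.
    """
    n = len(a)
    # (a off, b on) contradicts "a is a prerequisite of b"
    a_off_b_on = sum(x == 0 and y == 1 for x, y in zip(a, b)) / n
    # (a on, b off) contradicts "b is a prerequisite of a"
    a_on_b_off = sum(x == 1 and y == 0 for x, y in zip(a, b)) / n
    if a_off_b_on <= tol and a_on_b_off <= tol:
        # Prerequisite in both directions: the elements behave alike.
        return "similar", a_off_b_on + a_on_b_off
    if a_off_b_on <= a_on_b_off:
        return "a_prerequisite_of_b", a_off_b_on
    return "b_prerequisite_of_a", a_on_b_off

def rank_relations(columns, tol=0.05):
    """Stage 1: score every pair. Stage 2: rank, smallest score first."""
    ranked = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        relation, score = best_relation(a, b, tol)
        ranked.append((score, name_a, name_b, relation))
    return sorted(ranked)

# Made-up binary array: rows are samples, one column per element.
toy_columns = {
    "A": [1, 1, 1, 1, 0, 0, 0, 0],
    "B": [1, 1, 0, 0, 0, 0, 0, 0],  # B is on only when A is on
    "C": [1, 1, 1, 1, 0, 0, 0, 0],  # C mirrors A
}
toy_ranking = rank_relations(toy_columns)
```

    In this noise-free toy array every chosen relation has score zero; with measurement error, the ranking would push pairs with more contradicting samples toward the bottom.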

    Algorithms for Selecting Informative Marker Panels for Population Assignment

    Given a set of potential source populations, genotypes of an individual of unknown origin at a collection of markers can be used to predict the correct source population of the individual. For improved efficiency, informative markers can be chosen from a larger set of markers to maximize the accuracy of this prediction. However, selecting the loci that are individually most informative does not necessarily produce the optimal panel. Here, using genotypes from eight species (carp, cat, chicken, dog, fly, grayling, human, and maize), this univariate accumulation procedure is compared to new multivariate "greedy" and "maximin" algorithms for choosing marker panels. The procedures generally suggest similar panels, although the greedy method often recommends inclusion of loci that are not chosen by the other algorithms. In seven of the eight species, when applied to five or more markers, all methods achieve at least 94% assignment accuracy on simulated individuals, with one species (dog) producing this level of accuracy with only three markers, and the eighth species (human) requiring ∼13–16 markers. The new algorithms produce substantial improvements over use of randomly selected markers; where differences among the methods are noticeable, the greedy algorithm leads to slightly higher probabilities of correct assignment. Although none of the approaches necessarily chooses the panel with optimal performance, the algorithms all likely select panels with performance near enough to the maximum that they are all suitable for practical use.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/63393/1/cmb.2005.12.1183.pd
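
    The contrast between univariate accumulation and greedy selection can be sketched on made-up data. Everything below is an assumption made for illustration: haploid single-allele genotypes, add-one smoothed allele frequencies, a maximum-likelihood assignment rule, and accuracy measured on the same individuals used to estimate frequencies (the abstract reports accuracy on simulated individuals). The maximin algorithm is not sketched.

```python
import math
from collections import Counter, defaultdict

def allele_freqs(samples, marker):
    """Per-population allele frequencies at one marker (add-one smoothing)."""
    counts = defaultdict(Counter)
    alleles = set()
    for pop, genotype in samples:
        counts[pop][genotype[marker]] += 1
        alleles.add(genotype[marker])
    freqs = {}
    for pop, c in counts.items():
        total = sum(c.values()) + len(alleles)
        freqs[pop] = {al: (c[al] + 1) / total for al in alleles}
    return freqs

def accuracy(samples, panel):
    """Fraction of individuals assigned to their true population."""
    freqs = {m: allele_freqs(samples, m) for m in panel}
    pops = sorted({pop for pop, _ in samples})
    correct = 0
    for pop, genotype in samples:
        def loglik(p):
            return sum(math.log(freqs[m][p][genotype[m]]) for m in panel)
        if max(pops, key=loglik) == pop:  # ties go to the first population
            correct += 1
    return correct / len(samples)

def univariate_panel(samples, n_markers, k):
    """Take the k markers that are individually most accurate."""
    return sorted(range(n_markers), key=lambda m: -accuracy(samples, [m]))[:k]

def greedy_panel(samples, n_markers, k):
    """Repeatedly add the marker that most improves the current panel."""
    panel = []
    for _ in range(k):
        remaining = [m for m in range(n_markers) if m not in panel]
        panel.append(max(remaining, key=lambda m: accuracy(samples, panel + [m])))
    return panel

# Made-up data: markers 0 and 1 are redundant (both separate P from Q and R);
# marker 2 is individually weaker but partly separates Q from R.
toy_samples = (
    [("P", ("a", "a", al)) for al in "abab"]
    + [("Q", ("b", "b", al)) for al in "aaab"]
    + [("R", ("b", "b", al)) for al in "bbba"]
)
toy_univariate = univariate_panel(toy_samples, 3, 2)
toy_greedy = greedy_panel(toy_samples, 3, 2)
```

    On this toy data the univariate procedure keeps the two redundant markers, while the greedy procedure swaps one of them for the complementary marker and assigns more individuals correctly.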

    Finding Motifs in Promoter Regions


    J Comput Biol

    Physiological concentrations of metabolites can partly be explained by their molecular structure. We hypothesize that substances containing certain chemical groups show increased or decreased concentration in cells. We consider here, as chemical groups, local atomic configurations, describing an atom, its bonds, and its direct neighbor atoms. To test our hypothesis, we fitted a linear statistical model that relates experimentally determined logarithmic concentrations to feature vectors containing count numbers of the chemical groups. In order to determine chemical groups that have a clear effect on the concentration, we use a regularized (lasso) regression. In a dataset on 41 substances of central metabolism in different organisms, we found that the physical concentrations are increased by the occurrence of amino and hydroxyl groups, while aldehydes, ketones, and phosphates show decreased concentrations. The model explains about 22% of the variance of the logarithmic mean concentrations.
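
    The lasso regression mentioned in this abstract can be sketched with a small coordinate-descent solver. The objective assumed here is (1/2n) times the residual sum of squares plus `lam` times the L1 norm of the weights, with an unpenalized intercept; the function names, the penalty value, and the toy data are all hypothetical and are not taken from the study.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for lasso on centered data.

    X: rows of group counts, y: log concentrations (both toy assumptions).
    Returns (intercept, weights).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual while all weights are 0
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in Xc]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # constant feature carries no information
            # Covariance of feature j with the residual that excludes
            # feature j's own current contribution.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, lam) / norm
            resid = [r - (new_w - w[j]) * c for c, r in zip(col, resid)]
            w[j] = new_w
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return intercept, w

# Made-up example: two hypothetical group counts per substance;
# the log concentration depends only on the first count.
toy_counts = [[0, 1], [1, 0], [2, 1], [3, 0]]
toy_log_conc = [0.0, 1.0, 2.0, 3.0]
toy_intercept, toy_weights = lasso_fit(toy_counts, toy_log_conc, lam=0.1)
```

    The L1 penalty shrinks the weight of the informative count slightly and sets the weight of the uninformative one exactly to zero, which is the property that makes lasso useful for picking out groups with a clear effect.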