52 research outputs found

    Methodology for the inference of gene function from phenotype data.

    Background: Biomedical ontologies are increasingly instrumental in the advancement of biological research, primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs when ontologies are developed and used independently. The ability to infer data associated with one ontology from data associated with another would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures.
    Results: We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce whether there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes.
    Conclusions: We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high-quality curated data. BMC Bioinformatics 2014; 15:405

    Quantifying the local adaptive landscape of a nascent bacterial community

    Fitness landscapes largely shape the dynamics of evolution, but it is unclear how they shift upon ecological diversification. By engineering genome-wide knockout libraries of a nascent bacterial community, Ascensao et al. show how ecological and epistatic patterns combine to shape adaptive landscapes.

    A Hidden Markov Model with continuous hidden and observed states (a Kalman filter) for inferring genetic drift and measurement noise from lineage frequency time series.

    (a) Illustration of how genetic drift and measurement noise affect the observed frequency time series. Muller plot of lineage frequencies from Wright-Fisher simulations with effective population sizes of 500 and 5000, with and without measurement noise. In simulations with measurement noise, 100 sequences were sampled per week with the measurement noise overdispersion parameter c_t = 5 (parameter defined in text). All simulations were initialized with 50 lineages at equal frequency. A lower effective population size leads to larger frequency fluctuations whose variances add over time, whereas measurement noise leads to increased frequency fluctuations whose variances do not add over time. (b) Schematic of the Hidden Markov Model describing frequency trajectories. f_t is the true frequency at time t (hidden states), and the observed frequency at time t forms the observed states. The inferred parameters are the effective population size scaled by the generation time, and c_t, the overdispersion in measurement noise (c_t = 1 corresponds to uniform sampling of sequences from the population). (c-f) Validation of the method using Wright-Fisher simulations of frequency trajectories with time-varying effective population size and measurement noise. (c) Simulated number of sequences. (d) Simulated lineage frequency trajectories. (e) Inferred scaled effective population size on simulated data compared to true values. (f) Inferred measurement noise (c_t) on simulated data compared to true values. In (e) the shaded region shows the 95% confidence interval calculated using the posterior, and in (f) the shaded region shows the 95% confidence interval calculated using bootstrapping (see Methods).
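The setup in panel (a) can be illustrated with a minimal sketch, which is not the authors' code. It assumes neutral Wright-Fisher resampling once per time step, and it models the overdispersion parameter by shrinking the effective number of sampled sequences to n_sequences / overdispersion, so the observation variance is roughly c f (1 - f) / n_sequences. All function and parameter names are hypothetical.

```python
import random
from collections import Counter


def resample(freqs, n_draws, rng):
    """Multinomial resampling: draw n_draws individuals and return new frequencies."""
    draws = rng.choices(range(len(freqs)), weights=freqs, k=n_draws)
    counts = Counter(draws)
    return [counts.get(i, 0) / n_draws for i in range(len(freqs))]


def observe(freqs, n_sequences, overdispersion, rng):
    """Noisy observation of the true frequencies.

    Assumption (not from the source): overdispersion c is modelled as a
    reduced effective sample size of n_sequences / c.
    """
    n_eff = max(1, round(n_sequences / overdispersion))
    return resample(freqs, n_eff, rng)


def simulate(pop_size, n_lineages=50, n_steps=20,
             n_sequences=100, overdispersion=5.0, seed=0):
    """Return (true, observed) frequency trajectories as lists of lists."""
    rng = random.Random(seed)
    freqs = [1.0 / n_lineages] * n_lineages  # all lineages start at equal frequency
    true_traj = [freqs]
    obs_traj = [observe(freqs, n_sequences, overdispersion, rng)]
    for _ in range(n_steps):
        # Genetic drift: each step resamples the whole population, so the
        # fluctuations accumulate from one step to the next.
        freqs = resample(freqs, pop_size, rng)
        true_traj.append(freqs)
        # Measurement noise: drawn independently at each step from the true
        # frequencies, so it does not accumulate.
        obs_traj.append(observe(freqs, n_sequences, overdispersion, rng))
    return true_traj, obs_traj


if __name__ == "__main__":
    for n in (500, 5000):
        true_traj, _ = simulate(pop_size=n)
        surviving = sum(1 for f in true_traj[-1] if f > 0)
        print(f"N = {n}: {surviving} of 50 lineages remain after 20 steps")
```

Running the script with the two population sizes from the caption gives a rough feel for how quickly lineages are lost to drift at the smaller size.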

    The inferred effective population size when cutting the tree at different depths to test the effect of combining lineages with other more closely related lineages in forming the coarse-grained lineages.


    Inferred effective population size in regions of England.

    (Top panels) Inferred effective population size of pre-B.1.177 lineages, B.1.177, Alpha, and Delta for each region of England. The inferred value for England as a whole is shown for reference. Shaded regions show 95% confidence intervals (see Methods). (Bottom panels) The ratio between the inferred value for England and that of the region for each variant. A horizontal dashed line indicates a ratio of 1 (i.e. the inferred value is the same in that region as in England as a whole). Shaded regions show the minimum and maximum possible values of the ratio from the combined error intervals of the numerator and denominator (thus, not corresponding to a specific confidence interval range).

    Inferred scaled effective population size compared to the SIR model scaled population size calculated using the observed number of positive individuals in England (see Methods).


    Potential mechanisms that can generate a low effective population size.

    (a) Superspreading, where the number of secondary cases (Z) from a single infected individual is broadly distributed (variance greater than mean). The superspreading individuals are indicated in blue. (b) Deme structure without superspreading, due to heterogeneity in the host network structure, where the number of secondary cases is not broadly distributed (variance approximately equal to mean). (c) The ratio between the SIR model scaled population size (calculated from an SIR model using the number of observed positive individuals and the observed effective reproduction number) and the inferred scaled effective population size for each variant. Only data where the error in the SIR model estimate is less than 3 times its value are shown, because larger error bars make the results difficult to interpret. The inferred scaled effective population size is lower than the SIR model scaled population size (which assumes well-mixed dynamics and no superspreading) by a factor of 16 to 589, indicating high levels of genetic drift. The variance in offspring number from the literature does not entirely explain the discrepancy between the true and effective population sizes. (d) Simulations of deme structure without superspreading can generate high levels of genetic drift via jackpot events. SEIR dynamics are simulated within demes (with R_t = 10, i.e. deterministic transmission) and Poisson transmission is simulated between demes (R_t ≪ 1, i.e. stochastic transmission) such that the population R_t ∼ 1 (see Methods). Simulation parameters are: mean transition rate from exposed to infected γ_E = (2.5 days)^-1, mean transition rate from infected to recovered γ_I = (6.5 days)^-1, total number of demes D_total = 5.6 × 10^5. The ratio between the number of infected individuals and the inferred effective population size is found to scale linearly with the deme size and not with the number of infected demes. This scaling arises from jackpot events, in which a lineage that happens to infect a susceptible deme grows rapidly until all susceptible individuals in the deme are infected.
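The distinction drawn in panels (a) and (b), variance greater than mean versus variance approximately equal to mean, can be made concrete with a small sketch. It is illustrative only and is not the simulation described in the caption. It assumes that superspreading is modelled with a gamma-Poisson (negative binomial) offspring distribution with mean R and dispersion k, for which the variance is R + R^2/k. The values R = 1 and k = 0.1 are hypothetical and not taken from the source.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def secondary_cases(mean_r, dispersion_k, rng):
    """Number of secondary cases Z caused by one infected individual.

    dispersion_k=None gives a Poisson offspring distribution (no superspreading).
    Otherwise each individual's transmission rate is gamma distributed with
    shape k and mean mean_r, which yields a negative binomial distribution for Z.
    """
    if dispersion_k is None:
        return poisson(mean_r, rng)
    rate = rng.gammavariate(dispersion_k, mean_r / dispersion_k)
    return poisson(rate, rng)


def variance_to_mean(samples):
    """Sample variance divided by sample mean (about 1 for a Poisson distribution)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return var / mean


if __name__ == "__main__":
    rng = random.Random(0)
    no_superspreading = [secondary_cases(1.0, None, rng) for _ in range(20000)]
    superspreading = [secondary_cases(1.0, 0.1, rng) for _ in range(20000)]
    print("variance/mean without superspreading:", variance_to_mean(no_superspreading))
    print("variance/mean with superspreading:   ", variance_to_mean(superspreading))
```

With these illustrative parameters the first ratio should sit near 1, and the second should be well above it.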