17 research outputs found

    Demographic inference from multiple whole genomes using a particle filter for continuous Markov jump processes

    Demographic events shape a population's genetic diversity, a process described by the coalescent-with-recombination model that relates demography and genetics by an unobserved sequence of genealogies along the genome. As the space of genealogies over genomes is large and complex, inference under this model is challenging. Formulating the coalescent-with-recombination model as a continuous-time and -space Markov jump process, we develop a particle filter for such processes, and use waypoints that under appropriate conditions allow the problem to be reduced to the discrete-time case. To improve inference, we generalise the Auxiliary Particle Filter for discrete-time models, and use Variational Bayes to model the uncertainty in parameter estimates for rare events, avoiding biases seen with Expectation Maximization. Using real and simulated genomes, we show that past population sizes can be accurately inferred over a larger range of epochs than was previously possible, opening the possibility of jointly analyzing multiple genomes under complex demographic models. Code is available at https://github.com/luntergroup/smcsmc.

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.

    Stochastic tree models and probabilistic modelling of gene trees of given species networks

    In the pre-genomic era, the relationships among species and their evolutionary histories were often determined by examining the fossil records. In the genomic era, these relationships are identified by analysing the genetic data, which also enables us to take a close-up view of the differences between the individual samples. Nevertheless, these relationships are often described by a tree-like structure or a network. In this thesis, we investigate some of the models that are used to describe these relationships. This thesis can be divided into two main parts. The first part focuses on investigating the theoretical properties of several neutral tree models that are often considered in phylogenetics and population genetics studies, such as the Yule–Harding model, the proportional to distinguishable arrangements and the Kingman coalescent models. In comparison to the first part, the other half of the thesis is more computationally oriented: we focus on developing and implementing methods of calculating gene tree probabilities of given species networks, and simulating genealogies within species networks.

    Bioactive lipid lysophosphatidic acid species are associated with disease progression in idiopathic pulmonary fibrosis

    Idiopathic pulmonary fibrosis (IPF) is a progressive disease with significant mortality. Prognostic biomarkers to identify rapid progressors are urgently needed to improve patient management. Since the lysophosphatidic acid (LPA) pathway has been implicated in lung fibrosis in preclinical models and identified as a potential therapeutic target, we aimed to investigate if bioactive lipid LPA species could be prognostic biomarkers that predict IPF disease progression. LPAs and lipidomics were measured in baseline placebo plasma of a randomized IPF-controlled trial. The association of lipids with disease progression indices was assessed using statistical models. Compared to healthy, IPF patients had significantly higher levels of five LPAs (LPA16:0, 16:1, 18:1, 18:2, 20:4) and reduced levels of two triglycerides species (TAG48:4-FA12:0, -FA18:2) (false discovery rate 2). Patients with higher levels of LPAs had greater declines in diffusion capacity of carbon monoxide over 52 weeks (P < 0.01); additionally, LPA20:4-high (≥median) patients had earlier time to exacerbation compared to LPA20:4-low (<median) patients (hazard ratio (95% CI): 5.71 (1.17–27.72), P = 0.031). Higher baseline LPAs were associated with greater increases in fibrosis in lower lungs as quantified by high-resolution computed tomography at week 72 (P < 0.05). Some of these LPAs were positively associated with biomarkers of profibrotic macrophages (CCL17, CCL18, OPN, and YKL40) and lung epithelial damage (SPD and sRAGE) (P < 0.05). In summary, our study established the association of LPAs with IPF disease progression, further supporting the role of the LPA pathway in IPF pathobiology.