    Novel Algorithms and Methodology to Help Unravel Secrets that Next Generation Sequencing Data Can Tell

    The genome of an organism is its complete set of DNA nucleotides, spanning all of its genes as well as its non-coding regions. It contains most of the information necessary to build and maintain an organism. It is therefore no surprise that sequencing the genome provides an invaluable tool for the scientific study of an organism. Via the inference of an evolutionary (phylogenetic) tree, DNA sequences can be used to reconstruct the evolutionary history of a set of species. DNA sequences, or genotype data, have also proven useful for predicting an organism's phenotype (i.e., its observed traits) from its genotype; this is the objective of association studies. While methods for determining the DNA sequence of an organism have existed for decades, the recent advent of Next Generation Sequencing (NGS) has increased the availability of such data to such an extent that the computational challenges now forming an integral part of biological studies can no longer be ignored. By focusing on phylogenetics and Genome-Wide Association Studies (GWAS), this thesis aims to help address some of these challenges. Accordingly, the thesis is in two parts: the first centres on phylogenetics and the second on GWAS. In the first part, we present theoretical insights into reconstructing phylogenetic trees from incomplete distances. This problem is important in the context of NGS data, as incomplete pairwise distances between organisms occur frequently with such input, and ignoring taxa for which information is missing can introduce undesirable bias. In the second part, we focus on the problem of inferring population stratification among individuals in a dataset due to reproductive isolation. While powerful methods for doing this have been proposed in the literature, they tend to struggle with the sheer volume of data that comes with NGS. To help address this problem, we introduce the novel PSIKO software and show that it scales very well to large NGS datasets.

    A Novel and Fast Approach for Population Structure Inference Using Kernel-PCA and Optimization (PSIKO)

    Population structure is a confounding factor in Genome-Wide Association Studies, increasing the rate of false positive associations. To correct for it, several model-based algorithms such as ADMIXTURE and STRUCTURE have been proposed. These carry a considerable computational burden, limiting their applicability to large datasets such as those produced by Next Generation Sequencing (NGS) techniques. To address this, non-model-based approaches such as SNMF and EIGENSTRAT have been proposed, which scale better with larger data. Here we present a novel non-model-based approach, PSIKO, which is based on a unique combination of linear kernel-PCA and least-squares optimization and allows for the inference of admixture coefficients, principal components, and the number of founder populations of a dataset. PSIKO has been compared against existing leading methods on a variety of simulation scenarios, as well as on real biological data. We found that, in addition to producing results of the same quality as the other tested methods, PSIKO scales extremely well with dataset size, being considerably faster (up to 30 times) for longer sequences than even state-of-the-art methods such as SNMF. PSIKO and its accompanying manual are freely available at https://www.uea.ac.uk/computing/psiko
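
PSIKO is described as combining linear kernel-PCA with least-squares optimization. The sketch below illustrates only the generic linear kernel-PCA ingredient, not PSIKO itself: it assumes a small genotype matrix with individuals as rows and SNPs coded 0/1/2 as columns, centres each SNP, forms the linear kernel (Gram) matrix between individuals, and extracts the first principal component by power iteration. All function names are hypothetical, and the least-squares step that yields admixture coefficients is not shown.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def center_columns(x: Matrix) -> Matrix:
    """Subtract each SNP's (column's) mean genotype so the linear kernel is centred."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in x]


def linear_kernel(x: Matrix) -> Matrix:
    """Gram matrix K[i][k] = <x_i, x_k>: inner products between individuals."""
    return [[sum(a * b for a, b in zip(xi, xk)) for xk in x] for xi in x]


def top_eigenpair(k: Matrix, iters: int = 200) -> Tuple[float, List[float]]:
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(k)
    # Non-uniform start: after centring, the all-ones vector lies in K's null space.
    v = [float(i + 1) for i in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    lam = 0.0
    for _ in range(iters):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(c * c for c in w))
        if lam == 0.0:
            break
        v = [c / lam for c in w]
    return lam, v


def first_pc_scores(genotypes: Matrix) -> Tuple[float, List[float]]:
    """Eigenvalue and per-individual scores on the first kernel principal component."""
    lam, v = top_eigenpair(linear_kernel(center_columns(genotypes)))
    # For a linear kernel, the projection of individual i is sqrt(lambda) * v[i].
    return lam, [math.sqrt(lam) * c for c in v]


lam, scores = first_pc_scores([[0, 0, 2, 2], [0, 1, 2, 2], [2, 2, 0, 0], [2, 1, 0, 0]])
print(round(lam, 3), [round(s, 3) for s in scores])
```

For this toy matrix (two groups of two individuals with mirrored genotypes), the first-PC scores separate the groups by sign. A real analysis would call a numerical library's symmetric eigensolver instead of hand-rolled power iteration; the standard library is used here only to keep the sketch self-contained.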

    PSIKO2: a fast and versatile tool to infer population stratification on various levels in GWAS

    Genome-Wide Association Studies are an invaluable tool for identifying genotypic loci linked with agriculturally important traits or certain diseases. The signal on which such studies rely can, however, be obscured by population stratification, making it necessary to account for it in some way. Population stratification depends on when admixture happened and thus can occur at various levels. To aid in its inference at the genome level, we recently introduced PSIKO, and comparison with leading methods indicates that it has attractive properties. Until now, however, it could not be used for local ancestry inference (LAI), which is preferable in cases of recent admixture because the genome level tends to be too coarse to properly account for processes acting on small segments of a genome. To bring the powerful ideas underpinning PSIKO to bear in such studies as well, we extended it to PSIKO2, which we introduce here. Availability: Source code, binaries, and user manual are freely available at https://www.uea.ac.uk/computing/psiko

    OSF-Builder: A new tool for constructing and representing evolutionary histories involving introgression

    Introgression is an evolutionary process that provides an important source of evolutionary innovation. Although various methods have been used to detect introgression, very few methods are currently available for constructing evolutionary histories involving introgression. In this paper we propose a new method for constructing such evolutionary histories. Its starting point is a species forest (a collection of lineage trees, usually arising as a collection of clades or monophyletic groups in a species tree) and a gene tree for a specific allele of interest, or allele tree for short. Our method is based on representing introgression in terms of a certain 'overlay' of the allele tree over the lineage trees, called an overlaid species forest (OSF). OSFs are similar to phylogenetic networks, although a key difference is that they typically have multiple roots, because each monophyletic group in the species tree has a different point of origin. Employing a new model for introgression, we derive an efficient algorithm for building OSFs, called OSF-Builder, that is guaranteed to return an optimal OSF in the sense that the number of potential introgression events is minimized. As well as using simulations to assess the performance of OSF-Builder, we illustrate its use on a butterfly dataset in which introgression has previously been inferred. The OSF-Builder software is available for download from https://www.uea.ac.uk/computing/software/OSF-Builde

    Apparent close approaches between near-Earth asteroids and quasars. Precise astrometry and frame linking

    Reproduced with permission. Copyright ESO. Article published by EDP Sciences and available at www.aanda.org. Aims. We investigate the link between the International Celestial Reference Frame (ICRF) and the dynamical reference frame realized by the ephemerides of Solar System bodies. Methods. We propose a procedure that involves selecting events in which asteroids with accurately determined orbits cross a CCD field containing selected quasars. Using a Bulirsch–Stoer numerical integrator, we constructed 8-year (2010–2018) ephemerides for a set of 836 numbered near-Earth asteroids (NEAs). We searched for close encounters (within a typical field of view of ground-based telescopes) between this set of asteroids and quasars with high-accuracy astrometric positions extracted from the Large Quasars Astrometric Catalog (LQAC). Results. In the designated period (2010–2018), we found 2924, 14 257, and 6972 close approaches (within 10′) between asteroids with a minimum solar elongation of 60° and quasars from the ICRF-Ext2, the Very Long Baseline Array Calibrator Survey (VLBA-CS), and the Very Large Array (VLA), respectively. This large number of close encounters provides the observational basis needed to investigate the link between the dynamical reference frame and the ICRF.
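
Screening for close approaches reduces to measuring angular separations on the sky. A minimal sketch, assuming that asteroid, quasar and Sun positions are already available as geocentric right ascension and declination in degrees at the same epoch (in the study these come from numerically integrated ephemerides, which are not reproduced here). The 10′ separation and 60° minimum solar elongation are the thresholds quoted above; the function names and the simple pairwise test are hypothetical.

```python
import math
from typing import Tuple

RaDec = Tuple[float, float]  # (right ascension, declination) in degrees


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees between two sky positions (haversine form)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp guards against h drifting slightly above 1 through rounding.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def is_close_approach(asteroid: RaDec, quasar: RaDec, sun: RaDec,
                      max_sep_arcmin: float = 10.0,
                      min_elongation_deg: float = 60.0) -> bool:
    """True if the asteroid is within max_sep_arcmin of the quasar and far enough from the Sun."""
    sep_arcmin = 60.0 * angular_separation_deg(*asteroid, *quasar)
    # Solar elongation: angle between the Sun and the asteroid as seen from Earth.
    elongation = angular_separation_deg(*asteroid, *sun)
    return sep_arcmin <= max_sep_arcmin and elongation >= min_elongation_deg


print(is_close_approach((150.0, 10.0), (150.05, 10.05), (0.0, 0.0)))
```

The haversine form is used because it remains numerically accurate for arcminute-scale separations, where the plain spherical law of cosines loses precision.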

    Lassoing and corraling rooted phylogenetic trees

    The construction of a dendrogram on a set of individuals is a key component of a genome-wide association study. However, even with modern sequencing technologies, the distances on the individuals required to construct such a structure may not always be reliable, making it tempting to exclude them from an analysis. This, in turn, results in an input set for dendrogram construction that consists of only partial distance information, which raises the following fundamental question: for which subsets of its leaf set can we uniquely reconstruct the dendrogram from the distances that it induces on that subset? By formalizing a dendrogram as an edge-weighted, rooted phylogenetic tree on a pre-given finite set X with |X| > 2 whose edge-weighting is equidistant, and a set of partial distances on X as a set L of 2-subsets of X, we investigate this problem in terms of when such a tree is lassoed, that is, uniquely determined by the elements in L. For this we consider four different formalizations of the idea of "uniquely determining", giving rise to four distinct types of lassos. We present characterizations for all of them in terms of the child-edge graphs of the interior vertices of such a tree. Our characterizations imply, in particular, that if the tree in question is binary, then all four types of lasso coincide.
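
In an equidistant (ultrametric) tree every leaf lies at the same distance from the root, so the distance between two leaves x and y is twice the height of their last common ancestor. The sketch below assumes a hypothetical representation (a child-to-parent map plus the height of each interior vertex above the leaves) and computes the distances the tree induces on a given set L of 2-subsets, together with a check of the three-point ultrametric condition. It does not decide whether L is a lasso; the child-edge-graph characterizations above are not reproduced here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def path_to_root(v: str, parent: Dict[str, str]) -> List[str]:
    """Vertices from v up to the root (inclusive), following a child -> parent map."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path


def induced_distances(pairs: Iterable[FrozenSet[str]], parent: Dict[str, str],
                      height: Dict[str, float]) -> Dict[FrozenSet[str], float]:
    """d(x, y) = 2 * height(lca(x, y)) for every 2-subset {x, y} in pairs."""
    d = {}
    for pair in pairs:
        x, y = tuple(pair)
        above_x = set(path_to_root(x, parent))
        # The first ancestor of y that is also an ancestor of x is their lca.
        lca = next(v for v in path_to_root(y, parent) if v in above_x)
        d[frozenset(pair)] = 2 * height[lca]
    return d


def is_ultrametric(d: Dict[FrozenSet[str], float], leaves: Iterable[str]) -> bool:
    """Three-point condition: in every triple, the two largest distances are equal."""
    for x, y, z in combinations(leaves, 3):
        _, mid, top = sorted((d[frozenset((x, y))], d[frozenset((x, z))], d[frozenset((y, z))]))
        if mid != top:
            return False
    return True


# Hypothetical tree ((a, b)u, c)r with u at height 1 and the root r at height 3.
parent = {"a": "u", "b": "u", "u": "r", "c": "r"}
height = {"u": 1.0, "r": 3.0}
L = [frozenset(p) for p in combinations("abc", 2)]
d = induced_distances(L, parent, height)
print(sorted((tuple(sorted(k)), v) for k, v in d.items()), is_ultrametric(d, "abc"))
```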

    First Bio-Anthropological Evidence for Yamnaya Horsemanship

    The origins of horseback riding remain elusive. Scientific studies show that horses were kept for their milk around 3500 to 3000 BCE, which is widely accepted as indicating domestication. However, this does not confirm that they were ridden. Equipment used by early riders is rarely preserved, and the reliability of equine dental and mandibular pathologies remains contested. Horsemanship, however, has two interacting components: the horse as mount and the human as rider. Alterations associated with riding in human skeletons may therefore provide the best source of information. Here, we report five Yamnaya individuals well-dated to 3021 to 2501 calibrated BCE from kurgans in Romania, Bulgaria, and Hungary, displaying changes in bone morphology and distinct pathologies associated with horseback riding. These are the oldest humans identified as riders so far.

    Impact of clinical phenotypes on management and outcomes in European atrial fibrillation patients: a report from the ESC-EHRA EURObservational Research Programme in AF (EORP-AF) General Long-Term Registry

    Background: Epidemiological studies in atrial fibrillation (AF) illustrate that clinical complexity increases the risk of major adverse outcomes. We aimed to describe the clinical phenotypes of European AF patients and analyse their differential clinical course. Methods: We performed a hierarchical cluster analysis based on Ward’s method and squared Euclidean distance using 22 binary clinical variables, identifying the optimal number of clusters. We investigated differences in clinical management, use of healthcare resources and outcomes in a cohort of European AF patients from a Europe-wide observational registry. Results: A total of 9363 patients were available for this analysis. We identified three clusters: Cluster 1 (n = 3634; 38.8%), characterized by older patients and prevalent non-cardiac comorbidities; Cluster 2 (n = 2774; 29.6%), characterized by younger patients with a low prevalence of comorbidities; and Cluster 3 (n = 2955; 31.6%), characterized by prevalent cardiovascular risk factors/comorbidities. Over a mean follow-up of 22.5 months, Cluster 3 had the highest rates of cardiovascular events, all-cause death, and the composite outcome (combining the previous two) compared with Cluster 1 and Cluster 2 (all P < .001). An adjusted Cox regression showed that, compared with Cluster 2, Cluster 3 (hazard ratio (HR) 2.87, 95% confidence interval (CI) 2.27–3.62; HR 3.42, 95% CI 2.72–4.31; HR 2.79, 95% CI 2.32–3.35) and Cluster 1 (HR 1.88, 95% CI 1.48–2.38; HR 2.50, 95% CI 1.98–3.15; HR 2.09, 95% CI 1.74–2.51) had a higher risk of the three outcomes, respectively. Conclusions: In European AF patients, three main clusters were identified, differentiated by the presence of comorbidities. Both the non-cardiac and the cardiac comorbidity clusters were associated with an increased risk of major adverse outcomes.
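
Ward's method merges, at each step, the two clusters whose union gives the smallest increase in within-cluster sum of squares. A minimal standard-library sketch using the Lance–Williams update on squared Euclidean distances, applied to a few hypothetical binary patient vectors (the study used 22 binary clinical variables; the abstract does not state how the optimal number of clusters was chosen, so here the number of clusters k is simply given). This is an illustration of the technique, not the registry analysis code.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def sq_euclidean(u: Sequence[int], v: Sequence[int]) -> float:
    """Squared Euclidean distance; for 0/1 vectors it counts mismatching variables."""
    return float(sum((a - b) ** 2 for a, b in zip(u, v)))


def ward_clusters(points: List[Sequence[int]], k: int) -> List[List[int]]:
    """Agglomerate points with Ward's criterion until k clusters remain; returns index lists."""
    clusters: Dict[int, List[int]] = {i: [i] for i in range(len(points))}
    dist: Dict[FrozenSet[int], float] = {
        frozenset((i, j)): sq_euclidean(points[i], points[j])
        for i, j in combinations(clusters, 2)
    }
    next_id = len(points)
    while len(clusters) > k:
        # Merge the pair whose fusion increases the within-cluster sum of squares least.
        pair = min(dist, key=dist.get)
        i, j = tuple(pair)
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        for m in clusters:
            nm = len(clusters[m])
            d_mi = dist.pop(frozenset((m, i)))
            d_mj = dist.pop(frozenset((m, j)))
            # Lance–Williams update for Ward's method.
            dist[frozenset((m, next_id))] = (
                (ni + nm) * d_mi + (nj + nm) * d_mj - nm * dist[pair]
            ) / (ni + nj + nm)
        del dist[pair]
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


# Six hypothetical patients described by six binary clinical variables.
points = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0],
          [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1], [0, 0, 1, 1, 1, 1]]
print(ward_clusters(points, 2))
```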

    Clinical complexity and impact of the ABC (Atrial fibrillation Better Care) pathway in patients with atrial fibrillation: a report from the ESC-EHRA EURObservational Research Programme in AF General Long-Term Registry

    Background: Clinical complexity is increasingly prevalent among patients with atrial fibrillation (AF). The ‘Atrial fibrillation Better Care’ (ABC) pathway has been proposed to streamline a more holistic and integrated approach to AF care; however, there are limited data on its usefulness among clinically complex patients. We aimed to determine the impact of the ABC pathway in a contemporary cohort of clinically complex AF patients. Methods: From the ESC-EHRA EORP-AF General Long-Term Registry, we analysed clinically complex AF patients, defined by the presence of frailty, multimorbidity and/or polypharmacy. A K-medoids cluster analysis was performed to identify different groups of clinical complexity. The impact of an ABC-adherent approach on major outcomes was analysed through Cox regression analyses and delay of event (DoE) analyses. Results: Among 9966 AF patients included, 8289 (83.1%) were clinically complex. Adherence to the ABC pathway in the clinically complex group reduced the risk of all-cause death (adjusted HR [aHR]: 0.72, 95% CI 0.58–0.91), major adverse cardiovascular events (MACEs; aHR: 0.68, 95% CI 0.52–0.87) and the composite outcome (aHR: 0.70, 95% CI 0.58–0.85). Adherence to the ABC pathway was also associated with a significant reduction in the risk of death (aHR: 0.74, 95% CI 0.56–0.98) and the composite outcome (aHR: 0.76, 95% CI 0.60–0.96) in the high-complexity cluster; similar trends were observed for MACEs. In DoE analyses, an ABC-adherent approach resulted in significant gains in event-free survival for all the outcomes investigated in clinically complex patients. Based on absolute risk reduction at 1 year of follow-up, the number needed to treat for ABC pathway adherence was 24 for all-cause death, 31 for MACEs and 20 for the composite outcome. Conclusions: An ABC-adherent approach reduces the risk of major outcomes in clinically complex AF patients. Ensuring adherence to the ABC pathway is essential to improve clinical outcomes among clinically complex AF patients.
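
The number needed to treat (NNT) is the reciprocal of the absolute risk reduction (ARR), conventionally rounded up to a whole patient. A minimal sketch with hypothetical event counts (the registry's 1-year risks are not given above); `Fraction` keeps the arithmetic exact so that rounding up is not disturbed by floating-point error. The function name is illustrative only.

```python
import math
from fractions import Fraction


def number_needed_to_treat(events_control: int, n_control: int,
                           events_treated: int, n_treated: int) -> int:
    """NNT = 1 / ARR, with ARR = risk(control) - risk(treated), rounded up."""
    arr = Fraction(events_control, n_control) - Fraction(events_treated, n_treated)
    if arr <= 0:
        raise ValueError("no absolute risk reduction; NNT is undefined")
    return math.ceil(1 / arr)


# Hypothetical: 10% 1-year risk without adherence vs 6% with adherence -> ARR = 4 points.
print(number_needed_to_treat(100, 1000, 60, 1000))
```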

    Impact of renal impairment on atrial fibrillation: ESC-EHRA EORP-AF Long-Term General Registry

    Background: Atrial fibrillation (AF) and renal impairment share a bidirectional relationship with important pathophysiological interactions. We evaluated the impact of renal impairment in a contemporary cohort of patients with AF. Methods: We utilised the ESC-EHRA EORP-AF Long-Term General Registry. Outcomes were analysed according to renal function estimated by the CKD-EPI equation. The primary endpoint was a composite of thromboembolism, major bleeding, acute coronary syndrome and all-cause death. Secondary endpoints were each of these separately, including ischaemic stroke, haemorrhagic event, intracranial haemorrhage, cardiovascular death and hospital admission. Results: A total of 9306 patients were included. The proportions of patients with no, mild, moderate and severe renal impairment at baseline were 16.9%, 49.3%, 30% and 3.8%, respectively. AF patients with impaired renal function were older, more likely to be female, had worse cardiac imaging parameters and had multiple comorbidities. Among patients with an indication for anticoagulation, prescription of these agents was reduced in those with severe renal impairment (p < .001). Over 24 months, impaired renal function was associated with a significantly greater incidence of the primary composite outcome and all secondary outcomes. Multivariable Cox regression analysis demonstrated an inverse relationship between eGFR and the primary outcome (HR 1.07 [95% CI, 1.01–1.14] per 10 ml/min/1.73 m² decrease), which was most notable in patients with eGFR <30 ml/min/1.73 m² (HR 2.21 [95% CI, 1.23–3.99] compared with eGFR ≥90 ml/min/1.73 m²). Conclusion: A significant proportion of patients with AF suffer from concomitant renal impairment, which affects their overall management. Furthermore, renal impairment is an independent predictor of major adverse events, including thromboembolism, major bleeding, acute coronary syndrome and all-cause death, in patients with AF.
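
Renal function groups were defined from eGFR estimated with the CKD-EPI equation. The abstract does not say which version was used; the sketch below assumes the 2009 CKD-EPI creatinine equation (serum creatinine in mg/dL, result in ml/min/1.73 m²), and the 60 ml/min/1.73 m² boundary between mild and moderate impairment is an assumed KDIGO-style cut-off, since only the ≥90 and <30 thresholds appear above. Function names are hypothetical.

```python
def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool, black: bool = False) -> float:
    """eGFR (ml/min/1.73 m2) from serum creatinine; assumes the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age_years
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # race coefficient of the 2009 equation
    return egfr


def renal_category(egfr: float) -> str:
    """Bin eGFR into the four groups reported; the 60 cut-off is an assumption."""
    if egfr >= 90:
        return "none"
    if egfr >= 60:
        return "mild"
    if egfr >= 30:
        return "moderate"
    return "severe"


for scr, age, female in [(0.7, 50, True), (1.2, 60, False), (2.0, 70, False), (5.0, 70, False)]:
    e = ckd_epi_2009(scr, age, female)
    print(scr, age, female, round(e, 1), renal_category(e))
```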