21 research outputs found

    Genomic Rearrangements in Arabidopsis Considered as Quantitative Traits.

    Get PDF
    To understand the population genetics of structural variants and their effects on phenotypes, we developed an approach to mapping structural variants that segregate in a population sequenced at low coverage. We avoid calling structural variants directly. Instead, the evidence for a potential structural variant at a locus is indicated by variation in the counts of short reads that map anomalously to that locus. These structural variant traits are treated as quantitative traits and mapped genetically, analogously to a gene expression study. Association between a structural variant trait at one locus and genotypes at a distant locus indicates the origin and target of a transposition. Using ultra-low-coverage (0.3×) population sequence data from 488 recombinant inbred Arabidopsis thaliana genomes, we identified 6502 segregating structural variants. Remarkably, 25% of these were transpositions. While many structural variants cannot be delineated precisely, we validated 83% of 44 predicted transposition breakpoints by polymerase chain reaction. We show that specific structural variants may be causative for quantitative trait loci for germination and for resistance to infection by the fungus Albugo laibachii, isolate Nc14. Further, we show that the phenotypic heritability attributable to read-mapping anomalies differs from, and, in the case of time to germination and bolting, exceeds that due to standard genetic variation. Genes within structural variants are also more likely to be silenced or dysregulated. This approach complements the prevalent strategy of structural variant discovery in fewer individuals sequenced at high coverage. It is generally applicable to large populations sequenced at low coverage, and is particularly suited to mapping transpositions.
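    The core idea — treating per-locus counts of anomalously mapping reads as quantitative traits and scanning them against genotypes at every marker, as in an eQTL study — can be illustrated with a minimal sketch. All data and names below are simulated and hypothetical, not the authors' pipeline; a strong trans association between an SV trait and a distant marker mimics the transposition signal described above.

```python
# Hypothetical sketch: anomalous-read counts per locus as quantitative
# traits, associated with genotypes at all markers (eQTL-style scan).
import numpy as np

rng = np.random.default_rng(0)
n_ind, n_loci = 200, 50

# Genotypes of a recombinant inbred population: 0/1 at each marker.
geno = rng.integers(0, 2, size=(n_ind, n_loci)).astype(float)

# Anomalous-read counts per individual per locus; counts at locus 30 are
# driven by the genotype at locus 5, mimicking a transposition signal.
counts = rng.poisson(2.0, size=(n_ind, n_loci)).astype(float)
counts[:, 30] += 6.0 * geno[:, 5]

def association_scan(trait, genotypes):
    """Correlate one SV trait against the genotypes at every marker."""
    t = (trait - trait.mean()) / trait.std()
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    return (g * t[:, None]).mean(axis=0)  # per-marker correlation

r = association_scan(counts[:, 30], geno)
print(int(np.argmax(np.abs(r))))  # strongest association is at marker 5
```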

    Accurate detection of complex structural variations using single-molecule sequencing

    Get PDF
    Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr ) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles ) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.
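    Structural variant callers such as Sniffles report their results as VCF records whose INFO column carries the standard structural-variant keys SVTYPE, SVLEN and END. A small sketch of reading those keys (the record string below is made up for illustration, not taken from any real callset):

```python
# Illustrative parser for one structural-variant VCF record; the INFO
# keys SVTYPE/SVLEN/END are standard VCF fields for SVs.
def parse_sv_record(line):
    chrom, pos, _id, ref, alt, qual, flt, info = line.split("\t")[:8]
    fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    return {
        "chrom": chrom,
        "pos": int(pos),
        "svtype": fields.get("SVTYPE"),
        "svlen": int(fields["SVLEN"]) if "SVLEN" in fields else None,
        "end": int(fields["END"]) if "END" in fields else None,
    }

# Hypothetical deletion record for demonstration only.
record = "chr1\t100000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-1500;END=101500"
sv = parse_sv_record(record)
print(sv["svtype"], sv["svlen"], sv["end"])  # DEL -1500 101500
```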

    Exploring crash-risk factors using Bayes’ theorem and an optimization routine

    No full text
    Regression models used to analyse crash counts involve some form of data aggregation (spatial, temporal, or both) that may produce inconsistent or incorrect outcomes. This paper introduces a new non-regression approach for analysing risk factors affecting crash counts without aggregating crashes. The method applies Bayes' theorem to compare the distribution of the prevailing traffic conditions on a road network (the prior) with the distribution of traffic conditions just before crashes (the posterior). The probability densities of continuous explanatory variables are estimated using kernel density estimation, and a posterior log-likelihood is maximised by an optimisation routine (maximum likelihood estimation). The method then estimates the parameters that define the crash risk associated with each of the examined crash contributory factors. Both simulated and real-world data were employed to demonstrate and validate the developed theory, using, for example, two explanatory traffic variables: speed and volume. Posterior kernel densities of speed and volume at the location and time of crashes were found to differ from the prior kernel densities of the same variables. The findings are logical: higher traffic volumes increase the risk of all crashes independently of collision type, severity and time of occurrence. Higher speeds were found to decrease the risk of multiple-vehicle crashes at peak times and not to affect multiple-vehicle crash occurrence significantly during off-peak times; the risk of single-vehicle crashes, however, always increases as speed increases.
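    The prior-versus-posterior comparison at the heart of the method can be sketched in a few lines. The data here are simulated for illustration only; by Bayes' theorem, the ratio of the pre-crash (posterior) density to the prevailing-conditions (prior) density of a traffic variable is proportional to the relative crash risk at that value.

```python
# Hedged sketch of the Bayes/KDE idea on simulated speeds (not the
# authors' code or data): risk(v) ∝ posterior_density(v) / prior_density(v).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
prior_speeds = rng.normal(80, 15, size=5000)   # all traffic observations
crash_speeds = rng.normal(100, 10, size=300)   # observations just before crashes

prior_kde = gaussian_kde(prior_speeds)
posterior_kde = gaussian_kde(crash_speeds)

def relative_risk(speed):
    """P(crash | speed) is proportional to posterior(speed) / prior(speed)."""
    return posterior_kde(speed)[0] / prior_kde(speed)[0]

# In this simulated setting, risk grows with speed.
print(relative_risk(110.0) > relative_risk(70.0))
```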

    Identification of secondary incidents inside congested areas using analytical methods

    No full text
    On freeways, when an incident such as a car crash occurs, it is common for traffic congestion to form upstream of the incident. These congested regions are characterised by low speeds and high densities, may extend over long stretches of road, and can persist for long periods, with significant impact on travel times and level of service. Under such conditions, the probability of new incidents — occurring as a consequence of the primary one and known as secondary incidents — is increased. The aim of this thesis is to develop and evaluate a dynamic methodology for the identification of secondary incidents on freeways, using both empirical and analytical methods based on macroscopic traffic variables (mean travel speed, volume and density). First, the congested road segments caused by primary incidents in the study area are identified from loop detector data, using dynamic empirical and analytical methods. Next, statistical analysis is conducted, first to compare the different methods and second to reveal correlations between characteristic measures of the congested regions. Finally, static and dynamic methods are compared with respect to their ability to identify secondary incidents. (141 pp.)
    Μαρία-Ιωάννα Μ. Ιμπριάλο
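    The static classification step described above can be sketched as a simple spatio-temporal test: an incident is secondary if it falls within the congestion region that built up behind a primary incident. The thresholds and field names below are hypothetical, chosen only to illustrate the rule.

```python
# Hypothetical sketch of a static secondary-incident test: the incident
# must occur upstream of the primary, within the queue's spatial extent,
# and while the congestion persists.
def is_secondary(incident, primary, congestion):
    """congestion = (upstream_extent_km, duration_min) of the primary's queue."""
    extent_km, duration_min = congestion
    dt = incident["time_min"] - primary["time_min"]
    dx = primary["location_km"] - incident["location_km"]  # > 0 means upstream
    return 0 < dt <= duration_min and 0 <= dx <= extent_km

primary = {"time_min": 0, "location_km": 12.0}
queue = (3.0, 60)  # 3 km upstream, lasting 60 minutes (made-up values)

print(is_secondary({"time_min": 25, "location_km": 10.5}, primary, queue))  # True
print(is_secondary({"time_min": 90, "location_km": 10.5}, primary, queue))  # False
```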

    Wasserstein Generative Adversarial Network to Address the Imbalanced Data Problem in Real-Time Crash Risk Prediction

    No full text
    Real-time crash risk prediction models aim to identify pre-crash conditions as part of active traffic safety management. However, traditional models, which were mainly developed through matched case-control sampling, have been criticised for their biased estimations. In this study, the state-of-the-art class balancing method known as the Wasserstein Generative Adversarial Network (WGAN) was introduced to address the class imbalance problem in model development. An extremely imbalanced dataset consisting of 257 crashes and over 10 million non-crash cases from the M1 Motorway in the United Kingdom for 2017 was then utilised to evaluate the proposed method. The real-time crash prediction model was developed by employing a Deep Neural Network (DNN) and Logistic Regression (LR). Crash predictions were performed under different crash to non-crash ratios, where synthetic crashes were generated by WGAN, the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN) sampling respectively. Comparisons were then made with algorithmic-level class balancing methods such as cost-sensitive learning and ensemble methods. Our findings suggest that WGAN clearly outperforms the other oversampling methods in handling the extremely imbalanced sample, and the DNN model subsequently produces a crash prediction sensitivity of about 70% with a 5% false alarm rate. Based on the findings of this study, proactive traffic management strategies including Variable Speed Limit (VSL) and Dynamic Message Signs (DMS) could be deployed to reduce the probability of crash occurrence.
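    The two metrics quoted in the abstract, sensitivity and false alarm rate, come straight from a confusion matrix. A minimal sketch on toy labels (not the paper's data) that happens to reproduce the quoted 70%/5% operating point:

```python
# Confusion-matrix metrics used above: sensitivity (share of crashes
# detected) and false alarm rate (share of non-crashes flagged).
def sensitivity_and_far(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    sensitivity = tp / (tp + fn)
    false_alarm_rate = fp / (fp + tn)
    return sensitivity, false_alarm_rate

# Toy example: 10 crashes, 100 non-crashes.
y_true = [1] * 10 + [0] * 100
y_pred = [1] * 7 + [0] * 3 + [1] * 5 + [0] * 95
print(sensitivity_and_far(y_true, y_pred))  # (0.7, 0.05)
```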

    Systems-genetics identifies a macrophage cholesterol network associated with physiological wound healing

    No full text
    Among other cells, macrophages regulate the inflammatory and reparative phases of wound healing, but the genetic determinants and detailed molecular pathways that modulate these processes have not been fully elucidated. Here, we took advantage of normal variation in wound healing in 1,378 genetically outbred mice, and carried out macrophage RNA-sequencing profiling of mice with extreme wound healing phenotypes (i.e., slow and fast healers, n = 146 in total). The resulting macrophage coexpression networks were genetically mapped and led to the identification of a unique module under strong trans-acting genetic control by the Runx2 locus. This macrophage-mediated healing network was specifically enriched for cholesterol and fatty acid biosynthetic processes. Pharmacological blockade of fatty acid synthesis with cerulenin resulted in delayed wound healing in vivo and increased macrophage infiltration in the wounded skin, suggesting the persistence of unresolved inflammation. We show how naturally occurring sequence variation controls transcriptional networks in macrophages, which in turn regulate specific metabolic pathways that could be targeted in wound healing.
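    The coexpression step described above — linking genes whose expression profiles correlate across animals, then examining the resulting modules — can be sketched as follows. The data are random and the threshold is arbitrary; this is only an illustration of building a correlation network, not the study's actual pipeline.

```python
# Rough sketch of a coexpression network: correlate genes across samples
# and draw an edge where |correlation| exceeds a threshold.
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_genes = 146, 6            # 146 matches the profiled animals above
expr = rng.normal(size=(n_samples, n_genes))
expr[:, 1] = expr[:, 0] + rng.normal(scale=0.1, size=n_samples)  # co-regulated pair

corr = np.corrcoef(expr.T)             # gene-by-gene correlation matrix
adj = (np.abs(corr) > 0.8) & ~np.eye(n_genes, dtype=bool)  # network edges

# Only the artificially co-regulated pair (genes 0 and 1) is linked.
print(bool(adj[0, 1]), bool(adj[0, 2]))
```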