37 research outputs found

    A Comparison of Clustering Techniques for Meteorological Analysis

    This work proposes the application of several clustering techniques (k-means, SOM k-means, k-medoids, and agglomerative hierarchical clustering) to analyze the climatological conditions in different places. To do so, real-life data from data acquisition stations in Spain, provided by AEMET (the Spanish Meteorological Agency), are analyzed. Some of the main meteorological variables acquired daily by these stations are studied in order to analyze the variability of the environmental conditions in the selected places. Additionally, the work aims to characterize the stations according to their location, an approach that could be applied to any other station. A comprehensive analysis of the four clustering techniques is performed, giving interesting results for meteorological analysis.

    Rebooting the human mitochondrial phylogeny: an automated and scalable methodology with expert knowledge

    Background: Mitochondrial DNA is an ideal source of information to conduct evolutionary and phylogenetic studies due to its extraordinary properties and abundance. Many insights can be gained from these, including but not limited to screening genetic variation to identify potentially deleterious mutations. However, such advances require efficient solutions to very difficult computational problems, a need that is hampered by the very abundance of data that confers strength to the analysis. Results: We develop a systematic, automated methodology to overcome these difficulties, building from readily available, public sequence databases to high-quality alignments and phylogenetic trees. Within each stage in an autonomous workflow, outputs are carefully evaluated and outlier detection rules defined to integrate expert knowledge and automated curation, hence avoiding the manual bottleneck found in past approaches to the problem. Using these techniques, we have performed exhaustive updates to the human mitochondrial phylogeny, illustrating the power and computational scalability of our approach, and we have conducted some initial analyses on the resulting phylogenies. Conclusions: The problem at hand demands careful definition of inputs and adequate algorithmic treatment for its solutions to be realistic and useful. It is possible to define formal rules to address the former requirement by refining inputs directly and through their combination as outputs, and the latter are also of help to ascertain the performance of chosen algorithms. Rules can exploit known or inferred properties of datasets to simplify inputs through partitioning, therefore cutting computational costs and affording work on rapidly growing, otherwise intractable datasets. Although expert guidance may be necessary to assist the learning process, low-risk results can be fully automated and have proved themselves convenient and valuable.

    Holding it together: rapid evolution and positive selection in the synaptonemal complex of Drosophila

    Background: The synaptonemal complex (SC) is a highly conserved meiotic structure that functions to pair homologs and facilitate meiotic recombination in most eukaryotes. Five Drosophila SC proteins have been identified and localized within the complex: C(3)G, C(2)M, CONA, ORD, and the newly identified Corolla. The SC is required for meiotic recombination in Drosophila and absence of these proteins leads to reduced crossing over and chromosomal nondisjunction. Despite the conserved nature of the SC and the key role that these five proteins have in meiosis in D. melanogaster, they display little apparent sequence conservation outside the genus. To identify factors that explain this lack of apparent conservation, we performed a molecular evolutionary analysis of these genes across the Drosophila genus. Results: For the five SC components, gene sequence similarity declines rapidly with increasing phylogenetic distance and only ORD and C(2)M are identifiable outside of the Drosophila genus. SC gene sequences have a higher dN/dS (ω) rate ratio than the genome wide average and this can in part be explained by the action of positive selection in almost every SC component. Across the genus, there is significant variation in ω for each protein. It further appears that ω estimates for the five SC components are in accordance with their physical position within the SC. Components interacting with chromatin evolve slowest and components comprising the central elements evolve the most rapidly. Finally, using population genetic approaches, we demonstrate that positive selection on SC components is ongoing. Conclusions: SC components within Drosophila show little apparent sequence homology to those identified in other model organisms due to their rapid evolution. We propose that the Drosophila SC is evolving rapidly due to two combined effects. First, we propose that a high rate of evolution can be partly explained by low purifying selection on protein components whose function is to simply hold chromosomes together. We also propose that positive selection in the SC is driven by its sex-specificity combined with its role in facilitating both recombination and centromere clustering in the face of recurrent bouts of drive in female meiosis.

    An Empirical Evaluation of Consensus Rules for Molecular Sequences

    No full text

    The complexity of the median procedure for binary trees

    No full text

    Non-shared Edges

    No full text

    Detecting Change of Patterns in Landslide Displacements Using Machine Learning, an Example Application

    Machine learning and signal processing can support the definition of landslide alert/alarm systems based on monitoring data. The possibility to rely on a straightforward and automatic procedure to identify hazardous situations could be very useful for risk management and decision makers. In this work, we propose a hierarchical clustering algorithm to identify changes of pattern in the displacements of monitored landslides. Our test site is a large, active Deep-seated Gravitational Slope Deformation (DGSD) in which secondary movements provide sediment for debris flows that threaten downstream settlements. An Automated Total Station (ATS) was installed in 2012 to measure the three-dimensional displacements of several benchmarks distributed across the source area and to trigger alarms if superficial movements potentially leading to collapses are detected. Results show that the procedure makes it possible to group benchmarks with similar displacement patterns. The unsupervised definition of homogeneous areas from a kinematic viewpoint supports an unbiased geomorphological characterization of the large landslide. Moreover, the method can trigger alert warnings if some monitored points change displacement pattern. The identification of possibly hazardous situations is performed without imposing fixed and arbitrary thresholds and without calibration. The recognition of areas with new types of activity supports the definition of the sediment volumes available for transport for the next debris flow event and assists the definition of reliable risk scenarios.