
    DTW-Global Constraint Learning Using Tabu Search Algorithm

    Many methods have been proposed to measure the similarity between time series data sets, each with advantages and weaknesses. The most appropriate similarity measure must be chosen according to the intended application domain and the data considered. The performance of machine learning algorithms depends on the metric used to compare two objects, and for time series, Dynamic Time Warping (DTW) is the most widely used distance measure; many variants of DTW intended to accelerate its computation have been proposed. Distance learning is an already well-studied subject: data mining tools such as k-Means clustering and k-Nearest Neighbor classification require a similarity/distance measure, and this measure must be adapted to the application domain. It is therefore important to develop effective computational methods and algorithms that can be applied to large data sets while integrating the constraints of the specific field of study. In this paper a new hybrid approach to learning a global constraint of the DTW distance is proposed, based on Large Margin Nearest Neighbor classification and the Tabu Search algorithm. Experiments show the effectiveness of this approach in improving time series classification results.
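    The global constraint the abstract refers to restricts how far the DTW warping path may stray from the diagonal. A minimal sketch of DTW with a fixed Sakoe-Chiba band follows; the `window` half-width here is a hand-set assumption, whereas the paper's contribution is to *learn* this constraint via Large Margin Nearest Neighbor classification and Tabu Search, which is not shown.

    ```python
    import math

    def dtw_band(a, b, window):
        """DTW distance between numeric sequences a and b, restricted to a
        Sakoe-Chiba band of half-width `window` cells around the diagonal.
        (Illustrative sketch; the paper learns the constraint instead.)"""
        n, m = len(a), len(b)
        # cost[i][j] = DTW distance between a[:i] and b[:j]; cells outside
        # the band stay at infinity and are never used.
        cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(1, n + 1):
            lo = max(1, i - window)
            hi = min(m, i + window)
            for j in range(lo, hi + 1):
                d = abs(a[i - 1] - b[j - 1])
                cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                     cost[i][j - 1],      # deletion
                                     cost[i - 1][j - 1])  # match
        return cost[n][m]
    ```

    A narrow band both speeds up the computation (only O(n·window) cells are filled) and acts as a regularizer by forbidding pathological warpings, which is why the choice of constraint affects classification accuracy.
    
    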

    Automatic Taxonomy Generation - A Use-Case in the Legal Domain

    A key challenge in the legal domain is the adaptation and representation of legal knowledge expressed through texts, so that legal practitioners and researchers can access this information more easily and quickly to help with compliance-related issues. One way to approach this goal is in the form of a taxonomy of legal concepts. While this task usually requires manual construction of terms and their relations by domain experts, this paper describes a methodology to automatically generate a taxonomy of legal noun concepts. We apply and compare two approaches on a corpus consisting of statutory instruments for UK, Wales, Scotland and Northern Ireland laws.
    Comment: 9 pages

    Using compression for profiling rheumatoid arthritis disease progression through data mining techniques

    Master's thesis, Data Science, 2022, Universidade de Lisboa, Faculdade de Ciências.
    Ankylosing spondylitis (AS) is a chronic autoimmune inflammatory condition belonging to the spondyloarthropathy category of rheumatic diseases, which are highly debilitating and have a high impact on patients' physical and mental health as well as on their social life and quality of life. Biological treatment for this pathology is difficult to select and lacks clear selection criteria; usually, treatment is chosen based on patient convenience. Our goal is to use an approach based on algorithmic information theory, with no domain-specific parameters to set and no background knowledge required (clustering by compression); to iterate over the current state of the art so that it can be better integrated into Python pipelines and better suit our specific problem; and to apply it to our data of patients with AS so that patterns between biological treatments and patient profiles can be established, thereby helping clinicians make a better treatment choice for each patient. Unsupervised clustering models are generated using normalized compression distance matrices, which are then evaluated using V-measure and adjusted Rand score, and visually analyzed taking into account the model contingency matrix and the feature distribution per cluster. Possible patterns between biological treatment success and patient profiles were identified. Furthermore, we observed that the column-wise compression developed and implemented in this new tool for clustering by compression seemed to yield better results than the previous approach.
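    The normalized compression distance (NCD) matrices mentioned above can be built with any off-the-shelf compressor. A minimal sketch using `zlib` as the compressor is shown below; the thesis's specific tool and its column-wise compression variant are not reproduced here, only the standard NCD construction that feeds the clustering step.

    ```python
    import zlib

    def C(data: bytes) -> int:
        """Approximate Kolmogorov complexity by zlib-compressed size."""
        return len(zlib.compress(data, 9))

    def ncd(x: bytes, y: bytes) -> float:
        """Normalized compression distance:
        NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
        cx, cy = C(x), C(y)
        return (C(x + y) - min(cx, cy)) / max(cx, cy)

    def ncd_matrix(items):
        """Pairwise NCD matrix over byte-string records; this matrix is
        what an unsupervised clustering algorithm would consume."""
        return [[ncd(a, b) for b in items] for a in items]
    ```

    Because the compressor does all the modeling, no features or domain parameters need to be specified, which is the "parameter-free" property the abstract relies on.
    
    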

    Normalized Information Distance

    The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and is thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page-count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.
    Comment: 33 pages, 12 figures; in: Information Theory and Statistical Learning, Eds. M. Dehmer, F. Emmert-Streib, Springer-Verlag, New York. To appear
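    The second realization mentioned, using web page counts, is the normalized Google distance of Cilibrasi and Vitányi. A minimal sketch of the formula is below; the counts passed in are hypothetical placeholders, not real search-engine statistics, since the distance only becomes meaningful when fed actual page counts.

    ```python
    import math

    def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
        """Normalized Google distance from page counts:
        fx, fy  -- pages containing term x (resp. y),
        fxy     -- pages containing both terms,
        n       -- total number of indexed pages.
        NGD = (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy))
        All inputs here are hypothetical, for illustration only."""
        lx, ly, lxy, ln = (math.log(v) for v in (fx, fy, fxy, n))
        return (max(lx, ly) - lxy) / (ln - min(lx, ly))
    ```

    Terms that always co-occur get distance 0, while terms that rarely appear together score close to (or above) 1, mirroring how the compression-based realization scores similar versus unrelated strings.
    
    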