DTW-Global Constraint Learning Using Tabu Search Algorithm
Abstract: Many methods have been proposed to measure the similarity between time series, each with advantages and weaknesses. The most appropriate similarity measure must be chosen according to the intended application domain and the data considered. The performance of machine learning algorithms depends on the metric used to compare two objects; for time series, Dynamic Time Warping (DTW) is the most widely used distance measure, and many variants of DTW intended to accelerate its computation have been proposed. Distance learning is a well-studied subject: data mining tools such as k-Means clustering and k-Nearest Neighbor classification require a similarity/distance measure, and this measure must be adapted to the application domain. It is therefore important to develop effective computational methods and algorithms that can be applied to large data sets while integrating the constraints of the specific field of study. In this paper, a new hybrid approach to learning a global constraint of the DTW distance is proposed. The approach is based on Large Margin Nearest Neighbor classification and the Tabu Search algorithm. Experiments show the effectiveness of this approach in improving time series classification results.
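As background for the abstract above, a minimal sketch of DTW with a Sakoe-Chiba global window, the kind of global constraint the paper proposes to learn, might look as follows. The function name, the squared-difference local cost, and the API are illustrative assumptions, not taken from the paper.

```python
def dtw(a, b, window=None):
    """DTW distance between two numeric sequences.

    `window` is an illustrative Sakoe-Chiba band radius: cell (i, j) is
    reachable only if |i - j| <= window. None means unconstrained DTW.
    """
    n, m = len(a), len(b)
    # Widen the band so the corner (n, m) stays reachable.
    w = max(window, abs(n - m)) if window is not None else max(n, m)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2  # local squared difference
            # Standard DTW recurrence: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] ** 0.5
```

Narrowing `window` both speeds up the computation (fewer cells filled) and restricts pathological warpings, which is why choosing a good band is worth learning per data set.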
Automatic Taxonomy Generation - A Use-Case in the Legal Domain
A key challenge in the legal domain is the adaptation and representation of the legal knowledge expressed in texts, so that legal practitioners and researchers can access this information more easily and quickly to help with compliance-related issues. One way to approach this goal is in the form of a taxonomy of legal concepts. While this task usually requires manual construction of terms and their relations by domain experts, this paper describes a methodology to automatically generate a taxonomy of legal noun concepts. We apply and compare two approaches on a corpus consisting of statutory instruments for UK, Wales, Scotland and Northern Ireland laws.
Using compression for profiling rheumatoid arthritis disease progression through data mining techniques
Master's thesis, Data Science, 2022, Universidade de Lisboa, Faculdade de Ciências.

Ankylosing spondylitis (AS) is a chronic autoimmune inflammatory condition belonging to the spondyloarthropathy category of rheumatic diseases, which are highly debilitating and have a high impact on patients' physical and mental health as well as on their social life and quality of life. Biological treatment for this pathology is difficult to choose and lacks clear selection criteria; usually, treatment is chosen based on patient convenience.

Our goal is to use an approach based on algorithmic information theory, with no domain-specific parameters to set and no background knowledge required (clustering by compression); to iterate on the current state of the art so that it can be better integrated into Python pipelines and better suit our specific problem; and to apply it to our data, comprising patients with AS, so that patterns between biological treatments and patient profiles can be established, thereby helping clinicians make a better treatment choice for each patient.

Unsupervised clustering models are generated using normalized compression distance matrices, which are then evaluated using the V-measure and the adjusted Rand score, and visually analyzed taking into account the model contingency matrix and the feature distribution per cluster. Possible patterns between biological treatment success and patient profiles were identified. Furthermore, we observed that the column-wise compression developed and implemented in this new tool for clustering by compression seemed to yield better results than the previous approach.
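The normalized compression distance (NCD) matrices this abstract relies on can be sketched with a general-purpose compressor; using `zlib` here is an assumption for illustration, not necessarily the compressor chosen in the thesis.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Approximates the (uncomputable) normalized information distance by
    replacing Kolmogorov complexity with compressed length C(.).
    """
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))  # C(xy): concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A pairwise NCD matrix over patient records can then feed any standard distance-based clustering algorithm, which is what makes the method parameter-free and domain-agnostic.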
Normalized Information Distance
The normalized information distance is a universal distance measure for
objects of all kinds. It is based on Kolmogorov complexity and thus
uncomputable, but there are ways to utilize it. First, compression algorithms
can be used to approximate the Kolmogorov complexity if the objects have a
string representation. Second, for names and abstract concepts, page count
statistics from the World Wide Web can be used. These practical realizations of
the normalized information distance can then be applied to machine learning
tasks, especially clustering, to perform feature-free and parameter-free data
mining. This chapter discusses the theoretical foundations of the normalized
information distance and both practical realizations. It presents numerous
examples of successful real-world applications based on these distance
measures, ranging from bioinformatics to music clustering to machine
translation.

Comment: 33 pages, 12 figures. Chapter "Normalized Information Distance" in: Information Theory and Statistical Learning, Eds. M. Dehmer, F. Emmert-Streib, Springer-Verlag, New York. To appear.
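For reference, the two practical realizations the abstract mentions correspond to standard formulas from the literature (notation assumed: $K$ is Kolmogorov complexity, $C$ a real compressor's output length, $f$ a Web page count, $N$ the total number of indexed pages):

```latex
% Normalized information distance (uncomputable ideal):
\mathrm{NID}(x,y) = \frac{\max\{K(x\mid y),\, K(y\mid x)\}}{\max\{K(x),\, K(y)\}}

% First realization: normalized compression distance.
\mathrm{NCD}(x,y) = \frac{C(xy) - \min\{C(x),\, C(y)\}}{\max\{C(x),\, C(y)\}}

% Second realization: normalized Google distance from page counts.
\mathrm{NGD}(x,y) = \frac{\max\{\log f(x),\, \log f(y)\} - \log f(x,y)}
                         {\log N - \min\{\log f(x),\, \log f(y)\}}
```

NCD covers objects with a literal string representation; NGD covers names and abstract concepts whose meaning is only accessible through usage statistics.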