5,465 research outputs found

    Using Answer Set Programming for pattern mining

    Get PDF
    Serial pattern mining consists in extracting the frequent sequential patterns from a unique sequence of itemsets. This paper explores the ability of a declarative language, such as Answer Set Programming (ASP), to solve this issue efficiently. We propose several ASP implementations of the frequent sequential pattern mining task: a non-incremental and an incremental resolution. The results show that the incremental resolution is more efficient than the non-incremental one, but both ASP programs are less efficient than dedicated algorithms. Nonetheless, this approach can be seen as a first step toward a generic framework for sequential pattern mining with constraints.Comment: Intelligence Artificielle Fondamentale (2014

    Data mining and fusion

    No full text

    Improving Hypernymy Extraction with Distributional Semantic Classes

    Full text link
    In this paper, we show how distributionally-induced semantic classes can be helpful for extracting hypernyms. We present methods for inducing sense-aware semantic classes using distributional semantics and using these induced semantic classes for filtering noisy hypernymy relations. Denoising of hypernyms is performed by labeling each semantic class with its hypernyms. On the one hand, this allows us to filter out wrong extractions using the global structure of distributionally similar senses. On the other hand, we infer missing hypernyms via label propagation to cluster terms. We conduct a large-scale crowdsourcing study showing that processing of automatically extracted hypernyms using our approach improves the quality of the hypernymy extraction in terms of both precision and recall. Furthermore, we show the utility of our method in the domain taxonomy induction task, achieving the state-of-the-art results on a SemEval'16 task on taxonomy induction.Comment: In Proceedings of the 11th Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japa

    The NASA Astrophysics Data System: Architecture

    Full text link
    The powerful discovery capabilities available in the ADS bibliographic services are possible thanks to the design of a flexible search and retrieval system based on a relational database model. Bibliographic records are stored as a corpus of structured documents containing fielded data and metadata, while discipline-specific knowledge is segregated in a set of files independent of the bibliographic data itself. The creation and management of links to both internal and external resources associated with each bibliography in the database is made possible by representing them as a set of document properties and their attributes. To improve global access to the ADS data holdings, a number of mirror sites have been created by cloning the database contents and software on a variety of hardware and software platforms. The procedures used to create and manage the database and its mirrors have been written as a set of scripts that can be run in either an interactive or unsupervised fashion. The ADS can be accessed at http://adswww.harvard.eduComment: 25 pages, 8 figures, 3 table

    Efficiently Mining Temporal Patterns in Time Series Using Information Theory

    Get PDF

    Understanding and Evaluating Policies for Sequential Decision-Making

    Get PDF
    Sequential-decision making is a critical component of many complex systems, such as finance, healthcare, and robotics. The long-term goal of a sequential decision-making process is to optimize the policy under which decisions are made. In safety-critical domains, the search for an optimal policy must be based on observational data, as new decision-making strategies need to be carefully evaluated before they can be tested in practice. In this thesis, we highlight the importance of understanding sequential decision-making at different stages of this procedure. For example, to assess which policies can be evaluated with the available data, we need to understand the policy that actually generated the data. And once we are given a policy to evaluate, we need to understand how it differs from current practice.First, we focus on the evaluation process, where a target policy is evaluated using off-policy data collected under a different so-called behavior policy. This problem, commonly referred to as off-policy evaluation, is often solved with importance sampling (IS) techniques. Despite their popularity, IS-based methods suffer from high variance and are hard to diagnose. To address these issues, we propose estimating the behavior policy using prototype learning. Using the learned prototypes, we describe differences between target and behavior policies, allowing for better assessment of the IS estimates.Next, we take a clinical direction and study the sequential treatment of patients with rheumatoid arthritis (RA). The armamentarium of disease-modifying anti-rheumatic drugs (DMARDs) for RA patients has greatly expanded over the past decades. However, it is still unclear which treatment work best for individual patients. To examine how observational data can be used to evaluate new policies, we describe the most common patterns of DMARDs in a large patient registry from the US. We find that the number of unique patterns is large, indicating a significant variation in clinical practice which can be exploited for evaluation purposes. However, additional assumptions may be required to arrive at statistically sound results

    Automatic Algorithm Selection for Complex Simulation Problems

    Get PDF
    To select the most suitable simulation algorithm for a given task is often difficult. This is due to intricate interactions between model features, implementation details, and runtime environment, which may strongly affect the overall performance. The thesis consists of three parts. The first part surveys existing approaches to solve the algorithm selection problem and discusses techniques to analyze simulation algorithm performance.The second part introduces a software framework for automatic simulation algorithm selection, which is evaluated in the third part.Die Auswahl des passendsten Simulationsalgorithmus für eine bestimmte Aufgabe ist oftmals schwierig. Dies liegt an der komplexen Interaktion zwischen Modelleigenschaften, Implementierungsdetails und Laufzeitumgebung. Die Arbeit ist in drei Teile gegliedert. Der erste Teil befasst sich eingehend mit Vorarbeiten zur automatischen Algorithmenauswahl, sowie mit der Leistungsanalyse von Simulationsalgorithmen. Der zweite Teil der Arbeit stellt ein Rahmenwerk zur automatischen Auswahl von Simulationsalgorithmen vor, welches dann im dritten Teil evaluiert wird
    • …
    corecore