3 research outputs found

    Characterization of Language Registers through the Extraction of Emerging Sequential Patterns

    International audience. Language registers are a highly perceptible characteristic of written or spoken communication. In this paper we present a methodology to automatically characterize language registers using a statistical tool named "emerging sequential patterns". Our approach proceeds in two steps: the first demonstrates the relevance of the chosen statistical tool on artificial texts; the second shows that this tool can extract the characteristic patterns of language registers from real data. Experimental results show the quality of our methodology.
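
The notion of an emerging pattern can be made concrete with a small sketch. It assumes the standard growth-rate definition (a pattern is emerging when its support in a target corpus is at least `min_growth` times its support in a background corpus) and, for simplicity, restricts patterns to contiguous token n-grams; the paper's sequential patterns may allow gaps. All function names, thresholds, and the toy corpora are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contiguous_patterns(seq: Sequence[str], max_len: int) -> set:
    """Distinct contiguous subsequences (n-grams) of seq, up to max_len tokens."""
    return {
        tuple(seq[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(seq) - n + 1)
    }


def support(corpus: List[Sequence[str]], max_len: int) -> Dict[Pattern, float]:
    """Relative support: fraction of sequences that contain each pattern."""
    counts = Counter()
    for seq in corpus:
        counts.update(contiguous_patterns(seq, max_len))  # each pattern counted once per sequence
    return {p: c / len(corpus) for p, c in counts.items()}


def emerging_patterns(target, background, max_len=3, min_growth=2.0, min_sup=0.1):
    """Patterns whose target support is >= min_growth times their background support.

    Simplifying assumption: a pattern absent from the background has infinite growth rate.
    """
    sup_t = support(target, max_len)
    sup_b = support(background, max_len)
    result = {}
    for p, s in sup_t.items():
        if s < min_sup:
            continue
        b = sup_b.get(p, 0.0)
        growth = float("inf") if b == 0 else s / b
        if growth >= min_growth:
            result[p] = growth
    return result


# Hypothetical toy corpora standing in for two language registers.
formal = [["would", "you", "kindly"], ["would", "you", "please"]]
informal = [["can", "you", "please"], ["hey", "you"]]
res = emerging_patterns(formal, informal, max_len=2, min_growth=2.0, min_sup=0.5)
```

Here `res` keeps patterns such as `("would", "you")` that occur only in the formal corpus, while `("you",)` and `("please",)` are discarded because their support is the same in both corpora.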

    C3Ro: An efficient mining algorithm of extended-closed contiguous robust sequential patterns in noisy data

    International audience. Sequential pattern mining has been the focus of many works, but it still faces a tough challenge when mining large databases, both in efficiency and in the apprehensibility of the resulting pattern set. To overcome these issues, the most promising direction taken in the literature relies on constraints, including the well-known closedness constraint. However, such mining is not resistant to noise, a characteristic of most real-world data. The main research question raised in this paper is thus: how can an apprehensible set of sequential patterns be mined efficiently from noisy data? To address this question, we introduce 1) two original constraints designed for mining noisy data: the robustness and the extended-closedness constraints, and 2) a generic pattern mining algorithm, C3Ro, designed to mine a wide range of sequential patterns, from closed or maximal contiguous sequential patterns to closed or maximal regular sequential patterns. C3Ro is dedicated to practitioners and can manage their multiple constraints. C3Ro is also the first sequential pattern mining algorithm to be this generic and parameterizable. Extensive experiments reveal the high efficiency of C3Ro compared with well-known algorithms from the literature, especially on large datasets. Additional experiments were conducted on a real-world noisy dataset of job offers, with the goal of mining activities. This experiment offers a more thorough insight into the C3Ro algorithm: job market experts confirm that the constraints we introduced have a significant positive impact on the apprehensibility of the set of mined activities.
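
For readers unfamiliar with the classic closedness and maximality constraints that C3Ro builds on, the following brute-force sketch shows them for contiguous sequential patterns: a frequent pattern is closed if no longer frequent pattern containing it has the same support, and maximal if no longer pattern containing it is frequent at all. This is not C3Ro, and it does not implement the paper's robustness or extended-closedness constraints; the function names and the toy database are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains_contiguous(big: Pattern, small: Pattern) -> bool:
    """True if small occurs as a contiguous run inside big."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))


def frequent_contiguous(db: List[Sequence[str]], min_sup: int) -> Dict[Pattern, int]:
    """Absolute support of every contiguous pattern found in >= min_sup sequences."""
    counts = Counter()
    for seq in db:
        seen = {tuple(seq[i:j]) for i in range(len(seq)) for j in range(i + 1, len(seq) + 1)}
        counts.update(seen)  # count each pattern at most once per sequence
    return {p: c for p, c in counts.items() if c >= min_sup}


def closed_contiguous(db, min_sup):
    """Frequent patterns with no longer containing pattern of equal support."""
    freq = frequent_contiguous(db, min_sup)
    return {
        p: s for p, s in freq.items()
        if not any(len(q) > len(p) and sq == s and contains_contiguous(q, p)
                   for q, sq in freq.items())
    }


def maximal_contiguous(db, min_sup):
    """Frequent patterns with no longer frequent pattern containing them."""
    freq = frequent_contiguous(db, min_sup)
    return {
        p: s for p, s in freq.items()
        if not any(len(q) > len(p) and contains_contiguous(q, p) for q in freq)
    }


db = [list("abc"), list("abc"), list("ab")]
closed = closed_contiguous(db, min_sup=2)
maximal = maximal_contiguous(db, min_sup=2)
```

On this toy database, `("a", "b")` (support 3) and `("a", "b", "c")` (support 2) are closed, whereas only `("a", "b", "c")` is maximal. A single noisy item inserted into one sequence would break the contiguous occurrence and lower these supports, which illustrates the sensitivity to noise that the abstract describes.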

    Sequence mining under multiple constraints

    International audience. In this paper, we address the problem of mining sequential patterns under multiple constraints. Unlike classical algorithms, our approach handles various types of constraints that are not only numeric but also symbolic and syntactic. These multiple constraints let us express a broad range of knowledge in order to focus on interesting patterns. We illustrate our approach with the detection of gene–rare disease relationships in biomedical texts for the documentation of rare diseases.
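
To illustrate how numeric, symbolic, and syntactic constraints can be combined, here is a naive sketch that enumerates (possibly gapped) subsequences and keeps only those satisfying every constraint, each expressed as a predicate over a pattern and its support. The three constraint kinds shown (minimum support, required item categories, length bounds) are assumptions chosen for illustration; the paper's actual constraint language and mining strategy may differ. All names and the placeholder tokens `GENE_A`, `DISEASE_X`, etc. are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]
Constraint = Callable[[Pattern, int], bool]


def subsequences(seq: Sequence[str], max_len: int) -> set:
    """Distinct order-preserving (not necessarily contiguous) subsequences, up to max_len items."""
    return {
        tuple(seq[i] for i in idx)
        for n in range(1, max_len + 1)
        for idx in combinations(range(len(seq)), n)
    }


def mine(db: List[Sequence[str]], max_len: int, constraints: List[Constraint]) -> Dict[Pattern, int]:
    """Brute-force mining: count support, then keep patterns satisfying all constraints."""
    counts = Counter()
    for seq in db:
        counts.update(subsequences(seq, max_len))
    return {p: s for p, s in counts.items() if all(c(p, s) for c in constraints)}


def min_support(k: int) -> Constraint:
    """Numeric constraint: the pattern occurs in at least k sequences."""
    return lambda p, s: s >= k


def contains_one_of(items: set) -> Constraint:
    """Symbolic constraint: the pattern contains at least one item from a category."""
    return lambda p, s: any(x in items for x in p)


def length_between(lo: int, hi: int) -> Constraint:
    """Syntactic constraint: the pattern length lies in [lo, hi]."""
    return lambda p, s: lo <= len(p) <= hi


db = [
    ["GENE_A", "mutation", "causes", "DISEASE_X"],
    ["GENE_A", "is", "linked", "to", "DISEASE_X"],
    ["GENE_B", "causes", "DISEASE_Y"],
]
genes, diseases = {"GENE_A", "GENE_B"}, {"DISEASE_X", "DISEASE_Y"}
found = mine(db, 3, [min_support(2), contains_one_of(genes),
                     contains_one_of(diseases), length_between(2, 3)])
```

Here only the gapped pattern `("GENE_A", "DISEASE_X")` satisfies all four constraints. Expressing constraints as composable predicates keeps the sketch short; a real miner would push such constraints into the search to prune candidates early instead of filtering afterwards.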