132,836 research outputs found

    Knowledge-based Selection of Association Rules for Text Mining

    Get PDF
    Colloque avec actes et comité de lecture. nationale.National audienceA reoccuring problem in mining association rules is the selection of interesting association rules within the overall, and possibly huge set of extracted rules. The majority of previous work in this area relies on statistical methods for quality estimation and se-lection of association rules. However, strictly bottom-up approaches are oblivious of knowledge though knowledge may be available (e.g., provided by ontologies), and rule extraction may take advantage of it. In this paper, we conceive of the problem of selecting association rules as a classification task. A framework of a binary probabilistic classifier is introduced that uses ontologies in order to estimate whether and to which degree a rule expresses a mere taxonomic relationship. In so doing, selection of association rules (selection by elimination) is carried out by identifying and discarding trivial association rules

    Knowledge Discovery in Online Repositories: A Text Mining Approach

    Get PDF
    Before the advent of the Internet, the newspapers were the prominent instrument of mobilization for independence and political struggles. Since independence in Nigeria, the political class has adopted newspapers as a medium of Political Competition and Communication. Consequently, most political information exists in unstructured form and hence the need to tap into it using text mining algorithm. This paper implements a text mining algorithm on some unstructured data format in some newspapers. The algorithm involves the following natural language processing techniques: tokenization, text filtering and refinement. As a follow-up to the natural language techniques, association rule mining technique of data mining is used to extract knowledge using the Modified Generating Association Rules based on Weighting scheme (GARW). The main contributions of the technique are that it integrates information retrieval scheme (Term Frequency Inverse Document Frequency) (for keyword/feature selection that automatically selects the most discriminative keywords for use in association rules generation) with Data Mining technique for association rules discovery. The program is applied to Pre-Election information gotten from the website of the Nigerian Guardian newspaper. The extracted association rules contained important features and described the informative news included in the documents collection when related to the concluded 2007 presidential election. The system presented useful information that could help sanitize the polity as well as protect the nascent democracy

    A Conformity Measure using Background Knowledge for Association Rules: Application to Text Mining

    Get PDF
    A text mining process using association rules generates a very large number of rules. According to experts of the domain, most of these rules basically convey a common knowledge, i.e. rules which associate terms that experts may likely relate to each other. In order to focus on the result interpretation and discover new knowledge units, it is necessary to define criteria for classifying the extracted rules. Most of the rule classification methods are based on numerical quality measures. In this chapter, we introduce two classification methods: The first one is based on a classical numerical approach, i.e. using quality measures, and the other one is based on domain knowledge. We propose the second original approach in order to classify association rules according to qualitative criteria using domain model as background knowledge. Hence, we extend the classical numerical approach in an effort to combine data mining and semantic techniques for post mining and selection of association rules. We mined a corpus of texts in molecular biology and present the results of both approaches, compare them, and give a discussion on the benefits of taking into account a knowledge domain model of the data

    Some Aspects on Data Modelling

    Get PDF
    Statistical methods are motivated by the desire of learning from data. Transaction dataset and time-ordered data sequence are commonly found in many research areas, such as finance, bioinformatics and text mining. In this dissertation, two problems regarding these two types of data: association rule mining from transaction data and structural change estimation in time-ordered sequence, are studied. Informative association rule mining is fundamental for knowledge discovery from transaction data, for which brute-force search algorithms, e.g., the well-known Apriori algorithm, were developed. However, operating these algorithms becomes computationally intractable in searching large rule space. A stochastic search framework is developed to tackle this challenge by imposing a probability distribution on the association rule space and using the idea of annealing Gibbs sampling. Large rule space of exponential order can still be randomly searched by this algorithm to generate a Markov chain of viable length. This chain contains the most informative rules with probability one. The stochastic search algorithm is flexible to incorporate any measure of interest. Moreover, it reduces computational complexities and large memory requirements. A time-ordered data sequence may contain some sudden changes at some time points, before and after which the data sequences follow different distributions or statistical models. Change point problems in generalized linear models and distributions of independent random variables are studied respectively. Firstly, to estimate multiple change points in generalized linear models, we convert it into a model selection problem. Then modern model selection techniques are applied to estimate the regression coefficients. A consistent estimator of the number of change points is developed, and an algorithm is provided to estimate the change points. Secondly, to estimate single change point in distributions of independent random variables, a change point estimator is proposed based on empirical characteristic functions. Its consistency is also established

    The Ideal Candidate. Analysis of Professional Competences through Text Mining of Job Offers

    Get PDF
    The aim of this paper is to propose analytical tools for identifying peculiar aspects of job market for graduates. We propose a strategy for dealing with daa tat have different source and nature
    • …
    corecore