2 research outputs found

    Learning rules from incomplete examples via implicit mention models

    No full text
    We consider the problem of learning rules from natural language text sources. These sources, such as news articles, journal articles, and web texts, are created by a writer to communicate information to a reader, where the writer and reader share substantial domain knowledge. Consequently, the texts tend to be concise and mention the minimum information necessary for the reader to draw the correct conclusions. We study the problem of learning domain knowledge from such concise texts, which is an instance of the general problem of learning in the presence of missing data. However, unlike standard approaches to missing data, in this setting we know that facts are more likely to be missing from the text in cases where the reader can infer them from the facts that are mentioned combined with the domain knowledge. Hence, we can explicitly model this “missingness” process and invert it via probabilistic inference to learn the underlying domain knowledge. This paper introduces an explicit probabilistic mention model that models the probability of facts being mentioned in the text based on what other facts have already been mentioned and domain knowledge in the form of Horn clause rules. Learning must simultaneously search the space of rules and learn the parameters of the mention model. We accomplish this via an application of Expectation Maximization within a Markov Logic framework. An experimental evaluation on synthetic and natural text data shows that the method can successfully learn accurate rules and apply them to new texts to make correct inferences.
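    The following is a minimal, self-contained sketch of the mention-model idea, not the paper's actual method: the paper performs EM within a Markov Logic framework and learns the mention-model parameters and rules jointly. Here the mention probability p is assumed known to keep the toy identifiable, and EM estimates only the reliability of a single hypothetical Horn rule, treating the truth of unmentioned-but-inferable head facts as latent.

```python
# Toy EM for one Horn rule under an assumed mention model (hypothetical data).
# p = P(head fact mentioned | head fact true and inferable) is assumed known;
# q = P(head true | body true) is the rule reliability to be estimated.

def em_rule_reliability(head_mentioned, p=0.4, n_iters=100):
    """head_mentioned: one bool per text in which the rule's body was fully
    mentioned; True if the head fact was also mentioned."""
    n = len(head_mentioned)
    m = sum(head_mentioned)          # mentioned heads are certainly true
    u = n - m                        # unmentioned heads: truth is latent
    q = 0.5                          # initial reliability guess
    for _ in range(n_iters):
        # E-step: posterior that an unmentioned head is true anyway,
        # i.e. it was left unsaid because the reader could infer it.
        t = q * (1 - p) / (q * (1 - p) + (1 - q))
        # M-step: expected fraction of body-satisfied texts with a true head.
        q = (m + u * t) / n
    return q

# Hypothetical data: the rule's body was fully mentioned in 10 texts, and
# the head was explicitly mentioned in only 3 of them.
q = em_rule_reliability([True] * 3 + [False] * 7)
print(f"estimated rule reliability q = {q:.2f}")   # ~0.75, not 0.30
```

    Naive counting would score this rule at 3/10 = 0.30; inverting the mention model instead attributes most of the silent cases to omission of inferable facts and recovers a reliability near 0.75, which illustrates the abstract's point about explicitly modeling the missingness process.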

    Learning Rules from Incomplete Examples via Implicit Mention Models (Asian Conference on Machine Learning)

    No full text
    We study the problem of learning general rules from concrete facts extracted from natural data sources such as newspaper stories and medical histories. Natural data sources present two challenges to automated learning: radical incompleteness and systematic bias. In this paper, we propose an approach that combines simultaneous learning of multiple predictive rules with differential scoring of evidence, which adapts to a presumed model of data generation. Learning multiple predicates simultaneously mitigates the problem of radical incompleteness, while the differential scoring helps reduce the effects of systematic bias. We evaluate our approach empirically on both textual and non-textual sources. We further present a theoretical analysis that elucidates our approach and explains the empirical results.
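    As a toy illustration of differential scoring (assumed values and rule names, not from the paper): evidence is weighted by how likely a true fact is to be mentioned at all, so an unmentioned conclusion penalizes a rule only to the extent that the fact would have appeared in the text if it were true.

```python
# Differential scoring sketch: per-predicate mention probabilities change
# how raw mention counts translate into rule reliability (hypothetical data).

def inferred_reliability(mentioned, firings, p):
    """Max-likelihood rule reliability q under P(head mentioned) = q * p,
    where p = P(fact mentioned | fact true). Capped at 1."""
    return min(1.0, (mentioned / firings) / p)

# Rule A concludes a fact writers usually omit (p = 0.3); rule B concludes
# a fact writers always state (p = 1.0). Naive accuracy would rank B first.
print(inferred_reliability(3, 10, p=0.3))   # rule A: q = 1.0
print(inferred_reliability(5, 10, p=1.0))   # rule B: q = 0.5
```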