273 research outputs found

    Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity

    In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system's architecture and compare its performance with that of a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands).

    Knowledge and regularity in planning

    The field of planning has focused on several methods of using domain-specific knowledge. The three most common methods (search control, macro-operators, and analogy) are part of a continuum of techniques that differ in the amount of plan information they reuse. This paper describes TALUS, a planner that exploits this continuum and is used to compare the relative utility of these methods. We present results showing how search control, macro-operators, and analogy are affected by domain regularity and the amount of stored knowledge.

    Automatic Dream Sentiment Analysis

    In this position paper, we propose a first step toward automatic analysis of sentiments in dreams. We sampled 100 dreams from a dream bank created for a normative study of dreams. Two human judges assigned a score to describe dream sentiments. We ran four baseline algorithms in an attempt to automate the rating of sentiments in dreams. In particular, we compared the General Inquirer (GI) tool, the Linguistic Inquiry and Word Count (LIWC), a weighted version of the GI lexicon and of the HM lexicon, and a standard bag-of-words. We show that machine learning can automate the human judgment with accuracy superior to majority class choice.

    Bayesian Network Induction with Incomplete Private Data

    A Bayesian network is a graphical model for representing probabilistic relationships among a set of variables. It is an important model for business analysis. Bayesian network learning methods have been applied to business analysis in settings where data privacy is not considered. Learning a Bayesian network over private data, however, presents a much greater challenge. In this paper, we develop an approach to Bayesian network induction on private data that may contain missing values. The basic idea of our approach is to combine a randomization technique with the Expectation Maximization (EM) algorithm. Randomization is used to disguise the raw data, and the EM algorithm is applied to handle missing values in the private data set. We also present a method for carrying out Bayesian network construction, a data mining computation, on the disguised data.

    Privacy-Preserving Decision Tree Classification over Horizontally Partitioned Data

    Protection of privacy is one of the important problems in data mining. Parties' unwillingness to share their data frequently results in the failure of collaborative data mining. This paper studies how to build a decision tree classifier under the following scenario: a database is horizontally partitioned into multiple pieces, with each piece owned by a particular party. All the parties want to build a decision tree classifier based on such a database, but due to privacy constraints, none of them wants to disclose its private piece. We build a privacy-preserving system, including a set of secure protocols, that allows the parties to construct such a classifier. We guarantee that the private data are securely protected.