
273 research outputs found

Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity

Authors: Matwin Stan; Nadeau David; Turney Peter D.
Publication date: 01/01/2006

In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system’s architecture and compare its performance with a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands).

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive

Knowledge and regularity in planning

Authors: Allen John A.; Langley Pat; Matwin Stan

The field of planning has focused on several methods of using domain-specific knowledge. The three most common methods (search control, macro-operators, and analogy) are part of a continuum of techniques that differ in the amount of plan information they reuse. This paper describes TALUS, a planner that exploits this continuum and is used to compare the relative utility of these methods. We present results showing how search control, macro-operators, and analogy are affected by domain regularity and the amount of stored knowledge.

NASA Technical Reports Server

Automatic Dream Sentiment Analysis

Authors: De Koninck Joseph; Matwin Stan; Nadeau David; Sabourin Catherine; Turney Peter D.
Publication date: 01/01/2006

In this position paper, we propose a first step toward automatic analysis of sentiments in dreams. One hundred dreams were sampled from a dream bank created for a normative study of dreams. Two human judges assigned a score to describe dream sentiments. We ran four baseline algorithms in an attempt to automate the rating of sentiments in dreams. In particular, we compared the General Inquirer (GI) tool, the Linguistic Inquiry and Word Count (LIWC), a weighted version of the GI lexicon and of the HM lexicon, and a standard bag-of-words. We show that machine learning allows automating the human judgment with accuracy superior to majority class choice.

CogPrints Cognitive Sciences Eprint Archive

Bayesian Network Induction with Incomplete Private Data

Authors: Chang LiWu; Matwin Stan; Zhan Justin
Publication venue: AIS Electronic Library (AISeL)
Publication date: 05/12/2004

A Bayesian network is a graphical model for representing probabilistic relationships among a set of variables. It is an important model for business analysis. Bayesian network learning methods have been applied to business analysis where data privacy is not considered. However, learning a Bayesian network over private data presents a much greater challenge. In this paper, we develop an approach to tackle the problem of Bayesian network induction on private data which may contain missing values. The basic idea of our proposed approach is to combine a randomization technique with the Expectation Maximization (EM) algorithm. The purpose of using randomization is to disguise the raw data. The EM algorithm is applied to handle missing values in the private data set. We also present a method to conduct Bayesian network construction, which is a data mining computation, from the disguised data.
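The abstract does not say which randomization scheme is used. As a rough illustration only, the sketch below assumes a Warner-style randomized response on a single binary attribute: each value is kept with probability `theta` and flipped otherwise, and the true proportion is then estimated from the disguised data. The function names and the example data are hypothetical, and the EM step for missing values and the network construction itself are not covered.

```python
import random


def disguise(bits, theta, rng):
    """Keep each binary value with probability theta, flip it otherwise."""
    return [b if rng.random() < theta else 1 - b for b in bits]


def estimate_true_rate(disguised, theta):
    """Estimate the true proportion of 1s from disguised data.

    Uses E[observed] = theta * p + (1 - theta) * (1 - p), solved for p.
    """
    if theta == 0.5:
        raise ValueError("theta = 0.5 leaves no recoverable information")
    observed = sum(disguised) / len(disguised)
    return (observed - (1 - theta)) / (2 * theta - 1)


# Hypothetical private attribute: 30% of 10,000 records have value 1.
rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000
disguised = disguise(true_bits, theta=0.8, rng=rng)
estimate = estimate_true_rate(disguised, theta=0.8)
```

No single disguised record reveals its original value with certainty, yet aggregate statistics of the kind needed to fit conditional probability tables can still be approximated.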

AIS Electronic Library (AISeL)

Privacy-Preserving Decision Tree Classification over Horizontally Partitioned Data

Authors: Chang LiWu; Matwin Stan; Zhan Justin
Publication venue: AIS Electronic Library (AISeL)
Publication date: 05/12/2005

Protection of privacy is one of the important problems in data mining. Parties' unwillingness to share their data frequently causes collaborative data mining to fail. This paper studies how to build a decision tree classifier under the following scenario: a database is horizontally partitioned into multiple pieces, with each piece owned by a particular party. All the parties want to build a decision tree classifier based on such a database, but due to privacy constraints, none of them wants to disclose its private piece. We build a privacy-preserving system, including a set of secure protocols, that allows the parties to construct such a classifier. We guarantee that the private data are securely protected.
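The abstract does not describe its secure protocols. As a hedged illustration of one building block that such a setting commonly needs, the sketch below simulates a generic secure sum based on additive secret sharing, so that parties holding horizontal partitions can learn a combined class count without revealing their local counts. This is not the authors' protocol; `MODULUS`, the function names, and the example counts are assumptions, and all parties are simulated in a single process.

```python
import secrets

# Assumed to be larger than any true total count.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split value into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate a secure sum: only sums of random-looking shares are published."""
    n = len(private_values)
    # Row i holds the shares that party i hands out, one per party.
    sent = [make_shares(v, n) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partials = [sum(sent[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partials) % MODULUS


# Hypothetical per-party counts of records in one class at a tree node.
local_positive_counts = [12, 7, 30]
total_positive = secure_sum(local_positive_counts)
```

Totals obtained this way could feed the entropy or information-gain computation used to choose a split, since those measures depend only on the combined class counts.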

AIS Electronic Library (AISeL)