2,124 research outputs found

    From Frequency to Meaning: Vector Space Models of Semantics

    Full text link
    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field

    Cross-domain sentiment classification using a sentiment sensitive thesaurus

    Get PDF
    Automatic classification of sentiment is important for numerous applications such as opinion mining, opinion summarization, contextual advertising, and market analysis. However, sentiment is expressed differently in different domains, and annotating corpora for every possible domain of interest is costly. Applying a sentiment classifier trained using labeled data for a particular domain to classify sentiment of user reviews on a different domain often results in poor performance. We propose a method to overcome this problem in cross-domain sentiment classification. First, we create a sentiment sensitive distributional thesaurus using labeled data for the source domains and unlabeled data for both source and target domains. Sentiment sensitivity is achieved in the thesaurus by incorporating document level sentiment labels in the context vectors used as the basis for measuring the distributional similarity between words. Next, we use the created thesaurus to expand feature vectors during train and test times in a binary classifier. The proposed method significantly outperforms numerous baselines and returns results that are comparable with previously proposed cross-domain sentiment classification methods. We conduct an extensive empirical analysis of the proposed method on single and multi-source domain adaptation, unsupervised and supervised domain adaptation, and numerous similarity measures for creating the sentiment sensitive thesaurus

    Computer-Aided Knowledge Engineering for Corporate Information Retrieval

    Get PDF
    In 1987, Digital Equipment Corporation's internal Madret Information Services Group I Information Access Services (lAS) decided to build a single thesaurus system to support production and retrieval of multiple applications. This system TIMS (Thesaurus I Indexing Management System) bad to be dynamic and allow for easy modification and merging of volatile business terminology. A faceted approach was used for knowledge-base building and semantic representation. 1be system allowed the knowledge engineer to determine a classification structure and to develop relation types suited to a specific application's requirements

    Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop

    Get PDF

    Ontology mapping by concept similarity

    Get PDF
    This paper presents an approach to the problem of mapping ontologies. The motivation for the research stems from the Diogene Project which is developing a web training environment for ICT professionals. The system includes high quality training material from registered content providers, and free web material will also be made available through the project's "Web Discovery" component. This involves using web search engines to locate relevant material, and mapping the ontology at the core of the Diogene system to other ontologies that exist on the Semantic Web. The project's approach to ontology mapping is presented, and an evaluation of this method is described

    Machine Learning in Automated Text Categorization

    Full text link
    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
    corecore