659 research outputs found

    From Frequency to Meaning: Vector Space Models of Semantics

    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
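    As a rough illustration of the first matrix class named in the abstract above, the sketch below builds a small term-document matrix from raw word counts and compares documents by the cosine of their column vectors. It is not taken from the surveyed paper: the function names, the whitespace tokenizer, the toy documents, and the use of unweighted counts (no tf-idf or smoothing) are all assumptions made for the example.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term i in document j.

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def column(matrix, j):
    """Extract document j as a vector of term counts."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy corpus (made up for this example).
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
vocab, matrix = term_document_matrix(docs)

# Documents 0 and 1 share several terms; documents 0 and 2 share none.
sim_01 = cosine(column(matrix, 0), column(matrix, 1))
sim_02 = cosine(column(matrix, 0), column(matrix, 2))
print(sim_01, sim_02)
```

    Reading the same matrix by rows instead of columns gives a crude term-similarity measure. Word-context and pair-pattern matrices change what the rows and columns stand for, while the vector comparison stays the same.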

    A Scalable Tag-Based Recommender System for New Users of the Social Web

    Folksonomies have become a powerful tool to describe, discover, search, and navigate online resources (e.g., pictures, videos, blogs) on the Social Web. Unlike taxonomies and ontologies, which impose a hierarchical categorisation of content, folksonomies empower end users by enabling them to freely create and choose the categories (in this case, tags) that best describe a piece of information. However, the freedom afforded to users comes at a cost: as tags are informally defined and ungoverned, the retrieval of information becomes more challenging. In this paper, we propose Clustered Social Ranking (CSR), a novel search and recommendation technique specifically developed to help new users of Web 2.0 websites find content of interest. The observation underpinning CSR is that the vast majority of content on Web 2.0 websites is created by a small proportion of users (leaders), while the others (followers) mainly browse such content. CSR first identifies who the leaders are; it then clusters them into communities with shared interests, based on their tagging activity. Users' queries (be they searches or recommendations) are then directed to the community of leaders who can best answer them. Our evaluation, conducted on the CiteULike dataset, demonstrates that CSR achieves an accuracy that is comparable to the best state-of-the-art techniques, but at a much smaller computational cost, thus affording it better scalability in these fast-growing settings. © 2011 Springer-Verlag Berlin Heidelberg

    Text Mining Infrastructure in R

    During the last decade text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the tm package which provides a framework for text mining applications within R. We give a survey on text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification and string kernels.

    An Exploration of Semantic Features in an Unsupervised Thematic Fit Evaluation Framework

    Thematic fit is the extent to which an entity fits a thematic role in the semantic frame of an event, e.g., how well humans would rate “knife” as an instrument of an event of cutting. We explore the use of the SENNA semantic role-labeller in defining a distributional space in order to build an unsupervised model of event-entity thematic fit judgements. We test a number of ways of extracting features from SENNA-labelled versions of the ukWaC and BNC corpora and identify tradeoffs. Some of our Distributional Memory models outperform an existing syntax-based model (TypeDM) that uses hand-crafted rules for role inference on a previously tested data set. We combine the results of a selected SENNA-based model with TypeDM’s results and find that there is some amount of complementarity in what a syntactic and a semantic model will cover. In the process, we create a broad-coverage semantically-labelled corpus.

    Instance-based natural language generation

    In recent years, ranking approaches to Natural Language Generation have become increasingly popular. They abandon the idea of generation as a deterministic decision-making process in favour of approaches that combine overgeneration with ranking at some stage in processing. In this thesis, we investigate the use of instance-based ranking methods for surface realization in Natural Language Generation. Our approach to instance-based Natural Language Generation employs two basic components: a rule system that generates a number of realization candidates from a meaning representation and an instance-based ranker that scores the candidates according to their similarity to examples taken from a training corpus. The instance-based ranker uses information retrieval methods to rank output candidates. Our approach is corpus-based in that it uses a treebank (a subset of the Penn Treebank II containing management succession texts) in combination with manual semantic markup to automatically produce a generation grammar. Furthermore, the corpus is also used by the instance-based ranker. The semantic annotation of a test portion of the compiled subcorpus serves as input to the generator. In this thesis, we develop an efficient search technique for identifying the optimal candidate based on the A*-algorithm, detail the annotation scheme and grammar construction algorithm and show how a Rete-based production system can be used for efficient candidate generation. Furthermore, we examine the output of the generator and discuss issues like input coverage (completeness), fluency and faithfulness that are relevant to surface generation in general.

    Next Generation of Product Search and Discovery

    Online shopping has become an important part of people’s daily life with the rapid development of e-commerce. In some domains such as books, electronics, and CD/DVDs, online shopping has surpassed or even replaced the traditional shopping method. Compared with traditional retailing, e-commerce is information intensive. One of the key factors for success in e-business is how well it helps consumers discover a product. Conventionally, a product search engine based on a keyword search or category browser is provided to help users find the product information they need. The general goal of a product search system is to enable users to quickly locate information of interest and to minimize users’ efforts in search and navigation. In this process human factors play a significant role. Finding product information can be a tricky task and may require intelligent use of search engines and non-trivial navigation of multilayer categories. Searching for useful product information can be frustrating for many users, especially inexperienced ones. This dissertation focuses on developing a new visual product search system that effectively extracts the properties of unstructured products and presents potentially attractive items to users so that they can quickly locate the ones they would most likely be interested in. We designed and developed a feature extraction algorithm that retains product color and local pattern features, and the experimental evaluation on the benchmark dataset demonstrated that it is robust against common geometric and photometric visual distortions. In addition, instead of ignoring product text information, we investigated and developed a ranking model learned via a unified probabilistic hypergraph that is capable of capturing correlations among product visual content and textual content.
    Moreover, we proposed and designed a fuzzy hierarchical co-clustering algorithm for collaborative filtering product recommendation. Via this method, users can be automatically grouped into different interest communities based on their behaviors. Then, a customized recommendation can be performed according to these implicitly detected relations. In summary, the developed search system performs much better in visual unstructured product search than state-of-the-art approaches. With the comprehensive ranking scheme and the collaborative filtering recommendation module, the user’s overhead in locating the information of value is reduced, and the user’s experience of seeking useful product information is improved.

    Picbreeder: A Case Study in Collaborative Evolutionary Exploration of Design Space

    For domains in which fitness is subjective or difficult to express formally, interactive evolutionary computation (IEC) is a natural choice. It is possible that a collaborative process combining feedback from multiple users can improve the quality and quantity of generated artifacts. Picbreeder, a large-scale online experiment in collaborative interactive evolution (CIE), explores this potential. Picbreeder is an online community in which users can evolve and share images, and most importantly, continue evolving others' images. Through this process of branching from other images, and through continually increasing image complexity made possible by the underlying neuroevolution of augmenting topologies (NEAT) algorithm, evolved images proliferate unlike in any other current IEC system. This paper discusses not only the strengths of the Picbreeder approach, but its challenges and shortcomings as well, in the hope that lessons learned will inform the design of future CIE systems.

    The hArtes Tool Chain

    This chapter describes the different design steps needed to go from legacy code to a transformed application that can be efficiently mapped on the hArtes platform.

    On construction, performance, and diversification for structured queries on the semantic desktop

    [no abstract]