MBT: A Memory-Based Part of Speech Tagger-Generator
We introduce a memory-based approach to part of speech tagging. Memory-based
learning is a form of supervised learning based on similarity-based reasoning.
The part of speech tag of a word in a particular context is extrapolated from
the most similar cases held in memory. Supervised learning approaches are
useful when a tagged corpus is available as an example of the desired output of
the tagger. Based on such a corpus, the tagger-generator automatically builds a
tagger which is able to tag new text the same way, diminishing development time
for the construction of a tagger considerably. Memory-based tagging shares this
advantage with other statistical or machine learning approaches. Additional
advantages specific to a memory-based approach include (i) the relatively small
tagged corpus size sufficient for training, (ii) incremental learning, (iii)
explanation capabilities, (iv) flexible integration of information in case
representations, (v) its non-parametric nature, (vi) reasonably good results on
unknown words without morphological analysis, and (vii) fast learning and
tagging. In this paper we show that a large-scale application of the
memory-based approach is feasible: we obtain a tagging accuracy that is on a
par with that of known statistical approaches, and with attractive space and
time complexity properties when using IGTree, a tree-based formalism for
indexing and searching huge case bases. The use of IGTree has the additional
advantage that the optimal context size for disambiguation is dynamically computed.
Comment: 14 pages, 2 Postscript figures
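The memory-based idea in this abstract (tag a word by looking up the most similar stored cases) can be sketched as follows. This is a minimal illustration, not the MBT implementation: the feature set (previous tag, focus word, next word), the flat overlap similarity, and the function names are all assumptions made here, and the linear scan over memory stands in for the IGTree index that the paper uses.

```python
from collections import Counter

def make_case(words, tags, i):
    """Hypothetical case representation: (previous tag, focus word, next word)."""
    prev_tag = tags[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"
    return (prev_tag, words[i], next_word)

def overlap(a, b):
    """Number of feature positions on which two cases agree."""
    return sum(x == y for x, y in zip(a, b))

def train(tagged_sentences):
    """'Training' just stores every case together with its tag."""
    memory = []
    for sent in tagged_sentences:
        words = [w for w, _ in sent]
        tags = [t for _, t in sent]
        for i in range(len(sent)):
            memory.append((make_case(words, tags, i), tags[i]))
    return memory

def tag(words, memory, k=1):
    """Tag left to right, voting over the k most similar stored cases."""
    tags = []
    for i in range(len(words)):
        case = make_case(words, tags, i)  # uses the tags predicted so far
        ranked = sorted(memory, key=lambda m: overlap(case, m[0]), reverse=True)
        votes = Counter(t for _, t in ranked[:k])
        tags.append(votes.most_common(1)[0][0])
    return tags

# Toy corpus, invented for illustration only.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
memory = train(corpus)
predicted = tag(["the", "cat", "barks"], memory)
```

Because learning only appends cases to memory, new tagged text can be added incrementally, and the nearest stored cases can be shown to explain a tagging decision.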
What attracts vehicle consumers’ buying: A Saaty scale-based VIKOR (SSC-VIKOR) approach from after-sales textual perspective?
Purpose:
The rapid growth of e-commerce has encouraged vehicle consumers to post individual reviews on online forums. The purpose of this paper is to investigate vehicle consumers' consumption behavior and to make recommendations for potential consumers based on textual comments.
Design/methodology/approach:
A big data analytics-based approach is designed to discover vehicle consumers' consumption behavior from online reviews. To reduce the subjectivity of expert-based approaches, a parallel Naïve Bayes approach is designed to perform sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain a specific sentiment value for each attribute class, enabling multi-grade sentiment classification. To provide intelligent recommendations for potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytics viewpoint.
Findings:
The big data analysis indicates that “cost-effectiveness” is the factor that vehicle consumers care about most, and the data mining results enable automakers to better understand consumers' consumption behavior.
Research limitations/implications:
The case study illustrates the effectiveness of the integrated method, which contributes to more precise operations management in marketing strategy, quality improvement and intelligent recommendation.
Originality/value:
Research on consumer consumption behavior is usually survey-based, and most previous studies of comment analysis focus on binary analysis. The hybrid SSC-VIKOR approach is developed to fill this gap from the big data perspective.
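The ranking step named in this abstract builds on VIKOR, a standard multi-criteria compromise-ranking method. The sketch below shows plain VIKOR only, under stated assumptions: all criteria are treated as benefit criteria, the brand scores and weights are invented, and the Saaty scale-based scoring and Naïve Bayes sentiment steps of the paper are not reproduced. The function name `vikor` and the parameter `v` (weight of the group-utility strategy) are illustrative.

```python
def vikor(scores, weights, v=0.5):
    """Plain VIKOR for benefit criteria.

    scores[i][j] is the score of alternative i on criterion j.
    Returns (S, R, Q); a lower Q means a better compromise ranking.
    """
    n_crit = len(weights)
    best = [max(row[j] for row in scores) for j in range(n_crit)]
    worst = [min(row[j] for row in scores) for j in range(n_crit)]

    S, R = [], []
    for row in scores:
        # Weighted, normalized distance to the best value on each criterion.
        terms = [
            weights[j] * (best[j] - row[j]) / (best[j] - worst[j])
            if best[j] != worst[j] else 0.0
            for j in range(n_crit)
        ]
        S.append(sum(terms))  # group utility
        R.append(max(terms))  # individual regret

    def norm(x, lo, hi):
        return (x - lo) / (hi - lo) if hi != lo else 0.0

    Q = [
        v * norm(s, min(S), max(S)) + (1 - v) * norm(r, min(R), max(R))
        for s, r in zip(S, R)
    ]
    return S, R, Q

# Hypothetical sentiment scores for three brands on two attributes.
brand_scores = [[9, 7], [5, 5], [1, 3]]
S, R, Q = vikor(brand_scores, weights=[0.5, 0.5])
ranking = sorted(range(len(Q)), key=lambda i: Q[i])  # best brand first
```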
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field.
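Of the three matrix types named in this abstract, the term-document matrix is the simplest to illustrate. The following sketch, with invented toy documents and raw term counts (no weighting such as tf-idf), builds such a matrix and compares document columns by cosine similarity. It is an illustration of the general idea, not code from the survey or from the open source projects it examines.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw counts."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = [[c[w] for c in counts] for w in vocab]
    return vocab, matrix

def column(matrix, j):
    """Document vector: the j-th column of the term-document matrix."""
    return [row[j] for row in matrix]

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy documents, invented for illustration only.
docs = ["cat sat mat", "cat sat hat", "dog ran far"]
vocab, matrix = term_document_matrix(docs)
sim_01 = cosine(column(matrix, 0), column(matrix, 1))  # two shared terms
sim_02 = cosine(column(matrix, 0), column(matrix, 2))  # no shared terms
```

Comparing rows instead of columns would give term-term similarity from the same matrix.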
Effective Unsupervised Author Disambiguation with Relative Frequencies
This work addresses the problem of author name homonymy in the Web of
Science. Aiming for an efficient, simple and straightforward solution, we
introduce a novel probabilistic similarity measure for author name
disambiguation based on feature overlap. Using the researcher-ID available for
a subset of the Web of Science, we evaluate the application of this measure in
the context of agglomeratively clustering author mentions. We focus on a
concise evaluation that shows clearly for which problem setups and at which
time during the clustering process our approach works best. In contrast to most
other works in this field, we are sceptical towards the performance of author
name disambiguation methods in general and compare our approach to the trivial
single-cluster baseline. Our results are presented separately for each correct
clustering size as we can explain that, when treating all cases together, the
trivial baseline and more sophisticated approaches are hardly distinguishable
in terms of evaluation results. Our model shows state-of-the-art performance
for all correct clustering sizes without any discriminative training and with
tuning only one convergence parameter.
Comment: Proceedings of JCDL 201
Proceedings of the Workshop Semantic Content Acquisition and Representation (SCAR) 2007
This is the proceedings of the Workshop on Semantic Content Acquisition and Representation, held in conjunction with NODALIDA 2007, on May 24, 2007 in Tartu, Estonia.
Doctor of Philosophy
Dissertation. Domain adaptation of natural language processing systems is challenging because it requires human expertise. While manual effort is effective in creating a high quality knowledge base, it is expensive and time consuming. Clinical text adds another layer of complexity to the task due to privacy and confidentiality restrictions that hinder the ability to share training corpora among different research groups. Semantic ambiguity is a major barrier for effective and accurate concept recognition by natural language processing systems. In my research I propose an automated domain adaptation method that utilizes sublanguage semantic schema for all-word word sense disambiguation of clinical narrative. According to the sublanguage theory developed by Zellig Harris, domain-specific language is characterized by a relatively small set of semantic classes that combine into a small number of sentence types. Previous research relied on manual analysis to create language models that could be used for more effective natural language processing. Building on previous semantic type disambiguation research, I propose a method of resolving semantic ambiguity utilizing automatically acquired semantic type disambiguation rules applied on clinical text ambiguously mapped to a standard set of concepts. This research aims to provide an automatic method to acquire Sublanguage Semantic Schema (S3) and apply this model to disambiguate terms that map to more than one concept with different semantic types. The research is conducted using unmodified MetaMap version 2009, a concept recognition system provided by the National Library of Medicine, applied on a large set of clinical text. The project includes creating and comparing models, which are based on unambiguous concept mappings found in seventeen clinical note types. The effectiveness of the final application was validated through a manual review of a subset of processed clinical notes using recall, precision and F-score metrics.
Neurocognitive Informatics Manifesto.
Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper, examples of neurocognitive inspirations and promising directions in this area are given.