108,759 research outputs found
To Normalize, or Not to Normalize: The Impact of Normalization on Part-of-Speech Tagging
Does normalization help Part-of-Speech (POS) tagging accuracy on noisy,
non-canonical data? To the best of our knowledge, little is known on the actual
impact of normalization in a real-world scenario, where gold error detection is
not available. We investigate the effect of automatic normalization on POS
tagging of tweets. We also compare normalization to strategies that leverage
large amounts of unlabeled data kept in its raw form. Our results show that
normalization helps, but does not add consistently beyond just word embedding
layer initialization. The latter approach yields a tagging model that is
competitive with a Twitter state-of-the-art tagger.Comment: In WNUT 201
Of course we share! Testing Assumptions about Social Tagging Systems
Social tagging systems have established themselves as an important part in
today's web and have attracted the interest from our research community in a
variety of investigations. The overall vision of our community is that simply
through interactions with the system, i.e., through tagging and sharing of
resources, users would contribute to building useful semantic structures as
well as resource indexes using uncontrolled vocabulary not only due to the
easy-to-use mechanics. Henceforth, a variety of assumptions about social
tagging systems have emerged, yet testing them has been difficult due to the
absence of suitable data. In this work we thoroughly investigate three
available assumptions - e.g., is a tagging system really social? - by examining
live log data gathered from the real-world public social tagging system
BibSonomy. Our empirical results indicate that while some of these assumptions
hold to a certain extent, other assumptions need to be reflected and viewed in
a very critical light. Our observations have implications for the design of
future search and other algorithms to better reflect the actual user behavior
Hierarchical Event Descriptors (HED): Semi-Structured Tagging for Real-World Events in Large-Scale EEG.
Real-world brain imaging by EEG requires accurate annotation of complex subject-environment interactions in event-rich tasks and paradigms. This paper describes the evolution of the Hierarchical Event Descriptor (HED) system for systematically describing both laboratory and real-world events. HED version 2, first described here, provides the semantic capability of describing a variety of subject and environmental states. HED descriptions can include stimulus presentation events on screen or in virtual worlds, experimental or spontaneous events occurring in the real world environment, and events experienced via one or multiple sensory modalities. Furthermore, HED 2 can distinguish between the mere presence of an object and its actual (or putative) perception by a subject. Although the HED framework has implicit ontological and linked data representations, the user-interface for HED annotation is more intuitive than traditional ontological annotation. We believe that hiding the formal representations allows for a more user-friendly interface, making consistent, detailed tagging of experimental, and real-world events possible for research users. HED is extensible while retaining the advantages of having an enforced common core vocabulary. We have developed a collection of tools to support HED tag assignment and validation; these are available at hedtags.org. A plug-in for EEGLAB (sccn.ucsd.edu/eeglab), CTAGGER, is also available to speed the process of tagging existing studies
Efficient deep processing of japanese
We present a broad coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real world applications, such that robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. This grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages
- …