821 research outputs found
Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology
Every culture and language is unique. Our work expressly focuses on the
uniqueness of culture and language in relation to human affect, specifically
sentiment and emotion semantics, and how they manifest in social multimedia. We
develop sets of sentiment- and emotion-polarized visual concepts by adapting
semantic structures called adjective-noun pairs, originally introduced by Borth
et al. (2013), but in a multilingual context. We propose a new
language-dependent method for automatic discovery of these adjective-noun
constructs. We show how this pipeline can be applied on a social multimedia
platform for the creation of a large-scale multilingual visual sentiment
concept ontology (MVSO). Unlike the flat structure in Borth et al. (2013), our
unified ontology is organized hierarchically by multilingual clusters of
visually detectable nouns and subclusters of emotionally biased versions of
these nouns. In addition, we present an image-based prediction task to show how
generalizable language-specific models are in a multilingual context. A new,
publicly available dataset of >15.6K sentiment-biased visual concepts across 12
languages with language-specific detector banks, >7.36M images and their
metadata is also released. Comment: 11 pages, to appear at ACM MM'1
Application of Weighted Voting Taggers to Languages Described with Large Tagsets
The paper presents baseline and complex part-of-speech taggers applied to the modified corpus of the Frequency Dictionary of Contemporary Polish, annotated with a large tagset. First, the paper examines the accuracy of six baseline part-of-speech taggers. The main part of the work presents simple weighted voting and complex voting taggers. Special attention is paid to lexical voting methods and to the handling of ties and fallbacks. The TagPair and WPDV voting methods achieve the highest accuracy among all methods considered. The error reduction of 10.8% with respect to the best baseline tagger for the large tagset is comparable with other authors' results for small tagsets.
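To illustrate the idea of simple weighted voting with a tie fallback mentioned in the abstract above, here is a minimal sketch. It is not the paper's TagPair or WPDV method. The function name `weighted_vote`, the tagger names, the tags, and the use of a per-tagger weight such as held-out accuracy are all assumptions made for the example.

```python
from collections import defaultdict


def weighted_vote(predictions, weights, fallback):
    """Pick one tag for a token from several taggers' proposals.

    predictions: dict mapping tagger name -> proposed tag (hypothetical names)
    weights:     dict mapping tagger name -> vote weight (assumed here to be,
                 for example, that tagger's accuracy on held-out data)
    fallback:    name of the tagger whose proposal is used when the vote ties
    """
    # Sum the weights of all taggers that proposed each tag.
    scores = defaultdict(float)
    for tagger, tag in predictions.items():
        scores[tag] += weights[tagger]

    # Collect every tag that reached the top score.
    best = max(scores.values())
    winners = [tag for tag, score in scores.items() if score == best]

    if len(winners) == 1:
        return winners[0]
    # Tie: fall back to the proposal of the designated tagger.
    return predictions[fallback]


# Two weaker taggers agreeing can outvote one stronger tagger.
clear = weighted_vote(
    {"tagger_a": "subst", "tagger_b": "adj", "tagger_c": "subst"},
    {"tagger_a": 0.9, "tagger_b": 0.95, "tagger_c": 0.5},
    fallback="tagger_b",
)

# Equal weights on different tags produce a tie, resolved by the fallback.
tied = weighted_vote(
    {"tagger_a": "subst", "tagger_b": "adj"},
    {"tagger_a": 0.5, "tagger_b": 0.5},
    fallback="tagger_b",
)
```

In this sketch the fallback is a single trusted tagger. Other tie-breaking policies are possible, and the abstract does not say which one the paper uses.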
The TXM Portal Software giving access to Old French Manuscripts Online
Full text online: http://www.lrec-conf.org/proceedings/lrec2012/workshops/13.ProceedingsCultHeritage.pdf This paper presents the new TXM software platform, which gives online access to images and tagged transcriptions of Old French text manuscripts for concordancing and text mining. The platform can import medieval sources encoded in XML according to the TEI Guidelines for linking manuscript images to transcriptions and for encoding several diplomatic levels of transcription, including abbreviations and word-level corrections. It includes a sophisticated tokenizer able to deal with TEI tags at different levels of the linguistic hierarchy. Words are tagged on the fly during the import process using the IMS TreeTagger tool with a specific language model. Synoptic editions displaying manuscript images and text transcriptions side by side are produced automatically during the import process. Texts are organized in a corpus with their own metadata (title, author, date, genre, etc.), and several word-property indexes are produced for the CQP search engine to allow efficient word-pattern searches for building different types of frequency lists or concordances. For syntactically annotated texts, special indexes are produced for the Tiger Search engine to allow efficient building of syntactic concordances. The platform has also been tested on classical Latin, ancient Greek, Old Slavonic and Old Hieroglyphic Egyptian corpora (including various types of encoding and annotations).
Towards a machine-learning architecture for lexical functional grammar parsing
Data-driven grammar induction aims at producing wide-coverage grammars of human languages. Initial efforts in this field produced relatively shallow linguistic representations such as phrase-structure trees, which only encode constituent structure. Recent work on inducing deep grammars from treebanks addresses this shortcoming by also recovering non-local dependencies and grammatical relations. My aim is to investigate the issues arising when adapting an existing Lexical Functional Grammar (LFG) induction method to a new language and treebank, and find solutions which will generalize robustly across multiple languages.
The research hypothesis is that by exploiting machine-learning algorithms to learn morphological features, lemmatization classes and grammatical functions from treebanks we can reduce the amount of manual specification and improve robustness, accuracy and domain- and language-independence for LFG parsing systems. Function labels can often be mapped relatively straightforwardly to LFG grammatical functions. Learning them reliably permits grammar induction to depend less on language-specific LFG annotation rules. I therefore propose ways to improve acquisition of function labels from treebanks and translate those improvements into better-quality f-structure parsing.
In a lexicalized grammatical formalism such as LFG, a large amount of syntactically relevant information comes from lexical entries. It is, therefore, important to be able to perform morphological analysis in an accurate and robust way for morphologically rich languages. I propose a fully data-driven supervised method to simultaneously lemmatize and morphologically analyze text and obtain competitive or improved results on a range of typologically diverse languages.
Relating folksonomies with Dublin Core
This article presents research carried out as a continuation of the Kinds of Tags project,
which aims to identify the elements required for metadata originating from folksonomies.
It will provide information that may be used by intelligent applications to assign tags to
metadata elements. Despite the unquestionably high value of DC and DC Terms, the pilot study
revealed a significant number of tags for which no corresponding properties yet existed. A need
for new properties was determined. This article presents the problem, motivation and
methodology of the underlying research. It further presents and discusses the findings from the
pilot study.