Cross-lingual Similarity Calculation for Plagiarism Detection and More - Tools and Resources
A system that recognises cross-lingual plagiarism needs to establish, among other things, whether two pieces of text written in different languages are equivalent to each other. Potthast et al. (2010) give a thorough overview of this challenging task. While the Joint Research Centre (JRC) is not specifically concerned with plagiarism, it has been working for many years on developing other cross-lingual functionalities that may well be useful for the plagiarism detection task, namely (a) cross-lingual document similarity calculation, (b) subject domain profiling of documents in many different languages according to the same multilingual subject domain categorisation scheme, and (c) the recognition of name spelling variants for the same entity, both within the same language and across different languages and scripts. The speaker will explain the algorithms behind these software tools and present a number of freely available language resources that can be used to develop software with cross-lingual functionality. JRC.G.2-Global security and crisis management
Linking News Content Across Languages
Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009.
Editors: Kristiina Jokinen and Eckhard Bick.
NEALT Proceedings Series, Vol. 4 (2009), 4-5.
© 2009 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt.
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/9206
JRC's Participation in the Guided Summarization Task at TAC 2010
In this paper we describe our participation in the Guided Summarization Task at the Text Analysis Conference 2010 (TAC'10). The goal of the task was to encourage a deeper semantic analysis of the source documents instead of relying only on document word frequencies to select important concepts. We used the output of our event extraction system and automatic learning of semantically related terms to capture the required aspects of each particular article category. We submitted two runs: the first uses information extraction tools in combination with co-occurrence of features; the second uses only co-occurrence information. In the following sections we describe our runs and discuss the results attained. JRC.DG.G.2-Global security and crisis management
Experiments to Improve Named Entity Recognition on Turkish Tweets
Social media texts are significant information sources for several
application areas including trend analysis, event monitoring, and opinion
mining. Unfortunately, existing solutions for tasks such as named entity
recognition that perform well on formal texts usually perform poorly when
applied to social media texts. In this paper, we report on experiments that
have the purpose of improving named entity recognition on Turkish tweets, using
two different annotated data sets. In these experiments, starting with a
baseline named entity recognition system, we adapt its recognition rules and
resources to better fit Twitter language by relaxing its capitalization
constraint and by diacritics-based expansion of its lexical resources, and we
employ a simplistic normalization scheme on tweets to observe the effects of
these on the overall named entity recognition performance on Turkish tweets.
The evaluation results of the system with these different settings are provided
and discussed.
Comment: appears in Proceedings of the EACL Workshop on Language Analysis for
Social Media, 201
Wrapping up a Summary: from Representation to Generation
The main focus of this work is to investigate robust ways of generating summaries from summary representations without resorting to simple sentence extraction, aiming at more human-like summaries. This is motivated by empirical evidence from TAC 2009 data showing that human summaries contain on average more and shorter sentences than the system summaries. We report encouraging preliminary results comparable to those attained by participating systems at TAC 2009. JRC.DG.G.2-Global security and crisis management