18 research outputs found

    Résumé automatique de textes juridiques

    Thesis digitized by the Direction des bibliothèques de l'Université de Montréal

    Domain Adaptation Techniques for Machine Translation and Their Evaluation in a Real-World Setting

    Abstract. Statistical Machine Translation (SMT) is currently used in real-time and commercial settings to quickly produce initial translations of a document, which can later be edited by a human. SMT models specialized for one domain often perform poorly when applied to other domains, because the typical assumption that training and testing data are drawn from the same distribution no longer holds. This paper evaluates domain adaptation techniques for SMT systems in the context of end-user feedback in a real-world application. We present our experiments using two adaptive techniques, one relying on log-linear models and the other using mixture models. We describe our experimental results on legal and government data, and present the human evaluation effort for post-editing in addition to traditional automated scoring techniques (BLEU scores). The human effort is measured primarily by the amount of time and the number of edits a professional post-editor requires to bring machine-generated translations up to industry standards. The experimental results in this paper show that the domain adaptation techniques can yield a significant increase in BLEU score (up to four points) and a significant reduction in post-editing time of about one second per word.

    TAL et réseaux sociaux

    No full text
    Social networks incorporate an unprecedented volume and variety of textual data. Analyzing this data leads to a better understanding of social behaviors and of certain societal changes. The study of the messages exchanged, which are inherently complex, raises new problems for Natural Language Processing (NLP). In this context, this introductory article to the special issue of the TAL journal presents the challenges posed by the information overload of social network data, then discusses the use of NLP methods for processing the textual content of these new modes of communication.

    A Survey of Techniques for Event Detection in Twitter

    No full text
    Twitter is among the fastest-growing microblogging and online social networking services. Messages posted on Twitter (tweets) report everything from daily life stories to the latest local and global news and events. Monitoring and analyzing this rich and continuous user-generated content can yield unprecedentedly valuable information, enabling users and organizations to acquire actionable knowledge. This article provides a survey of techniques for event detection from Twitter streams. These techniques aim at finding real-world occurrences that unfold over space and time. In contrast to conventional media, event detection from Twitter streams poses new challenges. Twitter streams contain large amounts of meaningless messages and polluted content, which negatively affect detection performance. In addition, traditional text mining techniques are not suitable, because of the short length of tweets, the large number of spelling and grammatical errors, and the frequent use of informal and mixed language. Event detection techniques presented in the literature address these issues by adapting techniques from various fields to the uniqueness of Twitter. This article classifies these techniques according to event type, detection task, and detection method, and discusses commonly used features. Finally, it highlights the need for public benchmarks to evaluate the performance of different detection approaches and various features.

    Les défis de l'analyse des réseaux sociaux pour le traitement automatique des langues

    No full text
    Social networks incorporate an unprecedented amount and variety of textual data. The analysis of this information furthers our understanding of social behaviors and some trends. The study of inherently complex messages sent between users represents new problems for Natural Language Processing (NLP). In this context, the first article in this special issue of the TAL journal introduces the challenges of information overload from social networks, and discusses the use of NLP methods for processing the textual content of these new modes of communication. Keywords: NLP, social networks, semantic analysis.

    3rd ed.

    No full text