8 research outputs found
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles
6th joint conference: JEP-TALN-RECITAL 2020. No abstract.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Peer reviewed
Computational models for semantic textual similarity
164 p. The overarching goal of this thesis is to advance computational models of meaning and their evaluation. To achieve this goal we define two tasks and develop state-of-the-art systems that tackle both tasks: Semantic Textual Similarity (STS) and Typed Similarity. STS aims to measure the degree of semantic equivalence between two sentences by assigning graded similarity values that capture the intermediate shades of similarity. We have collected pairs of sentences to construct datasets for STS, a total of 15,436 pairs of sentences, by far the largest collection of data for STS. We have designed, constructed and evaluated a new approach to combine knowledge-based and corpus-based methods using a cube. This new system for STS performs on par with state-of-the-art approaches that rely on Machine Learning (ML) without itself using any ML; ML can also be applied to this system, which improves the results. Typed Similarity tries to identify the type of relation that holds between a pair of similar items in a digital library. Providing a reason why items are similar has applications in recommendation, personalization, and search. A range of types of similarity in this collection were identified, and a set of 1,500 pairs of items from the collection was annotated using crowdsourcing. Finally, we present systems capable of resolving the Typed Similarity task. The best system resulted in a real-world application to recommend similar items to users in an online digital library.
concepts - methods - visualization
While Darwin’s grand view of evolution has undergone many changes and shown up
in many facets, there remains one outstanding common feature in its 150-year
history: since the very beginning, branching trees have been the dominant
scheme for representing evolutionary processes. Only recently have network
models gained ground, reflecting contact-induced mixing or hybridization in
evolutionary scenarios. In biology, research on prokaryote evolution indicates
that lateral gene transfer is a major feature in the evolution of bacteria. In
the field of linguistics, the mutual lexical and morphosyntactic borrowing
between languages seems to be much more central for language evolution than
the family tree model is likely to concede. In the humanities, networks are
employed as an alternative to established phylogenetic models, to express the
hybridization of cultural phenomena, concepts or the social structure of
science. However, an interdisciplinary display of network analyses for
evolutionary processes remains lacking. Therefore, this volume includes
approaches studying the evolutionary dynamics of science, languages and
genomes, all of which are based on methods incorporating network approaches.
A high speed transcription interface for annotating primary linguistic data
We present a new transcription mode for the annotation tool ELAN. This mode is designed to speed up the process of creating transcriptions of primary linguistic data (video and/or audio recordings of linguistic behaviour). We survey the basic transcription workflow of some commonly used tools (Transcriber, BlitzScribe, and ELAN) and describe how the new transcription interface improves on these existing implementations. We describe the design of the transcription interface and explore some further possibilities for improvement in the areas of segmentation and computational enrichment of annotations.
Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2). 29 November 2012, Lisbon, Portugal
Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2), held in Lisbon, Portugal on 29 November 2012