Predicting the Semantic Textual Similarity with Siamese CNN and LSTM
Semantic Textual Similarity (STS) underlies many applications in Natural Language Processing (NLP). Our system combines convolutional and recurrent neural networks to measure the semantic similarity of sentences: a convolutional network captures the local context of words, and an LSTM captures the global context of the sentence. This combination of networks preserves the relevant information of the sentences and improves the computation of inter-sentence similarity. Our model achieves good results and is competitive with the best state-of-the-art systems.
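The local-plus-global intuition can be illustrated with a toy sketch. This is not the paper's neural model: word bigrams stand in for the convolutional "local context" features, position-weighted words stand in for the LSTM's order-sensitive "global context", and similarity is the cosine of the combined feature vectors.

```python
from collections import Counter
import math

def features(sentence, n=2):
    """Toy stand-in for a CNN+LSTM encoder: word bigrams capture
    local context, position-weighted unigrams capture word order."""
    words = sentence.lower().split()
    feats = Counter()
    for i in range(len(words) - n + 1):           # "local" n-gram features
        feats[" ".join(words[i:i + n])] += 1.0
    for i, w in enumerate(words):                 # "global" order-aware features
        feats[w] += 1.0 + i / max(len(words), 1)
    return feats

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity(s1, s2):
    return cosine(features(s1), features(s2))
```

With this sketch, paraphrase pairs score well above unrelated pairs, which is the behaviour the learned encoders deliver at much higher quality.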
Leveraging BERT Language Models for Multi-Lingual ESG Issue Identification
Environmental, Social, and Governance (ESG) criteria are used to measure companies' negative impacts and to encourage positive outcomes in
areas such as the environment, society, and governance. Recently, investors
have increasingly recognized the significance of ESG criteria in their
investment choices, leading businesses to integrate ESG principles into their
operations and strategies. The Multi-Lingual ESG Issue Identification (ML-ESG)
shared task encompasses the classification of news documents into 35 distinct
ESG issue labels. In this study, we explored multiple strategies harnessing
BERT language models to achieve accurate classification of news documents
across these labels. Our analysis revealed that the RoBERTa classifier emerged
as one of the most successful approaches, securing the second-place position
for the English test dataset, and sharing the fifth-place position for the
French test dataset. Furthermore, our SVM-based binary model tailored for the
Chinese language exhibited exceptional performance, earning the second-place
rank on the test dataset.
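The task setup (assigning a news document one of 35 ESG issue labels) can be illustrated with a deliberately simple keyword-overlap baseline. The labels and keyword lists below are invented for illustration only; the systems described above are fine-tuned RoBERTa and SVM classifiers, not this heuristic.

```python
# Toy nearest-label baseline for single-label news classification.
# Labels and keyword sets are invented placeholders, not the ML-ESG taxonomy.
LABEL_KEYWORDS = {
    "Climate Change": {"emissions", "carbon", "climate", "warming"},
    "Labor Practices": {"workers", "union", "wages", "strike"},
    "Corporate Governance": {"board", "shareholders", "audit", "executive"},
}

def classify(document):
    """Score each label by keyword overlap with the document; pick the best."""
    tokens = set(document.lower().split())
    scores = {label: len(tokens & kws) for label, kws in LABEL_KEYWORDS.items()}
    return max(scores, key=scores.get)
```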
Microblog Contextualization Using Continuous Space Vectors: Multi-Sentence Compression of Cultural Documents
In this paper, we describe our work for the MC2 CLEF 2017 lab. We participated in the content-analysis task, which involves filtering, language recognition, and summarization. We combine Information Retrieval with Multi-Sentence Compression methods to contextualize microblogs using Wikipedia pages.
Automatic Text Summarization with a Reduced Vocabulary Using Continuous Space Vectors
In this paper, we propose a new method that uses continuous vectors to map words to a reduced vocabulary, in the context of Automatic Text Summarization (ATS). The method is evaluated on the MultiLing corpus with the ROUGE measures and four ATS systems. Our experiments show that the reduced vocabulary improves the performance of state-of-the-art systems.
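The vocabulary-reduction idea — mapping every word to its nearest neighbour in a smaller vocabulary using continuous vectors — can be sketched as follows. The tiny hand-made 2-d embeddings are placeholders; a real system would use trained word vectors.

```python
import math

# Toy 2-d "continuous space" vectors (placeholders for trained embeddings).
EMBEDDINGS = {
    "car": (0.9, 0.1), "automobile": (0.88, 0.12), "vehicle": (0.85, 0.2),
    "happy": (0.1, 0.9), "glad": (0.12, 0.88),
}
REDUCED_VOCAB = ["car", "happy"]  # the smaller target vocabulary

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def reduce_word(word):
    """Map a word to its nearest neighbour in the reduced vocabulary."""
    if word not in EMBEDDINGS:
        return word  # out-of-vocabulary words pass through unchanged
    vec = EMBEDDINGS[word]
    return max(REDUCED_VOCAB, key=lambda w: cosine(vec, EMBEDDINGS[w]))
```

Applying `reduce_word` to every token before summarization collapses near-synonyms onto shared vocabulary entries, which is what lets frequency-based ATS systems see "automobile" and "car" as the same concept.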
SASI: sumarizador automático de documentos baseado no problema do subconjunto independente de vértices (SASI: an automatic document summarizer based on the independent vertex subset problem)
XLVI Simpósio Brasileiro de Pesquisa Operacional. This article presents a document summarizer system named SASI. The system takes an innovative approach to automatic summarization based on determining the maximum independent subset of vertices, modeling the problem as a graph of sentences (vertices) and the relationships between them (edges). We describe the concepts and operation of the proposed summarizer, along with a series of tests comparing the results of SASI with other summarizer systems. Initial results are promising, both for the informativeness of the produced summaries and for running time and algorithmic complexity.
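The underlying idea — sentences as vertices, high-redundancy links as edges, and a summary as a large set of mutually non-adjacent sentences — can be sketched with a greedy approximation. Note that SASI computes a maximum independent set; the greedy heuristic below is only an illustration of the graph model.

```python
def greedy_independent_set(num_sentences, redundancy_edges):
    """Pick a set of mutually non-adjacent sentences: no two selected
    sentences are linked by a redundancy edge."""
    neighbours = {i: set() for i in range(num_sentences)}
    for a, b in redundancy_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    selected, blocked = [], set()
    # Greedy heuristic: consider low-degree (least redundant) sentences first.
    for v in sorted(neighbours, key=lambda v: len(neighbours[v])):
        if v not in blocked:
            selected.append(v)
            blocked |= neighbours[v]
    return sorted(selected)
```

Because selected vertices share no redundancy edge, the resulting summary avoids pairs of near-duplicate sentences by construction.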
A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming
Multi-Sentence Compression (MSC) aims to generate a short sentence with the
key information from a cluster of similar sentences. MSC enables summarization
and question-answering systems to generate outputs combining fully formed
sentences from one or several documents. This paper describes an Integer Linear
Programming method for MSC using a vertex-labeled graph to select different
keywords, with the goal of generating more informative sentences while
maintaining their grammaticality. Our system produces good-quality compressions and outperforms
the state of the art in evaluations conducted on news datasets in three languages:
French, Portuguese, and Spanish. We conducted both automatic and manual evaluations to
determine the informativeness and the grammaticality of the compressions for each
dataset. In additional tests, which take advantage of the fact that the length
of the compressions can be modulated, we still improve ROUGE scores with shorter
output sentences.
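The word-graph idea behind MSC can be sketched with a simplified shortest-path compression in the style of Filippova's word graphs. This is not the paper's method: the paper solves an Integer Linear Program over a vertex-labeled graph with keyword constraints, which this toy version does not model.

```python
from collections import defaultdict
import heapq

def compress(sentences):
    """Build a word graph from similar sentences (shared surface words merge
    into one node) and return the shortest start-to-end path as the compression."""
    graph = defaultdict(set)
    for sent in sentences:
        words = ["<s>"] + sent.lower().split() + ["</s>"]
        for a, b in zip(words, words[1:]):
            graph[a].add(b)
    # Dijkstra with unit edge weights, i.e. the fewest-word path.
    queue, seen = [(0, ["<s>"])], set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == "</s>":
            return " ".join(path[1:-1])
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph[node]:
            heapq.heappush(queue, (cost + 1, path + [nxt]))
    return ""
```

On the pair "the cat sat on the mat" / "the cat sat quietly on the mat", the shortest path degenerates to "the mat" — exactly the kind of uninformative output that motivates the keyword and informativeness constraints in the ILP formulation.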
Cross-Language Text Summarization using Sentence and Multi-Sentence Compression
Cross-Language Automatic Text Summarization produces a summary in a language different from the language of the source documents. In this paper, we propose a French-to-English cross-lingual summarization framework that analyzes the information in both languages to identify the most relevant sentences. To generate more informative cross-lingual summaries, we introduce the use of chunks and two compression methods, at the sentence and multi-sentence levels. Experimental results on the MultiLing 2011 dataset show that our framework improves on state-of-the-art approaches according to ROUGE metrics.
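The bilingual scoring idea — ranking each sentence by its relevance in both the source and the target language before selection — can be sketched as follows. The frequency-based scorer is a stand-in for the framework's actual chunk-based analysis, and the (source, translation) pairs are assumed to be given.

```python
from collections import Counter

def score(sentence, doc_freq):
    """Relevance = sum of document-level frequencies of the sentence's words."""
    return sum(doc_freq[w] for w in sentence.lower().split())

def select_sentences(pairs, k=1):
    """pairs: (source_sentence, target_translation) tuples.
    Rank by combined relevance in BOTH languages, keep the top k translations."""
    src_freq = Counter(w for s, _ in pairs for w in s.lower().split())
    tgt_freq = Counter(w for _, t in pairs for w in t.lower().split())
    ranked = sorted(pairs,
                    key=lambda p: score(p[0], src_freq) + score(p[1], tgt_freq),
                    reverse=True)
    return [t for _, t in ranked[:k]]
```

Scoring in both languages guards against sentences that look central in one language but lose their key content in translation.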