
    Predicting the Semantic Textual Similarity with Siamese CNN and LSTM

    Semantic Textual Similarity (STS) is the basis of many applications in Natural Language Processing (NLP). Our system combines convolutional and recurrent neural networks to measure the semantic similarity of sentences. It uses a convolutional network to take the local context of words into account and an LSTM to capture the global context of a sentence. This combination of networks helps preserve the relevant information in the sentences and improves the computation of sentence similarity. Our model achieves good results and is competitive with the best state-of-the-art systems.
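
    The architecture described above can be pictured with a short sketch. The following is a minimal illustration, assuming PyTorch; the layer sizes, the single-layer LSTM, and the cosine-based scoring are assumptions for the example, not the exact configuration of the paper.

```python
# Minimal Siamese CNN + LSTM similarity sketch (assumes PyTorch; dimensions are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseCnnLstm(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, conv_channels=128, lstm_hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Convolution over word windows captures the local context of words.
        self.conv = nn.Conv1d(emb_dim, conv_channels, kernel_size=3, padding=1)
        # LSTM over the convolved sequence captures the global sentence context.
        self.lstm = nn.LSTM(conv_channels, lstm_hidden, batch_first=True)

    def encode(self, token_ids):
        x = self.embed(token_ids)                  # (batch, seq, emb_dim)
        x = F.relu(self.conv(x.transpose(1, 2)))   # (batch, channels, seq)
        _, (h_n, _) = self.lstm(x.transpose(1, 2)) # final hidden state as sentence vector
        return h_n[-1]                             # (batch, lstm_hidden)

    def forward(self, sent_a, sent_b):
        # Both sentences share the same encoder (Siamese weights);
        # cosine similarity of the two sentence vectors gives the STS score.
        return F.cosine_similarity(self.encode(sent_a), self.encode(sent_b))

# Usage with dummy token ids:
model = SiameseCnnLstm(vocab_size=10_000)
a = torch.randint(1, 10_000, (2, 12))
b = torch.randint(1, 10_000, (2, 12))
print(model(a, b))  # tensor of 2 similarity scores in [-1, 1]
```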

    Leveraging BERT Language Models for Multi-Lingual ESG Issue Identification

    Environmental, Social, and Governance (ESG) criteria are used to measure companies' negative impacts and positive contributions in areas such as the environment, society, and governance. Recently, investors have increasingly recognized the significance of ESG criteria in their investment choices, leading businesses to integrate ESG principles into their operations and strategies. The Multi-Lingual ESG Issue Identification (ML-ESG) shared task involves classifying news documents into 35 distinct ESG issue labels. In this study, we explored multiple strategies harnessing BERT language models to achieve accurate classification of news documents across these labels. Our analysis revealed that the RoBERTa classifier was one of the most successful approaches, securing second place on the English test dataset and sharing fifth place on the French test dataset. Furthermore, our SVM-based binary model tailored to Chinese performed exceptionally well, earning second place on its test dataset.
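
    As an illustration of the classification setup, the sketch below fine-tunes a RoBERTa sequence classifier with 35 output labels. It assumes the Hugging Face transformers API; the checkpoint name, hyperparameters, and the placeholder train_ds / dev_ds datasets are illustrative, not the shared-task submission's actual configuration.

```python
# Hedged sketch of fine-tuning a RoBERTa classifier over 35 ESG issue labels
# (assumes the Hugging Face transformers API; hyperparameters are illustrative).
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=35)

def tokenize(batch):
    # News texts are truncated/padded to a fixed length before encoding.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

# `train_ds` / `dev_ds` stand in for the ML-ESG shared-task splits (not reproduced here):
# train_ds = train_ds.map(tokenize, batched=True)
# dev_ds = dev_ds.map(tokenize, batched=True)

args = TrainingArguments(output_dir="esg-roberta", learning_rate=2e-5,
                         num_train_epochs=3, per_device_train_batch_size=16)
# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=dev_ds)
# trainer.train()
```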

    Microblog Contextualization Using Continuous Space Vectors: Multi-Sentence Compression of Cultural Documents

    In this paper we describe our work for the MC2 CLEF 2017 lab. We participated in the content analysis task, which involves filtering, language recognition and summarization. We combine Information Retrieval with Multi-Sentence Compression methods to contextualize microblogs using Wikipedia pages.
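
    The retrieval half of this pipeline can be sketched as ranking Wikipedia sentences by lexical similarity to the microblog before compressing them. The sketch assumes scikit-learn; the TF-IDF scoring and the top_k cut-off are illustrative choices, not the exact components of the lab submission.

```python
# Rough sketch: retrieve Wikipedia sentences close to a microblog as candidates
# for multi-sentence compression (assumes scikit-learn; scoring is illustrative).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_context(microblog, wiki_sentences, top_k=5):
    vectorizer = TfidfVectorizer()
    # First row is the microblog, remaining rows are the candidate sentences.
    matrix = vectorizer.fit_transform([microblog] + wiki_sentences)
    scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
    ranked = sorted(zip(scores, wiki_sentences), reverse=True)
    return [sentence for _, sentence in ranked[:top_k]]
```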

    Automatic Text Summarization with a Reduced Vocabulary Using Continuous Space Vectors

    In this paper, we propose a new method that uses continuous vectors to map words to a reduced vocabulary, in the context of Automatic Text Summarization (ATS). The method is evaluated on the MultiLing corpus with the ROUGE evaluation measures and four ATS systems. Our experiments show that the reduced vocabulary improves the performance of state-of-the-art systems.
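
    A minimal sketch of the vocabulary-reduction idea: replace each word outside a reduced vocabulary with its nearest neighbour (by cosine similarity) inside that vocabulary, using pre-trained continuous vectors. The embeddings dictionary and the nearest-neighbour criterion are assumptions for illustration, not necessarily the paper's exact mapping.

```python
# Map words onto a reduced vocabulary via nearest neighbours in embedding space
# (embeddings dict and cosine criterion are assumed for this sketch).
import numpy as np

def reduce_vocabulary(tokens, embeddings, reduced_vocab):
    """embeddings: dict word -> 1-D np.ndarray; reduced_vocab: iterable of kept words."""
    kept = [w for w in reduced_vocab if w in embeddings]
    kept_set = set(kept)
    matrix = np.stack([embeddings[w] for w in kept]).astype(float)
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)  # unit vectors for cosine
    mapped = []
    for tok in tokens:
        if tok in kept_set or tok not in embeddings:
            mapped.append(tok)  # keep in-vocabulary and unknown words unchanged
            continue
        vec = embeddings[tok] / np.linalg.norm(embeddings[tok])
        mapped.append(kept[int(np.argmax(matrix @ vec))])  # nearest reduced-vocab word
    return mapped
```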

    SASI: an automatic document summarizer based on the independent vertex set problem

    XLVI Simpósio Brasileiro de Pesquisa Operacional. This article presents a document summarizer system named SASI. The system takes an innovative approach to producing automatic summaries, based on finding a maximum independent set of vertices: the problem is modeled as a graph of phrases (vertices) and the relationships between them (edges). We describe the concepts and operation of the proposed summarizer, along with a series of tests comparing the results produced by SASI with those of other summarizer systems. Initial results are promising, both in the informativeness of the produced summaries and in terms of running time and algorithmic complexity.
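
    The core idea can be sketched as follows: build a graph whose vertices are textual units (sentences here) and whose edges connect redundant pairs, then select an independent set so the summary avoids repeated content. Since the maximum independent set problem is NP-hard, the sketch uses a greedy approximation; the TF-IDF similarity and the threshold are assumptions, not SASI's exact components.

```python
# Independent-set style summarization sketch: edges mark redundant sentence pairs,
# a greedy independent set keeps mutually non-redundant sentences.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def sasi_like_summary(sentences, sim_threshold=0.3):
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sims = cosine_similarity(tfidf)
    g = nx.Graph()
    g.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if sims[i, j] >= sim_threshold:   # edge = redundant pair
                g.add_edge(i, j)
    # Greedy approximation of a maximum independent set:
    # repeatedly take the lowest-degree vertex and discard its neighbours.
    chosen, remaining = [], set(g.nodes)
    while remaining:
        v = min(remaining, key=g.degree)
        chosen.append(v)
        remaining -= {v} | set(g.neighbors(v))
    return [sentences[i] for i in sorted(chosen)]
```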

    A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming

    Multi-Sentence Compression (MSC) aims to generate a short sentence with the key information from a cluster of similar sentences. MSC enables summarization and question-answering systems to generate outputs that combine fully formed sentences from one or several documents. This paper describes an Integer Linear Programming method for MSC that uses a vertex-labeled graph to select different keywords, with the goal of generating more informative sentences while maintaining their grammaticality. Our system produces good-quality compressions and outperforms the state of the art in evaluations carried out on news datasets in three languages: French, Portuguese and Spanish. We conducted both automatic and manual evaluations to determine the informativeness and the grammaticality of the compressions for each dataset. In additional tests, which take advantage of the fact that the length of the compressions can be modulated, we still improve ROUGE scores with shorter output sentences.
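
    A strongly simplified sketch of the word-graph idea behind MSC: shared words become vertices, consecutive words become weighted edges, and a short start-to-end path gives a compression. The paper's actual method adds vertex labels for keywords and solves an Integer Linear Program; the shortest-path variant below, with an ad-hoc frequency-based weighting, is only an illustration.

```python
# Simplified word-graph compression sketch (not the paper's ILP formulation):
# frequent shared words get cheaper edges, and the cheapest <start>-to-<end> path
# is returned as the compressed sentence.
import networkx as nx
from collections import Counter

def compress(sentences):
    g = nx.DiGraph()
    counts = Counter(w for s in sentences for w in s.split())
    for s in sentences:
        words = ["<start>"] + s.split() + ["<end>"]
        for a, b in zip(words, words[1:]):
            # Edges between frequent words are cheaper, so shared content is favoured.
            w = 1.0 / (counts.get(a, 1) + counts.get(b, 1))
            if g.has_edge(a, b):
                g[a][b]["weight"] = min(g[a][b]["weight"], w)
            else:
                g.add_edge(a, b, weight=w)
    path = nx.shortest_path(g, "<start>", "<end>", weight="weight")
    return " ".join(path[1:-1])

# Usage:
print(compress(["the cat sat on the mat", "a cat sat quietly on the mat today"]))
```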

    Cross-Language Text Summarization using Sentence and Multi-Sentence Compression

    Cross-Language Automatic Text Summarization produces a summary in a language different from the language of the source documents. In this paper, we propose a French-to-English cross-lingual summarization framework that analyzes the information in both languages to identify the most relevant sentences. In order to generate more informative cross-lingual summaries, we introduce the use of chunks and two compression methods at the sentence and multi-sentence levels. Experimental results on the MultiLing 2011 dataset show that our framework improves on the results obtained by state-of-the-art approaches according to ROUGE metrics.
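
    The joint analysis of both languages can be sketched as a combined sentence-scoring step: each source sentence is scored together with its translation, and the top-ranked sentences feed the compression stage. The centrality scoring, the equal weighting, and the assumption that aligned translations are already available are illustrative simplifications, not the framework's actual components.

```python
# Cross-lingual sentence ranking sketch: combine centrality scores computed on the
# French sentences and on their English translations (assumes a 1:1 alignment).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_sentences(fr_sentences, en_translations, n=5):
    def centrality(sentences):
        m = TfidfVectorizer().fit_transform(sentences)
        return cosine_similarity(m).mean(axis=1)          # closeness to the document
    scores = 0.5 * centrality(fr_sentences) + 0.5 * centrality(en_translations)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return [en_translations[i] for i in sorted(top)]      # keep document order
```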