43 research outputs found

    Ground Truth Spanish Automatic Extractive Text Summarization Bounds

    Textual information is growing rapidly in the languages most spoken by native Internet users, such as Chinese, Spanish, English, Arabic, Hindi, Portuguese, Bengali, and Russian, among others. This calls for innovation in Automatic Text Summarization (ATS) methods, which extract essential information so that readers need not read the entire text. The most competent methods are Extractive ATS (EATS) methods, which extract essential parts of the document (sentences, phrases, or paragraphs) to compose a summary. Over the last 60 years of EATS research, the creation of standard corpora with human-generated summaries, together with evaluation methods that correlate highly with human judgments, has helped to increase the number of new state-of-the-art methods. However, these methods mainly support English, leaving aside other equally important languages such as Spanish, which is the second most spoken language by native speakers and the third most used on the Internet. A standard corpus for Spanish EATS (SAETS) is created to evaluate state-of-the-art methods and systems for the Spanish language. The main contribution is a proposal for the configuration and evaluation of five state-of-the-art methods, five systems, and four heuristics using three evaluation methods (ROUGE, ROUGE-C, and Jensen-Shannon divergence). This is the first time that Jensen-Shannon divergence has been used to evaluate AETS. The paper presents the ground truth bounds for the Spanish language, given by the heuristics baseline:first, baseline:random, topline, and concordance. In addition, a ranking of 30 evaluation tests of the state-of-the-art methods and systems is calculated, forming a benchmark for SAETS.
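
To make the evaluation idea in this abstract concrete, the following is a minimal sketch of a Jensen-Shannon divergence between the word distributions of a source text and a candidate summary, plus a baseline:first heuristic that keeps the leading sentences. The function names, the whitespace tokenization, and the base-2 logarithm are assumptions made for illustration; the abstract does not state how the paper tokenizes, smooths, or normalizes its distributions.

```python
import math
from collections import Counter


def word_distribution(text):
    """Relative frequency of each lowercase, whitespace-separated token."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions.

    With base-2 logarithms the value lies in [0, 1]: 0 for identical
    distributions, 1 for distributions with no words in common.
    """
    divergence = 0.0
    for word in set(p) | set(q):
        pw = p.get(word, 0.0)
        qw = q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution M = (P + Q) / 2
        if pw > 0:
            divergence += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            divergence += 0.5 * qw * math.log2(qw / mw)
    return divergence


def baseline_first(sentences, n):
    """Hypothetical baseline:first heuristic: keep the first n sentences."""
    return sentences[:n]
```

Under these assumptions, a lower divergence between `word_distribution(source)` and `word_distribution(summary)` indicates a summary whose vocabulary is distributed more like that of the source.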

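    The effect of sentence-count determination on the quality of extractive summaries of Hungarian-language legal texts (Mondatszám-meghatározás hatása a magyar nyelvű jogi szövegek extraktív kivonatainak minőségére)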

    When summarizing the content of individual documents, the goal is to produce a shorter version of a document such that its main information content is preserved in the summary. In this article we report on the experience gained while developing an extractive summarization system for anonymized court decisions, with particular attention to the questions that arose concerning the length of the summaries (number of sentences) and our answers to them. The summaries were intended to help users of a legal database orient themselves more easily in the list of search results.

    Security enhanced sentence similarity computing model based on convolutional neural network

    Deep learning models show great advantages in various fields. However, researchers have focused on improving model accuracy while neglecting security considerations. The problem of attack examples being used to control the judgments of a deep learning model, and thereby affect system decision-making, is gradually being exposed. To improve the security of sentence similarity analysis models, we propose a convolutional neural network model based on an attention mechanism. First, the mutual information between sentences is correlated by attention weighting. The result is then fed into an improved convolutional neural network. In addition, we add attack examples, generated by the firefly algorithm, to the input. In each attack example, we replace some of the words in the sentence, which yields adversarial data with a large semantic change but only a slight change in sentence structure. To a certain extent, adding attack examples increases the model's ability to identify adversarial data and improves its robustness. Experimental results show that the accuracy, recall, and F1 value of the model are superior to those of other baseline models. This work was supported in part by the Major Scientific and Technological Projects of China National Petroleum Corporation (CNPC) under Grant ZD2019-183-006, in part by the Shandong Provincial Natural Science Foundation, China, under Grant ZR2020MF006, in part by the Fundamental Research Funds for the Central Universities of China University of Petroleum (East China) under Grant 20CX05017A, and in part by the Open Foundation of State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) under Grant SKLNST-2021-1-17. Postprint (author's final draft)

    Intelligent automatic generation of text summaries with soft computing techniques (Generación automática inteligente de resúmenes de textos con técnicas de soft computing)

    This thesis was developed following the lines of research that the Instituto de Investigación en Informática LIDI (III-LIDI, Argentina) and the research group Soft Management of Internet and Learning (SMILe, Spain) pursue collaboratively. It had the external support of Professors Cristina Puente (Universidad Pontificia Comillas), Aurelio F. Bariviera (Universidad Rovira i Virgili), and Alejandro Sobrino (Universidad de Santiago de Compostela). It was presented by Augusto Villa Monte, as part of his jointly supervised doctorate, as a requirement for obtaining the degree of Doctor in Computer Science from the Universidad Nacional de La Plata (UNLP, Argentina) and Doctor in Advanced Information Technologies from the Universidad de Castilla-La Mancha (UCLM, Spain). Doctoral thesis carried out under joint supervision between the Universidad Nacional de La Plata and the Universidad de Castilla-La Mancha (Spain). Degree obtained: Doctor in Computer Science. Thesis supervisors: Laura Lanzarini (UNLP) and José Ángel Olivas Varela (UCLM). The thesis, presented in 2019, received the "Dr. Raúl Gallard" Prize in 2020. Red de Universidades con Carreras en Informática

    Applications of Artificial Intelligence in Battling Against Covid-19: A Literature Review

    © 2020 Elsevier Ltd. All rights reserved. Colloquially known as coronavirus, the Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2), which causes CoronaVirus Disease 2019 (COVID-19), has become a matter of grave concern for every country around the world. The rapid growth of the pandemic has wreaked havoc and prompted the need for immediate reactions to curb its effects. To manage these problems, researchers in many areas of science have started studying the issue. Artificial Intelligence (AI) is among the areas of science that have found great application in tackling the problem in many respects. Here, we present an overview of the applications of AI in a variety of fields, including diagnosis of the disease via different types of tests and symptoms, patient monitoring, identifying the severity of a patient's condition, processing COVID-19-related imaging tests, epidemiology, and pharmaceutical studies. The aim of this paper is to provide a comprehensive survey of the applications of AI in battling the difficulties the outbreak has caused. We therefore aim to cover every way in which AI approaches have been employed, and all of the research up to the time of writing. We try to organize the works so that the overall picture is comprehensible. Such a picture, although full of details, is very helpful in understanding where AI sits in the current pandemonium. We also conclude the paper with ideas on how the problems could be tackled better and offer some suggestions for future work. Peer reviewed

    How to Rank Answers in Text Mining

    In this thesis, we mainly focus on case studies about answers. We present the CEW-DTW methodology and assess its performance in terms of ranking quality. We then improve this methodology by combining CEW-DTW with Kullback-Leibler divergence, since Kullback-Leibler divergence can measure the difference between the probability distributions of two sequences. However, CEW-DTW and KL-CEW-DTW do not account for the effect of noise and keywords from the viewpoint of probability distributions. Therefore, we develop a new methodology, the General Entropy, to see how the probabilities of noise and keywords affect answer quality. We first analyze some properties of the General Entropy, such as its value range. In particular, we look for an objective goal that can serve as a standard for assessing answers, and to that end we introduce the maximum general entropy. We use the general entropy methodology to find, from a mathematical viewpoint, an imaginary answer with the maximum entropy (though this answer may not exist). This answer can be regarded as an “ideal” answer. Comparing the maximum entropy probabilities with the global probabilities of noise and keywords respectively, we find that the maximum entropy probability of noise is smaller than the global probability of noise, and that the maximum entropy probabilities of the chosen keywords are larger than the global probabilities of keywords under some conditions. This allows us to determine the maximum number of keywords to select. We also use an Amazon dataset and a small-group survey to assess the general entropy. Although these methodologies can analyze answer quality, they do not incorporate the inner connections among keywords and noise. Based on the Markov transition matrix, we therefore develop the Jump Probability Entropy. We again use the Amazon dataset to compare the maximum jump entropy probabilities with the global jump probabilities of noise and keywords respectively.
    Finally, we give the steps for obtaining answers from the Amazon dataset, including extracting the original answers and removing stop words and collinearity. We compare the methodologies we developed to see whether they are consistent. We also introduce the Wald–Wolfowitz runs test and compare it with the developed methodologies to verify their relationships. Based on the results of these comparisons, we draw conclusions about the consistency of these methodologies and outline future plans.
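
As an illustration of the Wald–Wolfowitz runs test named in this abstract, here is a minimal sketch that counts the runs in a two-symbol sequence and computes the usual normal-approximation z-statistic. Encoding an answer as a sequence of keyword ("K") and noise ("N") tokens is an assumption made for this example; the abstract does not say how the thesis encodes answers or applies the test.

```python
import math


def runs_test(sequence):
    """Wald-Wolfowitz runs test for a sequence over exactly two symbols.

    Returns (runs, z): the observed number of runs and the z-statistic
    under the normal approximation to the distribution of runs.
    """
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")

    n1 = sum(1 for s in sequence if s == symbols[0])
    n2 = len(sequence) - n1
    n = n1 + n2

    # A new run starts at every position where the symbol changes.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)

    # Mean and variance of the number of runs under randomness.
    mean = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if variance == 0.0:
        raise ValueError("sequence is too short for the normal approximation")

    z = (runs - mean) / math.sqrt(variance)
    return runs, z
```

A z-statistic far from zero suggests that the two symbols are not randomly interleaved; the normal approximation is only reasonable for sequences that are not too short.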

    Proceedings of the 1st Doctoral Consortium at the European Conference on Artificial Intelligence (DC-ECAI 2020)

    1st Doctoral Consortium at the European Conference on Artificial Intelligence (DC-ECAI 2020), 29-30 August 2020, Santiago de Compostela, Spain. The DC-ECAI 2020 provides a unique opportunity for PhD students who are close to finishing their doctoral research to interact with experienced researchers in the field. Senior members of the community are assigned as mentors for each group of students based on the students' research or similarity of research interests. The DC-ECAI 2020, which is held virtually this year, allows students from all over the world to present their research, discuss their ongoing research and career plans with their mentor, network with other participants, and receive training and mentoring on career planning and career options.