Ground Truth Spanish Automatic Extractive Text Summarization Bounds
Textual information has grown rapidly in the languages most spoken by Internet users, such as Chinese, Spanish, English, Arabic, Hindi, Portuguese, Bengali, and Russian. This makes it necessary to advance methods of Automatic Text Summarization (ATS) that can extract essential information without reading the entire text. The most competitive methods are Extractive ATS (EATS), which extract essential parts of a document (sentences, phrases, or paragraphs) to compose a summary. Over the last 60 years of EATS research, the creation of standard corpora with human-generated summaries, together with evaluation methods that correlate highly with human judgments, has driven a growing number of new state-of-the-art methods. However, these methods mainly target the English language, leaving aside other equally important languages such as Spanish, the second most spoken language by native speakers and the third most used on the Internet. A standard corpus for Spanish Automatic Extractive Text Summarization (SAETS) is created to evaluate state-of-the-art methods and systems for the Spanish language. The main contribution is a proposal for the configuration and evaluation of five state-of-the-art methods, five systems, and four heuristics using three evaluation measures (ROUGE, ROUGE-C, and Jensen-Shannon divergence); this is the first time Jensen-Shannon divergence has been used to evaluate EATS. In this paper the ground-truth bounds for the Spanish language are presented, given by the heuristics baseline:first, baseline:random, topline, and concordance. In addition, a ranking over 30 evaluation tests of the state-of-the-art methods and systems is calculated, forming a benchmark for SAETS.
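The Jensen-Shannon divergence mentioned above can be computed between the unigram distributions of a candidate summary and its source document. A minimal sketch follows; the function name and whitespace tokenization are illustrative assumptions, not the paper's actual implementation:

```python
import math
from collections import Counter

def js_divergence(text_a, text_b):
    """Jensen-Shannon divergence between the unigram distributions
    of two texts, in bits (base 2), bounded in [0, 1]."""
    ca, cb = Counter(text_a.split()), Counter(text_b.split())
    ta, tb = sum(ca.values()), sum(cb.values())
    js = 0.0
    for w in set(ca) | set(cb):
        p, q = ca[w] / ta, cb[w] / tb
        m = (p + q) / 2  # mixture distribution
        if p > 0:
            js += 0.5 * p * math.log2(p / m)
        if q > 0:
            js += 0.5 * q * math.log2(q / m)
    return js
```

A summary whose word distribution closely matches the source document yields a divergence near 0, while disjoint vocabularies yield the maximum value of 1.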
The effect of sentence-count selection on the quality of extractive summaries of Hungarian legal texts
When summarizing individual documents, the goal is to produce a shorter version of a document such that its main information content is preserved in the extract. In this paper we report the experience gained while developing an extractive summarization system for anonymized court decisions, with particular attention to the questions that arose concerning summary length (sentence count) and our answers to them. The summaries were intended to help users of a legal database orient themselves more easily in result lists.
Security enhanced sentence similarity computing model based on convolutional neural network
Deep learning models show great advantages in various fields. However, researchers have focused on improving model accuracy while neglecting security considerations, and the problem of manipulating a deep learning model's judgments with attack examples, thereby affecting system decision-making, is gradually being exposed. To improve the security of sentence-similarity analysis models, we propose a convolutional neural network model based on an attention mechanism. First, the mutual information between sentences is correlated by attention weighting and then fed into an improved convolutional neural network. In addition, we add attack examples to the input, generated by the firefly algorithm: words in a sentence are replaced to a certain extent, yielding adversarial data with a large semantic change but only a slight change in sentence structure. To a certain extent, adding attack examples increases the model's ability to identify adversarial data and improves its robustness. Experimental results show that the accuracy, recall, and F1 score of the model are superior to those of other baseline models. This work was supported in part by the Major Scientific and Technological Projects of China National Petroleum Corporation (CNPC) under Grant ZD2019-183-006, in part by the Shandong Provincial Natural Science Foundation, China, under Grant ZR2020MF006, in part by the Fundamental Research Funds for the Central Universities of China University of Petroleum (East China) under Grant 20CX05017A, and in part by the Open Foundation of the State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) under Grant SKLNST-2021-1-17. Postprint (author's final draft).
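The word-substitution step described above, replacing a few words while preserving sentence structure, can be illustrated with a minimal sketch. This is not the paper's firefly-algorithm optimizer; the function name, the substitution dictionary, and the swap limit are illustrative assumptions:

```python
def substitute_words(sentence, substitutions, max_swaps=2):
    """Replace up to max_swaps words with candidate substitutes,
    keeping word count and word order (sentence structure) intact."""
    out, swapped = [], 0
    for word in sentence.split():
        if swapped < max_swaps and word.lower() in substitutions:
            out.append(substitutions[word.lower()])  # semantic change
            swapped += 1
        else:
            out.append(word)  # structure preserved
    return " ".join(out)
```

In an actual attack pipeline, a search procedure such as the firefly algorithm would choose which words to replace so as to maximize the change in the model's similarity judgment.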
Intelligent automatic generation of text summaries with <i>soft computing</i> techniques
This thesis was developed along the research lines pursued collaboratively by the Instituto de Investigación en Informática LIDI (III-LIDI, Argentina) and the research group Soft Management of Internet and Learning (SMILe, Spain). It had the external support of professors Cristina Puente (Universidad Pontificia Comillas), Aurelio F. Bariviera (Universidad Rovira i Virgili), and Alejandro Sobrino (Universidad de Santiago de Compostela). It was presented by Augusto Villa Monte under a jointly supervised (cotutelle) doctorate between the Universidad Nacional de La Plata (UNLP, Argentina) and the Universidad de Castilla-La Mancha (UCLM, Spain), as a requirement for the degrees of Doctor in Computer Science (UNLP) and Doctor in Advanced Computer Technologies (UCLM). Thesis supervisors: Laura Lanzarini (UNLP) and José Ángel Olivas Varela (UCLM). The thesis, presented in 2019, received the "Dr. Raúl Gallard" Award in 2020. Red de Universidades con Carreras en Informática
Applications of Artificial Intelligence in Battling Against Covid-19: A Literature Review
© 2020 Elsevier Ltd. All rights reserved. Colloquially known as coronavirus, the Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2), which causes CoronaVirus Disease 2019 (COVID-19), has become a matter of grave concern for every country around the world. The rapid growth of the pandemic has wreaked havoc and prompted the need for immediate action to curb its effects. To manage these problems, research across many areas of science has begun to address the issue. Artificial Intelligence is among the areas that have found broad application in tackling the problem. Here, we present an overview of the applications of AI in a variety of fields, including diagnosis of the disease via different types of tests and symptoms, patient monitoring, assessing patient severity, processing COVID-19-related imaging tests, epidemiology, pharmaceutical studies, and more. The aim of this paper is to provide a comprehensive survey of the applications of AI in battling the difficulties the outbreak has caused, covering every way AI approaches have been employed in the research up to the time of writing. We organize the works so that the overall picture is comprehensible; such a picture, although full of details, is very helpful for understanding where AI fits into the current pandemic. We conclude with ideas on how the problems can be tackled more effectively and provide some suggestions for future work. Peer reviewed.
How to Rank Answers in Text Mining
In this thesis, we mainly focus on case studies about answers. We present the methodology CEW-DTW and assess its ranking quality. We then improve this methodology by combining Kullback-Leibler divergence with CEW-DTW, since Kullback-Leibler divergence can measure the difference between the probability distributions of two sequences.
However, CEW-DTW and KL-CEW-DTW do not account for the effect of noise and keywords from the viewpoint of probability distributions. Therefore, we develop a new methodology, the General Entropy, to see how the probabilities of noise and keywords affect answer quality. We first analyze some properties of the General Entropy, such as its value range. In particular, we seek an objective goal that can serve as a standard for assessing answers, and so we introduce the maximum general entropy. We use the general entropy methodology to construct, from a mathematical viewpoint, an imaginary answer with maximum entropy (though this answer may not exist); it can be regarded as an "ideal" answer. Comparing maximum-entropy probabilities with global probabilities of noise and keywords respectively, the maximum-entropy probability of noise is smaller than the global probability of noise, while the maximum-entropy probabilities of chosen keywords are larger than the global probabilities of those keywords under some conditions. This allows us to deterministically select the maximum number of keywords. We also use an Amazon dataset and a small survey to assess the General Entropy.
Though these methodologies can analyze answer quality, they do not incorporate the inner connections among keywords and noise. Based on the Markov transition matrix, we develop the Jump Probability Entropy, and we again use the Amazon dataset to compare maximum jump-entropy probabilities with global jump probabilities of noise and keywords respectively.
Finally, we describe the steps for obtaining answers from the Amazon dataset, including extracting the original answers and removing stop words and collinearity. We compare our methodologies to see whether they are consistent. We also introduce the Wald-Wolfowitz runs test and compare it with the developed methodologies to verify their relationships. Based on the results of this comparison, we draw conclusions about the consistency of these methodologies and outline future plans.
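The two quantities underlying the thesis's methodologies, Kullback-Leibler divergence and Shannon entropy (whose maximum defines the "ideal" answer), can be sketched minimally. This is a generic illustration of the standard definitions, not the thesis's CEW-DTW pipeline; function names are assumptions:

```python
import math

def kl_divergence(p, q):
    """D(P || Q) in bits for discrete distributions over the same
    support; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy in bits; maximized (at log2(n)) by the
    uniform distribution over n outcomes."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)
```

The maximum-entropy "ideal" answer corresponds to the distribution maximizing `entropy`; real answers can then be ranked by how far their keyword/noise distributions diverge from it under `kl_divergence`.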
Proceedings of the 1st Doctoral Consortium at the European Conference on Artificial Intelligence (DC-ECAI 2020)
1st Doctoral Consortium at the European Conference on Artificial Intelligence (DC-ECAI 2020), 29-30 August 2020, Santiago de Compostela, Spain. The DC-ECAI 2020 provides a unique opportunity for PhD students who are close to finishing their doctoral research to interact with experienced researchers in the field. Senior members of the community are assigned as mentors for each group of students based on the students' research or similarity of research interests. The DC-ECAI 2020, held virtually this year, allows students from all over the world to present and discuss their ongoing research and career plans with their mentor, to network with other participants, and to receive training and mentoring on career planning and career options.
An Evaluation of Computational Methods to Support the Clinical Management of Chronic Disease Populations
Innovative primary care models that deliver comprehensive primary care to address medical and social needs are an established means of improving health outcomes and reducing healthcare costs among persons living with chronic disease. Care management is one such approach, requiring providers to monitor their respective patient panels and intervene on patients requiring care. Health information technology (IT) has been established as a critical component of care management and similar care models. While a plethora of health IT systems exists for facilitating primary care, there is limited research on their ability to support care management and its emphasis on monitoring panels of patients with complex needs. In this dissertation, I advance the understanding of how computational methods can better support clinicians delivering care management, using the management of human immunodeficiency virus (HIV) as an example scenario of use.
The research described herein is segmented into three aims. The first was to understand the processes and barriers associated with care management and to assess whether existing IT can support clinicians in this domain; the second and third aims focused on informing potential solutions to the technological shortcomings identified in the first. In the studies of the first aim, I conducted interviews and observations in two HIV primary care programs and analyzed the resulting data to create a conceptual framework of population monitoring and to identify challenges clinicians face in delivering care management. In the studies of the second aim, I used computational methods to advance the science of extracting social and behavioral determinants of health (SBDH) from the patient record; SBDH are not easily accessible to clinicians and represent an important barrier to care management. In the third aim, I conducted a controlled experimental evaluation to assess whether data visualization can improve clinicians' ability to maintain awareness of their patient panels.