Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey
The emergence of natural language processing has revolutionized the way users
interact with tabular data, enabling a shift from traditional query languages
and manual plotting to more intuitive, language-based interfaces. The rise of
large language models (LLMs) such as ChatGPT and its successors has further
advanced this field, opening new avenues for natural language processing
techniques. This survey presents a comprehensive overview of natural language
interfaces for tabular data querying and visualization, which allow users to
interact with data using natural language queries. We introduce the fundamental
concepts and techniques underlying these interfaces with a particular emphasis
on semantic parsing, the key technology facilitating the translation from
natural language to SQL queries or data visualization commands. We then delve
into the recent advancements in Text-to-SQL and Text-to-Vis problems from the
perspectives of datasets, methodologies, metrics, and system designs. This
includes a deep dive into the influence of LLMs, highlighting their strengths,
limitations, and potential for future improvements. Through this survey, we aim
to provide a roadmap for researchers and practitioners interested in developing
and applying natural language interfaces for data interaction in the era of
large language models.
Comment: 20 pages, 4 figures, 5 tables. Submitted to IEEE TKD
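The Text-to-SQL translation this survey centres on can be illustrated with a minimal, hypothetical sketch: a single regex rule stands in for the semantic parser (or an LLM), and the resulting SQL runs against an in-memory SQLite table. The `employees` schema and the question template are invented for illustration, not taken from the survey.

```python
import re
import sqlite3

def parse_question(question: str) -> str:
    """Toy semantic parser: maps one NL template onto a SQL query.
    Real systems use grammar-based parsers or LLMs, and must also
    guard against generating unsafe SQL."""
    m = re.match(r"how many (\w+) are in the (\w+) department\??", question.lower())
    if not m:
        raise ValueError("question not covered by this toy grammar")
    table, dept = m.groups()
    return f"SELECT COUNT(*) FROM {table} WHERE department = '{dept}'"

# Tiny in-memory database standing in for the user's tabular data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ana", "sales"), ("Bob", "sales"), ("Eve", "hr")])

sql = parse_question("How many employees are in the sales department?")
count = conn.execute(sql).fetchone()[0]
print(sql)    # SELECT COUNT(*) FROM employees WHERE department = 'sales'
print(count)  # 2
```

The same natural-language front end could instead emit a plotting command, which is the Text-to-Vis variant of the problem.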
Construction of a disaster-support dynamic knowledge chatbot
This dissertation is aimed at devising a disaster-support chatbot system with the capacity
to enhance citizens’ and first responders’ resilience in disaster scenarios, by gathering and
processing information from crowd-sensing sources and informing its users with relevant
knowledge about detected disasters and how to deal with them.
This system is composed of two artifacts that interact via a mediator graph-structured
knowledge base. Our first artifact is a crowd-sourced disaster-related knowledge extraction
system, which uses social media to leverage humans behaving as sensors. It
consists of a pipeline of natural language processing (NLP) tools and a mixture of convolutional
neural networks (CNNs) and lexicon-based models for classifying and extracting
disaster-related information. It then outputs the extracted information to the knowledge graph (KG), for
presenting connected insights. The second artifact, the disaster-support chatbot, uses
a state-of-the-art Dual Intent Entity Transformer (DIET) architecture to classify user
intents, and makes use of several dialogue policies for managing user conversations, as
well as storing relevant information to be used in later dialogue turns. To generate
responses, the chatbot uses local and official disaster-related knowledge, and queries the
knowledge graph for the dynamic knowledge extracted by the first artifact.
According to the achieved results, our devised system is on par with the state of the
art in disaster extraction systems. Both artifacts have also been validated by field
specialists, who have considered them valuable assets in disaster management.
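The chatbot side of the architecture can be sketched in miniature: a keyword matcher stands in for the DIET intent classifier, a simple routing function stands in for the dialogue policies, and a Python dict stands in for the mediator knowledge graph. All intents, entries, and wording below are invented for illustration, not taken from the dissertation.

```python
# Toy knowledge graph standing in for the mediator graph-structured KB;
# in the dissertation this is populated by the crowd-sensing pipeline.
knowledge_graph = {
    ("flood", "location"): "downtown riverbank",
    ("flood", "advice"): "move to higher ground and avoid driving through water",
}

def classify_intent(utterance: str) -> str:
    """Keyword matcher standing in for the DIET intent classifier."""
    text = utterance.lower()
    if any(w in text for w in ("where", "location")):
        return "ask_location"
    if any(w in text for w in ("what should", "how do", "advice")):
        return "ask_advice"
    return "fallback"

def respond(utterance: str, disaster: str = "flood") -> str:
    """Dialogue policy: route each intent to a knowledge-graph lookup."""
    intent = classify_intent(utterance)
    if intent == "ask_location":
        return f"The {disaster} was detected near {knowledge_graph[(disaster, 'location')]}."
    if intent == "ask_advice":
        return f"Please {knowledge_graph[(disaster, 'advice')]}."
    return "Sorry, I can only answer questions about detected disasters."

print(respond("Where is the flood?"))
# The flood was detected near downtown riverbank.
```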
Automatic information search for countering covid-19 misinformation through semantic similarity
Master's thesis in Bioinformatics and Computational Biology.
Information quality in social media is an increasingly important issue, and the misinformation problem has become even more critical in the current COVID-19 pandemic, leaving people exposed
to false and potentially harmful claims and rumours. Civil society organizations, such as the
World Health Organization, have issued a global call to action to promote access to health
information and mitigate harm from health misinformation. Consequently, this project pursues
countering the spread of the COVID-19 infodemic and its potential health hazards.
In this work, we give an overall view of models and methods that have been employed in the
NLP field from its foundations to the latest state-of-the-art approaches. Focusing on deep learning methods, we propose applying multilingual Transformer models based on siamese networks,
also called bi-encoders, combined with ensemble and PCA dimensionality reduction techniques.
The goal is to counter COVID-19 misinformation by analyzing the semantic similarity between
a claim and tweets from a collection gathered from official fact-checkers verified by the International Fact-Checking Network of the Poynter Institute.
The number of Internet users increases every year, and the language spoken
determines access to information online. For this reason, we put special effort into applying multilingual models to tackle misinformation across the globe. Regarding semantic
similarity, we first evaluate these multilingual ensemble models and improve the result on the
STS-Benchmark compared to monolingual and single models. Second, we enhance the interpretability of the models’ performance through the SentEval toolkit. Lastly, we compare these
models’ performance against biomedical models in TREC-COVID task round 1 using the BM25
Okapi ranking method as the baseline. Moreover, we are interested in understanding the ins
and outs of misinformation. For that purpose, we extend interpretability using machine learning
and deep learning approaches for sentiment analysis and topic modelling. Finally, we developed
a dashboard to ease visualization of the results.
In our view, the results obtained in this project constitute an excellent initial step toward
incorporating multilingualism and will assist researchers and people in countering COVID-19
misinformation.
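The bi-encoder idea described above, encoding claim and tweets independently and ranking fact-checked tweets by cosine similarity, can be sketched minimally. A bag-of-words counter stands in for the multilingual transformer embedding, and the claim and tweets are invented examples, not data from the project.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Bag-of-words vector standing in for a sentence-transformer embedding."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

claim = "vitamin c cures covid"
tweets = [
    "fact-check: vitamin c does not cure covid",
    "new stadium opens downtown today",
]
# Rank fact-checked tweets by similarity to the claim, as a bi-encoder would.
ranked = sorted(tweets, key=lambda t: cosine(embed(claim), embed(t)), reverse=True)
print(ranked[0])  # the fact-check tweet ranks first
```

In the real system the independent encoding is what makes retrieval over a large fact-check collection tractable, since tweet embeddings can be precomputed.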
Translation Alignment Applied to Historical Languages: methods, evaluation, applications, and visualization
Translation alignment is an essential task in Digital Humanities and Natural
Language Processing, and it aims to link words/phrases in the source
text with their translation equivalents in the translation. In addition to
its importance in teaching and learning historical languages, translation
alignment builds bridges between ancient and modern languages through
which various linguistic annotations can be transferred. This thesis focuses
on word-level translation alignment applied to historical languages in general
and Ancient Greek and Latin in particular. As the title indicates, the thesis
addresses four interdisciplinary aspects of translation alignment.
The starting point was developing Ugarit, an interactive annotation tool
to perform manual alignment aiming to gather training data to train an
automatic alignment model. This effort resulted in more than 190k accurate
translation pairs that I later used for supervised training. Ugarit has been
used by many researchers and scholars, and also in the classroom at several
institutions for teaching and learning ancient languages. This has resulted
in a large, diverse, crowd-sourced aligned parallel corpus that allowed us to
conduct experiments and qualitative analyses to detect recurring patterns in
annotators’ alignment practice and in the generated translation pairs.
Further, I employed the recent advances in NLP and language modeling to
develop an automatic alignment model for low-resource historical languages,
experimenting with various training objectives and proposing a training
strategy for historical languages that combines supervised and unsupervised
training with mono- and multilingual texts. Then, I integrated this alignment
model into other development workflows to project cross-lingual annotations
and induce bilingual dictionaries from parallel corpora.
Evaluation is essential to assess the quality of any model. To follow best
practice, I reviewed the current evaluation procedure, identified its
limitations, and proposed two new evaluation metrics. Moreover, I introduced
a visual analytics framework to explore and inspect gold-standard alignment
datasets and to support quantitative and qualitative evaluation of
translation alignment models. In addition, I designed and implemented visual
analytics tools and reading environments for parallel texts, and proposed
various visualization approaches to support different alignment-related
tasks, employing the latest advances in information visualization and best
practices.
Overall, this thesis presents a comprehensive study that includes manual and
automatic alignment techniques, evaluation methods, and visual analytics
tools that aim to advance the field of translation alignment for historical
languages.
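Word-level translation alignment, the core task of this thesis, can be illustrated with a toy sketch: a tiny Latin-English lexicon stands in for the learned cross-lingual representations, and a greedy pass links each source word to a licensed target word. The sentence pair and lexicon are invented, and real models such as the one trained here score all word pairs with contextual embeddings instead of a fixed dictionary.

```python
# Toy Latin-English lexicon standing in for learned cross-lingual embeddings.
lexicon = {
    "puella": {"girl"},
    "rosam": {"rose"},
    "amat": {"loves"},
}

def align(source: list[str], target: list[str]) -> list[tuple[str, str]]:
    """Greedy word-level alignment: link each source word to the first
    target word the lexicon licenses; unmatched words stay unaligned."""
    pairs = []
    for s in source:
        for t in target:
            if t in lexicon.get(s, set()):
                pairs.append((s, t))
                break
    return pairs

pairs = align("puella rosam amat".split(), "the girl loves the rose".split())
print(pairs)  # [('puella', 'girl'), ('rosam', 'rose'), ('amat', 'loves')]
```

Note how the alignment crosses word order (Latin object before verb), which is exactly what makes projecting annotations between ancient and modern languages non-trivial.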
A Survey on Legal Question Answering Systems
Many legal professionals think that the explosion of information about local,
regional, national, and international legislation makes their practice more
costly, time-consuming, and even error-prone. The two main reasons for this are
that most legislation is usually unstructured, and the tremendous volume and
pace with which laws are released cause information overload in their daily
tasks. In the legal domain, the research community agrees that a
system able to generate automatic responses to legal questions could have a
substantial practical impact on daily activities. The degree of usefulness is
such that even a semi-automatic solution could significantly help to reduce
the workload. This is mainly because a Question Answering system can
automatically process a massive amount of legal resources to answer a
question or doubt in seconds, saving effort, money, and time for many
professionals in the legal sector. In this work, we quantitatively and
qualitatively survey the solutions that currently exist to meet this challenge.
Comment: 57 pages, 1 figure, 10 tables
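The retrieval step at the heart of many of the surveyed legal QA systems, scoring statute passages against a question and returning the best match for a lawyer to verify, can be sketched with simple term overlap. The passages, article ids, and stopword list are invented for illustration; surveyed systems use far stronger rankers (e.g., BM25 or neural retrievers).

```python
# Minimal retrieval step of a semi-automatic legal QA pipeline.
# Passages and the question are invented examples.
passages = {
    "art-13": "the tenant must receive written notice thirty days before eviction",
    "art-7": "contracts require the signature of both parties to be valid",
}

STOPWORDS = {"the", "a", "an", "of", "to", "is", "how", "much", "before", "be"}

def score(question: str, passage: str) -> int:
    """Count content words shared by question and passage."""
    q = {w.strip("?,.") for w in question.lower().split()} - STOPWORDS
    return len(q & set(passage.split()))

def retrieve(question: str) -> str:
    """Return the id of the best-matching passage; a legal professional
    still reads it to form the final answer (the semi-automatic setting)."""
    return max(passages, key=lambda pid: score(question, passages[pid]))

print(retrieve("how much notice must a tenant receive before eviction?"))
# art-13
```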
Web interaction environments: characterising Web accessibility at the large
Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2012.
Accessibility quality on the Web is essential for providing a good Web experience to people with disabilities. The existence of virtual ramps helps these users grasp and interact with Web content, just like the experience of those who are unimpaired. However, more often than not, Web pages impose accessibility barriers, usually centred on the unavailability of content tailored to specific perceptual abilities (e.g., textual descriptions of images, enabling grasping information with assistive technologies), as well as on the lack of proper HTML structural elements that match the semantics of a Web page. When evaluating the accessibility quality of Web pages, the resulting analysis is often focused on a small sample (e.g., a single Web page or a selection of pages from a Web site). While this kind of analysis gets the gist of accessibility quality, it misses the big picture on the overall accessibility quality of the Web. This thesis addresses the challenge of observing accessibility phenomena on the Web through the experimental evaluation of large collections of Web pages. This resulted in new findings about the accessibility quality of the Web, such as its correlation with HTML element count and the erroneous perception of accessibility quality by developers. Small-scale findings have also been verified at large scale, such as the correlation between the usage of HTML templates and accessibility quality.
Based on the challenges raised by the experimental evaluation, this thesis proposes a novel approach for large-scale Web accessibility evaluation based on Linked Data, as well as the establishment of metrics to assess the truthfulness and coverage of automated evaluation methods.
These studies yielded the following contributions and results: a drastic difference in the interpretation of warnings from Web accessibility evaluations: one of the main results of the large-scale experimental evaluation highlights the gap in interpreting the warnings produced by applying WCAG techniques, where the optimistic interpretation (i.e., the view of most Web page developers) diverges widely from the conservative interpretation (where warnings are treated as errors); a correlation between the accessibility quality of a Web page and its complexity: the same large-scale study revealed a correlation between the complexity of a Web page (with respect to the number of HTML elements it contains) and its accessibility quality, where the lower the complexity of a Web page, the more likely that page is to have high accessibility quality; the benefit of templates and content management systems for improving Web page accessibility: in both experimental accessibility studies, a correlation was detected between the accessibility quality of Web pages and the use of templates and content management systems, a property verified both at small scale (over a collection of Wikipedia Web pages) and at large scale; the violation of the most elementary and best-known accessibility rules: these experimental studies also showed that, despite all the evangelism and education about Web accessibility issues, most accessibility rules are constantly broken by most Web pages. This problem occurs, in particular, with the best-known accessibility rules, such as the availability of alternative texts for multimedia content.
Based on these experiments and results, this thesis presents a new model for studying Web accessibility, grounded in the cycle of large-scale Web studies. This model yielded the following contributions: a model for distributed Web accessibility evaluation based on technological and topological properties: a Web accessibility evaluation model was conceived that enables the design of evaluation systems based on technological and topological properties, and that makes it possible, among other features, to study the coverage of accessibility evaluation platforms and evaluators, as well as their application at large scale; an extension to the EARL and Linked Data languages and models, together with a set of definitions for extracting information from them: the evaluation model was also grounded in its realization in existing languages and models for studying accessibility (EARL) and the Web at large scale (Linked Data), thus enabling its validation; a definition of the limits of Web accessibility evaluation: finally, the evaluation model also made it possible to outline a methodology for the meta-evaluation of accessibility, into which the properties of existing accessibility evaluators can be framed. All these contributions also resulted in a set of scientific publications, including: Rui Lopes and Luís Carriço, A Web Science Perspective of Web Accessibility, in submission for the ACM Transactions on Accessible Computing (TACCESS), ACM, 2011; Rui Lopes and Luís Carriço, Macroscopic Characterisations of Web Accessibility, New Review of Hypermedia and Multimedia – Special Issue on Web Accessibility.
Taylor & Francis, 2010; Rui Lopes, Karel Van Isacker and Luís Carriço, Redefining Assumptions: Accessibility and Its Stakeholders, The 12th International Conference on Computers Helping People with Special Needs (ICCHP), Vienna, Austria, 14-16 July 2010; Rui Lopes, Daniel Gomes and Luís Carriço, Web Not For All: A Large Scale Study of Web Accessibility, W4A: 7th ACM International Cross-Disciplinary Conference on Web Accessibility, Raleigh, North Carolina, USA, 26-27 April 2010; Rui Lopes, Konstantinos Votis, Luís Carriço, Dimitrios Tzovaras, and Spiridon Likothanassis, The Semantics of Personalised Web Accessibility Assessment, 25th Annual ACM Symposium on Applied Computing (SAC), Sierre, Switzerland, 22-26 March, 2010 Konstantinos Votis, Rui Lopes, Dimitrios Tzovaras, Luís Carriço and Spiridon Likothanassis, A Semantic Accessibility Assessment Environment for Design and Development for the Web, HCI International 2009 (HCII 2009), San Diego, California, USA, 19-24 July 2009 Rui Lopes and Luís Carriço, On the Gap Between Automated and In-Vivo Evaluations of Web Accessibility, HCI International 2009 (HCII 2009), San Diego, California, USA, 19-24 July 2009; Rui Lopes, Konstantinos Votis, Luís Carriço, Spiridon Likothanassis and Dimitrios Tzovaras, Towards the Universal Semantic Assessment of Accessibility, 24th Annual ACM Symposium on Applied Computing (SAC),Waikiki Beach, Honolulu, Hawaii, USA, 8-12 March 2009; Rui Lopes and Luís Carriço, Querying Web Accessibility Knowledge from Web Graphs, Handbook of Research on Social Dimensions of Semantic Technologies, IGI Global, 2009; Rui Lopes, Konstantinos Votis, Luís Carriço, Spiridon Likothanassis and Dimitrios Tzovaras, A Service Oriented Ontological Framework for the Semantic Validation of Web Accessibility, Handbook of Research on Social Dimensions of Semantic Technologies, IGI Global, 2009; Rui Lopes and Luís Carriço, On the Credibility of Wikipedia: an Accessibility Perspective, Second Workshop on Information 
Credibility on the Web (WICOW 2008), Napa Valley, California, USA, 2008; Rui Lopes, Luís Carriço, A Model for Universal Usability on the Web, WSW 2008: Web Science Workshop, Beijing, China, 22 April 2008; Rui Lopes, Luís Carriço, The Impact of Accessibility Assessment in Macro Scale Universal Usability Studies of the Web, W4A: 5th ACM International Cross-Disciplinary Conference on Web Accessibility, Beijing, China, 21-22 April 2008. Best paper award; Rui Lopes, Luís Carriço, Modelling Web Accessibility for Rich Document Production, Journal on Access Services 6 (1-2), Routledge, Taylor & Francis Group, 2009; Rui Lopes, Luís Carriço, Leveraging Rich Accessible Documents on the Web, W4A: 4th ACM International Cross-Disciplinary Conference on Web Accessibility, Banff, Canada, 7-8 May 2007.
Funding: Fundação para a Ciência e a Tecnologia (FCT, SFRH/BD/29150/2006).
Summarization from Medical Documents: A Survey
Objective:
The aim of this paper is to survey recent work in medical document
summarization.
Background:
During the last decade, document summarization has received increasing
attention from the AI research community. More recently, it has also attracted
the interest of the medical research community, due to the enormous growth of
information available to physicians and researchers in medicine through the
large and growing number of published journals, conference proceedings, medical
sites and portals on the World Wide Web, electronic medical records, etc.
Methodology:
This survey first gives a general background on document summarization,
presenting the factors that summarization depends upon, discussing evaluation
issues and describing briefly the various types of summarization techniques. It
then examines the characteristics of the medical domain through the different
types of medical documents. Finally, it presents and discusses the
summarization techniques used so far in the medical domain, referring to the
corresponding systems and their characteristics.
Discussion and conclusions:
The paper thoroughly discusses promising paths for future research in
medical document summarization. It mainly focuses on scaling to
large collections of documents in various languages and from different media,
on personalization issues, on portability to new sub-domains, and on the
integration of summarization technology in practical applications.
Comment: 21 pages, 4 tables
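Among the summarization techniques such a survey covers, the classic extractive approach is easy to sketch: score each sentence by the corpus frequency of its content words and keep the top-scoring ones. The clinical-note example below is invented, and a Luhn-style frequency scorer stands in for the more sophisticated methods the survey discusses.

```python
import re
from collections import Counter

def summarize(text: str, n_sentences: int = 1) -> str:
    """Luhn-style extractive summarizer: score each sentence by the
    frequency of its words in the whole text, keep the top-scoring
    sentences, and emit them in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))
    def sentence_score(s: str) -> int:
        return sum(freq[w] for w in re.findall(r"[a-z]+", s.lower()))
    best = sorted(sentences, key=sentence_score, reverse=True)[:n_sentences]
    return " ".join(s for s in sentences if s in best)

record = ("Patient reports chest pain. Chest pain started two days ago. "
          "The weather was sunny.")
print(summarize(record))  # Chest pain started two days ago.
```

Repetition of domain terms ("chest pain") is what pulls the relevant sentence to the top, which is also why such methods degrade on documents with little lexical redundancy.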