271 research outputs found
Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey
The emergence of natural language processing has revolutionized the way users
interact with tabular data, enabling a shift from traditional query languages
and manual plotting to more intuitive, language-based interfaces. The rise of
large language models (LLMs) such as ChatGPT and its successors has further
advanced this field, opening new avenues for natural language processing
techniques. This survey presents a comprehensive overview of natural language
interfaces for tabular data querying and visualization, which allow users to
interact with data using natural language queries. We introduce the fundamental
concepts and techniques underlying these interfaces with a particular emphasis
on semantic parsing, the key technology facilitating the translation from
natural language to SQL queries or data visualization commands. We then delve
into the recent advancements in Text-to-SQL and Text-to-Vis problems from the
perspectives of datasets, methodologies, metrics, and system designs. This
includes a deep dive into the influence of LLMs, highlighting their strengths,
limitations, and potential for future improvements. Through this survey, we aim
to provide a roadmap for researchers and practitioners interested in developing
and applying natural language interfaces for data interaction in the era of
large language models.
Comment: 20 pages, 4 figures, 5 tables. Submitted to IEEE TKD
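The semantic-parsing step at the heart of this survey, mapping a natural-language question to a SQL query, can be illustrated with a deliberately tiny rule-based sketch. Modern systems use neural models and LLMs rather than hand-written patterns; the templates, table names, and column names below are hypothetical.

```python
import re
from typing import Optional

# Toy rule-based semantic parser: two illustrative templates, not a real
# system. The table and column names are hypothetical.
PATTERNS = [
    (re.compile(r"how many (\w+) are there", re.I),
     lambda m: f"SELECT COUNT(*) FROM {m.group(1)};"),
    (re.compile(r"list all (\w+) where (\w+) is (\w+)", re.I),
     lambda m: f"SELECT * FROM {m.group(1)} WHERE {m.group(2)} = '{m.group(3)}';"),
]

def parse_to_sql(question: str) -> Optional[str]:
    """Return SQL for the first matching template, or None if nothing matches."""
    for pattern, build in PATTERNS:
        match = pattern.search(question)
        if match:
            return build(match)
    return None

print(parse_to_sql("How many employees are there?"))  # SELECT COUNT(*) FROM employees;
print(parse_to_sql("What is the weather?"))           # None
```

Neural semantic parsers replace the hand-written templates with a learned mapping, but the interface, text in and SQL out, is the same.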
Translating Natural Language Queries to SQL Using the T5 Model
This paper presents the development process of a natural language to SQL
model using the T5 model as the basis. The models, developed in August 2022 for
an online transaction processing system and a data warehouse, achieve 73% and
84% exact-match accuracy, respectively. These models, in conjunction with other
work completed in the research project, were implemented for several companies
and used successfully on a daily basis. The approach used in the model
development could be implemented in a similar fashion for other database
environments and with a more powerful pre-trained language model.
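The exact-match metric cited above is, in its simplest form, a string comparison between predicted and gold SQL. The sketch below is illustrative: real benchmarks such as Spider use more careful component-level matching, and the normalization here is minimal.

```python
def exact_match_accuracy(predictions: list, references: list) -> float:
    """Fraction of predicted SQL strings identical to the gold SQL
    after trivial whitespace/semicolon/case normalization."""
    def norm(sql: str) -> str:
        return " ".join(sql.replace(";", " ").split()).lower()
    assert len(predictions) == len(references)
    matches = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return matches / len(references)

preds = ["SELECT name FROM users;", "select * from orders"]
golds = ["select name from users", "SELECT id FROM orders"]
print(exact_match_accuracy(preds, golds))  # 0.5
```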
Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System
Exploring data is crucial in data analysis, as it helps users understand and
interpret the data more effectively. However, performing effective data
exploration requires in-depth knowledge of the dataset and expertise in data
analysis techniques. Not being familiar with either can create obstacles that
make the process time-consuming and overwhelming for data analysts. To address
this issue, we introduce InsightPilot, an LLM (Large Language Model)-based,
automated data exploration system designed to simplify the data exploration
process. InsightPilot automatically selects appropriate analysis intents, such
as understanding, summarizing, and explaining. Then, these analysis intents are
concretized by issuing corresponding intentional queries (IQueries) to create a
meaningful and coherent exploration sequence. In brief, an IQuery is an
abstraction and automation of data analysis operations, which mimics the
approach of data analysts and simplifies the exploration process for users. By
employing an LLM to iteratively collaborate with a state-of-the-art insight
engine via IQueries, InsightPilot is effective in analyzing real-world
datasets, enabling users to gain valuable insights through natural language
inquiries. We demonstrate the effectiveness of InsightPilot in a case study,
showing how it can help users gain valuable insights from their datasets.
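The intent-to-IQuery loop described above might be sketched as follows. InsightPilot's actual interfaces are not public, so the types and names here are purely illustrative.

```python
from dataclasses import dataclass

# Hypothetical shape of an IQuery: an analysis intent concretized
# against a target column or subspace of the dataset.
@dataclass
class IQuery:
    intent: str      # e.g. "understand", "summarize", "explain"
    target: str      # column or subspace the query focuses on

def explore(columns: list, intents: list) -> list:
    """Produce a naive exploration sequence: one IQuery per (intent, column).
    A real system would let the LLM pick and order these iteratively."""
    sequence = []
    for intent in intents:
        for col in columns:
            sequence.append(IQuery(intent=intent, target=col))
    return sequence

seq = explore(["sales", "region"], ["summarize", "explain"])
print(len(seq))   # 4
print(seq[0])     # IQuery(intent='summarize', target='sales')
```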
Toward enhancement of deep learning techniques using fuzzy logic: a survey
Deep learning has recently emerged as a branch of artificial intelligence (AI) and machine learning (ML) that typically imitates how humans acquire particular kinds of knowledge. It is considered an essential element of data science, encompassing predictive modeling and statistics, and it makes collecting, interpreting, and analyzing big data easier and faster. Deep neural networks are ML models in which non-linear processing units are layered to extract particular features from the inputs. Training such networks is very expensive, however, and the outcome depends on the optimization method used, so optimal results may not be obtained; deep learning techniques are also vulnerable to noise in the data. For these reasons, fuzzy systems are used to improve the performance of deep learning algorithms, especially in combination with neural networks, and to improve the representation accuracy of deep learning models. This survey paper reviews deep learning models and techniques incorporating fuzzy logic that were presented and proposed in previous studies, where fuzzy logic is used to improve deep learning performance. The approaches are divided into two categories based on how the two paradigms are combined. Furthermore, the practicality of the models in real-world settings is discussed.
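One common way fuzzy systems are combined with neural models, as surveys in this area discuss, is to fuzzify crisp inputs through membership functions before further processing. Below is a minimal sketch using Gaussian membership functions; the linguistic labels and parameters are illustrative, not taken from the paper.

```python
import math

def gaussian_membership(x: float, center: float, sigma: float) -> float:
    """Degree of membership of x in a fuzzy set with the given center/width."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def fuzzify(x: float, sets: dict) -> dict:
    """Map a crisp input to membership degrees, one per linguistic label."""
    return {label: gaussian_membership(x, c, s) for label, (c, s) in sets.items()}

# Illustrative temperature fuzzy sets: (center, sigma) per label.
temp_sets = {"cold": (0.0, 10.0), "warm": (20.0, 10.0), "hot": (40.0, 10.0)}
degrees = fuzzify(25.0, temp_sets)
print(max(degrees, key=degrees.get))  # warm
```

The resulting membership vector can then serve as the input features of a neural network, which is one of the hybridization patterns such surveys categorize.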
DevSecOps for web applications: a case study
The DevOps paradigm streamlines the software delivery process, reducing the barriers between the teams involved in development and operations.
It relies on pipelines to structure the development process through to delivery. These structures enable the automation of many tasks, avoiding human error and freeing team members from slow, repetitive work. More predictable and accurate development allows teams to reduce the time required for software deliveries and to make them more frequent. Despite the paradigm's wide adoption, the increase in deliveries cannot compromise the security of the developed solutions; companies that neglect security factors may incur financial costs and tarnish their reputations. Joining security and DevOps gives rise to a new paradigm, DevSecOps. It aims to bring greater quality compliance and avoid risk by adding security considerations that uncover potential security defects before delivery. Web application architecture, being accessible by design, presents a large exposed area. This project presents a list of common security issues found during research in the web application security domain, analyses which tools are used to detect and solve these problems, what time implications they have for overall software delivery, and how effective they are at defect detection. It concludes by implementing a pipeline using the DevSecOps paradigm to establish its viability in improving software quality.
Knowledge Discovery and Data Mining (KDDM) survey report.
The large number of government and industry activities supporting the Unit of Action (UA), with attendant documents, reports, and briefings, can overwhelm decision-makers with an overabundance of information that hampers their ability to make quick decisions, often resulting in a form of gridlock. In particular, the large and rapidly increasing amounts of data and data formats stored on UA Advanced Collaborative Environment (ACE) servers have led to the realization that manual analysis leading to timely decisions has become impractical, even impossible. UA Program Management (PM UA) has recognized the need to implement a Decision Support System (DSS) on UA ACE. The objective of this document is to research the commercial Knowledge Discovery and Data Mining (KDDM) market and publish the results in a survey. Furthermore, a ranking mechanism based on UA ACE-specific criteria has been developed and applied to a representative set of commercially available KDDM solutions. In addition, an overview of four R&D areas identified as critical to the implementation of DSS on ACE is provided. Finally, a comprehensive database containing detailed information on surveyed KDDM tools has been developed and is available upon customer request.
Advances in Meta-Heuristic Optimization Algorithms in Big Data Text Clustering
This paper presents a comprehensive survey of meta-heuristic optimization algorithms for text clustering applications and highlights their main procedures. These Artificial Intelligence (AI) algorithms are recognized as promising swarm intelligence methods owing to their success in solving machine learning problems, especially text clustering. The paper reviews the relevant literature on meta-heuristic-based text clustering applications, including many variants, such as basic, modified, hybridized, and multi-objective methods. The main procedures of text clustering are also described and critically discussed. The review reports the advantages and disadvantages of these approaches and recommends potential future research paths. The main keywords considered in this paper are text, clustering, meta-heuristic, optimization, and algorithm.
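At its core, meta-heuristic text clustering treats cluster assignment as an optimization problem over an objective such as within-cluster distance. Below is a minimal sketch using a simple mutate-and-keep-improvements search; real methods use swarm-based operators, and the 2-D "document vectors" here are toy stand-ins for tf-idf features.

```python
import random

def within_cluster_cost(points, assignment, k):
    """Sum of squared distances from each point to its cluster centroid."""
    cost = 0.0
    for c in range(k):
        members = [p for p, a in zip(points, assignment) if a == c]
        if not members:
            continue
        centroid = [sum(dim) / len(members) for dim in zip(*members)]
        cost += sum(sum((x - m) ** 2 for x, m in zip(p, centroid))
                    for p in members)
    return cost

def random_search_clustering(points, k, iters=2000, seed=0):
    """Meta-heuristic baseline: mutate a random assignment, keep improvements."""
    rng = random.Random(seed)
    best = [rng.randrange(k) for _ in points]
    best_cost = within_cluster_cost(points, best, k)
    for _ in range(iters):
        cand = best[:]
        cand[rng.randrange(len(points))] = rng.randrange(k)  # single mutation
        cand_cost = within_cluster_cost(points, cand, k)
        if cand_cost < best_cost:
            best, best_cost = cand, cand_cost
    return best, best_cost

# Two obvious groups of toy "document vectors".
points = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
labels, cost = random_search_clustering(points, k=2)
print("separated:", labels[0] != labels[3], "cost:", round(cost, 3))
```

Swarm-based variants differ mainly in how candidate assignments are generated and recombined, not in the objective being optimized.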
Enhanced Prediction of Network Attacks Using Incomplete Data
For years, intrusion detection has been considered a key component of many organizations’ network defense capabilities. Although a number of approaches to intrusion detection have been tried, few have been capable of providing security personnel responsible for the protection of a network with sufficient information to make adjustments and respond to attacks in real-time. Because intrusion detection systems rarely have complete information, false negatives and false positives are extremely common, and thus valuable resources are wasted responding to irrelevant events. In order to provide better actionable information for security personnel, a mechanism for quantifying the confidence level in predictions is needed. This work presents an approach which seeks to combine a primary prediction model with a novel secondary confidence level model which provides a measurement of the confidence in a given attack prediction being made. The ability to accurately identify an attack and quantify the confidence level in the prediction could serve as the basis for a new generation of intrusion detection devices, devices that provide earlier and better alerts for administrators and allow more proactive response to events as they are occurring
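The primary-plus-confidence architecture described above can be sketched as follows; the paper's actual models are not specified here, so the threshold classifier and the binned empirical-accuracy confidence model are illustrative stand-ins.

```python
# Illustrative pairing of a primary attack classifier with a secondary
# confidence model; both components are hypothetical stand-ins.
def primary_predict(score: float, threshold: float = 0.5) -> bool:
    """Primary model: flag an event as an attack if its anomaly score is high."""
    return score >= threshold

def build_confidence_model(history, bins: int = 5):
    """Secondary model: per score-bin empirical accuracy of past predictions.
    history holds (anomaly_score, prediction_was_correct) pairs."""
    counts = [[0, 0] for _ in range(bins)]   # [correct, total] per bin
    for score, correct in history:
        b = min(int(score * bins), bins - 1)
        counts[b][1] += 1
        if correct:
            counts[b][0] += 1
    def confidence(score: float) -> float:
        b = min(int(score * bins), bins - 1)
        correct, total = counts[b]
        return correct / total if total else 0.5   # uninformed prior
    return confidence

history = [(0.9, True), (0.85, True), (0.95, True),
           (0.1, True), (0.55, False), (0.6, True)]
confidence = build_confidence_model(history)
print(primary_predict(0.92), round(confidence(0.92), 2))  # True 1.0
```

Reporting the prediction alongside its confidence lets security personnel triage alerts: a low-confidence positive can be queued for review rather than trigger an immediate response.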
Agriculture 4.0 and beyond: Evaluating cyber threat intelligence sources and techniques in smart farming ecosystems
The digitisation of agriculture, integral to Agriculture 4.0, has brought significant benefits while simultaneously escalating cybersecurity risks. With the rapid adoption of smart farming technologies and infrastructure, the agricultural sector has become an attractive target for cyberattacks. This paper presents a systematic literature review that assesses the applicability of existing cyber threat intelligence (CTI) techniques within smart farming infrastructures (SFIs). We develop a comprehensive taxonomy of CTI techniques and sources, specifically tailored to the SFI context, addressing the unique cyber threat challenges in this domain. A crucial finding of our review is the identified need for a virtual Chief Information Security Officer (vCISO) in smart agriculture. While the concept of a vCISO is not yet established in the agricultural sector, our study highlights its potential significance. The implementation of a vCISO could play a pivotal role in enhancing cybersecurity measures by offering strategic guidance, developing robust security protocols, and facilitating real-time threat analysis and response strategies. This approach is critical for safeguarding the food supply chain against the evolving landscape of cyber threats. Our research underscores the importance of integrating a vCISO framework into smart farming practices as a vital step towards strengthening cybersecurity. This is essential for protecting the agriculture sector in the era of digital transformation, ensuring the resilience and sustainability of the food supply chain against emerging cyber risks