7,107 research outputs found
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provided a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques allow
to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We discuss also the potential of cross-fertilization, i.e., on the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.Comment: Knowledge-based System
Multi-Task Learning Improves Disease Models from Web Search
We investigate the utility of multi-task learning to disease surveillance using Web search data. Our motivation is two-fold. Firstly, we assess whether concurrently training models for various geographies - inside a country or across different countries - can improve accuracy. We also test the ability of such models to assist health systems that are producing sporadic disease surveillance reports that reduce the quantity of available training data. We explore both linear and nonlinear models, specifically a multi-task expansion of elastic net and a multi-task Gaussian Process, and compare them to their respective single task formulations. We use influenza-like illness as a case study and conduct experiments on the United States (US) as well as England, where both health and Google search data were obtained. Our empirical results indicate that multi-task learning improves regional as well as national models for the US. The percentage of improvement on mean absolute error increases up to 14.8% as the historical training data is reduced from 5 to 1 year(s), illustrating that accurate models can be obtained, even by training on relatively short time intervals. Furthermore, in simulated scenarios, where only a few health reports (training data) are available, we show that multi-task learning helps to maintain a stable performance across all the affected locations. Finally, we present results from a cross-country experiment, where data from the US improves the estimates for England. As the historical training data for England is reduced, the benefits of multi-task learning increase, reducing mean absolute error by up to 40%
Better Together: Enhancing Generative Knowledge Graph Completion with Language Models and Neighborhood Information
Real-world Knowledge Graphs (KGs) often suffer from incompleteness, which
limits their potential performance. Knowledge Graph Completion (KGC) techniques
aim to address this issue. However, traditional KGC methods are computationally
intensive and impractical for large-scale KGs, necessitating the learning of
dense node embeddings and computing pairwise distances. Generative
transformer-based language models (e.g., T5 and recent KGT5) offer a promising
solution as they can predict the tail nodes directly. In this study, we propose
to include node neighborhoods as additional information to improve KGC methods
based on language models. We examine the effects of this imputation and show
that, on both inductive and transductive Wikidata subsets, our method
outperforms KGT5 and conventional KGC approaches. We also provide an extensive
analysis of the impact of neighborhood on model prediction and show its
importance. Furthermore, we point the way to significantly improve KGC through
more effective neighborhood selection.Comment: Accepted to Findings of the Association for Computational
Linguistics: EMNLP 202
Neural Networks forBuilding Semantic Models and Knowledge Graphs
1noL'abstract è presente nell'allegato / the abstract is in the attachmentopen677. INGEGNERIA INFORMATInoopenFutia, Giusepp
Technology assessment of advanced automation for space missions
Six general classes of technology requirements derived during the mission definition phase of the study were identified as having maximum importance and urgency, including autonomous world model based information systems, learning and hypothesis formation, natural language and other man-machine communication, space manufacturing, teleoperators and robot systems, and computer science and technology
- …