Knowledge extraction from unstructured data
Data availability is becoming more essential, considering the current growth of web-based data. The data available on the web are represented as unstructured, semi-structured, or structured data. To make web-based data available for Natural Language Processing or Data Mining tasks, the data need to be presented as machine-readable data in a structured format. Thus, techniques for capturing knowledge from unstructured data sources are needed. Research communities address this problem with knowledge extraction methods, which capture knowledge in natural language text and map the extracted knowledge to existing knowledge represented in knowledge graphs (KGs). These knowledge extraction methods include named-entity recognition, named-entity disambiguation, relation recognition, and relation linking. This thesis addresses the problem of extracting knowledge from unstructured data and discovering patterns in the extracted knowledge. We devise a rule-based approach for entity and relation recognition and linking. The defined approach effectively maps entities and relations within a text to their resources in a target KG. Additionally, it overcomes the challenges of recognizing and linking entities and relations to a specific KG by employing catalogs of linguistic and domain-specific rules that state the criteria to recognize entities in a sentence of a particular language, and a deductive database that encodes knowledge in community-maintained KGs. Moreover, we define a neuro-symbolic approach for knowledge extraction in encyclopedic and domain-specific contexts; it combines symbolic and sub-symbolic components to overcome the challenges of entity recognition and linking and the limited availability of training data while maintaining the accuracy of recognizing and linking entities.
Additionally, we present a context-aware framework for unveiling semantically related posts in a corpus; it is a knowledge-driven framework that retrieves associated posts effectively. We cast the problem of unveiling semantically related posts in a corpus into the Vertex Coloring Problem. We evaluate the performance of our techniques on several benchmarks from various domains for knowledge extraction tasks. Furthermore, we apply these methods in real-world scenarios from national and international projects. The outcomes show that our techniques effectively extract knowledge encoded in unstructured data and discover patterns in the extracted knowledge presented as machine-readable data. More importantly, the evaluation results provide evidence of the effectiveness of combining the reasoning capacity of symbolic frameworks with the pattern recognition and classification power of sub-symbolic models.
Resorting to Context-Aware Background Knowledge for Unveiling Semantically Related Social Media Posts
Social media networks have become a prime source for sharing news, opinions, and research accomplishments in various domains, and hundreds of millions of posts are published daily. Given this wealth of information in social media, finding related announcements has become a relevant task, particularly in trending news (e.g., COVID-19 or lung cancer). To facilitate the search for connected posts, social networks enable users to annotate their posts, e.g., with hashtags in tweets. Albeit effective, an annotation-based search is limited because results will only include the posts that share the same annotations. This paper focuses on retrieving context-related posts based on a specific topic, and presents PINYON, a knowledge-driven framework that retrieves associated posts effectively. PINYON implements a two-fold pipeline. First, it encodes, in a graph, a corpus of posts and an input post; posts are annotated with entities from existing knowledge graphs and connected based on the similarity of their entities. Then, in a decoding phase, the encoded graph is used to discover communities of related posts. We cast this problem into the Vertex Coloring Problem, where communities of similar posts include the posts annotated with entities colored with the same colors. Building on results from graph theory, PINYON implements the decoding phase guided by a heuristic-based method that determines relatedness among posts based on contextual knowledge, and efficiently groups the most similar posts in the same communities. PINYON is empirically evaluated on various datasets and compared with state-of-the-art implementations of the decoding phase. The quality of the generated communities is also analyzed based on multiple metrics. The observed outcomes indicate that PINYON accurately identifies semantically related posts in different contexts.
Moreover, the reported results put in perspective the impact of known properties about the optimality of existing heuristics for graph vertex coloring and their implications for PINYON's scalability.
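For readers unfamiliar with the Vertex Coloring Problem mentioned in this abstract, the following is a minimal sketch of a generic greedy coloring heuristic (largest-degree-first ordering). It is not PINYON's decoding method: the abstract does not specify the heuristic or how colors map to communities, so the function name `greedy_coloring` and the toy graph are illustrative assumptions only.

```python
def greedy_coloring(adjacency):
    """Give each vertex the smallest color index not used by its neighbors.

    `adjacency` maps each vertex to the set of its neighbors (undirected graph).
    Generic textbook heuristic, shown for illustration only.
    """
    # Visit vertices by decreasing degree; ties are broken by vertex name.
    order = sorted(adjacency, key=lambda v: (-len(adjacency[v]), v))
    colors = {}
    for vertex in order:
        # Colors already taken by neighbors that have been colored so far.
        used = {colors[n] for n in adjacency[vertex] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[vertex] = color
    return colors


# Hypothetical toy graph: a triangle (a, b, c) plus a pendant vertex d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
coloring = greedy_coloring(graph)
```

Greedy heuristics like this run quickly but do not guarantee the minimum number of colors in general, which is the kind of optimality trade-off the abstract alludes to.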
Online Social Networks: Measurements, Analysis and Solutions for Mining Challenges
In the last decade, online social networks have shown enormous growth. With the rise of these networks and the consequent availability of a wealth of social network data, Social Network Analysis (SNA) has given researchers the opportunity to access, analyse, and mine the social behaviour of millions of people and to explore the way they communicate and exchange information.
Despite the growing interest in analysing social networks, several challenges accompany the analysis and mining of these networks. For example, dealing with large-scale and evolving networks is not yet an easy task and still requires new mining solutions. In addition, finding communities within these networks is a challenging task and could open opportunities to see how people behave in groups on a large scale. Finally, validating and optimizing communities without knowing the structure of the network in advance is difficult, because the lack of ground truth makes it hard to assess the meaningfulness of the resulting communities.
In this thesis, we start by providing an overview of the necessary background and key concepts in the area of social network analysis. Our main focus is to provide solutions to the key challenges in this area. To do so, we first introduce a technique for predicting the execution time of analysis tasks by applying predictive modeling to the problem of evolving and large-scale networks. Second, we study the performance of existing community detection approaches in deriving a high-quality community structure from a real email network by analysing the exchange of emails and exploring community dynamics. The aim is to study the community behavioral patterns and evaluate their quality within an actual network. Finally, we propose an ensemble technique for deriving communities using a rich, real internal enterprise network at IBM that reflects real collaborations and communications between employees. The technique aims to improve the community detection process through the fusion of different algorithms.
Knowledge Modelling and Learning through Cognitive Networks
One of the most promising developments in modelling knowledge is cognitive network science, which aims to investigate cognitive phenomena driven by the networked, associative organization of knowledge. For example, investigating the structure of semantic memory via semantic networks has illuminated how memory recall patterns influence phenomena such as creativity, memory search, learning, and more generally, knowledge acquisition, exploration, and exploitation. In parallel, neural network models for artificial intelligence (AI) are also becoming more widespread as inferential models for understanding which features drive language-related phenomena such as meaning reconstruction, stance detection, and emotional profiling. Whereas cognitive networks map explicitly which entities engage in associative relationships, neural networks perform an implicit mapping of correlations in cognitive data as weights, obtained after training over labelled data and whose interpretation is not immediately evident to the experimenter. This book aims to bring together quantitative, innovative research that focuses on modelling knowledge through cognitive and neural networks to gain insight into mechanisms driving cognitive processes related to knowledge structuring, exploration, and learning. The book comprises a variety of publication types, including reviews and theoretical papers, empirical research, computational modelling, and big data analysis. All papers here share a commonality: they demonstrate how the application of network science and AI can extend and broaden cognitive science in ways that traditional approaches cannot.
Graph Signal Processing: Overview, Challenges and Applications
Research in Graph Signal Processing (GSP) aims to develop tools for processing data defined on irregular graph domains. In this paper we first provide an overview of core ideas in GSP and their connection to conventional digital signal processing. We then summarize recent advances in developing basic GSP tools, including methods for sampling, filtering, or graph learning. Next, we review progress in several application areas using GSP, including processing and analysis of sensor network data, biological data, and applications to image processing and machine learning. We finish by providing a brief historical perspective to highlight how concepts recently developed in GSP build on top of prior research in other areas.
Comment: To appear, Proceedings of the IEE
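As a rough illustration of what filtering a signal on a graph can look like, the sketch below builds the combinatorial Laplacian L = D - A of a small graph and applies a first-order polynomial filter y = x - alpha * L x. The helper names (`laplacian`, `smoothness`, `low_pass`), the 4-node path graph, and the value alpha = 0.25 are assumptions chosen for the example, not taken from the paper.

```python
def laplacian(adjacency):
    """Combinatorial Laplacian L = D - A (assumes no self-loops)."""
    n = len(adjacency)
    return [
        [(sum(adjacency[i]) if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


def matvec(matrix, x):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def smoothness(lap, x):
    """Quadratic form x^T L x: sum of squared differences across edges."""
    return sum(xi * yi for xi, yi in zip(x, matvec(lap, x)))


def low_pass(lap, x, alpha):
    """First-order polynomial graph filter y = x - alpha * L x."""
    return [xi - alpha * li for xi, li in zip(x, matvec(lap, x))]


# Hypothetical example: a path graph 0 - 1 - 2 - 3 with an oscillating signal.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
L = laplacian(A)
x = [1.0, -1.0, 1.0, -1.0]
y = low_pass(L, x, alpha=0.25)
```

With alpha no larger than the reciprocal of the largest Laplacian eigenvalue (at most 4 for this graph), the filter attenuates components that vary quickly across edges and leaves constant signals unchanged, mirroring a low-pass filter in conventional signal processing.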