Knowledge extraction from unstructured data
Data availability is becoming increasingly important given the current growth of web-based data. Data on the web are represented in unstructured, semi-structured, or structured form. To make web-based data available for Natural Language Processing or Data Mining tasks, the data need to be presented as machine-readable data in a structured format. Thus, techniques for capturing knowledge from unstructured data sources are needed. Research communities address this problem with knowledge extraction methods, which capture knowledge in natural language text and map the extracted knowledge to existing knowledge represented in knowledge graphs (KGs). These methods include Named-Entity Recognition, Named-Entity Disambiguation, Relation Recognition, and Relation Linking. This thesis addresses the problem of extracting knowledge from unstructured data and discovering patterns in the extracted knowledge. We devise a rule-based approach for entity and relation recognition and linking. The approach effectively maps entities and relations within a text to their resources in a target KG. Additionally, it overcomes the challenges of recognizing and linking entities and relations to a specific KG by employing catalogs of linguistic and domain-specific rules that state the criteria for recognizing entities in a sentence of a particular language, and a deductive database that encodes knowledge in community-maintained KGs. Moreover, we define a neuro-symbolic approach for knowledge extraction in encyclopedic and domain-specific settings; it combines symbolic and sub-symbolic components to overcome the challenges of entity recognition and linking and the limited availability of training data while maintaining the accuracy of recognizing and linking entities.
Additionally, we present a context-aware framework for unveiling semantically related posts in a corpus; it is a knowledge-driven framework that retrieves associated posts effectively. We cast the problem of unveiling semantically related posts in a corpus into the Vertex Coloring Problem. We evaluate the performance of our techniques on several benchmarks related to various domains for knowledge extraction tasks. Furthermore, we apply these methods in real-world scenarios from national and international projects. The outcomes show that our techniques are able to effectively extract knowledge encoded in unstructured data and discover patterns over the extracted knowledge presented as machine-readable data. More importantly, the evaluation results provide evidence of the effectiveness of combining the reasoning capacity of symbolic frameworks with the pattern recognition and classification power of sub-symbolic models.
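To illustrate what casting post grouping as vertex coloring can look like, here is a minimal sketch of greedy coloring. It is not the thesis's algorithm: the abstract does not say how the graph is built, so the sketch assumes that an edge joins two posts judged not to be semantically related, which makes each color class a group of mutually compatible posts. The function name `greedy_coloring` and the toy post identifiers are hypothetical.

```python
def greedy_coloring(adjacency):
    """Give each vertex the smallest color not already used by a neighbor."""
    colors = {}
    # Visit high-degree vertices first (a common ordering heuristic);
    # ties are broken by vertex name so the result is deterministic.
    for vertex in sorted(adjacency, key=lambda v: (-len(adjacency[v]), v)):
        used = {colors[n] for n in adjacency[vertex] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[vertex] = color
    return colors


# Hypothetical toy corpus. ASSUMPTION: an edge means the two posts are
# NOT semantically related, so they must receive different colors.
conflicts = {
    "p1": {"p3", "p4"},
    "p2": {"p3", "p4"},
    "p3": {"p1", "p2"},
    "p4": {"p1", "p2"},
}

coloring = greedy_coloring(conflicts)

# Posts sharing a color form one group of candidate related posts.
groups = {}
for post, color in coloring.items():
    groups.setdefault(color, []).append(post)
```

Greedy coloring is a heuristic: it always yields a valid coloring but not necessarily one with the fewest colors, since minimum vertex coloring is NP-hard in general.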
Generating Preview Tables for Entity Graphs
Users are tapping into massive, heterogeneous entity graphs for many
applications. It is challenging to select entity graphs for a particular need,
given abundant datasets from many sources and the oftentimes scarce information
for them. We propose methods to produce preview tables for compact presentation
of important entity types and relationships in entity graphs. The preview
tables assist users in attaining a quick and rough preview of the data. They
can be shown in a limited display space for a user to browse and explore,
before she decides to spend time and resources to fetch and investigate the
complete dataset. We formulate several optimization problems that look for
previews with the highest scores according to intuitive goodness measures,
under various constraints on preview size and distance between preview tables.
The optimization problem under distance constraint is NP-hard. We design a
dynamic-programming algorithm and an Apriori-style algorithm for finding
optimal previews. Results from experiments, comparison with related work and
user studies demonstrated the scoring measures' accuracy and the discovery
algorithms' efficiency.
Comment: This is the camera-ready version of a SIGMOD16 paper. There might be
tiny differences in layout, spacing and linebreaking, compared with the
version in the SIGMOD16 proceedings, since we must submit TeX files and use
arXiv to compile the file
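The constrained optimization described in this abstract can be sketched as an exhaustive search over small sets of candidate tables. This is only an illustration under simplifying assumptions, not the paper's dynamic-programming or Apriori-style algorithm: it assumes a preview's score is the sum of per-table scores, that the size constraint is a maximum number of tables `k`, and that the distance constraint is a minimum pairwise distance. The names `best_preview`, `scores`, and `distance`, and all values, are hypothetical.

```python
from itertools import combinations


def best_preview(scores, distance, k, min_distance):
    """Pick at most k tables with maximal total score such that every
    pair of chosen tables is at least min_distance apart."""
    best_set, best_score = (), 0
    for size in range(1, k + 1):
        for candidate in combinations(sorted(scores), size):
            # Distance constraint: check every pair in the candidate set.
            feasible = all(
                distance[frozenset(pair)] >= min_distance
                for pair in combinations(candidate, 2)
            )
            if not feasible:
                continue
            total = sum(scores[table] for table in candidate)
            if total > best_score:
                best_set, best_score = candidate, total
    return best_set, best_score


# Hypothetical per-table goodness scores and pairwise table distances.
scores = {"Film": 9, "Actor": 8, "Award": 5}
distance = {
    frozenset(("Film", "Actor")): 1,
    frozenset(("Film", "Award")): 2,
    frozenset(("Actor", "Award")): 2,
}

preview, total = best_preview(scores, distance, k=2, min_distance=2)
```

In this toy instance the two highest-scoring tables are too close together, so the search settles on a lower-scoring but feasible pair. Exhaustive search grows combinatorially with the number of tables, which is why the abstract turns to specialized algorithms.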
Pathways: Augmenting interoperability across scholarly repositories
In the emerging eScience environment, repositories of papers, datasets,
software, etc., should be the foundation of a global and natively-digital
scholarly communications system. The current infrastructure falls far short of
this goal. Cross-repository interoperability must be augmented to support the
many workflows and value-chains involved in scholarly communication. This will
not be achieved through the promotion of single repository architecture or
content representation, but instead requires an interoperability framework to
connect the many heterogeneous systems that will exist.
We present a simple data model and service architecture that augments
repository interoperability to enable scholarly value-chains to be implemented.
We describe an experiment that demonstrates how the proposed infrastructure can
be deployed to implement the workflow involved in the creation of an overlay
journal over several different repository systems (Fedora, aDORe, DSpace and
arXiv).
Comment: 18 pages. Accepted for International Journal on Digital Libraries
special issue on Digital Libraries and eScienc
MeLinDa: an interlinking framework for the web of data
The web of data consists of data published on the web in such a way that they
can be interpreted and connected together. It is thus critical to establish
links between these data, both for the web of data and for the semantic web
that it contributes to feed. We consider here the various techniques developed
for that purpose and analyze their commonalities and differences. We propose a
general framework and show how the diverse techniques fit in the framework.
From this framework we consider the relation between data interlinking and
ontology matching. Although they can be considered similar at a certain level
(they both relate formal entities), they serve different purposes but would
benefit from collaborating. We thus present a scheme under which it
is possible for data linking tools to take advantage of ontology alignments.
Comment: N° RR-7691 (2011
Knowledge-based Biomedical Data Science 2019
Knowledge-based biomedical data science (KBDS) involves the design and
implementation of computer systems that act as if they knew about biomedicine.
Such systems depend on formally represented knowledge in computer systems,
often in the form of knowledge graphs. Here we survey the progress in the last
year in systems that use formally represented knowledge to address data science
problems in both clinical and biological domains, as well as on approaches for
creating knowledge graphs. Major themes include the relationships between
knowledge graphs and machine learning, the use of natural language processing,
and the expansion of knowledge-based approaches to novel domains, such as
Chinese Traditional Medicine and biodiversity.
Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages
with 3 table