192 research outputs found
Statistically-driven generation of multidimensional analytical schemas from linked data
The ever-increasing Linked Data (LD) initiative has given rise to large amounts of open, semi-structured and rich data published on the Web. However, effective analytical tools that aid users in their analysis and go beyond browsing and querying are still lacking. To address this issue, we propose the automatic generation of multidimensional analytical stars (MDAS). The success of the multidimensional (MD) model for data analysis has been in great part due to its simplicity. Therefore, in this paper we aim at automatically discovering MD conceptual patterns that summarize LD. These patterns resemble the MD star schema typical of relational data warehousing. The underlying foundation of our method is a statistical framework that takes into account both concept and instance data. We present an implementation that makes use of the statistical framework to generate the MDAS. We have performed several experiments that assess and validate the statistical approach with two well-known and large LD sets. This research has been partially funded by the “Ministerio de Economía y Competitividad” with contract number TIN2014-55335-R. Victoria Nebot was supported by the UJI Postdoctoral Fellowship program with reference PI14490.
Semantic transference for enriching multilingual biomedical knowledge resources
Biomedical knowledge resources (KRs) are mainly expressed in English, and many applications using them suffer from the scarcity of knowledge in non-English languages. The goal of the present work is to make the most of existing multilingual biomedical KR lexicons to enrich their non-English counterparts. We propose to combine different automatic methods to generate pair-wise language alignments. More specifically, we use two well-known translation methods (GIZA++ and Moses), and we propose a new ad-hoc method specifically devised for multilingual KRs. Then, the resulting alignments are used to transfer semantics between KRs across their languages. Transference quality is ensured by checking the semantic coherence of the generated alignments. Experiments have been carried out over the Spanish, French and German UMLS Metathesaurus counterparts. As a result, the enriched Spanish KR can grow up to 1,514,217 concepts (originally 286,659), the French KR up to 1,104,968 concepts (originally 83,119), and the German KR up to 1,136,020 concepts (originally 86,842).
Exploiting semantic annotations for open information extraction: an experience in the biomedical domain
The increasing amount of unstructured text published on the Web is demanding new tools and methods to automatically process and extract relevant information. Traditional information extraction has focused on harvesting domain-specific, pre-specified relations, which usually requires manual labor and heavy machinery; especially in the biomedical domain, the main efforts have been directed toward the recognition of well-defined entities such as genes or proteins, which constitutes the basis for extracting the relationships between the recognized entities. The intrinsic features and scale of the Web demand new approaches able to cope with the diversity of documents, where the number of relations is unbounded and not known in advance. This paper presents a scalable method for the extraction of domain-independent relations from text that exploits the knowledge in the semantic annotations. The method is not geared to any specific domain (e.g., protein–protein interactions and drug–drug interactions) and does not require any manual input or deep processing. Moreover, the method uses the extracted relations to compute groups of abstract semantic relations characterized by their signature types and synonymous relation strings. This constitutes a valuable source of knowledge when constructing formal knowledge bases, as we enable seamless integration of the extracted relations with the available knowledge resources through the process of semantic annotation. The proposed approach has successfully been applied to a large text collection in the biomedical domain and the results are very encouraging. The work was supported by the CICYT project TIN2011-24147 from the Spanish Ministry of Economy and Competitiveness (MINECO).
Building Data Warehouses with Semantic Web Data
The Semantic Web (SW) deployment is now a realization and the amount of semantic annotations is ever increasing thanks to several initiatives that promote a change in the current Web towards the Web of Data, where the semantics of data become explicit through data representation formats and standards such as RDF/(S) and OWL. However, such initiatives have not yet been accompanied by efficient intelligent applications that can exploit the implicit semantics and thus, provide more insightful analysis. In this paper, we provide the means for efficiently analyzing and exploring large amounts of semantic data by combining the inference power from the annotation semantics with the analysis capabilities provided by OLAP-style aggregations, navigation, and reporting. We formally present how semantic data should be organized in a well-defined conceptual MD schema, so that sophisticated queries can be expressed and evaluated. Our proposal has been evaluated over a real biomedical scenario, which demonstrates the scalability and applicability of the proposed approach.
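As a rough illustration of what an OLAP-style roll-up over semantic data can look like, the following minimal sketch sums a measure per dimension member and optionally lifts members one level up a class hierarchy. It is not the method of the paper above: the triples, the predicate names (hasDisease, hasCost), the SUBCLASS_OF mapping and the roll_up function are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical facts as (subject, predicate, object) triples; names are illustrative only.
TRIPLES = [
    ("visit1", "hasDisease", "Arthritis"),
    ("visit1", "hasCost", 120.0),
    ("visit2", "hasDisease", "Arthritis"),
    ("visit2", "hasCost", 80.0),
    ("visit3", "hasDisease", "Lupus"),
    ("visit3", "hasCost", 200.0),
]

# Hypothetical subclass axioms, used to roll dimension members up one level.
SUBCLASS_OF = {"Arthritis": "JointDisease", "Lupus": "AutoimmuneDisease"}


def roll_up(triples, dimension, measure, hierarchy=None):
    """Sum `measure` per value of `dimension`, optionally lifted via `hierarchy`."""
    members, values = {}, {}
    for subject, predicate, obj in triples:
        if predicate == dimension:
            members[subject] = obj
        elif predicate == measure:
            values[subject] = obj

    totals = defaultdict(float)
    for subject, member in members.items():
        if subject not in values:
            continue  # skip facts that lack the measure
        key = hierarchy.get(member, member) if hierarchy else member
        totals[key] += values[subject]
    return dict(totals)


by_disease = roll_up(TRIPLES, "hasDisease", "hasCost")
by_parent = roll_up(TRIPLES, "hasDisease", "hasCost", SUBCLASS_OF)
```

Here by_disease aggregates at the level at which the facts are annotated, while by_parent uses the subclass mapping to aggregate one level higher, which is the kind of navigation a conceptual MD schema over semantic annotations is meant to support.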
Digital Twins in Solar Farms: An Approach through Time Series and Deep Learning
The generation of electricity through renewable energy sources increases every day, with solar energy being one of the fastest-growing. The emergence of information technologies such as Digital Twins (DT) in the field of the Internet of Things and Industry 4.0 allows a substantial development in automatic diagnostic systems. The objective of this work is to obtain the DT of a Photovoltaic Solar Farm (PVSF) with a deep-learning (DL) approach. To build such a DT, sensor-based time series are properly analyzed and processed. The resulting data are used to train a DL model (e.g., autoencoders) in order to detect anomalies of the physical system in its DT. Results show a reconstruction error around 0.1, a recall score of 0.92 and an Area Under Curve (AUC) of 0.97. Therefore, this paper demonstrates that the DT can reproduce the behavior of the physical system as well as efficiently detect its anomalies. This project has been funded by the Ministry of Economy and Commerce with project contract TIN2016-88835-RET and by the Universitat Jaume I with project contract UJI-B2020-15.
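The anomaly-detection idea in the abstract above (flag a sensor window when the model reconstructs it poorly) can be sketched as follows. This is only an illustration under stated assumptions: a mean profile learned from normal windows stands in for the trained autoencoder, the threshold rule (mean training error plus k standard deviations) is an assumption rather than the paper's procedure, and all function names and data are hypothetical.

```python
from statistics import mean, pstdev


def fit_reconstructor(train_windows):
    """Stand-in for a trained autoencoder: learn the mean profile of normal windows."""
    width = len(train_windows[0])
    profile = [mean(w[i] for w in train_windows) for i in range(width)]
    # The "reconstruction" of any window is simply the learned normal profile.
    return lambda window: profile


def reconstruction_error(window, reconstruct):
    """Mean squared error between a window and its reconstruction."""
    recon = reconstruct(window)
    return mean((x - r) ** 2 for x, r in zip(window, recon))


def fit_threshold(train_windows, reconstruct, k=3.0):
    """Assumed rule: mean training error plus k standard deviations."""
    errors = [reconstruction_error(w, reconstruct) for w in train_windows]
    return mean(errors) + k * pstdev(errors)


def is_anomaly(window, reconstruct, threshold):
    return reconstruction_error(window, reconstruct) > threshold


# Hypothetical sensor windows recorded under normal operation.
normal = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.9, 3.1]]
reconstruct = fit_reconstructor(normal)
threshold = fit_threshold(normal, reconstruct)

flag_normal = is_anomaly([1.0, 2.0, 3.0], reconstruct, threshold)
flag_faulty = is_anomaly([1.0, 0.0, 3.0], reconstruct, threshold)
```

With a real autoencoder, only fit_reconstructor would change; the error computation and thresholding would stay the same.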
Defining Dynamic Indicators for Social Network Analysis: A Case Study in the Automotive Domain using Twitter
Paper presented at the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (KDIR 2018), 18-20 September, Seville, Spain. In this paper we present a framework based on Linked Open Data Infrastructures to perform analysis tasks in social networks based on dynamically defined indicators. Based on the typical stages of business intelligence models, which start from the definition of strategic goals to define relevant indicators (Key Performance Indicators), we propose a new scenario where the sources of information are the social networks. The fundamental contribution of this work is to provide a framework for easily specifying and monitoring social indicators based on the measures offered by the APIs of the most important social networks. The main novelty of this method is that all the involved data and information is represented and stored as Linked Data. In this work we demonstrate the benefits of using linked open data, especially for processing and publishing company-specific social metrics and indicators.
Designing Similarity Measures for XML
In this demonstration we will show a series of tools that
support a methodology [1] for the design of complex similarity functions
in the context of heterogeneous XML systems
XTaGe: a flexible generation system for complex XML collections
We introduce XTaGe (XML Tester and Generator), a system for the synthesis of XML collections meant for testing and micro benchmarking applications. In contrast with existing approaches, XTaGe focuses on complex collections, by providing a highly extensible framework to introduce controlled variability in XML structures. In this paper we present the theoretical foundation, internal architecture and main features of our generator; we describe its implementation, which includes a GUI to facilitate the specification of collections; we discuss how XTaGe's features compare with those in other XML generation systems; finally, we illustrate its usage by presenting a use case in the bioinformatics domain
- …