4,787 research outputs found
Heterogeneous data source integration for smart grid ecosystems based on metadata mining
The arrival of new technologies related to smart grids and the resulting ecosystem of applications andmanagement systems pose many new problems. The databases of the traditional grid and the variousinitiatives related to new technologies have given rise to many different management systems with several formats and different architectures. A heterogeneous data source integration system is necessary toupdate these systems for the new smart grid reality. Additionally, it is necessary to take advantage of theinformation smart grids provide. In this paper, the authors propose a heterogeneous data source integration based on IEC standards and metadata mining. Additionally, an automatic data mining framework isapplied to model the integrated information.Ministerio de EconomĂa y Competitividad TEC2013-40767-
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field
Recommended from our members
MultiView : a methodology for supporting multiple view schemata in object-oriented databases
It has been widely recognized that object-oriented database (OODB) technology needs to be extended to provide a mechanism similar to views in relational database systems. We define an object-oriented view to be an arbitrarily complex virtual schema graph with possibly restructured generalization and decomposition hierarchies - rather than just one virtual class as has been proposed in the literature. In this paper, we propose a methodology, called MultiView, for supporting multiple such view schemata. MultiView breaks the schema design task into the following independent and well-defined subtasks: (1) the customization of type descriptions and object sets of existing classes by deriving virtual classes, (2) the integration of all derived classes into one consistent global schema graph, and (3) the definition of arbitrarily complex view schemata on this augmented global schema. For the first task of MultiView, we define a set of object algebra operators that can be used by the view definer for class customization. For the second task of MultiView, we propose an algorithm that automatically integrates these newly derived virtual classes into the global schema. We solve the third task of MultiView by first letting the view definer explicitly select the desired view classes from the global schema using a view definition language and then by automatically generating a view class hierarchy for these selected classes. In addition, we present algorithms that verify the closure property of a view and, if found to be incomplete, transform it into a closed, yet minimal, view. In this paper, we introduce the fundamental concept of view independence and show MultiView to be view independent. We also outline implementation techniques for realizing MultiView with existing OODB technology
Tupleware: Redefining Modern Analytics
There is a fundamental discrepancy between the targeted and actual users of
current analytics frameworks. Most systems are designed for the data and
infrastructure of the Googles and Facebooks of the world---petabytes of data
distributed across large cloud deployments consisting of thousands of cheap
commodity machines. Yet, the vast majority of users operate clusters ranging
from a few to a few dozen nodes, analyze relatively small datasets of up to a
few terabytes, and perform primarily compute-intensive operations. Targeting
these users fundamentally changes the way we should build analytics systems.
This paper describes the design of Tupleware, a new system specifically aimed
at the challenges faced by the typical user. Tupleware's architecture brings
together ideas from the database, compiler, and programming languages
communities to create a powerful end-to-end solution for data analysis. We
propose novel techniques that consider the data, computations, and hardware
together to achieve maximum performance on a case-by-case basis. Our
experimental evaluation quantifies the impact of our novel techniques and shows
orders of magnitude performance improvement over alternative systems
TK: The Twitter Top-K Keywords Benchmark
Information retrieval from textual data focuses on the construction of
vocabularies that contain weighted term tuples. Such vocabularies can then be
exploited by various text analysis algorithms to extract new knowledge, e.g.,
top-k keywords, top-k documents, etc. Top-k keywords are casually used for
various purposes, are often computed on-the-fly, and thus must be efficiently
computed. To compare competing weighting schemes and database implementations,
benchmarking is customary. To the best of our knowledge, no benchmark currently
addresses these problems. Hence, in this paper, we present a top-k keywords
benchmark, TK, which features a real tweet dataset and queries with
various complexities and selectivities. TK helps evaluate weighting
schemes and database implementations in terms of computing performance. To
illustrate TK's relevance and genericity, we successfully performed
tests on the TF-IDF and Okapi BM25 weighting schemes, on one hand, and on
different relational (Oracle, PostgreSQL) and document-oriented (MongoDB)
database implementations, on the other hand
- âŠ