29,456 research outputs found
kLog: A Language for Logical and Relational Learning with Kernels
We introduce kLog, a novel approach to statistical relational learning.
Unlike standard approaches, kLog does not represent a probability distribution
directly. It is rather a language to perform kernel-based learning on
expressive logical and relational representations. kLog allows users to specify
learning problems declaratively. It builds on simple but powerful concepts:
learning from interpretations, entity/relationship data modeling, logic
programming, and deductive databases. Access by the kernel to the rich
representation is mediated by a technique we call graphicalization: the
relational representation is first transformed into a graph --- in particular,
a grounded entity/relationship diagram. Subsequently, a choice of graph kernel
defines the feature space. kLog supports mixed numerical and symbolic data, as
well as background knowledge in the form of Prolog or Datalog programs as in
inductive logic programming systems. The kLog framework can be applied to
tackle the same range of tasks that has made statistical relational learning so
popular, including classification, regression, multitask learning, and
collective classification. We also report about empirical comparisons, showing
that kLog can be either more accurate, or much faster at the same level of
accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at
http://klog.dinfo.unifi.it along with tutorials
Comparative analysis of PropertyFirst vs. EntityFirst modeling approaches in graph databases
While relational databases still hold the primary position in the database technology domain, and have been for the longest time of any Computer Science technology has since its inception, for the first time the relational databases now have valid and worthy opponent in the NoSQL database movement.
NoSQL databases, even though not many people have heard of them, with a significant number of Computer Science people included, have spread rapidly in many shapes and forms and have done so in quite a chaotic fashion. Similarly to the way they appeared and spread, design and modeling for them have been undertaken in an unstructured manner. Currently they are subcategorized in 4 main groups as: Key-value stores, Column Family stores, Document stores and Graph databases.
In this thesis, different modeling approaches for graph databases, applied to the same domain are analyzed and compared, especially from a design perspective.
The database selected here as the implemented technology is Neo4J by Neo Technologies and is a directed property graph database, which means that relationships between its data entities must have a starting and ending (or source and destination) node.
This research provides an overview of two competing modeling approaches and evaluates them in a context of a real world example.
The work done here shows that both of these modeling approaches are valid and that it is possible to fully develop a data model based on the same domain data with both approaches and that both can be used later to support application access in a similar fashion. One of the models provides for faster access to data, but at a cost of higher maintenance and increased complexity
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled)(directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give an historical overview of its main development, and study the main current
systems that implement them
Structural Logistic Regression for Link Analysis
We present Structural Logistic Regression, an extension of logistic regression to modeling relational data. It is an integrated approach to building regression models from data stored in relational databases in which potential predictors, both boolean and real-valued, are generated by structured search in the space of queries to the database, and then tested with statistical information criteria for inclusion in a logistic regression. Using statistics and relational representation allows modeling in noisy domains with complex structure. Link prediction is a task of high interest with exactly such characteristics. Be it in the domain of scientific citations, social networks or hypertext, the underlying data are extremely noisy and the features useful for prediction are not readily available in a flat file format. We propose the application of Structural Logistic Regression to building link prediction models, and present experimental results for the task of predicting citations made in scientific literature using relational data taken from the CiteSeer search engine. This data includes the citation graph, authorship and publication venues of papers, as well as their word content
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for optimized
solution to a specific real world problem, big data system are not an exception
to any such rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
feature and use case analysis and comparison of the four main data models
namely document oriented, key value, graph and wide column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings forth second facet of big data storage, big data file
formats, into picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage and its challenges and future prospects have
also been discussed
Kernel arquitecture for CAD/CAM in shipbuilding enviroments
The capabilities of complex software products such as CAD/CAM systems are strongly supported by basic information technologies related with data management, visualization, communication, geometry modeling and others related with the development process. These basic information technologies are involved in a continuous evolution process, but over recent years this evolution has been dramatic. The main reason for this has been that new hardware capabilities (including graphic cards) are available at very low cost, but also a contributing factor has been the evolution of the prices of basic software. To take advantage of these new features, the existing CAD/CAM systems must undergo a complete and drastic redesign. This process is complicated but strategic for the future evolution of a system. There are several examples in the market of how a bad decision has lead to a cul-de-sac (both technically and commercially). This paper describes what the authors consider are the basic architectural components of a kernel for a CAD/CAM system oriented to shipbuilding. The proposed solution is a combination of in-house developed frameworks together with commercial products that are accepted as standard components. The proportion of in-house frameworks within this combination of products is a key factor, especially when considering CAD/CAM systems oriented to shipbuilding. General-purpose CAD/CAM systems are mainly oriented to the mechanical CAD market. For this reason several basic products exist devoted to geometry modelling in this context. But these basic products are not well suited to deal with the very specific geometry modelling requirements of a CAD/CAM system oriented to shipbuilding. The complexity of the ship model, the different model requirements through its short and changing life cycle and the many different disciplines involved in the process are reasons for this inadequacy. Apart from these basic frameworks, specific shipbuilding frameworks are also required. This second layer is built over the basic technology components mentioned above. This paper describes in detail the technological frameworks which have been used to develop the latest FORAN version.Postprint (published version
- …