
    kLog: A Language for Logical and Relational Learning with Kernels

    We introduce kLog, a novel approach to statistical relational learning. Unlike standard approaches, kLog does not represent a probability distribution directly. Rather, it is a language for performing kernel-based learning on expressive logical and relational representations. kLog allows users to specify learning problems declaratively. It builds on simple but powerful concepts: learning from interpretations, entity/relationship data modeling, logic programming, and deductive databases. Access by the kernel to the rich representation is mediated by a technique we call graphicalization: the relational representation is first transformed into a graph, specifically a grounded entity/relationship diagram. Subsequently, a choice of graph kernel defines the feature space. kLog supports mixed numerical and symbolic data, as well as background knowledge in the form of Prolog or Datalog programs, as in inductive logic programming systems. The kLog framework can be applied to the same range of tasks that has made statistical relational learning so popular, including classification, regression, multitask learning, and collective classification. We also report empirical comparisons showing that kLog can be either more accurate, or much faster at the same level of accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at http://klog.dinfo.unifi.it along with tutorials.
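The graphicalization step described in this abstract can be illustrated with a minimal sketch. It is not kLog's implementation: the function name `graphicalize`, the tuple-based node encoding, and the example facts are all hypothetical, chosen only to show how ground entity and relationship facts might become a bipartite graph whose vertices are entities and relationships.

```python
from collections import defaultdict


def graphicalize(entities, relationships):
    """Build an undirected bipartite graph from ground facts (illustrative only).

    entities:      iterable of (entity_type, identifier)
    relationships: iterable of (relation_name, (identifier, ...))
    Returns (labels, adjacency), where labels maps each vertex to its
    entity type or relation name, and adjacency maps each vertex to the
    set of vertices it is connected to.
    """
    labels = {}
    adjacency = defaultdict(set)

    # One vertex per ground entity.
    for entity_type, identifier in entities:
        node = ("E", identifier)
        labels[node] = entity_type
        adjacency[node]  # create an entry even if the entity stays isolated

    # One vertex per ground relationship, linked to every entity it mentions.
    for relation_name, args in relationships:
        rel_node = ("R", relation_name, tuple(args))
        labels[rel_node] = relation_name
        for identifier in args:
            ent_node = ("E", identifier)
            if ent_node not in labels:
                raise KeyError(f"unknown entity {identifier!r} in {relation_name}")
            adjacency[rel_node].add(ent_node)
            adjacency[ent_node].add(rel_node)

    return labels, dict(adjacency)


# Hypothetical facts: two atoms joined by one bond.
labels, adjacency = graphicalize(
    entities=[("atom", "a1"), ("atom", "a2")],
    relationships=[("bond", ("a1", "a2"))],
)
```

Entities are never connected to each other directly; every edge joins an entity vertex to a relationship vertex. A graph kernel would then be applied to the labelled graph, a step this sketch leaves out.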

    Comparative analysis of PropertyFirst vs. EntityFirst modeling approaches in graph databases

    Relational databases still hold the primary position in database technology, and have held it longer than any other Computer Science technology, but for the first time they have a valid and worthy opponent in the NoSQL database movement. NoSQL databases remain unfamiliar to many people, including a significant number of computer scientists, yet they have spread rapidly in many shapes and forms and in quite a chaotic fashion. Design and modeling for them have likewise been undertaken in an unstructured manner. They are currently divided into four main groups: key-value stores, column-family stores, document stores, and graph databases. In this thesis, different modeling approaches for graph databases, applied to the same domain, are analyzed and compared, especially from a design perspective. The database selected as the implementation technology is Neo4J by Neo Technologies, a directed property graph database, which means that each relationship between its data entities must have a starting and an ending (or source and destination) node. This research provides an overview of two competing modeling approaches and evaluates them in the context of a real-world example. The work shows that both modeling approaches are valid: a data model for the same domain data can be fully developed with either approach, and both can later support application access in a similar fashion. One of the models provides faster access to data, but at the cost of higher maintenance and increased complexity.
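The notion of a directed property graph mentioned in this abstract can be sketched in a few lines. This is an in-memory illustration, not Neo4J's API: the classes `Node`, `Relationship`, and `PropertyGraph`, and the example data, are hypothetical. It only demonstrates that every relationship carries a type, optional properties, and exactly one start and one end node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: frozenset
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str
    start: Node  # source node (required)
    end: Node    # destination node (required)
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy directed property graph; illustrative only."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, labels=(), **properties):
        node = Node(node_id, frozenset(labels), dict(properties))
        self.nodes[node_id] = node
        return node

    def relate(self, start_id, rel_type, end_id, **properties):
        # Both endpoints must already exist; a missing one raises KeyError.
        rel = Relationship(rel_type, self.nodes[start_id],
                           self.nodes[end_id], dict(properties))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node_id):
        """Relationships whose start node is node_id."""
        return [r for r in self.relationships if r.start.node_id == node_id]


# Hypothetical data: one person living in one city.
graph = PropertyGraph()
graph.add_node(1, ["Person"], name="Ann")
graph.add_node(2, ["City"], name="Oslo")
graph.relate(1, "LIVES_IN", 2, since=2010)
```

Because relationships are directed, `graph.outgoing(1)` returns the `LIVES_IN` relationship while `graph.outgoing(2)` returns an empty list.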

    Structural Logistic Regression for Link Analysis

    We present Structural Logistic Regression, an extension of logistic regression to modeling relational data. It is an integrated approach to building regression models from data stored in relational databases in which potential predictors, both boolean and real-valued, are generated by structured search in the space of queries to the database, and then tested with statistical information criteria for inclusion in a logistic regression. Using statistics and relational representation allows modeling in noisy domains with complex structure. Link prediction is a task of high interest with exactly such characteristics. Be it in the domain of scientific citations, social networks or hypertext, the underlying data are extremely noisy and the features useful for prediction are not readily available in a flat file format. We propose the application of Structural Logistic Regression to building link prediction models, and present experimental results for the task of predicting citations made in scientific literature using relational data taken from the CiteSeer search engine. These data include the citation graph, authorship and publication venues of papers, as well as their word content.
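As a rough illustration of what query-derived predictors for citation link prediction might look like, the sketch below computes one boolean and two count-valued features for a candidate pair of papers. It is an assumption-laden toy: the function `pair_features`, the dictionary-based tables, and the three chosen features are hypothetical, and the structured search over queries and the information-criterion test described in the abstract are not implemented here.

```python
def pair_features(a, b, cites, authors, venue):
    """Relational features for a candidate citation link (a, b).

    cites:   dict mapping paper -> set of papers it cites
    authors: dict mapping paper -> set of author names
    venue:   dict mapping paper -> publication venue
    Each feature stands in for the result of one database query.
    """
    venue_a, venue_b = venue.get(a), venue.get(b)
    return {
        # Boolean predictor: were both papers published in the same venue?
        "same_venue": venue_a is not None and venue_a == venue_b,
        # Count predictor: number of authors the two papers share.
        "shared_authors": len(authors.get(a, set()) & authors.get(b, set())),
        # Count predictor: number of papers cited by both a and b.
        "common_citations": len(cites.get(a, set()) & cites.get(b, set())),
    }


# Hypothetical miniature citation database.
cites = {"p1": {"p3", "p4"}, "p2": {"p3"}}
authors = {"p1": {"Lee", "Kim"}, "p2": {"Kim"}}
venue = {"p1": "ICML", "p2": "ICML"}

features = pair_features("p1", "p2", cites, authors, venue)
```

In a fuller pipeline, many such candidate features would be generated and only those passing a model selection criterion would enter the logistic regression.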

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Big data systems development is full of challenges, given the variety of application areas and domains that this technology promises to serve. Fundamental decisions in big data systems design typically include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies to optimize the solution to a specific real-world problem, big data systems are no exception. The primary facet of storage in any big data system is the storage infrastructure, and NoSQL seems to be the right technology to fulfill its requirements. However, every big data application has different data characteristics, and thus its data fits a different data model. This paper presents a feature and use case analysis and comparison of the four main data models: document-oriented, key-value, graph, and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider when making a choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings a second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings and possible use cases of the available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.

    Kernel architecture for CAD/CAM in shipbuilding environments

    The capabilities of complex software products such as CAD/CAM systems are strongly supported by basic information technologies related to data management, visualization, communication, geometry modeling, and the development process. These basic information technologies evolve continuously, but over recent years this evolution has been dramatic. The main reason is that new hardware capabilities (including graphics cards) are available at very low cost; the evolution of the prices of basic software has been a contributing factor. To take advantage of these new features, existing CAD/CAM systems must undergo a complete and drastic redesign. This process is complicated but strategic for the future evolution of a system. There are several examples in the market of how a bad decision has led to a cul-de-sac (both technically and commercially). This paper describes what the authors consider to be the basic architectural components of a kernel for a CAD/CAM system oriented to shipbuilding. The proposed solution is a combination of in-house developed frameworks together with commercial products that are accepted as standard components. The proportion of in-house frameworks within this combination of products is a key factor, especially when considering CAD/CAM systems oriented to shipbuilding. General-purpose CAD/CAM systems are mainly oriented to the mechanical CAD market. For this reason, several basic products devoted to geometry modelling exist in this context. But these basic products are not well suited to the very specific geometry modelling requirements of a CAD/CAM system oriented to shipbuilding. The complexity of the ship model, the different model requirements through its short and changing life cycle, and the many different disciplines involved in the process are reasons for this inadequacy. Apart from these basic frameworks, specific shipbuilding frameworks are also required. This second layer is built over the basic technology components mentioned above. This paper describes in detail the technological frameworks which have been used to develop the latest FORAN version.