58 research outputs found

    Machine Learning over Static and Dynamic Relational Data

    Get PDF
    This tutorial overviews principles behind recent works on training and maintaining machine learning models over relational data, with an emphasis on the exploitation of the relational data structure to improve the runtime performance of the learning task. The tutorial has the following parts: 1) Database research for data science 2) Three main ideas to achieve performance improvements 2.1) Turn the ML problem into a DB problem 2.2) Exploit structure of the data and problem 2.3) Exploit engineering tools of a DB researcher 3) Avenues for future researchComment: arXiv admin note: text overlap with arXiv:2008.0786

    A survey of large-scale reasoning on the Web of data

    Get PDF
    As more and more data is being generated by sensor networks, social media and organizations, the Webinterlinking this wealth of information becomes more complex. This is particularly true for the so-calledWeb of Data, in which data is semantically enriched and interlinked using ontologies. In this large anduncoordinated environment, reasoning can be used to check the consistency of the data and of asso-ciated ontologies, or to infer logical consequences which, in turn, can be used to obtain new insightsfrom the data. However, reasoning approaches need to be scalable in order to enable reasoning over theentire Web of Data. To address this problem, several high-performance reasoning systems, whichmainly implement distributed or parallel algorithms, have been proposed in the last few years. Thesesystems differ significantly; for instance in terms of reasoning expressivity, computational propertiessuch as completeness, or reasoning objectives. In order to provide afirst complete overview of thefield,this paper reports a systematic review of such scalable reasoning approaches over various ontologicallanguages, reporting details about the methods and over the conducted experiments. We highlight theshortcomings of these approaches and discuss some of the open problems related to performing scalablereasoning

    Late-bound code generation

    Get PDF
    Each time a function or method is invoked during the execution of a program, a stream of instructions is issued to some underlying hardware platform. But exactly what underlying hardware, and which instructions, is usually left implicit. However in certain situations it becomes important to control these decisions. For example, particular problems can only be solved in real-time when scheduled on specialised accelerators, such as graphics coprocessors or computing clusters. We introduce a novel operator for hygienically reifying the behaviour of a runtime function instance as a syntactic fragment, in a language which may in general differ from the source function definition. Translation and optimisation are performed by recursively invoked, dynamically dispatched code generators. Side-effecting operations are permitted, and their ordering is preserved. We compare our operator with other techniques for pragmatic control, observing that: the use of our operator supports lifting arbitrary mutable objects, and neither requires rewriting sections of the source program in a multi-level language, nor interferes with the interface to individual software components. Due to its lack of interference at the abstraction level at which software is composed, we believe that our approach poses a significantly lower barrier to practical adoption than current methods. The practical efficacy of our operator is demonstrated by using it to offload the user interface rendering of a smartphone application to an FPGA coprocessor, including both statically and procedurally defined user interface components. The generated pipeline is an application-specific, statically scheduled processor-per-primitive rendering pipeline, suitable for place-and-route style optimisation. To demonstrate the compatibility of our operator with existing languages, we show how it may be defined within the Python programming language. We introduce a transformation for weakening mutable to immutable named bindings, termed let-weakening, to solve the problem of propagating information pertaining to named variables between modular code generating units.Open Acces

    Federated Query Processing for the Semantic Web

    Get PDF
    The recent years have witnessed a constant growth in the amount of RDF data available on the Web. This growth is largely based on the increasing rate of data publication on the Web by different actors such governments, life science researchers or geographical institutes. RDF data generation is mainly done by converting already existing legacy data resources into RDF (e.g. converting data stored in relational databases into RDF), but also by creating that RDF data directly (e.g. sensors). These RDF data are normally exposed by means of Linked Data-enabled URIs and SPARQL endpoints. Given the sustained growth that we are experiencing in the number of SPARQL endpoints available, the need to be able to send federated SPARQL queries across them has also grown. Tools for accessing sets of RDF data repositories are starting to appear, differing between them on the way in which they allow users to access these data (allowing users to specify directly what RDF data set they want to query, or making this process transparent to them). To overcome this heterogeneity in federated query processing solutions, the W3C SPARQL working group is defining a federation extension for SPARQL 1.1, which allows combining in a single query, graph patterns that can be evaluated in several endpoints. In this PhD thesis, we describe the syntax of that SPARQL extension for providing access to distributed RDF data sets and formalise its semantics. We adapt existing techniques for distributed data access in relational databases in order to deal with SPARQL endpoints, which we have implemented in our federation query evaluation system (SPARQL-DQP). We describe the static optimisation techniques that we implemented in our system and we carry out a series of experiments that show that our optimisations significantly speed up the query evaluation process in presence of large query results and optional operator

    The Third NASA Goddard Conference on Mass Storage Systems and Technologies

    Get PDF
    This report contains copies of nearly all of the technical papers and viewgraphs presented at the Goddard Conference on Mass Storage Systems and Technologies held in October 1993. The conference served as an informational exchange forum for topics primarily relating to the ingestion and management of massive amounts of data and the attendant problems involved. Discussion topics include the necessary use of computers in the solution of today's infinitely complex problems, the need for greatly increased storage densities in both optical and magnetic recording media, currently popular storage media and magnetic media storage risk factors, data archiving standards including a talk on the current status of the IEEE Storage Systems Reference Model (RM). Additional topics addressed System performance, data storage system concepts, communications technologies, data distribution systems, data compression, and error detection and correction

    An Object-Oriented Programming Environment for Parallel Genetic Algorithms

    Get PDF
    This thesis investigates an object-oriented programming environment for building parallel applications based on genetic algorithms (GAs). It describes the design of the Genetic Algorithms Manipulation Environment (GAME), which focuses on three major software development requirements: flexibility, expandability and portability. Flexibility is provided by GAME through a set of libraries containing pre-defined and parameterised components such as genetic operators and algorithms. Expandability is offered by GAME'S object-oriented design. It allows applications, algorithms and genetic operators to be easily modified and adapted to satisfy diverse problem's requirements. Lastly, portability is achieved through the use of the standard C++ language, and by isolating machine and operating system dependencies into low-level modules, which are hidden from the application developer by GAME'S application programming interfaces. The development of GAME is central to the Programming Environment for Applications of PArallel GENetic Algorithms project (PAPAGENA). This is the principal European Community (ESPRIT III) funded parallel genetic algorithms project. It has two main goals: to provide a general-purpose tool kit, supporting the development and analysis of large-scale parallel genetic algorithms (PGAs) applications, and to demonstrate the potential of applying evolutionary computing in diverse problem domains. The research reported in this thesis is divided in two parts: i) the analysis of GA models and the study of existing GA programming environments from an application developer perspective; ii) the description of a general-purpose programming environment designed to help with the development of GA and PGA-based computer programs. The studies carried out in the first part provide the necessary understanding of GAs' structure and operation to outline the requirements for the development of complex computer programs. The second part presents GAME as the result of combining development requirements, relevant features of existing environments and innovative ideas, into a powerful programming environment. The system is described in terms of its abstract data structures and sub-systems that allow the representation of problems independently of any particular GA model. GAME's programming model is also presented as general-purpose object-oriented framework for programming coarse-grained parallel applications. GAME has a modular architecture comprising five modules: the Virtual Machine, the Parallel Execution Module, the Genetic Libraries, the Monitoring Control Module, and the Graphic User Interface. GAME's genetic-oriented abstract data structures, and the Virtual Machine, isolates genetic operators and algorithms from low-level operations such as memory management, exception handling, etc. The Parallel Execution Module supports GAME's object- oriented parallel programming model. It defines an application programming interface and a runtime library that allow the same parallel application, created within the environment, to run on different hardware and operating system platforms. The Genetic Libraries outline a hierarchy of components implemented as parameterised versions of standard and custom genetic operators, algorithms and applications. The Monitoring Control Module supports dynamic control and monitoring of simulations, whereas the Graphic User Interface defines a basic framework and graphic 'widgets' for displaying and entering data. This thesis describes the design philosophy and rationale behind these modules, covering in more detail the Virtual Machine, the Parallel Execution Module and the Genetic Libraries. The assessment discusses the system's ability to satisfy the main requirements of GA and PGA software development, as well as the features that distinguish GAME from other programming environments

    Evaluation of Functional Data Models for Database Design and Use

    Get PDF
    The problems of design, operation, and maintenance of databases using the three most popular database management systems (Hierarchical, CQDASYL/DBTG, and Relational) are well known. Users wishing to use these systems have to make conscious and often complex mappings between the real-world structures and the data structuring options (data models) provided by these systems. In addition, much of the semantics associated with the data either does not get expressed at all or gets embedded procedurally in application programs in an ad-hoc way. In recent years, a large number of data models (called semantic data models) have been proposed with the aim of simplifying database design and use. However, the lack of usable implementations of these proposals has so far inhibited the widespread use of these concepts. The present work reports on an effort to evaluate and extend one such semantic model by means of an implementation. It is based on the functional data model proposed earlier by Shipman[SHIP81). We call this 'Extended Functional Data Model' (EFDM). EFDM, like Shipman's proposals, is a marriage of three of the advanced modelling concepts found in both database and artificial intelligence research: the concept of entity to represent an object in the real world, the concept of type hierarchy among entity types, and the concept of derived data for modelling procedural knowledge. The functional notation of the model lends itself to high level data manipulation languages. The data selection in these languages is expressed simply as function application. Further, the functional approach makes it possible to incorporate general purpose computation facilities in the data languages without having to embed them in procedural languages. In addition to providing the usual database facilities, the implementation also provides a mechanism to specify multiple user views of the database

    Prototyping of CMS storage management

    Get PDF
    corecore