
    Entity Resolution On-Demand

    Entity Resolution (ER) aims to identify and merge records that refer to the same real-world entity. ER is typically employed as an expensive cleaning step on the entire dataset before consuming it. Yet, which entities are useful once cleaned depends solely on the user's application, which may need only a fraction of them. For instance, when dealing with Web data, we would like to be able to filter the entities of interest gathered from multiple sources without cleaning the entire, continuously growing data. Similarly, when querying data lakes, we want to transform data on demand and return the results in a timely manner, a fundamental requirement of ELT (Extract-Load-Transform) pipelines. We propose BrewER, a framework to evaluate SQL SP (select-project) queries on dirty data while progressively returning results as if they were issued on cleaned data. BrewER focuses the cleaning effort on one entity at a time, following an ORDER BY predicate; thus, it inherently supports top-k and stop-and-resume execution. For a wide range of applications, a significant amount of resources can be saved. We exhaustively evaluate BrewER and show its efficacy on four real-world datasets.
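    The core idea of emitting cleaned entities in ORDER BY sequence without cleaning everything first can be sketched as a priority-queue loop. This is a hypothetical simplification, not BrewER's actual implementation: `blocks`, `match`, `merge`, and `predicate` are assumed inputs (a blocking step, a pairwise matcher, a merge function, and the WHERE clause), and a block is popped only when no other block could still produce a smaller sort key.

    ```python
    import heapq

    def brewer_query(blocks, key, match, merge, predicate):
        """Progressively yield resolved entities in ascending ORDER BY order.

        blocks:    lists of dirty records that may refer to the same entity
                   (produced by a blocking step, assumed given).
        key:       the ORDER BY attribute extractor.
        match:     pairwise duplicate detector; merge: record fusion.
        predicate: the WHERE clause, applied to the cleaned entity.
        """
        # Seed a min-heap with each block's best possible key: a resolved
        # entity can never sort before its best dirty record.
        heap = [(min(key(r) for r in block), i, block)
                for i, block in enumerate(blocks)]
        heapq.heapify(heap)
        while heap:
            _, _, block = heapq.heappop(heap)
            # Clean only this block: pairwise match, then merge duplicates.
            entity = block[0]
            for record in block[1:]:
                if match(entity, record):
                    entity = merge(entity, record)
            # Merging may have increased the key; if a pending block could
            # still yield a smaller key, defer this entity instead of emitting.
            if heap and key(entity) > heap[0][0]:
                heapq.heappush(heap, (key(entity), id(entity), [entity]))
                continue
            if predicate(entity):
                yield entity
    ```

    With a cheapest-price merge and an ORDER BY on price, the first entity is emitted before the second block is ever cleaned, which is what enables top-k and stop-and-resume execution.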

    Automated information retrieval using CLIPS

    Expert systems have considerable potential to assist computer users in managing the large volume of information available to them. One possible use of an expert system is to model the information retrieval interests of a human user and then make recommendations to the user as to articles of interest. At Cal Poly, a prototype expert system written in the C Language Integrated Production System (CLIPS) serves as an Automated Information Retrieval System (AIRS). AIRS monitors a user's reading preferences, develops a profile of the user, and then evaluates items returned from the information base. When prompted by the user, AIRS returns a list of items of interest to the user. In order to minimize the impact on system resources, AIRS is designed to run in the background during periods of light system use.
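    The profile-then-evaluate loop described above can be illustrated with a minimal sketch. AIRS itself is a CLIPS rule base; the Python below only mirrors the idea under an assumed, simplistic profile representation (term frequencies over articles the user has read), not the actual AIRS rules.

    ```python
    from collections import Counter

    def build_profile(read_articles):
        """Hypothetical user profile: term frequencies over read articles."""
        profile = Counter()
        for text in read_articles:
            profile.update(text.lower().split())
        return profile

    def recommend(profile, candidates, top_n=3):
        """Score each candidate item by overlap with the profile and
        return the highest-scoring items, mimicking AIRS's evaluation
        of items returned from the information base."""
        def score(text):
            return sum(profile[w] for w in set(text.lower().split()))
        return sorted(candidates, key=score, reverse=True)[:top_n]
    ```

    A real system would refresh the profile incrementally as the user reads, which is the kind of work that can run in the background during light system use.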

    Using Intelligent Prefetching to Reduce the Energy Consumption of a Large-scale Storage System

    Many high-performance large-scale storage systems will experience significant workload increases as their user base and content availability grow over time. The U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) center hosts one such system that has recently undergone a period of rapid growth, as its user population grew nearly 400% in about three years. When administrators of these massive storage systems face the challenge of meeting the demands of an ever-increasing number of requests, the easiest solution is to integrate more advanced hardware into existing systems. However, additional investment in hardware may significantly increase the system cost as well as daily power consumption. In this paper, we present evidence that well-selected software-level optimization is capable of achieving comparable levels of performance without the cost and power consumption overhead caused by physically expanding the system. Specifically, we develop intelligent prefetching algorithms that are suitable for the unique workloads and user behaviors of the world's largest satellite image distribution system, managed by USGS EROS. Our experimental results, derived from real-world traces with over five million requests sent by users around the globe, show that the EROS hybrid storage system could maintain the same performance with over 30% energy savings by utilizing our proposed prefetching algorithms, compared to the alternative solution of doubling the size of the current FTP server farm.
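    One common form of intelligent prefetching is popularity-driven admission: items requested often are pulled into a fast tier ahead of demand so the slower tier (here, the FTP server farm's disks) can stay idle longer and save energy. The sketch below is illustrative only; the class name, the request-count threshold, and the LRU eviction policy are assumptions, not details from the paper.

    ```python
    from collections import Counter, OrderedDict

    class PrefetchCache:
        """Popularity-driven prefetching sketch: an item seen `threshold`
        times is admitted to the fast tier; least-recently-used items
        are evicted when the tier is full."""

        def __init__(self, capacity, threshold=2):
            self.capacity = capacity
            self.threshold = threshold    # requests before an item is "hot"
            self.counts = Counter()       # per-item request popularity
            self.cache = OrderedDict()    # fast tier, kept in LRU order

        def request(self, item):
            """Serve one request; return True on a fast-tier hit."""
            self.counts[item] += 1
            if item in self.cache:
                self.cache.move_to_end(item)    # refresh LRU position
                return True
            if self.counts[item] >= self.threshold:
                self._admit(item)               # prefetch the hot item
            return False                        # miss served by slow tier

        def _admit(self, item):
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[item] = True
    ```

    In a trace-driven evaluation like the one described, each fast-tier hit is a request the disk farm never has to wake up for, which is where the energy savings come from.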

    Filtering, Piracy Surveillance and Disobedience

    There has always been a cyclical relationship between the prevention of piracy and the protection of civil liberties. While civil liberties advocates previously warned about the aggressive nature of copyright protection initiatives, more recently, a number of major players in the music industry have ceded to less direct forms of control over consumer behavior. As more aggressive forms of consumer control, like litigation, have receded, we have also seen a rise in more passive forms of consumer surveillance. Moreover, even as technology has developed more perfect means for filtering and surveillance of online piracy, a number of major players have opted in favor of “tolerated use,” a term coined by Professor Tim Wu to denote the allowance of uses that may be otherwise infringing, but that are allowed to exist for public use and enjoyment. Thus, while the eventual specter of copyright enforcement and monitoring remains a pervasive digital reality, the market may fuel a broad degree of consumer freedom through the toleration or taxation of certain kinds of activities. This Article is meant largely to address and to evaluate these shifts by drawing attention to the unique confluence of two important moments: the growth of tolerated uses, coupled with an increasing trend towards more passive forms of piracy surveillance in light of the balance between copyright enforcement and civil liberties. The content industries may draw upon a broad definition of disobedience in their campaigns to educate the public about copyright law, but the market’s allowance of DRM-free content suggests an altogether different definition. The divide between copyright enforcement and civil liberties results in a perfect storm of uncertainty, suggesting the development of an even further division between the role of the law and the role of the marketplace in copyright enforcement and innovation, respectively.

    Automated system for integration and display of physiological response data

    The system analysis approach was applied in a study of physiological systems in both 1-g and weightlessness, for short- and long-term experiments. A whole-body algorithm, developed as the first step in the construction of a total-body simulation system, is described, and an advanced biomedical computer system concept including interactive display/command consoles is discussed. The documentation of the design specifications, design and development studies, and user's instructions (which include program listings) for these delivered end items; the reports on the results of many research and feasibility studies; and many subcontract reports are cited in the bibliography.

    Towards automated knowledge-based mapping between individual conceptualisations to empower personalisation of Geospatial Semantic Web

    The geospatial domain is characterised by vagueness, especially in the semantic disambiguation of its concepts, which makes defining a universally accepted geo-ontology an onerous task. This is compounded by the lack of appropriate methods and techniques by which individual semantic conceptualisations can be captured and compared to each other. With multiple user conceptualisations, efforts towards a reliable Geospatial Semantic Web therefore require personalisation, where user diversity can be incorporated. The work presented in this paper is part of our ongoing research on applying commonsense reasoning to elicit and maintain models that represent users' conceptualisations. Such user models will enable taking into account the users' perspective of the real world and will empower personalisation algorithms for the Semantic Web. Intelligent information processing over the Semantic Web can be achieved if different conceptualisations can be integrated in a semantic environment and mismatches between different conceptualisations can be outlined. In this paper, a formal approach for detecting mismatches between a user's and an expert's conceptual model is outlined. The formalisation is used as the basis to develop algorithms to compare models defined in OWL. The algorithms are illustrated in a geographical domain using concepts from the SPACE ontology, developed as part of the SWEET suite of ontologies for the Semantic Web by NASA, and are evaluated by comparing test cases of possible user misconceptions.
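    The mismatch-detection step can be illustrated with a toy comparison. This is not the paper's OWL-based formalisation: the dictionary-of-assertion-sets model representation and the three mismatch categories below are assumptions made for the sketch.

    ```python
    def compare_models(user_model, expert_model):
        """Detect mismatches between a user's and an expert's
        conceptualisation. Each model maps a concept name to a set of
        (property, value) assertions; mismatch labels are illustrative."""
        mismatches = []
        for concept, expert_props in expert_model.items():
            if concept not in user_model:
                mismatches.append(('missing_concept', concept))
                continue
            user_props = user_model[concept]
            for prop in expert_props - user_props:
                mismatches.append(('missing_property', concept, prop))
            for prop in user_props - expert_props:
                mismatches.append(('extra_property', concept, prop))
        return mismatches
    ```

    Running this on a small geographic example (a user who knows rivers but not lakes, and holds an assertion the expert model lacks) surfaces exactly the kind of user misconceptions the evaluation test cases probe.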

    Knowledge structure representation and automated updates in intelligent information management systems

    A continuing effort to apply rapid prototyping and Artificial Intelligence techniques to problems associated with projected Space Station-era information management systems is examined. In particular, timely updating of the various databases and knowledge structures within the proposed intelligent information management system (IIMS) is critical to support decision-making processes. Because of the significantly large amounts of data entering the IIMS on a daily basis, information updates will need to be performed automatically, with some systems requiring that data be incorporated and made available to users within a few hours. Meeting these demands depends, first, on the design and implementation of information structures that are easily modified and expanded, and, second, on the incorporation of intelligent automated update techniques that will allow meaningful information relationships to be established. Potential techniques for developing such an automated update capability are studied, and IIMS update requirements are examined in light of results obtained from the IIMS prototyping effort.