
    Advanced techniques and technology for efficient data storage, access, and transfer

    Advanced techniques for efficiently representing most forms of data are being implemented in practical hardware and software form through the joint efforts of three NASA centers. These techniques adapt to local statistical variations to continually provide near-optimum code efficiency when representing data without error. Demonstrated in several earlier space applications, these techniques are the basis of initial NASA data compression standards specifications. Since the techniques clearly apply to most NASA science data, NASA invested in the development of both hardware and software implementations for general use. This investment includes high-speed single-chip very large scale integration (VLSI) coding and decoding modules as well as machine-transferable software routines. The hardware chips were tested in the laboratory at data rates as high as 700 Mbits/s. A coding module's definition includes a predictive preprocessing stage and a powerful adaptive coding stage. The function of the preprocessor is to optimally process incoming data into a standard-form data source that the second stage can handle. The built-in preprocessor of the VLSI coder chips is ideal for high-speed sampled-data applications such as imaging and high-quality audio, but the second-stage adaptive coder can also be used separately with any source that can be externally preprocessed into the 'standard form'. This generic functionality ensures that the applicability of these techniques and their recent high-speed implementations should be equally broad outside of NASA.
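
    The two-stage structure described above (a predictive preprocessor feeding an adaptive entropy coder) can be pictured with a minimal sketch. The unit-delay predictor, zigzag mapping, and per-block Rice/Golomb parameter selection below are assumptions chosen for illustration; they are in the spirit of the approach but are not the actual chip or standard design.

        # Illustrative two-stage lossless coder sketch (not the NASA/VLSI design):
        # Stage 1 turns samples into a "standard form" of small non-negative
        # integers; Stage 2 re-chooses a Rice parameter k per block, adapting to
        # local statistics.

        def preprocess(samples):
            """Predict each sample by its predecessor and zigzag-map the residual
            so that small errors become small non-negative integers."""
            out, prev = [], 0
            for s in samples:
                d = s - prev                                  # prediction residual
                out.append(2 * d if d >= 0 else -2 * d - 1)   # zigzag mapping
                prev = s
            return out

        def rice_encode_block(block):
            """Pick the Rice parameter k minimizing the coded length of this block,
            then emit unary quotient + k-bit remainder codes."""
            def length(k):
                return sum((v >> k) + 1 + k for v in block)
            k = min(range(16), key=length)
            bits = []
            for v in block:
                q, r = v >> k, v & ((1 << k) - 1)
                bits.append("1" * q + "0" + format(r, f"0{k}b") if k else "1" * q + "0")
            return k, "".join(bits)

        samples = [100, 101, 103, 102, 102, 110, 111, 111]
        k, bitstream = rice_encode_block(preprocess(samples))
        print(k, len(bitstream), "bits for", len(samples), "samples")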

    In order to fully realise the value of open data researchers must first address the quality of the datasets

    There has been a phenomenal increase in the availability of data over the last decade. Open data is provided as a means of empowering users with information and in the hope of sparking innovation and increased efficiency in governments and businesses. However, in spite of the many success stories based on the open data paradigm, concerns remain over the quality of such datasets. Marta Indulska and Shazia Sadiq argue that, in order to facilitate more effective and efficient realisation of value from open data, research must reach a shared consensus on the definition of data quality dimensions, provide methods and guidelines for assessing the potential usefulness of open datasets using exploratory tools and techniques, and develop rigorous theoretical underpinnings for the effective use of open data.
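
    As one example of the kind of exploratory assessment the authors call for, the sketch below profiles a hypothetical open dataset against a few simple quality dimensions such as completeness and uniqueness. The file name, columns, and metrics are illustrative assumptions, not the authors' proposed methods.

        # Illustrative profiling sketch: a quick exploratory pass over an open
        # dataset scoring a few common data quality dimensions per column.
        import csv
        from collections import Counter

        def profile(path):
            with open(path, newline="", encoding="utf-8") as f:
                rows = list(csv.DictReader(f))
            report = {}
            for col in rows[0].keys():
                values = [r[col] for r in rows]
                non_empty = [v for v in values if v not in ("", None)]
                report[col] = {
                    "completeness": len(non_empty) / len(values),       # share of non-missing cells
                    "uniqueness": len(set(non_empty)) / max(len(non_empty), 1),
                    "top_values": Counter(non_empty).most_common(3),    # quick look at skew
                }
            return report

        for col, stats in profile("open_dataset.csv").items():          # hypothetical file
            print(col, stats)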

    Information driven evaluation of data hiding algorithms

    Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when data mining techniques are used. Privacy Preserving Data Mining (PPDM) algorithms have recently been introduced with the aim of modifying the database in such a way as to prevent the discovery of sensitive information. Due to the large number of possible techniques that can be used to achieve this goal, it is necessary to provide some standard evaluation metrics to determine the best algorithms for a specific application or context. Currently, however, there is no common set of parameters that can be used for this purpose. This paper explores the problem of PPDM algorithm evaluation, starting from the key goal of preserving data quality. To achieve this goal, we propose a formal definition of data quality specifically tailored for use in the context of PPDM algorithms, a set of evaluation parameters and an evaluation algorithm. The resulting evaluation core process is then presented as part of a more general three-step evaluation framework that also takes into account other aspects of algorithm evaluation such as efficiency, scalability and level of privacy.
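
    As a rough illustration of the kind of evaluation the paper motivates, the sketch below compares a dataset before and after a toy sanitization step by measuring how far an attribute's value distribution drifts. The toy sanitizer, attribute names, and the total-variation metric are hypothetical stand-ins, not the paper's formal data quality definition or evaluation parameters.

        # Illustrative PPDM evaluation sketch: quantify data quality loss as the
        # drift of an attribute distribution after a (toy) sanitization.
        import random
        from collections import Counter

        def sanitize(records, flip_prob=0.1):
            """Toy sanitizer: randomly perturbs the sensitive attribute."""
            out = []
            for r in records:
                r = dict(r)
                if random.random() < flip_prob:
                    r["diagnosis"] = random.choice(["A", "B", "C"])
                out.append(r)
            return out

        def distribution_distortion(before, after, attr):
            """Total variation distance between attribute distributions (0 = identical)."""
            cb, ca = Counter(r[attr] for r in before), Counter(r[attr] for r in after)
            keys = set(cb) | set(ca)
            n = len(before)
            return 0.5 * sum(abs(cb[k] / n - ca[k] / n) for k in keys)

        random.seed(0)
        data = [{"age": random.randint(20, 80),
                 "diagnosis": random.choice(["A", "A", "B", "C"])} for _ in range(1000)]
        released = sanitize(data)
        print("distortion on 'diagnosis':", distribution_distortion(data, released, "diagnosis"))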

    Privacy Preserving Data Mining, Evaluation Methodologies

    Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when data mining techniques are used. Privacy Preserving Data Mining (PPDM) algorithms have recently been introduced with the aim of modifying the database in such a way as to prevent the discovery of sensitive information. Due to the large number of possible techniques that can be used to achieve this goal, it is necessary to provide some standard evaluation metrics to determine the best algorithms for a specific application or context. Currently, however, there is no common set of parameters that can be used for this purpose. Moreover, because sanitization modifies the data, an important issue, especially for critical data, is to preserve the quality of the data. However, to the best of our knowledge, no approaches have been developed that deal with the issue of data quality in the context of PPDM algorithms. This report explores the problem of PPDM algorithm evaluation, starting from the key goal of preserving data quality. To achieve this goal, we propose a formal definition of data quality specifically tailored for use in the context of PPDM algorithms, a set of evaluation parameters and an evaluation algorithm. Moreover, because of the "environment related" nature of data quality, a structure to represent constraints and information relevance related to data is presented. The resulting evaluation core process is then presented as part of a more general three-step evaluation framework that also takes into account other aspects of algorithm evaluation such as efficiency, scalability and level of privacy. (JRC.G.6 – Sensors, radar technologies and cybersecurity)
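
    The "environment related" structure mentioned above can be pictured, very loosely, as relevance weights and constraints attached to attributes, against which a sanitized release is scored. The sketch below is a hypothetical illustration of that idea, not the report's actual representation.

        # Hypothetical constraint/relevance structure: attributes carry relevance
        # weights and simple checks; a release is scored by relevance-weighted
        # constraint violations.
        from dataclasses import dataclass
        from typing import Callable, List

        @dataclass
        class AttributeConstraint:
            attribute: str
            relevance: float                       # how much this attribute matters to users
            check: Callable[[dict], bool]          # constraint the released data should satisfy

        constraints: List[AttributeConstraint] = [
            AttributeConstraint("age", 0.9, lambda r: 0 <= r["age"] <= 120),
            AttributeConstraint("salary", 0.5, lambda r: r["salary"] >= 0),
        ]

        def weighted_violation_score(records: List[dict]) -> float:
            """Relevance-weighted fraction of constraint violations (0 = perfect)."""
            total = sum(c.relevance for c in constraints) * len(records)
            violations = sum(c.relevance for r in records for c in constraints if not c.check(r))
            return violations / total if total else 0.0

        released = [{"age": 34, "salary": 51000}, {"age": 150, "salary": -10}]
        print("weighted violation score:", weighted_violation_score(released))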

    Valentine: Evaluating Matching Techniques for Dataset Discovery

    Data scientists today search large data lakes to discover and integrate datasets. In order to bring together disparate data sources, dataset discovery methods rely on some form of schema matching: the process of establishing correspondences between datasets. Traditionally, schema matching has been used to find matching pairs of columns between a source and a target schema. However, the use of schema matching in dataset discovery methods differs from its original use: nowadays schema matching serves as a building block for indicating and ranking inter-dataset relationships. Surprisingly, although a discovery method's success relies heavily on the quality of the underlying matching algorithms, the latest discovery methods employ existing schema matching algorithms in an ad hoc fashion, due to the lack of openly available datasets with ground truth, reference method implementations, and evaluation metrics. In this paper, we aim to rectify the problem of evaluating the effectiveness and efficiency of schema matching methods for the specific needs of dataset discovery. To this end, we propose Valentine, an extensible open-source experiment suite to execute and organize large-scale automated matching experiments on tabular data. Valentine includes implementations of seminal schema matching methods that we either implemented from scratch (due to the absence of open-source code) or imported from open repositories. The contributions of Valentine are: i) the definition of four schema matching scenarios as encountered in dataset discovery methods, ii) a principled dataset fabrication process tailored to the scope of dataset discovery methods, and iii) the most comprehensive evaluation of schema matching techniques to date, offering insight into the strengths and weaknesses of existing techniques, which can serve as a guide for employing schema matching in future dataset discovery methods.
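
    As context for what such a matching building block does, the sketch below ranks candidate column correspondences between two small tables by combining name similarity with value overlap. It is a generic baseline for illustration only, not one of the seminal methods implemented in Valentine.

        # Minimal schema matching baseline: rank column pairs between two tabular
        # datasets by name similarity plus value-overlap (Jaccard) similarity.
        from difflib import SequenceMatcher

        def name_sim(a, b):
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        def value_sim(va, vb):
            sa, sb = set(va), set(vb)
            return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

        def match(source, target, w_name=0.5):
            """source/target: dicts mapping column name -> list of values."""
            pairs = []
            for cs, vs in source.items():
                for ct, vt in target.items():
                    score = w_name * name_sim(cs, ct) + (1 - w_name) * value_sim(vs, vt)
                    pairs.append((round(score, 3), cs, ct))
            return sorted(pairs, reverse=True)       # ranked inter-dataset correspondences

        src = {"employee_name": ["ann", "bob"], "dept": ["hr", "it"]}
        tgt = {"name": ["bob", "carol"], "department": ["it", "finance"]}
        for score, cs, ct in match(src, tgt)[:3]:
            print(score, cs, "->", ct)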

    Multi Sensor Data Fusion Architectures for Air Traffic Control Applications

    Nowadays, radar is no longer the sole technology able to ensure the surveillance of air traffic. The extensive deployment of satellite systems and air-to-ground data links has led to the emergence of complementary means and techniques on which a great deal of research and experimentation has been carried out over the past ten years. In such an environment, the sensor data processing, which is a key element in any Air Traffic Control (ATC) centre, has been continuously upgraded so as to follow the evolution of sensor technology and, in the meantime, improve quality in terms of continuity, integrity and accuracy. This book chapter proposes a comprehensive description of the state of the art and the roadmap for the future of the multi-sensor data fusion architectures and techniques in use in ATC centres. The first part of the chapter describes the background of ATC centres, while the second part points out various data fusion techniques. Multi-radar data processing architecture is analysed, and a brief definition of the internal core tracking algorithms is given, as well as a comparative benchmark based on their respective advantages and drawbacks. The third part of the chapter focuses on the most recent evolution, which leads from a Multi Radar Tracking System to a Multi Sensor Tracking System. The last part of the chapter deals with the sensor data processing that will be put into operation over the next ten years. The main challenge will be to provide the same level of service in both surface and air surveillance areas in order to offer:
    • highly accurate air and surface situation awareness to air traffic controllers,
    • situational awareness via Traffic Information System – Broadcast (TIS-B) services to pilots and vehicle drivers, and
    • new air and surface safety, capacity and efficiency applications to airports and airlines.
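
    To give a flavour of the core tracking algorithms such architectures are built around, the sketch below runs one predict/update cycle of a constant-velocity Kalman filter, fusing position reports from two sensors of different accuracy into a single track. The motion model, sensor mix, and noise values are hypothetical illustrations, not the chapter's specific architectures.

        # Generic multi-sensor track update: a constant-velocity Kalman filter
        # whose single track state is refined sequentially with position reports
        # from several sensors, each with its own measurement noise.
        import numpy as np

        dt = 4.0                                    # seconds between scans
        F = np.array([[1, 0, dt, 0],                # constant-velocity motion model
                      [0, 1, 0, dt],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]], float)
        H = np.array([[1, 0, 0, 0],
                      [0, 1, 0, 0]], float)         # sensors report (x, y) position only
        Q = np.eye(4) * 0.1                         # process noise

        x = np.array([0.0, 0.0, 120.0, 30.0])       # state: position (m), velocity (m/s)
        P = np.eye(4) * 100.0

        def predict(x, P):
            return F @ x, F @ P @ F.T + Q

        def update(x, P, z, r):
            R = np.eye(2) * r                       # sensor-specific measurement noise
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
            x = x + K @ (z - H @ x)
            P = (np.eye(4) - K @ H) @ P
            return x, P

        x, P = predict(x, P)
        for z, r in [(np.array([495.0, 118.0]), 50.0),    # primary radar plot (coarse)
                     (np.array([480.0, 121.0]), 10.0)]:   # ADS-B report (more accurate)
            x, P = update(x, P, z, r)
        print("fused track state:", np.round(x, 1))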

    Indoor Air Quality Design and Control in Low-Energy Residential Buildings, International Energy Agency, EBC Annex 68, Subtask 2: Pollutant loads in residential buildings (Common exercises)

    The objective of the present work was to develop Common exercises to help readers better understand and practice the theory, methods, and techniques developed in Subtask 2 of Annex 68: defining a reference house for the local climate, and evaluating and predicting energy efficiency and indoor air quality in buildings under changing environmental conditions. The report includes three Common exercises. CE1: A procedure for the definition of reference buildings for estimating pollution loads and for IAQ and energy analysis for different countries/climates. CE2: A method and procedure for using a full-scale chamber to evaluate the effects of emission sources and sinks, ventilation and air cleaning on IAQ. CE3: Development of a procedure for estimating the parameters of mechanistic emission source models from chamber testing data. These correspond to Chapters 2, 3 and 5 of the final report of Subtask 2, respectively. Finally, the solutions for CE1 are presented in the Appendices of the report. Readers with appropriate research facilities are encouraged to use the procedures described in CE2 and CE3.
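
    To give a concrete sense of what CE3 involves, the sketch below fits a first-order decaying emission source to synthetic concentration data from a well-mixed chamber, using the standard single-zone mass balance. The chamber settings, the synthetic "measurements", and the use of scipy's curve_fit are assumptions for illustration, not the Annex 68 procedure itself.

        # Parameter estimation sketch: fit E(t) = E0 * exp(-k t) to chamber
        # concentrations via the analytic solution of the well-mixed mass balance
        # V dC/dt = E(t) - Q C, with C(0) = 0 and air exchange rate a = Q/V.
        import numpy as np
        from scipy.optimize import curve_fit

        V, Q = 30.0, 15.0             # chamber volume (m3) and ventilation rate (m3/h)
        a = Q / V                     # air exchange rate (1/h)

        def chamber_conc(t, E0, k):
            """C(t) for a first-order decaying source in a well-mixed chamber."""
            return (E0 / V) * (np.exp(-k * t) - np.exp(-a * t)) / (a - k)

        t = np.linspace(0.5, 48, 40)                        # sampling times (h)
        rng = np.random.default_rng(1)
        measured = chamber_conc(t, 600.0, 0.05) + rng.normal(0, 2.0, t.size)  # ug/m3

        (E0_hat, k_hat), _ = curve_fit(chamber_conc, t, measured, p0=(100.0, 0.1))
        print(f"estimated E0 = {E0_hat:.0f} ug/h, k = {k_hat:.3f} 1/h")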

    UKERC Review of evidence for the rebound effect: Technical report 2: Econometric studies

    This Working Paper examines the evidence for direct rebound effects that is available from studies that use econometric techniques to analyse secondary data. The focus throughout is on consumer energy services, since this is where the bulk of the evidence lies.
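
    For readers unfamiliar with how such estimates are obtained, the sketch below shows one common approach: approximating the direct rebound effect by the negative of the own-price elasticity of energy demand, estimated from a log-log demand equation. The data are synthetic and the specification is illustrative, not drawn from the studies reviewed in the report.

        # Illustrative rebound estimation: OLS on a log-log energy demand equation
        # ln(E) = b0 + b1*ln(price) + b2*ln(income) + u; the direct rebound is
        # approximated as -b1. Data below are synthetic with a true b1 of -0.25.
        import numpy as np

        rng = np.random.default_rng(0)
        n = 500
        ln_price = rng.normal(0.0, 0.3, n)
        ln_income = rng.normal(10.0, 0.5, n)
        ln_energy = 2.0 - 0.25 * ln_price + 0.4 * ln_income + rng.normal(0, 0.1, n)

        X = np.column_stack([np.ones(n), ln_price, ln_income])
        beta, *_ = np.linalg.lstsq(X, ln_energy, rcond=None)
        price_elasticity = beta[1]
        print(f"estimated price elasticity = {price_elasticity:.3f}")
        print(f"implied direct rebound     = {-price_elasticity:.1%}")   # ~25% by construction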