Advanced techniques and technology for efficient data storage, access, and transfer
Advanced techniques for efficiently representing most forms of data are being implemented in practical hardware and software form through the joint efforts of three NASA centers. These techniques adapt to local statistical variations to continually provide near-optimum code efficiency when representing data without error. Demonstrated in several earlier space applications, they are the basis of initial NASA data compression standards specifications. Since the techniques clearly apply to most NASA science data, NASA invested in the development of both hardware and software implementations for general use. This investment includes high-speed single-chip very large scale integration (VLSI) coding and decoding modules as well as machine-transferable software routines. The hardware chips were tested in the laboratory at data rates as high as 700 Mbit/s. A coding module's definition includes a predictive preprocessing stage and a powerful adaptive coding stage. The function of the preprocessor is to optimally process incoming data into a standard-form data source that the second stage can handle. The built-in preprocessor of the VLSI coder chips is ideal for high-speed sampled-data applications such as imaging and high-quality audio, but the second-stage adaptive coder can also be used separately with any source that can be externally preprocessed into the 'standard form'. This generic functionality ensures that the applicability of these techniques and their recent high-speed implementations should be equally broad outside of NASA.
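The two-stage scheme described above can be illustrated in miniature. The following toy Python sketch (not the actual VLSI or NASA-standard implementation; all function names are hypothetical) pairs a one-sample delta predictor, which zigzag-maps signed residuals into a non-negative 'standard form', with a Rice-style adaptive coder that picks the cheapest parameter k per block:

```python
def zigzag(d):
    """Map a signed residual to a non-negative 'standard form' value."""
    return 2 * d if d >= 0 else -2 * d - 1

def preprocess(samples):
    """Predictive preprocessing stage: delta-predict each sample against
    the previous one, then zigzag-map the signed residuals."""
    prev, out = 0, []
    for s in samples:
        out.append(zigzag(s - prev))
        prev = s
    return out

def best_k(block, kmax=14):
    """Adaptive stage: choose the Rice parameter k that minimises the
    total coded length (unary quotient + stop bit + k remainder bits)."""
    return min(range(kmax + 1),
               key=lambda k: sum((v >> k) + 1 + k for v in block))

def rice_encode(block):
    """Encode a block of non-negative values with the chosen k."""
    k = best_k(block)
    def code(v):
        q, r = v >> k, v & ((1 << k) - 1)
        return "1" * q + "0" + (format(r, "b").zfill(k) if k else "")
    return k, "".join(code(v) for v in block)
```

For a smooth input such as `[100, 101, 103, 102, 104]`, the preprocessor yields small residuals after the first sample, and the adaptive stage settles on a compact k; this is exactly why preprocessing into standard form matters before the coding stage.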
In order to fully realise the value of open data, researchers must first address the quality of the datasets
There has been a phenomenal increase in the availability of data over the last decade. Open data is provided as a means of empowering users with information and in the hope of sparking innovation and increased efficiency in governments and businesses. However, in spite of the many success stories based on the open data paradigm, concerns remain over the quality of such datasets. Marta Indulska and Shazia Sadiq argue that in order to facilitate more effective and efficient realisation of value from open data, research must reach a shared consensus on the definition of data quality dimensions, provide methods and guidelines for assessing the potential usefulness of open datasets using exploratory tools and techniques, and develop rigorous theoretical underpinnings on effective use of open data
Information driven evaluation of data hiding algorithms
Abstract. Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when data mining techniques are used. Privacy Preserving Data Mining (PPDM) algorithms have recently been introduced with the aim of modifying the database in such a way as to prevent the discovery of sensitive information. Due to the large number of possible techniques that can be used to achieve this goal, it is necessary to provide some standard evaluation metrics to determine the best algorithms for a specific application or context. Currently, however, there is no common set of parameters that can be used for this purpose. This paper explores the problem of PPDM algorithm evaluation, starting from the key goal of preserving data quality. To achieve this goal, we propose a formal definition of data quality specifically tailored for use in the context of PPDM algorithms, a set of evaluation parameters and an evaluation algorithm. The resulting evaluation core process is then presented as part of a more general three-step evaluation framework that also takes into account other aspects of algorithm evaluation such as efficiency, scalability and level of privacy.
Privacy Preserving Data Mining, Evaluation Methodologies
Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when data mining techniques are used. Privacy Preserving Data Mining (PPDM) algorithms have recently been introduced with the aim of modifying the database in such a way as to prevent the discovery of sensitive information. Due to the large number of possible techniques that can be used to achieve this goal, it is necessary to provide some standard evaluation metrics to determine the best algorithms for a specific application or context. Currently, however, there is no common set of parameters that can be used for this purpose. Moreover, because sanitization modifies the data, an important issue, especially for critical data, is to preserve the quality of the data. However, to the best of our knowledge, no approaches have been developed that deal with the issue of data quality in the context of PPDM algorithms. This report explores the problem of PPDM algorithm evaluation, starting from the key goal of preserving data quality. To achieve this goal, we propose a formal definition of data quality specifically tailored for use in the context of PPDM algorithms, a set of evaluation parameters and an evaluation algorithm. Moreover, because of the "environment-related" nature of data quality, a structure to represent constraints and information relevance related to the data is presented. The resulting evaluation core process is then presented as part of a more general three-step evaluation framework that also takes into account other aspects of algorithm evaluation such as efficiency, scalability and level of privacy. JRC.G.6 - Sensors, radar technologies and cybersecurity
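As an illustration of the kind of data-quality parameter such an evaluation framework needs, here is a minimal Python sketch (hypothetical, not the report's formal definition) that measures how much a sanitization step distorts item supports in a transaction database, where 0.0 means quality was fully preserved:

```python
def frequency_distortion(original, sanitized, items):
    """Average relative change in item support between the original and
    the sanitized transaction databases. Each database is a list of
    transactions; each transaction is a set of items."""
    def support(db, item):
        # Fraction of transactions that contain the item.
        return sum(item in t for t in db) / len(db)

    diffs = []
    for it in items:
        s0, s1 = support(original, it), support(sanitized, it)
        # Relative change, guarding against items absent from the original.
        diffs.append(abs(s0 - s1) / s0 if s0 else abs(s1))
    return sum(diffs) / len(diffs)
```

For example, removing item "a" from one of four transactions changes only that item's support, so the metric reports a small but non-zero distortion; a complete framework would weight such changes by the relevance constraints the report describes.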
Valentine: Evaluating Matching Techniques for Dataset Discovery
Data scientists today search large data lakes to discover and integrate datasets. In order to bring together disparate data sources, dataset discovery methods rely on some form of schema matching: the process of establishing correspondences between datasets. Traditionally, schema matching has been used to find matching pairs of columns between a source and a target schema. However, the use of schema matching in dataset discovery methods differs from its original use. Nowadays schema matching serves as a building block for indicating and ranking inter-dataset relationships. Surprisingly, although a discovery method's success relies heavily on the quality of the underlying matching algorithms, the latest discovery methods employ existing schema matching algorithms in an ad hoc fashion due to the lack of openly available datasets with ground truth, reference method implementations, and evaluation metrics. In this paper, we aim to rectify the problem of evaluating the effectiveness and efficiency of schema matching methods for the specific needs of dataset discovery. To this end, we propose Valentine, an extensible open-source experiment suite to execute and organize large-scale automated matching experiments on tabular data. Valentine includes implementations of seminal schema matching methods that we either implemented from scratch (due to absence of open source code) or imported from open repositories. The contributions of Valentine are: i) the definition of four schema matching scenarios as encountered in dataset discovery methods, ii) a principled dataset fabrication process tailored to the scope of dataset discovery methods, and iii) the most comprehensive evaluation of schema matching techniques to date, offering insight into the strengths and weaknesses of existing techniques that can serve as a guide for employing schema matching in future dataset discovery methods.
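For intuition about the building block being benchmarked, a deliberately naive name-based schema matcher might look as follows. This is a sketch only (the function name and threshold are illustrative, and Valentine's actual implementations are far more sophisticated, using instance data as well as names):

```python
from difflib import SequenceMatcher

def match_schemas(source_cols, target_cols, threshold=0.5):
    """Score every (source, target) column pair by string similarity of
    their names and return the pairs above a threshold, ranked by score,
    i.e. a ranked list of candidate inter-schema correspondences."""
    pairs = []
    for s in source_cols:
        for t in target_cols:
            score = SequenceMatcher(None, s.lower(), t.lower()).ratio()
            if score >= threshold:
                pairs.append((s, t, round(score, 2)))
    # Highest-scoring correspondences first.
    return sorted(pairs, key=lambda p: -p[2])
```

Running it on `["movie_title", "release_year"]` against `["title", "year", "budget"]` ranks the intuitive pairs first; the point of a benchmark suite like Valentine is precisely to quantify when such simple signals suffice and when they fail.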
Multi Sensor Data Fusion Architectures for Air Traffic Control Applications
Nowadays, radar is no longer the sole technology able to ensure the surveillance of air traffic. The extensive deployment of satellite systems and air-to-ground data links has led to the emergence of complementary means and techniques on which a great deal of research and experimentation has been carried out over the past ten years. In such an environment, the sensor data processing, which is a key element in any Air Traffic Control (ATC) centre, has been continuously upgraded so as to follow the evolution of sensor technology and, in the meantime, improve quality in terms of continuity, integrity and accuracy. This book chapter proposes a comprehensive description of the state of the art and the roadmap for the future of the multi-sensor data fusion architectures and techniques in use in ATC centres. The first part of the chapter describes the background of ATC centres, while the second part points out various data fusion techniques. Multi-radar data processing architecture is analysed, and a brief definition of the internal core tracking algorithms is given, as well as a comparative benchmark based on their respective advantages and drawbacks. The third part of the chapter focuses on the most recent evolution, which leads from a Multi Radar Tracking System to a Multi Sensor Tracking System. The last part of the chapter deals with the sensor data processing that will be put into operation in the next ten years. The main challenge will be to provide the same level of service in both surface and air surveillance areas in order to offer:
• highly accurate air and surface situation awareness to air traffic controllers,
• situational awareness via Traffic Information System - Broadcast (TIS-B) services to pilots and vehicle drivers, and
• new air and surface safety, capacity and efficiency applications to airports and airlines.
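A core primitive beneath the multi-sensor tracking systems surveyed in such chapters is the fusion of independent measurements of the same quantity from sensors of different accuracy. A minimal, hypothetical Python sketch using inverse-variance weighting (a static fusion step, not any specific ATC tracker):

```python
def fuse(estimates):
    """Fuse independent measurements of one scalar quantity.

    `estimates` is a list of (value, variance) pairs, e.g. a radar range
    and a satellite-based position report. The more accurate sensor
    (smaller variance) receives the larger weight, and the fused variance
    is smaller than any single sensor's variance.
    """
    weights = [1.0 / var for _, var in estimates]
    fused = sum(w * x for (x, _), w in zip(estimates, weights)) / sum(weights)
    fused_var = 1.0 / sum(weights)
    return fused, fused_var
```

For instance, fusing a radar measurement (1000.0 m, variance 100.0) with a more precise data-link report (1010.0 m, variance 25.0) yields an estimate pulled toward the precise sensor, with a variance below both inputs; recursive application of this idea over time is what the tracking filters compared in the chapter provide.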
A strategy for mapping unstructured mesh computational mechanics programs onto distributed memory parallel architectures
The motivation of this thesis was to develop strategies that would enable unstructured mesh based computational mechanics codes to exploit the computational advantages offered by distributed memory parallel processors. Strategies that successfully map structured mesh codes onto parallel machines have been developed over the previous decade and used to build a toolkit for automation of the parallelisation process. Extension of the capabilities of this toolkit to include unstructured mesh codes requires new strategies to be developed.
This thesis examines the method of parallelisation by geometric domain decomposition using the single program multi data programming paradigm with explicit message passing. This technique involves splitting (decomposing) the problem definition into P parts that may be distributed over P processors in a parallel machine. Each processor runs the same program and operates only on its part of the problem. Messages passed between the processors allow data exchange to maintain consistency with the original algorithm.
The strategies developed to parallelise unstructured mesh codes should meet a number of requirements:
• The algorithms are faithfully reproduced in parallel.
• The code is largely unaltered in the parallel version.
• The parallel efficiency is maximised.
• The techniques should scale to highly parallel systems.
• The parallelisation process should become automated.
Techniques and strategies that meet these requirements are developed and tested in this dissertation using a state of the art integrated computational fluid dynamics and solid mechanics code. The results presented demonstrate the importance of the problem partition in the definition of inter-processor communication and hence parallel performance.
The classical measure of partition quality based on the number of cut edges in the mesh partition can be inadequate for real parallel machines. Consideration of the topology of the parallel machine in the mesh partition is demonstrated to be a more significant factor than the number of cut edges in the achieved parallel efficiency. It is shown to be advantageous to allow an increase in the volume of communication in order to achieve an efficient mapping dominated by localised communications. The limitation to parallel performance resulting from communication startup latency is clearly revealed together with strategies to minimise the effect.
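The two partition-quality notions contrasted above (the classical cut-edge count and the distribution of communication across processors) can be sketched concretely. In this illustrative Python fragment the function names are hypothetical; a mesh is reduced to its dual graph of edges, and a partition maps each node to a processor:

```python
from collections import Counter

def cut_edges(edges, part):
    """Classical partition-quality measure: the number of mesh edges
    whose endpoints are assigned to different processors."""
    return sum(part[u] != part[v] for u, v in edges)

def comm_load_per_processor(edges, part):
    """Per-processor communication load, a closer proxy for real machines
    where the locality and balance of communication matter more than the
    raw cut count."""
    load = Counter()
    for u, v in edges:
        if part[u] != part[v]:
            load[part[u]] += 1
            load[part[v]] += 1
    return dict(load)
```

Two partitions with the same cut-edge count can induce very different communication patterns, one localised to neighbouring processors and one scattered across the machine, which is why the thesis argues that machine topology must enter the partitioning objective.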
The generic application of the techniques to other unstructured mesh codes is discussed in the context of automation of the parallelisation process. Automation of parallelisation based on the developed strategies is shown to be possible through the use of run-time inspector loops that accurately determine the dependencies defining the necessary inter-processor communication.
Indoor Air Quality Design and Control in Low-Energy Residential Buildings, International Energy Agency, EBC Annex 68, Subtask 2: Pollutant loads in residential buildings (Common exercises)
The objective of the present work was to develop Common exercises to help readers understand and practise the theory, methods, and techniques developed in Subtask 2 of Annex 68 for defining a reference house under the local climate and for evaluating and predicting energy efficiency and indoor air quality (IAQ) in buildings under changing environmental conditions. The report includes three Common exercises. CE1: a procedure for the definition of reference buildings for estimating pollution loads, IAQ and energy analysis for different countries/climates. CE2: a method and procedure for using a full-scale chamber to evaluate the effects of emission sources and sinks, ventilation and air cleaning on IAQ. CE3: development of a procedure for estimating the parameters of mechanistic emission source models from chamber testing data. These correspond to Chapters 2, 3 and 5 of the final report of Subtask 2, respectively. Finally, the solutions for CE1 are presented in the Appendices of the report. Readers with appropriate research facilities are encouraged to use the procedures described in CE2 and CE3.
UKERC Review of evidence for the rebound effect: Technical report 2: Econometric studies
This Working Paper examines the evidence for direct rebound effects available from studies that use econometric techniques to analyse secondary data. The focus throughout is on consumer energy services, since this is where the bulk of the evidence lies.