Data mining of biometric data: revisiting the concept of private life?
In recent years, data mining, the automated processing of large data sets to extract patterns, relationships, trends and other information not traceable through usual ‘human’ reading, has steadily gained in repute. By taking advantage of the seemingly unlimited opportunities that data mining techniques enable, various fields of scientific or medical research, business transactions, and state-related and other security-concerned activities could gain unprecedented benefits. However, notwithstanding established data protection principles that also cover biometric information, data mining practices, which are inherently intrusive in the private sphere of individuals, have generated various concerns and controversy. As these emerging technological developments create new challenges to the protection of personal data, including primarily the most sensitive category of biometric data, the effectiveness of the concept of privacy under the European Convention on Human Rights (ECHR) and of the existing EU data protection legislation in securing an adequate legal framework is facing a new ordeal. This paper seeks to review, especially in the aftermath of the recent Luxembourg Court’s case law, whether evolving data mining practices create a need to adjust the legal treatment of biometric data protection.
Performance-Aware High-Performance Computing for Remote Sensing Big Data Analytics
The rapid growth in the volume of data that has accompanied recent technological developments has made analysis with traditional approaches more difficult for many organizations. In particular, applications that require timely processing of big data, such as satellite imagery, sensor data, bank operations, web servers, and social networks, need efficient mechanisms for collecting, storing, processing, and analyzing these data. Here, big data analytics, which encompasses data mining, machine learning, statistics, and similar techniques, helps organizations manage their data end to end. In this chapter, we introduce a novel high-performance computing system on a geo-distributed private cloud for remote sensing applications, which takes advantage of network topology, exploits the utilization and workloads of CPU, storage, and memory resources in a distributed fashion, and optimizes resource allocation to carry out big data analytics efficiently.
Pattern Recognition
A wealth of advanced pattern recognition algorithms is emerging at the intersection of effective visual feature technologies and the study of the human-brain cognition process. Effective visual features are made possible by rapid developments in sensor equipment, novel filter designs, and viable information processing architectures, while a better understanding of the human-brain cognition process broadens the ways in which a computer can perform pattern recognition tasks. The present book collects representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition.
Machine Learning Methods for Generating High Dimensional Discrete Datasets
The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. A possible solution is to synthesize datasets that reflect patterns of real ones using a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z; then, such patterns are used to reconstruct a new dataset X' that preserves the main characteristics of X. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former is devised using inverse mining (IFM) techniques, and consists of generating a dataset that satisfies given support constraints on the itemsets of an input set, which are typically the frequent ones. By contrast, for the latter approach, recent developments in probabilistic generative modeling (PGM) are explored that model the generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for the case of discrete data and discussing their pros and cons.
Application of Data Mining in Air Traffic Forecasting
The main goal of the study is to develop a model for air traffic forecasting using off-the-shelf data mining and machine learning techniques. Although data-driven modeling has been extensively applied in the aviation sector, little research has been done in the area of air traffic forecasting. This study is inspired by previous research focused on improving the Federal Aviation Administration (FAA) Terminal Area Forecasting (TAF) methodology, which historically assumed that the US air transportation system (ATS) network structure was static. Recent developments use data mining algorithms to predict the likelihood of previously un-connected airport-pairs being connected in the future, and the likelihood of connected airport-pairs becoming un-connected. Despite its innovation, this research does not focus on improving the FAA’s existing methodology for forecasting future air traffic levels on existing routes, which is based on relatively simple regression and growth models. We investigate different approaches for improving and developing new features within the existing data mining applications in air traffic forecasting. We focus particularly on predicting detailed traffic information for the US ATS. Initially, a 2-stage log-log model is applied to establish the significance of different inputs and to identify issues of endogeneity and multicollinearity, while maintaining the simplicity of current models. Although the model shows high goodness of fit, it tested positive for both of these issues and also presented problems with causality. To address these issues, a 3-stage model that is under development is introduced. This model employs logistic regression and discrete choice modelling. As part of future work, machine learning techniques such as clustering and neural networks will be applied to improve this model’s performance.
The Hidden Web, XML and Semantic Web: A Scientific Data Management Perspective
The World Wide Web no longer consists just of HTML pages. Our work sheds
light on a number of trends on the Internet that go beyond simple Web pages.
The hidden Web provides a wealth of data in semi-structured form, accessible
through Web forms and Web services. These services, as well as numerous other
applications on the Web, commonly use XML, the eXtensible Markup Language. XML
has become the lingua franca of the Internet that allows customized markups to
be defined for specific domains. On top of XML, the Semantic Web grows as a
common structured data source. In this work, we first explain each of these
developments in detail. Using real-world examples from scientific domains of
great interest today, we then demonstrate how these new developments can assist
the managing, harvesting, and organization of data on the Web. On the way, we
also illustrate the current research avenues in these domains. We believe that
this effort would help bridge multiple database tracks, thereby attracting
researchers with a view to extend database technology.
Comment: EDBT - Tutorial (2011)
Recovering the lost gold of the developing world : bibliographic database
This report contains a library of 181 references, including abstracts, prepared for Project
R 7120 "Recovering the lost gold of the developing world" funded by the UK's
Department for International Development (DFID) under the Knowledge and Research
(KAR) programme. As part of an initial desk study, a literature review of gold processing
methods used by small-scale miners was carried out using the following sources: the ISI
Science Citation Index accessed via Bath Information and Data Services (BIDS), a
licensed GEOREF CD-ROM database held at the BGS's Library in Keyworth and
IMMage, a CD-ROM database produced by the Institution of Mining and Metallurgy held
by the Minerals group of BGS. Information on the search terms used is available from the
author.
Comment: Classifier Technology and the Illusion of Progress
Comment on Classifier Technology and the Illusion of Progress
[math.ST/0606441]
Comment: Published at http://dx.doi.org/10.1214/088342306000000024 in the
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org)