Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on understanding the
computational complexity of methods proposed for "Big Data." Sample
complexity, however, has received comparatively little attention, especially
in the setting where the sample size is fixed and the dimension grows without
bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into three categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the last applies to exa-scale data
dimension. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that is of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
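The sample-starved regime described above can be made concrete with a small sketch: screening all pairwise sample correlations against a threshold when the number of variables p far exceeds the number of samples n. The function name and the pure-noise example below are illustrative, not the paper's framework:

```python
import numpy as np

def correlation_mining(X, rho):
    """Screen variable pairs whose sample correlation exceeds rho in magnitude.

    X : (n, p) data matrix -- n samples (rows), p variables (columns).
    Returns the list of discovered pairs (i, j) with i < j.
    """
    # Sample correlation matrix of the p variables.
    R = np.corrcoef(X, rowvar=False)
    iu = np.triu_indices_from(R, k=1)        # upper triangle, i < j
    hits = np.abs(R[iu]) > rho
    return list(zip(iu[0][hits], iu[1][hits]))

rng = np.random.default_rng(0)
n, p = 20, 500                               # sample-starved: n << p
X = rng.standard_normal((n, p))              # independent variables, no true structure
pairs = correlation_mining(X, rho=0.9)
# With independent data every discovery is a false positive; at fixed n
# their expected number grows with p, which is the phenomenon the paper's
# high dimensional learning rates quantify.
```

Even at a high threshold, spurious discoveries appear once p is large enough relative to n, which is why the fixed-n, growing-p regime needs its own sample-complexity theory.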
The Design of a Low-Cost Traffic Calming Radar - Development of a radar solution intended to demonstrate proof of concept
This study aimed to develop a radar solution that would aid the traffic calming efforts of the CSIR business campus. The Institute of Transportation Engineers defined traffic calming as "The combination of mainly physical measures that reduce the negative effects of motor vehicle use." Radar-based solutions have been proven to help reduce the speeds of motorists in areas with speed restrictions. Unfortunately, these solutions are expensive and difficult to import. Thus, this dissertation's main focus is to produce a detailed blueprint of a radar-based solution, with technical specifications similar to those of commercial and experimental systems, at a relatively low cost. With this in mind, the project began by stating the user requirements. A detailed study of current experimental and commercial radar-based traffic calming systems followed. Thereafter, the technical and non-technical requirements were derived from the user requirements, and the technical specifications were obtained from the literature study. A review of fundamental radar and signal processing principles provided background knowledge for the design and simulation process. A detailed design of the system's functional components was then conceptualized, covering the hardware, software, and electrical aspects of the system as well as the enclosure design. With the detailed design in mind, a data-collection system was built to verify whether the technical specifications relating to the detection performance and the velocity accuracy of the proposed radar design were met. This avoided purchasing all the components of the proposed system while still proving the design's technical feasibility. The data-collection system consisted of a radar sensor, an Analogue to Digital Converter (ADC), and a laptop computer.
The radar sensor was a K-band, Continuous Wave (CW) transceiver, which provided I/Q demodulated data with beat frequencies ranging from DC to 50 kHz. The ADC was an 8-bit Picoscope 2206B portable oscilloscope, capable of sampling at up to 50 MHz. The target detection and velocity estimation algorithms were executed on a Samsung Series 7 Chronos laptop. Preliminary experiments enabled the approximation of the noise intensity of the scene in which the radar would be placed. These noise intensity values enabled the relationship between the Signal to Noise Ratio (SNR) and the velocity error to be modelled at specific ranges from the radar, which led to a series of experiments that verified the prototype's ability to accurately detect and estimate vehicle speed at distances of up to 40 meters from the radar. The cell-averaging constant false alarm rate (CA-CFAR) detector was chosen as an optimum detector for this application, and the parameters that produced the best results were 50 reference cells and 12 guard cells. The detection rate was found to be 100% for all coherent processing intervals (CPIs) tested. The prototype was able to detect vehicle speeds ranging from 2 km/h up to 60 km/h with uncertainties of ±0.415 km/h, ±0.276 km/h, and ±0.156 km/h using CPIs of 0.0128 s, 0.0256 s, and 0.0512 s respectively. The optimal CPI was found to be 0.0512 s, as it had the lowest mean velocity uncertainty and produced the largest first-detection SNR of the CPIs tested. These findings were crucial for the feasibility of manufacturing a low-cost traffic calming solution for the South African market.
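The detection chain described above can be sketched as a one-dimensional cell-averaging CFAR over the Doppler power spectrum. The code below is an illustrative reimplementation, not the dissertation's software: it uses the reported 50 reference cells and 12 guard cells as defaults, an assumed design false-alarm probability, and a 24.125 GHz carrier (the abstract only states K-band) for the speed conversion:

```python
import numpy as np

def ca_cfar(power, n_ref=50, n_guard=12, pfa=1e-4):
    """Cell-averaging CFAR over a 1-D power spectrum.

    power   : non-negative power per Doppler cell
    n_ref   : reference cells per side used to estimate the noise level
    n_guard : guard cells per side excluded around the cell under test
    pfa     : design false-alarm probability (assumed value)
    Returns a boolean detection mask; edge cells without a full window
    are left False.
    """
    n = len(power)
    N = 2 * n_ref
    # CA-CFAR threshold multiplier for N reference cells, exponential noise.
    alpha = N * (pfa ** (-1.0 / N) - 1.0)
    det = np.zeros(n, dtype=bool)
    half = n_ref + n_guard
    for i in range(half, n - half):
        lead = power[i - half : i - n_guard]          # n_ref cells before CUT
        lag = power[i + n_guard + 1 : i + half + 1]   # n_ref cells after CUT
        noise = (lead.sum() + lag.sum()) / N
        det[i] = power[i] > alpha * noise
    return det

def doppler_speed_kmh(f_beat_hz, f_carrier_hz=24.125e9):
    """Speed in km/h from a CW Doppler beat frequency: v = f_b * c / (2 f_c).
    The 24.125 GHz carrier is an assumption for illustration."""
    c = 299_792_458.0
    return f_beat_hz * c / (2.0 * f_carrier_hz) * 3.6

# Example: exponential noise floor with one strong Doppler return.
rng = np.random.default_rng(0)
spectrum = rng.exponential(1.0, 400)
spectrum[200] = 200.0
mask = ca_cfar(spectrum)
```

The threshold multiplier follows the standard CA-CFAR design relation Pfa = (1 + alpha/N)^(-N); detected Doppler bins are then mapped to speeds with the CW Doppler formula.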
The Error is the Feature: How to Forecast Lightning using a Model Prediction Error
Despite the progress within the last decades, weather forecasting is still a
challenging and computationally expensive task. Current satellite-based
approaches to predict thunderstorms are usually based on the analysis of the
observed brightness temperatures in different spectral channels and emit a
warning if a critical threshold is reached. Recent progress in data science
however demonstrates that machine learning can be successfully applied to many
research fields in science, especially in areas dealing with large datasets. We
therefore present a new approach to the problem of predicting thunderstorms
based on machine learning. The core idea of our work is to use the error of
two-dimensional optical flow algorithms applied to images of meteorological
satellites as a feature for machine learning models. We interpret that optical
flow error as an indication of convection potentially leading to thunderstorms
and lightning. To factor in spatial proximity we use various manual convolution
steps. We also consider effects such as the time of day or the geographic
location. We train different tree classifier models as well as a neural network
to predict lightning within the next few hours (called nowcasting in
meteorology) based on these features. In our evaluation section we compare the
predictive power of the different models and the impact of different features
on the classification result. Our results show a high accuracy of 96% for
predictions over the next 15 minutes, which slightly decreases with increasing
forecast period but still remains above 83% for forecasts of up to five hours.
The high false positive rate of nearly 6%, however, needs further
investigation to allow for operational use of our approach.
Comment: 10 pages, 7 figures
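The core idea, treating the residual of a motion-extrapolated satellite image as a convection signal, can be illustrated with a deliberately simplified stand-in for a dense optical-flow algorithm: a single global displacement found by exhaustive shift search. All names are illustrative and the method is far cruder than the flow algorithms the paper uses:

```python
import numpy as np

def flow_error(prev, curr, max_shift=3):
    """Optical-flow prediction error between two image frames.

    Estimates one global displacement by exhaustive shift search (a toy
    stand-in for a dense optical-flow algorithm), extrapolates the previous
    frame by that displacement, and returns the per-pixel absolute error of
    the extrapolation against the current frame. Pixels where advection
    alone cannot explain the change, such as rapidly growing convection,
    show a large error.
    """
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(prev, dy, axis=0), dx, axis=1)
            err = np.abs(shifted - curr).mean()
            if err < best_err:
                best, best_err = (dy, dx), err
    dy, dx = best
    prediction = np.roll(np.roll(prev, dy, axis=0), dx, axis=1)
    return np.abs(prediction - curr)
```

For a frame pair related by pure translation the error map is near zero everywhere; a locally brightening region (the convective signature) stands out as a high-error patch, which is exactly the quantity fed to the classifiers as a feature.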
Innovative observing strategy and orbit determination for Low Earth Orbit Space Debris
We present the results of a large scale simulation, reproducing the behavior
of a data center for the build-up and maintenance of a complete catalog of
space debris in the upper part of the low Earth orbits region (LEO). The
purpose is to determine the performances of a network of advanced optical
sensors, through the use of the newest orbit determination algorithms developed
by the Department of Mathematics of Pisa (DM). Such a network has been proposed
to ESA in the Space Situational Awareness (SSA) framework by Carlo Gavazzi
Space SpA (CGS), Istituto Nazionale di Astrofisica (INAF), DM, and Istituto di
Scienza e Tecnologie dell'Informazione (ISTI-CNR). The conclusion is that it is
possible to use a network of optical sensors to build up a catalog containing
more than 98% of the objects with perigee height between 1100 and 2000 km,
which would be observable by a reference radar system selected as comparison.
It is also possible to maintain such a catalog within the accuracy requirements
motivated by collision avoidance, and to detect catastrophic fragmentation
events. However, such results depend upon specific assumptions on the sensor
and on the software technologies.
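The catalog's coverage criterion, objects with perigee height between 1100 and 2000 km, reduces to a simple check on the orbital elements. A minimal sketch (function names are illustrative):

```python
R_EARTH_KM = 6378.137  # equatorial Earth radius in km

def perigee_height_km(a_km, e):
    """Perigee height above the Earth's surface from semi-major axis
    a (km) and eccentricity e: h_p = a(1 - e) - R_earth."""
    return a_km * (1.0 - e) - R_EARTH_KM

def in_upper_leo(a_km, e, lo=1100.0, hi=2000.0):
    """True if the orbit's perigee lies in the simulated catalog band."""
    return lo <= perigee_height_km(a_km, e) <= hi

# A circular orbit with a = 7578 km sits about 1200 km up: inside the band.
# A circular orbit with a = 7178 km sits about 800 km up: outside it.
```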