
    Foundational principles for large scale inference: Illustrations through correlation mining

    When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of the recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity, however, has received relatively less attention, especially in the setting where the sample size n is fixed and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime, where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime, where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime, where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche, but only the last applies to exa-scale data dimensions. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that is of interest. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
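
    As a toy illustration of the purely high dimensional regime described above, the Python sketch below screens a p x p matrix of pairwise sample correlations with a fixed threshold while n stays small. The data, dimensions, and threshold are placeholders for illustration, not the paper's procedure.

        # Correlation screening with fixed n and large p (illustrative only).
        import numpy as np

        rng = np.random.default_rng(0)
        n, p = 20, 2000                      # sample-starved regime: n << p

        X = rng.standard_normal((n, p))      # placeholder data, one column per variable
        R = np.corrcoef(X, rowvar=False)     # p x p matrix of pairwise sample correlations

        # Keep variable pairs whose sample correlation exceeds rho. With
        # p*(p-1)/2 candidate pairs, spuriously large correlations appear even
        # for independent data unless rho grows with p -- the sample complexity
        # phenomenon the paper quantifies.
        rho = 0.9
        i, j = np.where(np.triu(np.abs(R) > rho, k=1))
        print(f"{i.size} of {p*(p-1)//2} pairs exceed |corr| > {rho}")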

    The Design of a Low-Cost Traffic Calming Radar - Development of a radar solution intended to demonstrate proof of concept

    This study aimed to develop a radar solution to aid the traffic calming efforts of the CSIR business campus. The Institute of Transportation Engineers defines traffic calming as "the combination of mainly physical measures that reduce the negative effects of motor vehicle use." Radar-based solutions have been proven to help reduce the speeds of motorists in areas with speed restrictions. Unfortunately, these solutions are expensive and difficult to import. This dissertation's main focus is therefore to produce a detailed blueprint of a radar-based solution with technical specifications similar to those of commercial and experimental systems, at relatively low cost. The project began by stating the user requirements, followed by a detailed study of current experimental and commercial radar-based traffic calming systems. Thereafter, the technical and non-technical requirements were derived from the user requirements, and the technical specifications were obtained from the literature study. A review of fundamental radar and signal processing principles provided the background for the design and simulation process. A detailed design of the system's functional components was then conceptualized, covering the hardware, software, and electrical aspects of the system as well as the enclosure. With the detailed design in hand, a data-collection system was built to verify whether the technical specifications relating to the detection performance and velocity accuracy of the proposed radar design were met; this avoided purchasing all the components of the proposed system while still proving the design's technical feasibility. The data-collection system consisted of a radar sensor, an Analogue to Digital Converter (ADC), and a laptop computer. The radar sensor was a K-band Continuous Wave (CW) transceiver, which provided I/Q demodulated data with beat frequencies ranging from DC to 50 kHz. The ADC was an 8-bit PicoScope 2206B portable oscilloscope, capable of sampling rates of up to 50 MHz. The target detection and velocity estimation algorithms were executed on a Samsung Series 7 Chronos laptop. Preliminary experiments enabled the approximation of the noise intensity of the scene in which the radar would be placed. These noise intensity values enabled the relationship between the Signal to Noise Ratio (SNR) and the velocity error to be modelled at specific ranges from the radar, which led to a series of experiments that verified the prototype's ability to accurately detect and estimate vehicle speed at distances of up to 40 meters from the radar. The cell-averaging constant false alarm rate (CA-CFAR) detector was chosen as the optimum detector for this application, and the parameters that produced the best results were found to be 50 reference cells and 12 guard cells. The detection rate was 100% for all coherent processing intervals (CPIs) tested. The prototype was able to detect vehicle speeds ranging from 2 km/h up to 60 km/h with uncertainties of ±0.415 km/h, ±0.276 km/h, and ±0.156 km/h for CPIs of 0.0128 s, 0.0256 s, and 0.0512 s respectively. The optimal CPI was found to be 0.0512 s, as it had the lowest mean velocity uncertainty and produced the largest first-detection SNR of the CPIs tested. These findings are crucial for the feasibility of manufacturing a low-cost traffic calming solution for the South African market.
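
    The CA-CFAR detector named above adapts its detection threshold to the local noise level. The Python sketch below shows a one-dimensional version with the 50 reference and 12 guard cells reported in the abstract; the synthetic spectrum, false alarm probability, and injected target are assumptions for illustration, not the dissertation's data.

        # One-dimensional cell-averaging CFAR (CA-CFAR) detector sketch.
        import numpy as np

        def ca_cfar(power, num_ref=50, num_guard=12, pfa=1e-4):
            """Return indices of cells exceeding the adaptive CA-CFAR threshold."""
            half_ref, half_guard = num_ref // 2, num_guard // 2
            # Threshold factor for exponentially distributed (square-law) noise:
            # PFA = (1 + alpha/N)^(-N)  =>  alpha = N * (PFA^(-1/N) - 1).
            alpha = num_ref * (pfa ** (-1.0 / num_ref) - 1.0)
            hits = []
            lo, hi = half_ref + half_guard, len(power) - half_ref - half_guard
            for cut in range(lo, hi):
                # Average the reference cells on both sides, skipping the guards.
                lead = power[cut - half_guard - half_ref : cut - half_guard]
                lag = power[cut + half_guard + 1 : cut + half_guard + 1 + half_ref]
                noise = (lead.sum() + lag.sum()) / num_ref   # local noise estimate
                if power[cut] > alpha * noise:
                    hits.append(cut)
            return np.asarray(hits)

        rng = np.random.default_rng(1)
        spectrum = rng.exponential(1.0, size=2048)   # noise-only Doppler spectrum
        spectrum[700] += 200.0                       # injected target return
        print("detected bins:", ca_cfar(spectrum))

    For a K-band CW radar, a detected Doppler bin maps to speed via v = f_d * c / (2 * f_c); since the bin width is 1/CPI, longer CPIs refine the velocity estimate, consistent with the uncertainties quoted above.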

    The Error is the Feature: how to Forecast Lightning using a Model Prediction Error

    Despite the progress within the last decades, weather forecasting is still a challenging and computationally expensive task. Current satellite-based approaches to predicting thunderstorms are usually based on the analysis of the observed brightness temperatures in different spectral channels and emit a warning if a critical threshold is reached. Recent progress in data science, however, demonstrates that machine learning can be successfully applied to many research fields, especially those dealing with large datasets. We therefore present a new machine learning approach to the problem of predicting thunderstorms. The core idea of our work is to use the error of two-dimensional optical flow algorithms applied to images of meteorological satellites as a feature for machine learning models. We interpret this optical flow error as an indication of convection potentially leading to thunderstorms and lightning. To factor in spatial proximity, we use various manual convolution steps. We also consider effects such as the time of day and the geographic location. We train different tree classifier models as well as a neural network to predict lightning within the next few hours (called nowcasting in meteorology) based on these features. In our evaluation we compare the predictive power of the different models and the impact of different features on the classification result. Our results show a high accuracy of 96% for predictions over the next 15 minutes, which slightly decreases with increasing forecast period but remains above 83% for forecasts of up to five hours. The high false positive rate of nearly 6%, however, needs further investigation before our approach can be used operationally.
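
    The optical flow error feature at the heart of this approach can be sketched in a few lines: estimate dense flow between two successive satellite images, motion-compensate one frame onto the other, and take the per-pixel residual. The snippet below uses OpenCV's Farneback flow as one possible estimator; the file names and parameters are placeholders, not the authors' exact pipeline.

        # Model prediction error from dense optical flow (illustrative sketch).
        import cv2
        import numpy as np

        # Two successive grayscale satellite frames (placeholder file names).
        prev = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)
        nxt = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

        # Dense optical flow from frame t0 to frame t1 (Farneback method).
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)

        # Warp frame t1 back onto frame t0's grid along the estimated flow,
        # so each pixel of `warped` is the flow model's prediction of `prev`.
        h, w = prev.shape
        gx, gy = np.meshgrid(np.arange(w), np.arange(h))
        map_x = (gx + flow[..., 0]).astype(np.float32)
        map_y = (gy + flow[..., 1]).astype(np.float32)
        warped = cv2.remap(nxt.astype(np.float32), map_x, map_y, cv2.INTER_LINEAR)

        # Per-pixel prediction error: large residuals mark regions the flow
        # model cannot explain, read here as a possible signature of convection.
        error = np.abs(prev.astype(np.float32) - warped)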

    Innovative observing strategy and orbit determination for Low Earth Orbit Space Debris

    We present the results of a large scale simulation reproducing the behavior of a data center for the build-up and maintenance of a complete catalog of space debris in the upper part of the low Earth orbit (LEO) region. The purpose is to determine the performance of a network of advanced optical sensors, using the newest orbit determination algorithms developed by the Department of Mathematics of Pisa (DM). Such a network has been proposed to ESA in the Space Situational Awareness (SSA) framework by Carlo Gavazzi Space SpA (CGS), Istituto Nazionale di Astrofisica (INAF), DM, and Istituto di Scienza e Tecnologie dell'Informazione (ISTI-CNR). The conclusion is that it is possible to use a network of optical sensors to build up a catalog containing more than 98% of the objects with perigee height between 1100 and 2000 km that would be observable by a reference radar system selected for comparison. It is also possible to maintain such a catalog within the accuracy requirements motivated by collision avoidance, and to detect catastrophic fragmentation events. These results, however, depend upon specific assumptions about the sensors and the software technologies.