4,929 research outputs found

    Robust methods of building regression models : an application to the housing sector.

    This article studies robustification strategies for the linear model in the presence of outliers. The advantages of an internal analysis of the robustness of least squares for a given sample are pointed out. The application of this methodology is illustrated by building an explicit model of the determinants of rental housing values in the Madrid Metropolitan Area. Keywords: Outliers; Influential observations; Robust regression; Cook distance; Hedonic price function; Housing market.

    Ultrashort filaments of light in weakly-ionized, optically-transparent media

    Modern laser sources deliver ultrashort light pulses reaching a few cycles in duration, with energies beyond the Joule level and peak powers exceeding several terawatts (TW). When such pulses propagate through optically-transparent media, they first self-focus in space and grow in intensity, until they generate a tenuous plasma by photo-ionization. For free electron densities and beam intensities below their breakdown limits, these pulses evolve as self-guided objects, resulting from successive equilibria between the Kerr focusing process, the chromatic dispersion of the medium, and the defocusing action of the electron plasma. Discovered one decade ago, this self-channeling mechanism reveals a new physics, widely extending the frontiers of nonlinear optics. Implications include long-distance propagation of TW beams in the atmosphere, supercontinuum emission, pulse shortening as well as high-order harmonic generation. This review presents the landmarks of the 10-odd-year progress in this field. Particular emphasis is placed on the theoretical modeling of the propagation equations, whose physical ingredients are discussed from numerical simulations. Differences between femtosecond pulses propagating in gaseous or condensed materials are underlined. Attention is also paid to the multifilamentation instability of broad, powerful beams, which breaks up the energy distribution into small-scale cells along the optical path. The robustness of the resulting filaments in adverse weather, their large conical emission exploited for multipollutant remote sensing, nonlinear spectroscopy, and the possibility of guiding electric discharges in air are finally addressed on the basis of experimental results. Comment: 50 pages, 38 figures

    Outlier Mining Methods Based on Graph Structure Analysis

    Outlier detection in high-dimensional datasets is a fundamental and challenging problem across disciplines that also has practical implications, as removing outliers from the training set improves the performance of machine learning algorithms. While many outlier mining algorithms have been proposed in the literature, they tend to be valid or efficient for specific types of datasets (time series, images, videos, etc.). Here we propose two methods that can be applied to generic datasets, as long as there is a meaningful measure of distance between pairs of elements of the dataset. Both methods start by defining a graph, where the nodes are the elements of the dataset, and the links have associated weights that are the distances between the nodes. Then, the first method assigns an outlier score based on the percolation (i.e., the fragmentation) of the graph. The second method uses the popular IsoMap non-linear dimensionality reduction algorithm, and assigns an outlier score by comparing the geodesic distances with the distances in the reduced space. We test these algorithms on real and synthetic datasets and show that they either outperform or perform on par with other popular outlier detection methods. A main advantage of the percolation method is that it is parameter-free and therefore does not require any training; on the other hand, the IsoMap method has two integer parameters, and when they are appropriately selected, the method performs similarly to or better than all the other methods tested. Peer Reviewed. Postprint (published version)

    Anomaly and Change Detection in Remote Sensing Images

    Earth observation through satellite sensors, models and in situ measurements provides a way to monitor our planet with unprecedented spatial and temporal resolution. The amount and diversity of the data that are recorded and made available is ever-increasing. These data allow us to perform crop yield prediction, track land-use change such as deforestation, monitor and respond to natural disasters, and predict and mitigate climate change. The last two decades have seen a large increase in the application of machine learning algorithms in Earth observation in order to make efficient use of the growing data stream. Machine learning algorithms, however, are typically model agnostic and too flexible, and so end up not respecting fundamental laws of physics. On the other hand, there has in recent years been an increase in research attempting to embed physics knowledge in machine learning algorithms in order to obtain interpretable and physically meaningful solutions. The main objective of this thesis is to explore different ways of encoding physical knowledge to provide machine learning methods tailored for specific problems in remote sensing. Ways of expressing expert knowledge about the relevant physical systems in remote sensing abound, ranging from simple relations between reflectance indices and biophysical parameters to complex models that compute the radiative transfer of electromagnetic radiation through our atmosphere, and differential equations that explain the dynamics of key parameters. This thesis focuses on inversion problems, emulation of radiative transfer models, and incorporation of the above-mentioned domain knowledge in machine learning algorithms for remote sensing applications.
We explore new methods that can optimally model simulated and in situ data jointly, incorporate differential equations in machine learning algorithms, handle more complex inversion problems and large-scale data, obtain accurate and computationally efficient emulators that are consistent with physical models, and efficiently perform approximate Bayesian inversion over radiative transfer models.

    A framework for automated anomaly detection in high frequency water-quality data from in situ sensors

    River water-quality monitoring is increasingly conducted using automated in situ sensors, enabling timelier identification of unexpected values. However, anomalies caused by technical issues confound these data, while the volume and velocity of data prevent manual detection. We present a framework for automated anomaly detection in high-frequency water-quality data from in situ sensors, using turbidity, conductivity and river level data. After identifying end-user needs and defining anomalies, we ranked their importance and selected suitable detection methods. High priority anomalies included sudden isolated spikes and level shifts, most of which were classified correctly by regression-based methods such as autoregressive integrated moving average models. However, using other water-quality variables as covariates reduced performance due to complex relationships among variables. Classification of drift and periods of anomalously low or high variability improved when we replaced anomalous measurements with forecasts, but this inflated false positive rates. Feature-based methods also performed well on high priority anomalies, but were less proficient at detecting lower priority anomalies, resulting in high false negative rates. Unlike regression-based methods, all feature-based methods produced low false positive rates and did not require training or optimization. Rule-based methods successfully detected impossible values and missing observations. Thus, we recommend using a combination of methods to improve anomaly detection performance, whilst minimizing false detection rates. Furthermore, our framework emphasizes the importance of communication between end-users and analysts for optimal outcomes with respect to both detection performance and end-user needs. Our framework is applicable to other types of high-frequency time-series data and anomaly detection applications.

    Doctor of Philosophy

    Three-dimensional (3D) models of industrial plant primitives are used extensively in modern asset design, management, and visualization systems. Such systems allow users to efficiently perform tasks in Computer Aided Design (CAD), life-cycle management, construction progress monitoring, virtual reality training, marketing walk-throughs, or other visualization. Thus, capturing industrial plant models has correspondingly become a rapidly growing industry. The purpose of this research was to demonstrate an efficient way to ascertain physical model parameters of reflectance properties of industrial plant primitives for use in CAD and 3D modeling visualization systems. The first part of this research outlines the sources of error corresponding to 3D models created from Light Detection and Ranging (LiDAR) point clouds. Fourier analysis exposes the error due to a LiDAR system's finite sampling rate. Taylor expansion illustrates the errors associated with linearization due to flat polygonal surfaces. Finally, a statistical analysis of the error associated with LiDAR scanner hardware is presented. The second part of this research demonstrates a method for determining Phong specular and Oren-Nayar diffuse reflectance parameters for modeling and rendering pipes, the most ubiquitous form of industrial plant primitives. For specular reflectance, the Phong model is used. Estimates of specular and diffuse parameters of two ideal cylinders and one measured cylinder using brightness data acquired from a LiDAR scanner are presented. The estimated reflectance model of the measured cylinder has a mean relative error of 2.88% and a standard deviation of relative error of 4.0%. The final part of this research describes a method for determining specular, diffuse and color material properties and applies the method to seven pipes from an industrial plant. The colorless specular and diffuse properties were estimated by numerically inverting LiDAR brightness data.
The color ambient and diffuse properties were estimated using k-means clustering. The colorless properties yielded estimated brightness values that are within an RMS of 3.4%, with a maximum of 7.0% and a minimum of 1.6%. The estimated color properties yielded an RMS residual of 13.2%, with a maximum of 20.3% and a minimum of 9.1%.

    A delta Scuti distance to the Large Magellanic Cloud

    We present results from a well-studied delta Scuti star discovered in the LMC. The absolute magnitude of the variable was determined from the PL relation for Galactic delta Scuti stars and from the theoretical modeling of the observed B, V, I light curves. The two methods give distance moduli for the LMC of 18.46 ± 0.19 and 18.48 ± 0.15, respectively, for a consistent value of the stellar reddening of E(B-V) = 0.08 ± 0.02. We have also analyzed 24 delta Scuti candidates discovered in the OGLE II survey of the LMC, and 7 variables identified in the open cluster LW 55 and in the galaxy disk by Kaluzny et al. (2003, 2006). We find that the LMC delta Scuti stars define a PL relation whose slope is very similar to that defined by the Galactic delta Scuti variables, and yield a distance modulus for the LMC of 18.50 ± 0.22 mag. We compare the results obtained from the delta Scuti variables with those derived from the LMC RR Lyrae stars and Cepheids. Within the observational uncertainties, the three groups of pulsating stars yield very similar distance moduli. These moduli are all consistent with the "long" astronomical distance scale for the Large Magellanic Cloud. Comment: Accepted for publication on A