The detection of globular clusters in galaxies as a data mining problem
We present an application of self-adaptive supervised learning classifiers,
derived from the Machine Learning paradigm, to the identification of candidate
Globular Clusters in deep, wide-field, single-band HST images. Several methods
provided by the DAME (Data Mining & Exploration) web application were tested
and compared on the NGC1399 HST data described in Paolillo 2011. The best
results were obtained using a Multi-Layer Perceptron with a Quasi-Newton
learning rule, which achieved a classification accuracy of 98.3%, with a
completeness of 97.8% and 1.6% contamination. An extensive set of experiments revealed that
the use of accurate structural parameters (effective radius, central surface
brightness) does improve the final result, but only by 5%. It is also shown
that the method is capable of retrieving even extreme sources (for instance,
very extended objects) which are missed by more traditional approaches.

Comment: Accepted 2011 December 12; Received 2011 November 28; in original
form 2011 October 1
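The classification pipeline described above can be sketched in a few lines. This is an illustrative stand-in, not the DAME implementation: it trains a multi-layer perceptron with a quasi-Newton learning rule (L-BFGS) on synthetic features loosely inspired by the abstract (magnitude, effective radius, central surface brightness are assumed feature names, and all distributions are invented), then reports accuracy, completeness, and contamination as defined in the text.

```python
# Illustrative sketch only: an MLP trained with a quasi-Newton rule (L-BFGS)
# on synthetic stand-in features; the real study uses HST photometry via DAME.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features: magnitude, effective radius, central surface brightness
X_gc = rng.normal([22.0, 2.5, 18.0], [0.5, 0.5, 0.5], size=(n // 2, 3))
X_bg = rng.normal([24.0, 0.8, 21.0], [0.5, 0.3, 0.7], size=(n // 2, 3))
X = np.vstack([X_gc, X_bg])
y = np.array([1] * (n // 2) + [0] * (n // 2))  # 1 = globular cluster candidate

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs",
                    max_iter=2000, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

accuracy = (pred == y_te).mean()
completeness = pred[y_te == 1].mean()       # fraction of true GCs recovered
contamination = 1 - y_te[pred == 1].mean()  # non-GCs among selected candidates
print(accuracy, completeness, contamination)
```

The `solver="lbfgs"` option is what makes this a quasi-Newton fit; on small tabular problems it typically converges faster and more stably than stochastic gradient descent.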
Lecture notes on ridge regression
The linear regression model cannot be fitted to high-dimensional data, as the
high-dimensionality brings about empirical non-identifiability. Penalized
regression overcomes this non-identifiability by augmentation of the loss
function by a penalty (i.e. a function of regression coefficients). The ridge
penalty is the sum of squared regression coefficients, giving rise to ridge
regression. Here many aspects of ridge regression are reviewed, e.g. moments,
mean squared error, its equivalence to constrained estimation, and its relation
to Bayesian regression. Finally, its behaviour and use are illustrated in
simulation and on omics data. Subsequently, ridge regression is generalized to
allow for a more general penalty. The ridge penalization framework is then
translated to logistic regression and its properties are shown to carry over.
To contrast ridge penalized estimation, the final chapter introduces its lasso
counterpart.
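The core point of the abstract, that the ridge penalty restores identifiability when p > n, can be shown directly with the closed-form estimator. A minimal sketch, assuming a standard Gaussian design (all data here are synthetic):

```python
# Minimal sketch of the ridge estimator on a high-dimensional problem (p > n),
# where ordinary least squares is not identifiable because X'X is singular.
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 200                      # more covariates than samples
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = X @ beta_true + rng.normal(size=n)

def ridge(X, y, lam):
    """Closed-form ridge estimator: (X'X + lam * I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_hat = ridge(X, y, lam=10.0)
# The penalty makes the system invertible, so the estimate is well-defined
# even though OLS is not; larger lam shrinks coefficients toward zero.
print(np.linalg.norm(beta_hat))
```

Increasing `lam` monotonically shrinks the Euclidean norm of the estimate, which is the constrained-estimation view of ridge mentioned in the notes.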
Local coherence and deflation of the low quark modes in lattice QCD
The spontaneous breaking of chiral symmetry in QCD is known to be linked to a
non-zero density of eigenvalues of the massless Dirac operator near the origin.
Numerical studies of two-flavour QCD now suggest that the low quark modes are
locally coherent to a certain extent. As a consequence, the modes can be
simultaneously deflated, using local projectors, with a total computational
effort proportional to the lattice volume (rather than its square). Deflation
has potentially many uses in lattice QCD. The technique is here worked out for
the case of quark propagator calculations, where large speed-up factors and a
flat scaling behaviour with respect to the quark mass are achieved.

Comment: Plain TeX, 23 pages, 4 figures included; minor text modifications;
version published in JHEP
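The mechanism behind the quoted speed-up can be illustrated schematically: removing the lowest eigenmodes of the operator before an iterative solve cuts the effective condition number, and hence the iteration count. The sketch below uses a dense symmetric stand-in operator and exact global eigenvectors as the deflation subspace; the paper's actual technique uses local projectors that exploit the local coherence of the low quark modes, which this toy example does not model.

```python
# Schematic illustration of deflation with a dense symmetric stand-in operator:
# solve exactly on the span of the lowest modes, then run conjugate gradients
# on the well-conditioned remainder.
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 10
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
eigs = np.concatenate([np.linspace(1e-4, 1e-3, k),   # low modes near the origin
                       np.linspace(1.0, 2.0, n - k)])
A = Q @ np.diag(eigs) @ Q.T        # symmetric positive definite stand-in
b = rng.normal(size=n)

def cg(A, b, tol=1e-8, maxit=5000):
    """Plain conjugate gradients; returns the solution and iteration count."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    it = 0
    while np.sqrt(rs) > tol and it < maxit:
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
        it += 1
    return x, it

x_plain, it_plain = cg(A, b)               # undeflated: slow, ill-conditioned

V = Q[:, :k]                               # deflation subspace (lowest modes)
x0 = V @ ((V.T @ b) / eigs[:k])            # exact solve on the subspace
x1, it_defl = cg(A, b - A @ x0)            # remainder has condition number ~2
x = x0 + x1
print(it_plain, it_defl)
```

With the low modes projected out, the remaining spectrum is confined to [1, 2] and the solver converges in a small, essentially mass-independent number of iterations, which is the "flat scaling behaviour" the abstract refers to.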
Cumulative sum quality control charts design and applications
Includes bibliographical references (pages 165-169).

Classical Statistical Process Control charts are essential in statistical control exercises and have therefore constantly received attention for quality improvement. However, establishing control charts requires large-sample data (say, no fewer than 1,000 data points). On the other hand, the small-sample-based Grey System Theory approach is well established and applied in many areas: social, economic, industrial, military and scientific research fields. In this research, the short-term trend curve given by the GM(1,1) model is merged into the Shewhart and two-sided CUSUM control charts to establish a Grey Predictive Shewhart control chart and a Grey Predictive CUSUM control chart. In addition, the GM(2,1) model is briefly examined for how accurate it is in control charts compared to the GM(1,1) model. Industrial process data collected from the TBF Packaging Machine Company in Taiwan were analyzed using these new developments as an illustrative example of grey quality control charts.
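The GM(1,1) model at the heart of this approach fits an exponential trend to the accumulated series and forecasts from it, which is what makes it usable with very few data points. A minimal sketch of the standard GM(1,1) fitting and forecasting steps (the series values are hypothetical, not the TBF data):

```python
# Minimal sketch of a GM(1,1) grey prediction model: fit the accumulated
# series with d x1/dt + a*x1 = b by least squares, then forecast.
import numpy as np

def gm11_forecast(x0, steps=1):
    """Fit GM(1,1) to the series x0 and forecast `steps` further values."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                    # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])         # background (mean-generated) values
    B = np.column_stack([-z1, np.ones_like(z1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    k = np.arange(1, len(x0) + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a   # time-response function
    x0_hat = np.diff(np.concatenate([[x0[0]], x1_hat])) # inverse AGO
    return np.concatenate([[x0[0]], x0_hat])

series = [2.87, 3.28, 3.34, 3.70, 3.77]   # hypothetical process measurements
forecast = gm11_forecast(series, steps=2)
print(forecast)
```

In the charting scheme described above, such short-horizon forecasts would supply the points plotted on the grey predictive Shewhart and CUSUM charts in place of a large historical sample.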
Atmospheric extinction properties above Mauna Kea from the Nearby Supernova Factory spectro-photometric data set
We present a new atmospheric extinction curve for Mauna Kea spanning
3200--9700 \AA. It is the most comprehensive to date, being based on some 4285
standard star spectra obtained on 478 nights spread over a period of 7 years
by the Nearby SuperNova Factory using the SuperNova Integral Field
Spectrograph. This mean curve and its dispersion can be used as an aid in
calibrating spectroscopic or imaging data from Mauna Kea, and in estimating the
calibration uncertainty associated with the use of a mean extinction curve. Our
method for decomposing the extinction curve into physical components, and the
ability to determine the chromatic portion of the extinction even on cloudy
nights, is described and verified over the wide range of conditions sampled by
our large dataset. We demonstrate good agreement with atmospheric science data
obtained at the nearby Mauna Loa Observatory, and with previously published
measurements of the extinction above Mauna Kea.

Comment: 22 pages, 24 figures, 6 tables
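Using such a mean extinction curve in calibration amounts to the standard magnitude-based correction F_0 = F_obs * 10^(0.4 * k(lambda) * X), where k(lambda) is the extinction in mag/airmass and X the airmass. A small sketch with invented extinction values (not the published curve):

```python
# Sketch of applying a mean atmospheric extinction curve to an observed
# spectrum: F_corrected = F_observed * 10**(0.4 * k(lambda) * airmass).
import numpy as np

wavelength = np.array([3500.0, 4500.0, 6000.0, 9000.0])  # Angstrom
k = np.array([0.55, 0.20, 0.10, 0.03])  # mag/airmass; illustrative values only
airmass = 1.3
f_obs = np.ones_like(wavelength)        # observed flux, arbitrary units

f_corr = f_obs * 10 ** (0.4 * k * airmass)
print(f_corr)
```

Because extinction rises steeply toward the blue, the correction factor is largest at the shortest wavelengths; the dispersion of the mean curve reported in the paper would propagate directly into the uncertainty of `f_corr`.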
Data comparison schemes for Pattern Recognition in Digital Images using Fractals
Pattern recognition in digital images is a common problem with application in
remote sensing, electron microscopy, medical imaging, seismic imaging and
astrophysics, for example. Although this subject has been researched for over
twenty years, there is still no general solution which can be compared with the
human cognitive system, in which a pattern can be recognised subject to
arbitrary orientation and scale.
The application of Artificial Neural Networks can in principle provide a very
general solution, provided suitable training schemes are implemented.
However, this approach raises some major issues in practice. First, the CPU
time required to train an ANN for a grey level or colour image can be very
large especially if the object has a complex structure with no clear geometrical
features, such as those that arise in remote sensing applications. Secondly,
the core and file-space memory required to represent large images and
their associated data leads to a number of problems in which the use of
virtual memory is paramount.
The primary goal of this research has been to assess methods of image data
compression for pattern recognition using a range of different compression
methods. In particular, this research has resulted in the design and
implementation of a new algorithm for general pattern recognition based on
the use of fractal image compression.
This approach has for the first time allowed the pattern recognition problem to
be solved in a way that is invariant to rotation and scale. It allows both ANNs
and correlation to be used, subject to appropriate pre- and post-processing
techniques for digital image processing, an aspect for which a dedicated
programmer's workbench has been developed using X-Designer.