108 research outputs found
Recommended from our members
Design and Implementation of an Anomaly Detector
This paper describes the design and implementation of a general-purpose anomaly detector for streaming data. Based on a survey of similar work from the literature, a basic anomaly detector builds a model on normal data, compares this model to incoming data, and uses a threshold to determine when the incoming data represent an anomaly. Models compactly represent the data but still allow for effective comparison. Comparison methods determine the distance between two models of data or the distance between a model and a point. Threshold selection is a largely neglected problem in the literature, but the current implementation includes two methods to estimate thresholds from normal data. With these components, a user can construct a variety of anomaly detection schemes. The implementation contains several methods from the literature. Three separate experiments tested the performance of the components on two well-known datasets and one completely artificial dataset. The results indicate that the implementation works and can reproduce results from previous experiments.
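The model, comparison, and threshold components described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the mean/standard-deviation model, the z-score distance, the quantile-based threshold, and all function names are assumptions chosen for illustration.

```python
import statistics

def fit_model(normal):
    """Model (assumed): summarize normal data as (mean, sample std dev)."""
    return statistics.fmean(normal), statistics.stdev(normal)

def distance(model, x):
    """Comparison (assumed): distance from a point to the model as a z-score."""
    mean, std = model
    return abs(x - mean) / std

def estimate_threshold(model, normal, quantile=0.99):
    """Threshold (assumed): a high quantile of distances seen on normal data."""
    dists = sorted(distance(model, x) for x in normal)
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return dists[index]

def detect(model, threshold, stream):
    """Return the incoming points whose distance exceeds the threshold."""
    return [x for x in stream if distance(model, x) > threshold]

# Hypothetical data: a small "normal" sample and a short incoming stream.
normal = [10.0, 10.2, 9.8, 10.1, 9.9, 10.3, 9.7, 10.0]
model = fit_model(normal)
threshold = estimate_threshold(model, normal)
anomalies = detect(model, threshold, [10.1, 9.9, 15.0, 10.2])
print(anomalies)
```

Because the threshold is estimated only from normal data, no labeled anomalies are needed, which matches the setting the abstract describes.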
Group Leaders Optimization Algorithm
We present a new global optimization algorithm in which the influence of the
leaders in social groups is used as an inspiration for the evolutionary
technique which is designed into a group architecture. To demonstrate the
efficiency of the method, a standard suite of single and multidimensional
optimization functions along with the energies and the geometric structures of
Lennard-Jones clusters are given as well as the application of the algorithm on
quantum circuit design problems. We show that as an improvement over previous
methods, the algorithm scales as N^2.5 for the Lennard-Jones clusters of
N-particles. In addition, an efficient circuit design is shown for the two-qubit
Grover search algorithm, a quantum algorithm that provides a quadratic
speed-up over its classical counterpart.
Mutagenesis as a Diversity Enhancer and Preserver in Evolution Strategies
Proceedings of: 9th International Symposium on Distributed Computing and Artificial Intelligence (DCAI 2012), Salamanca, March 28-30, 2012. Mutagenesis is a process which forces the coverage of certain zones of the search space during the generations of an evolution strategy, by keeping track of the covered ranges for the different variables in the so-called gene matrix. Originally introduced as an artifact to control the automated stopping criterion in a memetic algorithm, ESLAT, it also improved the exploration capabilities of the algorithm, even though this was considered a secondary matter and not properly analyzed or tested. This work focuses on this diversity enhancement, redefining mutagenesis to increase this characteristic and measuring the improvement over a set of twenty-seven unconstrained optimization functions to provide statistically significant results. This work was supported in part by Projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, CAM CONTEXTS (S2009/TIC-1485) and DPS2008-07029-C02-02.
Grammatical evolution decision trees for detecting gene-gene interactions
Background: A fundamental goal of human genetics is the discovery of polymorphisms that predict common, complex diseases. It is hypothesized that complex diseases are due to a myriad of factors including environmental exposures and complex genetic risk models, including gene-gene interactions. Such epistatic models present an important analytical challenge, requiring that methods perform not only statistical modeling, but also variable selection to generate testable genetic model hypotheses. This challenge is amplified by recent advances in genotyping technology, as the number of potential predictor variables is rapidly increasing. Methods: Decision trees are a highly successful, easily interpretable data-mining method that is typically optimized with a hierarchical model-building approach, which limits its potential to identify interacting effects. To overcome this limitation, we utilize evolutionary computation, specifically grammatical evolution, to build decision trees to detect and model gene-gene interactions. In the current study, we introduce the Grammatical Evolution Decision Trees (GEDT) method and software and evaluate this approach on simulated data representing gene-gene interaction models of a range of effect sizes. We compare the performance of the method to a traditional decision tree algorithm and a random search approach and demonstrate the improved performance of the method in detecting purely epistatic interactions. Results: The results of our simulations demonstrate that GEDT has high power to detect even very moderate genetic risk models. GEDT has high power to detect interactions with and without main effects. Conclusions: GEDT, while still in its initial stages of development, is a promising new approach for identifying gene-gene interactions in genetic association studies.
A Deeper Look at DES Dwarf Galaxy Candidates: Grus I and Indus II
We present deep g- and r-band Magellan/Megacam photometry of two dwarf galaxy candidates discovered in the Dark Energy Survey (DES), Grus I and Indus II (DES J2038-4609). For the case of Grus I, we resolved the main sequence turn-off (MSTO) and ~2 mags below it. The MSTO can be seen at g_0 ~ 24 with a photometric uncertainty of 0.03 mag. We show Grus I to be consistent with an old, metal-poor (~13.3 Gyr, [Fe/H] ~ -1.9) dwarf galaxy. We derive updated distance and structural parameters for Grus I using this deep, uniform, wide-field data set. We find an azimuthally averaged half-light radius more than two times larger (~151 (+21/-31) pc; ~4.16' (+0.54/-0.74)) and an absolute V-band magnitude of ~ -4.1 that is ~1 magnitude brighter than previous studies. We obtain updated distance, ellipticity, and centroid parameters that are in agreement with other studies within uncertainties. Although our photometry of Indus II is ~2-3 magnitudes deeper than the DES Y1 public release, we find no coherent stellar population at its reported location. The original detection was located in an incomplete region of sky in the DES Y2Q1 data set and was flagged due to potential blue horizontal branch member stars. The best-fit isochrone parameters are physically inconsistent with both dwarf galaxies and globular clusters. We conclude that Indus II is likely a false positive, flagged due to a chance alignment of stars along the line of sight.
Oblique decision trees for spatial pattern detection: optimal algorithm and application to malaria risk
BACKGROUND: In order to detect potential disease clusters where a putative source cannot be specified, classical procedures scan the geographical area with circular windows through a specified grid imposed on the map. However, the choice of the windows' shapes, sizes and centers is critical and different choices may not provide exactly the same results. The aim of our work was to use an Oblique Decision Tree model (ODT) which provides potential clusters without pre-specifying shapes, sizes or centers. For this purpose, we have developed an ODT-algorithm to find an oblique partition of the space defined by the geographic coordinates. METHODS: ODT is based on the classification and regression tree (CART). Whereas CART finds rectangular partitions of the covariate space, ODT provides oblique partitions maximizing the interclass variance of the independent variable. Since it is an NP-hard problem in R(N), classical ODT-algorithms use evolutionary procedures or heuristics. We have developed an optimal ODT-algorithm in R(2), based on the directions defined by each pair of point locations. This partition provided potential clusters which can be tested with Monte-Carlo inference. We applied the ODT-model to a dataset in order to identify potential high risk clusters of malaria in a village in Western Africa during the dry season. The ODT results were compared with those of Kulldorff's SaTScan™. RESULTS: The ODT procedure provided four classes of risk of infection. In the first high risk class, 60% of the children were infected (95% confidence interval (CI95%) [52.22–67.55]). Monte-Carlo inference showed that the spatial pattern issued from the ODT-model was significant (p < 0.0001). SaTScan results yielded one significant cluster where the risk of disease was high, with an infection rate of 54.21%, CI95% [47.51–60.75]. Its center was located within the first high risk ODT class.
Both procedures provided similar results identifying a high risk cluster in the western part of the village where a mosquito breeding point was located. CONCLUSION: ODT-models improve the classical scanning procedures by detecting potential disease clusters independently of any specification of the shapes, sizes or centers of the clusters.
Neural networks for genetic epidemiology: past, present, and future
During the past two decades, the field of human genetics has experienced an information explosion. The completion of the human genome project and the development of high throughput SNP technologies have created a wealth of data; however, the analysis and interpretation of these data have created a research bottleneck. While technology facilitates the measurement of hundreds or thousands of genes, statistical and computational methodologies are lacking for the analysis of these data. New statistical methods and variable selection strategies must be explored for identifying disease susceptibility genes for common, complex diseases. Neural networks (NN) are a class of pattern recognition methods that have been successfully implemented for data mining and prediction in a variety of fields. The application of NN for statistical genetics studies is an active area of research. Neural networks have been applied in both linkage and association analysis for the identification of disease susceptibility genes
Dark Energy Survey Year 3 Results: Deep Field optical + near-infrared images and catalogue
We describe the Dark Energy Survey (DES) Deep Fields, a set of images and associated multiwavelength catalogue (ugrizJHKs) built from Dark Energy Camera (DECam) and Visible and Infrared Survey Telescope for Astronomy (VISTA) data. The DES Deep Fields comprise 11 fields (10 DES supernova fields plus COSMOS), with a total area of ~30 sq. deg. in ugriz bands and reaching a maximum i-band depth of 26.75 (AB, 10σ, 2 arcsec). We present a catalogue for the DES 3-yr cosmology analysis of those four fields with full 8-band coverage, totalling 5.88 sq. deg. after masking. Numbering 2.8 million objects (1.6 million post-masking), our catalogue is drawn from images coadded to consistent depths of r = 25.7, i = 25, and z = 24.3 mag. We use a new model-fitting code, built upon established methods, to deblend sources and ensure consistent colours across the u-band to Ks-band wavelength range. We further detail the tight control we maintain over the point-spread function modelling required for the model fitting, astrometry and consistency of photometry between the four fields. The catalogue allows us to perform a careful star-galaxy separation and produces excellent photometric redshift performance (NMAD = 0.023 at i < 23). The Deep-Fields catalogue will be made available as part of the cosmology data products release, following the completion of the DES 3-yr weak lensing and galaxy clustering cosmology work.
Selection Intensity in Genetic Algorithms with Generation Gaps
This paper presents calculations of the selection intensity of common selection and replacement methods used in genetic algorithms (GAs) with generation gaps. The selection intensity measures the increase of the average fitness of the population after selection, and it can be used to predict the average fitness of the population at each iteration as well as the number of steps until the population converges to a unique solution. In addition, the theory explains the fast convergence of some algorithms with small generation gaps. The accuracy of the calculations was verified experimentally with a simple test function. The results of this study facilitate comparisons between different algorithms, and provide a tool to adjust the selection pressure, which is indispensable to obtain robust algorithms
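As a rough illustration of the quantity this abstract discusses, the sketch below measures selection intensity empirically. It assumes the standard definition from the GA literature, the rise in mean fitness after selection divided by the standard deviation of fitness before selection. It uses plain truncation selection on normally distributed fitness values and does not model generation gaps or any of the paper's replacement methods. All names are hypothetical.

```python
import random
import statistics

def selection_intensity(before, after):
    """Assumed definition: (mean after - mean before) / std dev before."""
    gain = statistics.fmean(after) - statistics.fmean(before)
    return gain / statistics.pstdev(before)

def truncation_select(population, fraction):
    """Keep the best `fraction` of the population (higher fitness is better)."""
    keep = max(1, int(fraction * len(population)))
    return sorted(population, reverse=True)[:keep]

# Hypothetical population: fitness values drawn from a standard normal.
rng = random.Random(0)
fitness = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
selected = truncation_select(fitness, 0.5)
intensity = selection_intensity(fitness, selected)
print(f"empirical selection intensity: {intensity:.3f}")
```

For normally distributed fitness and 50% truncation, the theoretical value is about 0.8, so the empirical estimate should land close to that. Keeping a smaller fraction raises the intensity, which is one way to adjust selection pressure.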
Using Evolutionary Algorithms to Induce Oblique Decision Trees
This paper illustrates the application of evolutionary algorithms (EAs) to the problem of oblique decision tree induction. The objectives are to demonstrate that EAs can find classifiers whose accuracy is competitive with other oblique tree construction methods, and that this can be accomplished in a shorter time. Experiments were performed with a (1+1) evolutionary strategy and a simple genetic algorithm on public domain and artificial data sets. The empirical results suggest that the EAs quickly find competitive classifiers, and that EAs scale up better than traditional methods to the dimensionality of the domain and the number of training instances.
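The idea of searching for an oblique split with a (1+1) evolution strategy can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: it evolves a single hyperplane rather than a full tree, uses classification accuracy as fitness, and applies fixed-size Gaussian mutation. The function names, parameters, and data are hypothetical.

```python
import random

def accuracy(weights, points, labels):
    """Fraction of points whose side of the hyperplane matches the label.

    `weights` holds one coefficient per feature plus a trailing bias term.
    """
    correct = 0
    for point, label in zip(points, labels):
        score = sum(w * x for w, x in zip(weights, point)) + weights[-1]
        correct += (score > 0) == label
    return correct / len(points)

def one_plus_one_es(points, labels, steps=500, sigma=0.5, seed=0):
    """(1+1) ES: one parent, one mutated child, keep the child if not worse."""
    rng = random.Random(seed)
    dim = len(points[0])
    parent = [rng.gauss(0.0, 1.0) for _ in range(dim + 1)]
    parent_fit = accuracy(parent, points, labels)
    initial_fit = parent_fit
    for _ in range(steps):
        child = [w + rng.gauss(0.0, sigma) for w in parent]
        child_fit = accuracy(child, points, labels)
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit, initial_fit

# Hypothetical data: 2-D points labeled by the oblique rule x + y > 1.
data_rng = random.Random(1)
points = [(data_rng.uniform(0, 1), data_rng.uniform(0, 1)) for _ in range(200)]
labels = [x + y > 1 for x, y in points]
split, final_fit, initial_fit = one_plus_one_es(points, labels)
print(f"accuracy: {initial_fit:.2f} -> {final_fit:.2f}")
```

Because the child replaces the parent only when it is at least as accurate, the accuracy of the retained split never decreases over the run.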