
    ANALYZING BIG DATA WITH DECISION TREES

    Genetic algorithms: a tool for optimization in econometrics - basic concept and an example for empirical applications

    This paper discusses a tool for optimization of econometric models based on genetic algorithms. First, we briefly describe the concept of this optimization technique. Then, we explain the design of a specifically developed algorithm and apply it to a difficult econometric problem, the semiparametric estimation of a censored regression model. We carry out some Monte Carlo simulations and compare the genetic algorithm with another technique, the iterative linear programming algorithm, to run the censored least absolute deviation estimator. It turns out that both algorithms lead to similar results in this case, but that the proposed method is computationally more stable than its competitor.
    Keywords: Genetic Algorithm, Semiparametrics, Monte Carlo Simulation
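    To make the approach concrete, a minimal real-coded genetic algorithm is sketched below in Python. This is a generic GA (truncation selection, blend crossover, Gaussian mutation, elitism), not the operator design developed in the paper, and the quadratic toy objective merely stands in for an econometric criterion such as the censored least absolute deviation loss.

        import random

        def genetic_minimize(objective, dim, pop_size=50, generations=200,
                             bounds=(-10.0, 10.0), mut_rate=0.1, mut_scale=0.5):
            """Minimize `objective` over a box with a generic real-coded GA."""
            lo, hi = bounds
            pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
            best = min(pop, key=objective)
            for _ in range(generations):
                scored = sorted(pop, key=objective)
                best = min([best, scored[0]], key=objective)  # keep best-so-far (elitism)
                parents = scored[:pop_size // 2]              # truncation selection
                children = [list(best)]                       # carry the elite over
                while len(children) < pop_size:
                    a, b = random.sample(parents, 2)
                    w = random.random()                       # blend crossover
                    child = [w * x + (1.0 - w) * y for x, y in zip(a, b)]
                    for i in range(dim):                      # Gaussian mutation, clipped to the box
                        if random.random() < mut_rate:
                            child[i] = min(max(child[i] + random.gauss(0.0, mut_scale), lo), hi)
                    children.append(child)
                pop = children
            return best

        # Toy usage: recover a parameter vector near 3 (stand-in for a CLAD-type criterion).
        print(genetic_minimize(lambda b: sum((x - 3.0) ** 2 for x in b), dim=4))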

    Ranking to Learn: Feature Ranking and Selection via Eigenvector Centrality

    In an era where accumulating data is easy and storing it is inexpensive, feature selection plays a central role in reducing the high dimensionality of otherwise meaningless data. In this paper, we propose a graph-based method for feature selection that ranks features by identifying the most important ones within an arbitrary set of cues. Mapping the problem onto an affinity graph, where features are the nodes, the solution is given by assessing the importance of nodes through an indicator of centrality, in particular Eigenvector Centrality (EC). The gist of EC is to estimate the importance of a feature as a function of the importance of its neighbors. Ranking central nodes singles out candidate features that turn out to be effective from a classification point of view, as shown in a thorough experimental section. Our approach has been tested on 7 diverse datasets from the recent literature (e.g., biological data and object recognition, among others) and compared against filter, embedded, and wrapper methods. The results are remarkable in terms of accuracy, stability, and low execution time.
    Comment: Preprint version, Lecture Notes in Computer Science, Springer 201
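    A minimal sketch of the core mechanism follows, assuming absolute Pearson correlation as the feature affinity (a simplification; the paper builds its adjacency matrix from a richer kernel): map features to graph nodes, then score them with the leading eigenvector obtained by power iteration.

        import numpy as np

        def ec_feature_ranking(X, n_iter=100):
            """Rank columns of X (n_samples x n_features) by eigenvector centrality
            on a feature affinity graph (here: absolute Pearson correlation)."""
            A = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature affinity matrix
            np.fill_diagonal(A, 0.0)                  # drop self-loops
            v = np.ones(A.shape[0]) / A.shape[0]
            for _ in range(n_iter):                   # power iteration: v <- A v / ||A v||
                v = A @ v
                v /= np.linalg.norm(v)
            return np.argsort(-v)                     # indices, most central feature first

        # Toy usage on a random design matrix with 10 features.
        rng = np.random.default_rng(0)
        print(ec_feature_ranking(rng.normal(size=(200, 10))))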

    A hybrid swarm-based algorithm for single-objective optimization problems involving high-cost analyses

    In many technical fields, single-objective optimization procedures in continuous domains involve expensive numerical simulations. In this context, an improvement of the Artificial Bee Colony (ABC) algorithm, called the Artificial super-Bee enhanced Colony (AsBeC), is presented. AsBeC is designed to provide fast convergence speed, high solution accuracy and robust performance over a wide range of problems. It implements enhancements of the ABC structure and hybridizations with interpolation strategies. The latter are inspired by the quadratic trust region approach for local investigation and by an efficient global optimizer for separable problems. The individual modifications and their combined effects are studied with appropriate metrics on a numerical benchmark, which is also used for comparing AsBeC with some effective ABC variants and other derivative-free algorithms. In addition, the presented algorithm is validated on two recent benchmarks adopted for competitions in international conferences. Results show remarkable competitiveness and robustness for AsBeC.
    Comment: 19 pages, 4 figures, Springer Swarm Intelligence
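    For orientation, the baseline ABC cycle that AsBeC builds on can be sketched as follows; the super-bee operators and interpolation hybridizations of the paper are not reproduced here, and the sphere function at the end is only a placeholder for an expensive simulation.

        import random

        def abc_minimize(f, dim, n_sources=20, limit=30, cycles=500, bounds=(-5.0, 5.0)):
            """Minimize f with the standard Artificial Bee Colony loop."""
            lo, hi = bounds
            X = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_sources)]
            fit = [f(x) for x in X]
            trials = [0] * n_sources

            def try_improve(i):
                # Perturb one coordinate of source i toward/away from a random partner,
                # then keep the new position only if it improves (greedy selection).
                j = random.choice([k for k in range(n_sources) if k != i])
                d = random.randrange(dim)
                v = list(X[i])
                v[d] = min(max(v[d] + random.uniform(-1, 1) * (X[i][d] - X[j][d]), lo), hi)
                fv = f(v)
                if fv < fit[i]:
                    X[i], fit[i], trials[i] = v, fv, 0
                else:
                    trials[i] += 1

            for _ in range(cycles):
                for i in range(n_sources):            # employed bee phase
                    try_improve(i)
                worst = max(fit)                      # better sources get larger weights
                weights = [worst - fi + 1e-12 for fi in fit]
                for _ in range(n_sources):            # onlooker bee phase
                    try_improve(random.choices(range(n_sources), weights=weights)[0])
                for i in range(n_sources):            # scout phase: reset stale sources
                    if trials[i] > limit:
                        X[i] = [random.uniform(lo, hi) for _ in range(dim)]
                        fit[i], trials[i] = f(X[i]), 0
            return min(zip(fit, X))[1]

        # Toy usage on the sphere function (placeholder for a costly analysis).
        print(abc_minimize(lambda x: sum(xi * xi for xi in x), dim=5))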

    DESIGN OF GEODETIC NETWORKS BASED ON OUTLIER IDENTIFICATION CRITERIA: AN EXAMPLE APPLIED TO THE LEVELING NETWORK

    We present a numerical simulation method for designing geodetic networks. The quality criterion considered is based on the power of the test of the data snooping procedure, i.e., the probability that data snooping correctly identifies an outlier. In general, the power of the test is defined theoretically; however, with the advent of fast computers and large data storage systems, it can be estimated by numerical simulation. Here, the number of experiments in which the data snooping procedure correctly identifies the outlier is counted using Monte Carlo simulations. If the network configuration does not meet the reliability criterion in some part, it can be improved by adding the required observations to the surveying plan. The method does not use real observations; it depends only on the geometric configuration of the network, the uncertainty of the observations, and the size of the outlier. The proposed method is demonstrated through the practical application of a simulated leveling network. The results showed the need for five additional observations between adjacent stations; adding these new observations improved the internal reliability by approximately 18%. The final designed network is therefore able to identify outliers and resist the effect of undetectable ones at the stated probability levels.
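    The Monte Carlo procedure can be sketched as follows, under the assumption of Baarda-style data snooping in which correct identification means the largest normalized least-squares residual points at the contaminated observation; the small design matrix is a toy leveling-style network, not the network studied in the paper.

        import numpy as np

        def snooping_power(A, sigma, outlier_size, n_runs=2000, seed=0):
            """Monte Carlo estimate of the power of data snooping: the fraction of
            runs in which the largest normalized residual flags the observation
            that actually carries the simulated outlier."""
            rng = np.random.default_rng(seed)
            m, _ = A.shape
            # Residual projector: v = Qv @ e, with Qv = I - A (A^T A)^{-1} A^T,
            # so the true parameter values cancel and only the errors matter.
            Qv = np.eye(m) - A @ np.linalg.solve(A.T @ A, A.T)
            r = np.sqrt(np.diag(Qv))                 # redundancy-based scaling
            hits = 0
            for _ in range(n_runs):
                e = rng.normal(0.0, sigma, size=m)   # random observation errors
                k = rng.integers(m)                  # pick the contaminated observation
                e[k] += outlier_size
                w = np.abs(Qv @ e) / (sigma * r)     # normalized residuals (w-test values)
                hits += int(np.argmax(w) == k)       # correct identification?
            return hits / n_runs

        # Toy leveling-style design: 6 height differences among 4 points, one held fixed.
        A = np.array([[1, 0, 0], [-1, 1, 0], [0, -1, 1],
                      [0, 0, -1], [1, -1, 0], [0, 1, -1]], dtype=float)
        print(snooping_power(A, sigma=1.0, outlier_size=4.0))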