7,296 research outputs found

    Understanding a Version of Multivariate Symmetric Uncertainty to assist in Feature Selection

    In this paper, we analyze the behavior of the multivariate symmetric uncertainty (MSU) measure through statistical simulation under various mixes of informative and non-informative randomly generated features. Experiments show how the number of attributes, their cardinalities, and the sample size affect the MSU. We discovered a condition that preserves good quality in the MSU under different combinations of these three factors, providing a new useful criterion to help drive the process of dimension reduction.
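As a rough illustration of the measure discussed in this abstract, the sketch below computes an MSU value for discrete features. It assumes the commonly used form MSU = n/(n-1) * (1 - H(X1..Xn) / sum_i H(Xi)), which reduces to the ordinary symmetric uncertainty for n = 2; the abstract itself does not state the formula, and the function names `entropy` and `msu` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable outcomes."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def msu(columns):
    """Multivariate symmetric uncertainty of equal-length discrete columns.

    Assumed form (not taken from the abstract):
        MSU = n / (n - 1) * (1 - H(X1, ..., Xn) / sum_i H(Xi))
    """
    n = len(columns)
    if n < 2:
        raise ValueError("MSU needs at least two columns")
    marginal = sum(entropy(col) for col in columns)
    if marginal == 0:
        # All columns are constant, so there is no uncertainty to share.
        return 0.0
    joint = entropy(list(zip(*columns)))
    return n / (n - 1) * (1 - joint / marginal)


if __name__ == "__main__":
    x = [0, 0, 1, 1]
    y = [0, 1, 0, 1]
    print(msu([x, x]))  # fully redundant pair
    print(msu([x, y]))  # empirically independent pair
```

Because the entropies are plug-in estimates, small samples with many high-cardinality attributes inflate the value, which is the kind of bias the simulations in the paper examine.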

    Antitrust market definition using statistical learning techniques and consumer characteristics

    Market definition is the first step in an antitrust case and relies on empirical evidence of substitution patterns. Cross-price elasticity estimates are preferred evidence for studying substitution patterns, due to advances in IO econometric modelling. However, the data and time requirements of these models weigh against their universal adoption for market definition purposes. These practical constraints, and the need for a greater variety of evidence, lead practitioners to rely on a larger set of less sophisticated tools for market definition. The paper proposes an addition to the existing toolkit, namely an analysis of consumer characteristics for market definition purposes. The paper shows how cluster analysis can be used to identify meaningful groups of substitutes on the basis of homogeneity of their consumer profiles. Cluster analysis enforces consistency, while recent bootstrap techniques ensure robust conclusions. To illustrate the tool, the paper relies on data from a recently concluded radio merger in South Africa. Keywords: market definition, substitutes, media, demography, clusters, bootstrap

    A novel prestack sparse azimuthal AVO inversion

    In this paper we demonstrate a new algorithm for sparse prestack azimuthal AVO inversion. A novel Euclidean prior model is developed to respect both sparseness in the layered earth and smoothness in the model of reflectivity. Recognizing that methods of artificial intelligence and Bayesian computation are finding an ever-increasing role in augmenting the interpretation and analysis of geophysical data, we derive a generalized matrix-variate model of reflectivity in terms of orthogonal basis functions, subject to sparse constraints. This supports a direct application of machine learning methods, in a way that can be mapped back onto the physical principles known to govern reflection seismology. As a demonstration we present an application of these methods to the Marcellus shale. Attributes extracted using the azimuthal inversion are clustered using an unsupervised learning algorithm. Interpretation of the clusters is performed in the context of the Ruger model of azimuthal AVO.

    Cramer-Rao Lower Bound for Point Based Image Registration with Heteroscedastic Error Model for Application in Single Molecule Microscopy

    The Cramer-Rao lower bound for the estimation of the affine transformation parameters in a multivariate heteroscedastic errors-in-variables model is derived. The model is suitable for feature-based image registration in which both sets of control points are localized with errors whose covariance matrices vary from point to point. With focus given to the registration of fluorescence microscopy images, the Cramer-Rao lower bound for the estimation of a feature's position (e.g. of a single molecule) in a registered image is also derived. In the particular case where all covariance matrices for the localization errors are scalar multiples of a common positive definite matrix (e.g. the identity matrix), as can be assumed in fluorescence microscopy, simplified expressions for the Cramer-Rao lower bound are given. Under certain simplifying assumptions these expressions are shown to match asymptotic distributions for a previously presented set of estimators. Theoretical results are verified with simulations and experimental data.

    JIDT: An information-theoretic toolkit for studying the dynamics of complex systems

    Complex systems are increasingly being viewed as distributed information processing systems, particularly in the domains of computational neuroscience, bioinformatics and Artificial Life. This trend has resulted in a strong uptake in the use of (Shannon) information-theoretic measures to analyse the dynamics of complex systems in these fields. We introduce the Java Information Dynamics Toolkit (JIDT): a Google code project which provides a standalone, (GNU GPL v3 licensed) open-source code implementation for empirical estimation of information-theoretic measures from time-series data. While the toolkit provides classic information-theoretic measures (e.g. entropy, mutual information, conditional mutual information), it ultimately focusses on implementing higher-level measures for information dynamics. That is, JIDT focusses on quantifying information storage, transfer and modification, and the dynamics of these operations in space and time. For this purpose, it includes implementations of the transfer entropy and active information storage, their multivariate extensions and local or pointwise variants. JIDT provides implementations for both discrete and continuous-valued data for each measure, including various types of estimator for continuous data (e.g. Gaussian, box-kernel and Kraskov-Stoegbauer-Grassberger) which can be swapped at run-time due to Java's object-oriented polymorphism. Furthermore, while written in Java, the toolkit can be used directly in MATLAB, GNU Octave, Python and other environments. We present the principles behind the code design, and provide several examples to guide users. Comment: 37 pages, 4 figures
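To make the transfer entropy mentioned in this abstract concrete, here is a minimal plug-in estimator for discrete time series with a history length of one. It is a standalone Python sketch and does not use or imitate JIDT's API; the function name `transfer_entropy` and the fixed history length are assumptions made for illustration.

```python
import math
from collections import Counter


def transfer_entropy(source, dest):
    """Plug-in transfer entropy (bits) from source to dest, history length 1.

    TE = sum over (x_next, x_past, y_past) of
         p(x_next, x_past, y_past)
         * log2( p(x_next | x_past, y_past) / p(x_next | x_past) )
    """
    if len(source) != len(dest):
        raise ValueError("series must have equal length")
    # Each sample pairs the next destination value with both pasts.
    triples = list(zip(dest[1:], dest[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)                              # (next, past, src)
    c_past_src = Counter((p, s) for _, p, s in triples)    # (past, src)
    c_next_past = Counter((x, p) for x, p, _ in triples)   # (next, past)
    c_past = Counter(p for _, p, _ in triples)             # past only

    te = 0.0
    for (x, p, s), count in c_full.items():
        cond_full = count / c_past_src[(p, s)]
        cond_past = c_next_past[(x, p)] / c_past[p]
        te += (count / n) * math.log2(cond_full / cond_past)
    return te


if __name__ == "__main__":
    src = [0, 0, 1, 1, 0, 0, 1, 1, 0]
    dst = [0] + src[:-1]  # dst copies src with a one-step delay
    print(transfer_entropy(src, dst))
```

A plug-in estimate like this is biased upward on short series, which is one reason toolkits offer several estimators and significance tests.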

    Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection

    Software quality ensures that applications that are developed are failure free. Some modern systems are intricate, due to the complexity of their information processes. Software fault prediction is an important quality assurance activity, since it predicts the defect proneness of modules and classifies them, which saves resources, time and developers' effort. In this study, a model that selects relevant features for use in defect prediction was proposed. A review of the literature revealed that process metrics, which are based on the history of the source code over time, are better predictors of defects in version systems. These metrics are extracted from the source-code module and include, for example, the number of additions and deletions from the source code, the number of distinct committers and the number of modified lines. In this research, defect prediction was conducted using open source software (OSS) of software product line(s) (SPL), hence process metrics were chosen. Data sets that are used in defect prediction may contain non-significant and redundant attributes that may affect the accuracy of machine-learning algorithms. In order to improve the prediction accuracy of classification models, features that are significant in the defect prediction process are utilised. In machine learning, feature selection techniques are applied in the identification of the relevant data. Feature selection is a pre-processing step that helps to reduce the dimensionality of data in machine learning. Feature selection techniques include information theoretic methods that are based on the entropy concept. This study evaluated the efficiency of the feature selection techniques. It was found that software defect prediction using significant attributes improves the prediction accuracy. A novel MICFastCR model was developed, which uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. The MICFastCR achieved the highest prediction accuracy as reported by various performance measures. School of Computing. Ph. D. (Computer Science)
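The redundancy-elimination step described in this abstract can be sketched as follows. This is a simplified FCBF-style filter that uses symmetric uncertainty for both relevance and redundancy; it does not compute the MIC and is not the thesis's MICFastCR model. The function names, the dictionary input format, and the `threshold` default are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Plug-in Shannon entropy (bits) of a sequence of hashable outcomes."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def fcbf(features, target, threshold=0.0):
    """Return names of features kept by an FCBF-style filter.

    features: dict mapping a feature name to a list of discrete values.
    A feature is dropped if its SU with the target is not above `threshold`,
    or if an already kept, more relevant feature is at least as correlated
    with it as the target is.
    """
    relevance = {name: symmetric_uncertainty(col, target)
                 for name, col in features.items()}
    ranked = sorted((name for name in relevance if relevance[name] > threshold),
                    key=lambda name: relevance[name], reverse=True)
    selected = []
    for name in ranked:
        redundant = any(
            symmetric_uncertainty(features[kept], features[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


if __name__ == "__main__":
    defect = [0, 0, 1, 1, 0, 0, 1, 1]
    metrics = {
        "a": list(defect),                  # perfectly informative
        "b": list(defect),                  # duplicate of "a"
        "noise": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the target
    }
    print(fcbf(metrics, defect))
```

Continuous process metrics would need to be discretised before a filter of this kind could be applied.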

    SIVO: Semantically Informed Visual Odometry and Mapping

    Accurate localization is a requirement for any autonomous mobile robot. In recent years, cameras have proven to be a reliable, cheap, and effective sensor to achieve this goal. Visual simultaneous localization and mapping (SLAM) algorithms determine camera motion by tracking the motion of reference points from the scene. However, these references must be static, as well as viewpoint, scale, and rotation invariant in order to ensure accurate localization. This is especially paramount for long-term robot operation, where we require our references to be stable over long durations and also require careful point selection to maintain the runtime and storage complexity of the algorithm while the robot navigates through its environment. In this thesis, we present SIVO (Semantically Informed Visual Odometry and Mapping), a novel feature selection method for visual SLAM which incorporates machine learning and neural network uncertainty into an information-theoretic approach to feature selection. The emergence of deep learning techniques has resulted in remarkable advances in scene understanding, and our method supplements traditional visual SLAM with this contextual knowledge. Our algorithm selects points which provide significant information to reduce the uncertainty of the state estimate while ensuring that the feature is detected to be a static object repeatedly, with a high confidence. This is done by evaluating the reduction in Shannon entropy between the current state entropy, and the joint entropy of the state given the addition of the new feature with the classification entropy of the feature from a Bayesian neural network. Our method is evaluated against ORB SLAM2 and the ground truth of the KITTI odometry dataset. Overall, SIVO performs comparably to ORB SLAM2 (average of 0.17% translation error difference, 6.2 × 10⁻⁵ deg/m rotation error difference) while removing 69% of the map points on average. As the reference points selected are from static objects (building, traffic signs, etc.), the map generated using our algorithm is suitable for long-term localization.
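The selection criterion described in this abstract can be sketched loosely as an entropy reduction score. The sketch assumes a Gaussian state estimate, so that the entropy reduction is half the difference of the log-determinants of the covariance before and after adding a feature, and it simply subtracts the classification entropy of the feature's class probabilities. How SIVO actually combines these terms is not given in the abstract, so the combination, the threshold, and all names here are hypothetical.

```python
import math


def log_det_spd(matrix):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(s)
                log_det += 2.0 * math.log(lower[i][j])
            else:
                lower[i][j] = s / lower[j][j]
    return log_det


def state_entropy_reduction(cov_before, cov_after):
    """Drop in Gaussian differential entropy (nats) of the state estimate."""
    return 0.5 * (log_det_spd(cov_before) - log_det_spd(cov_after))


def classification_entropy(probs):
    """Shannon entropy (nats) of a class probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def feature_score(cov_before, cov_after, class_probs):
    """Hypothetical score: information gained minus classification uncertainty."""
    return (state_entropy_reduction(cov_before, cov_after)
            - classification_entropy(class_probs))


if __name__ == "__main__":
    before = [[1.0, 0.0], [0.0, 1.0]]
    after = [[0.25, 0.0], [0.0, 0.25]]
    confident = [1.0, 0.0]   # classifier is certain about the class
    uncertain = [0.5, 0.5]   # classifier cannot decide
    threshold = 1.0          # hypothetical acceptance threshold
    print(feature_score(before, after, confident) > threshold)
    print(feature_score(before, after, uncertain) > threshold)
```

Under these assumptions a feature that shrinks the state covariance but has an ambiguous class is penalised, which matches the abstract's aim of keeping only points that are both informative and confidently static.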