
    Scatteract: Automated extraction of data from scatter plots

    Charts are an excellent way to convey patterns and trends in data, but they do not facilitate further modeling of the data or close inspection of individual data points. We present a fully automated system for extracting the numerical values of data points from images of scatter plots. We use deep learning techniques to identify the key components of the chart, and optical character recognition together with robust regression to map from pixels to the coordinate system of the chart. We focus on scatter plots with linear scales, which already have several interesting challenges. Previous work has done fully automatic extraction for other types of charts, but to our knowledge this is the first approach that is fully automatic for scatter plots. Our method performs well, achieving successful data extraction on 89% of the plots in our test set. Comment: Submitted to ECML PKDD 2017 proceedings, 16 pages
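The pixel-to-coordinate step mentioned in this abstract can be illustrated with a small sketch. The abstract does not say which robust regression is used, so a Theil-Sen estimator (median of pairwise slopes) is swapped in here as one simple robust choice. The function names `fit_axis_mapping` and `pixel_to_value` and the tick data are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from statistics import median


def fit_axis_mapping(pixels, values):
    """Theil-Sen fit of value = slope * pixel + intercept (assumed estimator)."""
    # Slope between every pair of tick marks; the median ignores a few bad pairs.
    slopes = [
        (v2 - v1) / (p2 - p1)
        for (p1, v1), (p2, v2) in combinations(zip(pixels, values), 2)
        if p2 != p1
    ]
    slope = median(slopes)
    # Median residual gives an intercept that is also robust to outliers.
    intercept = median(v - slope * p for p, v in zip(pixels, values))
    return slope, intercept


def pixel_to_value(pixel, mapping):
    """Convert a pixel position along one axis to a chart coordinate."""
    slope, intercept = mapping
    return slope * pixel + intercept


# Hypothetical x-axis ticks: pixel position and the value read by OCR.
# The fourth reading simulates an OCR error (40 misread as 400).
tick_pixels = [100, 200, 300, 400, 500]
tick_values = [10.0, 20.0, 30.0, 400.0, 50.0]

x_mapping = fit_axis_mapping(tick_pixels, tick_values)
point_x = pixel_to_value(250, x_mapping)
```

An ordinary least-squares fit would be pulled far off by the misread tick, whereas the median-based fit recovers the underlying linear scale from the four consistent readings.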

    Skewed Factor Models Using Selection Mechanisms

    Traditional factor models explicitly or implicitly assume that the factors follow a multivariate normal distribution; that is, only moments up to order two are involved. However, it may happen in real data problems that the first two moments cannot explain the factors. Based on this motivation, here we devise three new skewed factor models, the skew-normal, the skew-t, and the generalized skew-normal factor models depending on a selection mechanism on the factors. The ECME algorithms are adopted to estimate related parameters for statistical inference. Monte Carlo simulations validate our new models and we demonstrate the need for skewed factor models using the classic open/closed book exam scores dataset
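As a rough illustration of what a selection mechanism means here, the sketch below covers only the textbook univariate case, not the factor models of the paper. If (U0, U1) is standard bivariate normal with correlation delta, keeping U1 only when U0 > 0 yields a skew-normal variable with mean delta * sqrt(2/pi). The function name `sample_skew_normal` and the parameter values are assumptions made for the example.

```python
import math
import random


def sample_skew_normal(delta, n, rng):
    """Draw n values of U1 given U0 > 0, with corr(U0, U1) = delta."""
    scale = math.sqrt(1.0 - delta * delta)
    draws = []
    while len(draws) < n:
        u0 = rng.gauss(0.0, 1.0)
        # Build U1 so that it is standard normal and correlated with U0.
        u1 = delta * u0 + scale * rng.gauss(0.0, 1.0)
        if u0 > 0.0:  # the selection mechanism: keep U1 only when U0 is positive
            draws.append(u1)
    return draws


rng = random.Random(42)
delta = 0.9
sample = sample_skew_normal(delta, 20000, rng)
sample_mean = sum(sample) / len(sample)
theoretical_mean = delta * math.sqrt(2.0 / math.pi)
```

The nonzero mean and the asymmetry of the retained draws are exactly what a normal model, which uses only the first two moments, cannot represent.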

    Computational steering of a multi-objective genetic algorithm using a PDA

    The execution process of a genetic algorithm typically involves some trial-and-error. This is due to the difficulty in setting the initial parameters of the algorithm, especially when little is known about the problem domain. The problem is magnified when applied to multi-objective optimisation, as care is needed to ensure that the final population of candidate solutions is representative of the trade-off surface. We propose a computational steering system that allows the engineer to interact with the optimisation routine during execution. This interaction can be as simple as monitoring the values of some parameters during the execution process, or could involve altering those parameters to influence the quality of the solutions produced by the optimisation process

    5G Positioning and Mapping with Diffuse Multipath

    5G mmWave communication is useful for positioning due to the geometric connection between the propagation channel and the propagation environment. Channel estimation methods can exploit the resulting sparsity to estimate parameters (delay and angles) of each propagation path, which in turn can be exploited for positioning and mapping. When paths exhibit significant spread in either angle or delay, these methods break down or lead to significant biases. We present a novel tensor-based method for channel estimation that allows estimation of mmWave channel parameters in a non-parametric form. The method is able to accurately estimate the channel, even in the absence of a specular component. This in turn enables positioning and mapping using only diffuse multipath. Simulation results are provided to demonstrate the efficacy of the proposed approach

    Data mining in bioinformatics using Weka

    The Weka machine learning workbench provides a general purpose environment for automatic classification, regression, clustering and feature selection, which are common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and supports data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it

    Computationally Efficient and Robust BIC-Based Speaker Segmentation

    An algorithm for automatic speaker segmentation based on the Bayesian information criterion (BIC) is presented. BIC tests are not performed for every window shift, as previously, but only when a speaker change is most likely to occur. This is done by estimating the next probable change point using a model of utterance durations. It is found that the inverse Gaussian best fits the distribution of utterance durations. As a result, fewer BIC tests are needed, making the proposed system less computationally demanding in time and memory, and considerably more efficient with respect to missed speaker change points. A feature selection algorithm based on a branch and bound search strategy is applied in order to identify the most efficient features for speaker segmentation. Furthermore, a new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC when the covariance matrices are estimated by estimators other than the usual maximum-likelihood ones. Two commonly used pairs of figures of merit are employed and their relationship is established. Computational efficiency is achieved through the speaker utterance modeling, whereas robustness is achieved by feature selection and application of BIC tests at appropriately selected time instants. Experimental results indicate that the proposed modifications yield a superior performance compared to existing approaches
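For readers unfamiliar with a BIC test for a speaker change, the sketch below shows the standard test in its simplest form, assuming one-dimensional features and maximum-likelihood Gaussian models. It is not the reformulated BIC derived in the paper. The function name `delta_bic`, the `penalty_weight` parameter, and the synthetic data are assumptions made for the example.

```python
import math
import random
from statistics import pvariance


def delta_bic(frames, split, penalty_weight=1.0):
    """Standard delta-BIC for a candidate change point at index `split` (1-D case)."""
    n = len(frames)
    left, right = frames[:split], frames[split:]
    # Log-likelihood gain of two Gaussians over one, using ML variances.
    gain = 0.5 * (
        n * math.log(pvariance(frames))
        - len(left) * math.log(pvariance(left))
        - len(right) * math.log(pvariance(right))
    )
    # The two-model hypothesis has 2 extra parameters in 1-D (a mean and a variance).
    extra_params = 2
    penalty = 0.5 * penalty_weight * extra_params * math.log(n)
    return gain - penalty


rng = random.Random(0)
# Synthetic stand-ins for acoustic features from two different speakers.
speaker_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
speaker_b = [rng.gauss(5.0, 1.0) for _ in range(200)]
same_speaker = [rng.gauss(0.0, 1.0) for _ in range(400)]

changed = delta_bic(speaker_a + speaker_b, 200)
unchanged = delta_bic(same_speaker, 200)
```

A positive value favours a change point at the split. With data drawn from a single distribution, the penalty term typically outweighs the small likelihood gain, so the value is usually negative. Each such test costs a pass over the window, which is why running fewer of them reduces computation.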

    Methods for fast and reliable clustering
