
    Neighborhood Counting Measure Metric and Minimum Risk Metric: An empirical comparison

    In a PAMI paper, Wang proposed the Neighborhood Counting Measure (NCM) as a similarity measure for the k-nearest neighbors classification algorithm. In that paper, Wang mentioned the Minimum Risk Metric (MRM), an earlier method based on minimizing the risk of misclassification. However, Wang did not compare NCM with MRM because of the latter's allegedly excessive computational load. In this letter, we empirically compare NCM against MRM on k-NN with k=1, 3, 5, 7 and 11, with the decision taken by a voting scheme, and with k=21, with the decision taken by a weighted voting scheme, on the same datasets used by Wang. Our results show that MRM outperforms NCM for most of the k values tested. Moreover, we show that the MRM computation is not as prohibitive as indicated by Wang. ©2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
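
The two decision rules named in the abstract above (plain voting and weighted voting over the k nearest neighbors) can be sketched as follows. This is only an illustration: it uses Euclidean distance as a stand-in, not NCM or MRM, and the function name `knn_predict`, the inverse-distance weighting, and the toy data are assumptions that do not come from the paper.

```python
import math
from collections import defaultdict


def knn_predict(train, query, k, weighted=False):
    """Classify `query` from `train`, a list of (feature_tuple, label) pairs.

    Assumption: Euclidean distance stands in for the similarity measure;
    NCM or MRM would replace `math.dist` here.
    """
    # Keep the k training points closest to the query.
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in train),
        key=lambda pair: pair[0],
    )[:k]

    votes = defaultdict(float)
    for distance, label in neighbors:
        # Plain voting counts each neighbor once; weighted voting
        # (assumed inverse-distance weights) favors closer neighbors.
        votes[label] += 1.0 / (distance + 1e-9) if weighted else 1.0

    # Iterate labels in sorted order so ties break deterministically.
    return max(sorted(votes), key=lambda label: votes[label])


train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (1, 1), k=5))                 # majority vote
print(knn_predict(train, (1, 1), k=5, weighted=True))  # weighted vote
```

With k equal to the full training set, the majority vote follows the larger class, while the weighted vote follows the two much closer points, which is why the choice of voting scheme matters at large k.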

    Network properties of written human language

    We investigate the nature of written human language within the framework of complex network theory. In particular, we analyse the topology of Orwell's 1984, focusing on the local properties of the network, such as the properties of the nearest neighbors and the clustering coefficient. We find a composite power law behavior for both the average nearest neighbors' degree and the average clustering coefficient as a function of the vertex degree. This implies the existence of different functional classes of vertices. Furthermore, we find that the second order vertex correlations are an essential component of the network architecture. To model our empirical results we extend a previously introduced model for language due to Dorogovtsev and Mendes. We propose an accelerated growing network model that contains three growth mechanisms: linear preferential attachment, local preferential attachment and the random growth of a pre-determined small finite subset of initial vertices. We find that with these elementary stochastic rules we are able to produce a network showing syntactic-like structures.
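
The local quantities mentioned in the abstract above (vertex degree, average nearest-neighbor degree, clustering coefficient) can be computed on a toy word network as in the sketch below. The construction rule (linking words that appear next to each other) and all function names are assumptions made for illustration; the abstract does not state how the authors built their network.

```python
from itertools import combinations


def word_network(text):
    """Build an undirected graph as {word: set_of_neighbors}.

    Assumption: two distinct words are linked if they are adjacent in the text.
    """
    words = text.lower().split()
    adj = {}
    for a, b in zip(words, words[1:]):
        if a != b:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def clustering(adj, v):
    """Local clustering coefficient: share of neighbor pairs that are linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def avg_neighbor_degree(adj, v):
    """Mean degree of the neighbors of vertex v."""
    return sum(len(adj[u]) for u in adj[v]) / len(adj[v])


adj = word_network("a b c a d")
for word in sorted(adj):
    print(word, len(adj[word]), clustering(adj, word), avg_neighbor_degree(adj, word))
```

Averaging `clustering` and `avg_neighbor_degree` over all vertices of the same degree gives the degree-dependent curves whose power-law behavior the abstract refers to.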

    A comparative evaluation of nonlinear dynamics methods for time series prediction

    A key problem in time series prediction using autoregressive models is to fix the model order, namely the number of past samples required to model the time series adequately. Estimating the model order by cross-validation can be a long process. In this paper, we investigate alternatives to cross-validation based on nonlinear dynamics methods, namely the Grassberger-Procaccia, Kégl, Levina-Bickel and False Nearest Neighbors algorithms. The experiments have been performed in two different ways. In the first case, the estimated model order has been used to carry out the prediction, performed by an SVM for regression on three real data time series, showing that nonlinear dynamics methods perform very close to cross-validation. In the second case, we have tested the accuracy of nonlinear dynamics methods in predicting the known model order of synthetic time series. In this case, most of the methods have yielded a correct estimate, and when the estimate was not correct, the value was very close to the real one.
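
One of the methods named in the abstract above, False Nearest Neighbors, can be sketched as follows. This is a simplified textbook-style variant under stated assumptions (delay 1, a single ratio test with threshold 10, brute-force neighbor search); the function names, the tolerance, and the logistic-map test series are hypothetical and are not taken from the paper.

```python
import math


def fnn_fraction(series, m, ratio_threshold=10.0):
    """Fraction of false nearest neighbors at embedding dimension m (delay 1)."""
    n = len(series) - m  # embedded points that still have a successor sample
    points = [series[i:i + m] for i in range(n)]
    false_count = 0
    counted = 0
    for i in range(n):
        # Brute-force search for the nearest neighbor of point i.
        best_j, best_d = None, math.inf
        for j in range(n):
            if j == i:
                continue
            d = math.dist(points[i], points[j])
            if d < best_d:
                best_j, best_d = j, d
        if best_j is None or best_d == 0.0:
            continue  # skip degenerate pairs
        counted += 1
        # A neighbor is "false" if the next sample separates the pair sharply.
        extra = abs(series[i + m] - series[best_j + m])
        if extra / best_d > ratio_threshold:
            false_count += 1
    return false_count / counted if counted else 0.0


def estimate_order(series, max_m=5, tol=0.01):
    """Smallest m whose false-neighbor fraction is at most tol (else max_m)."""
    for m in range(1, max_m + 1):
        if fnn_fraction(series, m) <= tol:
            return m
    return max_m


def logistic_series(n, r=3.9, x0=0.3):
    """Synthetic series where each sample depends only on the previous one."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


print(estimate_order(logistic_series(300)))
```

The logistic map is used here because its true order is known (each sample is a function of the previous one alone), which mirrors the abstract's second experiment on synthetic series with a known model order.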

    Terminology mining in social media

    The highly variable and dynamic word usage in social media presents serious challenges for both research and those commercial applications that are geared towards blogs or other user-generated non-editorial texts. This paper discusses and exemplifies a terminology mining approach for dealing with the productive character of the textual environment in social media. We explore the challenges of practically acquiring new terminology, and of modeling similarity and relatedness of terms from observing realistic amounts of data. We also discuss semantic evolution and density, and investigate novel measures for characterizing the preconditions for terminology mining

    Combinatorial and Asymptotical Results on the Neighborhood Grid

    In 2009, Joselli et al. introduced the Neighborhood Grid data structure for fast computation of neighborhood estimates in point clouds. Even though the data structure has been used in several applications and shown to be practically relevant, it is theoretically not yet well understood. The purpose of this paper is to present a polynomial-time algorithm to build the data structure. Furthermore, it is investigated whether the presented algorithm is optimal. This investigation leads to several combinatorial questions for which partial results are given. Finally, we present several limits and experiments regarding the quality of the obtained neighborhood relation. Comment: 33 pages, 18 figures

    Bootstrap Robust Prescriptive Analytics

    We address the problem of prescribing an optimal decision in a framework where its cost depends on uncertain problem parameters $Y$ that need to be learned from data. Earlier work by Bertsimas and Kallus (2014) transforms classical machine learning methods that merely predict $Y$ from supervised training data $[(x_1, y_1), \dots, (x_n, y_n)]$ into prescriptive methods taking optimal decisions specific to a particular covariate context $X=\bar x$. Their prescriptive methods factor in additional observed contextual information on a potentially large number of covariates $X=\bar x$ to take context specific actions $z(\bar x)$ which are superior to any static decision $z$. Any naive use of limited training data may, however, lead to gullible decisions over-calibrated to one particular data set. In this paper, we borrow ideas from distributionally robust optimization and the statistical bootstrap of Efron (1982) to propose two novel prescriptive methods based on (nw) Nadaraya-Watson and (nn) nearest-neighbors learning which safeguard against overfitting and lead to improved out-of-sample performance. Both resulting robust prescriptive methods reduce to tractable convex optimization problems and enjoy a limited disappointment on bootstrap data. We illustrate the data-driven decision-making framework and our novel robustness notion on a small news vendor problem as well as a small portfolio allocation problem.
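
To make the idea of a context-specific decision $z(\bar x)$ concrete, the sketch below implements a nominal (non-robust) Nadaraya-Watson prescription for a news vendor problem. It is not the bootstrap-robust method proposed in the paper. The Gaussian kernel, the scalar covariate, the cost parameters `underage` and `overage`, and the function names are all assumptions for illustration. For the news vendor cost, the weighted-cost minimizer is the weighted quantile of observed demand at level underage / (underage + overage).

```python
import math


def nw_weights(xs, x_bar, bandwidth=1.0):
    """Normalized Gaussian-kernel weights of training covariates around x_bar.

    Assumption: scalar covariates, and at least one weight does not underflow.
    """
    raw = [math.exp(-0.5 * ((x - x_bar) / bandwidth) ** 2) for x in xs]
    total = sum(raw)
    return [w / total for w in raw]


def nw_newsvendor_order(xs, ys, x_bar, underage, overage, bandwidth=1.0):
    """Order quantity minimizing the kernel-weighted news vendor cost.

    Cost per sample: underage * max(y - z, 0) + overage * max(z - y, 0).
    The minimizer is the smallest observed demand whose cumulative weight
    reaches underage / (underage + overage).
    """
    weights = nw_weights(xs, x_bar, bandwidth)
    target = underage / (underage + overage)
    cumulative = 0.0
    for y, w in sorted(zip(ys, weights)):
        cumulative += w
        if cumulative >= target:
            return y
    return max(ys)  # guard against rounding in the cumulative sum


xs = [0.0, 0.0, 0.0, 10.0]   # observed covariate values (toy data)
ys = [1.0, 2.0, 3.0, 100.0]  # observed demands (toy data)
print(nw_newsvendor_order(xs, ys, x_bar=0.0, underage=1.0, overage=1.0))
print(nw_newsvendor_order(xs, ys, x_bar=10.0, underage=1.0, overage=1.0))
```

The two calls differ only in the context `x_bar`, so the prescribed order changes with the covariate, in contrast with a static decision that ignores it.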

    Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election

    Social media has become an emerging alternative to opinion polls for public opinion collection, although it still poses many challenges as a passive data source, such as structurelessness, quantifiability, and representativeness. Social media data with geotags provide new opportunities to unveil the geographic locations of users expressing their opinions. This paper aims to answer two questions: 1) whether quantifiable measurement of public opinion can be obtained from social media and 2) whether it can produce better or complementary measures compared to opinion polls. This research proposes a novel approach to measure the relative opinion of Twitter users towards public issues in order to accommodate more complex opinion structures and take advantage of the geography pertaining to the public issues. To ensure that this new measure is technically feasible, a modeling framework is developed, including building a training dataset by adopting a state-of-the-art approach and devising a new deep learning method called Opinion-Oriented Word Embedding. With a case study of the tweets selected for the 2016 U.S. presidential election, we demonstrate the predictive superiority of our relative opinion approach and we show how it can aid visual analytics and support opinion predictions. Although the relative opinion measure is shown to be more robust than polling, our study also suggests that the former can advantageously complement the latter in opinion prediction.