Neighborhood Counting Measure Metric and Minimum Risk Metric: An empirical comparison
Wang, in a PAMI paper, proposed the Neighborhood Counting Measure (NCM) as a similarity measure for the k-nearest neighbors classification algorithm. In his paper, Wang mentioned the Minimum Risk Metric (MRM), an earlier method based on minimizing the risk of misclassification. However, Wang did not compare NCM with MRM because of MRM's allegedly excessive computational load. In this letter, we empirically compare NCM against MRM on k-NN with k=1, 3, 5, 7 and 11, with decisions taken by a voting scheme, and with k=21, with decisions taken by a weighted voting scheme, on the same datasets used by Wang. Our results show that MRM outperforms NCM for most of the k values tested. Moreover, we show that the MRM computation is not as prohibitive as indicated by Wang. ©2009 IEEE.
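To make the two decision rules concrete, here is a minimal sketch of k-NN with plain voting and with weighted voting. It is not the authors' code: Euclidean distance stands in for NCM or MRM, the inverse-distance weighting is an assumption (the abstract does not specify the weighting scheme), and `knn_predict` and the toy `train` data are hypothetical.

```python
from collections import defaultdict
import math


def knn_predict(train, query, k, weighted=False):
    """Classify `query` from its k nearest training points.

    train: list of (feature_tuple, label) pairs.
    Euclidean distance is a placeholder for NCM or MRM (assumption).
    """
    # Sort training points by distance to the query and keep the k closest.
    neighbours = sorted((math.dist(x, query), label) for x, label in train)[:k]
    votes = defaultdict(float)
    for d, label in neighbours:
        # Plain voting counts each neighbour once; weighted voting
        # (assumed inverse-distance here) favours closer neighbours.
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Toy data (hypothetical): two points of class "a", three of class "b".
train = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"),
         ((1.0, 1.0), "b"), ((1.1, 1.0), "b"), ((0.9, 1.1), "b")]
```

With this toy data and the query `(0.05, 0.0)`, plain voting with k=5 is outnumbered by the three distant "b" points, while the weighted vote follows the two close "a" points, which illustrates why the choice of scheme matters at larger k.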
Network properties of written human language
We investigate the nature of written human language within the framework of complex network theory. In particular, we analyse the topology of Orwell's 1984, focusing on the local properties of the network, such as the properties of the nearest neighbors and the clustering coefficient. We find a composite power law behavior for both the average nearest neighbor's degree and average clustering coefficient as a function of the vertex degree. This implies the existence of different functional classes of vertices. Furthermore, we find that the second order vertex correlations are an essential component of the network architecture. To model our empirical results we extend a previously introduced model for language due to Dorogovtsev and Mendes. We propose an accelerated growing network model that contains three growth mechanisms: linear preferential attachment, local preferential attachment and the random growth of a pre-determined small finite subset of initial vertices. We find that with these elementary stochastic rules we are able to produce a network showing syntactic-like structures.
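The local quantities named in the abstract can be sketched as follows. This is an illustrative sketch only: it assumes that vertices are words and that an edge joins two words appearing consecutively in the text, which may differ from the authors' construction, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(text):
    """Undirected graph linking consecutive words (assumed construction)."""
    words = text.lower().split()
    adj = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def local_properties(adj):
    """Map vertex -> (degree, average neighbour degree, clustering coefficient)."""
    props = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        # Average degree of the nearest neighbours of v.
        knn = sum(len(adj[u]) for u in nbrs) / k
        # Number of edges among the neighbours of v (triangles through v).
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        c = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        props[v] = (k, knn, c)
    return props


def average_by_degree(props):
    """Average both quantities over all vertices sharing the same degree."""
    buckets = defaultdict(list)
    for k, knn, c in props.values():
        buckets[k].append((knn, c))
    return {k: (sum(x for x, _ in vals) / len(vals),
                sum(y for _, y in vals) / len(vals))
            for k, vals in sorted(buckets.items())}
```

Plotting the output of `average_by_degree` on log-log axes is one way to look for the power law regimes the abstract describes.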
A comparative evaluation of nonlinear dynamics methods for time series prediction
A key problem in time series prediction using autoregressive models is to fix the model order, namely the number of past samples required to model the time series adequately. Estimating the model order by cross-validation can be a long process. In this paper, we investigate alternatives to cross-validation based on nonlinear dynamics methods, namely the Grassberger-Procaccia, Kégl, Levina-Bickel and False Nearest Neighbors algorithms. The experiments have been performed in two different ways. In the first case, the model order has been used to carry out the prediction, performed by an SVM for regression on three real data time series, showing that nonlinear dynamics methods perform very close to cross-validation. In the second case, we have tested the accuracy of nonlinear dynamics methods in predicting the known model order of synthetic time series. In this case, most of the methods have yielded a correct estimate, and when the estimate was not correct, the value was very close to the real one.
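As an illustration of how one of these methods estimates a model order, below is a simplified sketch of the False Nearest Neighbors test. It is not the paper's implementation: it assumes a unit delay, a brute-force neighbour search, and a ratio threshold of 10, and the Hénon map is used only as hypothetical synthetic data whose order is known to be 2.

```python
import math


def false_neighbour_fraction(series, dim, ratio_tol=10.0):
    """Fraction of nearest neighbours in `dim` dimensions that separate
    sharply once coordinate dim+1 is added (unit delay assumed)."""
    n = len(series) - dim  # vectors for which the next coordinate exists
    vectors = [series[i:i + dim] for i in range(n)]
    false = 0
    for i in range(n):
        # Brute-force nearest neighbour search; fine for short series.
        best_j, best_d = None, math.inf
        for j in range(n):
            if j != i:
                d = math.dist(vectors[i], vectors[j])
                if d < best_d:
                    best_j, best_d = j, d
        # Distance gained along the extra coordinate, relative to best_d.
        extra = abs(series[i + dim] - series[best_j + dim])
        if best_d > 0 and extra / best_d > ratio_tol:
            false += 1
    return false / n


def henon(n, a=1.4, b=0.3):
    """Hénon map written as a second-order recurrence (toy data)."""
    x_prev, x = 0.0, 0.0
    out = []
    for _ in range(n):
        x_prev, x = x, 1.0 - a * x * x + b * x_prev
        out.append(x)
    return out
```

The estimated order is the smallest `dim` at which the fraction becomes negligible; for the Hénon series the fraction is expected to fall sharply between `dim=1` and `dim=2`.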
Terminology mining in social media
The highly variable and dynamic word usage in social media presents serious challenges for both research and those commercial applications that are geared towards blogs or other user-generated non-editorial texts. This paper discusses and exemplifies a terminology mining approach for dealing with the productive character of the textual environment in social media. We explore the challenges of practically acquiring new terminology, and of modeling similarity and relatedness of terms from observing realistic amounts of data. We also discuss semantic evolution and density, and investigate novel measures for characterizing the preconditions for terminology mining
Combinatorial and Asymptotical Results on the Neighborhood Grid
In 2009, Joselli et al. introduced the Neighborhood Grid data structure for
fast computation of neighborhood estimates in point clouds. Even though the
data structure has been used in several applications and shown to be
practically relevant, it is theoretically not yet well understood. The purpose
of this paper is to present a polynomial-time algorithm to build the data
structure. Furthermore, it is investigated whether the presented algorithm is
optimal. This investigation leads to several combinatorial questions for which
partial results are given. Finally, we present several limits and experiments
regarding the quality of the obtained neighborhood relation.
Bootstrap Robust Prescriptive Analytics
We address the problem of prescribing an optimal decision in a framework
where its cost depends on uncertain problem parameters that need to be
learned from data. Earlier work by Bertsimas and Kallus (2014) transforms
classical machine learning methods that merely predict from supervised
training data into prescriptive methods
taking optimal decisions specific to a particular covariate context.
Their prescriptive methods factor in additional observed contextual information
on a potentially large number of covariates to take context specific
actions which are superior to any static decision. Any naive
use of limited training data may, however, lead to gullible decisions
over-calibrated to one particular data set. In this paper, we borrow ideas from
distributionally robust optimization and the statistical bootstrap of Efron
(1982) to propose two novel prescriptive methods based on (nw) Nadaraya-Watson
and (nn) nearest-neighbors learning which safeguard against overfitting and
lead to improved out-of-sample performance. Both resulting robust prescriptive
methods reduce to tractable convex optimization problems and enjoy a limited
disappointment on bootstrap data. We illustrate the data-driven decision-making
framework and our novel robustness notion on a small news vendor problem as
well as a small portfolio allocation problem
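For orientation, the nominal (non-robust) Nadaraya-Watson prescription that the robust methods build on can be sketched for a newsvendor problem. This sketch does not implement the bootstrap-robust formulation proposed in the paper: the Gaussian kernel, the bandwidth, the cost parameters and all function names are assumptions for illustration. It relies on the standard fact that the newsvendor cost is minimized by a quantile of demand at level underage / (underage + overage), here taken under the kernel weights.

```python
import math


def nw_weights(xs, x0, bandwidth):
    """Normalized Gaussian-kernel weights of samples xs around context x0."""
    raw = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(raw)
    return [w / total for w in raw]


def newsvendor_order(xs, demands, x0, bandwidth, underage=3.0, overage=1.0):
    """Order quantity minimizing the kernel-weighted newsvendor cost.

    The minimizer is the weighted demand quantile at the critical ratio
    underage / (underage + overage).
    """
    weights = nw_weights(xs, x0, bandwidth)
    level = underage / (underage + overage)
    cum = 0.0
    # Walk through demands in increasing order, accumulating weight.
    for demand, w in sorted(zip(demands, weights)):
        cum += w
        if cum >= level:
            return demand
    return max(demands)


# Toy data (hypothetical): demand is low near x=0 and high near x=10.
xs = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
demands = [1, 2, 3, 101, 102, 103]
```

Calling `newsvendor_order(xs, demands, 0.0, 1.0)` and `newsvendor_order(xs, demands, 10.0, 1.0)` gives context-specific orders, in contrast to a single static decision computed from all demands with equal weights.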
Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election
Social media has become an emerging alternative to opinion polls for public
opinion collection, though as a passive data source it still poses many
challenges, such as structurelessness, quantifiability, and representativeness.
Social media data with geotags provide new opportunities to unveil the
geographic locations of users expressing their opinions. This paper aims to
answer two questions: 1) whether quantifiable measurement of public opinion can
be obtained from social media and 2) whether it can produce better or
complementary measures compared to opinion polls. This research proposes a
novel approach to measure the relative opinion of Twitter users towards public
issues in order to accommodate more complex opinion structures and take
advantage of the geography pertaining to the public issues. To ensure that this
new measure is technically feasible, a modeling framework is developed
including building a training dataset by adopting a state-of-the-art approach
and devising a new deep learning method called Opinion-Oriented Word Embedding.
With a case study of the tweets selected for the 2016 U.S. presidential
election, we demonstrate the predictive superiority of our relative opinion
approach and we show how it can aid visual analytics and support opinion
predictions. Although the relative opinion measure is shown to be more robust
than polling, our study also suggests that the former can advantageously
complement the latter in opinion prediction.