Learning by stochastic serializations
Complex structures are typical in machine learning. Tailoring learning
algorithms to every structure requires effort that can be saved by defining a
generic learning procedure that adapts to any complex structure. In this paper,
we propose to map any complex structure onto a generic form, called
serialization, over which we can apply any sequence-based density estimator. We
then show how to transfer the learned density back onto the space of original
structures. To expose the learning procedure to the structural particularities
of the original structures, we take care that the serializations reflect
accurately the structures' properties. Enumerating all serializations is
infeasible. We propose an effective way to sample representative
serializations from the complete set while preserving its statistics. Our
method is competitive with, or better than, state-of-the-art learning
algorithms that have been designed specifically for given structures.
In addition, since serialization involves sampling from a combinatorial
process, it provides considerable protection against overfitting, which we
clearly demonstrate in a number of experiments.
Comment: Submission to NeurIPS 201
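The abstract does not spell out the serialization procedure, so the sketch below is only a hedged illustration with invented names: it treats a serialization as one randomized depth-first linearization of a toy tree, samples serializations instead of enumerating them, and fits a bigram frequency model as the simplest sequence-based density estimator.

```python
import random
from collections import Counter

# Hypothetical toy structure: a labeled tree given as {node: [children]}.
tree = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}

def random_serialization(tree, root="A", rng=random):
    """One random linearization: DFS visiting children in random order."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        children = list(tree[node])
        rng.shuffle(children)
        stack.extend(children)
    return order

# Sample serializations rather than enumerating the complete set,
# then estimate a sequence density via bigram counts.
rng = random.Random(0)
bigrams = Counter()
for _ in range(1000):
    s = random_serialization(tree, rng=rng)
    bigrams.update(zip(s, s[1:]))

print(bigrams.most_common(3))
```

With enough samples, the bigram statistics converge to those of the full combinatorial set of serializations, which is the property the paper's sampling scheme is designed to preserve.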
Unique Hue Judgment in Different Languages: A Comparison of Korean and English
Three experiments investigated unique hues (Hering, 1878) in native Korean and English speakers. Many recent studies have shown that color categories differ across languages and cultures, challenging the proposal that a particular set of color categories is universal and potentially innate. Unique hue judgments, and selections of the best examples of those categories, have also been found to vary within an English-speaking population. Here we investigated unique hue judgments and possible discrepancies between unique hue and best example judgments in two languages. Experiment 1 found that the loci of unique hues were similar for English and Korean speakers. Experiment 2 replicated and extended this result, using both single and double hue scaling. Experiment 3 showed that, in both cultures, unique hue choices depended on the range and organization of the array from which participants chose. The results of this study suggest that unique hue judgments vary according to the experimental task, in both languages.
Cinnamons: A Computation Model Underlying Control Network Programming
We give the easily recognizable name "cinnamon" and "cinnamon programming" to
a new computation model intended to form a theoretical foundation for Control
Network Programming (CNP). CNP has established itself as a programming
paradigm combining declarative and imperative features, a built-in search
engine, and powerful tools for search control that allow easy, intuitive,
visual development of heuristic, nondeterministic, and randomized solutions. We
rigorously define the syntax and semantics of the new model of computation,
while keeping the underlying intuition clear and including enough examples. The
purposely simplified theoretical model is then compared both to WHILE-programs
(thus demonstrating its Turing-completeness) and to the "real" CNP. Finally,
future research possibilities are mentioned that would eventually extend
cinnamon programming in the directions of nondeterminism, randomness, and
fuzziness.
Comment: 7th Intl Conf. on Computer Science, Engineering & Applications (ICCSEA 2017), September 23-24, 2017, Copenhagen, Denmark
Decomposition tables for experiments I. A chain of randomizations
One aspect of evaluating the design for an experiment is the discovery of the
relationships between subspaces of the data space. Initially we establish the
notation and methods for evaluating an experiment with a single randomization.
Starting with two structures, or orthogonal decompositions of the data space,
we describe how to combine them to form the overall decomposition for a
single-randomization experiment that is "structure balanced". The
relationships between the two structures are characterized using efficiency
factors. The decomposition is encapsulated in a decomposition table. Then, for
experiments that involve multiple randomizations forming a chain, we take
several structures that pairwise are structure balanced and combine them to
establish the form of the orthogonal decomposition for the experiment. In
particular, it is proven that the properties of the design for such an
experiment are derived in a straightforward manner from those of the individual
designs. We show how to formulate an extended decomposition table giving the
sources of variation, their relationships and their degrees of freedom, so that
competing designs can be evaluated.
Comment: Published at http://dx.doi.org/10.1214/09-AOS717 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
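As a hedged illustration of the idea behind a decomposition table (not the paper's machinery), the sketch below invents a small randomized complete block design with 2 blocks and 3 treatments, forms the orthogonal projectors onto the Mean, Blocks, Treatments, and Residual subspaces, and reads off degrees of freedom as projector traces. In this complete design, blocks and treatments are orthogonal, so the efficiency factors are all 1.

```python
import numpy as np

# Hypothetical randomized complete block design: 2 blocks x 3 treatments.
blocks = np.repeat([0, 1], 3)   # unit -> block
treats = np.tile([0, 1, 2], 2)  # unit -> treatment
n = len(blocks)

def indicator(labels):
    """0/1 design matrix whose columns indicate the factor's levels."""
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, :]).astype(float)

def projector(X):
    """Orthogonal projector onto the column space of X."""
    return X @ np.linalg.pinv(X)

P_mean = np.ones((n, n)) / n
P_block = projector(indicator(blocks)) - P_mean  # blocks, adjusted for mean
P_treat = projector(indicator(treats)) - P_mean  # treatments, adjusted for mean
P_resid = np.eye(n) - P_mean - P_block - P_treat

# Degrees of freedom in the decomposition table are the projector traces.
df = {name: round(np.trace(P)) for name, P in
      [("Mean", P_mean), ("Blocks", P_block),
       ("Treatments", P_treat), ("Residual", P_resid)]}
print(df)  # {'Mean': 1, 'Blocks': 1, 'Treatments': 2, 'Residual': 2}
```

In an incomplete design the block-adjusted and treatment subspaces would no longer be orthogonal, and the nonunit eigenvalues relating the two projectors are exactly the efficiency factors the abstract refers to.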
SurveyMan: Programming and Automatically Debugging Surveys
Surveys can be viewed as programs, complete with logic, control flow, and
bugs. Word choice or the order in which questions are asked can unintentionally
bias responses. Vague, confusing, or intrusive questions can cause respondents
to abandon a survey. Surveys can also have runtime errors: inattentive
respondents can taint results. This effect is especially problematic when
deploying surveys in uncontrolled settings, such as on the web or via
crowdsourcing platforms. Because the results of surveys drive business
decisions and inform scientific conclusions, it is crucial to make sure they
are correct.
We present SurveyMan, a system for designing, deploying, and automatically
debugging surveys. Survey authors write their surveys in a lightweight
domain-specific language aimed at end users. SurveyMan statically analyzes the
survey to provide feedback to survey authors before deployment. It then
compiles the survey into JavaScript and deploys it either to the web or a
crowdsourcing platform. SurveyMan's dynamic analyses automatically find survey
bugs, and control for the quality of responses. We evaluate SurveyMan's
algorithms analytically and empirically, demonstrating its effectiveness with
case studies of social science surveys conducted via Amazon's Mechanical Turk.
Comment: Submitted version; accepted to OOPSLA 201
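SurveyMan's actual DSL and analyses are richer than this, but one kind of static bug it could flag can be sketched with a toy, dict-based survey (all names hypothetical): a branch that routes every answer past a question makes that question unreachable for every respondent.

```python
# Hypothetical miniature survey: each question lists the questions its
# answers branch to; an empty branch map means "fall through to the next".
survey = {
    "q1": {"text": "Do you drive?", "branch": {"yes": "q3", "no": "q4"}},
    "q2": {"text": "What make of car?", "branch": {}},  # skipped by every path
    "q3": {"text": "How often?", "branch": {}},
    "q4": {"text": "Any comments?", "branch": {}},
}
order = ["q1", "q2", "q3", "q4"]

def reachable(survey, order):
    """Static analysis: which questions can some respondent actually reach?"""
    seen, frontier = set(), {order[0]}
    while frontier:
        q = frontier.pop()
        if q in seen:
            continue
        seen.add(q)
        targets = set(survey[q]["branch"].values())
        if not targets:  # no explicit branch: fall through in survey order
            i = order.index(q)
            if i + 1 < len(order):
                targets.add(order[i + 1])
        frontier |= targets - seen
    return seen

unreachable = set(order) - reachable(survey, order)
print(unreachable)  # {'q2'}
```

A check like this runs before deployment, which is the point of treating the survey as a program: the control-flow bug is caught statically instead of surfacing as missing data after responses come in.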
The Skip Quadtree: A Simple Dynamic Data Structure for Multidimensional Data
We present a new multi-dimensional data structure, which we call the skip
quadtree (for point data in R^2) or the skip octree (for point data in R^d,
with constant d>2). Our data structure combines the best features of two
well-known data structures, in that it has the well-defined "box"-shaped
regions of region quadtrees and the logarithmic-height search and update
hierarchical structure of skip lists. Indeed, the bottom level of our structure
is exactly a region quadtree (or octree for higher dimensional data). We
describe efficient algorithms for inserting and deleting points in a skip
quadtree, as well as fast methods for performing point location and approximate
range queries.
Comment: 12 pages, 3 figures. A preliminary version of this paper appeared in the 21st ACM Symp. Comp. Geom., Pisa, 2005, pp. 296-30
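The bottom level of the structure is exactly a region quadtree, so as a minimal, hedged sketch (the skip-list levels that give logarithmic height are omitted, and all names are illustrative), here is point insertion into a point-region quadtree, where a box is split only when two distinct points collide in it:

```python
class Quad:
    """One square cell of a point-region quadtree (the structure's bottom level)."""
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size  # lower-left corner and side
        self.point = None   # stored point, if this is an occupied leaf
        self.kids = None    # four sub-squares once the cell is split

    def insert(self, p):
        if self.kids is not None:      # internal node: recurse into quadrant
            self._child(p).insert(p)
        elif self.point is None:       # empty leaf: store the point here
            self.point = p
        elif self.point != p:          # occupied leaf: split the box in four
            old, self.point = self.point, None
            half = self.size / 2
            self.kids = [Quad(self.x + dx * half, self.y + dy * half, half)
                         for dy in (0, 1) for dx in (0, 1)]
            self._child(old).insert(old)
            self._child(p).insert(p)

    def _child(self, p):
        half = self.size / 2
        i = (p[0] >= self.x + half) + 2 * (p[1] >= self.y + half)
        return self.kids[i]

root = Quad(0, 0, 8)
for pt in [(1, 1), (6, 6), (6, 5)]:
    root.insert(pt)
```

Roughly, a skip quadtree would additionally keep coarser copies of this tree over random samples of the points, linking each box to its counterpart one level up, so that searches and updates can skip over large regions in logarithmic time.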
Sequences of purchases in credit card data reveal life styles in urban populations
Zipf-like distributions characterize a wide range of phenomena in physics,
biology, economics, and the social sciences. In human activities, Zipf laws
describe, for example, the frequency of word appearance in a text or the types
of purchases in shopping patterns. In the latter, the uneven distribution of
transaction types is bound up with the temporal sequences of individuals'
purchase choices.
In this work, we define a framework using a text compression technique on the
sequences of credit card purchases to detect ubiquitous patterns of collective
behavior. Clustering consumers by the similarity of their purchase sequences,
we detect five consumer groups. Remarkably, on post hoc inspection,
individuals in each group are also similar in their age, total expenditure,
gender, and the diversity of their social and mobility networks extracted from
their mobile phone records. By properly deconstructing transaction data with
Zipf-like distributions, this method uncovers sets of significant sequences
that reveal insights into collective human behavior.
Comment: 30 pages, 26 figures
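The abstract does not name the specific compression technique, so as one plausible, hedged instance (with invented data), the sketch below uses zlib to form a normalized compression distance between purchase sequences, the kind of pairwise similarity on which consumers could then be clustered.

```python
import zlib

def clen(s):
    """Compressed length of a byte string: a proxy for its complexity."""
    return len(zlib.compress(s))

def ncd(a, b):
    """Normalized compression distance between two purchase sequences."""
    a, b = a.encode(), b.encode()
    return (clen(a + b) - min(clen(a), clen(b))) / max(clen(a), clen(b))

# Hypothetical purchase sequences: one token per transaction type.
users = {
    "u1": "grocery gas grocery gas grocery gas",
    "u2": "gas grocery gas grocery gas grocery",
    "u3": "flights hotels restaurants flights hotels",
}

# Sequences that compress well together indicate similar shopping rhythms.
print(ncd(users["u1"], users["u2"]))  # smaller: overlapping habits
print(ncd(users["u1"], users["u3"]))  # larger: different life style
```

The appeal of a compression-based distance here is that it needs no hand-built features: repeated sub-sequences of transaction types, the signature of a habitual life style, are exactly what the compressor exploits.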