56,697 research outputs found

    Learning by stochastic serializations

    Full text link
    Complex structures are typical in machine learning. Tailoring learning algorithms for every structure requires an effort that may be saved by defining a generic learning procedure adaptive to any complex structure. In this paper, we propose to map any complex structure onto a generic form, called serialization, over which we can apply any sequence-based density estimator. We then show how to transfer the learned density back onto the space of original structures. To expose the learning procedure to the structural particularities of the original structures, we take care that the serializations reflect accurately the structures' properties. Enumerating all serializations is infeasible. We propose an effective way to sample representative serializations from the complete set of serializations which preserves the statistics of the complete set. Our method is competitive or better than state of the art learning algorithms that have been specifically designed for given structures. In addition, since the serialization involves sampling from a combinatorial process it provides considerable protection from overfitting, which we clearly demonstrate on a number of experiments.Comment: Submission to NeurIPS 201

    Unique Hue Judgment in Different Languages: A Comparison of Korean and English

    Get PDF
    Three experiments investigated unique hues (Hering, 1878) in native Korean and English speakers. Many recent studies have shown that color categories differ across languages and cultures, challenging the proposal that a particular set of color categories is universal and potentially innate. Unique hue judgments, and selection of the best examples of those categories have also been found to vary within an English-speaking population. Here we investigated unique hue judgments and possible discrepancies between unique hue and best example judgments in two languages. Experiment 1 found that the loci of unique hues were similar for English and Korean speakers. Experiment 2 replicated and extended this result, using both single and double hue scaling. Experiment 3 showed that, in both cultures, unique hue choices depended on the range, and organization of the array from which participants chose. The results of this study suggest that unique hue judgments vary according to the experimental task, in both language

    Cinnamons: A Computation Model Underlying Control Network Programming

    Full text link
    We give the easily recognizable name "cinnamon" and "cinnamon programming" to a new computation model intended to form a theoretical foundation for Control Network Programming (CNP). CNP has established itself as a programming paradigm combining declarative and imperative features, built-in search engine, powerful tools for search control that allow easy, intuitive, visual development of heuristic, nondeterministic, and randomized solutions. We define rigorously the syntax and semantics of the new model of computation, at the same time trying to keep clear the intuition behind and to include enough examples. The purposely simplified theoretical model is then compared to both WHILE-programs (thus demonstrating its Turing-completeness), and the "real" CNP. Finally, future research possibilities are mentioned that would eventually extend the cinnamon programming into the directions of nondeterminism, randomness, and fuzziness.Comment: 7th Intl Conf. on Computer Science, Engineering & Applications (ICCSEA 2017) September 23~24, 2017, Copenhagen, Denmar

    Decomposition tables for experiments I. A chain of randomizations

    Get PDF
    One aspect of evaluating the design for an experiment is the discovery of the relationships between subspaces of the data space. Initially we establish the notation and methods for evaluating an experiment with a single randomization. Starting with two structures, or orthogonal decompositions of the data space, we describe how to combine them to form the overall decomposition for a single-randomization experiment that is ``structure balanced.'' The relationships between the two structures are characterized using efficiency factors. The decomposition is encapsulated in a decomposition table. Then, for experiments that involve multiple randomizations forming a chain, we take several structures that pairwise are structure balanced and combine them to establish the form of the orthogonal decomposition for the experiment. In particular, it is proven that the properties of the design for such an experiment are derived in a straightforward manner from those of the individual designs. We show how to formulate an extended decomposition table giving the sources of variation, their relationships and their degrees of freedom, so that competing designs can be evaluated.Comment: Published in at http://dx.doi.org/10.1214/09-AOS717 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    SurveyMan: Programming and Automatically Debugging Surveys

    Full text link
    Surveys can be viewed as programs, complete with logic, control flow, and bugs. Word choice or the order in which questions are asked can unintentionally bias responses. Vague, confusing, or intrusive questions can cause respondents to abandon a survey. Surveys can also have runtime errors: inattentive respondents can taint results. This effect is especially problematic when deploying surveys in uncontrolled settings, such as on the web or via crowdsourcing platforms. Because the results of surveys drive business decisions and inform scientific conclusions, it is crucial to make sure they are correct. We present SurveyMan, a system for designing, deploying, and automatically debugging surveys. Survey authors write their surveys in a lightweight domain-specific language aimed at end users. SurveyMan statically analyzes the survey to provide feedback to survey authors before deployment. It then compiles the survey into JavaScript and deploys it either to the web or a crowdsourcing platform. SurveyMan's dynamic analyses automatically find survey bugs, and control for the quality of responses. We evaluate SurveyMan's algorithms analytically and empirically, demonstrating its effectiveness with case studies of social science surveys conducted via Amazon's Mechanical Turk.Comment: Submitted version; accepted to OOPSLA 201

    The Skip Quadtree: A Simple Dynamic Data Structure for Multidimensional Data

    Full text link
    We present a new multi-dimensional data structure, which we call the skip quadtree (for point data in R^2) or the skip octree (for point data in R^d, with constant d>2). Our data structure combines the best features of two well-known data structures, in that it has the well-defined "box"-shaped regions of region quadtrees and the logarithmic-height search and update hierarchical structure of skip lists. Indeed, the bottom level of our structure is exactly a region quadtree (or octree for higher dimensional data). We describe efficient algorithms for inserting and deleting points in a skip quadtree, as well as fast methods for performing point location and approximate range queries.Comment: 12 pages, 3 figures. A preliminary version of this paper appeared in the 21st ACM Symp. Comp. Geom., Pisa, 2005, pp. 296-30

    Sequences of purchases in credit card data reveal life styles in urban populations

    Full text link
    Zipf-like distributions characterize a wide set of phenomena in physics, biology, economics and social sciences. In human activities, Zipf-laws describe for example the frequency of words appearance in a text or the purchases types in shopping patterns. In the latter, the uneven distribution of transaction types is bound with the temporal sequences of purchases of individual choices. In this work, we define a framework using a text compression technique on the sequences of credit card purchases to detect ubiquitous patterns of collective behavior. Clustering the consumers by their similarity in purchases sequences, we detect five consumer groups. Remarkably, post checking, individuals in each group are also similar in their age, total expenditure, gender, and the diversity of their social and mobility networks extracted by their mobile phone records. By properly deconstructing transaction data with Zipf-like distributions, this method uncovers sets of significant sequences that reveal insights on collective human behavior.Comment: 30 pages, 26 figure
    corecore