
    A Full Probabilistic Model for Yes/No Type Crowdsourcing in Multi-Class Classification

    Crowdsourcing has become widely used in supervised scenarios where training sets are scarce and difficult to obtain. Most crowdsourcing models in the literature assume labelers can provide answers to full questions. In classification contexts, full questions require a labeler to discern among all possible classes. Unfortunately, discernment is not always easy in realistic scenarios, and labelers may not be experts in differentiating all classes. In this work, we provide a full probabilistic model for a shorter type of query that only requires a "yes" or "no" response. Our model estimates a joint posterior distribution of matrices related to labelers' confusions and the posterior probability of the class of every object. We developed an approximate inference approach using Monte Carlo sampling and Black Box Variational Inference, for which we derive the necessary gradients. We built two realistic crowdsourcing scenarios to test our model. The first scenario queries for irregular astronomical time series. The second scenario relies on the image classification of animals. We achieved results that are comparable with those of full-query crowdsourcing. Furthermore, we show that modeling labelers' failures plays an important role in estimating true classes. Finally, we provide the community with two real datasets obtained from our crowdsourcing experiments. All our code is publicly available. Comment: SIAM International Conference on Data Mining (SDM19), 9 official pages, 5 supplementary pages.
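To illustrate how yes/no answers can still carry information about an object's class, here is a minimal sketch of a Bayes update. It is not the paper's inference procedure (which estimates a joint posterior with variational methods); it assumes the labeler's behaviour is already known and summarised in a hypothetical matrix `yes_prob`, where `yes_prob[c][k]` is the probability of answering "yes" when asked "is this class k?" about an object whose true class is `c`. The function name and argument layout are illustrative assumptions.

```python
def posterior_over_classes(prior, yes_prob, responses):
    """Update class probabilities from a sequence of yes/no answers.

    prior:     list of K prior class probabilities.
    yes_prob:  hypothetical K x K matrix (an assumption of this sketch);
               yes_prob[c][k] = P(answer "yes" | true class c, asked about k).
    responses: list of (asked_class, answered_yes) pairs.
    """
    post = list(prior)
    for asked, answered_yes in responses:
        for c in range(len(post)):
            p_yes = yes_prob[c][asked]
            # Likelihood of the observed answer under true class c.
            post[c] *= p_yes if answered_yes else (1.0 - p_yes)
    total = sum(post)
    if total == 0:
        raise ValueError("responses are impossible under the given model")
    return [p / total for p in post]


# A labeler who says "yes" 80% of the time when the asked class is correct
# and 10% of the time otherwise.
yes_prob = [[0.8, 0.1, 0.1],
            [0.1, 0.8, 0.1],
            [0.1, 0.1, 0.8]]
posterior = posterior_over_classes([1 / 3, 1 / 3, 1 / 3], yes_prob, [(0, True)])
```

A single "yes" to "is this class 0?" shifts most of the mass to class 0, while a "no" would only mildly lower it, which is why several short queries may be needed to match one full query.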

    Support Vector Machines and Radon's Theorem

    A support vector machine (SVM) is an algorithm which finds a hyperplane that optimally separates labeled data points in $\mathbb{R}^n$ into positive and negative classes. The data points on the margin of this separating hyperplane are called support vectors. We study the possible configurations of support vectors for points in general position. In particular, we connect the possible configurations to Radon's theorem, which provides guarantees for when a set of points can be divided into two classes (positive and negative) whose convex hulls intersect. If the positive and negative support vectors in a generic SVM configuration are projected to the separating hyperplane, then these projected points will form a Radon configuration. Further, with a particular type of general position, we show there are at most $n+1$ support vectors. This can be used to test the level of machine precision needed in a support vector machine implementation. We also show the projections of the convex hulls of the support vectors intersect in a single Radon point, and under a small enough perturbation, the points labeled as support vectors remain labeled as support vectors. We furthermore consider computations studying the expected number of support vectors for randomly generated data.
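Radon's theorem states that any $n+2$ points in $\mathbb{R}^n$ can be split into two sets whose convex hulls intersect. The sketch below computes such a split and a common point (a Radon point) by finding a nonzero solution of $\sum_i a_i x_i = 0$, $\sum_i a_i = 0$ with exact rational arithmetic. It illustrates the theorem only, not the paper's SVM analysis; the function name `radon_partition` is a choice made for this example.

```python
from fractions import Fraction


def radon_partition(points):
    """Return (pos_idx, neg_idx, radon_point) for n+2 points in R^n."""
    n, m = len(points[0]), len(points)
    if m != n + 2:
        raise ValueError("expected exactly n+2 points in R^n")
    # One row per coordinate plus a row of ones; one column per point.
    rows = [[Fraction(p[d]) for p in points] for d in range(n)]
    rows.append([Fraction(1)] * m)
    # Gauss-Jordan elimination to reduced row echelon form.
    pivots, r = [], 0
    for c in range(m):
        pr = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pr is None:
            continue
        rows[r], rows[pr] = rows[pr], rows[r]
        pv = rows[r][c]
        rows[r] = [v / pv for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    # Build a null-space vector: set one free variable to 1, solve the pivots.
    free = next(c for c in range(m) if c not in pivots)
    a = [Fraction(0)] * m
    a[free] = Fraction(1)
    for row_idx, c in enumerate(pivots):
        a[c] = -rows[row_idx][free]
    pos = [i for i in range(m) if a[i] > 0]
    neg = [i for i in range(m) if a[i] < 0]
    s = sum(a[i] for i in pos)
    # Convex combination of the positive-coefficient points.
    point = [sum(a[i] * points[i][d] for i in pos) / s for d in range(n)]
    return pos, neg, point


# Corners of the unit square: the two diagonals cross at the centre.
pos, neg, pt = radon_partition([(0, 0), (1, 1), (1, 0), (0, 1)])
```

Which of the two sets is labelled "positive" depends on the sign of the null-space vector and carries no geometric meaning.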

    Predicting outcomes in crowdfunding campaigns with textual, visual, and linguistic signals

    This paper introduces a neural network and natural language processing approach to predict the outcome of crowdfunding startup pitches using text, speech, and video metadata in 20,188 crowdfunding campaigns. Our study emphasizes the need to understand crowdfunding from an investor’s perspective. Linguistic styles in crowdfunding campaigns that aim to trigger excitement or are aimed at inclusiveness are better predictors of campaign success than firm-level determinants. In contrast, higher perceived uncertainty about the state of product development may substantially reduce evaluations of new products and reduce purchasing intentions among potential funders. Our findings emphasize that positive psychological language is salient in environments where objective information is scarce and where investment preferences are taste based. Employing enthusiastic language or showing the product in action may capture an individual’s attention. Using all technology and design-related crowdfunding campaigns launched on Kickstarter, our study underscores the need to align potential consumers’ expectations with the visualization and presentation of the crowdfunding campaign.

    On Model- and Data-based Approaches to Structural Health Monitoring

    Structural Health Monitoring (SHM) is the term applied to the process of periodically monitoring the state of a structural system with the aim of diagnosing damage in the structure. Over the course of the past several decades there has been ongoing interest in approaches to the problem of SHM. This attention has been sustained by the belief that SHM will allow substantial economic and life-safety benefits to be realised across a wide range of applications. Several numerical and laboratory implementations have been successfully demonstrated. However, despite this research effort, real-world applications of SHM as originally envisaged are somewhat rare. Numerous technical barriers to the broader application of SHM methods have been identified, namely: severe restrictions on the availability of damaged-state data in real-world scenarios; difficulties associated with the numerical modelling of physical systems; and limited understanding of the physical effect of system inputs (including environmental and operational loads). This thesis focuses on the roles of law-based and data-based modelling in current applications of SHM. First, established approaches to model-based SHM are introduced, with the aid of an exemplar ‘wingbox’ structure. The study highlights the degree of difficulty associated with applying model-updating-based methods and with producing numerical models capable of accurately predicting changes in structural response due to damage. These difficulties motivate the investigation of non-deterministic, predictive modelling of structural responses taking into account both experimental and modelling uncertainties. Secondly, a data-based approach to multiple-site damage location is introduced, which may allow the quantity of experimental data required for classifier training to be drastically reduced.
A conclusion of the above research is the identification of hybrid approaches, in which a forward-mode law-based model informs a data-based damage identification scheme, as an area for future work.

    The P-Norm Push: A Simple Convex Ranking Algorithm that Concentrates at the Top of the List

    We are interested in supervised ranking algorithms that perform especially well near the top of the ranked list, and are only required to perform sufficiently well on the rest of the list. In this work, we provide a general form of convex objective that gives high-scoring examples more importance. This “push” near the top of the list can be chosen arbitrarily large or small, based on the preference of the user. We choose ℓp-norms to provide a specific type of push; if the user sets p larger, the objective concentrates harder on the top of the list. We derive a generalization bound based on the p-norm objective, working around the natural asymmetry of the problem. We then derive a boosting-style algorithm for the problem of ranking with a push at the top. The usefulness of the algorithm is illustrated through experiments on repository data. We prove that the minimizer of the algorithm’s objective is unique in a specific sense. Furthermore, we illustrate how our objective is related to quality measurements for information retrieval.
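The abstract does not spell out the objective, so the following is only a sketch of one form consistent with its description, under stated assumptions: for each negative example, sum an exponential loss over its score gaps to all positive examples, raise that per-negative sum to the power p, and add the results. The exponential loss and the function name `p_norm_push_objective` are assumptions made for illustration.

```python
import math


def p_norm_push_objective(pos_scores, neg_scores, p):
    """Sum over negatives of (sum over positives of exp(-(s_pos - s_neg))) ** p.

    The exponential loss is an assumed choice for this sketch. A negative
    that scores above many positives has a large inner sum, and a larger p
    magnifies its share of the objective.
    """
    total = 0.0
    for s_neg in neg_scores:
        inner = sum(math.exp(-(s_pos - s_neg)) for s_pos in pos_scores)
        total += inner ** p
    return total


def share_of_worst_negative(pos_scores, neg_scores, p):
    """Fraction of the objective contributed by the highest-scoring negative."""
    worst = max(neg_scores)
    worst_term = p_norm_push_objective(pos_scores, [worst], p)
    return worst_term / p_norm_push_objective(pos_scores, neg_scores, p)


pos = [2.0, 1.0]
neg = [1.5, -1.0]  # the first negative sits near the top of the list
share_p1 = share_of_worst_negative(pos, neg, 1)
share_p4 = share_of_worst_negative(pos, neg, 4)
```

With these scores, the high-ranked negative's share of the objective grows as p increases, which is the "push" toward the top of the list that the abstract describes.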

    Efficient Resources Provisioning Based on Load Forecasting in Cloud

    Cloud providers should ensure QoS while maximizing resource utilization. One optimal strategy is to allocate resources in a timely, fine-grained manner according to an application’s actual resource demand. The necessary precondition of this strategy is obtaining future load information in advance. We propose a multi-step-ahead load forecasting method, KSwSVR, based on statistical learning theory, which is suitable for the complex and dynamic characteristics of the cloud computing environment. It integrates an improved support vector regression algorithm and a Kalman smoother. Public trace data taken from multiple types of resources were used to verify its prediction accuracy, stability, and adaptability, in comparison with AR, BPNN, and standard SVR. Subsequently, based on the predicted results, a simple and efficient strategy is proposed for resource provisioning. A CPU allocation experiment indicated that it can effectively reduce resource consumption while meeting service level agreement requirements.

    Cognitive Machine Individualism in a Symbiotic Cybersecurity Policy Framework for the Preservation of Internet of Things Integrity: A Quantitative Study

    This quantitative study examined the complex nature of modern cyber threats to propose the establishment of cyber as an interdisciplinary field of public policy initiated through the creation of a symbiotic cybersecurity policy framework. For the public good (and maintaining ideological balance), there must be recognition that public policies are at a transition point where the digital public square is a tangible reality that is more than a collection of technological widgets. The academic contribution of this research project is the fusion of humanistic principles with Internet of Things (IoT) technologies that alters our perception of the machine from an instrument of human engineering into a thinking peer to elevate cyber from technical esoterism into an interdisciplinary field of public policy. The contribution to the US national cybersecurity policy body of knowledge is a unified policy framework (manifested in the symbiotic cybersecurity policy triad) that could transform cybersecurity policies from network-based to entity-based. A correlational archival data design was used with the frequency of malicious software attacks as the dependent variable and diversity of intrusion techniques as the independent variable for RQ1. For RQ2, the frequency of detection events was the dependent variable and diversity of intrusion techniques was the independent variable. Self-determination Theory is the theoretical framework, as the cognitive machine can recognize, self-endorse, and maintain its own identity based on a sense of self-motivation that is progressively shaped by the machine’s ability to learn. The transformation of cyber policies from technical esoterism into an interdisciplinary field of public policy starts with the recognition that the cognitive machine is an independent consumer of, advisor into, and influenced by public policy theories, philosophical constructs, and societal initiatives.