2,341 research outputs found

    Optimal Crowdsourced Classification with a Reject Option in the Presence of Spammers

    We explore the design of an effective crowdsourcing system for an M-ary classification task. Crowd workers complete simple binary microtasks whose results are aggregated to give the final decision. We consider the scenario where the workers have a reject option, so that they are allowed to skip microtasks when they are unable to or choose not to respond. We present an aggregation approach using a weighted majority voting rule, where each worker's response is assigned an optimized weight to maximize the crowd's classification performance. Comment: submitted to ICASSP 201
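The aggregation step described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's optimized rule: the log-odds weights, the worker accuracies, and the function names are assumptions chosen for illustration, and a skipped microtask is represented by `None`.

```python
import math


def log_odds_weight(accuracy):
    """Classical log-odds weight for a worker with an assumed accuracy in (0, 1).

    This is a stand-in for the optimized weights mentioned in the abstract.
    """
    return math.log(accuracy / (1.0 - accuracy))


def weighted_majority_vote(votes, weights):
    """Aggregate binary votes (+1 or -1); None means the worker skipped."""
    score = 0.0
    for vote, weight in zip(votes, weights):
        if vote is None:
            continue  # a skipped microtask contributes nothing to the score
        score += weight * vote
    # Ties (including the case where everyone skips) resolve to +1 here.
    return 1 if score >= 0 else -1


# Hypothetical crowd: one strong worker, two weak ones, one who skips.
accuracies = [0.9, 0.6, 0.6, 0.55]
weights = [log_odds_weight(a) for a in accuracies]
votes = [1, -1, -1, None]
decision = weighted_majority_vote(votes, weights)
```

In this toy crowd the single high-accuracy worker outweighs the two weaker workers who disagree, so the weighted decision differs from what an unweighted majority would give.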

    On Classification in Human-driven and Data-driven Systems

    Classification systems are ubiquitous, and the design of effective classification algorithms has been an even more active area of research since the emergence of machine learning techniques. Despite the significant efforts devoted to training and feature selection in classification systems, misclassifications do occur and their effects can be critical in various applications. The central goal of this thesis is to analyze classification problems in human-driven and data-driven systems with potentially unreliable components, and to design effective strategies that ensure reliable and effective classification algorithms in such systems. The components/agents in the system can be machines and/or humans. The system components can be unreliable for a variety of reasons, such as faulty machines, security attacks causing machines to send falsified information, unskilled human workers sending imperfect information, or human workers providing random responses. This thesis first quantifies the effect of such unreliable agents on the classification performance of the systems and then designs schemes that mitigate misclassifications and their effects by adapting the behavior of the classifier on samples from machines and/or humans, and that ensure an effective and reliable overall classification. In the first part of this thesis, we study the case when only humans are present in the systems, and consider crowdsourcing systems. Human workers in crowdsourcing systems observe the data and respond individually by providing label-related information to a fusion center in a distributed manner. In such systems, we consider the presence of unskilled human workers who have a reject option, so that they may choose not to provide information regarding the label of the data. To maximize the classification performance at the fusion center, an optimal aggregation rule is proposed to fuse the human workers' responses in a weighted majority voting manner.
Next, the presence of unreliable human workers, referred to as spammers, is considered. Spammers are human workers who provide random guesses regarding the data label information to the fusion center in crowdsourcing systems. The effect of spammers on the overall classification performance is characterized when the spammers can strategically respond to maximize their reward in reward-based crowdsourcing systems. For such systems, an optimal aggregation rule is proposed by adapting the classifier based on the responses from the workers. The next line of human-driven classification is considered in the context of social networks. The problem studied is to classify whether a person is influential in propagating information in social networks. Since knowledge of the social network structure is not always available, the influential agent classification problem is studied without knowing that structure. A multi-task low rank linear influence model is proposed to exploit the relationships between different information topics. The proposed approach can simultaneously predict the volume of information diffusion for each topic and automatically classify the influential nodes for each topic. In the third part of the thesis, a data-driven decentralized classification framework is developed where machines interact with each other to perform complex classification tasks. However, the machines in the system can be unreliable for a variety of reasons, such as noise, faults and attacks. Erroneous updates lead the classification process in the wrong direction and degrade the performance of decentralized classification algorithms. First, the effect of erroneous updates on the convergence of the classification algorithm is analyzed, and it is shown that the algorithm converges linearly to a neighborhood of the optimal classification solution. Next, guidelines are provided for network design to achieve faster convergence.
Finally, to mitigate the impact of unreliable machines, a robust variant of ADMM is proposed, and its resilience to unreliable machines is shown with an exact convergence to the optimal classification result. The final part of research in this thesis considers machine-only data-driven classification problems. First, the fundamentals of classification are studied in an information theoretic framework. We investigate the nonparametric classification problem for arbitrary unknown composite distributions in the asymptotic regime where both the sample size and the number of classes grow exponentially large. The notion of discrimination capacity is introduced, which captures the largest exponential growth rate of the number of classes relative to the sample size so that there exists a test with asymptotically vanishing probability of error. Error exponent analysis using the maximum mean discrepancy is provided and the discrimination rate, i.e., a lower bound on the discrimination capacity, is characterized. Furthermore, an upper bound on the discrimination capacity based on Fano's inequality is developed
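The maximum mean discrepancy (MMD) mentioned in this abstract can be estimated from two samples. The following Python sketch computes the biased (V-statistic) estimate of squared MMD for one-dimensional data. The Gaussian kernel, the bandwidth value, the function names, and the toy samples are all assumptions made for illustration; the thesis's actual test and its error exponent analysis are not reproduced here.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars; the bandwidth is an assumption."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two 1-D samples."""

    def mean_kernel(a, b):
        # Average kernel value over all pairs drawn from a and b.
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical samples: two from nearby distributions, one far away.
sample_a = [0.0, 0.1, -0.1]
sample_b = [0.05, -0.05, 0.0]
sample_c = [5.0, 5.1, 4.9]

near = mmd_squared(sample_a, sample_b)
far = mmd_squared(sample_a, sample_c)
```

A small value of the estimate suggests that the two samples come from similar distributions, which is the property a sample-based classification test can build on.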

    Augmented Human Machine Intelligence for Distributed Inference

    With the advent of the internet of things (IoT) era and the extensive deployment of smart devices and wireless sensor networks (WSNs), interactions of humans and machine data are everywhere. In numerous applications, humans are essential parts in the decision making process, where they may either serve as information sources or act as the final decision makers. For various tasks including detection and classification of targets, detection of outliers, generation of surveillance patterns and interactions between entities, seamless integration of the human and the machine expertise is required where they simultaneously work within the same modeling environment to understand and solve problems. Efficient fusion of information from both human and sensor sources is expected to improve system performance and enhance situational awareness. Such human-machine inference networks seek to build an interactive human-machine symbiosis by merging the best of the human with the best of the machine and to achieve higher performance than either humans or machines by themselves. In this dissertation, we consider that people often have a number of biases and rely on heuristics when exposed to different kinds of uncertainties, e.g., limited information versus unreliable information. We develop novel theoretical frameworks for collaborative decision making in complex environments when the observers may include both humans and physics-based sensors. We address fundamental concerns such as uncertainties, cognitive biases in human decision making and derive human decision rules in binary decision making. We model the decision-making by generic humans working in complex networked environments that feature uncertainties, and develop new approaches and frameworks facilitating collaborative human decision making and cognitive multi-modal fusion. 
The first part of this dissertation exploits the behavioral economics concept of Prospect Theory to study the behavior of human binary decision making under cognitive biases. Several decision making systems involving humans' participation are discussed, and we show the impact of human cognitive biases on the decision making performance. We analyze how heterogeneity could affect the performance of collaborative human decision making in the presence of complex correlation relationships among the behavior of humans and design the human selection strategy at the population level. Next, we employ Prospect Theory to model the rationality of humans and accurately characterize their behaviors in answering binary questions. We design a weighted majority voting rule to solve classification problems via crowdsourcing while considering that the crowd may include some spammers. We also propose a novel sequential task ordering algorithm to improve system performance for classification in crowdsourcing composed of unreliable human workers. In the second part of the dissertation, we study the behavior of cognitive memory limited humans in binary decision making and develop efficient approaches to help memory constrained humans make better decisions. We show that the order in which information is presented to the humans impacts their decision making performance. Next, we consider the selfish behavior of humans and construct a unified incentive mechanism for IoT based inference systems while addressing the selfish concerns of the participants. We derive the optimal amount of energy that a selfish sensor involved in the signal detection task must spend in order to maximize a certain utility function, in the presence of buyers who value the result of signal detection carried out by the sensor. Finally, we design a human-machine collaboration framework that blends both machine observations and human expertise to solve binary hypothesis testing problems semi-autonomously.
In networks featuring human-machine teaming/collaboration, it is critical to coordinate and synthesize the operations of the humans and machines (e.g., robots and physical sensors). Machine measurements affect human behaviors, actions, and decisions. Human behavior defines the optimal decision-making algorithm for human-machine networks. In today's era of artificial intelligence, we aim not only to exploit augmented human-machine intelligence to ensure accurate decision making, but also to expand intelligent systems so as to assist and improve such intelligence
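Prospect Theory, which this dissertation uses to model human decision makers, is commonly written in terms of a value function and a probability weighting function. The Python sketch below uses the functional forms and the median parameter estimates for gains from Tversky and Kahneman's 1992 cumulative prospect theory paper. This choice is an assumption: the dissertation may use different forms or parameters, and the function names are hypothetical.

```python
def pt_value(outcome, alpha=0.88, beta=0.88, loss_aversion=2.25):
    """Value function: concave for gains, convex and steeper for losses."""
    if outcome >= 0:
        return outcome ** alpha
    return -loss_aversion * ((-outcome) ** beta)


def pt_weight(probability, gamma=0.61):
    """Inverse-S weighting: overweights small and underweights large probabilities."""
    numerator = probability ** gamma
    denominator = (numerator + (1.0 - probability) ** gamma) ** (1.0 / gamma)
    return numerator / denominator


# A loss of 100 is felt more strongly than a gain of 100.
gain_value = pt_value(100.0)
loss_value = pt_value(-100.0)

# A 1% chance is perceived as larger than 1%; a 90% chance as smaller than 90%.
small_p = pt_weight(0.01)
large_p = pt_weight(0.9)
```

Distortions of this kind are one way to represent the cognitive biases that the abstract says affect binary decisions.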

    Fish4Knowledge: Collecting and Analyzing Massive Coral Reef Fish Video Data

    This book gives a start-to-finish overview of the whole Fish4Knowledge project in 18 short chapters, each describing one aspect of the project. The Fish4Knowledge project explored the possibilities of big video data, in this case from undersea video. Recording and analyzing 90 thousand hours of video from ten camera locations, the project gives a 3-year view of fish abundance in several tropical coral reefs off the coast of Taiwan. The research system built a remote recording network, over 100 Tb of storage, supercomputer processing, video target detection and

    Online Active Learning of Reject Option Classifiers

    Active learning is an important technique to reduce the number of labeled examples in supervised learning. Active learning for binary classification has been well addressed in machine learning. However, active learning of the reject option classifier remains unaddressed. In this paper, we propose novel algorithms for active learning of reject option classifiers. We develop an active learning algorithm using the double ramp loss function. We provide mistake bounds for this algorithm. We also propose a new loss function for the reject option, called the double sigmoid loss function, and a corresponding active learning algorithm. We offer a convergence guarantee for this algorithm. We provide extensive experimental results to show the effectiveness of the proposed algorithms. The proposed algorithms efficiently reduce the number of labeled examples required
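For context, a reject option classifier is usually evaluated with a loss that charges a fixed cost for abstaining. The Python sketch below shows a threshold-based reject rule and this 0-d-1 style loss. It does not reproduce the double ramp or double sigmoid losses named in the abstract, and the function names, the rejection width `rho`, and the rejection cost `d` are assumptions for illustration.

```python
def predict_with_reject(score, rho):
    """Return +1 or -1 when the score is confidently signed, 0 to reject."""
    if score > rho:
        return 1
    if score < -rho:
        return -1
    return 0  # the score lies in the rejection band [-rho, rho]


def zero_d_one_loss(score, label, rho, d):
    """Loss 0 if correct, d if rejected, 1 if misclassified (label is +1 or -1)."""
    prediction = predict_with_reject(score, rho)
    if prediction == 0:
        return d
    return 0.0 if prediction == label else 1.0


# Hypothetical scores from some real-valued classifier, with true labels.
scores = [2.0, 0.1, -1.5, -0.2]
labels = [1, -1, 1, -1]
losses = [zero_d_one_loss(s, y, rho=0.5, d=0.3) for s, y in zip(scores, labels)]
```

With a rejection cost below 0.5, abstaining on low-confidence examples can be cheaper than risking a misclassification, which is the trade-off such classifiers are built around.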

    Label Selection Approach to Learning from Crowds

    Supervised learning, especially supervised deep learning, requires large amounts of labeled data. One approach to collecting large amounts of labeled data is to use a crowdsourcing platform where numerous workers perform the annotation tasks. However, the annotation results often contain label noise, as annotation skill varies across crowd workers and affects their ability to complete the task correctly. Learning from Crowds is a framework which directly trains models using noisy labeled data from crowd workers. In this study, we propose a novel Learning from Crowds model, inspired by SelectiveNet, which was proposed for the selective prediction problem. The proposed method, called Label Selection Layer, trains a prediction model by using a selector network to automatically determine whether to use a worker's label for training. A major advantage of the proposed method is that it can be applied to almost all variants of supervised learning problems by simply adding a selector network and changing the objective function for existing models, without explicitly assuming a model of the noise in crowd annotations. The experimental results show that the performance of the proposed method is almost equivalent to or better than the Crowd Layer, which is one of the state-of-the-art methods for Deep Learning from Crowds, except for the regression problem case. Comment: 15 pages, 1 figur