10,648 research outputs found

    Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems

    Crowdsourcing systems commonly face the problem of aggregating multiple judgments provided by potentially unreliable workers. In addition, several aspects of the design of efficient crowdsourcing processes, such as defining workers' bonuses, fair prices and time limits of the tasks, involve knowledge of the likely duration of the task at hand. Bringing this together, in this work we introduce a new time-sensitive Bayesian aggregation method that simultaneously estimates a task's duration and obtains reliable aggregations of crowdsourced judgments. Our method, called BCCTime, builds on the key insight that the time taken by a worker to perform a task is an important indicator of the likely quality of the produced judgment. To capture this, BCCTime uses latent variables to represent the uncertainty about the workers' completion time, the tasks' duration and the workers' accuracy. To relate the quality of a judgment to the time a worker spends on a task, our model assumes that each task is completed within a latent time window within which all workers with a propensity to genuinely attempt the labelling task (i.e., non-spammers) are expected to submit their judgments. In contrast, workers with a lower propensity to valid labeling, such as spammers, bots or lazy labelers, are assumed to perform tasks considerably faster or slower than the time required by normal workers. Specifically, we use efficient message-passing Bayesian inference to learn approximate posterior probabilities of (i) the confusion matrix of each worker, (ii) the propensity to valid labeling of each worker, (iii) the unbiased duration of each task and (iv) the true label of each task. Using two real-world public datasets for entity linking tasks, we show that BCCTime produces up to 11% more accurate classifications and up to 100% more informative estimates of a task's duration compared to state-of-the-art methods.

    Consideration of building a common platform of collaborative learning environment

    This paper reports on considerations about common, basic functions and components for building a collaborative learning environment. Drawing on our research experience, we attempt to specify the technological issues to be addressed for the future standardization of this environment. Standardization involves many difficult aspects; however, it will extend and widen the field of applications possible within the collaborative learning paradigm, and will make it possible to use the fruits of years of research and individual implementations of the concept of collaborative learning, drawn from many research efforts, developments and experiences. We therefore frame this problem as one of building a common platform.

    Statistical Learning Approaches to Information Filtering

    Enabling computer systems to understand human thinking or behavior has long been an exciting challenge for computer scientists. In recent years one such topic, information filtering, has emerged to help users find desired information items (e.g., movies, books, news) in large amounts of available data, and has become crucial in many applications, such as product recommendation, image retrieval, spam email filtering, news filtering, and web navigation. An information filtering system must be able to understand users' information needs. Existing approaches infer a user's profile either by exploring his/her connections to other users, i.e., collaborative filtering (CF), or by analyzing the content descriptions of liked or disliked examples annotated by the user, i.e., content-based filtering (CBF). These methods work well to some extent, but face difficulties due to a lack of insight into the problem. This thesis intensively studies a wide scope of information filtering technologies. Novel and principled machine learning methods are proposed to model users' information needs. The work demonstrates that the uncertainty of user profiles and the connections between them can be effectively modelled using probability theory and Bayes' rule. As one major contribution of this thesis, the work clarifies the "structure" of information filtering and gives rise to principled solutions. In summary, the work of this thesis mainly covers the following three aspects: Collaborative filtering: We develop a probabilistic model for memory-based collaborative filtering (PMCF), which has clear links with classical memory-based CF. Various heuristics to improve memory-based CF have been proposed in the literature. In contrast, extensions based on PMCF can be made in a principled probabilistic way.
With PMCF, we describe a CF paradigm that interacts with users, instead of passively receiving data from them as in conventional CF, and actively chooses the most informative patterns to learn, thereby greatly reducing user effort and computational cost. Content-based filtering: One major problem for CBF is the deficiency and high dimensionality of content-descriptive features. Information items (e.g., images or articles) are typically described by high-dimensional features with mixed types of attributes that seem to be developed independently but are intrinsically related. We derive a generalized principal component analysis to merge high-dimensional and heterogeneous content features into a low-dimensional continuous latent space. The derived features bring great convenience to CBF, because most existing algorithms easily cope with low-dimensional and continuous data and, more importantly, the extracted data highlight the intrinsic semantics of the original content features. Hybrid filtering: How to combine CF and CBF in a "smart" way remains one of the most challenging problems in information filtering. Little principled work exists so far. This thesis reveals that people's information needs can be naturally modelled with hierarchical Bayesian thinking, where each individual's data are generated based on his/her own profile model, which itself is a sample from a common distribution of the population of user profiles. Users are thus connected to each other via this common distribution. Due to the complexity of such a distribution in real-world applications, commonly used parametric models are too restrictive, and we thus introduce a nonparametric hierarchical Bayesian model using a Dirichlet process. We derive effective and efficient algorithms to learn the described model.
In particular, the resulting hybrid filtering methods are surprisingly simple and intuitive, offering clear insights into previous work on pure CF, pure CBF, and hybrid filtering.

    Learning and Interpreting Multi-Multi-Instance Learning Networks

    We introduce an extension of the multi-instance learning problem where examples are organized as nested bags of instances (e.g., a document could be represented as a bag of sentences, which in turn are bags of words). This framework can be useful in various scenarios, such as text and image classification, but also supervised learning over graphs. As a further advantage, multi-multi-instance learning enables a particular way of interpreting predictions and the decision function. Our approach is based on a special neural network layer, called bag-layer, whose units aggregate bags of inputs of arbitrary size. We prove theoretically that the associated class of functions contains all Boolean functions over sets of sets of instances and we provide empirical evidence that functions of this kind can actually be learned on semi-synthetic datasets. We finally present experiments on text classification, on citation graphs, and on social graph data, which show that our model obtains competitive results with respect to accuracy when compared to other approaches such as convolutional networks on graphs, while at the same time it supports a general approach to interpret the learnt model, as well as explain individual predictions. Comment: JML

    An assessment of DREAM, appendix E

    The design realization, evaluation and modelling (DREAM) system is evaluated. A short history of the DREAM research project is given, as well as the significant characteristics of DREAM as a development environment. The design notation which is the basis for the DREAM system is reviewed, and the development tools envisioned as part of DREAM are discussed. Insights into development environments and their production are presented and used to make suggestions for future work in the area of development environments.

    Semantic technologies: from niche to the mainstream of Web 3? A comprehensive framework for web Information modelling and semantic annotation

    Context: Web information technologies developed and applied in the last decade have considerably changed the way web applications operate and have revolutionised information management and knowledge discovery. Social technologies, user-generated classification schemes and formal semantics have a far-reaching sphere of influence. They promote collective intelligence, support interoperability, enhance sustainability and instigate innovation. Contribution: The research carried out and consequent publications follow the various paradigms of semantic technologies, assess each approach, evaluate its efficiency, identify the challenges involved and propose a comprehensive framework for web information modelling and semantic annotation, which is the thesis’ original contribution to knowledge. The proposed framework assists web information modelling, facilitates semantic annotation and information retrieval, enables system interoperability and enhances information quality. Implications: Semantic technologies coupled with social media and end-user involvement can instigate innovative influence with wide organisational implications that can benefit a considerable range of industries. The scalable and sustainable business models of social computing and the collective intelligence of organisational social media can be resourcefully paired with internal research and knowledge from interoperable information repositories, back-end databases and legacy systems. Semantified information assets can free human resources so that they can be used to better serve business development, support innovation and increase productivity

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research.