
    When in doubt ask the crowd: leveraging collective intelligence for improving event detection and machine learning

    [no abstract]

    An introduction to crowdsourcing for language and multimedia technology research

    Language and multimedia technology research often relies on large manually constructed datasets for training or evaluation of algorithms and systems. Constructing these datasets is often expensive, with significant challenges in recruiting personnel to carry out the work. Crowdsourcing methods, using scalable pools of workers available on demand, offer a flexible means of rapid, low-cost construction of many of these datasets to support existing research requirements and potentially promote new research initiatives that would otherwise not be possible.

    Creating a data collection for evaluating rich speech retrieval

    We describe the development of a test collection for the investigation of speech retrieval beyond identification of relevant content. This collection focuses on satisfying user information needs for queries associated with specific types of speech acts. The collection is based on an archive of Internet video from the video sharing platform blip.tv and was provided by the MediaEval benchmarking initiative. A crowdsourcing approach was used to identify segments in the video data which contain speech acts, to create a description of the video containing each act, and to generate search queries designed to re-find that speech act. We describe and reflect on our experiences with crowdsourcing this test collection using the Amazon Mechanical Turk platform. We highlight the challenges of constructing this dataset, including the selection of the data source, the design of the crowdsourcing task, and the specification of queries and relevant items.

    Augmenting the performance of image similarity search through crowdsourcing

    Crowdsourcing is defined as “outsourcing a task that is traditionally performed by an employee to a large group of people in the form of an open call” (Howe 2006). Many platforms have been designed to support several types of crowdsourcing, and studies have shown that results produced by crowds on these platforms are generally accurate and reliable. Crowdsourcing can provide a fast and efficient way to use the power of human computation to solve problems that are difficult for machines. From the several microtasking crowdsourcing platforms available, we decided to perform our study using Amazon Mechanical Turk. In the context of our research, we studied the effect of user interface design and its corresponding cognitive load on the performance of crowd-produced results. Our results highlighted the importance of a well-designed user interface for crowdsourcing performance. Using crowdsourcing platforms such as Amazon Mechanical Turk, we can enlist humans to solve problems that are difficult for computers, such as image similarity search. However, for tasks like image similarity search it is more efficient to design a hybrid human–machine system. In the context of our research, we studied the effect of involving the crowd on the performance of an image similarity search system and proposed a hybrid human–machine image similarity search system. Our proposed system uses machine power to perform heavy computations and to search for similar images within the image dataset, and uses crowdsourcing to refine the results. We designed our content-based image retrieval (CBIR) system using SIFT, SURF, SURF128, and ORB feature detectors/descriptors and compared the performance of the system using each. Our experiments confirmed that crowdsourcing can dramatically improve CBIR system performance.
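
    As an illustration of the machine stage of such a hybrid pipeline, the sketch below ranks candidate images by local-feature matches before handing the top results to crowd workers for refinement. It uses ORB via OpenCV; the distance cutoff, function names, and the crowd hand-off are assumptions made for illustration, not the specific system described in the abstract.

```python
# A minimal sketch of the machine stage of a hybrid image similarity search,
# assuming an OpenCV-based CBIR pipeline. ORB is used because it ships with
# stock OpenCV; the SIFT/SURF variants mentioned in the abstract follow the
# same detect/describe/match pattern. The crowd-refinement step is only
# indicated in a comment, since its interface is platform-specific.
import cv2

def orb_similarity(path_a: str, path_b: str) -> float:
    """Rough similarity score between two images from ORB feature matches."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    if img_a is None or img_b is None:
        return 0.0
    orb = cv2.ORB_create(nfeatures=500)
    _, desc_a = orb.detectAndCompute(img_a, None)
    _, desc_b = orb.detectAndCompute(img_b, None)
    if desc_a is None or desc_b is None:
        return 0.0
    # Hamming distance with cross-checking suits ORB's binary descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_a, desc_b)
    good = [m for m in matches if m.distance < 40]   # assumed distance cutoff
    return len(good) / max(len(matches), 1)

def machine_candidates(query_path, dataset_paths, top_k=20):
    """Machine stage: rank the dataset by feature similarity to the query.

    The top-k results would then be passed to crowd workers (e.g. on Amazon
    Mechanical Turk) to verify or re-rank, as a hybrid system proposes.
    """
    ranked = sorted(dataset_paths,
                    key=lambda p: orb_similarity(query_path, p),
                    reverse=True)
    return ranked[:top_k]
```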

    Hybrid human-machine information systems for data classification

    Over the last decade, we have seen intense development of machine learning approaches for solving various tasks in diverse domains. Despite the remarkable advances in this field, there are still task categories for which machine learning models fall short of the required accuracy. This is the case with tasks that require human cognitive skills, such as sentiment analysis and emotional or contextual understanding. On the other hand, human-based computation approaches, such as crowdsourcing, are popular for solving such tasks. Crowdsourcing enables access to a vast number of groups with different expertise and, if managed properly, generates high-quality results. However, crowdsourcing as a standalone approach is not scalable due to the latency and cost it introduces. Addressing the distinct challenges and limitations of the human- and machine-based approaches requires bridging the two fields into a hybrid intelligence, seen as a promising approach to solving critical and complex real-world tasks. This thesis focuses on hybrid human-machine information systems that combine machine and human intelligence and leverage their complementary strengths: the data processing efficiency of machine learning and the data quality generated by crowdsourcing. In this thesis, we present hybrid human-machine models that address challenges along three dimensions: accuracy, latency, and cost. Data classification tasks in different domains impose different requirements on these criteria. Motivated by this fact, we introduce a master component that evaluates the criteria to find a suitable model as a trade-off solution. In hybrid human-machine information systems, incorporating human judgments is expected to improve the accuracy of the system. To ensure this, we focus on the human intelligence component, integrating profile-aware crowdsourcing for task assignment and data quality control mechanisms into the hybrid pipelines. The proposed conceptual hybrid human-machine models are realized in a set of experiments. Motivated by challenging scenarios and using real-world datasets, we implement the hybrid models in three experiments. Evaluations show that the implemented hybrid human-machine architectures for data classification lead to better results than either of the two approaches individually, improving overall accuracy at an acceptable cost and latency.
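
    The confidence-based routing at the core of such hybrid pipelines can be sketched as follows: a machine model labels the items it is confident about and defers the rest to crowd workers, which is where the accuracy/latency/cost trade-off arises. The classifier, the threshold, and the simulated crowd oracle below are illustrative assumptions, not the thesis's actual architecture.

```python
# Minimal sketch of confidence-based routing in a hybrid human-machine
# classifier: confident machine predictions are kept, uncertain items are
# sent to a crowdsourcing step. The model choice, threshold, and crowd_label
# callable are assumptions for illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def hybrid_classify(model, X, crowd_label, confidence_threshold=0.9):
    """Label items with the machine model when it is confident enough,
    otherwise route them to a (costlier, slower) crowdsourcing step.

    A lower threshold reduces cost and latency (fewer crowd tasks) but risks
    accuracy; a higher threshold does the opposite -- the trade-off a master
    component would tune per task.
    """
    proba = model.predict_proba(X)                    # per-class probabilities
    confidence = proba.max(axis=1)                    # machine confidence per item
    machine_pred = model.classes_[proba.argmax(axis=1)]
    labels = np.empty(len(X), dtype=int)
    for i in range(len(X)):
        if confidence[i] >= confidence_threshold:
            labels[i] = machine_pred[i]               # machine handles it
        else:
            labels[i] = crowd_label(X[i])             # defer to the crowd
    return labels

# Toy usage: a simulated "crowd" that always answers correctly stands in for
# a real crowdsourcing backend such as Amazon Mechanical Turk.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression().fit(X[:300], y[:300])
oracle = dict(zip(map(tuple, X[300:]), y[300:]))
predictions = hybrid_classify(model, X[300:], lambda x: oracle[tuple(x)])
print("accuracy:", (predictions == y[300:]).mean())
```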