114 research outputs found

    Improving Entity Retrieval on Structured Data

    The increasing amount of data on the Web, in particular of Linked Data, has led to a diverse landscape of datasets, which makes entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities, can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper, we propose a two-fold entity retrieval approach. In a first, offline preprocessing step, we cluster entities based on the x-means and spectral clustering algorithms. In the second step, we propose an optimized retrieval model which takes advantage of our precomputed clusters. For a given set of entities retrieved by the BM25F retrieval approach and a given user query, we further expand the result set with relevant entities by considering features of the queries, entities and the precomputed clusters. Finally, we re-rank the expanded result set with respect to the relevance to the query. We perform a thorough experimental evaluation on the Billions Triple Challenge (BTC12) dataset. The proposed approach shows significant improvements compared to the baseline and state-of-the-art approaches.

    It's getting crowded! : improving the effectiveness of microtask crowdsourcing


    Ingest and Storage of 3D Objects in a Digital Preservation System

    The DURAARK project is developing methods and tools for the Long-Term Preservation (LTP) of architectural knowledge, including approaches to enrich Building Information Models with “as built” information from scans, semantically enrich building models with additional data sets, and preserve 3D models for future reuse. This deliverable defines the necessary steps for ingest and storage of 3D objects into an existing OAIS-compliant digital preservation system. It discusses how the gaps, which were previously identified and presented in deliverable D6.6.1, have been addressed in the DURAARK project so far. Developed methods and tools will be run against the DURAARK test set. Lastly, the existing drafts of the metadata schemas buildm for descriptive information, and e57m and ifcm as technical metadata schemas for E57 and IFC respectively, will be extended significantly and presented in a digital preservation context.

    Human Beyond the Machine: Challenges and Opportunities of Microtask Crowdsourcing

    In the 21st century, where automated systems and artificial intelligence are replacing arduous manual labor by supporting data-intensive tasks, many problems still require human intelligence. Over the last decade, by tapping into human intelligence through microtasks, crowdsourcing has found remarkable applications in a wide range of domains. In this article, the authors discuss the growth of crowdsourcing systems since the term was coined by columnist Jeff Howe in 2006. They shed light on the evolution of crowdsourced microtasks in recent times. Next, they discuss a main challenge that hinders the quality of crowdsourced results: the prevalence of malicious behavior. They reflect on crowdsourcing's advantages and disadvantages. Finally, they leave the reader with interesting avenues for future research.

    A checklist to combat cognitive biases in crowdsourcing


    Understanding User Perceptions of Response Delays in Crowd-Powered Conversational Systems

    Crowd-powered conversational systems (CPCS) are gaining considerable attention for their potential utility in a variety of application domains, for which automated conversational interfaces are still too limited. CPCS currently suffer from long response delays, which hamper their potential as conversational partners. The majority of prior work in this area has focused on demonstrating the feasibility of the approach and improving performance, while evaluation studies have primarily focused on response latency and ways to reduce it. Relatively little is currently known about how response delays in a CPCS can affect user experience. While the importance of reducing response latency is widely recognized in the broader field of human-computer interaction, little attention has been paid to how response quality, response delay, conversational context, and the complexity of the task affect how users experience the conversation, and how they perceive waiting for responses in particular. We conducted a between-subjects experiment (N = 478) to examine the influence of these four factors on the overall waiting experience of users. Results show that users 1) evaluated the waiting experience more negatively when the response delay was longer than 8 seconds, 2) underestimated the elapsed time but experienced more frustration in tasks with high complexity, 3) underestimated the elapsed time and experienced less frustration with high-quality bot utterances, and 4) judged response delays to be slightly longer, and experienced more frustration, in an emotion-centric CPCS compared to a task-centric CPCS. Our insights can inform the design of future CPCS with regard to defining performance requirements and anticipating their potential impact on the user experience they can facilitate.

    Topic-independent modeling of user knowledge in informational search sessions

    Web search is among the most frequent online activities. In this context, widespread informational queries entail user intentions to obtain knowledge with respect to a particular topic or domain. To serve learning needs better, recent research in the field of interactive information retrieval has advocated the importance of moving beyond relevance ranking of search results and considering a user’s knowledge state within learning-oriented search sessions. Prior work has investigated the use of supervised models to predict a user’s knowledge gain and knowledge state from user interactions during a search session. However, the characteristics of the resources that a user interacts with have been neither sufficiently explored nor exploited in this task. In this work, we introduce a novel set of resource-centric features and demonstrate their capacity to significantly improve supervised models for the task of predicting knowledge gain and knowledge state of users in Web search sessions. We make important contributions, given that reliable training data for such tasks is sparse and costly to obtain. We introduce various feature selection strategies geared towards selecting a limited subset of effective and generalizable features. © 2021, The Author(s).