2,231 research outputs found

    AuthCrowd: Author Name Disambiguation and Entity Matching using Crowdsourcing

    Get PDF
    Despite decades of research and development in named entity resolution, dealing with name ambiguity is still a a challenging issue for much bibliometric-enhanced information retrieval (IR) tasks. As new bibliographic datasets are created as a result of the upward growth of publication records worldwide, more problems arise when considering the effects of errors resulting from missing data fields, duplicate entities, misspellings, extra characters, etc. As these concerns tend to be of large-scale, both the general consistency and the quality of electronic data are largely affected. This paper presents an approach to handle these name ambiguity problems through the use of crowdsourcing as a complementary means to traditional unsupervised approaches. To this end, we present “AuthCrowd”, a crowdsourcing system with the ability to decompose named entity disambiguation and entity matching tasks. Experimental results on a real-world dataset of publicly available papers published in peer-reviewed venues demonstrate the potential of our proposed approach for improving author name disambiguation. The findings further highlight the importance of adopting hybrid crowd-algorithm collaboration strategies, especially for handling complexity and quantifying bias when working with large amounts of data

    Alternative Representations of 3D-Reconstructed Heritage Data

    Get PDF
    © ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Journal on Computing and Cultural Heritage, Vol. 9, No. 1, Article 4, Publication date: November 2015. doi>10.1145/2795233By collecting images of heritage assets from members of the public and processing them to create 3D-reconstructed models, the HeritageTogether project has accomplished the digital recording of nearly 80 sites across Wales, UK. A large amount of data has been collected and produced in the form of photographs, 3D models, maps, condition reports, and more. Here we discuss some of the different methods used to realize the potential of this data in different formats and for different purposes. The data are explored in both virtual and tangible settings, and—with the use of a touch table—a combination of both. We examine some alternative representations of this community-produced heritage data for educational, research, and public engagement applications

    Applying Wikipedia to Interactive Information Retrieval

    Get PDF
    There are many opportunities to improve the interactivity of information retrieval systems beyond the ubiquitous search box. One idea is to use knowledge bases—e.g. controlled vocabularies, classification schemes, thesauri and ontologies—to organize, describe and navigate the information space. These resources are popular in libraries and specialist collections, but have proven too expensive and narrow to be applied to everyday webscale search. Wikipedia has the potential to bring structured knowledge into more widespread use. This online, collaboratively generated encyclopaedia is one of the largest and most consulted reference works in existence. It is broader, deeper and more agile than the knowledge bases put forward to assist retrieval in the past. Rendering this resource machine-readable is a challenging task that has captured the interest of many researchers. Many see it as a key step required to break the knowledge acquisition bottleneck that crippled previous efforts. This thesis claims that the roadblock can be sidestepped: Wikipedia can be applied effectively to open-domain information retrieval with minimal natural language processing or information extraction. The key is to focus on gathering and applying human-readable rather than machine-readable knowledge. To demonstrate this claim, the thesis tackles three separate problems: extracting knowledge from Wikipedia; connecting it to textual documents; and applying it to the retrieval process. First, we demonstrate that a large thesaurus-like structure can be obtained directly from Wikipedia, and that accurate measures of semantic relatedness can be efficiently mined from it. Second, we show that Wikipedia provides the necessary features and training data for existing data mining techniques to accurately detect and disambiguate topics when they are mentioned in plain text. Third, we provide two systems and user studies that demonstrate the utility of the Wikipedia-derived knowledge base for interactive information retrieval

    Understanding people through the aggregation of their digital footprints

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 160-172).Every day, millions of people encounter strangers online. We read their medical advice, buy their products, and ask them out on dates. Yet our views of them are very limited; we see individual communication acts rather than the person(s) as a whole. This thesis contends that socially-focused machine learning and visualization of archived digital footprints can improve the capacity of social media to help form impressions of online strangers. Four original designs are presented that each examine the social fabric of a different existing online world. The designs address unique perspectives on the problem of and opportunities offered by online impression formation. The first work, Is Britney Spears Span?, examines a way of prototyping strangers on first contact by modeling their past behaviors across a social network. Landscape of Words identifies cultural and topical trends in large online publics. Personas is a data portrait that characterizes individuals by collating heterogenous textual artifacts. The final design, Defuse, navigates and visualizes virtual crowds using metrics grounded in sociology. A reflection on these experimental endeavors is also presented, including a formalization of the problem and considerations for future research. A meta-critique by a panel of domain experts completes the discussion.by Aaron Robert Zinman.Ph.D

    Worldwide Infrastructure for Neuroevolution: A Modular Library to Turn Any Evolutionary Domain into an Online Interactive Platform

    Get PDF
    Across many scientific disciplines, there has emerged an open opportunity to utilize the scale and reach of the Internet to collect scientific contributions from scientists and non-scientists alike. This process, called citizen science, has already shown great promise in the fields of biology and astronomy. Within the fields of artificial life (ALife) and evolutionary computation (EC) experiments in collaborative interactive evolution (CIE) have demonstrated the ability to collect thousands of experimental contributions from hundreds of users across the glob. However, such collaborative evolutionary systems can take nearly a year to build with a small team of researchers. This dissertation introduces a new developer framework enabling researchers to easily build fully persistent online collaborative experiments around almost any evolutionary domain, thereby reducing the time to create such systems to weeks for a single researcher. To add collaborative functionality to any potential domain, this framework, called Worldwide Infrastructure for Neuroevolution (WIN), exploits an important unifying principle among all evolutionary algorithms: regardless of the overall methods and parameters of the evolutionary experiment, every individual created has an explicit parent-child relationship, wherein one individual is considered the direct descendant of another. This principle alone is enough to capture and preserve the relationships and results for a wide variety of evolutionary experiments, while allowing multiple human users to meaningfully contribute. The WIN framework is first validated through two experimental domains, image evolution and a new two-dimensional virtual creature domain, Indirectly Encoded SodaRace (IESoR), that is shown to produce a visually diverse variety of ambulatory creatures. Finally, an Android application built with WIN, filters, allows users to interactively evolve custom image effects to apply to personalized photographs, thereby introducing the first CIE application available for any mobile device. Together, these collaborative experiments and new mobile application establish a comprehensive new platform for evolutionary computation that can change how researchers design and conduct citizen science online
    corecore