AuthCrowd: Author Name Disambiguation and Entity Matching using Crowdsourcing
Despite decades of research and development in
named entity resolution, dealing with name ambiguity is still a
challenging issue for many bibliometric-enhanced information
retrieval (IR) tasks. As new bibliographic datasets are created as
a result of the upward growth of publication records worldwide,
more problems arise when considering the effects of errors
resulting from missing data fields, duplicate entities, misspellings,
extra characters, etc. As these concerns tend to be large-scale,
both the general consistency and the quality of electronic data are
largely affected. This paper presents an approach to handle these
name ambiguity problems through the use of crowdsourcing as a
complementary means to traditional unsupervised approaches.
To this end, we present “AuthCrowd”, a crowdsourcing system
with the ability to decompose named entity disambiguation and
entity matching tasks. Experimental results on a real-world
dataset of publicly available papers published in peer-reviewed
venues demonstrate the potential of our proposed approach for
improving author name disambiguation. The findings further
highlight the importance of adopting hybrid crowd-algorithm
collaboration strategies, especially for handling complexity and
quantifying bias when working with large amounts of data.
Alternative Representations of 3D-Reconstructed Heritage Data
© ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Journal on Computing and Cultural Heritage, Vol. 9, No. 1, Article 4, Publication date: November 2015. doi: 10.1145/2795233. By collecting images of heritage assets from members of the public and processing them to create 3D-reconstructed models, the HeritageTogether project has accomplished the digital recording of nearly 80 sites across Wales, UK. A large amount of data has been collected and produced in the form of photographs, 3D models, maps, condition reports, and more. Here we discuss some of the different methods used to realize the potential of this data in different formats and for different purposes. The data are explored in both virtual and tangible settings, and, with the use of a touch table, a combination of both. We examine some alternative representations of this community-produced heritage data for educational, research, and public engagement applications.
Scientific Utopia III: crowdsourcing science
Most scientific research is conducted by small teams of investigators who together formulate hypotheses, collect data, conduct analyses, and report novel findings. These teams operate independently as vertically integrated silos. Here we argue that scientific research that is horizontally distributed can provide substantial complementary value, aiming to maximize available resources, promote inclusiveness and transparency, and increase rigor and reliability. This alternative approach enables researchers to tackle ambitious projects that would not be possible under the standard model. Crowdsourced scientific initiatives vary in the degree of communication between project members, from largely independent work curated by a coordination team to crowd collaboration on shared activities. The potential benefits and challenges of large-scale collaboration span the entire research process: ideation, study design, data collection, data analysis, reporting, and peer review. Complementing traditional small science with crowdsourced approaches can accelerate the progress of science and improve the quality of scientific research.
Applying Wikipedia to Interactive Information Retrieval
There are many opportunities to improve the interactivity of information retrieval systems beyond the ubiquitous search box. One idea is to use knowledge bases—e.g. controlled vocabularies, classification schemes, thesauri and ontologies—to organize, describe and navigate the information space. These resources are popular in libraries and specialist collections, but have proven too expensive and narrow to be applied to everyday webscale search. Wikipedia has the potential to bring structured knowledge into more widespread use. This online, collaboratively generated encyclopaedia is one of the largest and most consulted reference works in existence. It is broader, deeper and more agile than the knowledge bases put forward to assist retrieval in the past. Rendering this resource machine-readable is a challenging task that has captured the interest of many researchers. Many see it as a key step required to break the knowledge acquisition bottleneck that crippled previous efforts. This thesis claims that the roadblock can be sidestepped: Wikipedia can be applied effectively to open-domain information retrieval with minimal natural language processing or information extraction. The key is to focus on gathering and applying human-readable rather than machine-readable knowledge. To demonstrate this claim, the thesis tackles three separate problems: extracting knowledge from Wikipedia; connecting it to textual documents; and applying it to the retrieval process. First, we demonstrate that a large thesaurus-like structure can be obtained directly from Wikipedia, and that accurate measures of semantic relatedness can be efficiently mined from it. Second, we show that Wikipedia provides the necessary features and training data for existing data mining techniques to accurately detect and disambiguate topics when they are mentioned in plain text. 
Third, we provide two systems and user studies that demonstrate the utility of the Wikipedia-derived knowledge base for interactive information retrieval.
Understanding people through the aggregation of their digital footprints
Thesis (Ph.D.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 160-172). Every day, millions of people encounter strangers online. We read their medical advice, buy their products, and ask them out on dates. Yet our views of them are very limited; we see individual communication acts rather than the person(s) as a whole. This thesis contends that socially-focused machine learning and visualization of archived digital footprints can improve the capacity of social media to help form impressions of online strangers. Four original designs are presented that each examine the social fabric of a different existing online world. The designs address unique perspectives on the problem of, and opportunities offered by, online impression formation. The first work, Is Britney Spears Spam?, examines a way of prototyping strangers on first contact by modeling their past behaviors across a social network. Landscape of Words identifies cultural and topical trends in large online publics. Personas is a data portrait that characterizes individuals by collating heterogeneous textual artifacts. The final design, Defuse, navigates and visualizes virtual crowds using metrics grounded in sociology. A reflection on these experimental endeavors is also presented, including a formalization of the problem and considerations for future research. A meta-critique by a panel of domain experts completes the discussion. By Aaron Robert Zinman. Ph.D.
Worldwide Infrastructure for Neuroevolution: A Modular Library to Turn Any Evolutionary Domain into an Online Interactive Platform
Across many scientific disciplines, there has emerged an open opportunity to utilize the scale and reach of the Internet to collect scientific contributions from scientists and non-scientists alike. This process, called citizen science, has already shown great promise in the fields of biology and astronomy. Within the fields of artificial life (ALife) and evolutionary computation (EC), experiments in collaborative interactive evolution (CIE) have demonstrated the ability to collect thousands of experimental contributions from hundreds of users across the globe. However, such collaborative evolutionary systems can take nearly a year to build with a small team of researchers. This dissertation introduces a new developer framework enabling researchers to easily build fully persistent online collaborative experiments around almost any evolutionary domain, thereby reducing the time to create such systems to weeks for a single researcher. To add collaborative functionality to any potential domain, this framework, called Worldwide Infrastructure for Neuroevolution (WIN), exploits an important unifying principle among all evolutionary algorithms: regardless of the overall methods and parameters of the evolutionary experiment, every individual created has an explicit parent-child relationship, wherein one individual is considered the direct descendant of another. This principle alone is enough to capture and preserve the relationships and results for a wide variety of evolutionary experiments, while allowing multiple human users to meaningfully contribute. The WIN framework is first validated through two experimental domains, image evolution and a new two-dimensional virtual creature domain, Indirectly Encoded SodaRace (IESoR), that is shown to produce a visually diverse variety of ambulatory creatures.
Finally, an Android application built with WIN, filters, allows users to interactively evolve custom image effects to apply to personalized photographs, thereby introducing the first CIE application available for any mobile device. Together, these collaborative experiments and the new mobile application establish a comprehensive new platform for evolutionary computation that can change how researchers design and conduct citizen science online.