Contexts and Contributions: Building the Distributed Library
This report updates and expands on A Survey of Digital Library Aggregation Services, originally commissioned by the DLF as an internal report in summer 2003 and released to the public later that year. It first highlights major developments affecting the ecosystem of scholarly communications and digital libraries since the last survey and provides an analysis of OAI implementation demographics, based on a comparative review of repository registries and cross-archive search services. It then reviews the state of practice for a cohort of digital library aggregation services, grouping them by the problem space each most closely addresses. Based in part on responses collected in fall 2005 from an online survey distributed to the original core services, the report investigates the purpose, function, and challenges of next-generation aggregation services. The advances in each service are of interest in their own right, but the report also attempts to situate these services in a larger context and to understand how they fit into a multi-dimensional and interdependent ecosystem supporting the worldwide community of scholars. Finally, the report summarizes the contributions of these services thus far and identifies obstacles requiring further attention to realize the goal of an open, distributed digital library system.
A Survey of Digital Library Aggregation Services
This report provides an overview of a diverse set of more than thirty digital library aggregation services, organizes them into functional clusters, and then evaluates them more fully from the perspective of an informed user. Most of the services under review rely wholly or partially on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), although some of them predate its inception and a few rely predominantly on the Z39.50 protocol. In the opening section of this report, each service is annotated with its organizational affiliation, subject coverage, function, audience, status, and size. Critical issues surrounding each of these elements are presented in order to give the reader an appreciation of the nuances inherent in seemingly straightforward factual information, such as audience or size. Each service is then grouped into one of five functional clusters:
• open access e-print archives and servers;
• cross-archive search services and aggregators;
• from digital collections to digital library environments;
• from peer-reviewed referratories to portal services;
• specialized search engines.
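Most of the aggregation services above harvest metadata over OAI-PMH, whose responses are XML documents wrapping (most commonly) unqualified Dublin Core records. As a minimal illustration of what a harvester consumes, the sketch below parses a ListRecords response into (identifier, title) pairs; the sample XML and the `harvest_records` helper are invented for illustration, though the namespaces and response structure follow the OAI-PMH 2.0 specification.

```python
# Sketch: parsing a minimal OAI-PMH ListRecords response, as an aggregator's
# harvester would. A real harvester fetches this XML from a repository's
# endpoint (…?verb=ListRecords&metadataPrefix=oai_dc); here the response
# is a hard-coded sample so the example is self-contained.
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Record</dc:title>
          <dc:creator>A. Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def harvest_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        ident = rec.find(OAI + "header/" + OAI + "identifier").text
        title = rec.find(".//" + DC + "title").text
        records.append((ident, title))
    return records

print(harvest_records(SAMPLE))
```

A production harvester would additionally follow `resumptionToken` elements across paginated responses and track datestamps for incremental harvesting.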
Computational Models of Quality for Educational Digital Resource Assessment
Educational digital libraries and peer-produced open educational resources have become integral to efforts to incorporate personalized learning into the classroom. Assuring the quality of educational content from these sources has become a major concern of the curators of such materials, and of educators who want to use them. But quality of educational materials is a multi-faceted problem, not completely understood, and often disputed. In current practice, focused manual effort by trained experts is required to assess each resource.
This work attempts to leverage the large existing corpus of work in the field of computational semantics to supplement and support human judgment in educational resource assessment. Based on an in-depth study of human expert decision processes, the characterization of a resource's quality is broken down into dimensions of quality, and further into low-level, more easily identified indicators of quality; these indicators alone are strongly predictive of an expert's overall quality assessment of a resource.
A corpus of 1000 resources from the Digital Library for Earth System Education (DLESE) was manually annotated for the presence or absence of seven important quality indicators. Human experts were able to make these assessments quite consistently. Using a supervised machine learning and document classification approach, a baseline computational system was able to train models for each of the seven indicators that achieved some agreement with the human annotation. By adjusting the computational system to make better use of the data set, these models were improved to achieve good agreement on all seven indicators.
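The supervised setup described above can be sketched with a toy bag-of-words Naive Bayes classifier predicting one binary quality indicator per resource. This is an illustrative stand-in, not the dissertation's system: the documents, labels, and the indicator name ("states learning goals") are invented, and the actual models were trained on the annotated DLESE corpus.

```python
# Hedged sketch of supervised document classification for one binary
# quality indicator, using a minimal bag-of-words Naive Bayes with
# Laplace smoothing. Toy data only; invented for illustration.
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(docs, labels):
    """Collect per-class word counts and class priors."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(tokenize(doc))
    totals = Counter(labels)
    priors = {y: totals[y] / len(labels) for y in (0, 1)}
    return counts, priors

def predict(model, doc):
    """Return the class with the higher smoothed log-likelihood."""
    counts, priors = model
    vocab = set(counts[0]) | set(counts[1])
    scores = {}
    for y in (0, 1):
        n = sum(counts[y].values())
        score = math.log(priors[y])
        for w in tokenize(doc):
            score += math.log((counts[y][w] + 1) / (n + len(vocab)))
        scores[y] = score
    return max(scores, key=scores.get)

docs = ["students will learn to identify rock layers",
        "learning goals: explain the water cycle",
        "a photo gallery of mountains",
        "links to assorted weather maps"]
labels = [1, 1, 0, 0]  # 1 = "states learning goals" indicator present
model = train(docs, labels)
print(predict(model, "students will learn about plate tectonics"))
```

Training seven such models, one per indicator, and comparing their outputs against the expert annotations is the shape of the evaluation described above; the dissertation's actual feature set and learners are richer than this sketch.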
To evaluate the generalizability of this approach, an additional corpus of 230 peer-produced open educational resources from the Instructional Architect (IA) project was manually annotated for quality indicators, using a slightly modified annotation protocol. In spite of the very different nature of the materials, the computational models trained on the DLESE corpus generalized to the new data to a small extent; models trained on the new data achieved mostly good agreement.
Multifaceted Geotagging for Streaming News
News sources on the Web generate constant streams of information, describing the events that shape our world. In particular, geography plays a key role in the news, and understanding the geographic information present in news allows for its useful spatial browsing and retrieval. This process of understanding is called geotagging, and involves first finding in the document all textual references to geographic locations, known as toponyms, and second, assigning the correct lat/long values to each toponym, steps which are termed toponym recognition and toponym resolution, respectively. These steps are difficult due to ambiguities in natural language: some toponyms share names with non-location entities, and further, a given toponym can have many location interpretations. Removing these ambiguities is crucial for successful geotagging.
To this end, geotagging methods are described which were developed for streaming news. First, a spatio-textual search engine named STEWARD and an interactive map-based news browsing system named NewsStand are described, which feature geotaggers as central components and served as motivating systems and experimental testbeds for developing geotagging methods. Next, a geotagging methodology is presented that follows a multifaceted approach involving a variety of techniques. First, a multifaceted toponym recognition process is described that uses both rule-based and machine learning–based methods to ensure high toponym recall. Next, various forms of toponym resolution evidence are explored. One such type of evidence is lists of toponyms, termed comma groups, whose toponyms share a common thread in their geographic properties that enables correct resolution. In addition to explicit evidence, news authors take advantage of the implicit geographic knowledge of their audiences. Understanding the local places known by an audience, termed its local lexicon, affords great performance gains when geotagging articles from local newspapers, which account for the vast majority of news on the Web. Finally, considering windows of text of varying size around each toponym, termed adaptive context, allows for a tradeoff between geotagging execution speed and toponym resolution accuracy. Extensive experimental evaluations of all the above methods, using existing corpora as well as two newly created large corpora of streaming news, show great performance gains over several competing prominent geotagging methods.
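The two geotagging steps and the local-lexicon idea described above can be sketched in miniature: recognition finds gazetteer matches in the text, and resolution prefers an interpretation the audience is assumed to know. This is an illustrative toy, not the dissertation's implementation; the gazetteer entries, coordinates, and lexicon contents are invented.

```python
# Sketch of gazetteer-based toponym recognition and local-lexicon-aware
# toponym resolution. Tiny invented gazetteer: each toponym maps to its
# candidate interpretations as (name, latitude, longitude).
GAZETTEER = {
    "paris": [("Paris, France", 48.86, 2.35),
              ("Paris, Texas", 33.66, -95.56)],
    "london": [("London, UK", 51.51, -0.13),
               ("London, Ontario", 42.98, -81.25)],
}

def recognize(text):
    """Toponym recognition: words in the text that match the gazetteer."""
    words = text.lower().replace(",", " ").split()
    return [w for w in words if w in GAZETTEER]

def resolve(toponym, local_lexicon):
    """Toponym resolution: prefer an interpretation in the audience's
    local lexicon; otherwise fall back to the first-listed (assumed
    most prominent) interpretation."""
    for interp in GAZETTEER[toponym]:
        if interp[0] in local_lexicon:
            return interp
    return GAZETTEER[toponym][0]

article = "City council of Paris approved the new budget."
texas_lexicon = {"Paris, Texas", "Dallas, Texas"}  # a local paper's audience
for toponym in recognize(article):
    print(toponym, "->", resolve(toponym, texas_lexicon))
```

For a local Texas newspaper the ambiguous "Paris" resolves to Paris, Texas, while the same article with an empty lexicon falls back to the prominent interpretation, Paris, France; this mirrors the performance gain the abstract attributes to modeling each audience's local lexicon.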