
    DARIAH and the Benelux


    Multimedia search without visual analysis: the value of linguistic and contextual information

    This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features.
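As a concrete illustration of one of the techniques listed above, time-aligned textual resources let a keyword match be turned directly into a seek position in the audiovisual stream. The transcript data and the `find_segments` helper below are illustrative assumptions, not taken from the paper:

```python
from typing import List, Tuple

# (start_seconds, word) pairs, e.g. produced by aligning a subtitle file
# or ASR output against the audio track
Transcript = List[Tuple[float, str]]

def find_segments(transcript: Transcript, query: str) -> List[float]:
    """Return the timestamps at which the query word is spoken."""
    query = query.lower()
    return [t for t, word in transcript if word.lower() == query]

transcript = [
    (0.0, "welcome"), (0.6, "to"), (0.8, "the"),
    (1.1, "archive"), (3.2, "the"), (3.4, "archive"),
]
print(find_segments(transcript, "archive"))  # [1.1, 3.4]
```

A media player front-end would then offer each returned timestamp as a jump-to point, which is what makes the linguistic layer a conceptual access tool rather than mere metadata.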

    An examination of automatic video retrieval technology on access to the contents of an historical video archive

    Purpose – This paper aims to provide an initial understanding of the constraints that historical video collections pose to video retrieval technology and the potential that online access offers to both archive and users.
    Design/methodology/approach – A small and unique collection of videos on customs and folklore was used as a case study. Multiple methods were employed to investigate the effectiveness of the technology and the modality of user access. Automatic keyframe extraction was tested on the visual content, while the audio stream was used for automatic classification of speech and music clips. User access (search vs browse) was assessed in a controlled user evaluation. A focus group and a survey provided insight into the actual use of the analogue archive. The results of these multiple studies were then compared and integrated (triangulation).
    Findings – The amateur material challenged automatic techniques for video and audio indexing, suggesting that the technology must be tested against the material before deciding on a digitisation strategy. Two user interaction modalities, browsing vs searching, were tested in a user evaluation. Results show users preferred searching, but browsing becomes essential when the search engine fails to match query and indexed words. Browsing was also valued for serendipitous discovery; however, the organisation of the archive was judged cryptic and therefore of limited use. This indicates that the categorisation of an online archive should be thought of in terms of users who might not understand the current classification. The focus group and the survey showed clearly the advantage of online access even when the quality of the video surrogate is poor. The evidence gathered suggests that the creation of a digital version of a video archive requires a rethinking of the collection in terms of the new medium: a new archive should be specially designed to exploit the potential that the digital medium offers. Similarly, users' needs have to be considered before designing the digital library interface, as they are likely to differ from those imagined.
    Originality/value – This paper is the first attempt to understand the advantages offered and the limitations imposed by video retrieval technology for small video archives like those often found in special collections.
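The abstract does not specify which keyframe extraction method was tested; as an illustration only, a common baseline selects a new keyframe whenever a frame's colour histogram differs from that of the last keyframe by more than a threshold. The tiny synthetic grey-level "frames" and the threshold value below are assumptions for the sketch:

```python
def histogram(frame, bins=4, max_val=256):
    """Coarse grey-level histogram of a frame given as a list of pixels."""
    h = [0] * bins
    for px in frame:
        h[px * bins // max_val] += 1
    return h

def hist_diff(h1, h2):
    """L1 distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def keyframes(frames, threshold=4):
    """Indices of frames kept as keyframes (shot-change detection baseline)."""
    kept = [0]
    last = histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        h = histogram(frame)
        if hist_diff(h, last) > threshold:
            kept.append(i)
            last = h
    return kept

frames = [
    [10, 20, 30, 40],      # dark scene
    [12, 22, 28, 41],      # same scene, slight noise
    [200, 210, 220, 230],  # cut to a bright scene
]
print(keyframes(frames))  # [0, 2]
```

The study's finding that amateur material defeats such techniques is plausible on this sketch alone: hand-held footage with gradual lighting drift changes histograms without any editorial cut, so threshold-based detection over- or under-segments.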

    "Q i-jtb the Raven": Taking Dirty OCR Seriously

    This article argues that scholars must understand mass digitized texts as assemblages of new editions, subsidiary editions, and impressions of their historical sources, and that these various parts require sustained bibliographic analysis and description. To adequately theorize any research conducted in large-scale text archives—including research that includes primary or secondary sources discovered through keyword search—we must avoid the myth of surrogacy proffered by page images and instead consider directly the text files they overlay. Focusing on the OCR (optical character recognition) from which most large-scale historical text data derives, this article argues that the results of this "automatic" process are in fact new editions of their source texts that offer unique insights into both the historical texts they remediate and the more recent era of their remediation. The constitution and provenance of digitized archives are, to some extent at least, knowable and describable. Just as details of type, ink, or paper, or paratext such as printer's records can help us establish the histories under which a printed book was created, details of format, interface, and even grant proposals can help us establish the histories of corpora created under conditions of mass digitization.
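One practical way to describe an OCR "edition" as the article urges, rather than treating it as a transparent surrogate, is to profile its transcription quality. A crude but common proxy is the fraction of tokens found in a reference wordlist; the wordlist and sample strings below (including the garbled "Q i-jtb" from the article's title) are illustrative assumptions:

```python
import re

# Hypothetical reference wordlist; a real profile would use a period-
# appropriate dictionary.
WORDLIST = {"quoth", "the", "raven", "nevermore", "once", "upon",
            "a", "midnight", "dreary"}

def dictionary_rate(ocr_text: str) -> float:
    """Fraction of alphabetic tokens that appear in the wordlist."""
    tokens = re.findall(r"[a-z]+", ocr_text.lower())
    if not tokens:
        return 0.0
    return sum(t in WORDLIST for t in tokens) / len(tokens)

clean = "Quoth the Raven, Nevermore."
dirty = "Q i-jtb the Raven, Nevermore."  # the OCR error in the title
print(dictionary_rate(clean))  # 1.0
print(dictionary_rate(dirty))  # 0.5
```

Reporting such per-corpus error profiles alongside format and provenance details is one way the bibliographic description the article calls for could be made concrete.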

    Crowdsourcing and Scholarly Culture: Understanding Expertise in an Age of Popularism

    The increasing volume of digital material available to the humanities creates clear potential for crowdsourcing. However, tasks in the digital humanities typically do not satisfy the standard requirement for decomposition into microtasks, each of which must require little expertise on the part of the worker and little context of the broader task. Instead, humanities tasks require scholarly knowledge to perform, and even where sub-tasks can be extracted, these often involve broader context of the document or corpus from which they are extracted. That is, the tasks are macrotasks, resisting simple decomposition. Building on a case study from musicology, the In Concert project, we will explore both the barriers to crowdsourcing in the creation of digital corpora and also examples where elements of automatic processing or less-expert work are possible in a broader matrix that also includes expert microtasks and macrotasks. Crucially, we will see that the macrotask–microtask distinction is nuanced: it is often possible to create a partial decomposition into less-expert microtasks with residual expert macrotasks, and to do this in ways that preserve scholarly values.
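The partial decomposition described above can be sketched as a simple partition: sub-tasks that need no wider context become crowd microtasks, and the residue stays with experts as macrotasks. The task records and the expertise predicate below are illustrative assumptions, not taken from the In Concert project:

```python
def partition(subtasks, requires_expert):
    """Split subtasks into crowd microtasks and residual expert macrotasks."""
    micro = [t for t in subtasks if not requires_expert(t)]
    macro = [t for t in subtasks if requires_expert(t)]
    return micro, macro

# Hypothetical sub-tasks from digitising concert ephemera
subtasks = [
    {"name": "transcribe programme date", "context_needed": "none"},
    {"name": "link performer to authority record", "context_needed": "corpus"},
    {"name": "copy venue name", "context_needed": "none"},
    {"name": "resolve conflicting concert listings", "context_needed": "corpus"},
]

micro, macro = partition(subtasks, lambda t: t["context_needed"] != "none")
print([t["name"] for t in micro])
print([t["name"] for t in macro])
```

The point of the abstract survives even in this toy form: the partition is rarely total, and the expert macrotask residue is where scholarly values have to be preserved.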

    Special Libraries, March 1955

    Volume 46, Issue 3
    https://scholarworks.sjsu.edu/sla_sl_1955/1002/thumbnail.jp

    Data Privacy and Dignitary Privacy: Google Spain, the Right To Be Forgotten, and the Construction of the Public Sphere

    The 2014 decision of the European Court of Justice in Google Spain controversially held that the fair information practices set forth in European Union (EU) Directive 95/46/EC (Directive) require that Google remove from search results links to websites that contain true information. Google Spain held that the Directive gives persons a “right to be forgotten.” At stake in Google Spain are values that involve both privacy and freedom of expression. Google Spain badly analyzes both. With regard to the latter, Google Spain fails to recognize that the circulation of texts of common interest among strangers makes possible the emergence of a “public” capable of forming the “public opinion” that is essential for democratic self-governance. As the rise of American newspapers in the nineteenth and twentieth centuries demonstrates, the press underwrites the public sphere by creating a structure of communication both responsive to public curiosity and independent of the content of any particular news story. Google, even though it is not itself an author, sustains the contemporary virtual public sphere by creating an analogous structure of communication. With regard to privacy values, EU law, like the laws of many nations, recognizes two distinct forms of privacy. The first is data privacy, which is protected by the fair information practices contained in the Directive. These practices regulate the processing of personal information to ensure (among other things) that such information is used only for the specified purposes for which it has been legally gathered. Data privacy operates according to an instrumental logic, and it seeks to endow persons with “control” over their personal data. Data subjects need not demonstrate harm in order to establish violations of data privacy. The second form of privacy recognized by EU law is dignitary privacy. 
    Article 7 of the Charter of Fundamental Rights of the European Union protects the dignity of persons by regulating inappropriate communications that threaten to degrade, humiliate, or mortify them. Dignitary privacy follows a normative logic designed to prevent harm to personality caused by the violation of civility rules. These are the same privacy values as those safeguarded by the American tort of public disclosure of private facts. Throughout the world, courts protect dignitary privacy by balancing the harm that a communication may cause to personality against legitimate public interests in the communication. The instrumental logic of data privacy is inapplicable to public discourse, which is why the Directive contains derogations for journalistic activities. The communicative action characteristic of the public sphere is made up of intersubjective dialogue, which is antithetical both to the instrumental rationality of data privacy and to its aspiration to ensure individual control of personal information. Because the Google search engine underwrites the public sphere in which public discourse takes place, Google Spain should not have applied fair information practices to Google searches. But the Google Spain opinion also invokes Article 7, and in the end the decision creates doctrinal rules that roughly approximate those used to protect dignitary privacy. The Google Spain opinion is thus deeply confused about the kind of privacy it wishes to protect. It is impossible to ascertain whether the decision seeks to protect data privacy or dignitary privacy. Google Spain is ultimately pushed in the direction of dignitary privacy because data privacy is incompatible with public discourse, whereas dignitary privacy may be reconciled with the requirements of public discourse.
    Insofar as freedom of expression is valued because it fosters democratic self-government, public discourse cannot serve as an effective instrument of self-determination without a modicum of civility. Yet the Google Spain decision recognizes dignitary privacy only in a rudimentary and unsatisfactory way. If it had more clearly focused on the requirements of dignitary privacy, Google Spain would not so sharply have distinguished Google links from the underlying websites to which they refer. Google Spain would not have blithely outsourced the enforcement of the right to be forgotten to a private corporation like Google.

    Toward an Interactive Directory for Norfolk, Nebraska: 1899-1900

    We describe steps toward an interactive directory for the town of Norfolk, Nebraska for the years 1899 and 1900. This directory would extend the traditional city directory by including a wider range of entities being described, much richer information about the entities mentioned, and linkages to mentions of the entities in material such as digitized historical newspapers. Such a directory would be useful to readers who browse the historical newspapers by providing structured summaries of the entities mentioned. We describe the occurrence of entities in two years of the Norfolk Weekly News, focusing on several individuals to better understand the types of information which can be gleaned from historical newspapers and other historical materials. We also describe a prototype program which coordinates information about entities from the traditional city directories, the federal census, and from newspapers. We discuss the structured coding for these entities, noting that richer coding would increasingly include descriptions of events and scenarios. We propose that rich content about individuals and communities could eventually be modeled with agents and woven into historical narratives.
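The coordination step described above can be sketched as grouping per-source records under a shared name key. The sample records and the token-sorting normalisation rule below are illustrative assumptions; real record linkage across a directory, the census, and newspaper mentions would need much fuzzier matching:

```python
from collections import defaultdict

def normalise(name: str) -> str:
    """Crude name key: lowercase, drop commas, sort tokens so that
    'Smith, John' and 'John Smith' collide on the same key."""
    return " ".join(sorted(name.lower().replace(",", " ").split()))

def coordinate(*sources):
    """Merge per-source records into one profile per normalised name."""
    profiles = defaultdict(list)
    for source_name, records in sources:
        for rec in records:
            profiles[normalise(rec["name"])].append((source_name, rec))
    return dict(profiles)

# Hypothetical records for one individual across three sources
directory = [{"name": "Smith, John", "occupation": "grocer"}]
census    = [{"name": "John Smith", "age": 42}]
news      = [{"name": "JOHN SMITH", "mention": "Weekly News, 1899-05-04"}]

profiles = coordinate(("directory", directory),
                      ("census", census),
                      ("newspaper", news))
print(len(profiles["john smith"]))  # 3
```

Each profile then serves as the "structured summary" a browsing reader would see; the richer event and scenario coding proposed in the abstract would hang further fields off the same keys.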

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.