
    Human Document Classification Using Bags of Words

    Humans are remarkably adept at classifying text documents into categories. For instance, while reading a news story, we are rapidly able to assess whether it belongs to the domain of finance, politics or sports. Automating this task would have applications for content-based search or filtering of digital documents. To this end, it is interesting to investigate the nature of information humans use to classify documents. Here we report experimental results suggesting that this information might, in fact, be quite simple. Using a paradigm of progressive revealing, we determined classification performance as a function of the number of words revealed. We found that subjects are able to achieve similar classification accuracy with or without syntactic information across a range of passage sizes. These results have implications for models of human text-understanding and also allow us to estimate what level of performance we can expect, in principle, from a system without requiring a prior step of complex natural language processing.
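    The contrast between the "with syntax" and "without syntax" conditions can be illustrated with a minimal sketch. It assumes that syntactic information is removed by shuffling word order, which the abstract does not state; the function names `reveal` and `bag_of_words` and the sample passage are hypothetical, not taken from the paper.

```python
import random
from collections import Counter


def reveal(passage: str, n_words: int) -> list[str]:
    """Return the first n_words words of the passage in their original order."""
    return passage.split()[:n_words]


def bag_of_words(words: list[str], seed: int = 0) -> list[str]:
    """Return the same words in shuffled order.

    Assumption: shuffling stands in for "no syntactic information".
    Word identities and counts are preserved; only the order is lost.
    """
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical sample passage, used only for illustration.
passage = "The central bank raised interest rates after markets fell sharply on Monday"

ordered = reveal(passage, 6)        # condition with word order intact
scrambled = bag_of_words(ordered)   # condition with word order destroyed

print(ordered)
print(scrambled)
```

    Increasing `n_words` step by step mimics progressive revealing: at each passage size, both conditions contain exactly the same words and differ only in whether their order is kept.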

    From Data Extraction to Data Leaking. Data-activism in Italian and Spanish anti-corruption campaigns

    This article investigates how activists employ Information and Communication Technologies (ICTs) and engage with data-activism in grassroots struggles against corruption. Based on a comparative research design that triangulates three qualitative data sources — in-depth interviews, movements' documents and participatory platforms — the article analyses two campaigns: Riparte il Futuro in Italy and 15MpaRato in Spain. In so doing, the article casts light on how activists engage with digital data, revealing how their employment is connected to and consistent with the type of organizational structure and communication strategy of the campaign. Moreover, the article evaluates how activists engage with three specific digital data-related practices — digital data creation, data usage and data transformation. Finally, the article illustrates that grasping the features of digital data-related practices also reflects how activists perceive and enact distinct ideas of active citizenship and data transparency in their fight against corruption

    Yours ever (well, maybe): Studies and signposts in letter writing

    Electronic mail and other digital communications technologies seemingly threaten to end the era of handwritten and typed letters, now affectionately seen as part of snail mail. In this essay, I analyze a group of popular and scholarly studies about letter writing, including examples of pundits critiquing the use of e-mail, etiquette manuals advising why the handwritten letter still possesses value, historians and literary scholars studying the role of letters in the past and what it tells us about our present attitudes about digital communications technologies, and futurists predicting how we will function as personal archivists maintaining every document including e-mail. These are useful guideposts for archivists, providing both a sense of the present and the past in the role, value and nature of letters and their successors. They also provide insights into how such documents should be studied, expanding our gaze beyond the particular letters, to the tools used to create them and the traditions dictating their form and function. We also can discern a role for archivists, both for contributing to the literature about documents and in using these studies and commentaries, suggesting not a new disciplinary realm but opportunities for new interdisciplinary work. Examining a documentary form makes us more sensitive to both the innovations and traditions as it shifts from the analog to the digital; we can learn not to be caught up in hysteria or nostalgia about one form over another and archivists can learn about what they might expect in their labors to document society and its institutions. At one time, paper was part of an innovative technology, with roles very similar to the Internet and e-mail today. It may be that the shifts are far less revolutionary than is often assumed. Reading such works also suggests, finally, that archivists ought to rethink how they view their own knowledge and how it is constructed and used. © 2010 Springer Science+Business Media B.V.

    Global Heuristic Search on Encrypted Data (GHSED)

    Important documents are being kept encrypted on remote servers. To retrieve these encrypted data, efficient search methods are needed that enable retrieval of a document without knowing its content. This paper describes a technique called global heuristic search on encrypted data (GHSED) for searching encrypted files, which are encrypted using public key encryption and stored on an untrusted server, and for retrieving the files that satisfy a certain search pattern without revealing any information about the original files. The GHSED technique would satisfy the following: (1) Provable security: the untrusted server cannot learn anything about the plaintext given only the ciphertext. (2) Controlled searching: the untrusted server cannot search for a word without the user's authorization. (3) Hidden queries: the user may ask the untrusted server to search for a secret word without revealing the word to the server. (4) Query isolation: the untrusted server learns nothing more about the plaintext than the search result.
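    Properties (2) and (3) can be illustrated with a toy sketch. This is not the GHSED construction, which the abstract describes as based on public key encryption; it swaps in a simple symmetric keyed-hash (HMAC) keyword index and makes no claim of provable security. The names `trapdoor`, `build_index` and `server_search`, the key and the sample documents are all hypothetical.

```python
import hashlib
import hmac


def trapdoor(key: bytes, word: str) -> str:
    """Client side: derive an opaque search token for a word.

    Without the key, the server cannot compute tokens for words of its own
    choosing (controlled searching) and cannot read the queried word from
    the token (hidden queries).
    """
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key: bytes, docs: dict[str, str]) -> dict[str, set[str]]:
    """Client side: map each word token to the ids of documents containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for word in set(text.split()):
            index.setdefault(trapdoor(key, word), set()).add(doc_id)
    return index


def server_search(index: dict[str, set[str]], token: str) -> set[str]:
    """Server side: look up a token; the server sees only tokens and ids."""
    return index.get(token, set())


# Hypothetical key and documents, used only for illustration.
key = b"client-secret-key"
docs = {
    "doc1": "quarterly budget report",
    "doc2": "meeting notes and agenda",
}
index = build_index(key, docs)  # uploaded alongside the encrypted files

matches = server_search(index, trapdoor(key, "budget"))
print(sorted(matches))
```

    In this sketch the server still learns which tokens repeat across queries and which documents match, so it falls short of the guarantees listed in the abstract; it only shows how a search can run over tokens instead of plaintext words.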

    Information seeking in the Humanities: physicality and digitality

    This paper presents a brief overview of a research project that is examining the information seeking practices of humanities scholars. The results of this project are being used to develop digital resources to better support these work activities. Initial findings from a recent set of interviews are offered, revealing the importance of physical artefacts in the humanities scholars’ research processes and the limitations of digital resources. Finally, further work that is soon to be undertaken is summarised, and it is hoped that after participation in this workshop these ideas will be refined.