
    Searching with Tags: Do Tags Help Users Find Things?

    This study examines whether tags can be useful in the process of information retrieval. Participants searched a social bookmarking tool specialising in academic articles (CiteULike) and an online journal database (PubMed). Participant actions were captured with screen-capture software, and participants were asked to describe their search process. Users did make use of tags in their search process, both as a guide to searching and as hyperlinks to potentially useful articles. However, users also made use of controlled vocabularies in the journal database to locate useful search terms, and of links to related articles supplied by the database.

    ‘Girlfriends and Strawberry Jam’: Tagging Memories, Experiences, and Events for Future Retrieval

    In this short paper we present some preliminary thoughts on tagging everyday life events in order to allow future retrieval of those events, or of experiences related to them. These thoughts will be elaborated in the context of the recently started Network of Excellence PetaMedia (Peer-to-Peer Tagged Media) and the Network of Excellence SSPNet (Social Signal Processing), the latter starting in 2009, both funded by the European Commission's Seventh Framework Programme. Descriptions of these networks are given later in this paper.

    Towards Cleaning-up Open Data Portals: A Metadata Reconciliation Approach

    This paper presents an approach for metadata reconciliation, curation and linking for Open Governmental Data Portals (ODPs). ODPs have lately become the standard solution for governments that want to make their public data available to society. Portal managers use several types of metadata to organize the datasets, one of the most important being tags. However, the tagging process is subject to many problems, such as synonyms, ambiguity and incoherence, among others. As our empirical analysis of ODPs shows, these issues are currently prevalent in most ODPs and effectively hinder the reuse of Open Data. To address these problems, we develop and implement an approach for tag reconciliation in Open Data Portals, encompassing local actions related to individual portals, and global actions for adding a semantic metadata layer above individual portals. The local part aims to enhance the quality of tags in a single portal, and the global part is meant to interlink ODPs by establishing relations between tags.
    Comment: 8 pages, 10 figures. Under revision for ICSC201
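The local reconciliation step described above can be sketched as a simple normalise-and-merge pass over a portal's tags. This is a minimal illustration; the synonym map and function names are invented for the sketch and are not the paper's actual implementation:

```python
# Hand-built synonym map (illustrative; a real system would derive this
# from lexical resources or cross-portal statistics).
SYNONYMS = {
    "healthcare": "health",
    "health-care": "health",
    "transportation": "transport",
}

def normalise(tag: str) -> str:
    """Local action: canonicalise a single portal's tag."""
    tag = tag.strip().lower().replace("_", "-")
    return SYNONYMS.get(tag, tag)

def reconcile(portal_tags):
    """Collapse a portal's raw tags into a canonical, deduplicated set."""
    return sorted({normalise(t) for t in portal_tags})

print(reconcile(["Health-Care", "healthcare", "Transport", "transportation"]))
# prints ['health', 'transport']
```

The global step would then link the canonical tags of different portals, e.g. by asserting equivalence relations between them in a shared semantic layer.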

    Ensuring the discoverability of digital images for social work education: an online tagging survey to test controlled vocabularies

    The digital age has transformed access to all kinds of educational content, not only in text-based formats but also digital images and other media. As learning technologists and librarians begin to organise these new media into digital collections for educational purposes, older problems associated with cataloguing and classifying non-text media have re-emerged. At the heart of this issue is the problem of describing complex and highly subjective images in a reliable and consistent manner. This paper reports the findings of research designed to test the suitability of two controlled vocabularies for indexing, and thereby improving the discoverability of, images stored in the Learning Exchange, a repository for social work education and research. An online survey asked respondents to "tag" a series of images, and responses were mapped against the two controlled vocabularies. Findings showed that a large proportion of user-generated tags could be mapped to the controlled vocabulary terms (or their equivalents). The implications of these findings for indexing and discovering content are discussed in the context of a wider review of the literature on "folksonomies" (or user tagging) versus taxonomies and controlled vocabularies.
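The tag-to-vocabulary mapping this kind of study relies on can be illustrated as follows. The vocabulary entries and the matching rule (exact match against a preferred term or one of its equivalents) are invented for the sketch; the study's real vocabularies are far larger:

```python
# Toy controlled vocabulary: preferred term -> set of equivalent
# (non-preferred) terms. Entries are illustrative only.
VOCABULARY = {
    "child welfare": {"child protection", "child safety"},
    "social work": {"social care"},
}

def map_tag(tag: str):
    """Map a user tag to a preferred vocabulary term, or None if unmapped."""
    tag = tag.strip().lower()
    for preferred, equivalents in VOCABULARY.items():
        if tag == preferred or tag in equivalents:
            return preferred
    return None

tags = ["Child Protection", "social care", "sunset"]
mapped = {t: map_tag(t) for t in tags}  # "sunset" maps to None
```

The proportion of tags mapping to a non-None term is then the kind of coverage figure the findings report.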

    Bridging the gap between social tagging and semantic annotation: E.D. the Entity Describer

    Semantic annotation enables the development of efficient computational methods for analyzing and interacting with information, thus maximizing its value. With the already substantial and constantly expanding data-generation capacity of the life sciences, as well as the concomitant increase in the knowledge distributed across scientific articles, new ways to produce semantic annotations of this information are crucial. While automated techniques certainly facilitate the process, manual annotation remains the gold standard in most domains. In this manuscript, we describe a prototype mass-collaborative semantic annotation system that, by distributing the annotation workload across the broad community of biomedical researchers, may help to produce the volume of meaningful annotations needed by modern biomedical science. We present E.D., the Entity Describer, a mashup of the Connotea social tagging system, an index of semantic-web-accessible controlled vocabularies, and a new public RDF database for storing social semantic annotations.
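At its core, such a system records each tagging act as RDF-style (subject, predicate, object) triples. A minimal sketch with an in-memory triple set; the predicate names and URIs are invented for illustration, not E.D.'s actual schema:

```python
# In-memory stand-in for an RDF store; a real system would use a
# triple store with named graphs and provenance.
triples = set()

def annotate(resource: str, term_uri: str, annotator: str) -> None:
    """Record that `annotator` tagged `resource` with a controlled-vocabulary term."""
    triples.add((resource, "hasSubjectTerm", term_uri))
    triples.add((resource, "annotatedBy", annotator))

annotate("doi:10.1000/example",
         "http://example.org/vocab/GeneRegulation",  # hypothetical term URI
         "user42")
```

Because the annotations are triples rather than free-text tags, they can be queried and merged with other semantic-web data.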

    Tagging for health information organisation and retrieval

    This paper examines the tagging practices evident on CiteULike, a research-oriented social bookmarking site for journal articles. Articles selected for this study were related to health information and medicine. Tagging practices were examined using standard informetric measures for the analysis of bibliographic information and of term use. Additionally, tags were compared to the descriptors assigned to the same article.
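The kind of informetric comparison described, tag frequencies plus the overlap between user tags and assigned descriptors (such as MeSH terms), can be sketched with toy data. The tags, descriptors and measures here are invented for illustration:

```python
from collections import Counter

# Toy data: user tags for one article vs. its assigned descriptors.
tags = ["diabetes", "genetics", "diabetes", "obesity", "review"]
descriptors = ["diabetes mellitus", "obesity", "genetics"]

tag_freq = Counter(tags)                         # term-use frequencies
overlap = set(tags) & set(descriptors)           # exact-match agreement only
coverage = len(overlap) / len(set(descriptors))  # share of descriptors also tagged
```

Note that exact matching misses near-equivalents ("diabetes" vs. "diabetes mellitus"), which is one reason such studies also inspect term variants by hand.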

    Linguistically informed and corpus informed morphological analysis of Arabic

    Standard English PoS-taggers generally involve tag-assignment (via dictionary lookup, etc.) followed by tag-disambiguation (via a context model, e.g. PoS n-grams or Brill transformations). We want to PoS-tag our Arabic corpus, but evaluation of existing PoS-taggers has highlighted shortcomings; in particular, about a quarter of all word tokens are not assigned a fully correct morphological analysis. Tag-assignment is significantly more complex for Arabic. An Arabic lemmatiser program can extract the stem or root, but this is not enough for full PoS-tagging; words should be decomposed into five parts: proclitics, prefixes, stem or root, suffixes and postclitics. The morphological analyser should then add the appropriate linguistic information to each of these parts of the word; in effect, instead of one tag per word, we need a subtag for each part (and possibly multiple subtags if there are multiple proclitics, prefixes, suffixes or postclitics). Many challenges face the implementation of Arabic morphological analysis, notably the rich “root-and-pattern” nonconcatenative (or nonlinear) morphology and the highly complex word-formation process of roots and patterns, especially when one or two long vowels are among the root letters. Moreover, Arabic raises orthographic issues such as short vowels ( َ ُ ِ ), Hamzah (ء أ إ ؤ ئ), Taa’ Marboutah ( ة ) and Ha’ ( ه ), Ya’ ( ي ) and Alif Maksorah ( ى ), Shaddah ( ّ ) or gemination, and Maddah ( آ ) or extension, a compound letter of Hamzah and Alif ( أا ). Our morphological analyser uses linguistic knowledge of the language as well as corpora to verify the linguistic information. To understand the problem, we started by analysing fifteen established Arabic language dictionaries to build a broad-coverage lexicon which contains not only roots and single words but also multi-word expressions, idioms, collocations requiring special part-of-speech assignment, and words with special part-of-speech tags.
The next stage of research was a detailed analysis and classification of Arabic roots to address the “tail” of hard cases for existing morphological analysers, together with an analysis of the roots, word-root combinations and coverage of each root category in the Qur’an, and of the word-root information stored in our lexicon. From authoritative Arabic grammar books, we extracted and generated comprehensive lists of affixes, clitics and patterns. These lists were then cross-checked by analysing the words of three corpora: the Qur’an, the Corpus of Contemporary Arabic and the Penn Arabic Treebank (with our lexicon treated as a fourth cross-check corpus). We also developed a novel algorithm that generates the correct pattern of a word, dealing with the orthographic issues of Arabic and other word-derivation issues, such as the elimination or substitution of root letters.
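The five-part decomposition described above, with a subtag for each part rather than one tag per word, can be modelled with a simple data structure. The segment forms and subtag labels below are illustrative only, not the analyser's actual tagset:

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    form: str    # surface form of this part of the word
    subtag: str  # linguistic label for this part

@dataclass
class Analysis:
    """Five-part analysis: proclitics + prefixes + stem/root + suffixes + postclitics."""
    proclitics: list = field(default_factory=list)
    prefixes: list = field(default_factory=list)
    stem: Segment = None
    suffixes: list = field(default_factory=list)
    postclitics: list = field(default_factory=list)

    def full_tag(self) -> str:
        """Concatenate the subtags of all parts, in surface order."""
        parts = (self.proclitics + self.prefixes +
                 ([self.stem] if self.stem else []) +
                 self.suffixes + self.postclitics)
        return "+".join(s.subtag for s in parts)

# e.g. و + ال + كتاب + ُ ("and the book", illustrative segmentation and labels)
word = Analysis(
    proclitics=[Segment("و", "CONJ")],
    prefixes=[Segment("ال", "DET")],
    stem=Segment("كتاب", "NOUN"),
    suffixes=[Segment("ُ", "CASE_NOM")],
)
```

A word with multiple proclitics or suffixes simply contributes more segments, and hence more subtags, to the concatenated tag.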