Exploring The Value Of Folksonomies For Creating Semantic Metadata
Finding good keywords to describe resources is an ongoing problem: typically we select such words manually from a thesaurus of terms, or they are created using automatic keyword extraction techniques. Folksonomies are an increasingly well-populated source of unstructured tags describing web resources. This paper explores the value of folksonomy tags as a potential source of keyword metadata by examining the relationship between folksonomies, community-produced annotations, and keywords extracted by machines. The experiment was carried out in two ways: subjectively, by asking two human indexers to evaluate the quality of the keywords generated by both systems; and automatically, by measuring the percentage of overlap between the folksonomy set and the machine-generated keyword set. The results show that the folksonomy tags agree more closely with the human-generated keywords than the automatically generated ones do. The results also show that the trained indexers preferred the semantics of folksonomy tags to those of keywords extracted automatically. These results can be considered evidence of the strong relationship between folksonomies and the human indexer’s mindset, demonstrating that the folksonomies used in the del.icio.us bookmarking service are a potential source for generating semantic metadata to annotate web resources.
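The abstract mentions a percentage of overlap between keyword sets but does not give the formula. As a minimal sketch, assuming overlap is measured as the share of a reference set that also appears in a candidate set after simple case normalisation, it could be computed as follows. The function names and the keyword lists are hypothetical and made up for illustration; they are not data from the paper.

```python
def normalize(term: str) -> str:
    """Lowercase and trim a keyword so trivially different spellings match."""
    return term.strip().lower()


def overlap_percentage(reference: list[str], candidate: list[str]) -> float:
    """Share of reference keywords (in percent) that also appear in candidate."""
    ref = {normalize(t) for t in reference}
    cand = {normalize(t) for t in candidate}
    if not ref:
        return 0.0  # avoid dividing by zero for an empty reference set
    return 100.0 * len(ref & cand) / len(ref)


# Made-up keyword sets for a single web page (illustrative only).
human_keywords = ["semantic web", "metadata", "tagging", "ontology"]
folksonomy_tags = ["Metadata", "tagging", "web2.0", "semantic web"]
extracted_keywords = ["metadata", "page", "resource", "click"]

folksonomy_overlap = overlap_percentage(human_keywords, folksonomy_tags)
extracted_overlap = overlap_percentage(human_keywords, extracted_keywords)
print(f"folksonomy vs. human: {folksonomy_overlap:.0f}%")
print(f"extracted vs. human: {extracted_overlap:.0f}%")
```

Other definitions, such as dividing by the size of the union of the two sets, are equally plausible readings of "percentage of overlap" and would give different numbers.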
Evolutionary Subject Tagging in the Humanities; Supporting Discovery and Examination in Digital Cultural Landscapes
In this paper, we attempt to identify problematic issues for subject tagging in the humanities, particularly those associated with information objects in digital formats. In the third major section, we identify a number of assumptions behind the current practice of subject classification that we think should be challenged. We then propose features of classification systems that could increase their effectiveness. These emerged as recurrent themes in many of the conversations with scholars, consultants, and colleagues. Finally, we suggest next steps that we believe will help scholars and librarians develop better subject classification systems to support research in the humanities. NEH Office of Digital Humanities: Digital Humanities Start-Up Grant (HD-51166-10
Intelligent XML Tag Classification Techniques for XML Encryption Improvement
Flexibility, friendliness, and adaptability have been key reasons for using XML to exchange information across different networks, since it provides the common syntax needed by various messaging systems. However, the widespread use of XML as a communication medium has drawn attention to the security standards used to protect exchanged messages and to achieve data confidentiality and privacy.
This research presents a novel approach to securing the XML messages used in various systems efficiently, providing both strong security measures and high performance. The system model is based on two major modules: the first classifies XML messages and defines which parts of each message are to be secured, assigning an importance level to each tag present in the XML message; the second then uses the XML Encryption standard proposed earlier by W3C [3] to perform partial encryption on the parts selected in the classification stage.
As a result, the study aims to improve both the performance of the XML encryption process and bulk message handling, so as to achieve data cleansing efficiently.
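The two-stage idea described above (classify tags by importance, then protect only the selected parts) can be sketched as follows. This is an illustration under stated assumptions, not the authors' system: the `IMPORTANCE` table, the `THRESHOLD` value, and the function names are hypothetical, and the `placeholder_cipher` is deliberately not real encryption. The sketch also does not produce the `EncryptedData` structure defined by the W3C XML Encryption standard; it only shows how partial selection limits the work to sensitive elements.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical importance levels per tag (higher = more sensitive).
IMPORTANCE = {"card_number": 3, "name": 2, "item": 1}
THRESHOLD = 2  # hypothetical cut-off: tags at or above this level get protected


def classify(root: ET.Element) -> list[ET.Element]:
    """Stage 1: return the elements whose tag importance meets the threshold."""
    return [el for el in root.iter() if IMPORTANCE.get(el.tag, 0) >= THRESHOLD]


def protect(root: ET.Element, encrypt: Callable[[bytes], bytes]) -> int:
    """Stage 2: replace the text of the selected elements with protected text."""
    selected = classify(root)
    for el in selected:
        cipher = encrypt((el.text or "").encode("utf-8"))
        el.text = base64.b64encode(cipher).decode("ascii")
        el.set("protected", "true")
    return len(selected)


def placeholder_cipher(data: bytes) -> bytes:
    """NOT encryption: a reversible stand-in so the sketch runs on the stdlib."""
    return data[::-1]


message = ET.fromstring(
    "<order><name>Ada</name><card_number>4111</card_number><item>book</item></order>"
)
protected_count = protect(message, placeholder_cipher)
print(protected_count)
print(ET.tostring(message, encoding="unicode"))
```

In a real deployment the `encrypt` callable would be backed by a vetted cryptographic library, and the low-importance `item` element would still travel in clear text, which is where the performance gain of partial encryption would come from.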
Content repositories and social networking : can there be synergies?
This paper details the novel application of Web 2.0 concepts to current services offered to Social Scientists by the ReDReSS project, carried out by the Centre for e-Science at Lancaster University. We detail plans to introduce Social Bookmarking and Social Networking concepts into the repository software developed by the project. This will improve the discovery of e-Science concepts and training by Social Scientists and allow for much improved linking of resources in the repository. We describe plans that use Social Networking and Social Bookmarking concepts, built on Open Standards, to promote collaboration between researchers by drawing on information gathered about users and their use of the repository. This will spark collaborations that would not normally be possible in the academic repository context.
Folksonomy: the New Way to Serendipity
Folksonomy expands the collaborative process by allowing contributors to index content. It rests on three powerful properties: the absence of a prior taxonomy, multi-indexation, and the absence of a thesaurus. It supports a more exploratory kind of search than entering a query in a search engine. Its original relationship-based structure (the three-way relationship between users, content and tags) means that folksonomy allows various modalities of curious exploration: a cultural exploration and a social exploration. The paper has two goals. Firstly, it tries to draw a general picture of the various folksonomy websites. Secondly, since labelling lacks any standardisation, folksonomies are often under threat of invasion by noise. This paper consequently tries to explore the different possible ways of regulating the self-generated indexation process. Keywords: taxonomy; indexation; innovation and user-created content
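The three-way relationship between users, content and tags can be pictured as a list of (user, resource, tag) assignments. The sketch below is one possible reading, not the paper's model: it assumes that exploring through a shared tag and exploring through the people who tagged a resource are two ways of navigating such a structure. All names and data are made up for illustration.

```python
from collections import defaultdict

# Made-up (user, resource, tag) assignments, for illustration only.
assignments = [
    ("ana", "page1", "jazz"),
    ("ana", "page2", "vinyl"),
    ("ben", "page1", "music"),
    ("ben", "page3", "jazz"),
    ("cy", "page3", "history"),
]

# Index the triples once so both kinds of exploration are simple lookups.
resources_by_tag = defaultdict(set)
users_by_resource = defaultdict(set)
for user, resource, tag in assignments:
    resources_by_tag[tag].add(resource)
    users_by_resource[resource].add(user)


def explore_by_tag(tag: str) -> set[str]:
    """Resources that any user labelled with the given tag."""
    return set(resources_by_tag.get(tag, set()))


def explore_by_people(resource: str) -> set[str]:
    """Other resources tagged by the users who tagged this resource."""
    taggers = users_by_resource.get(resource, set())
    return {
        other
        for user, other, _tag in assignments
        if user in taggers and other != resource
    }


print(sorted(explore_by_tag("jazz")))
print(sorted(explore_by_people("page1")))
```

Because no taxonomy or thesaurus constrains the tags, nothing in this structure stops near-duplicate or noisy tags from accumulating, which is the regulation problem the abstract raises.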
A matter of words: NLP for quality evaluation of Wikipedia medical articles
Automatic quality evaluation of Web information is a task with many fields of application and of great relevance, especially in critical domains like the medical one. We start from the intuition that the quality of the content of medical Web documents is affected by features related to the specific domain: first, the usage of a specific vocabulary (Domain Informativeness); then, the adoption of specific codes (like those used in the infoboxes of Wikipedia articles) and the type of document (e.g., historical and technical ones). In this paper, we propose to leverage specific domain features to improve the results of the evaluation of Wikipedia medical articles. In particular, we evaluate the articles adopting an "actionable" model, whose features are related to the content of the articles, so that the model can also directly suggest strategies for improving a given article's quality. We rely on Natural Language Processing (NLP) and dictionary-based techniques to extract the bio-medical concepts in a text. We demonstrate the effectiveness of our approach by classifying the medical articles of the Wikipedia Medicine Portal, which have previously been manually labeled by the Wiki Project team. The results of our experiments confirm that, by considering domain-oriented features, it is possible to obtain appreciable improvements with respect to existing solutions, mainly for those articles that other approaches have classified less correctly. Besides being interesting in their own right, the results call for further research in the area of domain-specific features suitable for Web data quality assessment.
Linking Data Across Universities: An Integrated Video Lectures Dataset
This paper presents our work and experience interlinking educational information across universities through the use of Linked Data principles and technologies. More specifically, this paper focuses on selecting, extracting, structuring and interlinking information about video lectures produced by 27 different educational institutions. For this purpose, selected information from several websites and YouTube channels has been scraped and structured according to well-known vocabularies, such as FOAF or the W3C Ontology for Media Resources. To integrate this information, the extracted videos have been categorized under a common classification space, the taxonomy defined by the Open Directory Project. An evaluation of this categorization process has been conducted, obtaining a 98% degree of coverage and an 89% degree of correctness. As a result of this process, a new Linked Data dataset has been released containing more than 14,000 video lectures from 27 different institutions, categorized under a common classification scheme.
Ensuring the discoverability of digital images for social work education : an online tagging survey to test controlled vocabularies
The digital age has transformed access to all kinds of educational content, not only in text-based formats but also as digital images and other media. As learning technologists and librarians begin to organise these new media into digital collections for educational purposes, older problems associated with cataloguing and classifying non-text media have re-emerged. At the heart of this issue is the problem of describing complex and highly subjective images in a reliable and consistent manner. This paper reports on the findings of research designed to test the suitability of two controlled vocabularies to index, and thereby improve the discoverability of, images stored in the Learning Exchange, a repository for social work education and research. An online survey asked respondents to "tag" a series of images, and the responses were mapped against the two controlled vocabularies. Findings showed that a large proportion of user-generated tags could be mapped to the controlled vocabulary terms (or their equivalents). The implications of these findings for indexing and discovering content are discussed in the context of a wider review of the literature on "folksonomies" (or user tagging) versus taxonomies and controlled vocabularies.