3,178 research outputs found

    Thematic Annotation: extracting concepts out of documents

    Get PDF
    Contrarily to standard approaches to topic annotation, the technique used in this work does not centrally rely on some sort of -- possibly statistical -- keyword extraction. In fact, the proposed annotation algorithm uses a large scale semantic database -- the EDR Electronic Dictionary -- that provides a concept hierarchy based on hyponym and hypernym relations. This concept hierarchy is used to generate a synthetic representation of the document by aggregating the words present in topically homogeneous document segments into a set of concepts best preserving the document's content. This new extraction technique uses an unexplored approach to topic selection. Instead of using semantic similarity measures based on a semantic resource, the later is processed to extract the part of the conceptual hierarchy relevant to the document content. Then this conceptual hierarchy is searched to extract the most relevant set of concepts to represent the topics discussed in the document. Notice that this algorithm is able to extract generic concepts that are not directly present in the document.Comment: Technical report EPFL/LIA. 81 pages, 16 figure

    Exploring The Value Of Folksonomies For Creating Semantic Metadata

    No full text
    Finding good keywords to describe resources is an on-going problem: typically we select such words manually from a thesaurus of terms, or they are created using automatic keyword extraction techniques. Folksonomies are an increasingly well populated source of unstructured tags describing web resources. This paper explores the value of the folksonomy tags as potential source of keyword metadata by examining the relationship between folksonomies, community produced annotations, and keywords extracted by machines. The experiment has been carried-out in two ways: subjectively, by asking two human indexers to evaluate the quality of the generated keywords from both systems; and automatically, by measuring the percentage of overlap between the folksonomy set and machine generated keywords set. The results of this experiment show that the folksonomy tags agree more closely with the human generated keywords than those automatically generated. The results also showed that the trained indexers preferred the semantics of folksonomy tags compared to keywords extracted automatically. These results can be considered as evidence for the strong relationship of folksonomies to the human indexer’s mindset, demonstrating that folksonomies used in the del.icio.us bookmarking service are a potential source for generating semantic metadata to annotate web resources

    User modeling for exploratory search on the Social Web. Exploiting social bookmarking systems for user model extraction, evaluation and integration

    Get PDF
    Exploratory search is an information seeking strategy that extends be- yond the query-and-response paradigm of traditional Information Retrieval models. Users browse through information to discover novel content and to learn more about the newly discovered things. Social bookmarking systems integrate well with exploratory search, because they allow one to search, browse, and filter social bookmarks. Our contribution is an exploratory tag search engine that merges social bookmarking with exploratory search. For this purpose, we have applied collaborative filtering to recommend tags to users. User models are an im- portant prerequisite for recommender systems. We have produced a method to algorithmically extract user models from folksonomies, and an evaluation method to measure the viability of these user models for exploratory search. According to our evaluation web-scale user modeling, which integrates user models from various services across the Social Web, can improve exploratory search. Within this thesis we also provide a method for user model integra- tion. Our exploratory tag search engine implements the findings of our user model extraction, evaluation, and integration methods. It facilitates ex- ploratory search on social bookmarks from Delicious and Connotea and pub- lishes extracted user models as Linked Data

    Text mining with exploitation of user\u27s background knowledge : discovering novel association rules from text

    Get PDF
    The goal of text mining is to find interesting and non-trivial patterns or knowledge from unstructured documents. Both objective and subjective measures have been proposed in the literature to evaluate the interestingness of discovered patterns. However, objective measures alone are insufficient because such measures do not consider knowledge and interests of the users. Subjective measures require explicit input of user expectations which is difficult or even impossible to obtain in text mining environments. This study proposes a user-oriented text-mining framework and applies it to the problem of discovering novel association rules from documents. The developed system, uMining, consists of two major components: a background knowledge developer and a novel association rules miner. The background knowledge developer learns a user\u27s background knowledge by extracting keywords from documents already known to the user (background documents) and developing a concept hierarchy to organize popular keywords. The novel association rule miner discovers association rules among noun phrases extracted from relevant documents (target documents) and compares the rules with the background knowledge to predict the rule novelty to the particular user (useroriented novelty). The user-oriented novelty measure is defined as the semantic distance between the antecedent and the consequent of a rule in the background knowledge. It consists of two components: occurrence distance and connection distance. The former considers the co-occurrences of two keywords in the background documents: the more the shorter the distance. The latter considers the common connections of with others in the concept hierarchy. It is defined as the length of the connecting the two keywords in the concept hierarchy: the longer the path, distance. The user-oriented novelty measure is evaluated from two perspectives: novelty prediction accuracy and usefulness indication power. The results show that the useroriented novelty measure outperforms the WordNet novelty measure and the compared objective measures in term of predicting novel rules and identifying useful rules

    prototypical implementations

    Get PDF
    In this technical report, we present prototypical implementations of innovative tools and methods developed according to the working plan outlined in Technical Report TR-B-09-05 [23]. We present an ontology modularization and integration framework and the SVoNt server, the server-side end of an SVN- based versioning system for ontologies in the Corporate Ontology Engineering pillar. For the Corporate Semantic Collaboration pillar, we present the prototypical implementation of a light-weight ontology editor for non-experts and an ontology based expert finder system. For the Corporate Semantic Search pillar, we present a prototype for algorithmic extraction of relations in folksonomies, a tool for trend detection using a semantic analyzer, a tool for automatic classification of web documents using Hidden Markov models, a personalized semantic recommender for multimedia content, and a semantic search assistant developed in co-operation with the Museumsportal Berlin. The prototypes complete the next milestone on the path to an integral Cor- porate Semantic Web architecture based on the three pillars Corporate Ontol- ogy Engineering, Corporate Semantic Collaboration, and Corporate Semantic Search, as envisioned in [23]

    Semantic modelling of user interests based on cross-folksonomy analysis

    Get PDF
    The continued increase in Web usage, in particular participation in folksonomies, reveals a trend towards a more dynamic and interactive Web where individuals can organise and share resources. Tagging has emerged as the de-facto standard for the organisation of such resources, providing a versatile and reactive knowledge management mechanism that users find easy to use and understand. It is common nowadays for users to have multiple profiles in various folksonomies, thus distributing their tagging activities. In this paper, we present a method for the automatic consolidation of user profiles across two popular social networking sites, and subsequent semantic modelling of their interests utilising Wikipedia as a multi-domain model. We evaluate how much can be learned from such sites, and in which domains the knowledge acquired is focussed. Results show that far richer interest profiles can be generated for users when multiple tag-clouds are combine

    An Enhanced Web Data Learning Method for Integrating Item, Tag and Value for Mining Web Contents

    Get PDF
    The Proposed System Analyses the scopes introduced by Web 2.0 and collaborative tagging systems, several challenges have to be addressed too, notably, the problem of information overload. Recommender systems are among the most successful approaches for increasing the level of relevant content over the 201C;noise.201D; Traditional recommender systems fail to address the requirements presented in collaborative tagging systems. This paper considers the problem of item recommendation in collaborative tagging systems. It is proposed to model data from collaborative tagging systems with three-mode tensors, in order to capture the three-way correlations between users, tags, and items. By applying multiway analysis, latent correlations are revealed, which help to improve the quality of recommendations. Moreover, a hybrid scheme is proposed that additionally considers content-based information that is extracted from items. We propose an advanced data mining method using SVD that combines both tag and value similarity, item and user preference. SVD automatically extracts data from query result pages by first identifying and segmenting the query result records in the query result pages and then aligning the segmented query result records into a table, in which the data values from the same attribute are put into the same column. Specifically, we propose new techniques to handle the case when the query result records based on user preferences, which may be due to the presence of auxiliary information, such as a comment, recommendation or advertisement, and for handling any nested-structure that may exist in the query result records

    An adaptable and personalised e-learning system applied to computer science programmes design

    Get PDF
    With the rapid advances in E-learning systems, personalisation and adaptability have now become important features in the education technology. In this paper, we describe the development of an architecture for A Personalised and Adaptable E-Learning System (APELS) that attempts to contribute to advancements in this field. APELS aims to provide a personalised and adaptable learning environment to users from the freely available resources on the Web. An ontology was employed to model a specific learning subject and to extract the relevant learning resources from the Web based on a learner's model (the learners background, needs and learning styles). The APELS system uses natural language processing techniques to evaluate the content extracted from relevant resources against a set of learning outcomes as defined by standard curricula to enable the appropriate learning of the subject. An application in the computer science field is used to illustrate the working mechanisms of the APELS system and its evaluation based on the ACM/IEEE computing curriculum. An experimental evaluation was conducted with domain experts to evaluate whether APELS can produce the right learning material that suits the learning needs of a learner. The results show that the produced content by APELS is of a good quality and satisfies the learning outcomes for teaching purposes
    corecore