1,331 research outputs found

    Compound key word generation from document databases using a hierarchical clustering ART model

    The growing availability of databases on the information highways motivates the development of new processing tools able to deal with a heterogeneous and changing information environment. A highly desirable feature of data processing systems handling this type of information is the ability to automatically extract their own key words. In this paper we address the specific problem of creating semantic term associations from a text database. The proposed method uses a hierarchical model made up of Fuzzy Adaptive Resonance Theory (ART) neural networks. First, the system uses several Fuzzy ART modules to cluster isolated words into semantic classes, starting from the raw text of the database. Next, this knowledge is used together with co-occurrence information to extract semantically meaningful term associations. These associations are asymmetric and one-to-many due to the phenomenon of polysemy. The strength of the association between words can be measured numerically. In addition, the associations implicitly define a hierarchy between descriptors. The underlying algorithm is suitable for use on large databases. The operation of the system is illustrated on several real databases.
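    The abstract does not give the modules' exact parameters, but the core Fuzzy ART operations it relies on (complement coding, the choice function, the vigilance test and the learning rule) are standard. Below is a minimal, illustrative Python sketch of a single Fuzzy ART module of the kind such a hierarchy is built from; the parameter values and function names are assumptions, not the authors' implementation.

```python
import numpy as np

def complement_code(x):
    """Complement coding: concatenate the input with its complement [x, 1 - x]."""
    return np.concatenate([x, 1.0 - x])

def fuzzy_art(inputs, rho=0.75, alpha=0.001, beta=1.0):
    """Minimal Fuzzy ART clustering (fast learning when beta = 1).

    inputs: iterable of feature vectors scaled to [0, 1].
    Returns the committed category weights and each input's category index.
    """
    weights, assignments = [], []
    for x in inputs:
        I = complement_code(np.asarray(x, dtype=float))
        # Choice function T_j = |I ^ w_j| / (alpha + |w_j|), with ^ = fuzzy AND (min)
        scores = [np.minimum(I, w).sum() / (alpha + w.sum()) for w in weights]
        chosen = None
        for j in np.argsort(scores)[::-1]:
            # Vigilance test: the match ratio must reach rho, otherwise reset and try next
            if np.minimum(I, weights[j]).sum() / I.sum() >= rho:
                chosen = j
                break
        if chosen is None:
            weights.append(I.copy())            # no category matched: commit a new one
            chosen = len(weights) - 1
        else:
            w = weights[chosen]                 # learning: w <- beta*(I ^ w) + (1 - beta)*w
            weights[chosen] = beta * np.minimum(I, w) + (1 - beta) * w
        assignments.append(chosen)
    return weights, assignments
```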

    An experiment with ontology mapping using concept similarity

    This paper describes a system for automatically mapping between concepts in different ontologies. The motivation for the research stems from the Diogene project, in which the project's own ontology covering the ICT domain is mapped to external ontologies, in order that their associated content can automatically be included in the Diogene system. An approach based on measuring the similarity of concepts is introduced, in which standard Information Retrieval indexing techniques are applied to concept descriptions. A matrix representing the similarity of concepts in two ontologies is generated, and a mapping is performed based on two parameters: the domain coverage of the ontologies and their levels of granularity. Finally, some initial experimentation is presented which suggests that our approach meets the project's unique set of requirements.
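    As a rough illustration of the indexing step, the sketch below (not the Diogene code; scikit-learn and the function names are assumptions) builds a TF-IDF index over concept descriptions and derives the concept-by-concept similarity matrix on which such a mapping can be based.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def concept_similarity_matrix(source_descriptions, target_descriptions):
    """Cosine similarity between TF-IDF vectors of concept descriptions.

    Rows index concepts of the source ontology, columns concepts of the target.
    """
    vectorizer = TfidfVectorizer(stop_words="english")
    vectors = vectorizer.fit_transform(source_descriptions + target_descriptions)
    n = len(source_descriptions)
    return cosine_similarity(vectors[:n], vectors[n:])

# Each source concept could then be mapped to its most similar target concept,
# subject to a threshold; in the paper, domain coverage and granularity further
# constrain how the matrix is turned into a mapping.
```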

    A Framework for Personalized Content Recommendations to Support Informal Learning in Massively Diverse Information Wikis

    Personalization has been shown to achieve better learning outcomes by adapting to specific learners' needs, interests, and/or preferences. Traditionally, most personalized learning software systems have focused on formal learning. However, personalization is not only desirable for formal learning; it is also required for informal learning, which is self-directed, does not follow a specified curriculum, and does not lead to formal qualifications. Among informal learning platforms, wikis, and especially Wikipedia, attract increasing attention for informal learning. In accordance with constructivist learning theory, the nature of wikis enables learners to freely navigate the learning environment and independently construct knowledge without being forced to follow a predefined learning path. Nevertheless, navigation on information wikis suffers from several limitations. To support informal learning on Wikipedia and similar environments, it is important to provide easy and fast access to relevant content. Recommendation systems (RSs) have long been used to provide useful recommendations in different technology enhanced learning (TEL) contexts. However, the massive diversity of unstructured content, as well as of the user base, on such information-oriented websites poses major challenges when designing recommendation models for these environments. In addition, evaluating TEL recommender systems for informal learning is challenging because of the inherent difficulty of measuring the impact of recommendations on informal learning in the absence of formal assessment and commonly used learning analytics. In this research, a personalized content recommendation framework (PCRF) for information wikis is proposed, together with an evaluation framework that can be used to evaluate the impact of personalized content recommendations on informal learning from wikis. The recommendation framework models learners' interests by continuously extrapolating topical navigation graphs from learners' free navigation and applying graph structural analysis algorithms to extract interesting topics for individual users. It then integrates learners' interest models with fuzzy thesauri for personalized content recommendations. Our evaluation approach encompasses two main activities. First, the impact of personalized recommendations on informal learning is evaluated by assessing conceptual knowledge in users' feedback. Second, web analytics data is analyzed to gain insight into users' progress and focus throughout the test session. Our evaluation revealed that PCRF generates highly relevant recommendations that adapt to changes in user interest, using the HARD model, with rank-based mean average precision (MAP@k) scores ranging between 86.4% and 100%. In addition, the evaluation of informal learning revealed that users who used Wikipedia with personalized support achieved higher scores on the conceptual knowledge assessment, with an average score of 14.9 compared to 10.0 for students who used the encyclopedia without any recommendations. The analysis of web analytics data shows that users who received personalized recommendations visited a larger number of relevant pages than the control group (644 vs. 226). They were also able to make use of a larger number of concepts and to make comparisons and state relations between concepts.
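    The MAP@k figures reported above are a standard rank-based metric; the sketch below shows one common way such scores are computed over per-user recommendation lists (binary relevance is assumed, and the function names are illustrative rather than taken from PCRF).

```python
def average_precision_at_k(recommended, relevant, k):
    """AP@k for one ranked recommendation list against a set of relevant items."""
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank        # precision at this rank, counted at each hit
    return score / min(len(relevant), k) if relevant else 0.0

def mean_average_precision_at_k(all_recommended, all_relevant, k=10):
    """MAP@k over all users: mean of the per-user AP@k values."""
    values = [average_precision_at_k(rec, rel, k)
              for rec, rel in zip(all_recommended, all_relevant)]
    return sum(values) / len(values)
```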

    Ontology mapping by concept similarity

    This paper presents an approach to the problem of mapping ontologies. The motivation for the research stems from the Diogene Project, which is developing a web training environment for ICT professionals. The system includes high-quality training material from registered content providers, and free web material will also be made available through the project's "Web Discovery" component. This involves using web search engines to locate relevant material and mapping the ontology at the core of the Diogene system to other ontologies that exist on the Semantic Web. The project's approach to ontology mapping is presented, and an evaluation of this method is described.

    Fuzzy Model Fragment Retrieval


    An office document retrieval system with the capability of processing incomplete and vague queries

    TEXPROS (TEXt PROcessing System) is an intelligent document processing system. The system combines filing and retrieval subsystems, which support storing, classifying, categorizing, retrieving and reproducing documents, as well as extracting, browsing, retrieving and synthesizing information from a variety of documents. This dissertation presents a retrieval system for TEXPROS that is capable of processing incomplete or vague queries and providing semantically meaningful responses to users. The design of the retrieval system integrates several mechanisms to achieve these goals. First, a system catalog, including a thesaurus, is used to store knowledge about the database. Secondly, a query transformation mechanism consists of context construction and algebraic query formulation modules. Given an incomplete query, the context construction module searches the system for the required terms and constructs a query with a complete representation; the resulting query is then formulated as an algebraic query. Thirdly, in practice the user may not have a precise notion of what he is looking for. A browsing mechanism is employed in such situations to assist the user in the retrieval process: with the browser, vague queries can be refined until the user has gathered enough information to construct a query for his request. Finally, when a query returns an empty answer, a query generalization mechanism gives the user a cooperative explanation for the empty answer. The generalizations of a failed query (i.e., one with an empty answer) are derived by applying folder and type substitutions and by weakening the search criteria in the original query. An efficient way is investigated for determining whether the empty answer is genuine and whether the original query reflects erroneous presuppositions, so that any failed query can be answered with a meaningful and cooperative response. It incorporates a methodical approach to reducing the search space of generalized subqueries by analyzing the results of the query generalization and by efficiently applying the possible substitutions in a query to generate a small subset of relevant subqueries to be evaluated.
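    To make the query generalization step concrete, the sketch below enumerates weakened subqueries of a failed query by dropping search conditions one at a time; it is a simplified illustration under assumed data structures, not TEXPROS code, and it omits the folder and type substitutions described above.

```python
from itertools import combinations

def generalize_query(conditions, max_dropped=1):
    """Yield weakened subqueries of a failed query by dropping conditions.

    conditions: list of (attribute, operator, value) triples, e.g.
                [("author", "=", "Smith"), ("year", "=", 1994)].
    Subqueries with the fewest dropped conditions are generated first.
    """
    n = len(conditions)
    for dropped in range(1, max_dropped + 1):
        for kept in combinations(range(n), n - dropped):
            yield [conditions[i] for i in kept]

# Each generalized subquery would be re-evaluated against the database; a
# non-empty answer for some subquery indicates which presupposition of the
# original query was erroneous, supporting a cooperative explanation.
```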

    Text Clumping for Technical Intelligence


    Measuring the landscape of civil war : evaluating geographic coding decisions with historic data from the Mau Mau rebellion

    This research has been supported by grants from the Air Force Office of Scientific Research (FA9550-09-1-0314) and the Department of Defense Minerva Initiative through the Office of Naval Research (N00014-14-0071). Subnational conflict research increasingly utilizes georeferenced event datasets to understand contentious politics and violence. Yet how exactly locations are mapped to particular geographies, especially from unstructured text sources such as newspaper reports and archival records, remains opaque, and few best practices exist for guiding researchers through the subtle but consequential decisions made during geolocation. We begin to address this gap by developing a systematic approach to georeferencing that articulates the strategies available, empirically diagnoses problems of bias created by both the data-generating process and researcher-controlled tasks, and provides new generalizable tools for simultaneously optimizing both the recovery and the accuracy of coordinates. We then empirically evaluate our process and tools against new microlevel data on the Mau Mau Rebellion (Colonial Kenya, 1952-1960), drawn from 20,000 pages of recently declassified British military intelligence reports. By leveraging a subset of these data that includes map codes alongside natural language location descriptions, we demonstrate how inappropriately georeferenced data can have important downstream consequences, systematically biasing coefficients or altering statistical significance, and how our tools can help alleviate these problems.
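    One ingredient of evaluating georeferencing accuracy against map-code ground truth is a positional error measure; the sketch below computes great-circle distances between geocoded points and reference coordinates (a generic haversine illustration, not the authors' tooling).

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def positional_errors_km(geocoded, ground_truth):
    """Per-event error between geocoded points and map-code coordinates."""
    return [haversine_km(g[0], g[1], t[0], t[1])
            for g, t in zip(geocoded, ground_truth)]
```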

    Understanding User Intentions in Vertical Image Search

    With the development of the Internet and Web 2.0, large volumes of multimedia content have been made available online. It is highly desirable to provide easy access to such content, i.e. efficient and precise retrieval of images that satisfy users' needs. Towards this goal, content-based image retrieval (CBIR) has been intensively studied in the research community, while text-based search is more widely adopted in industry. Both approaches have inherent disadvantages and limitations. Therefore, unlike the great success of text search, Web image search engines are still immature. In this thesis, we present iLike, a vertical image search engine which integrates both textual and visual features to improve retrieval performance. We bridge the semantic gap by capturing the meaning of each text term in the visual feature space, and re-weight visual features according to their significance to the query terms. We also bridge the user intention gap, since we are able to infer the "visual meanings" behind the textual queries. Last but not least, we provide a visual thesaurus, which is generated from the statistical similarity between the visual-space representations of textual terms. Experimental results show that our approach improves both precision and recall, compared with content-based or text-based image retrieval techniques. More importantly, search results from iLike are more consistent with users' perception of the query terms.
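    One simple way to realise a visual thesaurus of the kind described above is to represent each text term by an aggregate of the visual features of the images associated with it and to compare those representations; the sketch below uses a plain mean and cosine similarity, a simplification of iLike's term weighting rather than its actual model.

```python
import numpy as np

def term_visual_profiles(image_features, image_terms):
    """Mean visual feature vector per term (a simplified stand-in for iLike).

    image_features: dict image_id -> 1-D np.ndarray of visual features
    image_terms:    dict image_id -> iterable of associated text terms
    """
    grouped = {}
    for img, feats in image_features.items():
        for term in image_terms.get(img, ()):
            grouped.setdefault(term, []).append(feats)
    return {term: np.mean(vectors, axis=0) for term, vectors in grouped.items()}

def visual_thesaurus(profiles, top_n=5):
    """For each term, the terms whose visual profiles are most similar (cosine)."""
    terms = list(profiles)
    mat = np.stack([profiles[t] / (np.linalg.norm(profiles[t]) + 1e-12) for t in terms])
    sims = mat @ mat.T
    return {term: [terms[j] for j in np.argsort(sims[i])[::-1][1:top_n + 1]]
            for i, term in enumerate(terms)}
```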