345 research outputs found

    THE USE OF RECOMMENDER SYSTEMS IN WEB APPLICATIONS – THE TROI CASE

    Neglecting digital marketing, surveys, reviews and the analysis of online user behavior is a key reason why even strong businesses fail in the digital age, so some systems should be built on artificial intelligence techniques. In this direction, the use of data mining to recommend relevant items, a state-of-the-art technique, is increasing user satisfaction as well as business revenue, as are related information-gathering approaches that aim to make systems think and act like humans. One such system is the Recommender System, which is elaborated in this thesis: how people interact, and how to accurately calculate and identify what people like or dislike based on their previous online behavior. The thesis also covers the methodologies recommender systems use and how mathematical equations help them calculate users' behavior and similarities. Filters are important in a Recommender System: they explain whether similar users like the same product or item, that is, the probability that a neighboring user will also like it. Using collaborative filters, neighborhood filters and hybrid approaches with various algorithms, a Recommender System can predict whether a particular user would prefer an item or not, based on the user's profile and activities. The use of Recommender Systems is beneficial to both service providers and users. The thesis also covers the strengths and weaknesses of Recommender Systems and how involving ontology can improve them. Ontology-based methods can be used to reduce problems that content-based recommender systems are known to suffer from. Given Kosovo's GDP and the job prospects of its young people, improvements are desirable: demand is greater than supply. I therefore set out to build an intelligent system that makes it easier for Kosovars to find a job that suits their profile, skills, knowledge, character and location. 
That system is the TROI search engine, which indexes and merges all locally operating job-seeking websites into one platform with intelligent features. The thesis presents the design, implementation, testing and evaluation of the TROI search engine. Testing was done through user experiments in a running environment of the TROI search engine. The results show that the functionality of the recommender system is satisfactory and helpful
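
The abstract above describes predicting whether a user will like an item from the preferences of similar ("neighbor") users. A minimal sketch of that general idea follows, assuming cosine similarity and a similarity-weighted average; the abstract does not say which measures TROI uses, and the user names, job identifiers and ratings are hypothetical.

```python
from math import sqrt

# Hypothetical ratings (1-5) that users gave to job postings.
ratings = {
    "ana": {"job_a": 5, "job_b": 4},
    "ben": {"job_a": 5, "job_b": 4, "job_c": 5},
    "cal": {"job_a": 1, "job_b": 5, "job_c": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, item, k=2):
    """Similarity-weighted average of the k most similar users' ratings of item."""
    neighbours = [
        (cosine_similarity(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    # Keep only positively similar neighbours, most similar first.
    neighbours = sorted((n for n in neighbours if n[0] > 0), reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no usable neighbour has rated this item
    return sum(sim * rating for sim, rating in neighbours) / total

predicted = predict_rating(ratings, "ana", "job_c")
```

Here `predicted` leans toward the rating given by "ben", whose past ratings match those of "ana" more closely than those of "cal" do.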

    Human-competitive automatic topic indexing

    Topic indexing is the task of identifying the main topics covered by a document. These are useful for many purposes: as subject headings in libraries, as keywords in academic publications and as tags on the web. Knowing a document's topics helps people judge its relevance quickly. However, assigning topics manually is labor intensive. This thesis shows how to generate them automatically in a way that competes with human performance. Three kinds of indexing are investigated: term assignment, a task commonly performed by librarians, who select topics from a controlled vocabulary; tagging, a popular activity of web users, who choose topics freely; and a new method of keyphrase extraction, where topics are equated to Wikipedia article names. A general two-stage algorithm is introduced that first selects candidate topics and then ranks them by significance based on their properties. These properties draw on statistical, semantic, domain-specific and encyclopedic knowledge. They are combined using a machine learning algorithm that models human indexing behavior from examples. This approach is evaluated by comparing automatically generated topics to those assigned by professional indexers, and by amateurs. We claim that the algorithm is human-competitive because it chooses topics that are as consistent with those assigned by humans as their topics are with each other. The approach is generalizable, requires little training data and applies across different domains and languages
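
The two-stage structure described above (select candidate topics, then rank them by their properties) can be sketched roughly as follows. This is an illustration only: the thesis combines its properties with a machine learning algorithm, whereas this sketch substitutes a fixed, hand-weighted score over two assumed features (occurrence count and position of first occurrence). The vocabulary, text and weights are hypothetical.

```python
import re

def select_candidates(text, vocabulary):
    """Stage 1: keep vocabulary terms that occur in the text, with simple features."""
    words = re.findall(r"[a-z]+", text.lower())
    candidates = {}
    for term in vocabulary:
        term_words = term.lower().split()
        n = len(term_words)
        positions = [
            i for i in range(len(words) - n + 1) if words[i:i + n] == term_words
        ]
        if positions:
            candidates[term] = {
                "count": len(positions),
                # Relative position of the first occurrence, in [0, 1).
                "first_position": positions[0] / len(words),
            }
    return candidates

def rank_candidates(candidates, w_count=1.0, w_early=0.5):
    """Stage 2: order candidates by a weighted score (stand-in for a learned model)."""
    def score(features):
        return w_count * features["count"] + w_early * (1.0 - features["first_position"])
    return sorted(candidates, key=lambda term: score(candidates[term]), reverse=True)

text = (
    "Topic indexing assigns topics to documents. "
    "Librarians do topic indexing with a controlled vocabulary."
)
vocabulary = ["topic indexing", "controlled vocabulary", "machine learning"]
candidates = select_candidates(text, vocabulary)
ranking = rank_candidates(candidates)
```

In a learned version, the fixed weights would be replaced by a model trained on documents with human-assigned topics.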

    Using domain-specific knowledge to improve information retrieval performance

    Thesis digitized by the Direction des bibliothèques de l'Université de Montréal

    Resource discovery in heterogeneous digital content environments

    The concept of 'resource discovery' is central to our understanding of how users explore, navigate, locate and retrieve information resources. This submission for a PhD by Published Works examines a series of 11 related works which explore topics pertaining to resource discovery, each demonstrating heterogeneity in their digital discovery context. The assembled works are prefaced by nine chapters which seek to review and critically analyse the contribution of each work, as well as provide contextualization within the wider body of research literature. A series of conceptual sub-themes is used to organize and structure the works and the accompanying critical commentary. The thesis first begins by examining issues in distributed discovery contexts by studying collection level metadata (CLM), its application in 'information landscaping' techniques, and its relationship to the efficacy of federated item-level search tools. This research narrative continues but expands in the later works and commentary to consider the application of Knowledge Organization Systems (KOS), particularly within Semantic Web and machine interface contexts, with investigations of semantically aware terminology services in distributed discovery. The necessary modelling of data structures to support resource discovery - and its associated functionalities within digital libraries and repositories - is then considered within the novel context of technology-supported curriculum design repositories, where questions of human-computer interaction (HCI) are also examined. The final works studied as part of the thesis are those which investigate and evaluate the efficacy of open repositories in exposing knowledge commons to resource discovery via web search agents. Through the analysis of the collected works it is possible to identify a unifying theory of resource discovery, with the proposed concept of (meta)data alignment described and presented with a visual model. 
This analysis assists in the identification of a number of research topics worthy of further research; but it also highlights an incremental transition by the present author, from using research to inform the development of technologies designed to support or facilitate resource discovery, particularly at a 'meta' level, to the application of specific technologies to address resource discovery issues in a local context. Despite this variation the research narrative has remained focussed on topics surrounding resource discovery in heterogeneous digital content environments and is noted as having generated a coherent body of work. Separate chapters are used to consider the methodological approaches adopted in each work and the contribution made to research knowledge and professional practice

    Concept graphs: Applications to biomedical text categorization and concept extraction

    As science advances, the underlying literature grows rapidly providing valuable knowledge mines for researchers and practitioners. The text content that makes up these knowledge collections is often unstructured and, thus, extracting relevant or novel information could be nontrivial and costly. In addition, human knowledge and expertise are being transformed into structured digital information in the form of vocabulary databases and ontologies. These knowledge bases hold substantial hierarchical and semantic relationships of common domain concepts. Consequently, automating learning tasks could be reinforced with those knowledge bases through constructing human-like representations of knowledge. This allows developing algorithms that simulate the human reasoning tasks of content perception, concept identification, and classification. This study explores the representation of text documents using concept graphs that are constructed with the help of a domain ontology. In particular, the target data sets are collections of biomedical text documents, and the domain ontology is a collection of predefined biomedical concepts and relationships among them. The proposed representation preserves those relationships and allows using the structural features of graphs in text mining and learning algorithms. Those features emphasize the significance of the underlying relationship information that exists in the text content behind the interrelated topics and concepts of a text document. The experiments presented in this study include text categorization and concept extraction applied on biomedical data sets. The experimental results demonstrate how the relationships extracted from text and captured in graph structures can be used to improve the performance of the aforementioned applications. 
The discussed techniques can be used in creating and maintaining digital libraries through enhancing indexing, retrieval, and management of documents as well as in a broad range of domain-specific applications such as drug discovery, hypothesis generation, and the analysis of molecular structures in chemoinformatics

    Mining Meaning from Wikipedia

    Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This article provides a comprehensive description of this work. It focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval and information extraction; and as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed in the last few years. We provide a comprehensive list of the open-source software they have produced. Comment: An extensive survey of re-using information in Wikipedia in natural language processing, information retrieval and extraction and ontology building. Accepted for publication in International Journal of Human-Computer Studies

    Recent Trends in Software Engineering Research As Seen Through Its Publications

    This study provides some insight into the field of software engineering through analysis of its recent research publications. Data for this study are taken from the ACM's Guide to Computing Literature (GUIDE). They include both the professionally assigned Computing Classification System (CCS) descriptors and the title text of each software engineering publication reviewed by the GUIDE from 1998 through 2001. The first part of this study provides a snapshot of software engineering by applying co-word analysis techniques to the data. This snapshot indicates recent themes or areas of interest, which, when compared with the results from earlier studies, reveal current trends in software engineering. Software engineering continues to have no central focus. Concepts like software development, process improvement, applications, parallelism, and user interfaces are persistent and, thus, help define the field, but they provide little guidance for researchers or developers of academic curricula. Of more interest and use are the specific themes illuminated by this study, which provide a clearer indication of the current interests of the field. Two prominent themes are the related issues of programming-in-the-large and best practices. Programming-in-the-large is the term often applied to large-scale and long-term software development, where project and people management, code reusability, performance measures, documentation, and software maintenance issues take on special importance. These issues began emerging in earlier periods, but seem to have risen to prominence during the current period. Another important discovery is the trend in software development toward using networking and the Internet. Many network- and Internet-related descriptors were added to the CCS in 1998. The prominent appearance and immediate use of these descriptors during this period indicate that this is a real trend and not just an aberration caused by their recent addition. 
The titles of the period reflect the prominent themes and trends. In addition to corroborating the keyword analysis, the title text confirms the relevance of the CCS and its most recent revision. By revealing current themes and trends in software engineering, this study provides some guidance to the developers of academic curricula and indicates directions for further research and study

    Automatic Image Annotation using Image Clustering in Multi – Agent Society

    The rapid growth of the internet provides a tremendous resource for information in different domains (text, image, voice, and many others). This growth introduces a new challenge: finding an exact match is difficult because of the huge number of documents returned by search engines, where millions of items can be returned for a given subject. Images have been important resources for information, and billions of images are searched to fulfill user demands, which faces the same challenge. Automatic image annotation is a promising methodology for image retrieval. However, most current annotation models are not yet sophisticated enough to produce high-quality annotations. This thesis presents online intelligent indexing for image repositories based on their contents. Although content-based indexing and retrieval systems have been introduced, this thesis adds an intelligent technique to re-index images based on a better understanding of their component concepts. A collaborative agent scheme has been developed to promote the objects of an image to concepts and re-index it according to domain specifications. This thesis also presents an automatic annotation system based on the interaction between intelligent agents. Agent interaction is synonymous with the socialization behavior that dominates an agent society. The presented system exploits the knowledge evolution gained through this socialization to drive the annotation process