345 research outputs found
THE USE OF RECOMMENDER SYSTEMS IN WEB APPLICATIONS – THE TROI CASE
Avoiding digital marketing, surveys, reviews and online users behavior approaches on digital age are the key elements for a powerful businesses to fail, there are some systems that should preceded some artificial intelligence techniques. In this direction, the use of data mining for recommending relevant items as a new state of the art technique is increasing user satisfaction as well as the business revenues. And other related information gathering approaches in order to our systems thing and acts like humans. To do so there is a Recommender System that will be elaborated in this thesis. How people interact, how to calculate accurately and identify what people like or dislike based on their online previous behaviors. The thesis includes also the methodologies recommender system uses, how math equations helps Recommender Systems to calculate user’s behavior and similarities. The filters are important on Recommender System, explaining if similar users like the same product or item, which is the probability of neighbor user to like also. Here comes collaborative filters, neighborhood filters, hybrid recommender system with the use of various algorithms the Recommender Systems has the ability to predict whether a particular user would prefer an item or not, based on the user’s profile and their activities. The use of Recommender Systems are beneficial to both service providers and users. Thesis cover also the strength and weaknesses of Recommender Systems and how involving Ontology can improve it. Ontology-based methods can be used to reduce problems that content-based recommender systems are known to suffer from. Based on Kosovar’s GDP and youngsters job perspectives are desirable for improvements, the demand is greater than the offer. I thought of building an intelligence system that will be making easier for Kosovars to find the appropriate job that suits their profile, skills, knowledge, character and locations. And that system is called TROI Search engine that indexes and merge all local operating job seeking websites in one platform with intelligence features. Thesis will present the design, implementation, testing and evaluation of a TROI search engine. Testing is done by getting user experiments while using running environment of TROI search engine. Results show that the functionality of the recommender system is satisfactory and helpful
Human-competitive automatic topic indexing
Topic indexing is the task of identifying the main topics covered by a document. These are useful for many purposes: as subject headings in libraries, as keywords in academic publications and as tags on the web. Knowing a document's topics helps people judge its relevance quickly. However, assigning topics manually is labor intensive. This thesis shows how to generate them automatically in a way that competes with human performance.
Three kinds of indexing are investigated: term assignment, a task commonly performed by librarians, who select topics from a controlled vocabulary; tagging, a popular activity of web users, who choose topics freely; and a new method of keyphrase extraction, where topics are equated to Wikipedia article names. A general two-stage algorithm is introduced that first selects candidate topics and then ranks them by significance based on their properties. These properties draw on statistical, semantic, domain-specific and encyclopedic knowledge. They are combined using a machine learning algorithm that models human indexing behavior from examples.
This approach is evaluated by comparing automatically generated topics to those assigned by professional indexers, and by amateurs. We claim that the algorithm is human-competitive because it chooses topics that are as consistent with those assigned by humans as their topics are with each other. The approach is generalizable, requires little training data and applies across different domains and languages
Recommended from our members
Fast embedding for image classification & retrieval and its application to the hostel industry
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonContent-based image classification and retrieval are the automatic processes of taking
an unseen image input and extracting its features representing the input image. Then,
for the classification task, this mathematically measured input is categorized according
to established criteria in the server and consequently shows the output as a result. On
the other hand, for the retrieval task, the extracted features of an unseen query image
are sent to the server to search for the most visually similar images to a given image
and retrieve these images as a result. Despite image features could be represented
by classical features, artificial intelligence-based features, Convolutional Neural
Networks (CNN) to be precise, have become powerful tools in the field. Nonetheless,
the high dimensional CNN features have been a challenge in particular for applications
on mobile or Internet of Things devices. Therefore, in this thesis, several fast
embeddings are explored and proposed to overcome the constraints of low memory,
bandwidth, and power. Furthermore, the first hostel image database is created with
three datasets, hostel image dataset containing 13,908 interior and exterior images of
hostels across the world, and Hostels-900 dataset and Hostels-2K dataset containing
972 images and 2,380 images, respectively, of 20 London hostel buildings. The results
demonstrate that the proposed fast embeddings such as the application of GHM-Rand
operator, GHM-Fix operator, and binary feature vectors are able to outperform or give
competitive results to those state-of-the-art methods with a lot less computational
resource. Additionally, the findings from a ten-year literature review of CBIR study in
the tourism industry could picturize the relevant research activities in the past decade
which are not only beneficial to the hostel industry or tourism sector but also to the
computer science and engineering research communities for the potential real-life
applications of the existing and developing technologies in the field
Using domain-specific knowledge to improve information retrieval performance
Mémoire numérisé par la Direction des bibliothèques de l'Université de Montréal
Resource discovery in heterogeneous digital content environments
The concept of 'resource discovery' is central to our understanding of how users explore, navigate, locate and retrieve information resources. This submission for a PhD by Published Works examines a series of 11 related works which explore topics pertaining to resource discovery, each demonstrating heterogeneity in their digital discovery context. The assembled works are prefaced by nine chapters which seek to review and critically analyse the contribution of each work, as well as provide contextualization within the wider body of research literature. A series of conceptual sub-themes is used to organize and structure the works and the accompanying critical commentary. The thesis first begins by examining issues in distributed discovery contexts by studying collection level metadata (CLM), its application in 'information landscaping' techniques, and its relationship to the efficacy of federated item-level search tools. This research narrative continues but expands in the later works and commentary to consider the application of Knowledge Organization Systems (KOS), particularly within Semantic Web and machine interface contexts, with investigations of semantically aware terminology services in distributed discovery. The necessary modelling of data structures to support resource discovery - and its associated functionalities within digital libraries and repositories - is then considered within the novel context of technology-supported curriculum design repositories, where questions of human-computer interaction (HCI) are also examined. The final works studied as part of the thesis are those which investigate and evaluate the efficacy of open repositories in exposing knowledge commons to resource discovery via web search agents. Through the analysis of the collected works it is possible to identify a unifying theory of resource discovery, with the proposed concept of (meta)data alignment described and presented with a visual model. This analysis assists in the identification of a number of research topics worthy of further research; but it also highlights an incremental transition by the present author, from using research to inform the development of technologies designed to support or facilitate resource discovery, particularly at a 'meta' level, to the application of specific technologies to address resource discovery issues in a local context. Despite this variation the research narrative has remained focussed on topics surrounding resource discovery in heterogeneous digital content environments and is noted as having generated a coherent body of work. Separate chapters are used to consider the methodological approaches adopted in each work and the contribution made to research knowledge and professional practice.The concept of 'resource discovery' is central to our understanding of how users explore, navigate, locate and retrieve information resources. This submission for a PhD by Published Works examines a series of 11 related works which explore topics pertaining to resource discovery, each demonstrating heterogeneity in their digital discovery context. The assembled works are prefaced by nine chapters which seek to review and critically analyse the contribution of each work, as well as provide contextualization within the wider body of research literature. A series of conceptual sub-themes is used to organize and structure the works and the accompanying critical commentary. The thesis first begins by examining issues in distributed discovery contexts by studying collection level metadata (CLM), its application in 'information landscaping' techniques, and its relationship to the efficacy of federated item-level search tools. This research narrative continues but expands in the later works and commentary to consider the application of Knowledge Organization Systems (KOS), particularly within Semantic Web and machine interface contexts, with investigations of semantically aware terminology services in distributed discovery. The necessary modelling of data structures to support resource discovery - and its associated functionalities within digital libraries and repositories - is then considered within the novel context of technology-supported curriculum design repositories, where questions of human-computer interaction (HCI) are also examined. The final works studied as part of the thesis are those which investigate and evaluate the efficacy of open repositories in exposing knowledge commons to resource discovery via web search agents. Through the analysis of the collected works it is possible to identify a unifying theory of resource discovery, with the proposed concept of (meta)data alignment described and presented with a visual model. This analysis assists in the identification of a number of research topics worthy of further research; but it also highlights an incremental transition by the present author, from using research to inform the development of technologies designed to support or facilitate resource discovery, particularly at a 'meta' level, to the application of specific technologies to address resource discovery issues in a local context. Despite this variation the research narrative has remained focussed on topics surrounding resource discovery in heterogeneous digital content environments and is noted as having generated a coherent body of work. Separate chapters are used to consider the methodological approaches adopted in each work and the contribution made to research knowledge and professional practice
Concept graphs: Applications to biomedical text categorization and concept extraction
As science advances, the underlying literature grows rapidly providing valuable knowledge mines for researchers and practitioners. The text content that makes up these knowledge collections is often unstructured and, thus, extracting relevant or novel information could be nontrivial and costly. In addition, human knowledge and expertise are being transformed into structured digital information in the form of vocabulary databases and ontologies. These knowledge bases hold substantial hierarchical and semantic relationships of common domain concepts. Consequently, automating learning tasks could be reinforced with those knowledge bases through constructing human-like representations of knowledge. This allows developing algorithms that simulate the human reasoning tasks of content perception, concept identification, and classification.
This study explores the representation of text documents using concept graphs that are constructed with the help of a domain ontology. In particular, the target data sets are collections of biomedical text documents, and the domain ontology is a collection of predefined biomedical concepts and relationships among them. The proposed representation preserves those relationships and allows using the structural features of graphs in text mining and learning algorithms. Those features emphasize the significance of the underlying relationship information that exists in the text content behind the interrelated topics and concepts of a text document. The experiments presented in this study include text categorization and concept extraction applied on biomedical data sets. The experimental results demonstrate how the relationships extracted from text and captured in graph structures can be used to improve the performance of the aforementioned applications. The discussed techniques can be used in creating and maintaining digital libraries through enhancing indexing, retrieval, and management of documents as well as in a broad range of domain-specific applications such as drug discovery, hypothesis generation, and the analysis of molecular structures in chemoinformatics
Mining Meaning from Wikipedia
Wikipedia is a goldmine of information; not just for its many readers, but
also for the growing community of researchers who recognize it as a resource of
exceptional scale and utility. It represents a vast investment of manual effort
and judgment: a huge, constantly evolving tapestry of concepts and relations
that is being applied to a host of tasks.
This article provides a comprehensive description of this work. It focuses on
research that extracts and makes use of the concepts, relations, facts and
descriptions found in Wikipedia, and organizes the work into four broad
categories: applying Wikipedia to natural language processing; using it to
facilitate information retrieval and information extraction; and as a resource
for ontology building. The article addresses how Wikipedia is being used as is,
how it is being improved and adapted, and how it is being combined with other
structures to create entirely new resources. We identify the research groups
and individuals involved, and how their work has developed in the last few
years. We provide a comprehensive list of the open-source software they have
produced.Comment: An extensive survey of re-using information in Wikipedia in natural
language processing, information retrieval and extraction and ontology
building. Accepted for publication in International Journal of Human-Computer
Studie
Recent Trends in Software Engineering Research As Seen Through Its Publications
This study provides some insight into the field of software engineering through analysis of its recent research publications. Data for this study are taken from the ACM\u27s Guide to Computing Literature (GUIDE) They include both the professionally assigned Computing Classification System (CCS) descriptors and the title text of each software engineering publication reviewed by the GUIDE from 1998 through 2001.
The first part of this study provides a snapshot of software engineering by applying co-word analysis techniques to the data. This snapshot indicates recent themes or areas of interest, which, when compared with the results from earlier studies, reveal current trends in software engineering.
Software engineering continues to have no central focus. Concepts like software development, process improvement, applications, parallelism, and user interfaces are persistent and, thus, help define the field, but they provide little guidance for researchers or developers of academic curricula.
Of more interest and use are the specific themes illuminated by this study, which provide a clearer indication of the current interests of the field. Two prominent themes are the related issues of programming-in-the-large and best practices.
Programming-in-the-large is the term often applied to large-scale and long-term software development, where project and people management, code reusability, performance measures, documentation, and software maintenance issues take on special importance. These issues began emerging in earlier periods, but seem to have risen to prominence during the current period.
Another important discovery is the trend in software development toward using networking and the Internet. Many network- and Internet-related descriptors were added to the CCS in 1998. The prominent appearance and immediate use of these descriptors during this period indicate that this is a real trend and not just an aberration caused by their recent addition.
The titles of the period reflect the prominent themes and trends. In addition to corroborating the keyword analysis, the title text confirms the relevance of the CCS and its most recent revision.
By revealing current themes and trends in software engineering, this study provides some guidance to the developers of academic curricula and indicates directions for further research and study
Automatic Image Annotation using Image Clustering in Multi – Agent Society
The rapid growth of the internet provides tremendous resource for
information in different domains (text, image, voice, and many others). This
growth introduces new challenge to hit an exact match due to huge number
of document returned by search engines where millions of items can be
returned for certain subject. Images have been important resources for
information, and billions of images are searched to fulfill user demands,
which face the mentioned challenge. Automatic image annotation is a
promising methodology for image retrieval. However most current
annotation models are not yet sophisticated enough to produce high quality
annotations. This thesis presents online intelligent indexing for image
repositories based on their contents, although content based indexing and
retrieving systems have been introduced, this thesis is adding an intelligent
technique to re-index images upon better understanding for its composed
concepts. Collaborative Agent scheme has been developed to promote
objects of an image to concepts and re-index it according to domain
specifications. Also this thesis presents automatic annotation system based
on the interaction between intelligent agents. Agent interaction is synonym
to socialization behavior dominating Agent society. The presented system is
exploiting knowledge evolution revenue due to the socialization to charge up
the annotation process
- …