
    Improving web search by categorization, clustering, and personalization

    This research combines Web snippet categorization, clustering, and personalization techniques to recommend relevant results to users. RIB (Recommender Intelligent Browser), which categorizes Web snippets using a socially constructed Web directory such as the Open Directory Project (ODP), is to be developed. By comparing the similarities between the semantics of each ODP category, represented by its category-documents, and the Web snippets, the Web snippets are organized into a hierarchy. Meanwhile, the Web snippets are clustered to boost the quality of the categorization. Based on an automatically formed user profile which takes into consideration desktop computer information and concept drift, the proposed search strategy recommends relevant search results to users. This research also intends to verify text categorization, clustering, and feature selection algorithms in a context where only Web snippets are available.
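    A minimal sketch of the general idea of assigning a snippet to the ODP category whose category-document it most resembles, here using cosine similarity over raw term counts. The category names, example texts, and similarity measure are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: assign a Web snippet to the ODP category whose
# category-document is most similar to it (cosine similarity over term counts).
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def categorize(snippet: str, category_docs: dict[str, str]) -> str:
    snippet_vec = Counter(snippet.lower().split())
    scores = {cat: cosine(snippet_vec, Counter(doc.lower().split()))
              for cat, doc in category_docs.items()}
    return max(scores, key=scores.get)

# Toy category-documents standing in for ODP category descriptions.
category_docs = {
    "Computers/Internet": "web browser search engine internet online site",
    "Science/Biology": "cell gene organism species evolution biology",
}
print(categorize("a faster web search engine for online browsing", category_docs))
```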

    Optimized Model of Recommendation System for E-Commerce Website

    The purpose of this work is to optimize the recommendation system by creating a new model of recommender system that combines different services in a global e-commerce website. In this model the most effective data sources are integrated to increase the accuracy of the recommendation system, which provides the client with a more intuitive category-browsing interface. The sources used in this model are the user's search log on the global website, data extracted from search engines, the most-clicked URLs, highly rated items, and the recommendation algorithms for new users and new items, in addition to the user's location-based interests and the hot-release items recommended by the admin or shop owner of the e-commerce website according to the website's marketing strategy. When users browse the website, these data sources are automatically combined to incorporate the derived structure and associate items with each category in a new browsing recommendation interface. The advantages of this model are that it helps users discover the items they are really interested in with flexibility and high efficiency, and that it offers solutions to several serious problems and challenges in current recommendation services. Data mining technology and clustering algorithms have been proposed and applied to realize the idea of this work. ASP.NET is the implementation tool for the application website, and Microsoft SQL Server is used for database management.
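    One simple way such evidence sources could be merged into a single ranked list is a weighted sum of per-source item scores. The source names, weights, and scoring below are assumptions made for illustration; the abstract does not specify a concrete combination scheme.

```python
# Hypothetical sketch: merge several evidence sources (search logs, clicked
# URLs, admin picks, ...) into one ranked item list via a weighted score sum.
from collections import defaultdict

def merge_sources(sources: dict[str, dict[str, float]],
                  weights: dict[str, float]) -> list[tuple[str, float]]:
    scores = defaultdict(float)
    for name, item_scores in sources.items():
        w = weights.get(name, 1.0)
        for item, s in item_scores.items():
            scores[item] += w * s
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

sources = {
    "search_log": {"phone_case": 0.6, "usb_cable": 0.3},
    "clicked_urls": {"phone_case": 0.4, "headphones": 0.7},
    "admin_picks": {"headphones": 1.0},
}
weights = {"search_log": 1.0, "clicked_urls": 0.8, "admin_picks": 0.5}
print(merge_sources(sources, weights))
```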

    On Two Web IR Boosting Tools: Clustering and Ranking

    This thesis investigates several research problems which arise in modern Web Information Retrieval (WebIR). The Holy Grail of modern WebIR is to find a way to organize and rank results so that the most "relevant" ones come first. The first breakthrough technique was the exploitation of the link structure of the Web graph in order to rank the result pages, using the well-known HITS and PageRank algorithms. These link-analysis approaches have been improved and extended, but they still seem insufficient to provide a satisfying search experience. In a number of situations a flat list of search results is not enough, and users might desire to have search results grouped on-the-fly into folders of similar topics. In addition, the folders should be annotated with meaningful labels for rapid identification of the desired group of results. In other situations, users may have different search goals even when they express them with the same query. In this case the search results should be personalized according to the users' on-line activities. In order to address this need, we discuss the algorithmic ideas behind SnakeT, a hierarchical clustering meta-search engine which personalizes searches according to the clusters selected by users on-the-fly. There are also situations where users might desire to access fresh information. In these cases, traditional link analysis may not be suitable, since there may not have been enough time for many links to point to a recently produced piece of information. In order to address this need, we discuss the algorithmic and numerical ideas behind a new ranking algorithm suited to fresh types of information, such as news articles or blogs. When link analysis suffices to produce good-quality search results, the huge amount of Web information calls for fast ranking methodologies. We discuss numerical methodologies for accelerating the eigenvector-like computation commonly used by link analysis. An important result of this thesis is showing how to address the above predominant issues of Web Information Retrieval by using clustering and ranking methodologies. We demonstrate that clustering and ranking have a mutual-reinforcement property which has not yet been studied intensively, and that this property can be exploited to boost the precision of both methodologies.
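    For context, the eigenvector-like computation mentioned above is typically carried out by power iteration, as in PageRank. The sketch below is a generic, illustrative power-iteration implementation on a toy graph; the damping factor and the graph are assumptions and do not reflect the thesis's accelerated methods.

```python
# Illustrative power iteration for a PageRank-style link-analysis ranking.
def pagerank(links: dict[str, list[str]], d: float = 0.85, iters: int = 50) -> dict[str, float]:
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - d) / n for p in pages}
        for p, outs in links.items():
            if not outs:                      # dangling node: spread rank evenly
                for q in pages:
                    new[q] += d * rank[p] / n
            else:
                for q in outs:
                    new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(toy_graph))
```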

    Cluster Generation and Cluster Labelling for Web Snippets: A Fast and Accurate Hierarchical Solution

    This paper describes Armil, a meta-search engine that groups into disjoint labelled clusters the Web snippets returned by auxiliary search engines. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted 'external' metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labelling algorithms. On a standard 1GHz machine, Armil performs clustering and labelling altogether in less than one second.
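    The furthest-point-first heuristic mentioned above (Gonzalez's algorithm for metric k-center) repeatedly adds as a new center the point furthest from its nearest existing center, then assigns each point to its closest center. The sketch below uses Euclidean distance and toy points purely for illustration; Armil's own distance over snippets is not reproduced here.

```python
# Sketch of the furthest-point-first (Gonzalez) heuristic for metric k-center.
from math import dist

def fpf_k_center(points: list[tuple[float, float]], k: int) -> dict[int, list[tuple[float, float]]]:
    centers = [points[0]]                         # arbitrary first center
    while len(centers) < k:
        # next center: the point furthest from its nearest existing center
        next_center = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(next_center)
    clusters = {i: [] for i in range(len(centers))}
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(fpf_k_center(pts, 3))
```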

    An application of the FIS-CRM model to the FISS metasearcher: Using fuzzy synonymy and fuzzy generality for representing concepts in documents

    The main objective of this work is to improve the quality of the results produced by Internet search engines. In order to achieve it, the FIS-CRM model (Fuzzy Interrelations and Synonymy based Concept Representation Model) is proposed as a mechanism for representing the concepts (not only the terms) contained in any kind of document. This model, based on the vector space model, incorporates a fuzzy readjustment process of the term weights of each document. The readjustment relies on the study of two types of fuzzy interrelations between terms: the fuzzy synonymy interrelation and the fuzzy generality interrelations (“broader than” and “narrower than”). The model has been implemented in the FISS metasearcher (Fuzzy Interrelations and Synonymy based Searcher) which, using a soft-clustering algorithm (based on the SISC algorithm), dynamically produces a hierarchical structure of groups of “conceptually related” documents (snippets of web pages, in this case).
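    A hedged sketch of the general idea of weight readjustment with fuzzy interrelations: part of a term's weight is propagated to related terms in proportion to the interrelation degree. The propagation rule, terms, and degrees below are assumptions for illustration; FIS-CRM defines its own readjustment formulas.

```python
# Hypothetical sketch: propagate vector-space term weights along fuzzy
# synonymy and generality interrelations, scaled by the interrelation degree.
def readjust(weights: dict[str, float],
             interrelations: dict[tuple[str, str], float]) -> dict[str, float]:
    adjusted = dict(weights)
    for (src, dst), degree in interrelations.items():
        if src in weights:
            adjusted[dst] = adjusted.get(dst, 0.0) + degree * weights[src]
    return adjusted

doc_weights = {"car": 0.8, "engine": 0.3}
# fuzzy synonymy ("car" ~ "automobile") and generality ("car" narrower than "vehicle")
relations = {("car", "automobile"): 0.9, ("car", "vehicle"): 0.5}
print(readjust(doc_weights, relations))
```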

    An Efficient Web Page Recommendation Based on Preference Footprint to Browsed Pages

    This paper proposes a new scheme for web page recommendation which reflects each user's preference in the recommended pages in an efficient and effective manner. The basic idea of the scheme is to combine the notion of a preference footprint over browsed pages with collaborative filtering. More concretely, we introduce the notion of "tags", similar to a conventional SBS (Social Bookmark Service), and attach all tags associated with a user to a page when that user browses it. We implemented a prototype of the proposed scheme and conducted preliminary experiments to evaluate its performance. The results of the experiments indicate that it takes less than 0.5 sec to reorder a list of 500 URLs received from a search engine according to the preference of users.
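    As a rough illustration of the reordering step, one could score each returned URL by how strongly its tags overlap with the tags in the user's preference footprint and sort on that score. The data structures and scoring rule below are assumptions, not the paper's actual scheme.

```python
# Hypothetical sketch: reorder search results by tag overlap with user tags.
def reorder(urls: list[str],
            page_tags: dict[str, set[str]],
            user_tags: set[str]) -> list[str]:
    def score(url: str) -> int:
        return len(page_tags.get(url, set()) & user_tags)
    return sorted(urls, key=score, reverse=True)

results = ["http://a.example", "http://b.example", "http://c.example"]
tags = {
    "http://a.example": {"python", "tutorial"},
    "http://b.example": {"cooking"},
    "http://c.example": {"python", "web", "search"},
}
print(reorder(results, tags, {"python", "search"}))
```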

    Colombus: providing personalized recommendations for drifting user interests

    The query formulation process is often a problematic activity due to the cognitive load that it imposes on users. This issue is further amplified by the uncertainty of searchers with regard to their searching needs and their lack of training in effective searching techniques. Also, given the tremendous growth of the world wide web, the amount of information users find during their daily search episodes is often overwhelming. Unfortunately, web search engines do not follow the trends and advancements in this area, and real personalization features have yet to appear. As a result, keeping up to date with recent information about our personal interests is a time-consuming task. Moreover, these information requirements often change by sliding into new topics, and the rate of change can be sudden and abrupt, or more gradual. Taking into account all these aspects, we believe that an information assistant, a profile-aware tool capable of adapting to users' evolving needs and helping them keep track of their personal data, can greatly help them in this endeavor. Information gathering from a combination of explicit and implicit feedback could allow such systems to detect users' search requirements and present additional information, with the least possible effort from them. In this paper, we describe the design, development, and evaluation of Colombus, a system aiming to meet the individual needs of searchers. The system's goal is to pro-actively fetch and present relevant, high-quality documents on a regular basis. Based entirely on implicit feedback gathering, our system concentrates on detecting drifts in user interests and accommodating them effectively in user profiles with no additional interaction from the users' side. Current methodologies in information retrieval do not support the evaluation of such systems and techniques. Lab-based experiments can be carried out in large batches, but their accuracy is often questioned. On the other hand, user studies are much more accurate, but setting up a user base for large-scale experiments is often not feasible. We have designed a hybrid evaluation methodology that combines large sets of lab experiments based on searcher simulations with user experiments in which fifteen searchers used the system regularly for 15 days. At the first stage, the simulation experiments aimed at tuning Colombus, while the component evaluation and results gathering were carried out at the second stage, throughout the user study. A baseline system was also employed in order to make a direct comparison of Colombus against a current web search engine. The evaluation results illustrate that the Personalized Information Assistant is effective in capturing and satisfying users' evolving information needs and providing additional information on their behalf.
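    One common way to let an implicit-feedback profile follow drifting interests is to decay older term weights so that recent activity dominates. The class below is only a sketch under that assumption; the decay factor and profile structure are illustrative and are not described at this level of detail in the abstract.

```python
# Illustrative drift-aware user profile: term weights from implicit feedback
# decay over time, so recently observed interests outweigh older ones.
class DriftingProfile:
    def __init__(self, decay: float = 0.9):
        self.decay = decay
        self.weights: dict[str, float] = {}

    def observe(self, terms: list[str]) -> None:
        # age existing interests, then reinforce the terms just seen
        for t in self.weights:
            self.weights[t] *= self.decay
        for t in terms:
            self.weights[t] = self.weights.get(t, 0.0) + 1.0

    def top_interests(self, n: int = 5) -> list[str]:
        return sorted(self.weights, key=self.weights.get, reverse=True)[:n]

profile = DriftingProfile()
profile.observe(["jazz", "vinyl"])
profile.observe(["machine", "learning", "python"])
print(profile.top_interests())
```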