1,255 research outputs found

    Intelligent Support for Information Retrieval of Web Documents

    The main goal of this research was to investigate means of intelligent support for the retrieval of web documents. We have proposed the architecture of a web tool system, Trillian, which discovers the interests of users without their interaction and uses them for autonomous searching of related web content. Discovered pages are suggested to the user. The discovery of user interests is based on analysis of documents previously visited by the users. We have created a module for completely transparent tracking of the user's movement on the web, which logs both visited URLs and the contents of web pages. The post-analysis step is based on a variant of the suffix tree clustering algorithm. We primarily focus on the overall Trillian architecture design and the process of discovering topics of interest. We have implemented an experimental prototype of Trillian and evaluated the quality, speed and usefulness of the proposed system. We have shown that clustering is a feasible technique for extraction of interests from web documents. We consider the proposed architecture to be quite promising and suitable for future extensions.
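The idea of clustering visited documents by the phrases they share can be illustrated with a minimal sketch. This is not Trillian's implementation and it does not build a real suffix tree: it approximates the approach by indexing word n-grams, and the function names, the `max_len` phrase length and the `overlap` threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def phrases(text, max_len=3):
    """Yield every word n-gram (1..max_len words) found in the text."""
    words = text.lower().split()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def base_clusters(docs, min_docs=2, max_len=3):
    """Map each phrase shared by at least min_docs documents to their ids."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for phrase in phrases(text, max_len):
            index[phrase].add(doc_id)
    return {p: ids for p, ids in index.items() if len(ids) >= min_docs}


def merge_clusters(clusters, overlap=0.5):
    """Greedily merge base clusters whose document sets overlap strongly."""
    merged = []
    # Hypothetical scoring: longer phrases shared by more documents go first.
    ranked = sorted(clusters.items(),
                    key=lambda kv: (-len(kv[1]) * len(kv[0]), kv[0]))
    for phrase, ids in ranked:
        for group in merged:
            common = len(ids & group["docs"])
            if (common / len(ids) > overlap
                    and common / len(group["docs"]) > overlap):
                group["docs"] |= ids
                group["labels"].append(" ".join(phrase))
                break
        else:
            # No existing group overlaps enough, so start a new one.
            merged.append({"docs": set(ids), "labels": [" ".join(phrase)]})
    return merged


docs = [
    "suffix tree clustering of web pages",
    "web pages and suffix tree clustering",
    "cooking pasta recipes",
    "pasta recipes for dinner",
]
groups = merge_clusters(base_clusters(docs))
```

Each resulting group carries the shared phrases as candidate labels, which is one way a discovered "interest" could be described to the user.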

    Enhanced PL-WAP tree method for incremental mining of sequential patterns.

    Sequential mining, as a form of web usage mining, has been used in improving web site design, increasing the volume of e-business and providing marketing decision support. This thesis proposes the PL4UP and EPL4UP algorithms, which use the PLWAP tree structure to incrementally update sequential patterns. PL4UP does not scan the old database except when previously small 1-itemsets become large in the updated database, at which point it scans only those transactions in the old database that contain any small itemsets. EPL4UP rebuilds the old PLWAP tree once, using only the list of previously small itemsets, rather than scanning the entire old database twice like the original PLWAP. PL4UP and EPL4UP first update old frequent patterns on the small PLWAP tree built for only the incremented part of the database, then they compare newly added patterns generated from the small tree with the old frequent patterns to reduce the number of patterns to be checked on the old PLWAP tree. (Abstract shortened by UMI.) Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .C47. Source: Masters Abstracts International, Volume: 42-03, page: 0959. Adviser: Christie Ezeife. Thesis (M.Sc.)--University of Windsor (Canada), 2003.
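The condition under which the old database has to be revisited can be sketched for 1-itemsets only. This is a minimal illustration and not PL4UP or EPL4UP: no PLWAP tree is built, and it assumes (hypothetically) that support counts for small items are stored alongside those of large items. All function and variable names are made up for the example.

```python
from collections import Counter


def item_counts(sequences):
    """Count, for each item, the number of sequences that contain it."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(seq))
    return counts


def incremental_update(old_counts, old_size, increment, min_support):
    """Update item supports from stored counts plus the increment only.

    Returns the new counts, the new database size, the items that are
    large in the updated database, and the items promoted from small to
    large. Only promoted items would require revisiting old sequences.
    """
    new_counts = old_counts + item_counts(increment)
    new_size = old_size + len(increment)
    old_large = {i for i, c in old_counts.items()
                 if c >= min_support * old_size}
    large = {i for i, c in new_counts.items()
             if c >= min_support * new_size}
    promoted = large - old_large
    return new_counts, new_size, large, promoted


old_db = [["a", "b"], ["a", "c"], ["a", "b"], ["b", "d"]]
increment = [["c", "a"], ["c", "d"], ["c"], ["c", "b"]]
counts, size, large, promoted = incremental_update(
    item_counts(old_db), len(old_db), increment, min_support=0.5)
```

When `promoted` is empty, nothing in the old database needs to be rescanned in this simplified setting; otherwise only the old sequences containing promoted items are of interest.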

    Mobile Search Engine using Clustering and Query Expansion

    Internet content is growing exponentially and searching for useful content is a tedious task that we all deal with today. Mobile phones' lack of screen space and limited interaction methods make the traditional search engine interface very inefficient. As the use of mobile internet continues to grow, there is a need for an effective search tool. I have created a mobile search engine that uses clustering and query expansion to find relevant web pages efficiently. Clustering organizes web pages into groups that reflect different components of a query topic. Users can ignore clusters that they find irrelevant, so they are not forced to sift through a long list of off-topic web pages. Query expansion uses query results, dictionaries, and cluster labels to formulate additional terms to manipulate the original query. The new manipulated query gives a more in-depth result that eliminates noise. I believe that these two techniques are effective and can be combined to make the ultimate mobile search engine.
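One way cluster labels could feed query expansion is sketched below. It is an assumption-laden illustration rather than the author's method: the function name, the `max_terms` cap and the rule of simply appending unseen label terms are all hypothetical, and dictionary-based expansion is left out.

```python
def expand_query(query, cluster_labels, max_terms=3):
    """Append up to max_terms new terms taken from the chosen cluster labels."""
    seen = set(query.lower().split())
    extra = []
    for label in cluster_labels:
        for term in label.lower().split():
            if term not in seen:
                # Track terms so each one is added at most once.
                seen.add(term)
                extra.append(term)
    return " ".join(query.split() + extra[:max_terms])


expanded = expand_query("jaguar", ["jaguar cars", "big cat habitat"], max_terms=2)
```

Passing only the labels of clusters the user has kept, and not those they ignored, would steer the expanded query toward the intended sense of an ambiguous term.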

    Bidirectional Growth based Mining and Cyclic Behaviour Analysis of Web Sequential Patterns

    Web sequential patterns are important for analyzing and understanding users' behaviour to improve the quality of service offered by the World Wide Web. Web prefetching is one such technique that utilizes prefetching rules derived through Cyclic Model Analysis of the mined Web sequential patterns. Prediction is more accurate and the results of prefetching are more satisfying if we use a highly efficient and scalable mining technique such as the Bidirectional Growth based Directed Acyclic Graph. In this paper, we propose a novel algorithm called Bidirectional Growth based mining Cyclic behavior Analysis of web sequential Patterns (BGCAP) that effectively combines these strategies to generate prefetching rules in the form of 2-sequence patterns with Periodicity and threshold of Cyclic Behaviour that can be utilized to effectively prefetch Web pages, thus reducing the user-perceived latency. As BGCAP is based on Bidirectional pattern growth, it performs only (log n+1) levels of recursion for mining n Web sequential patterns. Our experimental results show that generating prefetching rules using BGCAP is 5-10 percent faster for different data sizes and 10-15 percent faster for a fixed data size than TD-Mine. In addition, BGCAP generates about 5-15 percent more prefetching rules than TD-Mine. Comment: 19 pages
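What a prefetching rule in the form of a 2-sequence pattern might look like can be shown with a small sketch. This is not BGCAP: it uses plain counting of consecutive page pairs, without bidirectional growth, periodicity or cyclic-behaviour thresholds, and the function name, `min_support` and `min_confidence` are assumptions chosen for the example.

```python
from collections import Counter


def two_sequence_rules(sessions, min_support=2, min_confidence=0.5):
    """Derive 'after page a, prefetch page b' rules from consecutive visits."""
    pair_counts = Counter()
    page_counts = Counter()
    for session in sessions:
        # Count each page and each consecutive pair once per session.
        page_counts.update(set(session))
        pair_counts.update(set(zip(session, session[1:])))
    rules = []
    for (a, b), support in pair_counts.items():
        confidence = support / page_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


sessions = [
    ["home", "news", "sports"],
    ["home", "news"],
    ["home", "shop"],
    ["news", "sports"],
]
rules = two_sequence_rules(sessions)
```

A prefetcher could look up the page a user is currently on and request the `b` page of the highest-confidence matching rule in the background.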

    Optimized Model of Recommendation System for E-Commerce Website

    The purpose of this work is to optimize the recommendation system by creating a new model of recommender system with different services in a global e-commerce website. In this model, the most effective data sources are integrated to increase the accuracy of the recommendation system, which provides the client with a more intuitive interface for browsing categories. The sources used for this model are the user's search log on the global website, referral data extracted from search engines, the most-clicked URLs, highly rated items, and the recommendation algorithms for new users and new items. In addition, the model uses the user's interests based on location and the hot-release items recommended by the admin or shop owner of the e-commerce website according to the website's marketing strategy. When users browse the website, the data sources are automatically combined to incorporate the derived structure and associated items for each category into a new browsing recommendation interface. This model helps users discover items of real interest with flexibility and high efficiency; it also provides solutions for some serious problems and challenges that exist in current recommendation services. Data mining technology and clustering algorithms have been proposed and applied to realize the idea of this work. ASP.NET is the implementation tool for the application website, and Microsoft SQL Server is used for database management.

    Adaptive content mapping for internet navigation

    The Internet, as the biggest human library ever assembled, keeps on growing. Although all kinds of information carriers (e.g. audio/video/hybrid file formats) are available, text-based documents dominate. It is estimated that about 80% of all information worldwide stored electronically exists in (or can be converted into) text form. More and more, all kinds of documents are generated by means of a text processing system and are therefore available electronically. Nowadays, many printed journals are also published online and may even cease to appear in print form tomorrow. This development has many convincing advantages: the documents are available both faster (cf. prepress services) and cheaper, they can be searched more easily, the physical storage needs only a fraction of the space previously necessary and the medium will not age. For most people, fast and easy access is the most interesting feature of the new age; computer-aided search for specific documents or Web pages becomes the basic tool for information-oriented work. But this tool has problems. The current keyword-based search engines available on the Internet are not really appropriate for such a task; either (way) too many documents matching the specified keywords are presented or none at all. The problem lies in the fact that it is often very difficult to choose appropriate terms describing the desired topic in the first place. This contribution discusses the current state-of-the-art techniques in content-based searching (along with common visualization/browsing approaches) and proposes a particular adaptive solution for intuitive Internet document navigation, which not only enables the user to provide full texts instead of manually selected keywords (if available), but also allows him/her to explore the whole database.