744 research outputs found

    Website Content Extraction Using Web Structure Analysis

    Get PDF
    The Web poses itself as the largest data repository ever available in the history of humankind. Major efforts have been made in order to provide efficient to relevant information within huge repository of data. Although several techniques have been developed to the problem of Web data extraction, their use is still not spread, mostly because of the need for high human intervention and the low quality of the extraction results. For this project a domain-oriented approach to Web data extraction and discuss it application to extracting news from Web Sites. It will use the abstraction method to identify important sections in a web document. The relevance information will be taken account and will be highlighted in order to develop a focused web content output. The fact-finding and data about the project are gathered from various sources such as internet, and books. The methodology used is a Waterfall Model that involves several phases which are Planning, Analysis, Design and Implementation. The result of this project is the display and review of web content extraction and how it being currently being developed which the goals is to give more usability and easiness toward web users

    Web Mining-Based Objective Metrics for Measuring Website Navigatability

    Get PDF
    Web site design is critical to the success of electronic commerce and digital government. Effective design requires appropriate evaluation methods and measurement metrics. The current research examines Web site navigability, a fundamental structural aspect of Web site design. We define Web site navigability as the extent to which a visitor can use a Web site’s hyperlink structure to locate target contents successfully in an easy and efficient manner. We propose a systematic Web site navigability evaluation method built on Web mining techniques. To complement the subjective self-reported metrics commonly used by previous research, we develop three objective metrics for measuring Web site navigability on the basis of the Law of Surfing. We illustrate the use of the proposed methods and measurement metrics with two large Web sites

    Website Content Extraction Using Web Structure Analysis

    Get PDF
    The Web poses itself as the largest data repository ever available in the history of humankind. Major efforts have been made in order to provide efficient to relevant information within huge repository of data. Although several techniques have been developed to the problem of Web data extraction, their use is still not spread, mostly because of the need for high human intervention and the low quality of the extraction results. For this project a domain-oriented approach to Web data extraction and discuss it application to extracting news from Web Sites. It will use the abstraction method to identify important sections in a web document. The relevance information will be taken account and will be highlighted in order to develop a focused web content output. The fact-finding and data about the project are gathered from various sources such as internet, and books. The methodology used is a Waterfall Model that involves several phases which are Planning, Analysis, Design and Implementation. The result of this project is the display and review of web content extraction and how it being currently being developed which the goals is to give more usability and easiness toward web users

    Visual Architecture based Web Information Extraction

    Full text link

    Web Mining for Web Personalization

    Get PDF
    Web personalization is the process of customizing a Web site to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user\u27s navigational behavior (usage data) in correlation with other information collected in the Web context, namely, structure, content, and user profile data. Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum both in the research and commercial areas. In this article we present a survey of the use of Web mining for Web personalization. More specifically, we introduce the modules that comprise a Web personalization system, emphasizing the Web usage mining module. A review of the most common methods that are used as well as technical issues that occur is given, along with a brief overview of the most popular tools and applications available from software vendors. Moreover, the most important research initiatives in the Web usage mining and personalization areas are presented

    The use of web analytics on a small data set in an online media company : shifter´s case study

    Get PDF
    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business IntelligenceThe primary struggle in data analysis is the lack of talent in performing relevant and fit-to-business analyzes that retrieve knowledge and provides concise and clear action plans to today’s startups and small enterprises that exist online. Tracking, knowing and understanding the navigational patterns of user behavior for a 3 month period collection and using an Excel spreadsheet tool obtained a context for each piece of content produced and published by Shifter, an online media company. Investigations made after acquiring Shifter’s data resulted in recommendations for rethink and redesign the editorial content of the business to answer different community’s needs

    BlogForever D2.6: Data Extraction Methodology

    Get PDF
    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform
    • …
    corecore