744 research outputs found
Website Content Extraction Using Web Structure Analysis
The Web presents itself as the largest data repository ever available in the history of
humankind. Major efforts have been made to provide efficient access to relevant
information within this huge repository of data. Although several techniques have been
developed to address the problem of Web data extraction, their use is still not widespread, mostly
because of the need for extensive human intervention and the low quality of the extraction
results. This project presents a domain-oriented approach to Web data extraction and discusses
its application to extracting news from Web sites. It will use the abstraction method to
identify important sections in a web document. Relevance information will be taken into
account and highlighted in order to produce a focused web content output. The
facts and data for the project are gathered from various sources such as
the internet and books. The methodology used is a Waterfall Model that involves several
phases: Planning, Analysis, Design and Implementation. The result of this
project is the display and review of web content extraction and of how it is currently
being developed, with the goal of giving web users greater usability and ease of
use.
Web Mining-Based Objective Metrics for Measuring Website Navigability
Web site design is critical to the success of electronic commerce and digital government. Effective design requires appropriate evaluation methods and measurement metrics. The current research examines Web site navigability, a fundamental structural aspect of Web site design. We define Web site navigability as the extent to which a visitor can use a Web site's hyperlink structure to locate target contents successfully in an easy and efficient manner. We propose a systematic Web site navigability evaluation method built on Web mining techniques. To complement the subjective self-reported metrics commonly used by previous research, we develop three objective metrics for measuring Web site navigability on the basis of the Law of Surfing. We illustrate the use of the proposed methods and measurement metrics with two large Web sites.
Web Mining for Web Personalization
Web personalization is the process of customizing a Web site to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user's navigational behavior (usage data) in correlation with other information collected in the Web context, namely, structure, content, and user profile data. Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum both in the research and commercial areas. In this article we present a survey of the use of Web mining for Web personalization. More specifically, we introduce the modules that comprise a Web personalization system, emphasizing the Web usage mining module. A review of the most common methods that are used as well as technical issues that occur is given, along with a brief overview of the most popular tools and applications available from software vendors. Moreover, the most important research initiatives in the Web usage mining and personalization areas are presented.
The use of web analytics on a small data set in an online media company: Shifter's case study
Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. The primary struggle in data analysis is the lack of talent for performing relevant, fit-to-business analyses that retrieve knowledge and provide concise and clear action plans to today's startups and small enterprises that exist online. Tracking and understanding the navigational patterns of user behavior over a three-month collection period, using an Excel spreadsheet tool, provided a context for each piece of content produced and published by Shifter, an online media company. Investigations made after acquiring Shifter's data resulted in recommendations for rethinking and redesigning the editorial content of the business to answer different communities' needs.
BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform