    Online News Headline Extraction

    This paper presents an online news headline extraction application. According to research, online news has grown 11% year over year, and users are now overwhelmed by the volume of content on the internet. Online news pages are also hard to read because they are cluttered with advertisements and other material unrelated to the news itself. This paper proposes an EHeadlines News Extraction Framework that presents the information extracted from the news. The project covers only news available in local online English-language newspapers and, for the time being, aims to extract the news headlines first. The project concludes with an application that presents the extracted information.

    A teachable semi-automatic web information extraction system based on evolved regular expression patterns

    This thesis explores Web Information Extraction (WIE) and how it has been used in decision making and to support businesses in their daily operations. The research focuses on a WIE system based on Genetic Programming (GP) with an extensible model to enhance the automatic extractor. The system uses a human as a teacher to identify and extract relevant information from semi-structured HTML webpages. Regular expressions, chosen as the pattern-matching tool, are generated automatically from the training data to provide an improved grammar and lexicon. This particularly benefits the GP system, which may need to extend its lexicon when new tokens appear in the web pages. These tokens allow the GP method to produce new extraction patterns for new requirements.
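
The idea of evolving regular expressions from teacher-labelled examples can be illustrated with a small sketch. This is not the thesis's system: it substitutes a simplified, mutation-only evolutionary search for full Genetic Programming, and the training examples, the `LEXICON` of regex fragments, and the function names `fitness`, `mutate`, and `evolve` are all hypothetical choices made for illustration.

```python
import random
import re

# Hypothetical teacher-labelled examples: (HTML snippet, text the teacher marked).
EXAMPLES = [
    ("<td>Price: $12.50</td>", "$12.50"),
    ("<span>$3.99 each</span>", "$3.99"),
    ("<li>Now only $105.00!</li>", "$105.00"),
]

# Hypothetical lexicon of regex fragments; new tokens could be appended to it.
LEXICON = [r"\$", r"\d", r"\d+", r"\.", r"[a-z]+", r"\s", r":"]


def fitness(pattern, examples):
    """Count the examples whose first match equals the labelled text."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 0  # invalid candidates score zero
    score = 0
    for text, expected in examples:
        match = compiled.search(text)
        if match and match.group(0) == expected:
            score += 1
    return score


def mutate(genes, rng):
    """Return a copy of genes with one token inserted, replaced, or deleted."""
    genes = list(genes)
    op = rng.choice(["insert", "replace", "delete"])
    if op == "insert" or not genes:
        genes.insert(rng.randint(0, len(genes)), rng.choice(LEXICON))
    elif op == "replace":
        genes[rng.randrange(len(genes))] = rng.choice(LEXICON)
    elif len(genes) > 1:
        del genes[rng.randrange(len(genes))]
    return genes


def evolve(examples, generations=200, pop_size=30, seed=0):
    """Mutation-only evolutionary search over sequences of lexicon tokens."""
    rng = random.Random(seed)

    def score(genes):
        return fitness("".join(genes), examples)

    population = [[rng.choice(LEXICON)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidates; the best half survives unchanged (elitism).
        population.sort(key=score, reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return "".join(max(population, key=score))


if __name__ == "__main__":
    best = evolve(EXAMPLES)
    print(best, fitness(best, EXAMPLES))
```

The pattern that the search returns depends on the seed and the number of generations, so the sketch prints the best candidate together with its score instead of promising a particular result. Extending `LEXICON` with a new fragment is the toy counterpart of the lexicon extension the abstract describes: it enlarges the space of patterns the search can reach.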