2 research outputs found

Online News Headline Extraction

Author: Anuar Anis Akmal
Publication venue: Universiti Teknologi PETRONAS
Publication date: 01/05/2011

This paper presents an online news headline extraction application. According to research, online news has grown 11% year over year. Users nowadays are overwhelmed by too much information on the internet. Current online news pages are also hard to read because they are full of advertisements and other unrelated content besides the news itself. This paper proposes an EHeadlines News Extraction Framework that presents the information extracted from the news. The project covers only news reported or available in local online English-language newspapers and, in the meantime, tries to extract the news headlines first. At the end of the project, it will highlight the application that can illustrate the information extracted from the news.

UTPedia

A teachable semi-automatic web information extraction system based on evolved regular expression patterns

Author: Nor Zainah Siau (7169549)
Publication date: 01/01/2014

This thesis explores Web Information Extraction (WIE) and how it has been used in decision making and to support businesses in their daily operations. The research focuses on a WIE system based on Genetic Programming (GP) with an extensible model to enhance the automatic extractor. This uses a human as a teacher to identify and extract relevant information from the semi-structured HTML webpages. Regular expressions, which have been chosen as the pattern matching tool, are automatically generated based on the training data to provide an improved grammar and lexicon. This particularly benefits the GP system which may need to extend its lexicon in the presence of new tokens in the web pages. These tokens allow the GP method to produce new extraction patterns for new requirements

Loughborough University Institutional Repository
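
The abstract above describes regular expressions being generated from training data supplied by a human teacher. As a rough illustration only, and not the thesis's actual method, the sketch below scores a few hand-written candidate patterns against teacher-labelled HTML snippets, which is the kind of fitness evaluation an evolutionary search would need. The function name `fitness`, the sample snippets, and the candidate patterns are all hypothetical.

```python
import re


def fitness(pattern, examples):
    """Return the fraction of labelled examples the pattern extracts correctly.

    `examples` is a list of (html_snippet, expected_text) pairs, standing in
    for the labels a human teacher would provide (an assumption of this sketch).
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        # Malformed candidates get the lowest score instead of crashing the search.
        return 0.0
    hits = 0
    for html, expected in examples:
        match = compiled.search(html)
        # Count a hit only when the first capture group equals the label.
        if match and match.groups() and match.group(1) == expected:
            hits += 1
    return hits / len(examples)


# Hypothetical teacher-labelled training data.
examples = [
    ('<h1 class="title">Fuel prices rise</h1>', "Fuel prices rise"),
    ('<h1 class="title">Floods hit coast</h1>', "Floods hit coast"),
    ('<h2 class="title">Local team wins</h2>', "Local team wins"),
]

# Hypothetical candidate patterns; a real system would generate and mutate these.
candidates = [
    r'<h1 class="title">(.*?)</h1>',
    r'<h\d class="title">(.*?)</h\d>',
    r'<h1>(.*?)</h1>',
    r'<h\d(',  # deliberately malformed
]

best = max(candidates, key=lambda p: fitness(p, examples))
print(best, fitness(best, examples))
```

In this toy data the second candidate generalises across heading levels and so scores highest; an evolutionary loop would keep such high-scoring patterns and vary them to produce the next generation.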