
    Data pre-processing of website browsing record: An initial step for web page classification

    Internet usage has increased the number of web pages on the World Wide Web, and web page classification is required to organize this growing number of pages. A web page classification system is proposed, to be built with a deep learning algorithm, and its initial step is data pre-processing. Website browsing records are used as the dataset in this study. The raw dataset must be pre-processed to obtain clean data by removing records with missing values, redundant records, and erroneous records. Data pre-processing involves several steps, including data cleaning and web content pre-processing. The main contribution of this paper is an investigation of how to pre-process website browsing records, focusing on the Game and Online Video web pages that will be used as the dataset for constructing the web page classification model. After data pre-processing, the number of records in the dataset is reduced. This shows that many records were removed because they were inactive or unsuitable for use as the Game and Online Video web page dataset in this study.
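
The three cleaning rules named above (missing-value, redundant, and error data) can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure. The record fields `url` and `category` are hypothetical, and so is the rule that an "error" record is one whose URL is not a well-formed http(s) address. Detecting inactive websites would need a network check, which is not shown.

```python
from urllib.parse import urlparse


def clean_browsing_records(records):
    """Drop records with missing values, malformed URLs, or duplicate URLs.

    Each record is assumed (hypothetically) to be a dict with "url" and
    "category" keys; the real browsing-record schema may differ.
    """
    cleaned = []
    seen_urls = set()
    for record in records:
        url = (record.get("url") or "").strip()
        category = (record.get("category") or "").strip()
        # Missing-value data: skip records that lack a URL or a category label.
        if not url or not category:
            continue
        # Error data: keep only well-formed http(s) URLs that name a host.
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        # Redundant data: keep only the first record seen for each exact URL.
        if url in seen_urls:
            continue
        seen_urls.add(url)
        cleaned.append({"url": url, "category": category})
    return cleaned
```

The checks run cheapest first, so a record is discarded as soon as any rule fails. Duplicates are matched on the exact URL string, and a real pipeline might normalize URLs first.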

    A model of web page classification using convolutional neural network (CNN): a tool to prevent internet addiction

    Game and Online Video Streaming are the most frequently visited web pages. Users who spend too much time on these types of web pages may develop internet addiction. Access to Game and Online Video Streaming web pages needs to be limited to combat internet addiction, so a tool that can categorize incoming web pages based on their content is required. This paper proposes a web page classification model that uses a Convolutional Neural Network (CNN) to classify a web page as Game or Online Video Streaming based on the pattern of words in a word cloud image generated from the page's text content. The proposed web page classification model achieved 85.6% accuracy.

    A survey on technique for solving web page classification problem

    The number of web pages on the World Wide Web has been increasing due to the popularity of Internet usage, and web page classification is needed to organize them. Other researchers have proposed many web page classification techniques. However, there is no comprehensive survey of how well these techniques perform. This paper presents a survey of different web page classification techniques and the results they achieved, reviewing existing work on web page classification. Based on the survey, we found that the Convolutional Neural Network (CNN), a neural network technique, produces high F-measure values and meets the real-time requirement for classification compared with other machine learning techniques.

    Use Word Cloud Image Of Web Page Text Content On Convolutional Neural Network (CNN) For Classification Of Web Pages

    In today's environment, people can easily use the internet to find information by visiting web pages. Most people like to visit web pages that offer games and videos to watch online. People who spend a lot of time on such web pages can become addicted to the internet, which can harm them. Access to web pages that offer games and streaming videos needs to be limited to prevent internet addiction, which requires a tool that can classify a web page's category based on its content. Because existing matrix representations are unable to handle long web page text content, this study uses a word cloud image to visualize the words extracted from the web page text content after data pre-processing. The most popular words are displayed in a large size at the center of the word cloud image. These are the words that appear most frequently in the web page text content, and they describe what the page is about. The Convolutional Neural Network (CNN) identifies the pattern of words displayed in the central areas of the word cloud image to classify the category the web page belongs to. The proposed model for classifying web pages has an accuracy of 0.86. An institution could use it, for example, to set rules that limit users' access to web pages offering games and streaming videos, as one way to prevent internet addiction.
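
The idea that the most frequent words are drawn largest can be illustrated by computing a frequency-based font size for each word. This Python sketch only ranks words and assigns sizes. The tiny stop-word list, the `top_n`, `min_size`, and `max_size` parameters, and the linear scaling are illustrative assumptions, not the study's settings. Rendering the image and placing the top words in the centre (typically done by a word-cloud library) are not shown.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on", "for", "is", "are", "with"}


def word_cloud_weights(text, top_n=5, min_size=10, max_size=60):
    """Return (word, count, font_size) for the top_n most frequent words.

    Font size scales linearly with frequency between min_size and max_size,
    so the most frequent word is the one a word cloud would draw largest.
    """
    # Lower-case, keep alphabetic tokens only, and drop stop words.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 1)
    top = counts.most_common(top_n)
    if not top:
        return []
    highest, lowest = top[0][1], top[-1][1]
    # If every selected word is equally frequent, draw them all at full size.
    if highest == lowest:
        return [(word, count, max_size) for word, count in top]
    span = highest - lowest
    return [
        (word, count, min_size + (count - lowest) * (max_size - min_size) // span)
        for word, count in top
    ]
```

For a game page, `word_cloud_weights(text, top_n=3)` would put a word such as "games" first with the largest size. This is the kind of visual cue the CNN is described as learning from.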

    A Convolutional Neural Network (CNN) Classification Model for Web Page: A Tool for Improving Web Page Category Detection Accuracy

    Game and Online Video Streaming are the most viewed web pages. Users who spend too much time on these types of web pages may suffer from internet addiction. Access to Game and Online Video Streaming web pages should be restricted to combat internet addiction, so a tool is required to recognise the category of a web page based on its text content. Due to the unavailability of a matrix representation that can handle long web page text content, this study employs a document representation known as a word cloud image to visualise the words extracted from the web page text content after data pre-processing. The most common words, which appear frequently in the text content and describe what the web page is about, are shown in a large size in the centre of the word cloud image. The Convolutional Neural Network (CNN) recognises the pattern of words presented in the core portions of the word cloud image to determine the category to which the web page belongs. The proposed model was compared with other web page classification models and achieved a good result, with an accuracy of 85.6%. It can be used as a tool that makes identifying the category of web pages more accurate.

    Web page classification using convolutional neural network (CNN) towards eliminating internet addiction

    In the modern world, everyone has access to the internet as a source of information by surfing web pages. The most popular web pages are Game and Online Video Streaming pages. Users who spend too much time on these kinds of web pages may develop internet addiction. To overcome the internet addiction problem, access to Game and Online Video Streaming web pages needs to be restricted. Thus, a mechanism that can classify the category of an incoming web page based on its content is needed. This paper proposes a web page classification model that uses a Convolutional Neural Network (CNN) to identify whether a web page is Game or Online Video Streaming based on the pattern of words in a word cloud image taken from the web page text content. The proposed web page classification model achieved 82.22% accuracy in detecting pre-classified web pages.

    Data Pre-processing of Website Browsing Records: To Prepare Quality Dataset for Web Page Classification

    The increased usage of the internet worldwide has led to an abundance of web pages designed to supply information to internet users. Web page classification is becoming increasingly necessary to organize the growing number of web pages. A classification model also serves as a tool to restrict internet usage to specific categories of web pages. When developing the classification model, it is crucial to check the quality of the dataset, because it determines the performance of the web page classification model. Raw datasets are typically unreliable and noisy, which complicates data analysis, so data pre-processing is necessary to prepare the dataset properly. In this study, website browsing records serve as the dataset. The primary goal of this paper is to investigate data pre-processing techniques for website browsing records, focusing on Game and Online Video Streaming web pages. Data pre-processing involves two main steps: data cleaning and web content pre-processing. Data cleaning reduces the dataset from its original size. This demonstrates that many records can be eliminated because the sites are inactive or unsuitable for the Game and Online Video Streaming dataset. Web content pre-processing removes noise from an HTML document and retains only relevant words that represent the web page, from which a word cloud image is created. A Convolutional Neural Network (CNN) will be used to construct a model that categorizes web pages as Game or Online Video Streaming, with the pre-processed data as its input.
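
Web content pre-processing, which strips noise from an HTML document and keeps only the words, can be approximated with Python's standard-library `html.parser` module. The sketch below makes a simplifying assumption: "noise" means the markup plus the contents of `<script>` and `<style>` elements. The paper's actual noise-removal rules may be broader, for example stop-word removal or dropping navigation text.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_words(html):
    """Return the lower-cased alphabetic words of a page's visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    # Keeps ASCII letters only; digits, punctuation and non-ASCII text are dropped.
    return re.findall(r"[a-z]+", text.lower())
```

The resulting word list is the kind of input that a frequency count, and then a word cloud image, could be built from.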

    Automatic Topic-Based Web Page Classification Using Deep Learning

    People frequently surf the internet on smartphones, laptops, or computers to search for information online, and the growth of information on the web means the number of web pages grows day by day. Automatic topic-based web page classification is used to manage this excessive number of web pages by sorting them into categories based on their content. Different machine learning algorithms have been employed as web page classifiers. However, few studies review web page classification using deep learning. In this study, automatic topic-based web page classification approaches using deep learning proposed by key researchers are reviewed. The relevant research papers were selected from reputable research databases. The review examined the dataset, features, algorithm, pre-processing, document representation technique, and performance of each web page classification model. The document representation technique used to represent web page features is an important aspect of web page classification because it affects the performance of the classification model. The most important web page feature is the textual content. The review found that image-based web page classification showed higher performance than text-based web page classification. Because no matrix representation can effectively handle long web page text content, a new document representation technique, the word cloud image, can be used to visualize the words extracted from the web page text content.

    Development of Attendance Management System: an Experience

    Class teachers need to take students' attendance every school day and analyze the attendance data. Analyzing attendance data manually is a hassle for teachers, and it is hard to keep the process free of errors. This study proposes a web-based application, the Attendance Management System (AMS), to solve this problem. The system is part of the SMA (Sekolah Menengah Agama) Management System and is being implemented in Pahang religious schools under the supervision of Jabatan Agama Islam Pahang (JAIP). It includes functions for class teachers to record student attendance, for the Guru Penolong Kanan (GPK) to generate school attendance reports, and for the discipline teacher to view truancy statistics. The results show that the system can help school management manage student attendance.
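
The attendance-report and truancy-statistics functions described above amount to aggregating daily attendance marks per student. Below is a minimal Python sketch under stated assumptions. The `(student_id, date, present)` record format and the threshold of three absences for flagging truancy are hypothetical. The abstract does not describe the AMS schema or its truancy rule.

```python
from collections import defaultdict


def attendance_summary(records, truancy_threshold=3):
    """Summarise daily attendance per student.

    records: iterable of (student_id, date, present) tuples, assumed to hold
    one record per student per school day (hypothetical schema).
    truancy_threshold: hypothetical number of absences that flags a student.
    Returns {student_id: {"days", "absences", "rate", "flagged"}}.
    """
    days = defaultdict(int)
    absences = defaultdict(int)
    for student_id, _date, present in records:
        days[student_id] += 1
        if not present:
            absences[student_id] += 1
    summary = {}
    for student_id, total in days.items():
        missed = absences[student_id]
        summary[student_id] = {
            "days": total,
            "absences": missed,
            # Attendance rate as a percentage, rounded to one decimal place.
            "rate": round(100 * (total - missed) / total, 1),
            "flagged": missed >= truancy_threshold,
        }
    return summary
```

A class teacher's view would read each student's `rate`, and a truancy view would filter on `flagged`.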

    Dashboard information model for social research network sites

    Social Research Network Sites (SRNS) are online platforms that researchers use for research-related activities. Because current SRNS hold huge amounts of information, this information sometimes overwhelms researchers. A research-related dashboard information model is proposed to reduce information overload in SRNS and to raise awareness of research-related information. An analysis of the relevance of a dashboard shows that it is a significant tool for helping researchers monitor their own research performance, follow research trends, and receive alerts about upcoming events. Candidate dashboard items for the information model were identified through an analysis of the literature and a review of current SRNS, and a survey was conducted to validate them. Factor analysis grouped the dashboard items into three groups: publication impact, publication achievements, and alerts on upcoming events. From these three groups, a dashboard information model with three components was developed: researcher performance (M1), impact of researcher publication (M2), and research events alert (M3). We then designed a mock-up prototype representing the dashboard information model. It was used to verify the model through interviews with selected researchers. The interview results show that the researchers accepted the mock-up prototype and intended to use it. The feedback also yielded a few suggestions for enhancing the dashboard items in the model.
The resulting dashboard information model can be embedded in SRNS to make researchers aware of research-related information, and embedding it may attract more users to SRNS. SRNS developers can use the dashboard information model as a guideline for developing better SRNS for researchers.