27 research outputs found

    Preprocessing and Content/Navigational Pages Identification as Premises for an Extended Web Usage Mining Model Development

    Since its appearance, the internet has grown spectacularly, not only in the number of websites and the volume of information they hold, but also in the number of visitors. This created the need for an overall analysis of both websites and the content they provide. A new branch of research, web mining, was therefore developed; it aims to discover useful information and knowledge based not only on the analysis of websites and their content, but also on the way users interact with them. The aim of the present paper is to design a database that captures only the relevant data from logs, in a way that allows large sets of temporal data to be stored and managed with common tools in real time. In our work, we rely on different websites or website sections with known architecture, and we test several hypotheses from the literature in order to extend the framework to sites with an unknown or chaotic structure, which are non-transparent in determining the type of visited pages. In doing this, we start from non-proprietary, preexisting raw server logs.
    Keywords: Knowledge Management, Web Mining, Data Preprocessing, Decision Trees, Databases
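
    The abstract does not give the log schema or parsing rules themselves; as a rough illustration only, the Python sketch below turns lines of a standard combined-format access log into structured records that could then be bulk-loaded into a relational table. The log format, field names, filtering rule and file name are assumptions made for illustration, not the paper's actual design.

        import re
        from datetime import datetime

        # Combined Log Format (an assumption; the paper only says "raw server logs"):
        # host ident user [time] "request" status bytes "referer" "user-agent"
        LOG_RE = re.compile(
            r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
            r'"(?P<method>\S+) (?P<path>\S+) \S+" '
            r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
        )

        def parse_line(line):
            """Turn one raw log line into a dict, or None if it does not match."""
            m = LOG_RE.match(line)
            if m is None:
                return None
            rec = m.groupdict()
            rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
            rec["status"] = int(rec["status"])
            rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
            return rec

        def relevant(rec):
            """Keep only page requests: drop static assets and failed requests."""
            skip = (".png", ".jpg", ".gif", ".css", ".js", ".ico")
            return rec is not None and rec["status"] < 400 and not rec["path"].lower().endswith(skip)

        with open("access.log", encoding="utf-8") as f:   # placeholder filename
            records = [r for r in map(parse_line, f) if relevant(r)]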

    Spam Classification with the Boosting of Random Oblique Decision Stump Algorithm

    In this paper we present a machine learning approach to spam detection using the Boosting of Random Oblique Decision Stump algorithm. First, a dataset is built from a collection of spam and non-spam messages. Next, the data are preprocessed: lexical analysis, selection of the set of words useful for spam classification, and construction of a bag-of-words model. Because this preprocessing step produces a dataset of very high dimensionality, we propose a new algorithm, Boosting of Random Oblique Decision Stump, that classifies such data effectively. Experimental results on a real dataset collected from 1143 spam messages and 778 non-spam messages show that the proposed algorithm classifies more accurately than SVM and Naïve Bayes on comparison criteria such as Accuracy, F1-Measure, Precision, TP Rate and TN Rate.
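
    A minimal sketch of this pipeline is shown below, assuming scikit-learn is available. AdaBoost over depth-1 (axis-parallel) decision stumps is used as a stand-in for the paper's random oblique decision stumps, which are not a stock component, and the toy corpus stands in for the collected messages; none of this reproduces the authors' implementation.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.ensemble import AdaBoostClassifier

        # Toy corpus standing in for the collected spam / non-spam messages.
        emails = [
            "win a free prize now", "cheap pills online today",      # spam
            "meeting agenda attached", "see you at lunch tomorrow",  # non-spam
        ]
        labels = [1, 1, 0, 0]   # 1 = spam, 0 = non-spam

        # Preprocessing: lexical analysis and a bag-of-words model.
        bow = CountVectorizer(lowercase=True)
        X = bow.fit_transform(emails)

        # Boosting of decision stumps (AdaBoost's default weak learner is a depth-1 tree).
        clf = AdaBoostClassifier(n_estimators=50).fit(X, labels)

        print(clf.predict(bow.transform(["free prize waiting for you"])))  # likely [1] = spam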

    Cognitive Spam Recognition Using Hadoop and Multicast-Update

    In today's world of exponentially growing technology, spam is a very common issue faced by users on the internet. Spam not only hinders the performance of a network, it also wastes space and time, causes general irritation, and presents a multitude of dangers: viruses, malware, spyware and consequent system failure, identity theft, and other cybercriminal activity. In this context, cognition provides a method to help improve the performance of a distributed system. It enables the system to learn what it is supposed to do for different input types as different classifications are made over time, and this learning helps it increase its accuracy as time passes. Each system on its own can only learn so much, because of the limited sample set of inputs it gets to process. In a network, however, we can make sure that every system knows the different kinds of inputs available and learns what to do with them with a better success rate. Thus, distributing and combining this cognition across different components of the network leads to an overall improvement in the performance of the system. In this paper, we describe a method to make machines cognitively label spam using machine learning and the Naive Bayesian approach. We also present two possible methods of implementation, one using a MapReduce framework (Hadoop) and one using messages coupled with a multicast-send based network, each with its own subtypes and its pros and cons. We finally present a comparative analysis of the two main methods and give a basic idea of the usefulness of each in various scenarios.
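
    The paper's code is not reproduced in this listing; purely as an illustration of the MapReduce route, the Hadoop Streaming style sketch below computes per-class word counts, which are the sufficient statistics a Naive Bayes spam model needs. The file names, input format and invocation are assumptions.

        # mapper.py: reads "<spam|ham>\t<message text>" lines from stdin and
        # emits one "<label>,<word>\t1" record per token.
        import sys

        for line in sys.stdin:
            label, _, text = line.rstrip("\n").partition("\t")
            for word in text.lower().split():
                print(f"{label},{word}\t1")

        # reducer.py: sums the counts for each (label, word) key; Hadoop
        # delivers identical keys to the same reducer in sorted order.
        import sys

        current_key, total = None, 0
        for line in sys.stdin:
            key, _, count = line.rstrip("\n").partition("\t")
            if key != current_key and current_key is not None:
                print(f"{current_key}\t{total}")
                total = 0
            current_key = key
            total += int(count)
        if current_key is not None:
            print(f"{current_key}\t{total}")

    The two scripts would typically be launched with the Hadoop Streaming jar (-files, -mapper, -reducer, -input, -output). Turning the resulting counts into conditional probabilities yields the Naive Bayes model, and multicasting the count updates is one way each node could share what it has learned.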

    A communication model with limited information-processing capacity of recipients

    We develop a system dynamics model of message-based communication, where the information-processing capacity of message recipients is limited. Profit-seeking broadcasters send messages, but only some of these messages are valuable to recipients. Recipients cannot determine whether or not a message is valuable until it is processed. Information overload occurs when more messages arrive than recipients can process. Numerical experiments test alternative approaches for mitigating information overload. We show that message filtering can increase the flow of for-profit communication. Market-based mechanisms, while aimed at improving the social outcome, can actually lead to suboptimal results and to a complete collapse of for-profit communication. Copyright © 2008 John Wiley & Sons, Ltd. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/61449/1/407_ftp.pd
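
    The paper's system dynamics equations are not quoted in this abstract; the toy discrete-time sketch below only illustrates the capacity-limit mechanism it describes, and every parameter value is an illustrative assumption.

        # Toy illustration: recipients can process at most CAPACITY messages per
        # period, and a message's value is only discovered by processing it.
        CAPACITY = 50          # messages a recipient can process per period (assumed)
        VALUE_FRACTION = 0.2   # share of broadcast messages that are valuable (assumed)

        def simulate(arrival_rates, capacity=CAPACITY, value_fraction=VALUE_FRACTION):
            """Report processed messages, discovered value and overload per period."""
            history = []
            for arrivals in arrival_rates:
                processed = min(arrivals, capacity)      # limited processing capacity
                overload = max(arrivals - capacity, 0)   # messages left unprocessed
                value = processed * value_fraction       # value found only in processed mail
                history.append({"arrivals": arrivals, "processed": processed,
                                "overload": overload, "value_found": value})
            return history

        # Broadcasters ramp up volume; once arrivals exceed capacity, the extra
        # messages add overload but no extra value (information overload).
        for row in simulate(arrival_rates=[20, 40, 60, 80, 100]):
            print(row)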

    Hybrid approach for spam email detection

    In this era, email is a convenient way for users to communicate anywhere in the world where there is internet access, because it is an economical and fast method of communication. An email message can be sent to a single user or distributed to a group. Most users cannot imagine life without e-mail, which is why email has also become a medium of communication for malicious actors. This project is aimed at spam email. It concentrates on a hybrid approach, combining a Neural Network (NN) with Particle Swarm Optimization (PSO), designed to detect spam emails. The hybrid NN_PSO approach with a GA algorithm is compared against GA_NN and NN classifiers to determine the best performance for spam detection. The Spambase dataset used contains 1813 spam messages (39.40%) and 2788 non-spam messages (60.6%). The performance criteria for comparison are accuracy, false positives, false negatives, precision, recall and F-measure. Feature selection is performed with the GA algorithm to reduce redundant and irrelevant features. The F-measure results show that the hybrid NN_PSO, GA_NN and NN achieve 94.10%, 92.60% and 91.39% respectively. The results recommend the hybrid NN_PSO with the GA algorithm for the best spam email detection performance.
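
    As a rough sketch of the NN_PSO idea, assuming NumPy, the code below uses particle swarm optimization to search the weight space of a tiny one-hidden-layer network instead of gradient descent. The synthetic data, network size and PSO constants are illustrative assumptions, not the paper's Spambase setup, and the GA feature-selection step is omitted.

        import numpy as np

        rng = np.random.default_rng(0)

        # Synthetic data standing in for Spambase feature vectors (4 features, 200 rows).
        X = rng.normal(size=(200, 4))
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)   # synthetic spam/non-spam labels

        N_IN, N_HIDDEN = 4, 5
        N_W = N_IN * N_HIDDEN + N_HIDDEN                  # flattened weight vector length

        def predict(w, X):
            """Forward pass of the small network for one flattened weight vector w."""
            W1 = w[:N_IN * N_HIDDEN].reshape(N_IN, N_HIDDEN)
            W2 = w[N_IN * N_HIDDEN:]
            h = np.tanh(X @ W1)
            return 1.0 / (1.0 + np.exp(-(h @ W2)))

        def error(w):
            """Fitness to minimise: misclassification rate on the training data."""
            return np.mean((predict(w, X) > 0.5) != y)

        # Standard PSO update: velocities are pulled toward personal and global bests.
        n_particles, iters = 30, 200
        pos = rng.normal(size=(n_particles, N_W))
        vel = np.zeros_like(pos)
        pbest, pbest_err = pos.copy(), np.array([error(p) for p in pos])
        gbest = pbest[pbest_err.argmin()].copy()

        for _ in range(iters):
            r1, r2 = rng.random((2, n_particles, 1))
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = pos + vel
            errs = np.array([error(p) for p in pos])
            improved = errs < pbest_err
            pbest[improved], pbest_err[improved] = pos[improved], errs[improved]
            gbest = pbest[pbest_err.argmin()].copy()

        print("training error of the PSO-trained network:", error(gbest))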

    Internet personalization as a marketing trend

    The market is a key entity that sets the pace of change in various areas of business, not least marketing. The changes that have been taking place in the market over the last three decades mean that marketers need to stay up-to-date and in step with the developments of the times. This means, in particular, following the development of technology and its implementation in business activities. In marketing, new marketing trends have been applied as businesses have dealt with the impact of the pandemic crisis associated with COVID-19. The coronavirus crisis accelerated a fundamental change in the communication of businesses, organisations, governments, etc. The trend is towards personalised, active communication towards the customer, made possible by the rapid development of information technology and its computing power. This paper deals with the personalization of web pages and their use in the marketing processes of a company with emphasis on the current trends in this area. However, it also mentions past and anticipated future trends in Internet personalization. The aim of the text is then to provide a theoretical comparison of current web personalization techniques for marketing processes and to propose a framework for their use in practice. The paper presents research questions and uses secondary data analysis in the form of a literature review and content analysis. The text contributes to the discussion by identifying the practical application of the defined web personalization techniques and their position in marketing process strategies designed for online activities