27 research outputs found

    Preprocessing and Content/Navigational Pages Identification as Premises for an Extended Web Usage Mining Model Development

    Since its appearance, the internet has grown spectacularly, not only in the number of websites and the volume of information they hold, but also in the number of visitors. This created the need for an overall analysis of both websites and the content they provide. A new branch of research, web mining, was therefore developed; it aims to discover useful information and knowledge based not only on the analysis of websites and their content, but also on the way users interact with them. The aim of the present paper is to design a database that captures only the relevant data from logs, in a way that allows large sets of temporal data to be stored and managed with common tools in real time. In our work, we rely on different websites or website sections with known architecture, and we test several hypotheses from the literature in order to extend the framework to sites with an unknown or chaotic structure, which are non-transparent in determining the type of visited pages. In doing this, we start from non-proprietary, preexisting raw server logs.
    Keywords: Knowledge Management, Web Mining, Data Preprocessing, Decision Trees, Databases
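
    The abstract does not give the log schema or parsing rules themselves; as a rough illustration only, the Python sketch below turns lines of a standard combined-format access log into structured records that could then be bulk-loaded into a relational table. The log format, field names, filtering rule and file name are assumptions made for illustration, not the paper's actual design.

        import re
        from datetime import datetime

        # Combined Log Format (an assumption; the paper only says "raw server logs"):
        # host ident user [time] "request" status bytes "referer" "user-agent"
        LOG_RE = re.compile(
            r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
            r'"(?P<method>\S+) (?P<path>\S+) \S+" '
            r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
        )

        def parse_line(line):
            """Turn one raw log line into a dict, or None if it does not match."""
            m = LOG_RE.match(line)
            if m is None:
                return None
            rec = m.groupdict()
            rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
            rec["status"] = int(rec["status"])
            rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
            return rec

        def relevant(rec):
            """Keep only page requests: drop static assets and failed requests."""
            skip = (".png", ".jpg", ".gif", ".css", ".js", ".ico")
            return rec is not None and rec["status"] < 400 and not rec["path"].lower().endswith(skip)

        with open("access.log", encoding="utf-8") as f:   # placeholder filename
            records = [r for r in map(parse_line, f) if relevant(r)]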

    Spam Classification with the Boosting of Random Oblique Decision Stump Algorithm

    In this paper we present a machine learning approach to spam detection using the Boosting of Random Oblique Decision Stump algorithm. First, a dataset is built from a collection of spam and non-spam messages. Next, the data are preprocessed: lexical analysis, selection of the set of words useful for spam classification, and construction of a bag-of-words model. Because this preprocessing step produces a dataset of very high dimensionality, we propose a new algorithm, Boosting of Random Oblique Decision Stump, that classifies such data effectively. Experimental results on a real dataset collected from 1143 spam messages and 778 non-spam messages show that the proposed algorithm classifies more accurately than SVM and Naïve Bayes on comparison criteria such as Accuracy, F1-Measure, Precision, TP Rate and TN Rate.
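
    A minimal sketch of this pipeline is shown below, assuming scikit-learn is available. AdaBoost over depth-1 (axis-parallel) decision stumps is used as a stand-in for the paper's random oblique decision stumps, which are not a stock component, and the toy corpus stands in for the collected messages; none of this reproduces the authors' implementation.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.ensemble import AdaBoostClassifier

        # Toy corpus standing in for the collected spam / non-spam messages.
        emails = [
            "win a free prize now", "cheap pills online today",      # spam
            "meeting agenda attached", "see you at lunch tomorrow",  # non-spam
        ]
        labels = [1, 1, 0, 0]   # 1 = spam, 0 = non-spam

        # Preprocessing: lexical analysis and a bag-of-words model.
        bow = CountVectorizer(lowercase=True)
        X = bow.fit_transform(emails)

        # Boosting of decision stumps (AdaBoost's default weak learner is a depth-1 tree).
        clf = AdaBoostClassifier(n_estimators=50).fit(X, labels)

        print(clf.predict(bow.transform(["free prize waiting for you"])))  # likely [1] = spam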

    Cognitive Spam Recognition Using Hadoop and Multicast-Update

    In today's world of exponentially growing technology, spam is a very common issue faced by users on the internet. Spam not only hinders the performance of a network, it also wastes space and time, causes general irritation, and presents a multitude of dangers: viruses, malware, spyware and consequent system failure, identity theft, and other cybercriminal activity. In this context, cognition provides a method to help improve the performance of a distributed system. It enables the system to learn what it is supposed to do for different input types as different classifications are made over time, and this learning helps it increase its accuracy as time passes. Each system on its own can only learn so much, because of the limited sample set of inputs it gets to process. In a network, however, we can make sure that every system knows the different kinds of inputs available and learns what to do with them with a better success rate. Thus, distributing and combining this cognition across different components of the network leads to an overall improvement in the performance of the system. In this paper, we describe a method to make machines cognitively label spam using machine learning and the Naive Bayesian approach. We also present two possible methods of implementation, one using a MapReduce framework (Hadoop) and one using messages coupled with a multicast-send based network, each with its own subtypes and its pros and cons. We finally present a comparative analysis of the two main methods and give a basic idea of the usefulness of each in various scenarios.
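
    The paper's code is not reproduced in this listing; purely as an illustration of the MapReduce route, the Hadoop Streaming style sketch below computes per-class word counts, which are the sufficient statistics a Naive Bayes spam model needs. The file names, input format and invocation are assumptions.

        # mapper.py: reads "<spam|ham>\t<message text>" lines from stdin and
        # emits one "<label>,<word>\t1" record per token.
        import sys

        for line in sys.stdin:
            label, _, text = line.rstrip("\n").partition("\t")
            for word in text.lower().split():
                print(f"{label},{word}\t1")

        # reducer.py: sums the counts for each (label, word) key; Hadoop
        # delivers identical keys to the same reducer in sorted order.
        import sys

        current_key, total = None, 0
        for line in sys.stdin:
            key, _, count = line.rstrip("\n").partition("\t")
            if key != current_key and current_key is not None:
                print(f"{current_key}\t{total}")
                total = 0
            current_key = key
            total += int(count)
        if current_key is not None:
            print(f"{current_key}\t{total}")

    The two scripts would typically be launched with the Hadoop Streaming jar (-files, -mapper, -reducer, -input, -output). Turning the resulting counts into conditional probabilities yields the Naive Bayes model, and multicasting the count updates is one way each node could share what it has learned.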

    A communication model with limited information-processing capacity of recipients

    We develop a system dynamics model of message-based communication, where the information-processing capacity of message recipients is limited. Profit-seeking broadcasters send messages, but only some of these messages are valuable to recipients. Recipients cannot determine whether or not a message is valuable until it is processed. Information overload occurs when more messages arrive than recipients can process. Numerical experiments test alternative approaches for mitigating information overload. We show that message filtering can increase the flow of for-profit communication. Market-based mechanisms, while aimed at improving the social outcome, can actually lead to suboptimal results and to a complete collapse of for-profit communication. Copyright © 2008 John Wiley & Sons, Ltd. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/61449/1/407_ftp.pd
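
    The paper's system dynamics equations are not quoted in this abstract; the toy discrete-time sketch below only illustrates the capacity-limit mechanism it describes, and every parameter value is an illustrative assumption.

        # Toy illustration: recipients can process at most CAPACITY messages per
        # period, and a message's value is only discovered by processing it.
        CAPACITY = 50          # messages a recipient can process per period (assumed)
        VALUE_FRACTION = 0.2   # share of broadcast messages that are valuable (assumed)

        def simulate(arrival_rates, capacity=CAPACITY, value_fraction=VALUE_FRACTION):
            """Report processed messages, discovered value and overload per period."""
            history = []
            for arrivals in arrival_rates:
                processed = min(arrivals, capacity)      # limited processing capacity
                overload = max(arrivals - capacity, 0)   # messages left unprocessed
                value = processed * value_fraction       # value found only in processed mail
                history.append({"arrivals": arrivals, "processed": processed,
                                "overload": overload, "value_found": value})
            return history

        # Broadcasters ramp up volume; once arrivals exceed capacity, the extra
        # messages add overload but no extra value (information overload).
        for row in simulate(arrival_rates=[20, 40, 60, 80, 100]):
            print(row)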

    Hybrid approach for spam email detection

    In this era, email is a convenient way for users to communicate anywhere in the world where there is internet access, because it is an economical and fast method of communication. An email message can be sent to a single user or distributed to a group. Most users cannot imagine life without e-mail, which is why email has also become a medium of communication for malicious actors. This project is aimed at spam email. It concentrates on a hybrid approach, combining a Neural Network (NN) with Particle Swarm Optimization (PSO), designed to detect spam emails. The hybrid NN_PSO approach with a GA algorithm is compared against GA_NN and NN classifiers to determine the best performance for spam detection. The Spambase dataset used contains 1813 spam messages (39.40%) and 2788 non-spam messages (60.6%). The performance criteria for comparison are accuracy, false positives, false negatives, precision, recall and F-measure. Feature selection is performed with the GA algorithm to reduce redundant and irrelevant features. The F-measure results show that the hybrid NN_PSO, GA_NN and NN achieve 94.10%, 92.60% and 91.39% respectively. The results recommend the hybrid NN_PSO with the GA algorithm for the best spam email detection performance.
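
    As a rough sketch of the NN_PSO idea, assuming NumPy, the code below uses particle swarm optimization to search the weight space of a tiny one-hidden-layer network instead of gradient descent. The synthetic data, network size and PSO constants are illustrative assumptions, not the paper's Spambase setup, and the GA feature-selection step is omitted.

        import numpy as np

        rng = np.random.default_rng(0)

        # Synthetic data standing in for Spambase feature vectors (4 features, 200 rows).
        X = rng.normal(size=(200, 4))
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)   # synthetic spam/non-spam labels

        N_IN, N_HIDDEN = 4, 5
        N_W = N_IN * N_HIDDEN + N_HIDDEN                  # flattened weight vector length

        def predict(w, X):
            """Forward pass of the small network for one flattened weight vector w."""
            W1 = w[:N_IN * N_HIDDEN].reshape(N_IN, N_HIDDEN)
            W2 = w[N_IN * N_HIDDEN:]
            h = np.tanh(X @ W1)
            return 1.0 / (1.0 + np.exp(-(h @ W2)))

        def error(w):
            """Fitness to minimise: misclassification rate on the training data."""
            return np.mean((predict(w, X) > 0.5) != y)

        # Standard PSO update: velocities are pulled toward personal and global bests.
        n_particles, iters = 30, 200
        pos = rng.normal(size=(n_particles, N_W))
        vel = np.zeros_like(pos)
        pbest, pbest_err = pos.copy(), np.array([error(p) for p in pos])
        gbest = pbest[pbest_err.argmin()].copy()

        for _ in range(iters):
            r1, r2 = rng.random((2, n_particles, 1))
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = pos + vel
            errs = np.array([error(p) for p in pos])
            improved = errs < pbest_err
            pbest[improved], pbest_err[improved] = pos[improved], errs[improved]
            gbest = pbest[pbest_err.argmin()].copy()

        print("training error of the PSO-trained network:", error(gbest))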

    Internet personalization as a marketing trend

    The market is a key entity that sets the pace of change in various areas of business, not least marketing. The changes that have been taking place in the market over the last three decades mean that marketers need to stay up-to-date and in step with the developments of the times. This means, in particular, following the development of technology and its implementation in business activities. In marketing, new marketing trends have been applied as businesses have dealt with the impact of the pandemic crisis associated with COVID-19. The coronavirus crisis accelerated a fundamental change in the communication of businesses, organisations, governments, etc. The trend is towards personalised, active communication towards the customer, made possible by the rapid development of information technology and its computing power. This paper deals with the personalization of web pages and their use in the marketing processes of a company with emphasis on the current trends in this area. However, it also mentions past and anticipated future trends in Internet personalization. The aim of the text is then to provide a theoretical comparison of current web personalization techniques for marketing processes and to propose a framework for their use in practice. The paper presents research questions and uses secondary data analysis in the form of a literature review and content analysis. The text contributes to the discussion by identifying the practical application of the defined web personalization techniques and their position in marketing process strategies designed for online activities