10,703 research outputs found

A Survey of Model Used for Web User’s Browsing Behavior Prediction

Author: Chavda Pradipsinh K.
Dhobi Jitendra S.
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 30/03/2015

The motivation behind this work is that predicting a web user’s browsing behavior while surfing the Internet reduces the user’s access time and avoids visits to unnecessary pages, which eases network traffic. Various models such as fuzzy inference models, support vector machines (SVMs), artificial neural networks (ANNs), association rule mining (ARM), k-nearest neighbor (kNN), the Markov model, the Kth-order Markov model, the all-Kth Markov model and the modified Markov model have been proposed to handle the Web page prediction problem. Often, a combination of two or more models is used to achieve higher prediction accuracy. This research work introduces Support Vector Machines for web page prediction. The advantage of support vector machines is that they offer robust and accurate classification owing to their generalization properties, solid theoretical foundation and proven effectiveness. The Web contains an enormous amount of data, and web data increases exponentially, but the training time for a support vector machine is very large. That is, SVMs suffer from a widely recognized scalability problem in both memory requirement and computation time when the input dataset is too large. To address this, I aimed at training the support vector machine model in the MapReduce programming model of the Hadoop framework, since MapReduce can rapidly process large amounts of data in parallel. MapReduce works in tandem with the Hadoop Distributed File System (HDFS). The proposed approach is therefore intended to solve the scalability problem of the present SVM algorithm. Keywords: Web Page Prediction, Support Vector Machines, Hadoop, MapReduce, HDFS

International Institute for Science, Technology and Education (IISTE): E-Journals

One-Class Classification: Taxonomy of Study and Review of Techniques

Author: Khan Shehroz S.
Madden Michael G.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 29/11/2013

One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, because the class boundary must be defined using knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
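To make the idea of a boundary learned from positive examples only more concrete, here is a minimal sketch in Python. It is not a method taken from the paper: it assumes a simple centroid-plus-radius boundary, and the names `fit_one_class`, `is_positive` and the `quantile` parameter are hypothetical, chosen for illustration.

```python
import math


def fit_one_class(positives, quantile=0.95):
    """Fit a spherical boundary around positive samples only (illustrative assumption)."""
    dim = len(positives[0])
    # Centroid of the positive class: the only class we have data for.
    centroid = [sum(p[i] for p in positives) / len(positives) for i in range(dim)]
    # Radius: the distance below which roughly `quantile` of the positives fall.
    dists = sorted(math.dist(p, centroid) for p in positives)
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return centroid, dists[idx]


def is_positive(model, x):
    """Accept x if it lies inside the learned boundary, otherwise treat it as an outlier."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius


# Train on positive examples only; no negative samples are needed.
model = fit_one_class([(0, 0), (1, 0), (0, 1), (1, 1)])
inside = is_positive(model, (0.5, 0.5))
outside = is_positive(model, (5, 5))
```

Anything falling outside the radius is rejected, which is the same decision an outlier or novelty detector makes; practical OCC methods replace the sphere with a more flexible boundary.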

arXiv.org e-Print Archive

Access to Research at National University of Ireland, Galway

A Comparative Study of Artificial Neural Network and Genetic Algorithm in Search Engine Optimization

Author: Mohamad Madon Mizani
Mohd. Yasin Suhaila
Publication venue: 'Penerbit UTHM'
Publication date: 25/05/2023

Search engine optimization applies search principles in search engines to assign a higher ranking to the most suitable webpage. Nowadays, information searching is done ubiquitously on the World Wide Web with the help of search engines. However, the process needs to be efficient and produce accurate results at the same time. In this research, the objectives are to implement and evaluate the Artificial Neural Network and Genetic Algorithm. The accuracy of the two algorithms is compared on keyword ranking, Search Engine Result Page (SERP) visibility and time retrieval for document-based and e-commerce websites. To achieve this, the problem and data are first defined. Next, two datasets are imported from Kaggle and transformed into a more useful format. Then, the Artificial Neural Network and Genetic Algorithm are implemented on these datasets in Python using Jupyter Notebook tools. Subsequently, the accuracy of keyword ranking, SERP visibility and time retrieval for these datasets is observed from the output and graphs displayed. Lastly, an analysis of the results is performed. In conclusion, the Genetic Algorithm demonstrates higher accuracy than the Artificial Neural Network algorithm in keyword ranking and SERP visibility. For time retrieval, however, the result is reversed. The Genetic Algorithm shows 9.0%, 9.0% and 3.0% in the e-commerce dataset for keyword ranking and 4.0%, 51.0% and 1.0% in the document-based dataset for SERP visibility. The Artificial Neural Network algorithm shows 8.0%, 7.0% and 7.0% in the e-commerce dataset and 3.0%, 50.0% and 4.0% in the document-based dataset for time retrieval. Therefore, the results validate the ability of the Genetic Algorithm as one of the most applied algorithms in the search engine optimization field

Journals of Universiti Tun Hussein Onn Malaysia (UTHM)

Enhancing recall and precision of web search using genetic algorithm

Author: Al-Dallal Ammar Sami
Publication venue: Brunel University, School of Information Systems, Computing and Mathematics
Publication date: 01/01/2012

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Due to the rapid growth in the number of Web pages, web users encounter two main problems: many of the retrieved documents are not related to the user query, which is called low precision, and many relevant documents are not retrieved, which is called low recall. Information Retrieval (IR) is an essential and useful technique for Web search; thus, different approaches and techniques have been developed. Because of its parallel mechanism in high-dimensional space, the Genetic Algorithm (GA) has been adopted to solve many optimization problems, of which IR is one. This thesis proposes a searching model based on GA to retrieve HTML documents. This model is called IR Using GA, or IRUGA. It is composed of two main units. The first unit is the document indexing unit, which indexes the HTML documents. The second unit is the GA mechanism, which applies selection, crossover, and mutation operators to produce the final result, while a specially designed fitness function is applied to evaluate the documents. The performance of IRUGA is investigated using the speed of convergence of the retrieval process, precision at rank N, recall at rank N, and precision at recall N. In addition, the proposed fitness function is compared experimentally with the Okapi-BM25 function and the Bayesian inference network model function. Moreover, IRUGA is compared with traditional IR using the same fitness function to examine the time required by each technique to retrieve the documents. The new techniques developed for document representation, the GA operators and the fitness function achieve an improvement of over 90% for the recall and precision measures, and the relevance of the retrieved documents is much higher than that of documents retrieved by the other models.
Moreover, an extensive comparison of techniques applied to GA operators is performed by highlighting the strengths and weaknesses of each existing technique. Overall, IRUGA is a promising technique in the Web search domain that provides high-quality search results in terms of recall and precision
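The rank-based measures named in this abstract, precision at rank N and recall at rank N, can be sketched as follows in Python. This uses the standard textbook definitions, not code from the thesis; the function names and the document identifiers are hypothetical.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n retrieved documents that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for doc in ranked[:n] if doc in relevant)
    return hits / n


def recall_at_n(ranked, relevant, n):
    """Fraction of all relevant documents that appear in the top-n results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in set(ranked[:n]) if doc in relevant)
    return hits / len(relevant)


# Hypothetical ranking returned for one query, best match first.
ranked = ["d3", "d1", "d7", "d2", "d9"]
# Hypothetical ground-truth set of relevant documents for that query.
relevant = {"d1", "d2", "d5"}

p_at_5 = precision_at_n(ranked, relevant, 5)  # 2 of the top 5 are relevant
r_at_5 = recall_at_n(ranked, relevant, 5)     # 2 of the 3 relevant docs were found
```

Low precision corresponds to many irrelevant documents among the top N, and low recall to relevant documents (such as `d5` here) that never appear in the ranking.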

Brunel University Research Archive

A matter of words: NLP for quality evaluation of Wikipedia medical articles

Author: B Stvilia
DMW Powers
E Marzini
F Cabitza
G Pasi
K Wecel
K Wu
M Hall
NV Chawla
O Bodenreider
SA Azer
TL Saaty
TM Cover
Publication date: 01/01/2016

Automatic quality evaluation of Web information is a task with many fields of application and of great relevance, especially in critical domains like the medical one. We start from the intuition that the quality of content of medical Web documents is affected by features related to the specific domain: first, the usage of a specific vocabulary (Domain Informativeness); then, the adoption of specific codes (like those used in the infoboxes of Wikipedia articles) and the type of document (e.g., historical and technical ones). In this paper, we propose to leverage specific domain features to improve the results of the evaluation of Wikipedia medical articles. In particular, we evaluate the articles adopting an "actionable" model, whose features are related to the content of the articles, so that the model can also directly suggest strategies for improving a given article's quality. We rely on Natural Language Processing (NLP) and dictionary-based techniques in order to extract the bio-medical concepts in a text. We prove the effectiveness of our approach by classifying the medical articles of the Wikipedia Medicine Portal, which have been previously manually labeled by the Wiki Project team. The results of our experiments confirm that, by considering domain-oriented features, it is possible to obtain appreciable improvements with respect to existing solutions, mainly for those articles that other approaches have classified less correctly. Besides being interesting in their own right, the results call for further research in the area of domain-specific features suitable for Web data quality assessment

arXiv.org e-Print Archive

Crossref

Catalogo dei prodotti della ricerca

Archivio della ricerca- Università di Roma La Sapienza

Online Research Database In Technology

Archivio istituzionale della ricerca - Università di Padova

Artificial Immune Systems - Models, algorithms and applications

Author: Abbod MF
Al-Enezi JR
Alsharhan S
Publication venue: Academic Research Publishing Agency
Publication date: 01/01/2010

Copyright © 2010 Academic Research Publishing Agency. This article has been made available through the Brunel Open Access Publishing Fund. Artificial Immune Systems (AIS) are computational paradigms that belong to the computational intelligence family and are inspired by the biological immune system. During the past decade, they have attracted a lot of interest from researchers aiming to develop immune-based models and techniques to solve complex computational or engineering problems. This work presents a survey of existing AIS models and algorithms with a focus on the last five years.

Brunel University Research Archive

Weed/Plant Classification Using Evolutionary Optimised Ensemble Based On Local Binary Patterns

Author: Lease Basil Andy
Publication venue: Curtin University
Publication date: 01/01/2022

This thesis presents a novel pixel-level weed classification method based on rotation-invariant uniform local binary pattern (LBP) features for precision weed control. It uses a two-level optimisation structure: first, Genetic Algorithm (GA) optimisation selects the best rotation-invariant uniform LBP configurations; second, a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) in the Neural Network (NN) ensemble selects the best combination of voting weights for each classifier's predicted outcome. The model obtained 87.9% accuracy on the CWFID public benchmark
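As an illustration of the feature this abstract relies on, the sketch below computes a rotation-invariant uniform LBP code for a single pixel in Python, following the commonly used definition (the count of set bits for patterns with at most two 0/1 transitions, and P + 1 otherwise). The thesis's actual configurations (radius, number of neighbours, sampling) are not given here, so the function name `lbp_riu2` and the sample values are assumptions.

```python
def lbp_riu2(center, neighbours):
    """Rotation-invariant uniform LBP code for one pixel.

    `neighbours` holds the P grey values sampled on a circle around the
    pixel, in circular order (how they are sampled is left to the caller).
    """
    # Threshold each neighbour against the centre pixel.
    bits = [1 if value >= center else 0 for value in neighbours]
    p = len(bits)
    # Count 0/1 transitions around the circle, wrapping at the end.
    transitions = sum(bits[i] != bits[(i + 1) % p] for i in range(p))
    # Uniform patterns are coded by their number of 1s; all others share P + 1.
    return sum(bits) if transitions <= 2 else p + 1


# Hypothetical 8-neighbour samples around a centre pixel of value 5.
edge = [6, 6, 6, 1, 1, 1, 1, 1]
rotated_edge = edge[3:] + edge[:3]   # the same pattern, rotated
noisy = [6, 1, 6, 1, 6, 1, 6, 1]     # non-uniform pattern

code_edge = lbp_riu2(5, edge)
code_rotated = lbp_riu2(5, rotated_edge)
code_noisy = lbp_riu2(5, noisy)
```

Because the code depends only on the number of set bits and the transition count, rotating the neighbourhood leaves it unchanged, which is what makes the feature rotation-invariant.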

espace@Curtin