Search CORE

11 research outputs found

A SimRank based Ensemble Method for Resolving Challenges of Partition Clustering Methods

Author: Patibandla R S M Lakshmi
Veeranjaneyulu N
Publication venue: NISCAIR-CSIR, India
Publication date: 01/04/2020
Pages: 323–327

Traditional clustering techniques alone cannot resolve all the challenges of partition-based clustering methods. In partition-based clustering, particularly in variants of K-means, the selection of initial cluster centres is a crucial step. The final clusters depend entirely on the initial cluster centres, so this step is regarded as the most significant in the entire clustering operation. Random selection of initial cluster centres is unstable, because each run of the algorithm yields different cluster centres. Ensemble-based clustering methods address these challenges of partition-based methods: a clustering ensemble combines several partitions generated by different clustering algorithms into a single clustering solution. The proposed ensemble methodology resolves the initial-centroid problem and improves the efficiency of the clustering results. The method selects centroids using an overall mean distance measure. The SimRank-based similarity matrix finds that the bipartite graph helps to ensemble

NOPR
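The abstract does not describe how the SimRank matrix or the bipartite graph is built. As a minimal sketch only, assuming the common link-based cluster-ensemble setup in which data points form one side of a bipartite graph and the clusters of several base partitions form the other, the bipartite SimRank recursion of Jeh and Widom can be iterated as below. The function name `bipartite_simrank`, the decay factor `c = 0.8`, the iteration count and the toy partitions are illustrative assumptions, not the authors' settings.

```python
from itertools import product


def bipartite_simrank(partitions, c=0.8, iterations=5):
    """Bipartite SimRank between data points, built from several base partitions.

    partitions: list of label lists (one per base clustering), each of length n.
    Every (partition index, label) pair is a cluster node, and point i is linked
    to the cluster it received in each base partition.
    """
    n = len(partitions[0])
    clusters = list(dict.fromkeys((p, lab) for p, labels in enumerate(partitions) for lab in labels))
    members = {cl: [i for i in range(n) if partitions[cl[0]][i] == cl[1]] for cl in clusters}
    point_nbrs = [[(p, labels[i]) for p, labels in enumerate(partitions)] for i in range(n)]

    # Start from the identity: every node is only similar to itself.
    s_pts = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    s_cls = {(a, b): 1.0 if a == b else 0.0 for a, b in product(clusters, clusters)}

    for _ in range(iterations):
        # Two clusters are similar if their member points are similar.
        new_cls = {}
        for a, b in product(clusters, clusters):
            if a == b:
                new_cls[(a, b)] = 1.0
            else:
                total = sum(s_pts[i][j] for i in members[a] for j in members[b])
                new_cls[(a, b)] = c * total / (len(members[a]) * len(members[b]))
        # Two points are similar if the clusters they were assigned to are similar.
        new_pts = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        for i, j in product(range(n), range(n)):
            if i != j:
                total = sum(s_cls[(a, b)] for a in point_nbrs[i] for b in point_nbrs[j])
                new_pts[i][j] = c * total / (len(point_nbrs[i]) * len(point_nbrs[j]))
        s_pts, s_cls = new_pts, new_cls
    return s_pts


# Three hypothetical base partitions of four points (for example, K-means runs
# started from different random initial centres).
partitions = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
sim = bipartite_simrank(partitions)
```

Points that keep landing in the same clusters across runs (points 2 and 3 here) end up more similar than points that never share a cluster (points 0 and 3). A consensus partition could then be obtained by clustering this matrix, although the paper's own consensus step is not described in the abstract.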

Network problems detection and classification by analyzing syslog data

Author: Jarghon Fidaa A. M.
Publication date: 01/01/2016

Network troubleshooting is an important process with a broad research scope. The first step in troubleshooting is to collect information in order to diagnose the problems. Syslog messages, which are sent by almost all network devices, contain a massive amount of data related to network problems. Many previous studies have used the analysis of syslog data as a guide to network problems and their causes. Detecting network problems could be more efficient if the detected problems were classified by network layer. Classifying syslog data requires identifying the syslog messages that describe the network problems at each layer, taking into account the different syslog formats used by different vendors' devices. This study provides a method for classifying the syslog messages that indicate network problems in terms of network layers. The method uses a data mining tool to classify the syslog messages, and the description part of each syslog message is used for the classification process. Related syslog messages were identified, and features were then selected to train the classifiers. Six classification algorithms were trained: LibSVM, SMO, KNN, Naïve Bayes, J48, and Random Forest. A real data set obtained from the network devices of Universiti Utara Malaysia (UUM) is used for the prediction stage. Results indicate that SVM shows the best performance during both the training and prediction stages. This study contributes to the field of network troubleshooting and to the field of text data classification.

Universiti Utara Malaysia: UUM eTheses
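The abstract names the classifiers but not the tool settings or the feature set. The sketch below is only an illustration, assuming Cisco IOS-style headers (with a crude fallback for other vendors), a bag-of-words representation of the description part, and Multinomial Naive Bayes as a stand-in for one of the six listed algorithms. The helpers `description`, `tokens` and `NaiveBayes`, the layer labels and the training messages are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed Cisco IOS-style header such as "%LINK-3-UPDOWN: ..."; for any other
# vendor format this sketch simply keeps the text after the first ": ".
CISCO_MNEMONIC = re.compile(r"%[A-Z0-9_]+-\d-[A-Z0-9_]+:\s*(.*)")


def description(message):
    """Return the free-text description part of a raw syslog message."""
    match = CISCO_MNEMONIC.search(message)
    if match:
        return match.group(1)
    _, sep, tail = message.partition(": ")
    return tail if sep else message


def tokens(text):
    """Lower-case alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokens(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(n_docs / total)
            for w in tokens(text):
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical messages labelled with the network layer they point to.
TRAIN = [
    ("%LINK-3-UPDOWN: Interface Gi0/1, changed state to down", "physical"),
    ("%HYPO-3-CABLE: cable unplugged on port Gi0/2", "physical"),
    ("%LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan10, changed state to down", "data link"),
    ("%HYPO-4-MACFLAP: mac address flapping between port Gi0/3 and port Gi0/4", "data link"),
    ("%OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.2 from FULL to DOWN, neighbor down", "network"),
    ("router-7: route to 10.1.0.0/16 withdrawn, neighbor unreachable", "network"),
]

clf = NaiveBayes().fit([description(m) for m, _ in TRAIN], [lab for _, lab in TRAIN])
```

Only the description text reaches the classifier, so the vendor-specific header handling stays isolated in `description`; supporting another vendor would mean adding a pattern there rather than retraining on raw lines.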

Comparative analysis of classification techniques for network fault management

Author: Almomani Omar
Fazea Yousef
Jarghon Fidaa
Madi Mohammed Kamel M.
Saaidah Adeeb Al
Publication venue: The Scientific and Technological Research Council of Turkey
Publication date: 01/01/2020

Network troubleshooting is a significant process, and many studies have been conducted on it. The first step in troubleshooting is collecting information in order to identify the problems. Syslog messages, which are sent by almost all network devices, include a massive amount of data concerning network problems. Several studies have found that analyzing syslog data can serve as a guide to network problems and their causes. The detection of network problems can become more efficient if the detected problems are classified by network layer. Classifying syslog data requires identifying the syslog messages that describe the network problems at each layer, and it also requires taking into account the syslog formats of different vendors' devices. The present study aimed to propose a method for classifying the syslog messages that identify network problems, with the classification based on network layers. The method uses a data mining tool to classify the syslog messages, and the description part of each syslog message was used for the classification process. The relevant syslog messages were identified, and features were then selected to train the classifiers. Six classification algorithms were trained: LibSVM, SMO, KNN, Naïve Bayes, J48, and Random Forest. A real data set was obtained from an educational network device and used for the prediction stage. LibSVM was found to outperform the other classifiers in terms of the probability rate of the classified instances, which was in the range of 89.90%–32.80%. Furthermore, the validation results indicate that the probability rate of the correctly classified instances is >70%. © 2020 Turkiye Klinikleri. All rights reserved

DSpace@HKU
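The abstract does not define the "probability rate of the classified instances". One plausible reading, assumed here only for illustration, is the confidence a classifier assigns to its predicted class; for KNN (one of the six listed algorithms) that can be approximated by the share of the k nearest neighbours voting for the winning label. The sketch uses cosine similarity over bag-of-words descriptions; the data, `k = 3` and the helper names `knn_predict` and `accuracy` are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts over lower-case alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, text, k=3):
    """Return (label, vote share) from the k training descriptions most similar to text."""
    query = vectorize(text)
    ranked = sorted(train, key=lambda item: cosine(query, vectorize(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k


def accuracy(train, test, k=3):
    """Fraction of held-out (description, label) pairs that k-NN labels correctly."""
    hits = sum(knn_predict(train, text, k)[0] == label for text, label in test)
    return hits / len(test)


# Hypothetical, already-extracted syslog descriptions with layer labels.
TRAIN = [
    ("interface changed state to down", "physical"),
    ("cable unplugged on port", "physical"),
    ("transceiver signal lost on port", "physical"),
    ("line protocol changed state to down", "data link"),
    ("mac address flapping between ports", "data link"),
    ("spanning tree blocking port", "data link"),
    ("ospf neighbor down dead timer expired", "network"),
    ("route withdrawn neighbor unreachable", "network"),
    ("bgp neighbor down hold timer expired", "network"),
]
TEST = [
    ("cable fault on port", "physical"),
    ("ospf neighbor down hold timer expired", "network"),
]
```

Running the same held-out split through each candidate classifier and comparing `accuracy` (and the vote shares) is one way to set up such a comparison; the paper's actual validation protocol is not given in the abstract.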

A web content mining application for detecting relevant pages using Jaccard similarity

Author: Jalal Ahmed Adeeb
Jasim Abdulrahman Ahmed
Mahawish Amar A.
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2022

The tremendous growth in the availability of text data from a variety of sources creates a slew of concerns and obstacles to discovering meaningful information. Advances in digital technology have dispersed texts over millions of websites. Unstructured texts are densely packed with textual information, and discovering valuable and interesting relationships in them demands more computer processing. Text mining has therefore become an attractive area of study for obtaining organized and useful data. One purpose of this research is to discuss text pre-processing in the automobile marketing domain in order to create a structured database. Regular expressions were used to extract data from unstructured vehicle advertisements, resulting in a well-organized database. We manually developed unique rule-based methods for extracting structured data from unstructured web pages. Using the information retrieved from these advertisements, a systematic search for certain noteworthy attributes is performed. There are numerous approaches to query recommendation, and it is vital to understand which one should be employed. Additionally, this research attempts to determine the optimal similarity value for query suggestions based on user-supplied parameters by comparing MySQL pattern matching with Jaccard similarity.

ZENODO

Institute of Advanced Engineering and Science
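The abstract reports regular-expression extraction and a comparison of MySQL pattern matching with Jaccard similarity, but not the actual rules or thresholds. As a hedged sketch, assuming a hypothetical ad format with year, price and mileage fields, the two steps might look like this. `like_match` only approximates a MySQL `LIKE '%query%'` substring test in Python, and the field patterns, titles and `threshold` value are illustrative assumptions.

```python
import re

# Hypothetical field patterns for a free-text vehicle advertisement.
YEAR = re.compile(r"\b((?:19|20)\d{2})\b")
PRICE = re.compile(r"\$\s?(\d[\d,]*)")
MILEAGE = re.compile(r"(\d[\d,]*)\s*(?:km|miles)\b", re.IGNORECASE)


def first_int(pattern, text):
    """Return the first captured number as an int (commas removed), or None."""
    match = pattern.search(text)
    return int(match.group(1).replace(",", "")) if match else None


def extract(ad):
    """Turn one unstructured advertisement into a structured record."""
    return {
        "year": first_int(YEAR, ad),
        "price": first_int(PRICE, ad),
        "mileage": first_int(MILEAGE, ad),
    }


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| over lower-case word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def like_match(title, query):
    """Rough stand-in for MySQL `title LIKE '%query%'` under a case-insensitive collation."""
    return query.lower() in title.lower()


def suggest(titles, query, threshold=0.3):
    """Titles whose Jaccard similarity to the query reaches the threshold, best first."""
    scored = sorted(((jaccard(query, t), t) for t in titles), key=lambda st: st[0], reverse=True)
    return [t for score, t in scored if score >= threshold]


TITLES = ["Toyota Corolla 2015 automatic", "Toyota Camry 2018", "Honda Civic 2015 manual"]
```

The toy titles illustrate the trade-off such a comparison is about: a substring pattern misses the query `corolla toyota` because the word order differs, whereas the word-set Jaccard score still ranks the Corolla listing first, and lowering `threshold` to 0.2 also suggests the Camry listing.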

A SimRank based Ensemble Method for Resolving Challenges of Partition Clustering Methods

Author: Patibandla R S M Lakshmi
Veeranjaneyulu N
Publication venue: CSIR-National Institute of Science Communication and Policy Research (NIScPR)
Publication date: 19/11/2022

Online Publishing @ NISCAIR

An experimental study for the Cross Domain Author Profiling classification

Author: Cagnina Leticia
Errecalde Marcelo Luis
Garciarena Ucelay María José
Villegas María Paula
Publication date: 28/12/2015

Author profiling is the task of predicting characteristics of the author of a text, such as age, gender, personality, or native language. The task is of growing importance because of its potential applications in security, crime detection, and marketing, among others. An interesting question is how robust a classifier is when it is trained on one dataset and tested on others with different characteristics; this is commonly called cross-domain experimentation. Although several cross-domain studies have been carried out on English-language datasets, such studies for Spanish have only recently begun. In this context, this work presents a study of cross-domain classification for the author profiling task in Spanish. The experimental results showed that by using corpora with different levels of formality, robust classifiers can be obtained for author profiling in Spanish.

XII Workshop Bases de Datos y Minería de Datos (WBDDM). Red de Universidades con Carreras en Informática (RedUNCI)

Servicio de Difusión de la Creación Intelectual
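The abstract describes the cross-domain protocol (train on one corpus, test on another) but not the features or the classifier. A minimal sketch of that protocol, assuming a bag-of-words nearest-centroid classifier, hypothetical corpus names and toy snippets labelled with an age range, could be the following; none of the helper names come from the paper.

```python
import math
import re
from collections import Counter


def bag(text):
    """Term counts over lower-case word tokens (Unicode-aware, so Spanish accents survive)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fit_centroids(docs):
    """Nearest-centroid model: summed term counts per author-trait label."""
    centroids = {}
    for text, label in docs:
        centroids.setdefault(label, Counter()).update(bag(text))
    return centroids


def predict(centroids, text):
    query = bag(text)
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


def cross_domain(corpora):
    """Accuracy for every ordered (train corpus, test corpus) pair with train != test."""
    results = {}
    for train_name, train_docs in corpora.items():
        model = fit_centroids(train_docs)
        for test_name, test_docs in corpora.items():
            if test_name == train_name:
                continue
            hits = sum(predict(model, text) == label for text, label in test_docs)
            results[(train_name, test_name)] = hits / len(test_docs)
    return results


# Hypothetical Spanish snippets from a formal and an informal corpus,
# labelled with an age range as the author trait to predict.
CORPORA = {
    "formal": [("estimado colega saludos cordiales", "35-49"), ("jaja genial", "18-24")],
    "informal": [("saludos cordiales atentamente", "35-49"), ("jaja jaja genial", "18-24")],
}
```

Comparing these cross-corpus scores with in-domain results (for example, cross-validation within each corpus) is what indicates how robust a classifier is to a change of domain.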

Computer Science & Technology Series : XXI Argentine Congress of Computer Science. Selected papers

Author: Feierherd Guillermo Eugenio
Pesado Patricia Mabel
Russo Claudia Cecilia
Publication venue: Editorial de la Universidad Nacional de La Plata (EDULP)
Publication date: 08/02/2017

CACIC’15 was the 21st Congress in the CACIC series. It was organized by the School of Technology at UNNOBA (North-West of Buenos Aires National University) in Junín, Buenos Aires. The Congress included 13 Workshops with 131 accepted papers, 4 Conferences, 2 invited tutorials, various meetings related to Computer Science Education (Professors, PhD students, Curricula), and an International School with 6 courses. CACIC 2015 followed the traditional Congress format, with 13 Workshops covering a diverse range of Computer Science research areas. Each topic was supervised by a committee of 3-5 chairs from different universities. The call for papers attracted a total of 202 submissions. An average of 2.5 review reports was collected for each paper, for a grand total of 495 review reports involving about 191 different reviewers. A total of 131 full papers, involving 404 authors and 75 universities, were accepted, and 24 of them were selected for this book.

Red de Universidades con Carreras en Informática (RedUNCI)

Computer Science & Technology Series : XXI Argentine Congress of Computer Science. Selected papers

Author: Feierherd Guillermo Eugenio
Pesado Patricia Mabel
Russo Claudia Cecilia
Publication venue: Editorial de la Universidad Nacional de La Plata (EDULP)
Publication date: 01/01/2016

Computer Science & Technology Series

Publication venue: Editorial de la Universidad Nacional de La Plata (EDULP)
Publication date: 01/01/2016

Centro de Servicios en Gestión de Información

Computer Science & Technology Series : XXI Argentine Congress of Computer Science. Selected papers

Author: Feierherd Guillermo Eugenio
Pesado Patricia Mabel
Russo Claudia Cecilia
Publication venue: Universidad Nacional de La Plata
Publication date: 01/01/2016

Servicio de Difusión de la Creación Intelectual