Evaluation of Differential Evolution Algorithm with Various Mutation Strategies for Clustering Problems
Evolutionary Algorithm (EA) based pattern recognition has emerged as an alternative solution to data analysis problems, enhancing the efficiency and accuracy of mining processes. Differential Evolution (DE) is a competitive and powerful instance of EAs, and it has been successfully used for cluster analysis in recent years. The mutation strategy, one of the main processes of DE, uses scaled differences of individuals chosen randomly from the population to generate a mutant (trial) vector. The success of the DE algorithm in solving optimization problems relies heavily on the adopted mutation strategy. In this paper, an empirical study is presented to investigate the effectiveness of six frequently used mutation strategies for solving clustering problems. The experimental tests were conducted on the data sets most widely used for EA-based clustering, and the quality of the cluster solutions and the convergence characteristics of the DE variants were evaluated. The results point out that mutation strategies that use guidance information from the best solution manage to find more stable results, whereas random mutation strategies are able to find high-quality solutions at a slower convergence rate. This study aims to provide information and insights for developing better DE mutation schemes for clustering.
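The contrast drawn above between best-guided and random strategies can be illustrated with two classical DE mutation rules; this is a minimal sketch, and the strategy names (DE/rand/1, DE/best/1) and scale factor F are standard DE conventions rather than details taken from the paper.

```python
import random

def mutate_rand_1(population, F=0.5):
    """DE/rand/1: v = x_r1 + F * (x_r2 - x_r3), all parents chosen at random."""
    r1, r2, r3 = random.sample(range(len(population)), 3)
    return [a + F * (b - c) for a, b, c in
            zip(population[r1], population[r2], population[r3])]

def mutate_best_1(population, fitness, F=0.5):
    """DE/best/1: v = x_best + F * (x_r1 - x_r2), guided by the best solution."""
    best = min(range(len(population)), key=lambda i: fitness[i])
    r1, r2 = random.sample([i for i in range(len(population)) if i != best], 2)
    return [a + F * (b - c) for a, b, c in
            zip(population[best], population[r1], population[r2])]
```

The best-guided variant pulls trial vectors toward the current best solution (faster, more stable convergence), while the random variant preserves more population diversity.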
Developing Multi-level Marketing System using Indexing Mechanism
The Multi-level Marketing (MLM) system has become very popular in the international marketplace because of its independent distributor development system and commission payment method. The nature of an MLM system is that the parent company has many independent distributors, and those distributors in turn have many down-line distributors. Indexes are used to speed up the retrieval of records that satisfy certain conditions. The number of down-lines in MLM can grow unpredictably; therefore, finding the points for each member and their down-lines can take considerable time and memory. An index is a special kind of stored file in which each entry consists of precisely two values: a data value and a pointer. The data value is a value of some field of the indexed file, and the pointer identifies a record of that file that has that value for that field. These features make indexing well suited to a Multi-level Marketing system.
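The (data value, pointer) index structure described above can be sketched as follows; the member records and the idea of indexing down-lines by sponsor id are invented here purely for illustration.

```python
# Toy member table: position in the list acts as the record "pointer".
members = [
    {"id": 1, "sponsor": None, "points": 120},
    {"id": 2, "sponsor": 1,    "points": 80},
    {"id": 3, "sponsor": 1,    "points": 60},
    {"id": 4, "sponsor": 2,    "points": 40},
]

# Build the index once: each entry pairs a data value (a sponsor id)
# with pointers to the records having that value in the sponsor field.
sponsor_index = {}
for pos, rec in enumerate(members):
    sponsor_index.setdefault(rec["sponsor"], []).append(pos)

def downline_points(member_id):
    """Total points of a member's direct down-lines, found via the index
    instead of scanning the whole member table."""
    return sum(members[p]["points"] for p in sponsor_index.get(member_id, []))
```

With the index, retrieving a member's down-lines is a single dictionary lookup rather than a full scan over an unpredictably large member table.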
Web-Based Rice Disease Diagnosis System Using Case-Based Reasoning
Computer-based methods are increasingly used to improve the quality of decision making in many areas. Decision support systems use knowledge, facts and reasoning techniques to solve problems that normally require the expertise, experience and abilities of human experts. This paper presents the implementation of a web-based rice disease diagnostic system using the case-based reasoning technique. Although there are many approaches to classification in data mining, Case-Based Reasoning (CBR) is used here for rice disease diagnosis. Previous rice diagnoses are taken as the cases of the case base, and the cosine similarity algorithm is used for measuring similar cases.
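The cosine-similarity retrieval step can be sketched as below; representing symptoms as word-count vectors and the two example diseases are assumptions made for illustration, not details from the system itself.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (dicts)."""
    terms = set(a) | set(b)
    dot = sum(a.get(t, 0) * b.get(t, 0) for t in terms)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy case base: each stored case maps symptom terms to their frequency.
case_base = {
    "rice blast": {"lesion": 2, "spindle": 1, "leaf": 1},
    "brown spot": {"spot": 2, "brown": 1, "leaf": 1},
}

def diagnose(query):
    """Retrieve the most similar stored case for a new symptom description."""
    return max(case_base, key=lambda d: cosine(query, case_base[d]))
```

CBR then reuses (and possibly revises) the solution attached to the retrieved case.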
Home Decoration Service Provider Selection Using Analytic Network Process (ANP)
Nowadays, everybody faces many problems in making decisions and planning the future of global and regional work. The analytic network process (ANP) is one of the most widely used multiple criteria decision making (MCDM) methods. The ANP is highly recommended because it allows interdependent influences to be specified in the model. In many real-world cases, there is interdependence and feedback among the elements and alternatives. The ANP is a useful tool for prediction and for representing a variety of competitors, their surmised interactions and their relative strengths in wielding influence over a decision. Feedback and dependence improve the priorities derived from judgments and make prediction more accurate. In this work, the ANP technique is applied to service provider selection in home decoration: without a proper and accurate method for selecting the most appropriate contractor, the performance of the project will be affected. The ANP can lead to real-life answers that are matched by actual measurements in the real world.
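The feedback and interdependence mentioned above are captured in ANP by a column-stochastic supermatrix whose limit powers yield the final priorities; the following is a rough sketch of only that limiting step, with an invented 2x2 matrix, not the actual contractor-selection model.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def limit_priorities(W, steps=50):
    """Raise a column-stochastic supermatrix W to successive powers;
    every column of the limit matrix converges to the priority vector."""
    M = W
    for _ in range(steps):
        M = mat_mul(M, W)
    return [row[0] for row in M]  # any column of the (near-)limit matrix
```

In a full ANP model the supermatrix entries come from pairwise-comparison judgments of the interdependent elements.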
Error Detection Of HTML Code Using Pushdown Automata (PDA)
A Pushdown Automaton (PDA) is one of the models in the hierarchy of automata theory appropriate for compiler design. This system is implemented to detect compile-time errors in HTML programs written by beginners in Notepad and run in a web browser such as Microsoft Internet Explorer, Netscape or Mozilla Firefox. At present, beginners do not see their errors after writing an HTML program. This thesis detects such errors using a Pushdown Automaton (PDA), which uses a stack onto which information can be pushed and popped. Such a system provides beginners with a guideline through which they can see the errors in their HTML programs, and it is intended as a teaching aid for beginners.
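The stack mechanism described above can be sketched for the core HTML error, mismatched tags: push each opening tag, pop on the matching closing tag, and report an error on a mismatch or leftover tags. The regex-based tokenizing and the list of void elements are simplifying assumptions, not the thesis's actual transition rules.

```python
import re

# Void elements have no closing tag and are skipped (an assumed subset).
VOID = {"br", "hr", "img", "input", "meta", "link"}

def check_tags(html):
    """PDA-style check: the stack holds currently open tag names."""
    stack = []
    for close, name in re.findall(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>", html):
        name = name.lower()
        if name in VOID:
            continue
        if not close:
            stack.append(name)          # push on an opening tag
        elif not stack or stack.pop() != name:
            return f"unexpected </{name}>"  # pop fails: mismatched closing tag
    return f"unclosed <{stack[-1]}>" if stack else "ok"
```

An empty stack at end of input corresponds to the PDA accepting the program.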
Adaptive Duplicate Detection in XML Document Based on Hash Function
The task of detecting duplicate records that represent the same real-world object in multiple data sources is commonly known as duplicate detection, and it is relevant in data cleaning and data integration applications. Numerous approaches exist for duplicate detection in both relational and XML data. As XML becomes increasingly popular for data representation, algorithms to detect duplicates in XML documents are required. Previous domain-independent solutions to this problem relied on standard textual similarity functions (e.g., edit distance, the cosine metric) between objects. However, such approaches result in large numbers of false positives when domain-specific abbreviations and conventions must be identified. In this paper, we present a generalized framework for duplicate detection specialized to XML. The aim of this research is to develop an efficient algorithm for detecting duplicates in complex XML documents and to reduce the number of false positives by using a hash function algorithm.
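One common way a hash function serves duplicate detection is as a blocking key: hash a normalized representation of each object so that candidate duplicates land in the same bucket. The sketch below assumes this use and invents the normalization (lowercasing, trimming, sorted field order); the paper's actual hashing scheme may differ.

```python
import hashlib

def object_key(fields):
    """Hash a canonical form of an object's field values."""
    canon = "|".join(sorted(v.strip().lower() for v in fields))
    return hashlib.md5(canon.encode()).hexdigest()

def find_duplicates(objects):
    """Group object indices by hash key; same-key groups are duplicates
    (or at least candidates to be compared more carefully)."""
    buckets = {}
    for i, fields in enumerate(objects):
        buckets.setdefault(object_key(fields), []).append(i)
    return [ids for ids in buckets.values() if len(ids) > 1]
```

Hashing keeps the comparison cost near-linear, instead of comparing every pair of XML objects with an expensive textual similarity function.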
Text Normalization and Classification System for Internet Forum
An Internet forum is one of the most common modes of knowledge sharing through text. It is an online discussion site; from a technical point of view, forums are web applications managing user-generated text content. Text normalization converts "informally inputted" text into canonical form by eliminating "noise" in the text, detecting paragraph and sentence boundaries, performing case restoration and suggesting valid words for each invalid word using a dictionary. Text classification is the process of grouping text items into related predefined classes or categories to make it easier for the user to find them. The system intends to normalize and classify Internet forum text. For text normalization, a cascaded approach is used, and for classification the Naïve Bayes (NB) method is used. The hold-out method is used to evaluate the system's performance.
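The Naïve Bayes step can be sketched as a standard multinomial NB with Laplace smoothing; the toy forum posts and the two categories are invented for illustration and are not the system's actual training data.

```python
import math
from collections import Counter, defaultdict

# Tiny labeled corpus: (post text, category).
train = [
    ("how to install python package", "tech"),
    ("laptop battery not charging", "tech"),
    ("best recipe for fried rice", "food"),
    ("how long to cook noodles", "food"),
]

word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def classify(text):
    """Pick the class maximizing log P(class) + sum log P(word | class),
    with add-one (Laplace) smoothing for unseen words."""
    def log_score(label):
        total = sum(word_counts[label].values())
        s = math.log(class_counts[label] / sum(class_counts.values()))
        for w in text.split():
            s += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        return s
    return max(class_counts, key=log_score)
```

In the hold-out evaluation mentioned above, a split of the labeled posts would be kept aside and classified with exactly this scoring rule.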
Clustering Technique based on Concept Weight to Text Documents
Document clustering has become an essential technology with the popularity of the Internet, which means that fast and high-quality document clustering techniques are a core topic. Text clustering, or simply clustering, is about discovering semantically related groups in an unstructured collection of documents. Clustering has been very popular for a long time because it provides unique ways of digesting and generalizing large amounts of information. One of the issues in clustering is extracting proper features (terms) of a problem domain. Existing clustering technology mainly focuses on term weight calculation. To achieve more accurate document clustering, more informative features, including concept weight, are important. Feature selection is important for the clustering process because irrelevant or redundant features may misguide the clustering results. To counteract this issue, the proposed system uses concept weights for clustering in accordance with the principles of ontology. To a certain extent, this resolves the semantic problem in specific areas.
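One plausible reading of "concept weight" is that term weights are aggregated up to the ontology concepts the terms map to, so that synonymous terms reinforce one concept feature instead of splitting across several term features. The tiny term-to-concept mapping below is invented to illustrate that aggregation, under that assumption.

```python
# Hypothetical ontology fragment: term -> concept.
ontology = {
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
    "apple": "fruit", "mango": "fruit",
}

def concept_weights(term_freqs):
    """Fold term frequencies into concept weights; terms with no concept
    mapping keep their own name as the feature."""
    weights = {}
    for term, tf in term_freqs.items():
        concept = ontology.get(term, term)
        weights[concept] = weights.get(concept, 0) + tf
    return weights
```

A clustering algorithm would then run on these concept-weight vectors, so two documents about "cars" and "trucks" look similar even with no shared terms.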
Multi-category Classification of Web Pages by using Random Forest Classifier
Classifying Web objects into a predefined semantic structure is called Web page classification. Automatic web page classification is one of the most essential techniques for Web mining, given that the web is a huge repository of various information, including images, videos, etc., and web pages need to be categorized to satisfy user needs. Classifying web pages into categories by hand relies exclusively on manpower, which costs much time and effort. To alleviate this manual classification problem, researchers increasingly focus on web page classification technology. In this paper, we propose a Random Forest (RF) classifier for multi-category web page classification. The proposed RF classifier can classify web pages efficiently into their corresponding classes without using other feature selection methods. We compared the accuracy of the proposed approach to a decision tree classifier on the same Yahoo web pages. The experiments show that the proposed approach is suitable for multi-category web page classification.
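The random-forest idea can be sketched in miniature: each tree trains on a bootstrap sample with a randomly chosen feature, and the forest predicts by majority vote. To stay short, each "tree" here is a one-level decision stump, and the features and category labels are invented; a real implementation grows full trees over page features.

```python
import random
from collections import Counter

def train_stump(X, y):
    """One forest member: bootstrap the data, pick a random feature,
    split at its mean, and store the majority label on each side."""
    f = random.randrange(len(X[0]))                  # random feature index
    idx = [random.randrange(len(X)) for _ in X]      # bootstrap sample
    thresh = sum(X[i][f] for i in idx) / len(idx)
    left = [y[i] for i in idx if X[i][f] <= thresh]
    right = [y[i] for i in idx if X[i][f] > thresh]
    vote = lambda ys: Counter(ys).most_common(1)[0][0] if ys else y[0]
    return f, thresh, vote(left), vote(right)

def forest_predict(stumps, x):
    """Majority vote over all trained stumps."""
    votes = [lo if x[f] <= t else hi for f, t, lo, hi in stumps]
    return Counter(votes).most_common(1)[0][0]
```

Bootstrap sampling plus random feature choice is what decorrelates the trees and makes the ensemble more robust than a single decision tree.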
Automatic Extraction of Data Record from Web Page based on Visual Features
The Web is increasingly becoming a very large information source. However, the information is visually structured such that it is easy for humans, but not for computers, to recognize data records and presentation patterns. As web sites get more complicated, the construction of web information extraction systems becomes more troublesome and time-consuming. Hence, tools for mining data regions, data records and data items need to be developed in order to provide value-added services. A large number of techniques have been proposed to address this problem, but all of them have inherent limitations. In this paper, we propose an approach for automatic data record extraction from web pages, which we call Vision-based Extraction of data Records (VER). The approach is based on the observation that data records in a web document are visually similar. First, we adopt the VIPS (Vision-based Page Segmentation) algorithm to partition a web page into semantic blocks. Then, blocks are clustered by the proposed block clustering method according to appearance similarity. Among these clusters, we identify the data region and finally extract the data records from it.
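The block-clustering step can be sketched as grouping blocks whose rendered geometry is nearly identical; the choice of features (left position, width, height) and the pixel tolerance are assumptions standing in for the paper's actual appearance-similarity measure.

```python
def similar(a, b, tol=5):
    """Two blocks look alike if their geometry differs by at most tol pixels."""
    return all(abs(a[k] - b[k]) <= tol for k in ("left", "width", "height"))

def cluster_blocks(blocks):
    """Greedy clustering: put each block into the first cluster whose
    representative (first member) it resembles, else start a new cluster."""
    clusters = []
    for b in blocks:
        for c in clusters:
            if similar(c[0], b):
                c.append(b)
                break
        else:
            clusters.append([b])
    return clusters
```

A large cluster of visually similar sibling blocks is then a strong candidate for the data region holding the repeated data records.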