
Complex graph stream mining

Author: Pan S
Publication date: 01/01/2015

University of Technology Sydney. Faculty of Engineering and Information Technology.

Recent years have witnessed a dramatic increase in information, driven by the continuing development of modern technologies. The large scale of information makes data analysis, particularly data mining and knowledge discovery tasks, unprecedentedly challenging. First, data is becoming more and more interconnected. In a variety of domains such as social networks, chemical compounds, and XML documents, data is no longer represented by a flat table in instance-feature format, but exhibits complex structures indicating dependency relationships. Second, data is evolving more and more dynamically. Emerging applications such as social networks continuously generate information over time. Third, the learning tasks in many real-life applications are becoming more and more complicated, in that there are various constraints on the number of labelled data, class distributions, misclassification costs, the number of learning tasks, etc. Considering the above challenges, this research aims to investigate theoretical foundations and to study new algorithm designs and system frameworks that enable the mining of complex graph streams from three aspects: (1) Correlated Graph Stream Mining, (2) Graph Stream Classification, and (3) Complex Task Graph Classification. In particular, correlated graph stream mining intends to carry out structured pattern search and support the query of similar graphs from a graph stream. Due to the dynamically changing nature of the streaming data and the inherent complexity of the graph query process, treating graph streams as static datasets is computationally infeasible or ineffective. Therefore, we proposed a novel algorithm, CGStream, to identify correlated graphs from a data stream by using a sliding window that covers a number of consecutive batches of stream data records. Experimental results demonstrate that the proposed algorithm is several times, or even an order of magnitude, more efficient than the straightforward algorithms. Graph stream classification aims to build effective and efficient classification models for graph streams with continuously growing volumes and dynamic changes. We proposed two methods for complex graph stream classification. Due to the inherent complexity of graph structure, labelling graph data is very expensive. To solve this problem, we proposed the gLSU algorithm, which aims to select discriminative subgraph features with minimum redundancy by using both labelled and unlabelled graphs for graph streams. The second approach handles graph streams with imbalanced class distributions and noise. Both frameworks use an instance weighting scheme to capture the underlying concept drifts of graph streams and achieve significant performance gains on benchmark graph streams. Complex task graph classification aims to address graph classification problems with complex constraints. We studied two complex task graph classification problems: cost-sensitive classification of large-scale graphs and multi-task graph classification. Because misclassification costs/risks differ inherently across classes, as in medical diagnosis, and large-scale graph classification is in high demand in real-life applications, we proposed the CogBoost algorithm for cost-sensitive classification of large-scale graphs. To overcome the limitation of insufficient labelled graphs for a specific learning task, we further proposed effective algorithms that leverage multiple graph learning tasks to select subgraph features and regularize multiple tasks to achieve better generalization performance for all learning tasks.
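
The abstract above mentions a sliding window covering a number of consecutive batches of stream records. The following is a minimal sketch of that windowing idea only; it is not the CGStream algorithm, and the class name `SlidingWindow`, its methods, and the string graph identifiers are hypothetical names chosen for illustration.

```python
from collections import deque


class SlidingWindow:
    """Keep only the most recent `size` batches of a stream (hypothetical helper)."""

    def __init__(self, size):
        # A bounded deque drops its oldest entry when a new one arrives.
        self.batches = deque(maxlen=size)

    def push(self, batch):
        # Store a copy so later changes to the caller's list do not leak in.
        self.batches.append(list(batch))

    def items(self):
        # Flatten the batches currently covered by the window, oldest first.
        return [item for batch in self.batches for item in batch]


# Placeholder identifiers stand in for graphs arriving in three batches.
window = SlidingWindow(size=2)
for batch in (["g1", "g2"], ["g3"], ["g4", "g5"]):
    window.push(batch)

current = window.items()  # only the last two batches remain
```

Any query over the stream would then be evaluated against `window.items()` rather than against the full history, which is what keeps the working set bounded.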

OPUS - University of Technology Sydney

Rain Fall Prediction using Ada Boost Machine Learning Ensemble Algorithm

Authors: Dr. P. Senthil Kumar; M. Naga Swathi
Publication venue: Jamal Mohamed College Publication
Publication date: 24/07/2023

Every government takes initiatives for the well-being of its citizens with regard to the environment and climate in which they live. Global warming is one of the reasons for climate change. With machine learning algorithms, in the light of Artificial Intelligence and Data Mining techniques, weather events including not only rainfall but also lightning, thunder outbreaks, etc. can be predicted. Management of water reservoirs, flooding, traffic control in smart cities, sewer system functioning and agricultural production are the hydro-meteorological factors that affect human life very drastically. Due to the dynamic nature of the atmosphere, existing statistical techniques (Support Vector Machine (SVM), Decision Tree (DT) and Logistic Regression (LR)) fail to provide good accuracy for rainfall forecasting. Different weather features (Temperature, Relative Humidity, Dew Point, Solar Radiation and Precipitable Water Vapour) are extracted for rainfall prediction. In this research work, data analysis using a machine learning ensemble algorithm, Adaptive Boosting (Ada Boost), is proposed. The dataset used for this classification application, covering 1901-2015, is taken from the hydrological department, India. Overall, the proposed algorithm is feasible for qualitatively predicting rainfall with the help of the R tool and the Ada Boost algorithm. Accuracy rate and false error rates are compared with those of the existing Support Vector Machine (SVM) algorithm, and the proposed one gives the better result.
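
To make the boosting idea named in the abstract concrete, here is a minimal sketch of textbook discrete AdaBoost with one-dimensional decision stumps. It is not the authors' R implementation; the toy data, the function names (`train_stump`, `adaboost`, `predict`) and the choice of stumps as weak learners are assumptions made for illustration.

```python
import math


def stump_predict(x, threshold, polarity):
    # A stump votes `polarity` at or above the threshold, the opposite below.
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Fit `rounds` stumps; labels in `ys` must be +1 or -1."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the rest.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data: no single threshold separates the classes, three stumps do.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
predictions = [predict(model, x) for x in xs]
```

On this toy set the three weighted stumps together classify every point correctly, although each individual stump misclassifies at least two points.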

Journal of Advanced Applied Scientific Research (JOAASR)

Analisis Perbandingan Algoritma Svm Dan Knn Untuk Klasifikasi Anime Bergenre Drama [Comparative Analysis of the SVM and KNN Algorithms for Classifying Drama-Genre Anime]

Authors: Mulyana Dadang Iskandar; Pramansah Vika Vitaloka; Silfia Titi
Publication venue: Universitas Janabadra
Publication date: 31/05/2022

There are many genres of anime, such as drama, action, romance, comedy, and so on. However, because there are so many genres, it is quite difficult for viewers to find anime in a genre they like, such as the drama genre, which tells fairly light stories about everyday human life. Given this problem, a classification method is needed to identify anime that belong to the drama genre. Classification algorithms include Support Vector Machine (SVM) and K-Nearest Neighbors (KNN); both have been widely used and achieve a good level of accuracy. In this study, a comparative analysis is carried out between the two algorithms. The dataset contains 12,294 records, 2 genre classes (drama and non-drama), and 7 attributes. The results indicate that the K-Nearest Neighbors (KNN) algorithm achieves a training accuracy of 100% and a test accuracy of 84%, while the Support Vector Machine (SVM) algorithm achieves a training accuracy of 83% and a test accuracy of 82%. These accuracy values indicate that KNN has a better test accuracy than SVM, with a fairly thin margin between the two algorithms.

E-Journal Universitas Janabadra

Cost-Sensitive Classification Methods for the Detection of Smuggled Nuclear Material in Cargo Containers

Author: Webster Jennifer B
Publication date: 16/12/2013

Classification problems arise in many different parts of life, from sorting machine parts to diagnosing a disease. Humans make these classifications by utilizing vast amounts of data, filtering observations for useful information, and then making a decision based on a subjective level of cost/risk of classifying objects incorrectly. This study investigates the translation of the human decision process into a mathematical problem in the context of a border security problem: How does one find special nuclear material being smuggled inside large cargo crates while balancing the cost of invasively searching suspect containers against the risk of allowing radioactive material to escape detection? This may be phrased as a classification problem in which one classifies cargo containers into two categories: those containing a smuggled source and those containing only innocuous cargo. This task presents numerous challenges, e.g., the stochastic nature of radiation and the low signal-to-noise ratio caused by background radiation and cargo shielding. In the course of this work, we will break the analysis of this problem into three major sections: the development of an optimal decision rule, the choice of the most useful measurements or features, and the sensitivity of the developed algorithms to physical variations. This will include an examination of how accounting for the cost/risk of a decision affects the formulation of our classification problem. Ultimately, a support vector machine (SVM) framework with F-score feature selection will be developed to provide nearly optimal classification given a constraint on the reliability of detection provided by our algorithm. In particular, this can decrease the fraction of false positives by an order of magnitude over current methods. The proposed method also takes into account the relationship between measurements, whereas current methods deal with detectors independently of one another.
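
The trade-off described above, between the cost of searching an innocuous container and the risk of missing a source, can be illustrated with the textbook minimum-expected-cost decision rule. This is a sketch under assumptions, not the formulation used in the thesis (which speaks of a constraint on detection reliability); the function names and the cost values are hypothetical.

```python
def cost_threshold(cost_false_positive, cost_false_negative):
    """Probability of a source above which searching is the cheaper choice."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def decide(p_source, cost_false_positive, cost_false_negative):
    """Pick the action with the lower expected cost.

    p_source            -- estimated probability that the container holds a source
    cost_false_positive -- cost of searching an innocuous container
    cost_false_negative -- cost of clearing a container that holds a source
    """
    expected_cost_search = (1.0 - p_source) * cost_false_positive
    expected_cost_clear = p_source * cost_false_negative
    return "search" if expected_cost_search < expected_cost_clear else "clear"


# Hypothetical costs: a missed source is 99 times worse than a needless search.
threshold = cost_threshold(1.0, 99.0)
high_risk = decide(0.05, 1.0, 99.0)
low_risk = decide(0.005, 1.0, 99.0)
```

With these assumed costs the threshold is 0.01, so a container with a 5% estimated probability is searched while one at 0.5% is cleared. Raising the cost of a missed source lowers the threshold and increases the number of searches.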

Texas A&M Repository

Origins of Modern Data Analysis Linked to the Beginnings and Early Development of Computer Science and Information Engineering

Author: Murtagh Fionn
Publication date: 01/01/2008

The history of data analysis that is addressed here is underpinned by two themes: tabular data analysis, and the analysis of collected heterogeneous data. "Exploratory data analysis" is taken as the heuristic approach that begins with data and information and seeks an underlying explanation for what is observed or measured. I also cover some of the evolving context of research and applications, including scholarly publishing, technology transfer and the economic relationship of the university to society. Comment: 26 pages

arXiv.org e-Print Archive

Royal Holloway Research Online

Royal Holloway - Pure

De Montfort University Open Research Archive

Training a Feed-forward Neural Network with Artificial Bee Colony Based Backpropagation Method

Authors: Das Achintya; Nandy Sudarshan; Sarkar Partha Pratim
Publication venue: Academy and Industry Research Collaboration Center (AIRCC)
Publication date: 12/09/2012

The back-propagation algorithm is one of the most widely used techniques for optimizing feed-forward neural network training. Nature-inspired meta-heuristic algorithms also provide derivative-free solutions for optimizing complex problems. The artificial bee colony algorithm is a nature-inspired meta-heuristic algorithm that mimics the foraging, or food source searching, behaviour of bees in a bee colony, and it has been implemented in several applications for an improved optimized outcome. The method proposed in this paper is a back-propagation neural network training method based on an improved artificial bee colony algorithm, aimed at a fast and improved convergence rate for the hybrid neural network learning method. The result is compared with the genetic algorithm based back-propagation method, which is another hybridized procedure of its kind. Analysis is performed over standard data sets, showing the efficiency of the proposed method in terms of convergence speed and rate. Comment: 14 pages, 11 figures

arXiv.org e-Print Archive

Crossref

Predicting customer's gender and age depending on mobile phone data

Authors: Aljoumaa Kadan; AlZuabi Ibrahim Mousa; Jafar Assef
Publication venue: Springer Science and Business Media LLC
Publication date: 01/02/2019

In the age of data-driven solutions, customer demographic attributes, such as gender and age, play a core role that may enable companies to enhance their service offers and target the right customer at the right time and place. In a marketing campaign, companies want to target the real user of the GSM (global system for mobile communications) line, not the line owner, since the two are sometimes not the same person. This work proposes a method that predicts users' gender and age based on their behavior, services and contract information. We used call detail records (CDRs), customer relationship management (CRM) and billing information as data sources to analyze telecom customer behavior, and applied different types of machine learning algorithms to provide marketing campaigns with more accurate information about customer demographic attributes. The model is built using a reliable data set of 18,000 users provided by SyriaTel Telecom Company, for training and testing. The model was applied using big data technology and achieved 85.6% accuracy for user gender prediction and 65.5% for user age prediction. The main contributions of this work are the improvement in the accuracy of user gender and age prediction based on mobile phone data, and an end-to-end solution that approaches customer data from multiple aspects in the telecom domain.

arXiv.org e-Print Archive

Directory of Open Access Journals

Comparative Evaluation of Packet Classification Algorithms for Implementation on Resource Constrained Systems

Authors: Alessio E.; Baldi Mario; Degioanni L.; Risso Fulvio Giovanni Ottavio; Stirano F.; Varenni G.
Publication venue: IEEE
Publication date: 01/01/2005

This paper provides a comparative evaluation of a number of known classification algorithms that have been considered for both software and hardware implementation. Unlike other sources, the comparison has been carried out on implementations based on the same principles and design choices. Performance measurements are obtained by feeding the implemented classifiers with various traffic traces in the same test scenario. The comparison also takes into account the implementation feasibility of the considered algorithms in resource-constrained systems (e.g. embedded processors on special-purpose network platforms). In particular, the comparison focuses on achieving a good compromise between performance, memory usage, flexibility and code portability to different target platforms.
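
The abstract does not name the algorithms it compares. As a point of reference for what packet classification means, the sketch below shows linear search over a priority-ordered rule list, the simplest baseline. The `Rule` fields, the rule set and the `classify` function are assumptions made for illustration and are not taken from the paper.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src: str          # source prefix in CIDR notation
    dst_ports: range  # accepted destination ports
    protocol: str     # "tcp", "udp", or "*" for any protocol
    action: str       # what to do with a matching packet


def classify(rules, src_ip, dst_port, protocol, default="deny"):
    """Return the action of the first matching rule (list order is priority)."""
    address = ipaddress.ip_address(src_ip)
    for rule in rules:
        if (address in ipaddress.ip_network(rule.src)
                and dst_port in rule.dst_ports
                and rule.protocol in ("*", protocol)):
            return rule.action
    return default  # no rule matched


# Hypothetical rule set: block SSH from 10.0.0.0/8, allow everything else from it.
RULES = [
    Rule("10.0.0.0/8", range(22, 23), "tcp", "deny"),
    Rule("10.0.0.0/8", range(0, 65536), "*", "allow"),
]

ssh_result = classify(RULES, "10.1.2.3", 22, "tcp")
web_result = classify(RULES, "10.1.2.3", 80, "udp")
outside_result = classify(RULES, "192.168.0.1", 80, "tcp")
```

Linear search needs memory proportional only to the number of rules, but lookup time grows with the rule count. That is the kind of performance versus memory trade-off the abstract refers to for resource-constrained systems.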

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino