Search CORE

77 research outputs found

Negative Effects of Incentivised Viral Campaigns for Activity in Social Networks

Author: Jankowski Jarosław; Kazienko Przemysław; Michalski Radosław
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 10/03/2013

Viral campaigns are crucial methods for word-of-mouth marketing in social communities. The goal of these campaigns is to encourage people to be active. The paper studies the problem of incentivised and non-incentivised campaigns. Both approaches were compared using data collected within a real social networking site. The experimental results revealed that a highly motivated campaign does not necessarily provide better results, because of an overlapping effect. Additional studies have shown that the behaviour of individual community members in the campaign can be predicted from their service profile, but the classification accuracy may be limited. Comment: In proceedings of the 2nd International Conference on Social Computing and its Applications, SCA 201

arXiv.org e-Print Archive

Crossref

Learning Heterogeneous Similarity Measures for Hybrid-Recommendations in Meta-Mining

Author: Hilario Melanie; Kalousis Alexandros; Nguyen Phong; Wang Jun
Publication date: 04/10/2012

The notion of meta-mining has appeared recently and extends traditional meta-learning in two ways. First, it does not learn meta-models that support only the learning-algorithm selection task, but ones that support the whole data-mining process. Second, it abandons the so-called black-box approach to algorithm description followed in meta-learning: in addition to datasets, algorithms and workflows now also have descriptors. For the latter two, these descriptions are semantic, describing properties of the algorithms. With descriptors available for both datasets and data-mining workflows, the traditional modelling techniques followed in meta-learning, typically based on classification and regression algorithms, are no longer appropriate. Instead, we are faced with a problem whose nature is much more similar to the problems that appear in recommendation systems. The most important meta-mining requirements are that suggestions should use only dataset and workflow descriptors, and that the cold-start problem, e.g. providing workflow suggestions for new datasets, should be handled. In this paper we take a different view of the meta-mining modelling problem and treat it as a recommender problem. To account for the meta-mining specificities, we derive a novel metric-based-learning recommender approach. Our method learns two homogeneous metrics, one in the dataset space and one in the workflow space, and a heterogeneous one in the dataset-workflow space. All learned metrics reflect similarities established from the dataset-workflow preference matrix. We demonstrate our method on meta-mining over biological problems (microarray datasets). The application of our method is not limited to the meta-mining problem; its formulation is general enough that it can be applied to problems with similar requirements.

arXiv.org e-Print Archive

Crossref

RERO DOC Digital Library

Efficient treatment of outliers and class imbalance for diabetes prediction

Author: Korkontzelos Yannis; Nnamoko Nonso
Publication venue: Elsevier BV
Publication date: 30/04/2020

Edge Hill University Research Information Repository

A Comparitive Study On Different Classification Algorithms Using Airline Dataset

Author: Prasad A. Jagdale, Deepa Abin, Dr. K. Rajeswari
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/01/2018

The paper presents a comparison of five different classification algorithms applied to an airline dataset to find the best accuracy. The dataset consists of 250 different types of records of airline information, comprising flight timing, cities, days of the week, and delay timing. The classification algorithms are run on the dataset, and the best result is given by Random Forest.

International Journal on Recent and Innovation Trends in Computing and Communication

Separation of pulsar signals from noise with supervised machine learning algorithms

Author: Bethapudi Suryarao; Desai Shantanu
Publication venue: Elsevier BV
Publication date: 01/01/2018

We evaluate the performance of four different machine learning (ML) algorithms, namely an Artificial Neural Network Multi-Layer Perceptron (ANN MLP), Adaboost, Gradient Boosting Classifier (GBC), and XGBoost, for the separation of pulsars from radio frequency interference (RFI) and other sources of noise, using a dataset obtained from the post-processing of a pulsar search pipeline. This dataset was previously used for cross-validation of the SPINN-based machine learning engine, used for the reprocessing of HTRU-S survey data (arXiv:1406.3627). We have used the Synthetic Minority Over-sampling Technique (SMOTE) to deal with the high class imbalance in the dataset. We report a variety of quality scores from all four of these algorithms on both the non-SMOTE and SMOTE datasets. For all the above ML methods, we report high accuracy and G-mean in both the non-SMOTE and SMOTE cases. We study the feature importances using Adaboost, GBC, and XGBoost, and also from the minimum Redundancy Maximum Relevance approach, to report an algorithm-agnostic feature ranking. From these methods, we find the signal-to-noise of the folded profile to be the best feature. We find that all the ML algorithms report FPRs about an order of magnitude lower than the corresponding FPRs obtained in arXiv:1406.3627, for the same recall value. Comment: 14 pages, 2 figures. Accepted for publication in Astronomy and Computin

arXiv.org e-Print Archive

Research Archive of Indian Institute of Technology Hyderabad
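
To make two terms from the pulsar abstract above concrete, here is a minimal Python sketch; it is not the authors' pipeline. It assumes the usual definition of G-mean as the square root of sensitivity times specificity, and it shows only the interpolation step at the heart of SMOTE. The function names `g_mean` and `smote_sample`, the binary 0/1 labels, and the toy points are illustrative assumptions.

```python
import math
import random


def g_mean(y_true, y_pred):
    """Geometric mean of the recall on the positive (1) and negative (0) class.

    Assumes binary 0/1 labels and that both classes occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # recall on the minority (pulsar-like) class
    specificity = tn / (tn + fp)  # recall on the majority (noise-like) class
    return math.sqrt(sensitivity * specificity)


def smote_sample(minority, k, rng):
    """Create one synthetic minority point (SMOTE-style interpolation).

    Picks a random minority point, picks one of its k nearest minority
    neighbours, and returns a point on the segment between the two.
    Assumes at least two minority points.
    """
    i = rng.randrange(len(minority))
    base = minority[i]
    others = [p for j, p in enumerate(minority) if j != i]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    other = rng.choice(neighbours)
    u = rng.random()  # interpolation factor in [0, 1)
    return tuple(b + u * (o - b) for b, o in zip(base, other))


if __name__ == "__main__":
    # Hypothetical labels: accuracy looks fine, G-mean exposes the weak class.
    y_true = [1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 1]
    print("G-mean:", g_mean(y_true, y_pred))

    rng = random.Random(0)
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print("synthetic point:", smote_sample(minority, k=2, rng=rng))
```

Unlike plain accuracy, G-mean drops towards zero when either class is mostly misclassified, which is why it is a common score for imbalanced problems such as this one.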

A Review on Advanced Decision Trees for Efficient & Effective k-NN Classification

Author: Ms. Madhavi Pujari, Mr. Chetan Awati, Ms. Sonam Kharade
Publication venue: Auricle Global Society of Education and Research
Publication date: 26/02/2018

The K Nearest Neighbor (KNN) method is a popular classification method in data mining and statistics because of its simple implementation and strong classification performance. However, it is impractical for traditional KNN methods to assign a fixed k value to all test samples. Previous solutions assign different k values to different test samples by cross-validation, but are usually time-consuming. This work proposes new KNN methods. The first is a KTree method that learns different k values for different test or new samples by adding a training stage to KNN classification. This work also proposes an improved version of the KTree method, called K*Tree, which speeds up the test stage by storing additional information about the training samples in the leaf nodes of the KTree, such as the training samples located in each leaf node, their KNNs, and the nearest neighbors of these KNNs. K*Tree makes it possible to conduct KNN classification using the subset of training samples in a leaf node instead of all the training samples used by recent KNN methods. This reduces the cost of the test stage.

International Journal on Future Revolution in Computer Science & Communication Engineering
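
The KNN abstract above contrasts a fixed k with k values chosen by cross-validation. The sketch below illustrates only that baseline: a plain majority-vote KNN classifier and a leave-one-out search over candidate k values. It does not implement KTree or K*Tree, whose details are not given in the abstract. The names `knn_predict` and `choose_k_loo`, the Euclidean distance, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def choose_k_loo(train_X, train_y, candidates):
    """Return the candidate k with the best leave-one-out accuracy.

    Every candidate k requires one KNN query per training point, which is
    the kind of cost the abstract describes as time-consuming.
    Ties are resolved in favour of the earlier candidate.
    """
    best_k, best_acc = None, -1.0
    for k in candidates:
        hits = 0
        for i in range(len(train_X)):
            rest_X = train_X[:i] + train_X[i + 1:]  # hold out point i
            rest_y = train_y[:i] + train_y[i + 1:]
            if knn_predict(rest_X, rest_y, train_X[i], k) == train_y[i]:
                hits += 1
        acc = hits / len(train_X)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k


if __name__ == "__main__":
    # Two hypothetical, well-separated clusters.
    X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
    y = ["a", "a", "a", "b", "b", "b"]
    k = choose_k_loo(X, y, candidates=[1, 3])
    print("chosen k:", k)
    print("prediction:", knn_predict(X, y, (0.2, 0.2), k))
```

Note that this search picks a single k for the whole dataset; the methods described in the abstract go further and learn a k per test sample.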

Forests of Stumps

Author: Alharthi Amirah; Taylor Charles C.; Voss Jochen
Publication date: 26/05/2021

Many numerical studies (Hansen and Salamon (1990), Schapire (1990)) indicate that bagged decision stumps perform more accurately than a single stump. In this work, we investigate two approaches to creating a forest of stumps for classification. The first method is bagging with stumps, that is, growing a stump on different bootstrap sample size drawn from the training dataset. The second method is Gini-sampled stumps, where we sample split points with probability proportional to the Gini index. These two methods are combined with two aggregation methods: majority vote and weighted vote. We use simulation studies to compare the performance and the time consumed by these two methods. The computing time for generating split points with Gini-sampled stumps is less than half of the time needed to generate split points from bootstrap samples. Also, weighted vote aggregation is more accurate than majority vote aggregation.

KITopen
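
The first method in the Forests of Stumps abstract above (bagging with stumps, aggregated by majority or weighted vote) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: each bootstrap sample has the same size as the training set, each stump is chosen by minimising the weighted Gini impurity, and the vote weight is the stump's accuracy on the full training set. The abstract does not specify these details, and the Gini-sampled variant is not shown. All function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """Return (feature, threshold, left_label, right_label).

    Tries every observed value of every feature as a threshold and keeps
    the split with the lowest size-weighted Gini impurity.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    if best is None:  # no valid split in this sample: constant prediction
        label = majority(y)
        return (0, float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label


def bagged_stumps(X, y, n_stumps, rng):
    """Grow one stump per bootstrap sample; store it with its vote weight."""
    n = len(X)
    forest = []
    for _ in range(n_stumps):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        # Assumed weighting scheme: accuracy on the full training set.
        acc = sum(predict_stump(stump, r) == lab for r, lab in zip(X, y)) / n
        forest.append((stump, acc))
    return forest


def vote(forest, row, weighted=False):
    """Aggregate stump predictions by majority vote or weighted vote."""
    tally = Counter()
    for stump, acc in forest:
        tally[predict_stump(stump, row)] += acc if weighted else 1
    return tally.most_common(1)[0][0]


if __name__ == "__main__":
    X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
    y = [0, 0, 0, 1, 1, 1]
    forest = bagged_stumps(X, y, n_stumps=25, rng=random.Random(0))
    print(vote(forest, (0.5,)), vote(forest, (4.5,), weighted=True))
```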

Analysis of group evolution prediction in complex networks

Author: Bródka Piotr; Kazienko Przemysław; Koziarski Michał; Saganowski Stanisław
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2019

In a world in which acceptance by and identification with social communities are highly desired, the ability to predict the evolution of groups over time appears to be a vital but very complex research problem. Therefore, we propose a new, adaptable, generic and multi-stage method for Group Evolution Prediction (GEP) in complex networks that facilitates reasoning about the future states of recently discovered groups. The precise GEP modularity enabled us to carry out extensive and versatile empirical studies on many real-world complex / social networks to analyze the impact of numerous setups and parameters, such as time window type and size, group detection method, evolution chain length, and prediction models. Additionally, many new predictive features reflecting the group state at a given time have been identified and tested. Some other research problems, such as enriching learning evolution chains with external data, have been analyzed as well.

arXiv.org e-Print Archive

Directory of Open Access Journals

Feature Selection and Generalisation for Retrieval of Textual Cases

Author: G. Sakkis; G. Salton; J. Jarmulak; M. Lenz; S. Das; T. Mitchell
Publication venue: Proceedings of the 7th European Conference on Case-Based Reasoning. Lecture Notes in Artificial Intelligence
Publication date: 01/01/2004

Textual CBR systems solve problems by reusing experiences that are in textual form. Knowledge-rich comparison of textual cases remains an important challenge for these systems. However, mapping text data into a structured case representation requires a significant knowledge engineering effort. In this paper we look at automated acquisition of the case indexing vocabulary as a two-step process involving feature selection followed by feature generalisation. Boosted decision stumps are employed as a means to select features that are predictive and relatively orthogonal. Association rule induction is employed to capture feature co-occurrence patterns. Generalised features are constructed by applying these rules. Essentially, rules preserve implicit semantic relationships between features, and applying them has the desired effect of bringing together cases that would otherwise have been overlooked during case retrieval. Experiments with four textual data sets show significant improvement in retrieval accuracy whenever generalised features are used. The results further suggest that boosted decision stumps with generalised features are a promising combination.

Research at Sofia University

CiteSeerX

Crossref