77 research outputs found

    Negative Effects of Incentivised Viral Campaigns for Activity in Social Networks

    Full text link
    Viral campaigns are crucial methods for word-of-mouth marketing in social communities. The goal of these campaigns is to encourage people for activity. The problem of incentivised and non-incentivised campaigns is studied in the paper. Based on the data collected within the real social networking site both approaches were compared. The experimental results revealed that a highly motivated campaign not necessarily provides better results due to overlapping effect. Additional studies have shown that the behaviour of individual community members in the campaign based on their service profile can be predicted but the classification accuracy may be limited.Comment: In proceedings of the 2nd International Conference on Social Computing and its Applications, SCA 201

    Learning Heterogeneous Similarity Measures for Hybrid-Recommendations in Meta-Mining

    Get PDF
    The notion of meta-mining has appeared recently and extends the traditional meta-learning in two ways. First it does not learn meta-models that provide support only for the learning algorithm selection task but ones that support the whole data-mining process. In addition it abandons the so called black-box approach to algorithm description followed in meta-learning. Now in addition to the datasets, algorithms also have descriptors, workflows as well. For the latter two these descriptions are semantic, describing properties of the algorithms. With the availability of descriptors both for datasets and data mining workflows the traditional modelling techniques followed in meta-learning, typically based on classification and regression algorithms, are no longer appropriate. Instead we are faced with a problem the nature of which is much more similar to the problems that appear in recommendation systems. The most important meta-mining requirements are that suggestions should use only datasets and workflows descriptors and the cold-start problem, e.g. providing workflow suggestions for new datasets. In this paper we take a different view on the meta-mining modelling problem and treat it as a recommender problem. In order to account for the meta-mining specificities we derive a novel metric-based-learning recommender approach. Our method learns two homogeneous metrics, one in the dataset and one in the workflow space, and a heterogeneous one in the dataset-workflow space. All learned metrics reflect similarities established from the dataset-workflow preference matrix. We demonstrate our method on meta-mining over biological (microarray datasets) problems. The application of our method is not limited to the meta-mining problem, its formulations is general enough so that it can be applied on problems with similar requirements

    A Comparitive Study On Different Classification Algorithms Using Airline Dataset

    Get PDF
    The paper presents comparison of five differentclassification algorithms performed on airline dataset to find out best accuracy. Here the used dataset is consist of 250 different type of records of airline information consisting flight timing, cities, days of weeks, delay timing.Classification algorithms are performed on the data set and the best result is given by Random Forest

    Separation of pulsar signals from noise with supervised machine learning algorithms

    Full text link
    We evaluate the performance of four different machine learning (ML) algorithms: an Artificial Neural Network Multi-Layer Perceptron (ANN MLP ), Adaboost, Gradient Boosting Classifier (GBC), XGBoost, for the separation of pulsars from radio frequency interference (RFI) and other sources of noise, using a dataset obtained from the post-processing of a pulsar search pi peline. This dataset was previously used for cross-validation of the SPINN-based machine learning engine, used for the reprocessing of HTRU-S survey data arXiv:1406.3627. We have used Synthetic Minority Over-sampling Technique (SMOTE) to deal with high class imbalance in the dataset. We report a variety of quality scores from all four of these algorithms on both the non-SMOTE and SMOTE datasets. For all the above ML methods, we report high accuracy and G-mean in both the non-SMOTE and SMOTE cases. We study the feature importances using Adaboost, GBC, and XGBoost and also from the minimum Redundancy Maximum Relevance approach to report algorithm-agnostic feature ranking. From these methods, we find that the signal to noise of the folded profile to be the best feature. We find that all the ML algorithms report FPRs about an order of magnitude lower than the corresponding FPRs obtained in arXiv:1406.3627, for the same recall value.Comment: 14 pages, 2 figures. Accepted for publication in Astronomy and Computin

    A Review on Advanced Decision Trees for Efficient & Effective k-NN Classification

    Get PDF
    K Nearest Neighbor (KNN) strategy is a notable classification strategy in data mining and estimations in light of its direct execution and colossal arrangement execution. In any case, it is outlandish for ordinary KNN strategies to select settled k esteem to all tests. Past courses of action assign different k esteems to different test tests by the cross endorsement strategy however are typically tedious. This work proposes new KNN strategies, first is a KTree strategy to learn unique k esteems for different test or new cases, by including a training arrange in the KNN classification. This work additionally proposes a change rendition of KTree technique called K*Tree to speed its test organize by putting additional data of the training tests in the leaf node of KTree, for example, the training tests situated in the leaf node, their KNNs, and the closest neighbor of these KNNs. K*Tree, which empowers to lead KNN arrangement utilizing a subset of the training tests in the leaf node instead of all training tests utilized in the recently KNN techniques. This really reduces the cost of test organize

    Forests of Stumps

    Get PDF
    Many numerical studies (Hansen and Salamon (1990), Schapire (1990)) indicate that bagged decision stumps perform more accurately than a single stump. In this work, we will investigate two approaches to create a forest of stumps for classification. The first method is bagging with stumps, that is growing a stump on different bootstrap sample size drawn from the training dataset. The second method is Gini-sampled stumps, where we sample split points with probability proportional to the Gini index. These two methods are combined with two aggregation methods: Majority vote and weighted vote. We use simulation studies to compare the performance and consumed time for these two methods. The computing time of generating split points by Gini-sampled stumps is less than half of the time needed to generate split points from bootstrap samples. Also, weighted vote aggregation results in more accurate performance than majority vote aggregation

    Analysis of group evolution prediction in complex networks

    Full text link
    In the world, in which acceptance and the identification with social communities are highly desired, the ability to predict evolution of groups over time appears to be a vital but very complex research problem. Therefore, we propose a new, adaptable, generic and mutli-stage method for Group Evolution Prediction (GEP) in complex networks, that facilitates reasoning about the future states of the recently discovered groups. The precise GEP modularity enabled us to carry out extensive and versatile empirical studies on many real-world complex / social networks to analyze the impact of numerous setups and parameters like time window type and size, group detection method, evolution chain length, prediction models, etc. Additionally, many new predictive features reflecting the group state at a given time have been identified and tested. Some other research problems like enriching learning evolution chains with external data have been analyzed as well

    Feature Selection and Generalisation for Retrieval of Textual Cases

    Get PDF
    Textual CBR systems solve problems by reusing experiences that are in textual form. Knowledge-rich comparison of textual cases remains an important challenge for these systems. However mapping text data into a structured case representation requires a significant knowledge engineering effort. In this paper we look at automated acquisition of the case indexing vocabulary as a two step process involving feature selection followed by feature generalisation. Boosted decision stumps are employed as a means to select features that are predictive and relatively orthogonal. Association rule induction is employed to capture feature co-occurrence patterns. Generalised features are constructed by applying these rules. Essentially, rules preserve implicit semantic relationships between features and applying them has the desired effect of bringing together cases that would have otherwise been overlooked during case retrieval. Experiments with four textual data sets show significant improvement in retrieval accuracy whenever gener¬alised features are used. The results further suggest that boosted decision stumps with generalised features to be a promising combination
    corecore