
    Time-Series Link Prediction Using Support Vector Machines

    The prominence of social networks motivates developments in network analysis, such as link prediction, which deals with predicting the existence or emergence of links on a given network. The Vector Auto Regression (VAR) technique has been shown to be one of the best for time-series-based link prediction. One VAR implementation uses an unweighted adjacency matrix and five additional matrices based on the similarity metrics of Common Neighbor, Adamic-Adar, Jaccard’s Coefficient, Preferential Attachment, and Resource Allocation Index. In our previous work, we proposed the use of Support Vector Machines (SVM) for this prediction task and, using the same set of matrices, obtained better results. A dataset from DBLP was used to test the performance of the VAR and SVM link prediction models for two lags. In this study, we extended the VAR and SVM models to three, four, and five lags, and both VAR and SVM improved with the additional lag data. The VAR and SVM models achieved their highest ROC-AUC values of 84.96% and 86.32%, respectively, using five lags, compared with 84.26% and 84.98% using two lags. Moreover, we identified that improving the predictive ability of both models is constrained by the difficulty of predicting new links, which we define as links that do not exist in any of the corresponding lags. Hence, we created separate VAR and SVM models for the prediction of new links. The highest ROC-AUC was still achieved by SVM with five lags, although at a lower value of 73.85%. The significant drop in the performance of the VAR and SVM predictors on new links indicates the need for more research in this problem space. Moreover, the results show that SVM can be used as an alternative method for time-series-based link prediction.
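    The five similarity matrices named above can be computed directly with NetworkX's built-in link prediction functions. The sketch below uses a toy graph (not the DBLP data) and assumes integer node labels so each score can be written straight into a node-by-node matrix:

    ```python
    import networkx as nx
    import numpy as np

    # Toy graph standing in for one time slice of the co-authorship network.
    G = nx.Graph([(0, 1), (1, 2), (2, 3), (0, 2), (3, 0)])
    nodes = list(G.nodes())
    pairs = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]

    def metric_matrix(gen):
        """Fill a symmetric node-by-node matrix from (u, v, score) triples."""
        M = np.zeros((len(nodes), len(nodes)))
        for u, v, score in gen:
            M[u, v] = M[v, u] = score
        return M

    # The unweighted adjacency matrix plus the five similarity matrices.
    adj = nx.to_numpy_array(G)
    cn = metric_matrix((u, v, len(list(nx.common_neighbors(G, u, v))))
                       for u, v in pairs)
    aa = metric_matrix(nx.adamic_adar_index(G, pairs))
    jc = metric_matrix(nx.jaccard_coefficient(G, pairs))
    pa = metric_matrix(nx.preferential_attachment(G, pairs))
    ra = metric_matrix(nx.resource_allocation_index(G, pairs))

    print(cn[1, 3], jc[1, 3], pa[0, 1])  # 2.0 1.0 6.0
    ```

    In the time-series setting, one such set of matrices would be computed per time slice and stacked across lags to form each node pair's feature vector.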

    NBP 2.0: Updated Next Bar Predictor, an Improved Algorithmic Music Generator

    Deep neural network advancements have enabled machines to produce melodies emulating human-composed music. However, implementing such machines is costly in terms of resources. In this paper, we present NBP 2.0, a refinement of the previous Next Bar Predictor (NBP) model with two notable improvements: first, transforming each training instance to anchor all of its notes to its musical scale, and second, changing the model architecture itself. NBP 2.0 maintains the straightforward and lightweight implementation of its predecessor, which is an advantage over the baseline models. The improvements were assessed using quantitative and qualitative metrics and, based on the results, the gains from these changes are notable.
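    "Anchoring notes to a musical scale" can be read as transposing each training instance to a common key. A minimal sketch, assuming the key root is already known (the key-detection step is omitted here and the function name is illustrative, not from NBP 2.0):

    ```python
    # Transpose a melody so its key center maps to C (MIDI pitch class 0),
    # giving every training instance the same scale anchor.
    def anchor_to_c(midi_pitches, key_root):
        """Shift MIDI pitches down by the key root's offset from C."""
        offset = key_root % 12
        return [p - offset for p in midi_pitches]

    # A melody in D major (root D = pitch class 2) becomes C-anchored:
    melody_in_d = [62, 64, 66, 67, 69]            # D E F# G A
    anchored = anchor_to_c(melody_in_d, 2)
    print(anchored)                               # [60, 62, 64, 65, 67] = C D E F G
    ```

    Normalizing the key this way means the model sees every melody in a single reference scale, so it need not learn the same interval patterns twelve times over.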

    Link Prediction in a Weighted Network Using Support Vector Machine

    Link prediction is a field under network analysis that deals with the existence or emergence of links. In this study, we investigate the effect of using weighted networks on two link prediction techniques: the Vector Auto Regression (VAR) technique and our proposed modification of VAR that uses a Support Vector Machine (SVM). Using a co-authorship network from DBLP as the dataset and the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) as the fitness metric, the results show that the performance of both VAR and SVM is surprisingly lower on the weighted network than on the unweighted network. In an attempt to improve the results on the weighted network, we incorporated features from the unweighted network into the features of the weighted network. This enhancement improved the performance of both VAR and SVM, but the results are still inferior to those on the unweighted network. We identified that the true positive rate was generally lower on the weighted network, resulting in a lower AUC.
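    The feature enhancement described above amounts to concatenating, per node pair, the unweighted similarity scores onto the weighted ones. A minimal sketch with toy arrays standing in for the DBLP-derived feature matrices:

    ```python
    import numpy as np

    # Toy feature matrices: one row per node pair, one column per similarity metric.
    n_pairs, n_metrics = 4, 6
    weighted = np.arange(n_pairs * n_metrics, dtype=float).reshape(n_pairs, n_metrics)
    unweighted = (weighted > 0).astype(float)   # toy binarized counterpart

    # Each pair's weighted scores are augmented with its unweighted scores.
    combined = np.hstack([weighted, unweighted])
    print(combined.shape)  # (4, 12)
    ```

    The enlarged feature vectors then feed the same VAR and SVM pipelines unchanged.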

    Improving the vector auto regression technique for time-series link prediction by using support vector machine

    Predicting links between the nodes of a graph has become an important data mining task because of its direct applications to biology, social networking, communication surveillance, and other domains. Recent literature in time-series link prediction has shown that the Vector Auto Regression (VAR) technique is one of the most accurate for this problem. In this study, we apply Support Vector Machines (SVM) to improve the VAR technique, which uses an unweighted adjacency matrix along with five similarity matrices: Common Neighbor (CN), Adamic-Adar (AA), Jaccard’s Coefficient (JC), Preferential Attachment (PA), and Resource Allocation Index (RA). A DBLP dataset covering the years 2003 to 2013 was collected and transformed into time-sliced graph representations. The appropriate matrices were computed from these graphs, mapped to the feature space, and then used to build baseline VAR models with a lag of 2 and the corresponding SVM classifiers. Using the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) as the main fitness metric, the average result of 82.04% for VAR was improved to 84.78% with SVM. Additional experiments to handle the highly imbalanced dataset by oversampling with SMOTE and undersampling with K-means clusters, however, did not improve the average AUC-ROC of the baseline SVM.
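    The lag-2 SVM setup can be sketched as follows: each node pair's feature vector stacks its similarity scores from the two previous time slices, and the label is whether the link exists in the next slice. Synthetic data stands in for the DBLP features here, so the AUC value is illustrative only:

    ```python
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_pairs, n_metrics, lags = 500, 6, 2

    # Feature layout: [6 metrics at t-2 | 6 metrics at t-1] per node pair.
    X = rng.random((n_pairs, n_metrics * lags))
    # Toy "link exists at t" label, loosely tied to two of the features.
    y = (X[:, 0] + X[:, n_metrics] > 1.0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = SVC(probability=True, random_state=0).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"ROC-AUC: {auc:.3f}")
    ```

    Extending to three, four, or five lags, as in the later study above, just widens the feature vector by another `n_metrics` columns per lag.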

    Artificial Neural Network (ANN) in a Small Dataset to determine Neutrality in the Pronunciation of English as a Foreign Language in Filipino Call Center Agents

    Artificial Neural Networks (ANNs) have continued to be efficient models for solving classification problems. In this paper, we explore the use of an ANN with a small dataset to accurately classify whether Filipino call center agents’ pronunciations are neutral or not based on their employer’s standards. Isolated utterances of the ten most commonly used words in the call center were recorded from eleven agents, creating a dataset of 110 utterances. Two learning specialists were consulted to establish ground truths, and Cohen’s Kappa was computed as 0.82, validating the reliability of the dataset. The first thirteen Mel-Frequency Cepstral Coefficients (MFCCs) were then extracted from each word, and an ANN was trained with ten-fold stratified cross-validation. Experimental results on the model recorded a classification accuracy of 89.60%, supported by an overall F-score of 0.92.
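    The inter-rater check above can be reproduced with scikit-learn's Cohen's kappa implementation. The labels below are toy stand-ins for the two specialists' neutral/non-neutral judgments, not the study's data:

    ```python
    from sklearn.metrics import cohen_kappa_score

    # Toy binary labels from two raters over ten utterances (1 = neutral).
    rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
    rater_b = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]

    # Kappa corrects raw agreement (9/10 here) for chance agreement.
    kappa = cohen_kappa_score(rater_a, rater_b)
    print(f"Cohen's kappa: {kappa:.2f}")  # 0.80
    ```

    Values above roughly 0.8, as in the study's 0.82, are conventionally read as almost-perfect agreement, which is why the dataset was deemed reliable.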

    Troika Generative Adversarial Network (T-GAN): A Synthetic Image Generator That Improves Neural Network Training for Handwriting Classification

    Training an artificial neural network for handwriting classification requires a sufficiently sized annotated dataset in order to avoid overfitting. In the absence of sufficient instances, data augmentation techniques are normally considered. In this paper, we propose the troika generative adversarial network (T-GAN) for data augmentation to address the scarcity of publicly labeled handwriting datasets. T-GAN has three generator subnetworks architected with some weight-sharing in order to learn the joint distribution from three specific domains. We used T-GAN to augment the data from a subset of the IAM Handwriting Database. We then compared this with other data augmentation techniques by measuring the improvements each technique brought to the handwriting classification accuracies of three types of artificial neural networks (ANNs): a deep ANN, a convolutional neural network (CNN), and a deep CNN. The data augmentation technique involving the T-GAN yielded the highest accuracy improvements for each of the three ANN classifier types – outperforming the standard techniques of image rotation, affine transformation, and their combination – as well as the technique that uses another GAN-based model, the coupled GAN (CoGAN). Furthermore, a paired t-test between the 10-fold cross-validation results of the T-GAN and CoGAN, the second-best augmentation technique in this study, on a deep-CNN classifier confirmed the superiority of the data augmentation technique that uses the T-GAN. Finally, when the synthetic instances generated by the T-GAN were further enhanced using pepper noise removal and a median filter, the classification accuracies of the trained CNN and deep CNN classifiers improved further to 93.54% and 95.45%, respectively. Each is a substantial improvement over the respective original accuracies of 67.43% and 68.32% obtained when the two classifiers were trained on the unaugmented dataset.
    Thus, data augmentation using T-GAN – coupled with the two image noise removal techniques mentioned – can be a preferred pre-training technique for augmenting handwriting datasets with insufficient samples.
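    The median-filter enhancement step can be illustrated with a minimal NumPy sketch: a 3x3 median filter removes isolated pepper (black) pixels from a synthetic image. The window size and padding mode are assumptions; the paper's exact filter settings are not stated here:

    ```python
    import numpy as np

    def median_filter_3x3(img):
        """Apply a 3x3 median filter with edge padding."""
        padded = np.pad(img, 1, mode="edge")
        out = np.empty_like(img)
        h, w = img.shape
        for i in range(h):
            for j in range(w):
                out[i, j] = np.median(padded[i:i + 3, j:j + 3])
        return out

    # A white patch with a single pepper pixel: the filter restores it to white,
    # since the median of eight 255s and one 0 is 255.
    img = np.full((5, 5), 255, dtype=np.uint8)
    img[2, 2] = 0
    clean = median_filter_3x3(img)
    print(clean[2, 2])  # 255
    ```

    Applied to T-GAN outputs, this suppresses speckle artifacts from the generator before the synthetic images are added to the training set.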