41 research outputs found

    Predicting protein-protein interactions as a one-class classification problem

    Protein-protein interactions are key to understanding protein function, because proteins usually work in the context of other proteins and rarely function alone. Machine learning techniques have been used to predict protein-protein interactions. However, most of these techniques treat the task as a binary classification problem. While it is easy to obtain a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins to serve as a negative set. Therefore, in this paper we treat the task as a one-class classification problem using a One-Class SVM (OCSVM). Using only positive examples (interacting protein pairs) for training, the OCSVM achieves an accuracy of 80%. These results imply that protein-protein interactions can be predicted using a one-class classifier with reliable accuracy.
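
To make the one-class idea concrete, here is a minimal sketch that learns only from positive examples. It is not the OCSVM used in the paper: as a stand-in it fits a centroid and a distance threshold, and the function names, the `quantile` parameter, and the 2-D feature vectors are all hypothetical.

```python
import math

def fit_one_class(positives, quantile=0.9):
    """Fit a centroid and a distance threshold from positive examples only."""
    dim = len(positives[0])
    centroid = [sum(x[i] for x in positives) / len(positives) for i in range(dim)]
    dists = sorted(math.dist(x, centroid) for x in positives)
    # Use the distance at the chosen quantile as the acceptance radius.
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return centroid, dists[idx]

def predict(model, x):
    """Return True if x lies within the learned radius (predicted positive)."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius

# Hypothetical feature vectors for known interacting protein pairs.
positives = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)]
model = fit_one_class(positives)
```

No negative examples are needed at training time; anything far from the positive region is simply rejected.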

    Using Self-labeling and Co-Training to Enhance Bots Labeling in Twitter

    The rapid evolution of social bots has required efficient solutions to detect them in real time. Obtaining labeled stream datasets that contain a variety of bots is essential for this classification task, yet it remains one of the main challenges. Accordingly, finding appropriate techniques to label unlabeled data is vital to enhancing bot detection. In this paper, we investigate two labeling techniques for semi-supervised learning, self-training and co-training, and evaluate their performance for bot detection. Our results show that self-training with maximum confidence performed best, achieving an F1 score of 0.856 and an AUC of 0.84. The random forest classifier performed better than the other classifiers in both techniques. In co-training, the single-view approach with a random forest classifier and fewer features performed slightly better than the single view with more features. Using multiple views of features in co-training generally achieved similar results across different splits.
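
A rough sketch of self-training with maximum-confidence selection follows. It assumes a toy nearest-centroid classifier on a single hypothetical score per account rather than the random forest from the paper; the confidence measure and all names are illustrative assumptions.

```python
def centroids(labeled):
    """Mean feature value per class from (value, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_with_confidence(cents, x):
    """Return (label, confidence) using the margin between the two nearest centroids."""
    ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
    d1, d2 = abs(x - cents[ranked[0]]), abs(x - cents[ranked[1]])
    conf = (d2 - d1) / (d1 + d2) if d1 + d2 else 0.0
    return ranked[0], conf

def self_train(labeled, unlabeled):
    """Repeatedly pseudo-label the single most confident unlabeled point."""
    labeled, pool = list(labeled), list(unlabeled)
    while pool:
        cents = centroids(labeled)  # retrain on the current labeled set
        scored = [(predict_with_confidence(cents, x), x) for x in pool]
        (label, _), x = max(scored, key=lambda t: t[0][1])
        labeled.append((x, label))
        pool.remove(x)
    return labeled

# Hypothetical data: one score per account, two seed labels.
seed = [(0.1, "human"), (0.9, "bot")]
result = dict(self_train(seed, [0.2, 0.8, 0.45]))
```

The ambiguous point (0.45) is labeled last, after the classifier has been refined by the easier cases.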

    Hybrid feature selection approach to identify optimal features of profile metadata to detect social bots in Twitter

    The last few years have revealed that social bots in social networks have become more sophisticated in design as they adapt their features to avoid detection systems. The deceptive nature of bots to mimic human users is due to the advancement of artificial intelligence and chatbots, where these bots learn and adjust very quickly. Therefore, finding the optimal features needed to detect them is an area for further investigation. In this paper, we propose a hybrid feature selection (FS) method to evaluate profile metadata features to find these optimal features, which are evaluated using random forest, naïve Bayes, support vector machines, and neural networks. We found that the cross-validation attribute evaluation performance was the best when compared to other FS methods. Our results show that the random forest classifier with six optimal features achieved the best score of 94.3% for the area under the curve. The results maintained overall 89% accuracy, 83.8% precision, and 83.3% recall for the bot class. We found that using four features: favorites_count, verified, statuses_count, and average_tweets_per_day, achieves good performance metrics for bot detection (84.1% precision, 81.2% recall).

    Data stream mining techniques: a review

    Vast, effectively unbounded amounts of data are generated from the Internet and other information sources. Analyzing this massive data in real time and extracting valuable knowledge using different mining applications and platforms has been an area of interest for both research and industry. However, data stream mining poses challenges that distinguish it from traditional data mining. Recently, many studies have addressed massive data mining problems and proposed several techniques that produce impressive results. In this paper, we review real-time clustering and classification techniques for data streams. We analyze the characteristics of data stream mining and discuss its challenges and open research issues. Finally, we present some of the platforms for data stream mining.

    Characteristics of Similar-Context Trending Hashtags in Twitter: A Case Study

    © 2020, Springer Nature Switzerland AG. Twitter is a popular social networking platform that is widely used for discussing and spreading information on global events. Twitter trending hashtags have been one of the topics that researchers study and analyze. Understanding posting behavior patterns as information flows increase during rapidly unfolding events can help in predicting future events or detecting manipulation. In this paper, we investigate similar-context trending hashtags to characterize the general behavior of specific trends and generic trends within the same context. We present an analysis that compares such trends based on spatial, temporal, content, and user activity. We found that the characteristics of similar-context trends can be used to predict future generic trends with analogous spatiotemporal, content, and user features. Our results show that more than 70% of the users who participate in a location-based hashtag belong to the location of the hashtag. Generic trends tend to have more influence on user participation than specific trends with geographical context. The retweet ratio in specific trends is higher than in generic trends, at more than 79%.

    A Synaptic Pruning-Based Spiking Neural Network for Hand-Written Digits Classification

    A spiking neural network model inspired by synaptic pruning is developed and trained to extract features of hand-written digits. The network is composed of three spiking neural layers and one output neuron whose firing rate is used for classification. The model detects and collects the geometric features of the images from the Modified National Institute of Standards and Technology database (MNIST). In this work, a novel learning rule is developed to train the network to detect features of different digit classes. For this purpose, randomly initialized synaptic weights between the first and second layers are updated using average firing rates of pre- and postsynaptic neurons. Then, using a neuroscience-inspired mechanism named “synaptic pruning” and its predefined threshold values, some of the synapses are deleted. The resulting sparse matrices, named “information channels”, are constructed so that they show highly specific patterns for each digit class as connection matrices between the first and second layers. The “information channels” are used in the test phase to assign a digit class to each test image. In addition, the role of feed-back inhibition as well as the connectivity rates of the second and third neural layers are studied. Similar to the human ability to learn from a small number of training trials, the developed spiking neural network needs a very small dataset for training, compared to the conventional deep learning methods that have shown very good performance on the MNIST dataset. This work introduces a new class of brain-inspired spiking neural networks to extract the features of complex data images.
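
The update-then-prune step described above can be illustrated with a small sketch. The abstract does not give the learning rule, so a Hebbian-style product of average firing rates is assumed here; the learning rate, threshold, and all values are hypothetical.

```python
def update_weights(weights, pre_rates, post_rates, lr=0.1):
    """Assumed Hebbian-style rule: w[i][j] += lr * pre_rates[i] * post_rates[j]."""
    return [[w + lr * pre * post for w, post in zip(row, post_rates)]
            for row, pre in zip(weights, pre_rates)]

def prune(weights, threshold):
    """Delete (zero out) synapses whose weight falls below the threshold."""
    return [[w if w >= threshold else 0.0 for w in row] for row in weights]

# Hypothetical 2x2 layer: rows are presynaptic neurons, columns postsynaptic.
weights = [[0.2, 0.2], [0.2, 0.2]]
pre_rates = [1.0, 0.0]    # average firing rates of presynaptic neurons
post_rates = [1.0, 0.0]   # average firing rates of postsynaptic neurons

updated = update_weights(weights, pre_rates, post_rates)
channel = prune(updated, threshold=0.25)  # sparse connection matrix
surviving = sum(1 for row in channel for w in row if w != 0.0)
```

Only the synapse between co-active neurons is strengthened past the threshold, so the pruned matrix is sparse and specific to the input pattern.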

    Integration of genome-wide expression and methylation data: Relevance to aging and Alzheimer's disease

    The progressive and latent nature of neurodegenerative diseases, such as Alzheimer's disease (AD), indicates a role for epigenetic modification in disease susceptibility. Previous studies from our lab show that developmental exposure to lead (Pb) perturbs the expression of AD-associated proteins. In order to better understand the role of DNA methylation as an epigenetic modification mechanism in gene expression regulation, an integrative study of global gene expression and methylation profiles is essential. Given the different formats of gene expression and methylation data, combining these data for integrative analysis can be challenging. In this paper we describe a method to integrate and analyze gene expression and methylation arrays. Methylation array raw data contain the signal intensities of each probe of CpG sites, whereas gene expression data measure the signal intensity values of genes. In order to combine these data, methylation data of CpG sites have to be associated with genes.
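
One way to picture the association step is the sketch below, which maps CpG probes to genes through an annotation table and then joins the result with expression values. Averaging probes per gene is an assumption made for illustration, not the method described in the paper, and the probe IDs, gene names, and values are hypothetical.

```python
from collections import defaultdict

def methylation_per_gene(probe_values, probe_to_gene):
    """Average CpG probe signals for each annotated gene."""
    grouped = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # skip probes with no gene annotation
            grouped[gene].append(value)
    return {gene: sum(vals) / len(vals) for gene, vals in grouped.items()}

def integrate(expression, methylation):
    """Pair expression and methylation values for genes present in both."""
    shared = sorted(expression.keys() & methylation.keys())
    return {gene: (expression[gene], methylation[gene]) for gene in shared}

# Hypothetical inputs.
probe_values = {"cg0001": 0.25, "cg0002": 0.75, "cg0003": 0.9}
probe_to_gene = {"cg0001": "GeneA", "cg0002": "GeneA"}
expression = {"GeneA": 8.0, "GeneB": 5.5}

gene_methylation = methylation_per_gene(probe_values, probe_to_gene)
combined = integrate(expression, gene_methylation)
```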

    Genome-wide expression and methylation profiling in the aged rodent brain due to early-life Pb exposure and its relevance to aging

    In this study, we assessed global gene expression patterns in adolescent mice exposed to lead (Pb) as infants and their aged siblings to identify reprogrammed genes. Global expression on postnatal days 20 and 700 was analyzed, and genes that were down- or up-regulated (≥2-fold) were identified, clustered, and analyzed for their relationship to DNA methylation. About 150 genes were differentially expressed in old age. In normal aging, we observed an up-regulation of genes related to the immune response, metal binding, metabolism, and transcription/transduction coupling. Prior exposure to Pb revealed a repression of these genes, suggesting that disturbances in developmental stages of the brain compromise the ability to defend against age-related stressors, thus promoting the neurodegenerative process. Overexpression and repression of genes corresponded with their DNA methylation profile.
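
The fold-change filter mentioned above can be sketched as follows. This is only an illustration under assumed inputs: expression is taken as positive linear-scale intensity, and the gene names and values are hypothetical.

```python
def classify_fold_change(early, late, min_fold=2.0):
    """Split genes into up- and down-regulated lists by a fold-change cutoff."""
    up, down = [], []
    for gene in sorted(early.keys() & late.keys()):
        a, b = early[gene], late[gene]
        if a <= 0 or b <= 0:
            continue  # a ratio is undefined for non-positive intensities
        ratio = b / a
        if ratio >= min_fold:
            up.append(gene)
        elif ratio <= 1 / min_fold:
            down.append(gene)
    return up, down

# Hypothetical expression values at the two time points.
day_20 = {"gene1": 10.0, "gene2": 40.0, "gene3": 10.0}
day_700 = {"gene1": 25.0, "gene2": 10.0, "gene3": 12.0}
up, down = classify_fold_change(day_20, day_700)
```

Genes whose ratio stays between 0.5 and 2 (here `gene3`) are treated as unchanged.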

    Bot-MGAT: A Transfer Learning Model Based On A Multi-View Graph Attention Network To Detect Social Bots

    Twitter, as a popular social network, has been targeted by different bot attacks. Detecting social bots is a challenging task, due to their evolving capacity to avoid detection. Extensive research efforts have proposed different techniques and approaches to solving this problem. Due to the scarcity of recently updated labeled data, the performance of detection systems degrades when exposed to a new dataset. Therefore, semi-supervised learning (SSL) techniques can improve performance, using both labeled and unlabeled examples. In this paper, we propose a framework based on the multi-view graph attention mechanism using a transfer learning (TL) approach to predict social bots. We call the framework 'Bot-MGAT', which stands for bot multi-view graph attention network. The framework uses both labeled and unlabeled data. We used profile features to reduce the overhead of feature engineering. We ran our experiments on a recent benchmark dataset that included representative samples of social bots with graph structural information and profile features only. We applied cross-validation to avoid uncertainty in the model's performance. Bot-MGAT was evaluated against graph SSL techniques: single graph attention networks (GAT), graph convolutional networks (GCN), and relational graph convolutional networks (RGCN). We also compared Bot-MGAT to related work in the field of bot detection. Bot-MGAT with TL outperformed the compared approaches, with an accuracy score of 97.8%, an F1 score of 0.9842, and an MCC score of 0.9481.

    Real Time Detection of Social Bots on Twitter Using Machine Learning and Apache Kafka

    Social media networks, like Facebook and Twitter, are increasingly becoming an important part of most people's lives. Twitter provides a useful platform for sharing content, ideas, and opinions, and for promoting products and election campaigns. Due to its increased popularity, it has become vulnerable to malicious attacks caused by social bots. Social bots are automated accounts created for different purposes. They are involved in spreading rumors and false information, cyberbullying, spamming, and manipulating the ecosystem of the social network. Most social bot detection methods rely on offline data for both training and testing. In this paper, we use Apache Kafka, a big data analytics tool, to stream data from the Twitter API in real time. We use profile information (metadata) as features. A machine learning technique is applied to predict the type of the incoming data (human or bot). In addition, the paper presents technical details of how to configure these different tools.