
    A Cross-Repository Model for Predicting Popularity in GitHub

    Social coding platforms, such as GitHub, can serve as natural laboratories for studying the diffusion of innovation by tracking the pattern of code adoption by programmers. This paper focuses on predicting the popularity of software repositories over time; our aim is to forecast the time series of popularity-related events (code forks and watches). In particular, we are interested in cross-repository patterns: how do events on one repository affect other repositories? Our proposed LSTM (Long Short-Term Memory) recurrent neural network integrates events across multiple active repositories, outperforming a standard ARIMA (Auto-Regressive Integrated Moving Average) time series prediction based on a single repository. The ability of the LSTM to leverage cross-repository information gives it a significant edge over standard time series forecasting.
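    The cross-repository idea above can be sketched with a single LSTM cell whose input vector stacks the event counts of several repositories at once, so the hidden state summarizes activity across all of them. This is a minimal illustrative sketch, not the authors' model: the gate layout is the standard LSTM formulation, and the repository count, hidden size, and Poisson-sampled event series are invented for the example.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: x holds the event counts at time t across all
    repositories; h and c are the previous hidden and cell states."""
    z = W @ x + U @ h + b                  # stacked gate pre-activations
    n = h.shape[0]
    i = 1 / (1 + np.exp(-z[:n]))           # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))        # forget gate
    o = 1 / (1 + np.exp(-z[2*n:3*n]))      # output gate
    g = np.tanh(z[3*n:])                   # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
n_repos, hidden = 3, 8                     # fork/watch counts from 3 repositories
W = rng.normal(0, 0.1, (4 * hidden, n_repos))
U = rng.normal(0, 0.1, (4 * hidden, hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
series = rng.poisson(2.0, (10, n_repos)).astype(float)  # 10 days of event counts
for x in series:                           # fold the whole history into h
    h, c = lstm_step(x, h, c, W, U, b)
```

    A per-repository ARIMA baseline, by contrast, would see only one column of `series` at a time, which is why it cannot exploit cross-repository effects.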

    Code quality in pull requests: an empirical study

    Pull requests are a common practice for making and reviewing contributions, and are employed both in open-source and industrial contexts. Compared to the traditional code review process adopted in the 1970s and 1980s, pull requests allow a more lightweight reviewing approach. One of the main goals of code review is to find defects in the code, allowing project maintainers to easily integrate external contributions into a project and discuss the code contributions. The goal of this work is to understand whether code quality is actually considered when pull requests are accepted. Specifically, we aim at understanding whether code quality issues such as code smells, antipatterns, and coding style violations in the pull request code affect the chance of its acceptance when reviewed by a maintainer of the project. We conducted a case study among 28 Java open-source projects, analyzing the presence of 4.7M code quality issues in 36K pull requests. We analyzed further correlations by applying Logistic Regression and several machine learning techniques (Decision Tree, Random Forest, Extremely Randomized Trees, AdaBoost, Gradient Boosting, XGBoost). Unexpectedly, code quality turned out not to affect the acceptance of a pull request at all. As suggested by other works, factors such as the reputation of the maintainer and the importance of the feature delivered may matter more than code quality for pull request acceptance.
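    The simplest of the classifiers mentioned above, logistic regression, can be sketched in a few lines. This is a toy illustration, not the study's pipeline: the feature columns (smell count, antipattern count, style violations, a reputation score) and the six-row dataset are invented, and the labels are deliberately constructed so that only the reputation column is predictive, mirroring the abstract's finding that quality issues did not drive acceptance.

```python
import numpy as np

# Toy feature matrix per pull request:
# [code smells, antipatterns, style violations, author reputation]
X = np.array([
    [12, 3, 40, 0.2],
    [ 1, 0,  2, 0.9],
    [ 8, 2, 25, 0.8],
    [ 0, 1,  5, 0.1],
    [15, 4, 60, 0.7],
    [ 2, 0,  8, 0.3],
], dtype=float)
y = np.array([0, 1, 1, 0, 1, 0], dtype=float)  # 1 = accepted

# Standardize features, then fit logistic regression by gradient descent.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(5000):
    p = 1 / (1 + np.exp(-(Xs @ w + b)))    # predicted acceptance probability
    w -= 0.5 * Xs.T @ (p - y) / len(y)     # gradient of the log-loss
    b -= 0.5 * (p - y).mean()

p = 1 / (1 + np.exp(-(Xs @ w + b)))        # fitted probabilities
```

    On data like this, the learned weights load almost entirely on the reputation column; inspecting `w` is the per-feature analogue of the correlation analysis the study performs at scale.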

    Stochastic Sampling and Machine Learning Techniques for Social Media State Production

    The rise in the importance of social media platforms as communication tools has been both a blessing and a curse. For scientists, they offer an unparalleled opportunity to study human social networks. However, these platforms have also been used to propagate misinformation and hate speech with alarming velocity and frequency. The overarching aim of our research is to leverage the data from social media platforms to create and evaluate a high-fidelity, at-scale computational simulation of online social behavior which can provide a deep quantitative understanding of adversaries' use of the global information environment. Our hope is that this type of simulation can be used to predict and understand the spread of misinformation, false narratives, fraudulent financial pump and dump schemes, and cybersecurity threats. To do this, our research team has created an agent-based model that can handle a variety of prediction tasks. This dissertation introduces a set of sampling and deep learning techniques that we developed to predict specific aspects of the evolution of online social networks that have proven challenging to predict accurately with the agent-based model. First, we compare different strategies for predicting network evolution with sampled historical data based on community features. We demonstrate that our community-based model outperforms the global one at predicting population, user, and content activity, along with network topology, over different datasets. Second, we introduce a deep learning model for burst prediction. Bursts may serve as a signal of topics that are of growing real-world interest. Since bursts can be caused by exogenous phenomena and are indicative of burgeoning popularity, leveraging cross-platform social media data is valuable for predicting bursts within a single social media platform. An LSTM model is proposed to capture the temporal dependencies and associations based upon activity information.
    These volume predictions can also serve as a valuable input for our agent-based model. Finally, we explore Graph Convolutional Networks to investigate the value of weak ties in classifying academic literature. Our experiments treat weak ties as if they were strong ties to determine whether that assumption improves performance. We also examine how node removal affects prediction accuracy by selecting nodes according to different centrality measures. These experiments provide insight into which nodes are most important for the performance of targeted graph convolutional networks. Graph Convolutional Networks are important in the social network context because the sociological and anthropological concept of 'homophily' allows the method to use network associations to assist attribute prediction in a social network.
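    The weak-tie experiment described above can be illustrated with the propagation rule at the heart of a GCN layer, H = D^-1/2 (A + I) D^-1/2 X, applied to a tiny two-community graph. This is a minimal sketch under invented assumptions (the five-node graph, the 0.3 weight chosen for a weak tie, and the omission of learned weight matrices), not the dissertation's experimental setup; it only shows how promoting a weak tie to a strong one increases the signal flowing between communities.

```python
import numpy as np

# Tiny two-community graph: strong ties within {0,1,2} and {3,4},
# and one weak tie (2,3) bridging the communities.
edges_strong = [(0, 1), (1, 2), (3, 4)]
edges_weak = [(2, 3)]

def gcn_propagate(X, weighted_edges, n):
    """One GCN propagation step: symmetrically normalized neighborhood
    averaging, which is what lets homophily drive attribute prediction."""
    A = np.eye(n)                          # self-loops (the +I term)
    for (u, v), w in weighted_edges:
        A[u, v] = A[v, u] = w
    d_inv_sqrt = np.diag(1 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt @ X

n = 5
X = np.eye(n)                              # one-hot node features
down_weighted = [(e, 1.0) for e in edges_strong] + [(e, 0.3) for e in edges_weak]
as_strong = [(e, 1.0) for e in edges_strong + edges_weak]

H_weak = gcn_propagate(X, down_weighted, n)
H_strong = gcn_propagate(X, as_strong, n)
# H[2, 3] measures how much of node 3's signal reaches node 2:
# treating the weak tie as strong increases cross-community flow.
```

    Stacking such layers (with learned weights and a nonlinearity between them) yields the node classifier used for the academic-literature experiments; removing a high-centrality node zeroes its row and column of `A`, which is how the node-removal study perturbs this propagation.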

    Data for: What Factors Influence the Reviewer Assignment to Pull Requests?

    Dataset used in the experiments, together with all extracted association rules.