How Early Participation Determines Long-Term Sustained Activity in GitHub Projects?
Although the open source model bears many advantages in software development,
open source projects are always hard to sustain. Previous research on open
source sustainability mainly focuses on projects that have already reached a
certain level of maturity (e.g., with communities, releases, and downstream
projects). However, limited attention is paid to the development of
(sustainable) open source projects in their infancy, and we believe an
understanding of early sustainability determinants is crucial for project
initiators, incubators, newcomers, and users.
In this paper, we aim to explore the relationship between early participation
factors and long-term project sustainability. We leverage a novel methodology
combining the Blumberg model of performance and machine learning to predict the
sustainability of 290,255 GitHub projects. Specifically, we train an XGBoost
model on early participation (the first three months of activity) in these
projects and interpret the model using LIME. We quantitatively show that early
participants have a positive effect on a project's future sustained activity
if they have prior experience in OSS project incubation and
demonstrate concentrated focus and steady commitment. Participation from
non-code contributors and detailed contribution documentation also promote
a project's sustained activity. Compared with individual projects, building a
community that consists of more experienced core developers and more active
peripheral developers is important for organizational projects. This study
provides unique insights into the incubation and recognition of sustainable
open source projects, and our interpretable prediction approach can also offer
guidance to open source project initiators and newcomers.
Comment: The 31st ACM Joint European Software Engineering Conference and
Symposium on the Foundations of Software Engineering (ESEC/FSE 2023)
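The abstract describes predicting sustainability from early participation, i.e. activity in a project's first three months. The sketch below only illustrates the windowing step that such a predictor depends on: it summarizes events falling inside a fixed early window. The Event record, the 90-day approximation of "three months", the feature names, and the use of active weeks as a proxy for "steady commitment" are all hypothetical choices for illustration; this is not the paper's feature set, and the XGBoost training and LIME interpretation steps are omitted.

```python
from collections import namedtuple
from datetime import date, timedelta

# Hypothetical activity record: who acted, on which day, and whether it was code work.
Event = namedtuple("Event", ["author", "day", "kind"])  # kind: "code" or "non-code"

# Assumption: "first three months" approximated as 90 days from the first event.
EARLY_WINDOW = timedelta(days=90)


def early_features(events):
    """Summarize participation within the early window (hypothetical feature set)."""
    if not events:
        return {"contributors": 0, "non_code_contributors": 0,
                "events": 0, "active_weeks": 0}
    start = min(e.day for e in events)
    early = [e for e in events if e.day - start < EARLY_WINDOW]
    contributors = {e.author for e in early}
    non_code = {e.author for e in early if e.kind == "non-code"}
    # Distinct weeks (counted from the first event) with any activity:
    # a rough, assumed proxy for steady commitment.
    active_weeks = {(e.day - start).days // 7 for e in early}
    return {
        "contributors": len(contributors),
        "non_code_contributors": len(non_code),
        "events": len(early),
        "active_weeks": len(active_weeks),
    }


events = [
    Event("alice", date(2023, 1, 1), "code"),
    Event("bob", date(2023, 1, 10), "non-code"),
    Event("alice", date(2023, 2, 20), "code"),
    Event("carol", date(2023, 5, 1), "code"),  # day 120: outside the window
]
print(early_features(events))
```

Fixing the window relative to each project's first event keeps features comparable across projects created at different times; a real pipeline would feed such vectors to the classifier.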
Personalized First Issue Recommender for Newcomers in Open Source Projects
Many open source projects provide good first issues (GFIs) to attract and
retain newcomers. Although several automated GFI recommenders have been
proposed, existing recommenders are limited to recommending generic GFIs
without considering differences between individual newcomers. However, we
observe mismatches between generic GFIs and the diverse background of
newcomers, resulting in failed attempts, discouraged onboarding, and delayed
issue resolution. To address this problem, we hypothesize that personalized first
issues (PFIs) for newcomers could help reduce the mismatches. To validate this
hypothesis, we empirically analyze 37 newcomers and their first issues resolved
across multiple projects. We find that the first issues resolved by the same
newcomer share similarities in task type, programming language, and project
domain. These findings underscore the need for a PFI recommender to improve
over state-of-the-art approaches. For that purpose, we identify features that
influence newcomers' personalized selection of first issues by analyzing the
relationship between possible features of the newcomers and the characteristics
of the newcomers' chosen first issues. We find that the expertise preference,
OSS experience, activeness, and sentiment of newcomers drive their personalized
choice of the first issues. Based on these findings, we propose a Personalized
First Issue Recommender (PFIRec), which employs LambdaMART to rank candidate
issues for a given newcomer by leveraging the identified influential features.
We evaluate PFIRec using a dataset of 68,858 issues from 100 GitHub projects.
The evaluation results show that PFIRec outperforms existing first issue
recommenders, potentially doubling the probability that the top recommended
issue is suitable for a specific newcomer and reducing a newcomer's
unsuccessful attempts to identify suitable first issues by one third, at the
median.
Comment: The 38th IEEE/ACM International Conference on Automated Software
Engineering (ASE 2023)
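PFIRec ranks candidate issues with LambdaMART. LambdaMART fits gradient-boosted regression trees to "lambda" gradients, which weight each mis-ordered pair by how much swapping it would change NDCG. The sketch below computes those lambdas for one newcomer's candidate list only; tree fitting is omitted, sigma is fixed at 1, and the graded relevance labels (e.g. 1 for an issue the newcomer actually resolved) are an assumed labeling, not PFIRec's. The function names are hypothetical.

```python
import math


def dcg_discount(rank):
    """Position discount for a 1-based rank in DCG."""
    return 1.0 / math.log2(rank + 1)


def lambda_gradients(scores, relevances):
    """LambdaRank-style gradients for one query (here: one newcomer).

    A positive value means the model should push that issue's score up.
    """
    n = len(scores)
    # Current ranking induced by the model's scores (highest first).
    order = sorted(range(n), key=lambda k: -scores[k])
    rank = [0] * n
    for pos, k in enumerate(order, start=1):
        rank[k] = pos
    gains = [2 ** r - 1 for r in relevances]
    ideal_dcg = sum(g * dcg_discount(p)
                    for p, g in enumerate(sorted(gains, reverse=True), start=1))
    lambdas = [0.0] * n
    if ideal_dcg == 0:
        return lambdas  # no relevant issue: nothing to learn from this list
    for i in range(n):
        for j in range(n):
            if relevances[i] <= relevances[j]:
                continue  # only pairs where issue i should outrank issue j
            # |change in NDCG| if issues i and j swapped positions.
            delta_ndcg = abs((gains[i] - gains[j]) *
                             (dcg_discount(rank[i]) - dcg_discount(rank[j]))) / ideal_dcg
            # RankNet pair weight with sigma = 1: large when j currently beats i.
            rho = 1.0 / (1.0 + math.exp(scores[i] - scores[j]))
            lambdas[i] += rho * delta_ndcg
            lambdas[j] -= rho * delta_ndcg
    return lambdas


# A suitable issue (relevance 1) currently scored below an unsuitable one.
print(lambda_gradients([0.0, 1.0], [1, 0]))
```

The lambdas are antisymmetric within each pair, so they sum to zero per query; the boosting step then fits a tree to them and adds it to the scoring function.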
PEGA: Personality-Guided Preference Aggregator for Ephemeral Group Recommendation
Recently, making recommendations for ephemeral groups, which contain dynamic
users and few historical interactions, has received increasing attention. The
main challenge of ephemeral group recommendation is how to aggregate individual
preferences to represent the group's overall preference. Score aggregation and
preference aggregation are two commonly used methods that adopt handcrafted
predefined strategies and data-driven strategies, respectively. However, they
neglect the importance of individual inherent factors, such as personality, in
the group. In addition, they perform poorly because few interaction records are
available. To address these issues, we propose a Personality-Guided Preference
Aggregator (PEGA) for ephemeral group recommendation. Concretely, we first adopt
a hyper-rectangle to define the concept of Group Personality. We then use a
personality attention mechanism to aggregate group preferences. The role of
personality in our approach is twofold: (1) to estimate individual users'
importance in a group and provide explainability; and (2) to alleviate the data
sparsity issue that occurs in ephemeral groups. The experimental results
demonstrate that our model significantly outperforms state-of-the-art methods
in terms of both Recall and NDCG on the Amazon and Yelp datasets.
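To make the two PEGA ingredients concrete, the sketch below represents a group's personality as an axis-aligned hyper-rectangle (per-trait min and max over members) and uses attention weights to combine member preference vectors into a group preference. The abstract does not give PEGA's attention formula, so the scoring rule here (dot product of each member's personality with the box center, then softmax) is an assumption for illustration; all function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def group_box(personalities: List[Vector]) -> Tuple[Vector, Vector]:
    """Axis-aligned hyper-rectangle spanning the members' personality vectors."""
    dims = range(len(personalities[0]))
    lows = [min(p[d] for p in personalities) for d in dims]
    highs = [max(p[d] for p in personalities) for d in dims]
    return lows, highs


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(personalities: List[Vector],
              preferences: List[Vector]) -> Tuple[Vector, Vector]:
    """Return (member attention weights, aggregated group preference)."""
    lows, highs = group_box(personalities)
    center = [(lo + hi) / 2 for lo, hi in zip(lows, highs)]
    # Assumed attention score: member personality dotted with the box center.
    scores = [sum(a * b for a, b in zip(p, center)) for p in personalities]
    weights = softmax(scores)
    dim = len(preferences[0])
    group_pref = [sum(w * pref[k] for w, pref in zip(weights, preferences))
                  for k in range(dim)]
    return weights, group_pref


weights, group_pref = aggregate([[1.0, 0.0], [1.0, 0.0]],
                                [[1.0, 0.0], [0.0, 1.0]])
print(weights, group_pref)
```

Because the weights come from personality rather than interaction history, they remain defined for a brand-new group, and they can be read off directly as each member's estimated importance.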
Cascaded Detail-Preserving Networks for Super-Resolution of Document Images
The accuracy of OCR is usually affected by the quality of the input document
image and different kinds of marred document images hamper the OCR results.
Among these scenarios, the low-resolution image is a common and challenging
case. In this paper, we propose cascaded networks for document image
super-resolution. Our model is composed of Detail-Preserving Networks with
small magnification. The loss function with perceptual terms is designed to
simultaneously preserve the original patterns and enhance the edges of the
characters. These networks are trained with the same architecture but different
parameters and then assembled into a pipeline model with a larger
magnification. The low-resolution images are upscaled gradually by passing
through each Detail-Preserving Network until the final high-resolution images
are obtained. Through extensive experiments on two scanned document image
datasets, we demonstrate that the proposed approach outperforms recent
state-of-the-art image super-resolution methods, and that combining it with a
standard OCR system leads to significant improvements in recognition results.
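The cascade idea is that several small-magnification stages, applied in sequence, compose into one larger magnification (k stages of 2x give 2^k). The sketch below shows only that composition; each stage is a nearest-neighbour 2x upscaler standing in for a trained Detail-Preserving Network, and images are plain nested lists of grey values. Both substitutions, and the function names, are assumptions for illustration.

```python
from typing import Callable, List

Image = List[List[float]]


def upscale_2x(img: Image) -> Image:
    """Stand-in stage: nearest-neighbour 2x upscaling (not a learned network)."""
    out: Image = []
    for row in img:
        wide = [px for px in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))  # repeat the row vertically
    return out


def cascade(img: Image, stages: List[Callable[[Image], Image]]) -> Image:
    """Pass the image through each stage in turn; magnifications multiply."""
    for stage in stages:
        img = stage(img)
    return img


# Two 2x stages give a 4x pipeline.
print(cascade([[5]], [upscale_2x, upscale_2x]))
```

In the paper's design each stage would be a separately parameterized network with the same architecture, so the same `cascade` loop applies with learned stages in place of `upscale_2x`.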
Comparative venom gland transcriptome analysis of the scorpion Lychas mucronatus reveals intraspecific toxic gene diversity and new venomous components
Background: Lychas mucronatus is a scorpion species widely distributed in Southeast Asia and southern China. Almost nothing is known about its venom components, even though it often causes accidents involving humans. In this work, we performed a venom gland transcriptome analysis by constructing and screening the venom gland cDNA library of the scorpion Lychas mucronatus from Yunnan province and compared it with previous results for Hainan-sourced Lychas mucronatus.
Results: A total of sixteen known types of venom peptides and proteins were obtained from the venom gland cDNA library of Yunnan-sourced Lychas mucronatus, which greatly increases the number of currently reported scorpion venom peptides. Interestingly, we also identified nineteen atypical types of venom molecules seldom reported in scorpion species. Surprisingly, the comparative transcriptome analysis of Yunnan-sourced and Hainan-sourced Lychas mucronatus indicated enormous diversity and large differences in abundance of venom peptides and proteins between populations of the scorpion Lychas mucronatus from different geographical regions.
Conclusions: This work characterizes a large number of venom molecules never before identified in scorpion species. It provides a comparative analysis of venom transcriptomes of the scorpion Lychas mucronatus from different geographical regions, which reveals that the venom peptides and proteins of the same scorpion species from different geographical regions are highly diversified, and that the scorpion adapts to new environments by altering the primary structure and abundance of its venom peptides and proteins.
DACSR: Decoupled-Aggregated End-to-End Calibrated Sequential Recommendation
Sequential recommendations have made great strides in accurately predicting
the future behavior of users. However, seeking accuracy alone may bring side
effects such as unfair and overspecialized recommendation results. In this
work, we focus on calibrated recommendation for sequential recommendation,
which is connected to both fairness and diversity. On the one hand, it aims to
provide fairer recommendations whose preference distributions are consistent
with users' historical behaviors. On the other hand, it can improve the
diversity of recommendations to a certain degree. However, existing calibration
methods mainly rely on post-processing of the candidate lists, which requires
more computation time when generating recommendations. In addition, they fail
to establish the relationship between accuracy and calibration, which limits
accuracy. To handle these problems, we propose an end-to-end framework to
provide both accurate and calibrated recommendations for sequential
recommendation. We design an objective function to calibrate the interests
between recommendation lists and historical behaviors. We also provide
distribution modification approaches to improve diversity and mitigate the
effect of imbalanced interests. In addition, we design a decoupled-aggregated
model to improve the recommendation. The framework assigns the two objectives
to two individual sequence encoders and aggregates their outputs by extracting
useful information. Experiments on benchmark datasets validate the
effectiveness of our proposed model.
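Calibration compares the category (interest) distribution of a recommendation list with that of the user's history. The abstract does not state DACSR's objective, so the sketch below uses a common calibration measure as an assumption: the smoothed KL divergence KL(p || q~) with q~ = (1 - alpha) q + alpha p, where p is the history distribution and q the list distribution. It scores a finished list, whereas an end-to-end objective would need a differentiable (soft) version; the function names and alpha = 0.01 are hypothetical.

```python
import math
from collections import Counter


def category_distribution(item_categories):
    """Empirical distribution over categories of a list of items."""
    counts = Counter(item_categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {c: n / total for c, n in counts.items()}


def calibration_kl(history, recommended, alpha=0.01):
    """Smoothed KL(p || q~): 0 when the list mirrors the history's interests."""
    p = category_distribution(history)
    q = category_distribution(recommended)
    kl = 0.0
    for c, pc in p.items():
        # Smoothing with p keeps q~ > 0 even if the list omits category c.
        q_smoothed = (1 - alpha) * q.get(c, 0.0) + alpha * pc
        kl += pc * math.log(pc / q_smoothed)
    return kl


history = ["action", "comedy"]
print(calibration_kl(history, ["action", "comedy"]))  # close to 0
print(calibration_kl(history, ["action", "action"]))  # larger: comedy is missing
```

An overspecialized list that drops one of the user's interests is penalized heavily, which is the fairness and diversity effect the abstract attributes to calibration.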
Numerical study of tidal effect on the water flux across the Korea/Tsushima Strait
Tremendous amounts of material and energy are transported from the East China Sea (ECS) to the East/Japan Sea (EJS) through the Korea/Tsushima Strait (KTS). Tides undoubtedly play an important role in regulating ocean circulation on the broad continental shelf of the ECS, while the effects of tides on the water exchange between the ECS and EJS remain unclear. Using a three-dimensional Regional Ocean Modeling System (ROMS) circulation model, we conducted numerical experiments with tides, without tides, and with only barotropic tides. The results showed that the water flux across the KTS can increase by up to 13% (in summer) when tides are excluded from the numerical simulation. To understand how tidal forcing regulates the KTS water flux, we performed a dynamic diagnostic analysis, which revealed that the variation in sea surface height under the tidal effect is the main reason for the water flux variation across the KTS. The tidal effect can adjust the sea surface height, weaken the pressure gradient, and reduce the water flux across the KTS, which affects the intensity of water exchange between the ECS and EJS. The tidal effect can also alter the sea level difference between the Taiwan Strait and the KTS, which influences the KTS water flux. Tides can further influence the KTS water flux by altering the sea surface height through interaction with topography and stratification. We also found that the tidal effect weakens the northward intrusion of the Yellow Sea Warm Current in winter and in turn enhances the water flux across the KTS owing to volume conservation. These modeling results imply that tides must be considered when simulating the ocean environment of the northwestern Pacific Ocean.