
289 research outputs found

How Early Participation Determines Long-Term Sustained Activity in GitHub Projects?

Author: He Hao
Xiao Wenxin
Xu Weiwei
Zhang Yuxia
Zhou Minghui
Publication date: 28/09/2023

Although the open source model bears many advantages in software development, open source projects are always hard to sustain. Previous research on open source sustainability mainly focuses on projects that have already reached a certain level of maturity (e.g., with communities, releases, and downstream projects). However, limited attention is paid to the development of (sustainable) open source projects in their infancy, and we believe an understanding of early sustainability determinants is crucial for project initiators, incubators, newcomers, and users. In this paper, we aim to explore the relationship between early participation factors and long-term project sustainability. We leverage a novel methodology combining the Blumberg model of performance and machine learning to predict the sustainability of 290,255 GitHub projects. Specifically, we train an XGBoost model based on early participation (the first three months of activity) in these projects and interpret the model using LIME. We quantitatively show that early participants have a positive effect on a project's future sustained activity if they have prior experience in OSS project incubation and demonstrate concentrated focus and steady commitment. Participation from non-code contributors and detailed contribution documentation also promote a project's sustained activity. Compared with individual projects, building a community that consists of more experienced core developers and more active peripheral developers is important for organizational projects. This study provides unique insights into the incubation and recognition of sustainable open source projects, and our interpretable prediction approach can also offer guidance to open source project initiators and newcomers. Comment: The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023)

arXiv.org e-Print Archive

Personalized First Issue Recommender for Newcomers in Open Source Projects

Author: He Hao
Li Jingyue
Qiu Ruiqiao
Xiao Wenxin
Zhou Minghui
Publication date: 17/08/2023

Many open source projects provide good first issues (GFIs) to attract and retain newcomers. Although several automated GFI recommenders have been proposed, existing recommenders are limited to recommending generic GFIs without considering differences between individual newcomers. However, we observe mismatches between generic GFIs and the diverse backgrounds of newcomers, resulting in failed attempts, discouraged onboarding, and delayed issue resolution. To address this problem, we assume that personalized first issues (PFIs) for newcomers could help reduce the mismatches. To justify the assumption, we empirically analyze 37 newcomers and their first issues resolved across multiple projects. We find that the first issues resolved by the same newcomer share similarities in task type, programming language, and project domain. These findings underscore the need for a PFI recommender to improve over state-of-the-art approaches. For that purpose, we identify features that influence newcomers' personalized selection of first issues by analyzing the relationship between possible features of the newcomers and the characteristics of the newcomers' chosen first issues. We find that the expertise preference, OSS experience, activeness, and sentiment of newcomers drive their personalized choice of first issues. Based on these findings, we propose a Personalized First Issue Recommender (PFIRec), which employs LambdaMART to rank candidate issues for a given newcomer by leveraging the identified influential features. We evaluate PFIRec using a dataset of 68,858 issues from 100 GitHub projects. The evaluation results show that PFIRec outperforms existing first issue recommenders, potentially doubling the probability that the top recommended issue is suitable for a specific newcomer and reducing a newcomer's unsuccessful attempts to identify suitable first issues by one-third, in the median. Comment: The 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023)

arXiv.org e-Print Archive

PEGA: Personality-Guided Preference Aggregator for Ephemeral Group Recommendation

Author: Chen Xin
He Liang
Hu Wenxin
Shi Liye
Wu Wen
Ye Guangze
Publication date: 18/04/2023

Recently, making recommendations for ephemeral groups, which contain dynamic users and few historical interactions, has received increasing attention. The main challenge for an ephemeral group recommender is how to aggregate individual preferences to represent the group's overall preference. Score aggregation and preference aggregation are two commonly used methods that adopt hand-crafted predefined strategies and data-driven strategies, respectively. However, they neglect the importance of individual inherent factors, such as personality, in the group. In addition, they fail to work well when only a small number of interaction records is available. To address these issues, we propose a Personality-Guided Preference Aggregator (PEGA) for ephemeral group recommendation. Concretely, we first adopt a hyper-rectangle to define the concept of Group Personality. We then use a personality attention mechanism to aggregate group preferences. The role of personality in our approach is twofold: (1) to estimate individual users' importance in a group and provide explainability; (2) to alleviate the data sparsity issue that occurs in ephemeral groups. The experimental results demonstrate that our model significantly outperforms the state-of-the-art methods in terms of both Recall and NDCG on the Amazon and Yelp datasets.
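The attention-based aggregation mentioned in the PEGA abstract can be illustrated with a generic sketch. The abstract does not give the actual scoring function or the hyper-rectangle construction, so the per-member scores here are plain inputs, and the names `softmax` and `aggregate_group_preference` are hypothetical. This shows softmax attention weighting in general, not the authors' model.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_group_preference(member_prefs, member_scores):
    """Weighted sum of member preference vectors.

    member_prefs:  list of equal-length preference vectors, one per member.
    member_scores: one raw attention score per member (assumed here to come
                   from some personality-based scorer, which is not modeled).
    Returns (group_vector, weights).
    """
    weights = softmax(member_scores)
    dim = len(member_prefs[0])
    group = [
        sum(w * prefs[i] for w, prefs in zip(weights, member_prefs))
        for i in range(dim)
    ]
    return group, weights


if __name__ == "__main__":
    prefs = [[1.0, 0.0], [0.0, 1.0]]
    group, weights = aggregate_group_preference(prefs, [2.0, 0.0])
    print(weights, group)
```

The returned weights double as a simple explanation of how much each member influenced the group vector, which is the kind of explainability the abstract attributes to the attention step.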

arXiv.org e-Print Archive

Cascaded Detail-Preserving Networks for Super-Resolution of Document Images

Author: Fu Zhichao
He Liang
Hu Wenxin
Kong Yu
Yang Jing
Ye Hao
Zheng Yingbin
Publication date: 25/11/2019

The accuracy of OCR is usually affected by the quality of the input document image, and different kinds of marred document images hamper the OCR results. Among these scenarios, the low-resolution image is a common and challenging case. In this paper, we propose cascaded networks for document image super-resolution. Our model is composed of Detail-Preserving Networks with small magnification. The loss function with perceptual terms is designed to simultaneously preserve the original patterns and enhance the edges of the characters. These networks are trained with the same architecture and different parameters and then assembled into a pipeline model with a larger magnification. The low-resolution images are upscaled gradually by passing through each Detail-Preserving Network until the final high-resolution images are obtained. Through extensive experiments on two scanned document image datasets, we demonstrate that the proposed approach outperforms recent state-of-the-art image super-resolution methods, and that combining it with a standard OCR system leads to significant improvements in the recognition results.

arXiv.org e-Print Archive

Crossref

Comparative venom gland transcriptome analysis of the scorpion Lychas mucronatus reveals intraspecific toxic gene diversity and new venomous components

Author: Ruiming Zhao
Wenxin Li
Yawen He
Yibao Ma
Yingliang Wu
Zhijian Cao
Zhiyong Di
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Lychas mucronatus is a scorpion species widely distributed in Southeast Asia and southern China. Hardly anything is known about its venom components, despite the fact that it can often cause human accidents. In this work, we performed a venom gland transcriptome analysis by constructing and screening the venom gland cDNA library of the scorpion Lychas mucronatus from Yunnan province and compared it with the previous results of Hainan-sourced Lychas mucronatus. Results: A total of sixteen known types of venom peptides and proteins are obtained from the venom gland cDNA library of Yunnan-sourced Lychas mucronatus, which greatly increases the number of currently reported scorpion venom peptides. Interestingly, we also identified nineteen atypical types of venom molecules seldom reported in scorpion species. Surprisingly, the comparative transcriptome analysis of Yunnan-sourced and Hainan-sourced Lychas mucronatus indicated that enormous diversity and large differences in abundance could be found in venom peptides and proteins between populations of the scorpion Lychas mucronatus from different geographical regions. Conclusions: This work characterizes a large number of venom molecules never before identified in scorpion species. This result provides a comparative analysis of venom transcriptomes of the scorpion Lychas mucronatus from different geographical regions, which reveals that the venom peptides and proteins of the same scorpion species from different geographical regions are highly diversified and that scorpions evolve to adapt to a new environment by altering the primary structure and abundance of venom peptides and proteins.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DACSR: Decoupled-Aggregated End-to-End Calibrated Sequential Recommendation

Author: Jiayi Chen
Liang He
Liye Shi
Wei Zheng
Wen Wu
Wenxin Hu
Xi Chen
Yu Ji
Publication date: 21/10/2022

Sequential recommendations have made great strides in accurately predicting the future behavior of users. However, seeking accuracy alone may bring side effects such as unfair and overspecialized recommendation results. In this work, we focus on calibrated recommendations for sequential recommendation, which are connected to both fairness and diversity. On the one hand, calibration aims to provide fairer recommendations whose preference distributions are consistent with users' historical behaviors. On the other hand, it can improve the diversity of recommendations to a certain degree. However, existing methods for calibration have mainly relied on post-processing of the candidate lists, which requires more computation time when generating recommendations. In addition, they fail to establish the relationship between accuracy and calibration, which limits accuracy. To handle these problems, we propose an end-to-end framework to provide both accurate and calibrated recommendations for sequential recommendation. We design an objective function to calibrate the interests between recommendation lists and historical behaviors. We also provide distribution modification approaches to improve the diversity and mitigate the effect of imbalanced interests. In addition, we design a decoupled-aggregated model to improve the recommendation. The framework assigns two objectives to two individual sequence encoders and aggregates the outputs by extracting useful information. Experiments on benchmark datasets validate the effectiveness of our proposed model.
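To make the notion of calibration in the DACSR abstract concrete, here is a minimal sketch of one common way to quantify it: the KL divergence between the category distribution of a user's history and that of a recommendation list. The abstract does not state which measure or objective the authors use, so this metric, the smoothing constant `alpha`, and the function names are assumptions for illustration only.

```python
import math
from collections import Counter


def category_distribution(categories):
    """Relative frequency of each category in a non-empty list of labels."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def calibration_kl(history_cats, recommended_cats, alpha=0.01):
    """KL(p || q~) between history (p) and recommendations (q).

    q is smoothed toward p with weight alpha (an assumed value) so that a
    category missing from the recommendations does not yield log(0).
    Lower values mean the list mirrors the user's historical interests.
    """
    p = category_distribution(history_cats)
    q = category_distribution(recommended_cats)
    kl = 0.0
    for c, p_c in p.items():
        q_smoothed = (1 - alpha) * q.get(c, 0.0) + alpha * p_c
        kl += p_c * math.log(p_c / q_smoothed)
    return kl


if __name__ == "__main__":
    history = ["drama", "drama", "comedy"]
    print(calibration_kl(history, ["drama", "comedy", "drama"]))
    print(calibration_kl(history, ["comedy", "comedy", "comedy"]))
```

A post-processing approach would re-rank a candidate list to reduce such a score after the fact. The abstract's point is that the proposed framework instead optimizes for calibration during training.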

arXiv.org e-Print Archive

Directory of Open Access Journals

Numerical study of tidal effect on the water flux across the Korea/Tsushima Strait

Author: Baoshu Yin
Dezhou Yang
Lingjing Xu
Wenxin Jiang
Xuan Cui
Zhiwei He
Publication venue: Frontiers Media S.A.
Publication date: 01/11/2023

Tremendous amounts of materials and energy are transported from the East China Sea (ECS) to the East/Japan Sea (EJS) through the Korea/Tsushima Strait (KTS). Tides undoubtedly play an important role in regulating ocean circulation on the broad continental shelf of the ECS, while the effects of tides on the water exchange between the ECS and EJS remain unclear. Using a three-dimensional Regional Oceanic Modeling System (ROMS) circulation model, we conducted numerical experiments with tides, without tides, and with only barotropic tides. The results showed that the water flux across the KTS can increase by up to 13% (in summer) when tides are excluded from the numerical simulation. To understand how tidal forcing regulates the KTS water flux, we performed a dynamic diagnostic analysis and revealed that the variation in sea surface height under the tidal effect is the main reason for the water flux variation across the KTS. The tidal effect can adjust the sea surface height, weaken the pressure gradient, and reduce the water flux across the KTS, which affects the intensity of water exchange between the ECS and EJS. The tidal effect can alter the sea level difference between the Taiwan Strait and the KTS, which influences the KTS water flux. Tides can also influence the KTS water flux by altering the sea surface height through interaction with topography and stratification. We also found that the tidal effect weakens the northward intrusion of the Yellow Sea Warm Current in winter and in turn enhances the water flux across the KTS according to volume conservation. These modeling results imply that tides must be considered when simulating the ocean environment of the northwestern Pacific Ocean.

Directory of Open Access Journals

Transcriptome analysis of the venom gland of the scorpion Scorpiops jendeki: implication for the evolution of the scorpion venom arsenal

Author: Cao Zhijian
He Yawen
Li Songryong
Li Wenxin
Liu Jun
Ma Yibao
Wu Yingliang
Zhao Ruiming
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central