Stochastic Data Clustering
In 1961 Herbert Simon and Albert Ando published the theory behind the
long-term behavior of a dynamical system that can be described by a nearly
uncoupled matrix. Over the past fifty years this theory has been used in a
variety of contexts, including queueing theory, brain organization, and
ecology. In all these applications, the structure of the system is known and
the point of interest is the various stages the system passes through on its
way to some long-term equilibrium.
This paper looks at this problem from the other direction. That is, we
develop a technique for using the evolution of the system to tell us about its
initial structure, and we use this technique to develop a new algorithm for
data clustering.
Comment: 23 pages
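The idea of reading structure off the system's evolution can be illustrated with a toy example. The sketch below is not the paper's algorithm; it is a minimal illustration under stated assumptions: a hypothetical 5-state chain with two tightly coupled blocks (states 0-2 and 3-4), a weak coupling `eps`, and a simple gap threshold `tol` for grouping states. All function names are invented for the illustration. After a short run, entries of the state vector within a block are nearly equal, while entries in different blocks still differ, so grouping near-equal entries recovers the blocks.

```python
def nearly_uncoupled_matrix(eps):
    """Symmetric (hence doubly stochastic) matrix with two weakly coupled blocks.

    Assumption for illustration: states 0-2 form one block, states 3-4 another.
    """
    block_a = {0, 1, 2}
    n = 5
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if (i in block_a) == (j in block_a):
                # Strong coupling inside a block.
                p[i][j] = 0.3 if i in block_a else 0.5
            else:
                # Weak coupling between blocks.
                p[i][j] = eps
        # Diagonal entry makes the row sum to 1.
        p[i][i] = 1.0 - sum(p[i])
    return p


def evolve(x, p, steps):
    """Apply x <- x P repeatedly (row-vector convention)."""
    n = len(x)
    for _ in range(steps):
        x = [sum(x[i] * p[i][j] for i in range(n)) for j in range(n)]
    return x


def cluster_by_gaps(x, tol):
    """Group indices whose values are within tol of their sorted neighbour."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    clusters = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if x[cur] - x[prev] > tol:
            clusters.append([])
        clusters[-1].append(cur)
    return sorted(sorted(c) for c in clusters)


p = nearly_uncoupled_matrix(eps=0.001)
x = evolve([1.0, 0.0, 0.0, 0.0, 0.0], p, steps=20)
print(cluster_by_gaps(x, tol=0.05))
```

The number of steps matters: too few and the blocks have not yet equalised internally, too many and the whole vector drifts toward the uniform long-term equilibrium, erasing the block structure.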
User Review-Based Change File Localization for Mobile Applications
In current mobile app development, novel and emerging DevOps practices
(e.g., Continuous Delivery, Integration, and user feedback analysis) and tools
are becoming more widespread. For instance, the integration of user feedback
(provided in the form of user reviews) in the software release cycle represents
a valuable asset for the maintenance and evolution of mobile apps. To fully
make use of these assets, it is highly desirable for developers to establish
semantic links between the user reviews and the software artefacts to be
changed (e.g., source code and documentation), and thus to localize the
potential files to change for addressing the user feedback. In this paper, we
propose RISING (Review Integration via claSsification, clusterIng, and
linkiNG), an automated approach to support the continuous integration of user
feedback via classification, clustering, and linking of user reviews. RISING
leverages domain-specific constraint information and semi-supervised learning
to group user reviews into multiple fine-grained clusters concerning similar
users' requests. Then, by combining the textual information from both commit
messages and source code, it automatically localizes potential change files to
accommodate the users' requests. Our empirical studies demonstrate that the
proposed approach outperforms the state-of-the-art baseline work in terms of
clustering and localization accuracy, and thus produces more reliable results.
Comment: 15 pages, 3 figures, 8 tables
Integrative analysis of the inter-tumoral heterogeneity of triple-negative breast cancer.
Triple-negative breast cancers (TNBC) lack estrogen and progesterone receptors and HER2 amplification, and are resistant to therapies that target these receptors. Tumors from TNBC patients are heterogeneous based on genetic variations, tumor histology, and clinical outcomes. We used high-throughput genomic data for TNBC patients (n = 137) from TCGA to characterize inter-tumor heterogeneity. Similarity network fusion (SNF)-based integrative clustering combining gene expression, miRNA expression, and copy number variation revealed three distinct patient clusters. Integrating multiple types of data resulted in more distinct clusters than analyses with a single datatype. Whereas most TNBCs are classified by PAM50 as basal subtype, one of the clusters was enriched in the non-basal PAM50 subtypes, exhibited more aggressive clinical features and had a distinctive signature of oncogenic mutations, miRNAs and expressed genes. Our analyses provide a new classification scheme for TNBC based on multiple omics datasets and provide insight into molecular features that underlie TNBC heterogeneity.
Analysing Lexical Semantic Change with Contextualised Word Representations
This paper presents the first unsupervised approach to lexical semantic
change that makes use of contextualised word representations. We propose a
novel method that exploits the BERT neural language model to obtain
representations of word usages, clusters these representations into usage
types, and measures change along time with three proposed metrics. We create a
new evaluation dataset and show that the model representations and the detected
semantic shifts are positively correlated with human judgements. Our extensive
qualitative analysis demonstrates that our method captures a variety of
synchronic and diachronic linguistic phenomena. We expect our work to inspire
further research in this direction.
Comment: To appear in Proceedings of the 58th Annual Meeting of the
Association for Computational Linguistics (ACL-2020)
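The last step of the pipeline described above, measuring change between time periods once usages have been clustered into usage types, can be sketched as follows. The abstract does not name its three metrics, so the Jensen-Shannon divergence used here is an assumption chosen for illustration, as are the function names and the toy labels. The cluster labels are taken as given; obtaining them from BERT representations is outside this sketch.

```python
import math
from collections import Counter


def usage_distribution(labels, n_types):
    """Relative frequency of each usage type among a period's word usages."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(n_types)]


def kl_divergence(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical usage-type labels for one word in two time periods.
period_1 = [0, 0, 0, 1]
period_2 = [0, 1, 1, 1]

p = usage_distribution(period_1, n_types=2)
q = usage_distribution(period_2, n_types=2)
print(jensen_shannon(p, q))
```

A value near 0 means the word's usage types are distributed similarly in both periods; larger values suggest a shift in how the word is used.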