Detailed Overview of Software Smells
This document provides an overview of the literature on software smells, covering various dimensions of smells along with their corresponding references.
On the Feasibility of Transfer-learning Code Smells using Deep Learning
Context: A substantial amount of work has been done to detect smells in
source code using metrics-based and heuristics-based methods. Machine learning
methods have been recently applied to detect source code smells; however, the
current practices are considered far from mature. Objective: First, explore the
feasibility of applying deep learning models to detect smells without extensive
feature engineering, just by feeding the source code in tokenized form. Second,
investigate the possibility of applying transfer-learning in the context of
deep learning models for smell detection. Method: We use existing metric-based
state-of-the-art methods for detecting three implementation smells and one
design smell in C# code. Using these results as the annotated gold standard, we
train smell detection models on three different deep learning architectures.
These architectures use Convolutional Neural Networks (CNNs) of one or two
dimensions, or Recurrent Neural Networks (RNNs) as their principal hidden
layers. For the first objective of our study, we perform training and
evaluation on C# samples, whereas for the second objective, we train the models
from C# code and evaluate the models over Java code samples. We perform the
experiments with various combinations of hyper-parameters for each model.
Results: We find it feasible to detect smells using deep learning methods. Our
comparative experiments find that there is no clearly superior method between
CNN-1D and CNN-2D. We also observe that performance of the deep learning models
is smell-specific. Our transfer-learning experiments show that
transfer-learning is definitely feasible for implementation smells with
performance comparable to that of direct-learning. This work opens up a new
paradigm to detect code smells by transfer-learning, especially for
programming languages where comprehensive code smell detection tools are
not available.
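As a rough illustration of what "feeding the source code in tokenized form" could look like, the sketch below maps a code snippet to a fixed-length sequence of integer IDs. The regex tokenizer, the `build_vocab` and `encode` helpers, and the reserved padding and unknown IDs are all assumptions made for illustration; the abstract does not describe the paper's actual preprocessing.

```python
import re

# Hypothetical tokenizer: identifiers/keywords, integer literals,
# or any single non-whitespace symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

PAD, UNK = 0, 1  # reserved IDs (an assumption of this sketch)


def tokenize(code):
    """Split a source snippet into a flat list of token strings."""
    return TOKEN_RE.findall(code)


def build_vocab(samples):
    """Assign each distinct token an integer ID, starting after the reserved IDs."""
    vocab = {}
    for code in samples:
        for tok in tokenize(code):
            vocab.setdefault(tok, len(vocab) + 2)
    return vocab


def encode(code, vocab, max_len):
    """Map tokens to IDs, truncate to max_len, and pad the remainder with PAD."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(code)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


sample = "int x = 42;"
vocab = build_vocab([sample])
# The identifier "y" was never seen while building the vocabulary, so it maps to UNK.
encoded = encode("int y = 42;", vocab, max_len=8)
```

Integer sequences of this kind are the sort of input a 1D CNN or RNN can consume directly, which is why no hand-crafted metrics are needed at this stage.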
Leveraging Sentiment Analysis for Twitter Data to Uncover User Opinions and Emotions
Huge amounts of emotion are expressed on social media in the form of tweets, blogs, status updates, and posts. This paper focuses on Twitter, one of the best-known microblogging platforms, which lets users post status updates and other brief messages of up to 280 characters. Twitter sentiment analysis is the application of sentiment analysis to Twitter data (tweets) in order to derive user sentiments and opinions. Given how widely Twitter is used, we aim to reflect the mood of the general public by examining the opinions conveyed in tweets. Numerous applications require the analysis of public opinion, including businesses gauging the market response to their products, the prediction of political outcomes, and the analysis of socioeconomic phenomena such as the stock exchange. Sentiment classification attempts to estimate the sentiment polarity of user updates automatically. To categorize a tweet as positive or negative, we therefore need a model that can accurately discern sarcasm from the lexical meaning of the text. The main objective is to create a practical classifier that can accurately classify the sentiment of Twitter streams relating to GST and tax. The proposed algorithm is implemented in Python.
Naturalness of Attention: Revisiting Attention in Code Language Models
Language models for code such as CodeBERT offer the capability to learn
advanced source code representation, but their opacity poses barriers to
understanding of captured properties. Recent attention analysis studies provide
initial interpretability insights by focusing solely on attention weights
rather than considering the wider context modeling of Transformers. This study
aims to shed some light on the previously ignored factors of the attention
mechanism beyond the attention weights. We conduct an initial empirical study
analyzing both attention distributions and transformed representations in
CodeBERT. Across two programming languages, Java and Python, we find that the
scaled transformation norms of the input better capture syntactic structure
compared to attention weights alone. Our analysis reveals characterization of
how CodeBERT embeds syntactic code properties. The findings demonstrate the
importance of incorporating factors beyond just attention weights for
rigorously understanding neural code models. This lays the groundwork for
developing more interpretable models and effective uses of attention mechanisms
in program analysis. Comment: Accepted at ICSE-NIER (2024) track
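To make the contrast between attention weights and norm-based analysis concrete, the sketch below compares a raw attention weight with the norm of the weighted, transformed input vector. It assumes that a "scaled transformation norm" means the norm of the attention weight times the transformed input, which the abstract does not spell out; the helper names and the toy numbers are invented for illustration and are not taken from the study.

```python
import math


def norm(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def scaled_norms(weights, transformed):
    """Return the norm of (weight * transformed vector) for each input token."""
    return [abs(a) * norm(fx) for a, fx in zip(weights, transformed)]


# Toy values (invented): token 0 receives most of the attention weight,
# but its transformed vector is small, so it contributes less overall.
weights = [0.8, 0.2]
transformed = [[0.3, 0.4], [3.0, 4.0]]
contributions = scaled_norms(weights, transformed)
```

In this toy case the token with the lower attention weight ends up with the larger contribution (roughly 1.0 versus 0.4), which is the kind of effect that looking at weights alone would miss.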
CONCORD: Towards a DSL for Configurable Graph Code Representation
Deep learning is widely used to uncover hidden patterns in large code
corpora. To achieve this, constructing a format that captures the relevant
characteristics and features of source code is essential. Graph-based
representations have gained attention for their ability to model structural and
semantic information. However, existing tools lack flexibility in constructing
graphs across different programming languages, limiting their use.
Additionally, the output of these tools often lacks interoperability and
results in excessively large graphs, making graph-based neural networks
training slower and less scalable.
We introduce CONCORD, a domain-specific language to build customizable graph
representations. It implements reduction heuristics to reduce graphs' size
complexity. We demonstrate its effectiveness in code smell detection as an
illustrative use case and show that: first, CONCORD can produce code
representations automatically per the specified configuration, and second, our
heuristics can achieve comparable performance with significantly reduced size.
CONCORD will help researchers a) create and experiment with customizable
graph-based code representations for different software engineering tasks
involving DL, b) reduce the engineering work to generate graph representations,
c) address the issue of scalability in GNN models, and d) enhance the
reproducibility of experiments in research through a standardized approach to
code representation and analysis.
Definitions of a Software Smell
Many authors have defined smells from their own perspectives. This document attempts to provide a consolidated list of such definitions.
Lixisenatide: a once-daily glucagon-like peptide-1 receptor agonist
Lixisenatide (AVE0010) is a once-daily glucagon-like peptide-1 (GLP-1) receptor agonist used in the treatment of type 2 diabetes. Phase II dose-finding and pharmacodynamic studies identified the 20 µg once-daily dose as having the optimum combination of efficacy, convenience and tolerability. Lixisenatide was prospectively investigated in a series of 11 multinational, randomised, controlled phase III trials (GLP-1 agonist AVE0010 in patients with type 2 diabetes mellitus for Glycemic control and safety evaluation [GetGoal] programme) that included a direct head-to-head study with exenatide. The GetGoal programme established the efficacy and safety profile of lixisenatide 20 µg once daily across the spectrum of patients with type 2 diabetes, including patients not treated with anti-diabetic agents, those failing on oral agents and as an adjunct to basal insulin therapy. The main efficacy endpoints were met in all studies, with the baseline to endpoint reductions in HbA1c consistently ranging from 0.7% to 1.0%. In a head-to-head comparison with exenatide 10 µg twice daily, lixisenatide 20 µg once daily was non-inferior for HbA1c reduction, achieved with threefold fewer patients with symptomatic hypoglycemia events and better gastrointestinal tolerability. Three randomised trials of lixisenatide treatment added to basal insulin showed significantly improved glycemic control over placebo, with pronounced postprandial glucose reductions and good tolerability. Discontinuations for adverse events were consistently low, ranging from 2.5% to 10.4%. As the provision of individualized care moves center stage in diabetes management, lixisenatide with once-daily dosing, a single maintenance dose and fixed-dose pens offers an important treatment option for type 2 diabetes.
COMET: Generating Commit Messages using Delta Graph Context Representation
Commit messages explain code changes in a commit and facilitate collaboration
among developers. Several commit message generation approaches have been
proposed; however, they exhibit limited success in capturing the context of
code changes. We propose Comet (Context-Aware Commit Message Generation), a
novel approach that captures context of code changes using a graph-based
representation and leverages a transformer-based model to generate high-quality
commit messages. Our proposed method utilizes a delta graph that we developed to
effectively represent code differences. We also introduce a customizable
quality assurance module to identify optimal messages, mitigating subjectivity
in commit messages. Experiments show that Comet outperforms state-of-the-art
techniques in terms of the BLEU-Norm and METEOR metrics while being comparable in
terms of ROUGE-L. Additionally, we compare the proposed approach with the
popular gpt-3.5-turbo model, along with gpt-4-turbo, the most capable GPT
model, over zero-shot, one-shot, and multi-shot settings. We found that Comet
outperforms the GPT models on five and four metrics, respectively, and provides
competitive results on the two other metrics. The study has implications for
researchers, tool developers, and software developers. Software developers may
utilize Comet to generate context-aware commit messages. Researchers and tool
developers can apply the proposed delta graph technique in similar contexts,
like code review summarization. Comment: 22 pages, 7 figures
QUANTIFICATION OF EXEMESTANE ACCUMULATION DURING MICROBIAL BIOCONVERSION BY TLC IMAGE ANALYSIS
Objective: The present study aimed to develop a rapid, cost-effective, and accurate method for quantifying exemestane using thin layer chromatography (TLC) separation followed by image analysis, and to test it for monitoring the accumulation of exemestane during microbial bioconversion. Methods: After microbial bioconversion and TLC separation of the products formed, exemestane was quantified using ImageQuant TL v2003 image analysis software, and the results were compared with high performance liquid chromatography (HPLC) analysis. Results: The percentage error between TLC and HPLC analyses ranged from -5.18 to +5.51. The bacterial strains Arthrobacter simplex IAM 1660, Nocardia sp. MTCC 1534, Pseudomonas putida MTCC 1194, and Rhodococcus rhodochrous MTCC 291 yielded 79.7 (72 h), 63.9 (72 h), 69.8 (96 h), and 83.2 (96 h) mole percent bioconversion of 6-methylene androstenedione to exemestane, respectively. Conclusion: Rhodococcus rhodochrous MTCC 291 was found to be the most suitable organism for the bioconversion and may be used to develop an eco-friendly route to replace chemical synthesis, eliminating the use of toxic chemicals and side products.
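The percentage error reported between the TLC and HPLC analyses can be illustrated with a short calculation. The sketch below assumes the error is the signed difference of the TLC estimate from the HPLC value, expressed relative to the HPLC value; the abstract does not state the exact formula, and the function name and input values are hypothetical.

```python
def percent_error(tlc_value, hplc_value):
    """Signed percentage error of a TLC estimate relative to the HPLC reference.

    Assumed definition: (TLC - HPLC) / HPLC * 100.
    """
    if hplc_value == 0:
        raise ValueError("HPLC reference value must be non-zero")
    return (tlc_value - hplc_value) / hplc_value * 100.0


# Invented example values, not measurements from the study.
error = percent_error(tlc_value=105.0, hplc_value=100.0)
```

Under this definition a positive value means the TLC image analysis overestimated the amount relative to HPLC, and a negative value means it underestimated it.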