TraceSim: An alignment method for computing stack trace similarity
ABSTRACT: Software systems can automatically submit crash reports to a repository for investigation when program failures occur. A significant portion of these crash reports are duplicate, i.e., they are caused by the same software issue. Therefore, if the volume of submitted reports is very large, automatic grouping of duplicate crash reports can significantly ease and speed up analysis of software failures. This task is known as crash report deduplication. Given a huge volume of incoming reports, increasing quality of deduplication is an important task. The majority of studies address it via information retrieval or sequence matching methods based on the similarity of stack traces from two crash reports. While information retrieval methods disregard the position of a frame in a stack trace, the existing works based on sequence matching algorithms do not fully consider subroutine global frequency and unmatched frames. Besides, due to data distribution differences among software projects, parameters that are learned using machine learning algorithms are necessary to provide more flexibility to the methods. In this paper, we propose TraceSim, an approach for crash report deduplication which combines TF-IDF, optimum global alignment, and machine learning (ML) in a novel way. Moreover, we propose a new evaluation methodology for this task that is more comprehensive and robust than previously used evaluation approaches. TraceSim significantly outperforms seven baselines and state-of-the-art methods in the majority of the scenarios. It is the only approach that achieves competitive results on all datasets regarding all considered metrics. Moreover, we conduct an extensive ablation study that demonstrates the importance of each TraceSim's element to its final performance and robustness. Finally, we provide the source code for all considered methods and evaluation methodology as well as the created datasets.
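The idea of combining frame frequency with global alignment can be illustrated with a small sketch. This is not the TraceSim implementation: it is a generic Needleman-Wunsch alignment in which matched frames add an IDF-style weight and unmatched frames subtract one, and it leaves out the machine-learned parameters entirely. The function names `idf_weights` and `alignment_score`, the weight formula, and the toy traces are all assumptions made for illustration.

```python
import math
from collections import Counter


def idf_weights(traces):
    """Weight each frame by inverse document frequency (rare frames weigh more).

    Assumption: weight = log(N / df) + 1, a common IDF variant, not the
    formula used in the paper.
    """
    n = len(traces)
    df = Counter(frame for trace in traces for frame in set(trace))
    return {frame: math.log(n / count) + 1.0 for frame, count in df.items()}


def alignment_score(a, b, idf, default=1.0):
    """Global (Needleman-Wunsch) alignment score of two stack traces.

    A matched frame adds its weight; a gap subtracts the weight of the
    skipped frame; a mismatch subtracts the weights of both frames.
    """
    def w(frame):
        return idf.get(frame, default)

    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] - w(a[i - 1])
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] - w(b[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                diag = dp[i - 1][j - 1] + w(a[i - 1])
            else:
                diag = dp[i - 1][j - 1] - (w(a[i - 1]) + w(b[j - 1]))
            skip_a = dp[i - 1][j] - w(a[i - 1])
            skip_b = dp[i][j - 1] - w(b[j - 1])
            dp[i][j] = max(diag, skip_a, skip_b)
    return dp[-1][-1]


# Hypothetical traces, top frame last.
traces = [
    ["main", "run", "parse"],
    ["main", "run", "render"],
    ["main", "init", "crash"],
]
idf = idf_weights(traces)
for other in traces:
    print(other, round(alignment_score(traces[0], other, idf), 3))
```

In this toy setup the first trace scores highest against itself, lower against the trace that shares two frames, and lowest against the trace that shares only `main`, which is the ordering a deduplication measure would be expected to produce.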
New Sulfamides Based on 1-Izopropil-3-α-Naftyl-5-Methoxymethyl-4-Aminopyrazole and Determination of Their Structure
For the previously obtained 1-isopropyl-3-α-naphthyl-5-methoxymethyl-4-nitrosopyrazole, a reduction
reaction with hydrazine hydrate was performed. 1-Isopropyl-3-α-naphthyl-5-methoxymethyl-4-aminopyrazole
was synthesized for the first time and then sulfonylated with p-acetamidobenzenesulfonyl
chloride and p-toluenesulfonyl chloride. As a result, previously unknown sulfonylated derivatives of
N-alkylated aminopyrazoles were obtained. Their composition and structure were confirmed by modern
methods of analysis such as IR spectroscopy, 1H NMR spectroscopy, and mass spectrometry.
S3M: Siamese Stack (Trace) Similarity Measure
Automatic crash reporting systems have become a de-facto standard in software
development. These systems monitor target software, and if a crash occurs they
send details to a backend application. Later on, these reports are aggregated
and used in the development process to 1) understand whether it is a new or an
existing issue, 2) assign these bugs to appropriate developers, and 3) gain a
general overview of the application's bug landscape. The efficiency of report
aggregation and subsequent operations heavily depends on the quality of the
report similarity metric. However, a distinctive feature of this kind of report
is that no textual input from the user (i.e., bug description) is available: it
contains only stack trace information.
In this paper, we present S3M ("extreme") -- the first approach to computing
stack trace similarity based on deep learning. It is based on a siamese
architecture that uses a biLSTM encoder and a fully-connected classifier to
compute similarity. Our experiments demonstrate the superiority of our approach
over the state-of-the-art on both open-sourced data and a private JetBrains
dataset. Additionally, we review the impact of stack trace trimming on the
quality of the results.
All You Need Is Logs: Improving Code Completion by Learning from Anonymous IDE Usage Logs
Integrated Development Environments (IDEs) are designed to make users more
productive, as well as to make their work more comfortable. To achieve this, a
lot of diverse tools are embedded into IDEs, and IDE developers can use
anonymous usage logs to learn how these tools are being used and to improve
them. A particularly important component that this can be applied to
is code completion, since improving code completion using statistical learning
techniques is a well-established research area.
In this work, we propose an approach for collecting completion usage logs
from the users in an IDE and using them to train a machine learning based model
for ranking completion candidates. We developed a set of features that describe
completion candidates and their context, and deployed their anonymized
collection in the Early Access Program of IntelliJ-based IDEs. We used the logs
to collect a dataset of code completions from users, and employed it to train a
ranking CatBoost model. Then, we evaluated it in two settings: on a held-out
set of the collected completions and in a separate A/B test on two different
groups of users in the IDE. Our evaluation shows that using a simple ranking
model trained on the past user behavior logs significantly improved code
completion experience. Compared to the default heuristics-based ranking, our
model demonstrated a decrease in the number of typing actions necessary to
perform the completion in the IDE from 2.073 to 1.832.
The approach adheres to privacy requirements and legal constraints, since it
does not require collecting personal information and performs all the necessary
anonymization on the client's side. Importantly, it can be improved
continuously by implementing new features, collecting new data, and evaluating
new models; we have been using it in production in this way since the end of
2020.
Comment: 11 pages, 4 figures
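To put the A/B result reported in the abstract in relative terms, the drop from 2.073 to 1.832 typing actions per completion can be worked out as below. The helper name `relative_reduction` is introduced here for illustration only; the two input figures are the ones reported in the abstract.

```python
def relative_reduction(before, after):
    """Fraction by which a metric decreased relative to its starting value."""
    return (before - after) / before


baseline_actions = 2.073  # default heuristics-based ranking
model_actions = 1.832     # ranking model trained on usage logs

print(f"{relative_reduction(baseline_actions, model_actions):.1%}")  # prints 11.6%
```

That is roughly an 11.6% reduction in typing actions per completion relative to the heuristic baseline.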