7 research outputs found

TraceSim: An alignment method for computing stack trace similarity

Author: Aloise Daniel
Chernishev George
Fernandes Eraldo Rezende
Khvorov Aleksandr
Koznov Dmitrij
Luciv Dmitry
Muller Rodrigues Irving
Povarov Nikita
Vasiliev Roman
Publication venue: Springer
Publication date: 01/03/2022

Software systems can automatically submit crash reports to a repository for investigation when program failures occur. A significant portion of these crash reports are duplicates, i.e., they are caused by the same software issue. Therefore, if the volume of submitted reports is very large, automatic grouping of duplicate crash reports can significantly ease and speed up analysis of software failures. This task is known as crash report deduplication. Given a huge volume of incoming reports, increasing the quality of deduplication is an important task. The majority of studies address it via information retrieval or sequence matching methods based on the similarity of stack traces from two crash reports. While information retrieval methods disregard the position of a frame in a stack trace, the existing works based on sequence matching algorithms do not fully consider subroutine global frequency and unmatched frames. Besides, due to data distribution differences among software projects, parameters that are learned using machine learning algorithms are necessary to provide more flexibility to the methods. In this paper, we propose TraceSim, an approach for crash report deduplication which combines TF-IDF, optimum global alignment, and machine learning (ML) in a novel way. Moreover, we propose a new evaluation methodology for this task that is more comprehensive and robust than previously used evaluation approaches. TraceSim significantly outperforms seven baselines and state-of-the-art methods in the majority of the scenarios. It is the only approach that achieves competitive results on all datasets regarding all considered metrics. Moreover, we conduct an extensive ablation study that demonstrates the importance of each element of TraceSim to its final performance and robustness. Finally, we provide the source code for all considered methods and the evaluation methodology, as well as the created datasets.
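As a rough illustration of how frame frequency and global alignment can be combined, the sketch below weights each stack frame by an IDF-style score and aligns two traces with the classic Needleman-Wunsch recurrence. It is not the TraceSim algorithm: the function names (`idf_weights`, `alignment_score`, `similarity`), the weighting formula, the penalty scheme, and the normalization are all assumptions made for this example, and no learned parameters are involved.

```python
import math
from collections import Counter


def idf_weights(traces):
    """IDF-style weight per frame: frames seen in fewer traces weigh more.

    The formula log(N / df) + 1 is an assumption chosen for illustration.
    """
    n = len(traces)
    df = Counter(frame for trace in traces for frame in set(trace))
    return {frame: math.log(n / count) + 1.0 for frame, count in df.items()}


def alignment_score(a, b, weight):
    """Needleman-Wunsch global alignment over two frame sequences.

    A matched frame earns its weight; an unmatched frame (gap or
    mismatch) costs its weight. Unknown frames default to weight 1.0.
    """
    def w(frame):
        return weight.get(frame, 1.0)

    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] - w(a[i - 1])
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] - w(b[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            fa, fb = a[i - 1], b[j - 1]
            diag = w(fa) if fa == fb else -(w(fa) + w(fb))
            dp[i][j] = max(
                dp[i - 1][j - 1] + diag,  # align fa with fb
                dp[i - 1][j] - w(fa),     # leave fa unmatched
                dp[i][j - 1] - w(fb),     # leave fb unmatched
            )
    return dp[-1][-1]


def similarity(a, b, weight):
    """Alignment score scaled so that identical traces score 1.0."""
    total_a = sum(weight.get(f, 1.0) for f in a)
    total_b = sum(weight.get(f, 1.0) for f in b)
    largest = max(total_a, total_b)
    if largest == 0:
        return 1.0  # two empty traces are treated as identical
    return alignment_score(a, b, weight) / largest


# Hypothetical stack traces, outermost frame first.
traces = [
    ["main", "run", "parse"],
    ["main", "run", "render"],
    ["main", "init"],
]
weights = idf_weights(traces)
near = similarity(traces[0], traces[1], weights)
far = similarity(traces[0], traces[2], weights)
```

In this toy corpus, `near` comes out higher than `far` because the first two traces share the `main` and `run` frames in the same positions. Unlike a bag-of-words comparison, the alignment is sensitive to frame order, and unlike an unweighted alignment, a mismatch on a rare frame costs more than a mismatch on a frame that appears in every trace.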

PolyPublie

New Sulfamides Based on 1-Izopropil-3-α-Naftyl-5- Methoxymethyl-4-Aminopyrazole and Determination of Their Structure

Author: Lubyashkin Alexey V.
Neupokoeva Ekaterina V.
Peterson Ivan V.
Povarov Ilya G.
Shilenkov Nikita A.
Suboch Georgy A.
Tovbis Mikhail S.
Publication venue: Siberian Federal University
Publication date: 01/09/2019

For the previously obtained 1-isopropyl-3-α-naphthyl-5-methoxymethyl-4-nitrosopyrazole, a reduction reaction with hydrazine hydrate was carried out. 1-Isopropyl-3-α-naphthyl-5-methoxymethyl-4-aminopyrazole was synthesized for the first time and then sulfonylated with p-acetamidobenzenesulfonyl chloride and p-toluenesulfonyl chloride. As a result, previously unknown sulfonylated derivatives of N-alkylated aminopyrazoles were obtained. Their composition and structure were confirmed by modern analytical methods such as IR spectroscopy, 1H NMR spectroscopy, and mass spectrometry.

Siberian Federal University Digital Repository

S3M: Siamese Stack (Trace) Similarity Measure

Author: Chernishev George
Khvorov Aleksander
Koznov Dmitrij
Muller Rodrigues Irving
Povarov Nikita
Vasiliev Roman
Publication venue: IEEE
Publication date: 01/01/2021

Automatic crash reporting systems have become a de facto standard in software development. These systems monitor target software, and if a crash occurs they send details to a backend application. Later on, these reports are aggregated and used in the development process to 1) understand whether it is a new or an existing issue, 2) assign these bugs to appropriate developers, and 3) gain a general overview of the application's bug landscape. The efficiency of report aggregation and subsequent operations heavily depends on the quality of the report similarity metric. However, a distinctive feature of this kind of report is that no textual input from the user (i.e., bug description) is available: it contains only stack trace information. In this paper, we present S3M ("extreme"), the first approach to computing stack trace similarity based on deep learning. It is based on a siamese architecture that uses a biLSTM encoder and a fully-connected classifier to compute similarity. Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset. Additionally, we review the impact of stack trace trimming on the quality of the results.

arXiv.org e-Print Archive

PolyPublie

All You Need Is Logs: Improving Code Completion by Learning from Anonymous IDE Usage Logs

Author: Bezzubov Alexander
Bibaev Vitaliy
Bryksin Timofey
Golubev Yaroslav
Kalina Alexey
Lomshakov Vadim
Povarov Nikita
Publication date: 21/05/2022

Integrated Development Environments (IDEs) are designed to make users more productive, as well as to make their work more comfortable. To achieve this, a lot of diverse tools are embedded into IDEs, and the developers of IDEs can employ anonymous usage logs to collect data about how these tools are used in order to improve them. A particularly important component that this can be applied to is code completion, since improving code completion using statistical learning techniques is a well-established research area. In this work, we propose an approach for collecting completion usage logs from the users in an IDE and using them to train a machine learning based model for ranking completion candidates. We developed a set of features that describe completion candidates and their context, and deployed their anonymized collection in the Early Access Program of IntelliJ-based IDEs. We used the logs to collect a dataset of code completions from users, and employed it to train a ranking CatBoost model. Then, we evaluated it in two settings: on a held-out set of the collected completions and in a separate A/B test on two different groups of users in the IDE. Our evaluation shows that using a simple ranking model trained on the past user behavior logs significantly improved the code completion experience. Compared to the default heuristics-based ranking, our model demonstrated a decrease in the number of typing actions necessary to perform the completion in the IDE from 2.073 to 1.832. The approach adheres to privacy requirements and legal constraints, since it does not require collecting personal information, performing all the necessary anonymization on the client's side. Importantly, it can be improved continuously by implementing new features, collecting new data, and evaluating new models. We have been using it in production since the end of 2020.

Comment: 11 pages, 4 figures

arXiv.org e-Print Archive