2 research outputs found
Topic modelling of Russian-language texts based on lemmas and lexical constructions
This graduation qualification paper is devoted to improving probabilistic topic modelling methods aimed at extracting latent relations between words, documents and topics in text collections. In most topic models, topics are represented exclusively by unigrams, which in some cases reduces accuracy and makes the extracted topics harder to interpret. The paper presents a new algorithm based on the classic LDA model that automatically extracts two-word collocations (bigrams) from a corpus and incorporates them into the topic model. The practical part describes the algorithm in operation and reports the results of applying it to two Russian corpora: a corpus of texts on radio electronics, rocket engineering and technology, and a corpus of texts on linguistics.
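The abstract does not say how the bigrams are selected or how they enter the model, so the following is only a minimal sketch of one common approach, not the authors' algorithm. It assumes that candidate bigrams are adjacent word pairs scored by pointwise mutual information (PMI) and that selected pairs are merged into single tokens before the corpus is passed to an LDA implementation. The function names `find_bigrams` and `merge_bigrams` and the thresholds `min_count` and `min_pmi` are hypothetical.

```python
import math
from collections import Counter


def find_bigrams(docs, min_count=2, min_pmi=1.0):
    """Select adjacent word pairs that are frequent and strongly associated.

    docs is a list of token lists. The PMI scoring and both thresholds are
    illustrative assumptions, not taken from the paper.
    """
    unigrams = Counter()
    bigrams = Counter()
    for doc in docs:
        unigrams.update(doc)
        bigrams.update(zip(doc, doc[1:]))

    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    selected = set()
    for (w1, w2), n in bigrams.items():
        if n < min_count:
            continue
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently.
        p_pair = n / total_bi
        p_indep = (unigrams[w1] / total_uni) * (unigrams[w2] / total_uni)
        if math.log(p_pair / p_indep) >= min_pmi:
            selected.add((w1, w2))
    return selected


def merge_bigrams(doc, selected):
    """Rewrite a token list so that each selected pair becomes one token."""
    out = []
    i = 0
    while i < len(doc):
        if i + 1 < len(doc) and (doc[i], doc[i + 1]) in selected:
            out.append(doc[i] + "_" + doc[i + 1])
            i += 2  # skip the second word of the merged pair
        else:
            out.append(doc[i])
            i += 1
    return out
```

The merged token lists can then be given to any LDA implementation in place of the original unigram lists, so that a topic may contain entries such as `topic_model` alongside single words.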
Automated Topic Naming: Supporting Cross-project Analysis of Software Maintenance Activities (Empirical Software Engineering, manuscript draft)
Software repositories provide a deluge of software artifacts to analyze. Researchers have attempted to summarize, categorize, and relate these artifacts using semi-unsupervised machine-learning algorithms such as Latent Dirichlet Allocation (LDA), which is used in concept and topic analysis to suggest candidate word-lists or topics that describe and relate software artifacts. However, these word-lists and topics are difficult to interpret in the absence of meaningful summary labels. Current topic modeling techniques assume manual labelling and do not use domain-specific knowledge to improve, contextualize, or describe results for developers. We propose a solution: automated labelled topic extraction. Topics are extracted using LDA from commit-log comments recovered from source control systems. These topics are given labels from a generalizable cross-project taxonomy consisting of non-functional requirements. Our approach was evaluated with experiments and case studies on three large-scale Relational Database Management System (RDBMS) projects: MySQL, PostgreSQL and MaxDB. The case studies show that labelled topic extraction can produce appropriate, context-sensitive labels that are relevant to these projects, and provid