2 research outputs found

Benchmarking Top-K Keyword and Top-K Document Processing with T$^2$K$^2$ and T$^2$K$^2$D$^2$

Authors: Boicea Alexandru, Darmont Jérôme, Radulescu Florin, Truica Ciprian-Octavian
Publication venue: 'Elsevier BV'
Publication date: 20/04/2018

Top-k keyword and top-k document extraction are very popular text analysis techniques. Top-k keywords and documents are often computed on-the-fly, but they exploit weighted vocabularies that are costly to build. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark currently addresses these problems. Hence, in this paper, we present T$^2$K$^2$, a top-k keywords and documents benchmark, and its decision support-oriented evolution T$^2$K$^2$D$^2$. Both benchmarks feature a real tweet dataset and queries with various complexities and selectivities. They help evaluate weighting schemes and database implementations in terms of computing performance. To illustrate our benchmarks' relevance and genericity, we successfully ran performance tests on the TF-IDF and Okapi BM25 weighting schemes, on one hand, and on different relational (Oracle, PostgreSQL) and document-oriented (MongoDB) database implementations, on the other hand.
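As a rough illustration of what a top-k keyword query over a weighted vocabulary computes, here is a minimal Python sketch. It assumes whitespace tokenization and a plain TF-IDF variant (relative term frequency times log(N/df), summed over documents); the function name `top_k_keywords` and the sample documents are hypothetical, and the benchmark's own weighting formulas and database queries may differ.

```python
import math
from collections import Counter


def top_k_keywords(docs, k):
    """Return the k (term, score) pairs with the highest summed TF-IDF.

    Assumption: TF is the relative frequency of a term in a document and
    IDF is log(N / df); this is one common variant, not necessarily the
    one used by the benchmark.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue  # skip empty documents to avoid division by zero
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[term])
            scores[term] += tf * idf

    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


if __name__ == "__main__":
    sample = [
        "database benchmark query",
        "database tweet tweet",
        "database keyword",
    ]
    for term, score in top_k_keywords(sample, 2):
        print(f"{term}\t{score:.3f}")
```

In this sketch a term that occurs in every document receives an IDF of zero, so it never ranks among the top keywords regardless of how often it appears.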

arXiv.org e-Print Archive

Crossref

HAL Descartes

Hal-Diderot

Graph-Based Keyphrase Extraction for Software Traceability in Source Code and Documentation Mapping

Authors: Swapnil Shinde, Vishnu Suryawanshi, Varsha Jadhav, Nakul Sharma, Mandar Diwakar
Publication venue: Auricle Global Society of Education and Research
Publication date: 30/10/2023

Natural Language Processing (NLP) forms the basis of several computational tasks. However, when applied to software systems, NLP yields many irrelevant features, and noise gets mixed in during feature extraction. As the scale of software systems increases, different metrics are needed to assess them. Diagrammatic and visual representation of the SE projects' code forms an essential component of Source Code Analysis (SCA). These SE projects cannot be analyzed by traditional source code analysis methods, nor by traditional diagrammatic representation. Hence, there is a need to modify the traditional approaches in light of changing environments to reduce the learning gap for developers and traceability engineers. The traditional approaches fall short in addressing specific metrics in terms of document similarity and graph dependency approaches. For source code analysis, a dependency graph can be used to find the relevant key terms and keyphrases, as they occur not just intra-document but also inter-document. In this work, a context-based similarity measure is proposed that can be employed to find a traceability link between the source code metrics and the API documents present in a package. A probabilistic graph-based keyphrase extraction approach is used for searching across the different project files.

International Journal on Recent and Innovation Trends in Computing and Communication