
    On the Influence of Latent Semantic Analysis Parameterization for Bug Localization

    The bug localization problem has benefited from modern information retrieval techniques such as Latent Semantic Analysis (LSA). Many factors influence the quality of results of this approach, such as stop-words, term-document matrix transformations, dimensionality reduction, and the filtering criteria applied to the corpus. In this paper, we study how different combinations of these factors affect the accuracy of query results in the proposed bug localization technique. Bugs from three real-world software systems were analyzed with different combinations of input parameters for the LSA technique. Our results suggest that the term-document matrix transformations and the corpus filtering criteria have the greatest influence on result quality, and that combining individually adequate parameter values does not necessarily produce the best combination. Furthermore, some general guidance for parameterizing the LSA technique for bug localization can also be drawn from the observed results.
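
    To make the parameters the study varies concrete, the following is a minimal sketch of an LSA-based localization pipeline, assuming scikit-learn as the toolchain; it is an illustration of the general technique, not the paper's implementation, and the stop-word list, tf-idf weighting, and number of retained dimensions stand in for the factors the paper evaluates.

    ```python
    # Minimal LSA bug-localization sketch (illustrative only, not the paper's code).
    # Assumes scikit-learn; the corpus is one "document" per source file.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    def locate(bug_report, source_documents, n_dims=100):
        # The parameters studied in the paper map onto these choices:
        # stop-words, term weighting (here tf-idf), and dimensionality (n_dims,
        # which must be smaller than the vocabulary size).
        vectorizer = TfidfVectorizer(stop_words="english", sublinear_tf=True)
        X = vectorizer.fit_transform(source_documents)    # term-document matrix
        svd = TruncatedSVD(n_components=n_dims)           # dimensionality reduction
        docs_lsa = svd.fit_transform(X)
        query_lsa = svd.transform(vectorizer.transform([bug_report]))
        scores = cosine_similarity(query_lsa, docs_lsa)[0]
        # Rank source documents by similarity to the bug report.
        return sorted(enumerate(scores), key=lambda p: -p[1])
    ```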

    Toward Entity-Aware Search

    As the Web has evolved into a data-rich repository, current search engines, built around the standard "page view," are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning (entity as input and entity as output), we propose a dual-inversion framework, with two indexing and partition schemes, toward efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results obtained so far show clear promise for entity-aware search in its usefulness, effectiveness, efficiency, and scalability.
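
    As a toy illustration of the entity-as-output view described above, the sketch below scores candidate entity instances (here an assumed phone-number pattern) by keyword proximity within a page and aggregates that evidence across pages; it is a simplified stand-in and does not reproduce EntityRank's Impression Model or probabilistic framework.

    ```python
    # Toy entity-as-output search sketch (not EntityRank itself): score each
    # candidate entity instance by keyword proximity within a page (local
    # evidence) and accumulate over all pages mentioning it (global evidence).
    import re
    from collections import defaultdict

    PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")  # assumed entity type: phone number

    def rank_phone_numbers(pages, keywords):
        # 'pages' is a list of page texts; 'keywords' a set of lowercase terms.
        scores = defaultdict(float)
        for text in pages:
            tokens = text.lower().split()
            kw_positions = [i for i, t in enumerate(tokens) if t in keywords]
            for m in PHONE.finditer(text):
                pos = len(text[:m.start()].split())  # approximate token position
                if kw_positions:
                    dist = min(abs(pos - k) for k in kw_positions)
                    scores[m.group()] += 1.0 / (1 + dist)
        # Entities supported by many pages, close to the query terms, rank highest.
        return sorted(scores.items(), key=lambda kv: -kv[1])
    ```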

    Tool Support for Capturing the Essence of a Concern in Source Code

    Software evolves constantly to adapt to changing user needs. As it evolves, it becomes progressively harder to understand due to the accumulation of code changes, increasing code size, and the introduction of complex code dependencies. As a result, it becomes harder to maintain, exposing the software to potential bugs and degradation of code quality. High maintenance costs and diminished opportunities for software reusability and portability lead to reduced return on investment, increasing the likelihood of the software product being discarded or replaced. Nevertheless, we believe that there is value in legacy software because of the intellectual effort that has been invested in it. To extend its value, we build on the common practice of identifying the pieces of code relevant to a given concern. Identifying relevant code is a manual process that relies on domain and code expertise, which makes it difficult to scale to large and complex code. In this thesis, we propose several automated approaches for capturing the essential code that represents a concern of interest. We use dynamic program analysis of execution traces to identify a relevant code subset. Information retrieval techniques are then used to improve the accuracy of the capture, refine the process, and verify the results.
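
    The sketch below illustrates the general combination the abstract describes (trace-based filtering followed by information-retrieval ranking), assuming scikit-learn and simple method-name/body inputs; it is not the thesis tooling, and the function and parameter names are hypothetical.

    ```python
    # Illustrative sketch: restrict candidates to methods seen in an execution
    # trace (dynamic step), then rank them against a concern description (IR step).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def capture_concern(concern_description, method_bodies, executed_methods):
        # method_bodies: {method name: source text}; executed_methods: trace set.
        candidates = {m: src for m, src in method_bodies.items()
                      if m in executed_methods}
        names = list(candidates)
        vec = TfidfVectorizer(stop_words="english")
        docs = vec.fit_transform([candidates[m] for m in names])
        query = vec.transform([concern_description])
        scores = cosine_similarity(query, docs)[0]
        # Methods most textually similar to the concern are reported first.
        return sorted(zip(names, scores), key=lambda p: -p[1])
    ```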

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmarking initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we take stock of the impact and legal consequences of these technical advances and point out future directions of research.

    Supporting feature-level software maintenance

    Software maintenance is the process of modifying a software system to fix defects, improve performance, add new functionality, or adapt the system to a new environment. A maintenance task is often initiated by a bug report or a request for new functionality. Bug reports typically describe problems with incorrect behaviors or functionalities. These behaviors or functionalities are known as features. Even in very well-designed systems, the source code that implements features is often not completely modularized. The delocalized nature of features makes maintaining them challenging. Since maintenance tasks are expressed in terms of features, the goal of this dissertation is to support software maintenance at the feature-level. We focus on two tasks in particular: feature location and impact analysis via feature coupling. Feature location is the process of identifying the source code that implements a feature, and it is an essential first step to any maintenance task. There are many existing techniques for feature location that incorporate various types of analyses such as static, dynamic, and textual. In this dissertation, we recognize the advantages of leveraging several types of analyses and introduce a new approach to feature location based on combining dynamic analysis, textual analysis, and web mining algorithms applied to software. The use of web mining for feature location is a novel contribution, and we show that our new techniques based on web mining are significantly more effective than the current state of the art. After using feature location to identify a feature's source code, maintenance can be completed on that feature. Impact analysis should then be performed to revalidate the system and determine which other features may have been affected by the modifications. We define three feature coupling metrics that capture the relationship between features based on structural information, textual information, and their combination. Our novel feature coupling metrics can be used for impact analysis to quantify the strength of coupling between pairs of features. We performed three empirical studies on open-source software systems to assess the feature coupling metrics and established three major results. First, there is a moderate to strong statistically significant correlation between feature coupling and faults. Second, feature coupling can be used to correctly determine about half of the other features that would be affected by a change to a given feature. Finally, we found that the metrics align with developers' opinions about pairs of features that are actually coupled.
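
    To make the idea of "web mining algorithms applied to software" concrete, here is an illustrative sketch only, not the dissertation's technique or its feature coupling metrics: a plain PageRank computed over the call graph restricted to methods observed in a feature-specific execution trace, with the call-graph and trace inputs assumed.

    ```python
    # Illustrative sketch: web-mining-style ranking (simple PageRank) over the
    # call graph restricted to methods seen in a feature-specific trace.
    def feature_pagerank(call_graph, executed, damping=0.85, iters=50):
        # call_graph: {caller: [callees]}; executed: set of methods in the trace.
        nodes = sorted(executed)
        edges = {m: [c for c in call_graph.get(m, []) if c in executed]
                 for m in nodes}
        rank = {m: 1.0 / len(nodes) for m in nodes}
        for _ in range(iters):
            new = {m: (1.0 - damping) / len(nodes) for m in nodes}
            for m in nodes:
                outs = edges[m] or nodes            # dangling methods spread evenly
                share = damping * rank[m] / len(outs)
                for c in outs:
                    new[c] += share
            rank = new
        # Higher-ranked methods are stronger candidates for the feature's code.
        return sorted(rank.items(), key=lambda kv: -kv[1])
    ```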