486 research outputs found

    SEWordSim: Software-Specific Word Similarity Database

    Get PDF
    International audienceMeasuring the similarity of words is important in accurately representing and comparing documents, and thus improves the results of many natural language processing (NLP) tasks. The NLP community has proposed various measurements based on WordNet, a lexical database that contains relationships between many pairs of words. Recently, a number of techniques have been proposed to address software engineering issues such as code search and fault localization that require understanding natural language documents, and a measure of word similarity could improve their results. However, WordNet only contains information about words senses in general-purpose conversation, which often differ from word senses in a software-engineering context, and the software-specific word similarity resources that have been developed rely on data sources containing only a limited range of words and word uses.In recent work, we have proposed a word similarity resource based on information collected automatically from StackOverflow. We have found that the results of this resource are given scores on a 3-point Likert scale that are over 50% higher than the results of a resource based on WordNet. In this demo paper, we review our data collection methodology and propose a Java API to make the resulting word similarity resource useful in practice.The SEWordSim database and related information can be found at http://goo.gl/BVEAs8. Demo video is available at http://goo.gl/dyNwyb

    On the Effect of Semantically Enriched Context Models on Software Modularization

    Full text link
    Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of the system that relies on the informal semantics of the program, encoded in the vocabulary used in the source code. Treating the source code as a collection of tokens loses the semantic information embedded within the identifiers. We try to overcome this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used for both deriving the topics that run through the system as well as their clustering. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct contextual vector representation of the source code. The second notion of context is defined based on the flow of data between identifiers to represent a module as a dependency graph where the nodes correspond to identifiers and the edges represent the data dependencies between pairs of identifiers. We have applied our approach to 10 medium-sized open source Java projects, and show that by introducing contexts for identifiers, the quality of the modularization of the software systems is improved. Both of the context models give results that are superior to the plain vector representation of documents. In some cases, the authoritativeness of decompositions is improved by 67%. Furthermore, a more detailed evaluation of our approach on JEdit, an open source editor, demonstrates that inferred topics through performing topic analysis on the contextual representations are more meaningful compared to the plain representation of the documents. The proposed approach in introducing a context model for source code identifiers paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization and topic analysis

    Exploiting tag information for search and personalization

    Get PDF
    [no abstract

    Semantic Social Network Analysis: A Concrete Case

    Get PDF
    In this chapter we present our approach to analyzing such semantic social networks and capturing collective intelligence from collaborative interactions to challenge requirements of Enterprise 2.0. Our tools and models have been tested on an anonymized dataset from Ipernity.com, one of the biggest French social web sites centered on multimedia sharing. This dataset contains over 60,000 users, around half a million declared relationships of three types, and millions of interactions (messages, comments on resources, etc.). We show that the enriched semantic web framework is particularly well-suited for representing online social networks, for identifying their key features and for predicting their evolution. Organizing huge quantity of socially produced information is necessary for a future acceptance of social applications in corporate contexts

    Social and Semantic Contexts in Tourist Mobile Applications

    Get PDF
    The ongoing growth of the World Wide Web along with the increase possibility of access information through a variety of devices in mobility, has defi nitely changed the way users acquire, create, and personalize information, pushing innovative strategies for annotating and organizing it. In this scenario, Social Annotation Systems have quickly gained a huge popularity, introducing millions of metadata on di fferent Web resources following a bottom-up approach, generating free and democratic mechanisms of classi cation, namely folksonomies. Moving away from hierarchical classi cation schemas, folksonomies represent also a meaningful mean for identifying similarities among users, resources and tags. At any rate, they suff er from several limitations, such as the lack of specialized tools devoted to manage, modify, customize and visualize them as well as the lack of an explicit semantic, making di fficult for users to bene fit from them eff ectively. Despite appealing promises of Semantic Web technologies, which were intended to explicitly formalize the knowledge within a particular domain in a top-down manner, in order to perform intelligent integration and reasoning on it, they are still far from reach their objectives, due to di fficulties in knowledge acquisition and annotation bottleneck. The main contribution of this dissertation consists in modeling a novel conceptual framework that exploits both social and semantic contextual dimensions, focusing on the domain of tourism and cultural heritage. The primary aim of our assessment is to evaluate the overall user satisfaction and the perceived quality in use thanks to two concrete case studies. Firstly, we concentrate our attention on contextual information and navigation, and on authoring tool; secondly, we provide a semantic mapping of tags of the system folksonomy, contrasted and compared to the expert users' classi cation, allowing a bridge between social and semantic knowledge according to its constantly mutual growth. The performed user evaluations analyses results are promising, reporting a high level of agreement on the perceived quality in use of both the applications and of the speci c analyzed features, demonstrating that a social-semantic contextual model improves the general users' satisfactio

    Tag Recommendation in Software Information Sites

    Get PDF
    Abstract—Nowadays, software engineers use a variety of online media to search and become informed of new and interesting technologies, and to learn from and help one another. We refer to these kinds of online media which help software engineers im-prove their performance in software development, maintenance and test processes as software information sites. It is common to see tags in software information sites and many sites allow users to tag various objects with their own words. Users increasingly use tags to describe the most important features of their posted contents or projects. In this paper, we propose TagCombine, an automatic tag recommendation method which analyzes objects in software in-formation sites. TagCombine has 3 different components: 1. multi-label ranking component which considers tag recommendation as a multi-label learning problem; 2. similarity based rankin
    • …
    corecore