9,582 research outputs found

    Stack Overflow in Github: Any Snippets There?

    Full text link
    When programmers look for how to achieve certain programming tasks, Stack Overflow is a popular destination in search engine results. Over the years, Stack Overflow has accumulated an impressive knowledge base of snippets of code that are amply documented. We are interested in studying how programmers use these snippets of code in their projects. Can we find Stack Overflow snippets in real projects? When snippets are used, is this copy literal or does it suffer adaptations? And are these adaptations specializations required by the idiosyncrasies of the target artifact, or are they motivated by specific requirements of the programmer? The large-scale study presented on this paper analyzes 909k non-fork Python projects hosted on Github, which contain 290M function definitions, and 1.9M Python snippets captured in Stack Overflow. Results are presented as quantitative analysis of block-level code cloning intra and inter Stack Overflow and GitHub, and as an analysis of programming behaviors through the qualitative analysis of our findings.Comment: 14th International Conference on Mining Software Repositories, 11 page

    Neural Architecture for Question Answering Using a Knowledge Graph and Web Corpus

    Full text link
    In Web search, entity-seeking queries often trigger a special Question Answering (QA) system. It may use a parser to interpret the question to a structured query, execute that on a knowledge graph (KG), and return direct entity responses. QA systems based on precise parsing tend to be brittle: minor syntax variations may dramatically change the response. Moreover, KG coverage is patchy. At the other extreme, a large corpus may provide broader coverage, but in an unstructured, unreliable form. We present AQQUCN, a QA system that gracefully combines KG and corpus evidence. AQQUCN accepts a broad spectrum of query syntax, between well-formed questions to short `telegraphic' keyword sequences. In the face of inherent query ambiguities, AQQUCN aggregates signals from KGs and large corpora to directly rank KG entities, rather than commit to one semantic interpretation of the query. AQQUCN models the ideal interpretation as an unobservable or latent variable. Interpretations and candidate entity responses are scored as pairs, by combining signals from multiple convolutional networks that operate collectively on the query, KG and corpus. On four public query workloads, amounting to over 8,000 queries with diverse query syntax, we see 5--16% absolute improvement in mean average precision (MAP), compared to the entity ranking performance of recent systems. Our system is also competitive at entity set retrieval, almost doubling F1 scores for challenging short queries.Comment: Accepted to Information Retrieval Journa

    Revamping question answering with a semantic approach over world knowledge

    Get PDF
    Classic textual question answering (QA) approaches that rely on statistical keyword relevance scoring without exploiting semantic content are useful to a certain extent, but are limited to questions answered by a small text excerpt. With the maturation of Wikipedia and with upcoming projects like DBpedia, we feel that nowadays QA can adopt a deeper, semantic approach to the task, where answers can be inferred using knowledge bases to overcome the limitations of textual QA approaches. In GikiCLEF, a QA-flavoured evaluation task, the best performing systems followed a semantic approach. In this paper, we present our motivations for preferring semantic approaches to QA over textual approaches, with Wikipedia serving as a raw knowledge source

    Is Stack Overflow Overflowing With Questions and Tags

    Full text link
    Programming question and answer (Q & A) websites, such as Quora, Stack Overflow, and Yahoo! Answer etc. helps us to understand the programming concepts easily and quickly in a way that has been tested and applied by many software developers. Stack Overflow is one of the most frequently used programming Q\&A website where the questions and answers posted are presently analyzed manually, which requires a huge amount of time and resource. To save the effort, we present a topic modeling based technique to analyze the words of the original texts to discover the themes that run through them. We also propose a method to automate the process of reviewing the quality of questions on Stack Overflow dataset in order to avoid ballooning the stack overflow with insignificant questions. The proposed method also recommends the appropriate tags for the new post, which averts the creation of unnecessary tags on Stack Overflow.Comment: 11 pages, 7 figures, 3 tables Presented at Third International Symposium on Women in Computing and Informatics (WCI-2015
    corecore