Dublin City University at QA@CLEF 2008
We describe our participation in Multilingual Question Answering at CLEF 2008, using German and English as our source and target languages, respectively. The system was built using UIMA (Unstructured Information Management Architecture) as the underlying framework.
Finding Relevant Answers in Software Forums
Abstract: Online software forums provide a huge amount of valuable content. Developers and users often ask questions and receive answers from such forums. The availability of a vast amount of thread discussions in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use a traditional information retrieval approach to extract webpages containin
From Query to Usable Code: An Analysis of Stack Overflow Code Snippets
Enriched by natural language texts, Stack Overflow code snippets are an
invaluable code-centric knowledge base of small units of source code. Besides
being useful for software developers, these annotated snippets can potentially
serve as the basis for automated tools that provide working code solutions to
specific natural language queries.
With the goal of developing automated tools with the Stack Overflow snippets
and surrounding text, this paper investigates the following questions: (1) How
usable are the Stack Overflow code snippets? and (2) When using text search
engines for matching on the natural language questions and answers around the
snippets, what percentage of the top results contain usable code snippets?
A total of 3M code snippets are analyzed across four languages: C#, Java,
JavaScript, and Python. Python and JavaScript proved to be the languages for
which the most code snippets are usable. Conversely, Java and C# proved to be
the languages with the lowest usability rate. Further qualitative analysis on
usable Python snippets shows the characteristics of the answers that solve the
original question. Finally, we use Google search to investigate the alignment
of usability and the natural language annotations around code snippets, and
explore how to make snippets in Stack Overflow an adequate base for future
automatic program generation.
Comment: 13th IEEE/ACM International Conference on Mining Software
Repositories, 11 pages
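The abstract above does not say how "usable" is measured. As a minimal sketch, assuming usability is approximated by whether a Python snippet parses without a syntax error, a check could look like the following. The helper names `parses_as_python` and `parse_rate` are hypothetical and are not taken from the paper.

```python
import ast


def parses_as_python(snippet: str) -> bool:
    """Return True if the snippet is syntactically valid Python.

    Assumption: parsing success is used here as a rough proxy for
    "usable"; the paper's own criteria may differ.
    """
    try:
        ast.parse(snippet)
    except (SyntaxError, ValueError):
        # SyntaxError covers malformed code; ValueError covers inputs
        # such as source strings containing null bytes.
        return False
    return True


def parse_rate(snippets: list[str]) -> float:
    """Fraction of snippets that parse; 0.0 for an empty list."""
    if not snippets:
        return 0.0
    return sum(parses_as_python(s) for s in snippets) / len(snippets)


if __name__ == "__main__":
    examples = [
        "x = 1\nprint(x)",             # valid
        "for i in range(3) print(i)",  # missing colon
        ">>> x = 1",                   # pasted interpreter prompt
    ]
    print(parse_rate(examples))
```

A snippet that parses may still fail at run time (missing imports, undefined names), so this proxy gives only an upper bound on runnable snippets.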
Normalized Information Distance
The normalized information distance is a universal distance measure for
objects of all kinds. It is based on Kolmogorov complexity and thus
uncomputable, but there are ways to utilize it. First, compression algorithms
can be used to approximate the Kolmogorov complexity if the objects have a
string representation. Second, for names and abstract concepts, page count
statistics from the World Wide Web can be used. These practical realizations of
the normalized information distance can then be applied to machine learning
tasks, especially clustering, to perform feature-free and parameter-free data
mining. This chapter discusses the theoretical foundations of the normalized
information distance and both practical realizations. It presents numerous
examples of successful real-world applications based on these distance
measures, ranging from bioinformatics to music clustering to machine
translation.
Comment: 33 pages, 12 figures, pdf, in: Normalized information distance, in:
Information Theory and Statistical Learning, Eds. M. Dehmer, F.
Emmert-Streib, Springer-Verlag, New-York, To appear
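The compression-based approximation mentioned in the abstract is commonly written as the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The sketch below is illustrative only: it assumes zlib as the compressor, and the function names `compressed_size` and `ncd` are chosen here for illustration, not taken from the chapter.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share little. Real compressors are
    imperfect, so results can fall slightly outside [0, 1].
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox leaps over the lazy cat " * 20
    c = bytes(range(256)) * 4  # unrelated byte pattern
    print("similar texts:  ", ncd(a, b))
    print("unrelated data: ", ncd(a, c))
```

Pairwise `ncd` values computed this way can be fed into any standard clustering routine, which is what makes the approach feature-free: no domain-specific features are extracted from the objects.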
Fully Automated Fact Checking Using External Sources
Given the constantly growing proliferation of false claims online in recent
years, there has also been growing research interest in automatically
distinguishing false rumors from factually true claims. Here, we propose a
general-purpose framework for fully-automatic fact checking using external
sources, tapping the potential of the entire Web as a knowledge source to
confirm or reject a claim. Our framework uses a deep neural network with LSTM
text encoding to combine semantic kernels with task-specific embeddings that
encode a claim together with pieces of potentially-relevant text fragments from
the Web, taking the source reliability into account. The evaluation results
show good performance on two different tasks and datasets: (i) rumor detection
and (ii) fact checking of the answers to a question in community question
answering forums.
Comment: RANLP-201