Algorithmic Programming Language Identification
Motivated by the amount of code that goes unidentified on the web, we
introduce a practical method for algorithmically identifying the programming
language of source code. Our work is based on supervised learning and
intelligent statistical features. We also explored, but abandoned, a
grammatical approach. In testing, our implementation greatly outperforms that
of an existing tool that relies on a Bayesian classifier. Code is written in
Python and available under an MIT license.
Comment: 11 pages. Code:
https://github.com/simon-weber/Programming-Language-Identificatio
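The abstract describes supervised classification of source code by language using statistical features. A minimal sketch of that general idea (an illustration only, not the paper's implementation): score a snippet against hand-picked per-language keyword sets and return the best-scoring language.

```python
# Toy language identifier: count occurrences of language-typical keywords
# and pick the language with the highest score. The keyword sets below are
# illustrative assumptions, not features from the paper.

LANGUAGE_KEYWORDS = {
    "python": {"def", "import", "self", "elif", "lambda", "None"},
    "c": {"#include", "printf", "int", "void", "struct", "malloc"},
    "ruby": {"def", "end", "puts", "require", "module", "nil"},
}

def identify_language(source: str) -> str:
    tokens = source.split()
    scores = {
        lang: sum(tokens.count(kw) for kw in kws)
        for lang, kws in LANGUAGE_KEYWORDS.items()
    }
    return max(scores, key=scores.get)

snippet = "import sys\ndef main():\n    print(sys.argv)"
print(identify_language(snippet))  # -> python
```

A real system would use a trained classifier over many such features rather than fixed keyword lists, but the feature-scoring shape is the same.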
Crossings as a side effect of dependency lengths
The syntactic structure of sentences exhibits a striking regularity:
dependencies tend to not cross when drawn above the sentence. We investigate
two competing explanations. The traditional hypothesis is that this trend
arises from an independent principle of syntax that reduces crossings
practically to zero. An alternative to this view is the hypothesis that
crossings are a side effect of dependency lengths, i.e. sentences with shorter
dependency lengths should tend to have fewer crossings. We are able to reject
the traditional view in the majority of languages considered. The alternative
hypothesis can lead to a more parsimonious theory of language.
Comment: the discussion section has been expanded significantly; in press in
Complexity (Wiley)
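A dependency can be represented as an arc between two word positions; two arcs drawn above the sentence cross exactly when their endpoints strictly interleave. A short sketch of counting crossings under that standard definition (a hypothetical helper, not code from the paper):

```python
# Count crossing pairs among dependency arcs. Each arc is a (head, dependent)
# pair of 1-based word positions; arcs (i, j) and (k, l), with endpoints
# sorted, cross iff i < k < j < l (or symmetrically k < i < l < j).

from itertools import combinations

def count_crossings(arcs):
    spans = [tuple(sorted(a)) for a in arcs]
    crossings = 0
    for (i, j), (k, l) in combinations(spans, 2):
        if i < k < j < l or k < i < l < j:
            crossings += 1
    return crossings

print(count_crossings([(1, 3), (2, 4)]))  # interleaved arcs -> 1
print(count_crossings([(1, 4), (2, 3)]))  # nested arcs -> 0
```

The hypothesis under test is then a correlation claim: sentences with shorter total dependency length should, on average, yield smaller values of this count.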
This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News
The problem of fake news has gained a lot of attention as it is claimed to
have had a significant impact on the 2016 US Presidential Election. Fake news
is not a new problem and its spread in social networks is well-studied. Often an
underlying assumption in fake news discussion is that it is written to look
like real news, fooling the reader who does not check for reliability of the
sources or the arguments in its content. Through a unique study of three data
sets and features that capture the style and the language of articles, we show
that this assumption is not true. Fake news in most cases is more similar to
satire than to real news, leading us to conclude that persuasion in fake news
is achieved through heuristics rather than the strength of arguments. We show
that overall title structure and the use of proper nouns in titles are highly
significant in differentiating fake from real news. This leads us to conclude
that fake news is targeted at audiences who are not likely to read beyond titles
and is aimed at creating mental associations between entities and claims.
Comment: Published at The 2nd International Workshop on News and Public
Opinion at ICWS
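The abstract highlights title structure and proper-noun use as discriminative features. A toy feature extractor in that spirit, using capitalisation as a crude proper-noun proxy (the paper's actual feature set is far richer, and this heuristic is an assumption of the sketch):

```python
# Extract simple title features: word count and a capitalisation-based
# proper-noun estimate (skipping the first word, which is capitalised
# regardless). Real systems would use a POS tagger instead.

def title_features(title: str) -> dict:
    words = title.split()
    proper = [w for i, w in enumerate(words) if i > 0 and w[0].isupper()]
    return {
        "n_words": len(words),
        "n_proper_nouns": len(proper),
        "proper_noun_ratio": len(proper) / len(words) if words else 0.0,
    }

features = title_features("Pope Francis Shocks World, Endorses Candidate")
print(features["n_words"], features["n_proper_nouns"])  # -> 6 5
```

Features like these would then feed a classifier separating fake from real titles.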
Teaching University Students to Read and Write
Recent government initiatives have required universities to include specific literacy and numeracy targets for their students. The authors – both members of the English discipline at Charles Sturt University – were invited to develop and run a two-semester program for all students studying to become early childhood, primary, and secondary teachers. This article outlines the nature of the two subjects which comprise the program: the first focused on reading and comprehension, the second on writing and composition. These subjects were conceived from collegial dialogues between academics in education and the humanities, and then developed from these different assumptions and starting points. Over the last five years, the shared experiences of teaching these prospective teachers have grown into a strongly coherent first year of study. This article seeks to describe the experiences of teaching literacy to first-year education students, and it is by turns hypothesising and speculative, reflective and qualitative, in its approach. In the process, this article offers colleagues across the country a reflection on the hypotheses of literacy education, some new ideas for teaching literacy, and some optimism for the future of the teaching profession and the dignity of those who aspire to be a part of it.
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
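Fuzzy matching retrieves the translation-memory segment most similar to a new source sentence. A minimal sketch of the idea using a character-based similarity ratio (SCATE's improved matchers are considerably more sophisticated; the sentences and threshold below are illustrative assumptions):

```python
# Return the (source, target, score) translation-memory entry most similar
# to the query, or None if nothing clears the threshold. Uses difflib's
# sequence-similarity ratio as a stand-in for a real fuzzy-match metric.

from difflib import SequenceMatcher

def best_fuzzy_match(query, memory, threshold=0.7):
    best = None
    for src, tgt in memory:
        score = SequenceMatcher(None, query, src).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (src, tgt, score)
    return best

tm = [("Press the red button.", "Druk op de rode knop."),
      ("Close the window.", "Sluit het venster.")]
match = best_fuzzy_match("Press the green button.", tm)
print(match[0])  # -> Press the red button.
```

The retrieved target segment then serves as a starting point for the translator, or as input to be repaired by a machine-translation component.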
Efficient Deep Processing of Japanese
We present a broad-coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real-world applications, so that robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. The grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages.