41,868 research outputs found
“Because the computer said so!”: Can computational authorship analysis be trusted?
This study belongs to the domain of authorship analysis (AA), a discipline under the umbrella of forensic linguistics in which writing style is analysed as a means of authorship identification.
Due to advances in natural language processing and machine learning in recent years, interest in computational methods of AA is gaining over traditional stylistic analysis by human experts. It may only be a matter of time before the software will assist, if not replace, a forensic examiner. But can we trust its verdict? The existing computational methods of AA receive critique for the lack of theoretical motivation, black box methodologies and controversial results, and ultimately, many argue that these are unable to deliver viable forensic evidence.
The study replicates a popular algorithm of computational AA in order to open one of the existing black boxes. It takes a closer look at the so-called “bag-of-words” (BoW) approach – a word distributions method used in the majority of AA models, evaluates the parameters that the algorithm bases its conclusions on and offers detailed linguistic explanations for the statistical results these discriminators produce.
The framework behind the design of this study draws on multidimensional analysis – a multivariate analytical approach to linguistic variation. By building on the theory of systemic functional linguistics and variationist sociolinguistics, the study takes steps toward solving the existing problem of the theoretical validity of computational AA
An Abstract Machine for Unification Grammars
This work describes the design and implementation of an abstract machine,
Amalia, for the linguistic formalism ALE, which is based on typed feature
structures. This formalism is one of the most widely accepted in computational
linguistics and has been used for designing grammars in various linguistic
theories, most notably HPSG. Amalia is composed of data structures and a set of
instructions, augmented by a compiler from the grammatical formalism to the
abstract instructions, and a (portable) interpreter of the abstract
instructions. The effect of each instruction is defined using a low-level
language that can be executed on ordinary hardware.
The advantages of the abstract machine approach are twofold. From a
theoretical point of view, the abstract machine gives a well-defined
operational semantics to the grammatical formalism. This ensures that grammars
specified using our system are endowed with well defined meaning. It enables,
for example, to formally verify the correctness of a compiler for HPSG, given
an independent definition. From a practical point of view, Amalia is the first
system that employs a direct compilation scheme for unification grammars that
are based on typed feature structures. The use of amalia results in a much
improved performance over existing systems.
In order to test the machine on a realistic application, we have developed a
small-scale, HPSG-based grammar for a fragment of the Hebrew language, using
Amalia as the development platform. This is the first application of HPSG to a
Semitic language.Comment: Doctoral Thesis, 96 pages, many postscript figures, uses pstricks,
pst-node, psfig, fullname and a macros fil
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
- …