Textual Stylistic Variation: Choices, Genres and Individuals
This chapter argues for more informed target metrics for the statistical processing of stylistic variation in text collections. Much as operationalized relevance proved a useful goal to strive for in information retrieval, research in textual stylistics, whether application-oriented or philologically inclined, needs goals formulated in terms of pertinence, relevance, and utility: notions that agree with readers' experience of text. The differences readers are aware of are mostly based on utility, not on textual characteristics per se. Readers typically report stylistic differences in terms of genres. Genres, while vague and undefined, are well established and widely talked about: readers learn to distinguish them very early on. This chapter discusses variation given by genre and contrasts it with variation occasioned by individual choice.
Linguistics in the Study and Teaching of Literature
Literary texts include linguistic form as well as specialized literary forms (some of which also involve language). Linguistics can offer literary studies an understanding of these kinds of form and of the ways in which a text is used to communicate meaning. To cope with the great variety of creative uses of language in literature, linguistics must acknowledge that some texts are assigned structure by non-linguistic means; however, the boundaries between linguistic and non-linguistic explanations of literary language are not clearly drawn. The article concludes with a discussion of what kinds and levels of linguistics might usefully be taught in a literature classroom, and offers practical suggestions for applying linguistics to literature teaching.
Computing the Affective-Aesthetic Potential of Literary Texts
In this paper, we compute the affective-aesthetic potential (AAP) of literary texts using a simple sentiment analysis tool called SentiArt. Unlike other established tools, SentiArt is based on publicly available vector space models (VSMs) and requires no emotional dictionary, which makes it applicable to any language for which VSMs are available (>150 so far) and avoids issues of low coverage. In a first study, the AAP values of all words in a widely used lexical database for German were computed, and the VSM's ability to represent both concrete and more abstract semantic concepts was demonstrated. In a second study, SentiArt was used to predict ~2,800 human word valence ratings and showed high predictive accuracy (R² > 0.5, p < 0.0001). A third study tested the validity of SentiArt in predicting emotional states over (narrative) time, using human liking ratings obtained from reading a story. Again, the predictive accuracy was highly significant (R²adj = 0.46, p < 0.0001), establishing SentiArt as a promising candidate for lexical sentiment analysis at both the micro- and macrolevels, i.e., for short and long literary materials. The possibilities and limitations of lexical VSM-based sentiment analysis of diverse, complex literary texts are discussed in light of these results.
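The abstract does not spell out how SentiArt turns a VSM into valence scores. As a minimal sketch of the general dictionary-free idea only, assuming that a word's valence is its mean cosine similarity to a few positive seed words minus its mean similarity to a few negative ones, the following Python uses invented 3-dimensional toy vectors; the seed lists, vectors and function names are hypothetical, not SentiArt's actual implementation.

```python
import math

# Toy 3-dimensional vectors standing in for a pretrained VSM; all values invented.
VECTORS = {
    "joy": [0.9, 0.1, 0.0],
    "love": [0.8, 0.2, 0.1],
    "grief": [-0.8, 0.1, 0.2],
    "fear": [-0.7, 0.3, 0.1],
    "sunrise": [0.6, 0.4, 0.2],
    "funeral": [-0.6, 0.2, 0.4],
}

# Hypothetical seed words defining the positive and negative poles.
POSITIVE = ["joy", "love"]
NEGATIVE = ["grief", "fear"]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def word_valence(word):
    """Mean similarity to the positive seeds minus mean similarity to the negative seeds."""
    vec = VECTORS[word]
    pos = sum(cosine(vec, VECTORS[s]) for s in POSITIVE) / len(POSITIVE)
    neg = sum(cosine(vec, VECTORS[s]) for s in NEGATIVE) / len(NEGATIVE)
    return pos - neg


def text_valence(tokens):
    """Average word valence over in-vocabulary tokens (other tokens are skipped)."""
    scores = [word_valence(t) for t in tokens if t in VECTORS]
    return sum(scores) / len(scores) if scores else 0.0


for w in ("sunrise", "funeral"):
    print(w, round(word_valence(w), 3))
```

Because only a vector space and two short seed lists are needed, swapping in vectors for another language leaves the rest of the code unchanged, which is the property the abstract emphasises.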
On translation universals in selected contemporary Polish literary translations
This pilot study examines the potential of selected corpus-linguistics and computational-stylistics methods for investigating translation universals in translational literary Polish. More specifically, the study deals with T-universals (after Chesterman 2004), also referred to as intralingual translation universals (Grabowski 2011), with emphasis on core patterns of lexical use, as proposed by Laviosa (1998, 2002), and the leveling-out hypothesis, as proposed by Baker (1996). To that end, two custom-designed corpora of approximately 500,000 tokens each were compiled: one of contemporary Polish novels and one of contemporary literary translations from English into Polish. The results reveal that, on the whole, translated texts are more varied lexically than non-translated Polish texts, yet they show more repetition and lower lexical variety among top-frequency words. On the other hand, non-translated texts have higher lexical variety among bottom-frequency words, where author-specific and creative vocabulary is usually found. The results of multivariate methods (Principal Components Analysis and Cluster Analysis) confirm the leveling-out hypothesis: translated texts are more similar to one another than they are to original texts written in the same language.
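The abstract names the measures only in general terms. As an illustrative sketch, assuming three common proxies (type-token ratio for overall variety, the share of tokens covered by the most frequent types for repetition at the top of the frequency list, and the proportion of hapax legomena for variety at the bottom), the following Python computes them for a tokenised corpus; the function names, the `head_size` cut-off and the chunk size are hypothetical choices, not the study's exact measures.

```python
from collections import Counter


def lexical_profile(tokens, head_size=10):
    """Simple lexical-variety statistics for a non-empty list of tokens.

    type_token_ratio: distinct word forms / total tokens (overall variety).
    head_coverage: share of all tokens taken up by the `head_size` most
        frequent types (higher means more repetition at the top of the list).
    hapax_ratio: share of types that occur exactly once (variety at the bottom).
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = Counter(tokens)
    n_tokens = len(tokens)
    head = sum(freq for _, freq in counts.most_common(head_size))
    hapaxes = sum(1 for freq in counts.values() if freq == 1)
    return {
        "type_token_ratio": len(counts) / n_tokens,
        "head_coverage": head / n_tokens,
        "hapax_ratio": hapaxes / len(counts),
    }


def standardised_ttr(tokens, chunk=1000):
    """Mean TTR over consecutive full chunks of `chunk` tokens.

    Raw TTR falls as texts get longer, so corpora are compared on equal-sized
    chunks; a trailing partial chunk is ignored.
    """
    ratios = [
        len(set(tokens[i:i + chunk])) / chunk
        for i in range(0, len(tokens) - chunk + 1, chunk)
    ]
    return sum(ratios) / len(ratios) if ratios else float("nan")


sample = "the cat saw the dog and the dog saw a bird".split()
print(lexical_profile(sample, head_size=2))
print(standardised_ttr(sample, chunk=5))
```

Comparing these figures between a translated and a non-translated corpus of equal size would mirror, in a much simplified form, the top-frequency versus bottom-frequency contrast reported above.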
ATMS-Based architecture for stylistics-aware text generation
This thesis is concerned with the effect of surface stylistic constraints (SSC) on syntactic and lexical choice within a unified generation architecture. Although these issues have been investigated by researchers in the field, little work has addressed system architectures that allow surface form constraints to influence earlier linguistic or even semantic decisions made during the NLG process. By SSC we mean stylistic requirements that are known beforehand but cannot be tested until the utterance, or in some lucky cases a properly linearised part of it, has been generated. These include collocational constraints, text size limits, and poetic aspects such as rhyme and metre, to name a few.
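To make the notion of SSC concrete, here is a minimal generate-and-test sketch, assuming candidate realisations already exist as strings. The helper names, the word-count size limit and the crude spelling-based rhyme test are hypothetical simplifications, not the thesis's situation-action framework.

```python
# Hypothetical surface stylistic constraints: each is a predicate that can only
# be evaluated once a string has been linearised.


def within_size(limit):
    """Constraint: the utterance has at most `limit` words."""
    return lambda text: len(text.split()) <= limit


def rhymes_with(word):
    """Constraint: the last word shares its final two letters with `word`.

    A crude orthographic stand-in for real (phonetic) rhyme detection.
    """
    suffix = word.lower()[-2:]

    def check(text):
        words = text.split()
        return bool(words) and words[-1].lower().strip(".,;!?").endswith(suffix)

    return check


def satisfying(candidates, constraints):
    """Keep only the fully generated candidates that pass every constraint."""
    return [c for c in candidates if all(check(c) for check in constraints)]


candidates = [
    "The weather was bright",
    "The sky became light",
    "It was an exceptionally sunny day",
]
print(satisfying(candidates, [within_size(5), rhymes_with("night")]))
```

Filtering complete outputs in this way is the naive baseline; the architecture described in this abstract instead aims to let such constraints influence lexical and syntactic choices while several paths are still open.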
This thesis introduces a new NLG architecture that can be sensitive to surface stylistic requirements. It brings together a well-founded linguistic theory that has been used in many successful NLG systems (Systemic Functional Linguistics, SFL) and an existing AI search mechanism (the Assumption-based Truth Maintenance System, ATMS), which caches important search information and avoids duplicated work.
To this end, the thesis explores the logical relation between the grammar formalism and the search technique. Based on that logical connection, it designs an algorithm for the automatic translation of systemic grammar networks into ATMS dependency networks. The generator then uses the translated networks to generate natural language texts with high paraphrasing power, a direct result of its ability to pursue multiple paths simultaneously. The thesis approaches the crucial notion of choice differently from previous SFL-based systems. It relaxes the choice process: choosers are not obliged to deterministically choose a single alternative, which allows SSC to influence the final lexical and syntactic decisions. The thesis also develops a situation-action framework for specifying stylistic requirements independently of the micro-semantic input. The user or application can state which surface requirements they wish to impose, and the ATMS-based generator then attempts to satisfy these constraints.
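The abstract gives no details of the grammar-to-ATMS translation. As a hypothetical sketch of the general ATMS mechanism only (after de Kleer's assumption-based TMS, heavily simplified to acyclic, non-incremental labels), grammar choices become assumptions, each node's label holds the minimal sets of choices under which it can be derived, and nogoods rule out incompatible combinations. All node and choice names are invented for illustration.

```python
from itertools import product

Env = frozenset  # an environment: a set of assumptions (here, grammar choices)


def minimise(envs):
    """Keep only environments that are not strict supersets of another one."""
    envs = set(envs)
    return {e for e in envs if not any(other < e for other in envs)}


class ATMS:
    """Greatly simplified assumption-based TMS: acyclic, non-incremental labels."""

    def __init__(self):
        self.labels = {}      # node -> set of minimal consistent environments
        self.nogoods = set()  # environments known to be inconsistent

    def assume(self, name):
        """An assumption holds in the environment containing just itself."""
        self.labels[name] = {Env([name])}

    def consistent(self, env):
        return not any(bad <= env for bad in self.nogoods)

    def justify(self, consequent, antecedents):
        """The consequent holds wherever all antecedents hold together."""
        combos = product(*(self.labels[a] for a in antecedents))
        new = {Env().union(*combo) for combo in combos}
        new = {e for e in new if self.consistent(e)}
        self.labels[consequent] = minimise(self.labels.get(consequent, set()) | new)

    def add_nogood(self, *assumptions):
        """Record an inconsistent combination and retract it from every label."""
        bad = Env(assumptions)
        self.nogoods.add(bad)
        for node in list(self.labels):
            self.labels[node] = {e for e in self.labels[node] if not bad <= e}


atms = ATMS()
for choice in ("lex:big", "lex:large", "syn:active", "syn:passive"):
    atms.assume(choice)
# Alternatives within the same grammar system exclude one another.
atms.add_nogood("lex:big", "lex:large")
atms.add_nogood("syn:active", "syn:passive")
# No chooser commits early: both lexical and both syntactic options stay live.
atms.justify("lexeme", ["lex:big"])
atms.justify("lexeme", ["lex:large"])
atms.justify("clause", ["lexeme", "syn:active"])
atms.justify("clause", ["lexeme", "syn:passive"])
print(len(atms.labels["clause"]))  # 4 alternative realisations
# A later surface check (e.g. a collocation test) rejects one combination.
atms.add_nogood("lex:big", "syn:passive")
print(sorted(sorted(env) for env in atms.labels["clause"]))
```

Because every alternative is kept in the labels at once, the later rejection removes only the offending environment; the remaining three realisations survive without being re-derived, which is the kind of caching the abstract attributes to the ATMS.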
Finally, a prototype ATMS-based generation system embodying the ideas presented in this thesis is implemented and evaluated. We examine the system's stylistic sensitivity by testing it on three different sets of stylistic requirements: collocational, size, and poetic constraints.
Building a resource for studying translation shifts
This paper describes an interdisciplinary approach that brings together the fields of corpus linguistics and translation studies. It presents ongoing work on the creation of a corpus resource in which translation shifts are explicitly annotated. Translation shifts denote departures from formal correspondence between source and target text, i.e., deviations that have occurred during the translation process. A resource in which such shifts are annotated systematically will make it possible to study the phenomena that need to be addressed if machine translation output is to resemble human translation. The resource described in this paper contains English source texts (parliamentary proceedings) and their German translations. The shift annotation is based on predicate-argument structures and proceeds in two steps: first, predicates and their arguments are annotated monolingually in a straightforward manner; then, the corresponding English and German predicates and arguments are aligned with each other. Whenever a shift (mainly grammatical or semantic) has occurred, the alignment is tagged accordingly.
Comment: 6 pages, 1 figure
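The abstract does not give the annotation format. A minimal sketch, assuming a simple in-memory representation with invented field names, role labels (A0, A1) and shift labels, shows how the two annotation steps could be modelled and how candidate shifts could be flagged by simple comparison rules; the example sentence pairs are invented, not taken from the resource.

```python
from dataclasses import dataclass, field


@dataclass
class Predicate:
    """Step 1: a monolingually annotated predicate with role-labelled arguments."""
    lemma: str
    category: str                              # e.g. "VERB" or "NOUN"
    args: dict = field(default_factory=dict)   # role label -> argument text


@dataclass
class Alignment:
    """Step 2: an aligned English-German predicate pair plus its shift tags."""
    source: Predicate
    target: Predicate
    shifts: list = field(default_factory=list)


def tag_shifts(al):
    """Flag candidate shifts with hypothetical labels; recomputed on every call."""
    shifts = []
    if al.source.category != al.target.category:
        shifts.append("category-change")
    if set(al.source.args) != set(al.target.args):
        shifts.append("argument-structure")
    al.shifts = shifts
    return shifts


# English passive without an agent vs. German active with an explicit agent.
passive = Alignment(
    Predicate("reject", "VERB", {"A1": "the proposal"}),
    Predicate("ablehnen", "VERB", {"A0": "das Parlament", "A1": "den Vorschlag"}),
)
# English verb rendered as a German nominalisation.
nominal = Alignment(
    Predicate("decide", "VERB", {"A0": "the Council"}),
    Predicate("Entscheidung", "NOUN", {"A0": "des Rates"}),
)
print(tag_shifts(passive))
print(tag_shifts(nominal))
```

Keeping the monolingual layer separate from the alignment layer, as the two-step procedure suggests, means each language can be annotated independently before any cross-lingual comparison is made.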
- …