Finn's Hotel and the Joycean Canon
First, I conduct a stylometric analysis of Dubliners, A Portrait of the Artist as a Young Man, Ulysses, Finnegans Wake, and Finn's Hotel, using the relative frequencies of the 100 most frequent words in each text to form an authorial signature. In doing so, I aim to determine whether the collection is stylistically distinct from Finnegans Wake or closely aligned with it. If style can be considered a determinant of what makes a text, then I believe that the results of such an analysis should be accepted as an indicator of whether Joyce intended Finn's Hotel to be a standalone publication, or whether the relevant manuscripts are in fact the earliest incarnations of what would eventually become Finnegans Wake.
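A most-frequent-words signature of this kind can be sketched as follows. This is a minimal illustration, not the study's code: the regex tokenizer is a simplifying assumption, and Burrows's Delta is used as the distance measure only because it is a common choice for most-frequent-word profiles; the abstract does not say which measure the study applies.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Simplifying assumption: lowercase ASCII word tokens, apostrophes kept inside words.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def mfw_profiles(texts, n=100):
    """Relative frequencies of the n most frequent words across all texts."""
    counts = {name: Counter(tokenize(t)) for name, t in texts.items()}
    total = Counter()
    for c in counts.values():
        total.update(c)
    mfw = [w for w, _ in total.most_common(n)]
    profiles = {}
    for name, c in counts.items():
        size = sum(c.values()) or 1  # guard against empty texts
        profiles[name] = [c[w] / size for w in mfw]
    return mfw, profiles

def burrows_delta(profiles, a, b):
    """Burrows's Delta: mean absolute difference between the z-scored
    word frequencies of texts a and b (z-scores computed across all texts)."""
    names = list(profiles)
    total, used = 0.0, 0
    for i in range(len(profiles[a])):
        column = [profiles[name][i] for name in names]
        mu = sum(column) / len(column)
        sd = math.sqrt(sum((x - mu) ** 2 for x in column) / len(column))
        if sd == 0:
            continue  # same frequency in every text: no discriminating power
        total += abs((profiles[a][i] - mu) / sd - (profiles[b][i] - mu) / sd)
        used += 1
    return total / used if used else 0.0
```

In practice, `texts` would map each title to its full text (file loading is omitted); a smaller Delta between Finn's Hotel and Finnegans Wake than between Finn's Hotel and the other works would point towards stylistic alignment.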
The Secret to Popular Chinese Web Novels: A Corpus-Driven Study
What is the secret to writing popular novels? The question intrigues researchers from various fields. The goal of this study is to identify the linguistic features of several popular web novels and to examine how these textual features and the overall tone interact with the genre and themes of each novel. Apart from writing style, non-textual information may also reveal the reasons behind the success of web novels. Since web fiction has become a major industry, with top writers earning millions of dollars and their stories adapted into published books, identifying the essential elements of "publishable" novels is important. The present study further examines how non-textual information, namely the number of hits, shares, favorites, and comments, may contribute to several features of the most popular published and unpublished web novels. Findings reveal that the keywords, function words, and lexical diversity of a novel are closely related to its genre and writing style, while the proportion of dialogue reflects the narrative voice of the story. In addition, these novels tend to use relatively short sentences. The data also reveal that, for unpublished web novels, the numbers of favorites and comments are significant predictors of the numbers of shares and hits, respectively; however, the numbers of hits and shares of published web novels are less predictable.
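Three of the textual features mentioned above (lexical diversity, dialogue proportion, and sentence length) can be approximated with a short sketch. Everything here is an assumption rather than the study's procedure: the type-token ratio stands in for lexical diversity, dialogue is taken to be text inside Chinese quotation marks, lengths are counted in Han characters, and the `tokens` argument is a hypothetical pre-segmented word list (Chinese word segmentation is not shown).

```python
import re
from statistics import mean

# Assumption: 。！？!? end a sentence, optionally followed by a closing quote.
SENTENCE_END = re.compile(r"[。！？!?]+[”」]?")
# Assumption: dialogue is the text inside “…” or 「…」 quotation marks.
DIALOGUE = re.compile(r"“[^”]*”|「[^」]*」")
# Lengths are measured in Han characters, ignoring punctuation.
HAN = re.compile(r"[\u4e00-\u9fff]")

def han_len(s):
    return len(HAN.findall(s))

def text_features(text, tokens):
    """Surface features for one novel or chapter.

    text: raw text, used for sentence length and dialogue share
    tokens: the same text already segmented into words (hypothetical input)
    """
    sentences = [s for s in SENTENCE_END.split(text) if han_len(s) > 0]
    total = han_len(text)
    in_dialogue = sum(han_len(m.group()) for m in DIALOGUE.finditer(text))
    return {
        # Type-token ratio as a simple lexical-diversity measure.
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
        "mean_sentence_length": mean(han_len(s) for s in sentences) if sentences else 0.0,
        "dialogue_share": in_dialogue / total if total else 0.0,
    }
```

Note that the raw type-token ratio falls as texts get longer, so comparisons between novels are fairer on equal-sized samples.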
Identifying idiolect in forensic authorship attribution: an n-gram textbite approach
Forensic authorship attribution is concerned with identifying the authors of disputed or anonymous documents, which are potentially evidential in legal cases, through the analysis of linguistic clues left behind by writers. The forensic linguist "approaches this problem of questioned authorship from the theoretical position that every native speaker has their own distinct and individual version of the language [. . . ], their own idiolect" (Coulthard, 2004: 31). However, given the difficulty of empirically substantiating a theory of idiolect, there is growing concern in the field that it remains too abstract to be of practical use (Kredens, 2002; Grant, 2010; Turell, 2010). Stylistic, corpus, and computational approaches to text are nonetheless able to identify repeated collocational patterns, or n-grams: chunks of two to six words, similar to the popular notion of soundbites, the short segments of speech, no more than a few seconds long, that journalists recognise as newsworthy and that characterise the important moments of talk. The soundbite offers an intriguing parallel for authorship attribution studies and raises the following question: looking at any set of texts by any author, is it possible to identify 'n-gram textbites', small textual segments that characterise that author's writing, providing DNA-like chunks of identifying material?
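The abstract does not specify how textbites are selected, so the following is only one hypothetical way to operationalise the idea: keep the word n-grams of two to six words that recur across at least `min_docs` of an author's texts and never occur in a reference set of other writers' texts. Both `min_docs` and the reference set are assumptions introduced for illustration.

```python
from collections import defaultdict

def ngrams(tokens, n):
    """All contiguous word n-grams of length n, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def textbites(author_texts, reference_texts, n_min=2, n_max=6, min_docs=2):
    """Hypothetical textbite filter: n-grams (n_min..n_max words) used in at
    least `min_docs` of the author's texts and absent from every reference text."""
    doc_freq = defaultdict(int)
    for tokens in author_texts:
        seen = set()
        for n in range(n_min, n_max + 1):
            seen.update(ngrams(tokens, n))
        for gram in seen:  # count each n-gram once per document
            doc_freq[gram] += 1
    reference = set()
    for tokens in reference_texts:
        for n in range(n_min, n_max + 1):
            reference.update(ngrams(tokens, n))
    return {g for g, df in doc_freq.items() if df >= min_docs and g not in reference}
```

Requiring recurrence across documents, rather than raw frequency, favours habitual chunks over topic words repeated within a single text.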
Translating English verbal collocations into Spanish: On distribution and other relevant differences related to diatopic variation
Language varieties should be taken into account in order to enhance the fluency and naturalness of translated texts. In this paper, we examine the collocational verbal range for prima-facie translation equivalents of words like decision and dilemma, which in both languages denote the act or process of reaching a resolution after consideration, resolving a question, or deciding something. We are mainly concerned with diatopic variation in Spanish. To this end, we set out to develop a giga-token corpus-based protocol with a detailed and reproducible methodology capable of detecting the collocational peculiarities of transnational languages. To our knowledge, this is one of the first observational studies of this kind. The paper is organised as follows. Section 1 introduces some basic issues in the translation of collocations against the background of the anisomorphism between languages. Section 2 provides a feature characterisation of collocations. Section 3 deals with the choice of corpora, corpus tools, nodes, and patterns. Section 4 covers the automatic retrieval of the selected verb + noun (object) collocations in general Spanish and in the co-existing national varieties, with special attention to comparative results in terms of similarities and mismatches. Section 5 presents conclusions and outlines avenues for further research.
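The abstract does not name the association measure behind the automatic retrieval, so this sketch swaps in logDice, a measure widely used for collocation ranking, defined as 14 + log2(2·f(v,n) / (f(v) + f(n))). The input format (verb and noun lemma pairs already extracted from a parsed corpus) is hypothetical, and as a simplification f(v) and f(n) are counted over the extracted pairs rather than the whole corpus.

```python
import math
from collections import Counter

def log_dice(pairs, node_noun):
    """Rank the verbs taking `node_noun` as object by logDice.

    pairs: (verb_lemma, noun_lemma) tuples from a parsed corpus (hypothetical
    input). Simplification: f(v) and f(n) are counted over these pairs only.
    """
    pair_freq = Counter(pairs)
    verb_freq = Counter(v for v, _ in pairs)
    noun_freq = Counter(n for _, n in pairs)
    scores = {}
    for (verb, noun), f_vn in pair_freq.items():
        if noun == node_noun:
            scores[verb] = 14 + math.log2(2 * f_vn / (verb_freq[verb] + noun_freq[noun]))
    # Highest-scoring (most strongly associated) verbs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

To compare varieties, the same function can be run separately on the pairs extracted from each national sub-corpus and the resulting verb rankings for a node such as decisión set side by side.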
The Portrait of Dorian Gray: A corpus-based analysis of translated verb + noun (object) collocations in Peninsular and Colombian Spanish
This is an accepted manuscript of an article published by Springer in Corpas Pastor G., Mitkov R. (eds) Computational and Corpus-Based Phraseology. EUROPHRAS 2019, on 18/09/2019, available online: https://doi.org/10.1007/978-3-030-30135-4_30
The accepted version of the publication may differ from the final published version.
Corpus-based Translation Studies have promoted research on the features of translated language from a descriptive perspective, focusing on both the process and the product of translation. Toury [31] proposed some of these features under the label of laws of translation, namely the law of growing standardisation and the law of interference. The law of standardisation appears to be particularly at play in diatopy, and more specifically in the case of transnational languages (e.g. English, Spanish, French, German). In fact, some studies have revealed a tendency to standardise the diatopic varieties of Spanish in translated language [8, 9, 11, 12]. This paper focuses on verb + noun (object) collocations in Spanish translations of The Portrait of Dorian Gray by Oscar Wilde. Two varieties have been chosen: Peninsular and Colombian Spanish. Our main aim is to establish whether the Colombian Spanish translation actually matches the variety spoken in Colombia or is closer to general or standard Spanish. To this end, we analyse the techniques used to translate this type of collocation in both Spanish translations. Furthermore, we study the diatopic distribution of these collocations by means of large corpora.
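The abstract does not state which statistic is used to compare the diatopic distribution of a collocation, so the following is an assumed, illustrative sketch: frequencies normalised per million words, plus the log-likelihood (G²) statistic commonly used to test whether a form's frequency differs between two corpora. The frequencies and corpus sizes passed in would come from, for example, hypothetical Peninsular and Colombian reference corpora.

```python
import math

def per_million(freq, corpus_size):
    """Frequency normalised to occurrences per million words."""
    return freq * 1_000_000 / corpus_size

def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a form occurring a times in corpus 1 (size c)
    and b times in corpus 2 (size d)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # the term for a zero observation is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2
```

With one degree of freedom, a G² above 3.84 corresponds to a difference significant at p < 0.05; the normalised frequencies show the direction of the difference.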
Authorship attribution in micro-messages
Advisors: Ariadne Maria Brito Rizzoni Carvalho, Anderson de Rezende Rocha. Dissertation (Master's in Computer Science), Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica.
Abstract: With the ever-growing use of social media, authorship attribution plays an important role in preventing cybercrime and in analysing the online trails left behind by cyber-pranksters, stalkers, bullies, identity thieves, and the like. In this dissertation, we propose a method for authorship attribution in micro-blogs that is one hundred to a thousand times faster than state-of-the-art counterparts. We also achieved an accuracy of 65% when classifying texts from 50 authors. The method relies on a powerful and scalable feature representation that takes advantage of user patterns in micro-blog messages, and on a custom-tailored pattern classifier adapted to big, high-dimensional data.
Finally, we discuss search-space reduction when analysing hundreds of online suspects and millions of online micro-messages, which makes this approach invaluable for digital forensics and law enforcement.
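The dissertation's own feature representation and classifier are not described in enough detail here to reproduce, so this is a generic stand-in rather than the proposed method: each author is represented by a summed character-trigram profile of their messages, and a new message is attributed to the author whose profile has the highest cosine similarity (a nearest-centroid classifier). The trigram size and the use of cosine similarity are assumptions.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Counts of overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * q[k] for k, v in p.items() if k in q)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def train(messages_by_author, n=3):
    """One summed n-gram profile per author; cosine is scale-invariant,
    so this behaves like the centroid of the author's count vectors."""
    profiles = {}
    for author, messages in messages_by_author.items():
        profile = Counter()
        for message in messages:
            profile.update(char_ngrams(message, n))
        profiles[author] = profile
    return profiles

def attribute(message, profiles, n=3):
    """Return the author whose profile is most similar to the message."""
    query = char_ngrams(message, n)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))
```

Character n-grams are a common choice for very short texts because they capture spelling habits, abbreviations, and punctuation that word-level features miss.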