11 research outputs found

    Evaluating prose style transfer with the Bible

    In the prose style transfer task, a system provided with text input and a target prose style produces output that preserves the meaning of the input text but alters the style. These systems require parallel data for evaluation of results and usually make use of parallel data for training. Currently, there are few publicly available corpora for this task. In this work, we identify a high-quality source of aligned, stylistically distinct text in different versions of the Bible. We provide a standardized split of the public domain versions in our corpus into training, development, and testing data. This corpus is highly parallel since many Bible versions are included. Sentences are aligned by the chapter and verse numbers present in all versions of the text. In addition to the corpus, we present the results, as measured by the BLEU and PINC metrics, of several models trained on our data which can serve as baselines for future research. While we present these data as a style transfer corpus, we believe that it is of unmatched quality and may be useful for other natural language tasks as well.

    Towards the Automatic Processing of Language Registers: Semi-supervisedly Built Corpus and Classifier for French

    Language registers are a strongly perceptible characteristic of texts and speeches. However, they are still poorly studied in natural language processing. In this paper, we present a semi-supervised approach which jointly builds a corpus of texts labeled by register and an associated classifier. This approach relies on a small initial seed of expert data. After massively retrieving web pages, it iteratively alternates the training of an intermediate classifier and the annotation of new texts to augment the labeled corpus. The approach is applied to the casual, neutral, and formal registers, leading to a 750M-word corpus and a final neural classifier with acceptable performance.

    Use of single- vs. multi-word verbs in the written discourse of Iranian EFL learners

    Age is increasingly blamed as an impediment to language competency, whether offered as a pretext or raised as a real challenge, and foreign language learners tend to take this for granted. This study probes the verb choices of EFL learners. To that end, two quite different groups of EFL learners were recruited: a group of university students in distance education with part-time class participation, and a group from a private language institute in Qom province. The groups were compared on their choice of single- versus multi-word verb forms in written tasks. Ratings of the students' assignments showed that adult Iranian EFL learners' written language lacked phrasal verbs; even in writing tasks that called for informal language, phrasal verbs were scarcely found. The study corroborates earlier studies on EFL learners' avoidance of, and lack of competence with, phrasal verbs.

    Caractérisation de registres de langue par extraction de motifs séquentiels émergents [Characterization of language registers by extraction of emerging sequential patterns]

    Language registers are a highly perceptible characteristic of written or spoken communication. In this paper, we present a methodology to automatically characterize language registers using a statistical tool named "emerging sequential patterns". Our approach is presented in two steps: the first demonstrates the relevance of the chosen statistical tool on artificial texts; the second shows that the characteristic patterns of language registers can be extracted from real data using this tool. Experimental results show the quality of our methodology.

    BRAZILIAN DISCUSSION ABOUT COVID-19 LOCKDOWN POLICIES ON TWITTER

    The COVID-19 pandemic affected all countries worldwide, causing major changes in people's routines due to public policies for controlling the spread of the disease. Among the measures with the greatest impact were social distancing policies and lockdown, which led to intense discussion among the population. To describe this discussion in Brazil, this research applied data science and natural language processing methods to analyze posts on Twitter. It processed more than 12.9 million tweets posted between 2020 and 2021, and the results highlighted the main topics discussed by Brazilian Twitter users, such as the ideological-political component. The approach employed in this research proved helpful for extracting valuable information from massive volumes of data. DOI: 10.36558/rsc.v12i3.790

    Visualising the intellectual and social structures of digital humanities using an invisible college model

    This thesis explores the intellectual and social structures of an emerging field, Digital Humanities (DH). After around 70 years of development, DH claims to differentiate itself from the traditional Humanities for its inclusiveness, diversity, and collaboration. However, the ‘big tent’ concept not only limits our understandings of its research structure, but also results in a lack of empirical review and sustainable support. Under this umbrella, whether there are merely fragmented topics, or a consolidated knowledge system is still unknown. This study seeks to answer three research questions: a) Subject: What research topics is the DH subject composed of? b) Scholar: Who has contributed to the development of DH? c) Environment: How diverse are the backgrounds of DH scholars? The Invisible College research model is refined and applied as the methodological framework that produces four visualised networks. As the results show, DH currently contributes more towards the general historical literacy and information science, while longitudinally, it was heavily involved in computational linguistics. Humanistic topics are more popular and central, while technical topics are relatively peripheral and have stronger connections with non-Anglophone communities. DH social networks are at the early stages of development, and the formation is heavily influenced by non-academic and non-intellectual factors, e.g., language, working country, and informal relationships. Although male scholars have dominated the field, female scholars have encouraged more communication and built more collaborations. Despite the growing appeals for more diversity, the level of international collaboration in DH is more extensive than in many other disciplines. These findings can help us gain new understandings on the central and critical questions about DH. 
    To the best of the candidate’s knowledge, this study is the first to investigate the formal and informal structures in DH with a well-grounded research model.

    Writing in the workplace: Variation in the writing practices and formality of eight multinational companies in Greece

    Workplace writing is a high stakes activity. It constitutes a permanent record of a company’s transactions and this has implications for both the employees involved in the production of documents and also for the company as a whole. Workplace writing is dynamic, and processes and practices vary between teams, departments, companies and industries. In this context, the study is concerned with workplace writing practices in eight multinational companies situated in Greece. The thesis is structured in two parts: the first part aims to explore the writing practices in the participant organisations focusing on factors behind inter- and intra- company variation. The discussion draws on the analysis of questionnaire and interview data. The second part takes a micro perspective and focuses on one genre, that of the business email. The analysis reports on a sample of naturally occurring emails from three participant companies. As the business email tends to be perceived as an informal genre, special attention is paid to the notion of formality, which has not been systematically discussed and defined in this context. The findings show that writing practices vary according to company size, employees’ hierarchical level and years of experience. Business email emerges as the most frequent genre, which serves a range of functions in different contexts. Dynamic continua of writing practices ranging from ‘formal to informal’ and ‘transactional to relational’ are mobilised as employees reflect on their use of email at work and this is aligned with the findings of the linguistic analysis. The data also indicate the impact of the globalised socioeconomic activity on employees’ practices in modern organisations. The participants in this study operate at the interface of different languages and practices, which cut across national and professional boundaries. 
    The complex choices they make in different contexts have implications for language training and specifically the teaching of writing in academic contexts.

    Identifying Stylometric Correlates of Social Power

    This thesis takes a stylometric approach to the measurement of social power, particularly hierarchical power in an organisational setting. Following the social constructionist view of identity, we infer that construction of identity is an ongoing process incorporating the full scope of human behaviour, including linguistic behaviour. We test the primary hypothesis that stylistic choice in language is indicative of power relations, and that a stylometric signal can be extracted from natural language to enable prediction of relationship status. Additionally, we consider the effect of individual variation versus interpersonal variation, and the effects of aggregating predictions to boost the predictive strength of the model. Three different datasets are used to validate the proposed approach across three different genres: email, spoken conversation, and online chat. We also present a vector space approach to modelling linguistic style accommodation, and undertake a preliminary examination of the correlation between linguistic accommodation and social power.

    Metafictional anaphora: A comparison of different accounts

    I argue that pronominal anaphora across mixed parafictional/metafictional discourse (e.g. "In The Lord of the Rings, Frodo_i goes through an immense mental struggle. He_i is an intriguing fictional character!") poses a problem for a workspace account. I evaluate different possible solutions based on a descriptivist approach, Zalta's logic of abstract objects and Recanati's dot-object theory.