10 research outputs found

    An application of distributional semantics for the analysis of the Holy Quran

    In this contribution we illustrate the methodology and the results of an experiment we conducted by applying Distributional Semantics Models to the analysis of the Holy Quran. Our aim was to gather information on the potential differences in meanings that the same words might take on when used in Modern Standard Arabic w.r.t. their usage in the Quran. To do so we used the Penn Arabic Treebank as a contrastive corpus

    Identifying Stylometric Correlates of Social Power

    This thesis takes a stylometric approach to the measurement of social power, particularly hierarchical power in an organisational setting. Following the social constructionist view of identity, we infer that construction of identity is an ongoing process incorporating the full scope of human behaviour, including linguistic behaviour. We test the primary hypothesis that stylistic choice in language is indicative of power relations, and that a stylometric signal can be extracted from natural language to enable prediction of relationship status. Additionally, we consider the effect of individual variation versus interpersonal variation, and the effects of aggregating predictions to boost the predictive strength of the model. Three different datasets are used to validate the proposed approach across three different genres: email, spoken conversation, and online chat. We also present a vector space approach to modelling linguistic style accommodation, and undertake a preliminary examination of the correlation between linguistic accommodation and social power

    Digital Histories. Emergent Approaches within the New Digital History

    The article applies computational stylometry to explore medieval authorship. The study consists of the investigation of different versions of an anti-heretical treatise Refutatio errorum, only recently attributed to the inquisitor Petrus Zwicker based on qualitative evidence. The article confirms this attribution and demonstrates that from a textually inconsistent corpus, it is possible to prepare data for computational stylometry with relatively fast and straightforward cleaning. Finally, the article discusses the relationship between qualitative and computational methods in the study of medieval texts. The greatest added value of computational authorship attributions comes from unexpected results, from texts behaving in an anomalous way. In the classifications made for this study, a small work called Attendite a falsis prophetis was for the first time attributed to Petrus Zwicker, a prime example of an unanticipated result. This attribution is within possibilities, but its final corroboration requires codicological study of the preserved manuscripts.

    Stylistics versus Statistics: A corpus linguistic approach to combining techniques in forensic authorship analysis using Enron emails

    This thesis empirically investigates how a corpus linguistic approach can address the main theoretical and methodological challenges facing the field of forensic authorship analysis. Linguists approach the problem of questioned authorship from the theoretical position that each person has their own distinctive idiolect (Coulthard 2004: 431). However, the notion of idiolect has come under scrutiny in forensic linguistics over recent years for being too abstract to be of practical use (Grant 2010; Turell 2010). At the same time, two competing methodologies have developed in authorship analysis. On the one hand, there are qualitative stylistic approaches, and on the other there are statistical ‘stylometric’ techniques. This study uses a corpus of over 60,000 emails and 2.5 million words written by 176 employees of the former American company Enron to tackle these issues in the contexts of both authorship attribution (identifying authors using linguistic evidence) and author profiling (predicting authors’ social characteristics using linguistic evidence). Analyses reveal that even in shared communicative contexts, and when using very common lexical items, individual Enron employees produce distinctive collocation patterns and lexical co-selections. In turn, these idiolectal elements of linguistic output can be captured and quantified by word n-grams (strings of n words). An attribution experiment is performed using word n-grams to identify the authors of anonymised email samples. Results of the experiment are encouraging, and it is argued that the approach developed here offers a means by which stylistic and statistical techniques can complement each other. Finally, quantitative and qualitative analyses are combined in the sociolinguistic profiling of Enron employees by gender and occupation. Current author profiling research is exclusively statistical in nature. 
However, the findings here demonstrate that when statistical results are augmented by qualitative evidence, the complex relationship between language use and author identity can be more accurately observed
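
The word n-gram attribution idea mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not say how n-gram profiles are compared, so the Jaccard overlap used here is an assumption, and the function names and sample emails are hypothetical.

```python
def word_ngrams(text, n=2):
    """Return the set of word n-grams (as tuples) in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def attribute(questioned, known_samples, n=2):
    """Score each candidate author by n-gram overlap with the questioned text.

    known_samples maps an author name to a list of texts known to be theirs.
    Returns the best-scoring author and the full score table.
    """
    target = word_ngrams(questioned, n)
    scores = {}
    for author, texts in known_samples.items():
        # Pool all n-grams seen in the author's known texts into one profile.
        profile = set().union(*(word_ngrams(t, n) for t in texts))
        scores[author] = jaccard(target, profile)
    return max(scores, key=scores.get), scores


# Hypothetical toy data, not taken from the Enron corpus.
known = {
    "alice": ["please find attached the report", "please find attached the invoice"],
    "bob": ["let me know what you think", "let me know asap"],
}
best, scores = attribute("please find attached the summary", known)
```

With real email data the choice of n, the tokenisation, and the similarity measure would all need to be validated against held-out samples.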

    Mapping extremist forums using text mining

    Political opinions far from what is considered normal, a distorted view of reality, and hatred of certain other groups are spread amongst political extremists like Islamists and White Supremacists. Demonstrations and violence performed by some members of these groups are well-known from mass media and get a lot of attention. The Islamists and right-extremists exploit this benefit to spread their message to ordinary people. In online forums, young, curious people can read detailed information (or propaganda) from extremists. Which words do extremists then use to convince each other, in addition to other curious readers, that what they stand for is right? The goal of this thesis is first to find algorithms or techniques for discovering characteristic vocabulary in online extremist forums and words that are frequently used in the same forum message. Then we analyse the results to find patterns of what is typical vocabulary in the different forums. Mapping the extremists’ habits of vocabulary usage can help us know better how extremists write in online extremist forums, and possibly also help us recognize them when they write on some other websites. In this thesis, we find frequent and characteristic words by means of Global Term Frequency (GTF) and pairs of co-occurring words by means of odds ratio in different extremist forums. We compare normalized GTF (NGTF) of words in two forums to find out where they are used most. Words used in only one of two forums are found as well. We find the GTFs for words written by five of the ten most active authors in each forum, and we find words that one author writes but the others among the ten most active authors do not. From the results we see that Islamists write most about religion, but also some politics. Some popular words are “allah”, “prophet”, “fasting”, and “hajj”. The right-extreme websites Stormfront and Vigrid discuss politics and argue for their own ideology and against mainstream politics. 
Frequent words in Stormfront are “white”, “jews”, and “race”, in the Norwegian Vigrid website “jødene”, “tyskland”, and “krigen”. In the German right-extreme website Deutsche Stimme, “npd”, “Deutschland”, “partei”, and “volk” are frequent words. Both Islamists and right-extremists are preoccupied with family values. Our results are useful for discovering topics that extremists write about in their online forums, topics that other people do not write about at all or write about with a different point of view
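
The term-frequency and odds-ratio measures named in this abstract might be computed along the following lines. This is only a sketch under stated assumptions: the abstract does not give the exact definitions, so GTF is taken here as a raw count over all messages in a forum, NGTF as that count divided by the forum's total token count, and the odds ratio as a 2x2 table over messages with 0.5 added to each cell to avoid division by zero. All function names and the toy messages are hypothetical.

```python
from collections import Counter


def global_term_frequency(messages):
    """Count each word over all messages of one forum (assumed GTF)."""
    counts = Counter()
    for message in messages:
        counts.update(message.lower().split())
    return counts


def normalized_gtf(counts):
    """Divide each count by the forum's token total (assumed NGTF)."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def exclusive_words(counts_a, counts_b):
    """Words that occur in forum A but never in forum B."""
    return set(counts_a) - set(counts_b)


def cooccurrence_odds_ratio(messages, a, b, smoothing=0.5):
    """Odds ratio for words a and b appearing in the same message.

    Builds a 2x2 contingency table over messages; the smoothing constant
    added to every cell is an assumption, not taken from the abstract.
    """
    both = only_a = only_b = neither = 0
    for message in messages:
        words = set(message.lower().split())
        if a in words and b in words:
            both += 1
        elif a in words:
            only_a += 1
        elif b in words:
            only_b += 1
        else:
            neither += 1
    s = smoothing
    return ((both + s) * (neither + s)) / ((only_a + s) * (only_b + s))


# Neutral toy messages standing in for two forums.
forum_a = ["tea milk", "tea milk sugar", "coffee", "coffee sugar"]
forum_b = ["coffee beans", "coffee sugar"]
gtf_a = global_term_frequency(forum_a)
gtf_b = global_term_frequency(forum_b)
ngtf_a = normalized_gtf(gtf_a)
ratio = cooccurrence_odds_ratio(forum_a, "tea", "milk")
```

An odds ratio well above 1 suggests the two words tend to appear in the same message, while comparing `ngtf_a[word]` with the corresponding value for another forum indicates where a word is relatively more common.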

    Proceedings of the 42nd Australian Linguistic Society Conference - 2011

    ANU College of Arts & Social Sciences, School of Language Studies; ANU College of Asia and the Pacific, School of Culture, History and Language

    Digital Histories

    Historical scholarship is currently undergoing a digital turn. All historians have experienced this change in one way or another, by writing on word processors, applying quantitative methods on digitalized source materials, or using internet resources and digital tools. Digital Histories showcases this emerging wave of digital history research. It presents work by historians who – on their own or through collaborations with e.g. information technology specialists – have uncovered new, empirical historical knowledge through digital and computational methods. The topics of the volume range from the medieval period to the present day, including various parts of Europe. The chapters apply an exemplary array of methods, such as digital metadata analysis, machine learning, network analysis, topic modelling, named entity recognition, collocation analysis, critical search, and text and data mining. The volume argues that digital history is entering a mature phase, digital history ‘in action’, where its focus is shifting from the building of resources towards the making of new historical knowledge. This also involves novel challenges that digital methods pose to historical research, including awareness of the pitfalls and limitations of the digital tools and the necessity of new forms of digital source criticisms. Through its combination of empirical, conceptual and contextual studies, Digital Histories is a timely and pioneering contribution taking stock of how digital research currently advances historical scholarship

    Modern Socio-Technical Perspectives on Privacy

    This open access book provides researchers and professionals with a foundational understanding of online privacy as well as insight into the socio-technical privacy issues that are most pertinent to modern information systems, covering several modern topics (e.g., privacy in social media, IoT) and underexplored areas (e.g., privacy accessibility, privacy for vulnerable populations, cross-cultural privacy). The book is structured in four parts, which follow after an introduction to privacy on both a technical and social level: Privacy Theory and Methods covers a range of theoretical lenses through which one can view the concept of privacy. The chapters in this part relate to modern privacy phenomena, thus emphasizing its relevance to our digital, networked lives. Next, Domains covers a number of areas in which privacy concerns and implications are particularly salient, including among others social media, healthcare, smart cities, wearable IT, and trackers. The Audiences section then highlights audiences that have traditionally been ignored when creating privacy-preserving experiences: people from other (non-Western) cultures, people with accessibility needs, adolescents, and people who are underrepresented in terms of their race, class, gender or sexual identity, religion or some combination. Finally, the chapters in Moving Forward outline approaches to privacy that move beyond one-size-fits-all solutions, explore ethical considerations, and describe the regulatory landscape that governs privacy through laws and policies. Perhaps even more so than the other chapters in this book, these chapters are forward-looking by using current personalized, ethical and legal approaches as a starting point for re-conceptualizations of privacy to serve the modern technological landscape. 
The book’s primary goal is to inform IT students, researchers, and professionals about both the fundamentals of online privacy and the issues that are most pertinent to modern information systems. Lecturers or teachers can assign (parts of) the book for a “professional issues” course. IT professionals may select chapters covering domains and audiences relevant to their field of work, as well as the Moving Forward chapters that cover ethical and legal aspects. Academics who are interested in studying privacy or privacy-related topics will find a broad introduction in both technical and social aspects
