32 research outputs found

    Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game

    Interpersonal relations are fickle, with close friendships often dissolving into enmity. In this work, we explore linguistic cues that presage such transitions by studying dyadic interactions in an online strategy game where players form alliances and break those alliances through betrayal. We characterize friendships that are unlikely to last and examine temporal patterns that foretell betrayal. We reveal that subtle signs of imminent betrayal are encoded in the conversational patterns of the dyad, even if the victim is not aware of the relationship's fate. In particular, we find that lasting friendships exhibit a form of balance that manifests itself through language. In contrast, sudden changes in the balance of certain conversational attributes, such as positive sentiment, politeness, or focus on future planning, signal impending betrayal.
    Comment: To appear at ACL 2015. 10 pages, 4 figures. Data and other information available at http://vene.ro/betrayal

    Copernicus, his Latin style and comments to Commentariolus

    A methodology of historical (or higher) criticism and of stylometry/stylochronometry, known from Biblical and literary studies, is applied to the examination of Nicolaus Copernicus’s writings. In particular, his early work Commentariolus is compared, at the level of its Latin, with his later works (Meditata, the Letter against Werner and De revolutionibus) as well as with the texts of some other authors. A number of striking stylistic dissimilarities between these works have been identified and interpreted in the light of stylometry/stylochronometry, historical criticism and the history of Copernican research. This research made it possible to draw plausible conclusions about the Sitz im Leben (historical context) and the dating of Commentariolus, and about related matters.

    StyleCounsel: Seeing the (Random) Forest for the Trees in Adversarial Code Stylometry

    Authorship attribution has piqued the interest of scholars for centuries, but historically remained a matter of subjective opinion, based on examination of handwriting and the physical document. Midway through the 20th century, a technique known as stylometry was developed, in which the content of a document is analyzed to extract the author's grammar use, preferred vocabulary, and other elements of compositional style. In parallel, programmers, particularly those involved in education, were writing and testing systems designed to automate the analysis of good coding style and best practice, in order to assist with grading assignments. In the aftermath of the Morris Worm incident in 1988, researchers began to consider whether this automated analysis of program style could be combined with stylometry techniques and applied to source code to identify the author of a program. The results of recent experiments suggest that this code stylometry can identify the author of short programs from among hundreds of candidates with up to 98% precision. This potential ability to discern the programmer of a sample of code from a large group of possible authors could have concerning consequences for the open-source community at large, particularly those contributors who may wish to remain anonymous. Recent international events suggest that the developers of certain anti-censorship and anti-surveillance tools are being targeted by their governments and forced to delete their repositories or face prosecution. In light of this threat to the freedom and privacy of individual programmers around the world, and given the dearth of published research into the feasibility of practical code stylometry at scale, we carried out a number of investigations into the difficulties of applying this technique in the real world, and into how one might effect a robust defence against it.
    To this end, we devised a system to aid programmers in obfuscating their inherent style and imitating another, overt author's style, in order to protect their anonymity from this forensic technique. Our system uses the implicit rules encoded in the decision points of a random forest ensemble to derive a set of recommendations, presented to the user, detailing how to achieve this obfuscation and mimicry attack. To best test this system, and simultaneously assess the difficulties of performing practical stylometry at scale, we also gathered a large corpus of real open-source software and devised our own feature set, including both novel attributes and attributes inspired by or borrowed from other sources. Our results indicate that attempting a mass analysis of publicly available source code is fraught with difficulties in ensuring the integrity of the data. Furthermore, we found that ours and most other published feature sets do not capture an author's style independently of content well enough to be very effective at scale, although their accuracy remains significantly better than random guessing. Evaluations of our tool indicate that it can successfully extract a set of changes that, if implemented, would result in misclassification as another author. More importantly, this extraction was independent of the specifics of the feature set, and would therefore still work even with a more accurate model of style. We ran a limited user study to assess the usability of the tool, and found that overall it was beneficial to our participants, and could be even more so if the valuable feedback we received is implemented in future work.
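    The core idea of reading recommendations off a forest's decision points can be sketched as follows. This is an illustrative toy, not the StyleCounsel implementation: the synthetic four-feature data and the rule-collection routine are our own assumptions, showing only how the (feature, threshold, direction) tests that route a sample through each tree can be enumerated as candidate edit targets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# toy "stylometric" feature vectors for two hypothetical authors
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

def decision_rules(forest, x):
    """Collect every (feature, threshold, direction) test the trees apply to x.

    Each rule marks a decision point; pushing a feature value across its
    threshold is a candidate change for steering the classification.
    """
    rules = []
    for est in forest.estimators_:
        tree = est.tree_
        node = 0
        while tree.children_left[node] != -1:  # -1 marks a leaf node
            f, t = tree.feature[node], tree.threshold[node]
            if x[f] <= t:
                rules.append((int(f), float(t), "<="))
                node = tree.children_left[node]
            else:
                rules.append((int(f), float(t), ">"))
                node = tree.children_right[node]
    return rules

rules = decision_rules(forest, X[0])
```

    A real obfuscation tool would additionally rank these rules by how often flipping them changes the predicted author, then translate the winning thresholds back into concrete source-code edits.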

    Modelling communities and populations: An introduction to computational social science

    In sociology, interest in modelling has not yet become widespread. However, the methodology has been gaining attention in parallel with its growing popularity in economics and other social sciences, notably psychology and political science, and with the growing volume of social data being measured and collected. In this paper, we present representative computational methodologies from both data-driven (“black box”) and rule-based (“per analogy”) approaches. We show how to build simple models, and discuss both the greatest successes and the major limitations of modelling societies. We claim that the end goal of computational tools in sociology is to provide meaningful analyses and calculations that allow causal statements in sociological explanation and support decisions of great importance for society.

    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including, but not limited to, code recommendation and the detection of duplicate code, plagiarism, malware, and code smells. This paper presents a systematic literature review and meta-analysis of code similarity measurement and evaluation techniques, to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deeper investigation reveals 80 software tools, working with eight different techniques in five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while many programming languages have no support at all. A noteworthy finding was the existence of 12 datasets related to source code similarity measurement and duplicate code, of which only eight were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and attention to multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance phase.
    Comment: 49 pages, 10 figures, 6 tables
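    As a minimal illustration of one family of techniques such surveys cover, token-based similarity can be sketched in a few lines; the crude tokenizer and the Jaccard measure here are generic textbook choices, not taken from any particular surveyed tool.

```python
import re

def tokens(code: str) -> set:
    """Crude lexer: identifiers/keywords plus individual punctuation characters."""
    return set(re.findall(r"[A-Za-z_]\w*|\S", code))

def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two token sets; 1.0 means identical token vocabularies."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 1.0

# renaming one variable leaves most of the token vocabulary shared
print(jaccard_similarity("int x = a + b;", "int y = a + b;"))  # → 0.75
```

    Set-based measures like this are cheap and rename-sensitive only at the vocabulary level; the more robust clone detectors discussed in such reviews work on token sequences, syntax trees, or program dependence graphs instead.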

    WoX+: A Meta-Model-Driven Approach to Mine User Habits and Provide Continuous Authentication in the Smart City

    The literature is rich in techniques and methods for performing Continuous Authentication (CA) using biometric data, both physiological and behavioral. As a recent trend, less invasive methods, such as those based on context-aware recognition, allow continuous identification of the user by retrieving device and app usage patterns. However, a still-open research topic is extending the concepts of behavioral and context-aware biometrics to take into account all the sensing data provided by the Internet of Things (IoT) and the smart city, in the shape of user habits. In this paper, we propose a meta-model-driven approach to mine user habits by combining IoT data from several sources, such as smart mobility, smart metering, smart home, and wearables. We then use those habits to seamlessly authenticate users in real time throughout the smart city whenever the same behavior occurs in different contexts and with different sensing technologies. Our model, which we call WoX+, allows the automatic extraction of user habits using a novel Artificial Intelligence (AI) technique focused on high-level concepts. The aim is to continuously authenticate users through their habits as a behavioral biometric, independently of the sensing hardware involved. To prove the effectiveness of WoX+, we organized a quantitative and qualitative evaluation in which 10 participants told us about a spending habit of theirs involving the use of IoT. We chose the financial domain because it is ubiquitous, inherently multi-device, rich in time patterns, and, most of all, in need of secure authentication. With the aim of extracting the requirements of such a system, we also asked the cohort how they expect WoX+ to use such habits to securely automate payments and identify them in the smart city.
    We discovered that WoX+ satisfies most of the expected requirements, particularly in terms of the unobtrusiveness of the solution, in contrast with the limitations observed in existing studies. Finally, we used the responses given by the cohort to generate synthetic data and train our novel AI block. Results show that the error in reconstructing the habits is acceptable: a Mean Squared Error Percentage (MSEP) of 0.04%.
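    The abstract does not spell out how MSEP is normalized; one plausible reading, and it is only our assumption, is the mean squared error expressed as a percentage of the mean squared target value, which can be computed as:

```python
import numpy as np

def msep(y_true, y_pred) -> float:
    """Mean Squared Error as a percentage of the mean squared target.

    NOTE: this normalization is an assumption for illustration; the paper's
    exact definition of MSEP may differ.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean((y_true - y_pred) ** 2) / np.mean(y_true ** 2)

print(msep([10.0, 20.0], [10.0, 20.0]))  # → 0.0 (perfect reconstruction)
```

    Under this reading, a reported 0.04% means the squared reconstruction error is four parts in ten thousand of the signal's squared magnitude.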

    Combating Fake News: A Gravity Well Simulation to Model Echo Chamber Formation In Social Media

    Fake news has become a serious concern, as distributing misinformation has become easier and more impactful, and a solution is critically required. One option is to ban fake news, but that approach could create more problems than it solves, and would be problematic from the outset, since fake news must first be identified before it can be banned. We initially propose a method to automatically recognize suspected fake news and to provide news consumers with more information as to its veracity. We suggest that fake news comprises two components: premises and misleading content. Fake news can be condensed to a collection of premises, which may or may not be true, and to various forms of misleading material, including biased arguments and language, misdirection, and manipulation. Misleading content can then be exposed. While valuable, this framework's utility may be limited by artificial intelligence, which can be used to alter fake news strategies at a rate exceeding our ability to update the framework. We therefore propose a model for identifying echo chambers, which are widely reported to be havens for fake news producers and consumers. We simulate a social media interest group as a gravity well, through which we identify the online groups postured to become echo chambers, and thus sources of fake news consumption and replication. This echo chamber model rests on three pillars related to the social media group: the technology employed, the topic explored, and the confirmation bias of group members. The model is validated by modeling and analyzing 19 subreddits on the Reddit social media platform. Contributions include a working definition of fake news, a framework for recognizing fake news, a generic model for social media echo chambers built on the three pillars central to echo chamber formation, and a gravity well simulation for social media groups, implemented for 19 subreddits.
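    The gravity-well analogy can be sketched in miniature as follows. The choice of "mass" (group size weighted by a confirmation-bias score) and the inverse-square pull are illustrative assumptions of ours, not the dissertation's actual formulation:

```python
def group_mass(members: int, confirmation_bias: float) -> float:
    """Hypothetical 'mass' of a social media group: size scaled by how
    strongly members' confirmation bias (0..1) binds them to the topic."""
    return members * confirmation_bias

def gravity_pull(mass: float, distance: float) -> float:
    """Inverse-square attraction on a user at ideological 'distance' > 0
    from the group's center."""
    return mass / distance ** 2

# a large, strongly biased group pulls a nearby user far harder
# than a small or weakly biased one at the same distance
print(gravity_pull(group_mass(10_000, 0.9), 2.0))  # → 2250.0
```

    In such a model, groups whose pull on nearby users exceeds some escape threshold would be flagged as candidate echo chambers.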

    In Search of the Culprit

    Despite poststructuralist rejections of the idea of a singular author-genius, the question of a textual archetype that can be assigned to a named author remains a common scholarly phantasm. This volume sets new standards by applying current theoretical approaches to the question of concepts of authorship in medieval and early modern literature. Its basic thesis is that authorship is a narratological function rather than an extra-textual entity.


    Behavioral Economics & Machine Learning Expanding the Field Through a New Lens

    In this thesis, I investigate central questions in behavioral economics as well as in law and economics, examining well-studied problems through a new methodological lens. The aim is to generate new insights and thereby point behavioral scientists to novel analytical tools. To this end, I show how machine learning may be used to build new theories by reducing complexity in experimental economic data. Moreover, I use natural language processing to show how supervised learning can enable the scientific community to expand limited datasets. I also investigate the normative impact of using such tools in social science research and decision-making, as well as their deficiencies.