
    Linguistic spatial classifications of event domains in narratives of crime

    Structurally, formal definitions of the linguistic narrative minimally require two temporally linked past-time events. The role of space in this definition, based on spatial language indicating where events occur, is considered optional and non-structural. However, based on narratives with a high frequency of spatial language, recent research has questioned this perspective, suggesting that space is more critical than may be readily apparent. Through an analysis of spatially rich serial criminal narratives, it is demonstrated that spatial information varies qualitatively relative to narrative events. In particular, statistical classifiers in a supervised machine learning task achieve 90% accuracy in predicting Pre-Crime, Crime, and Post-Crime events from spatial (and temporal) information. Overall, these results suggest a deeper spatial organization of discourse, which not only opens practical possibilities for event resolution but also challenges traditional formal linguistic definitions of narrative.
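
    The classification setup described in the abstract can be illustrated as a standard supervised pipeline. The sketch below is a minimal illustration, not the study's actual configuration: the per-clause features (counts of spatial prepositions, motion verbs, and temporal markers) and the random-forest classifier are assumptions chosen for the example.

```python
# Illustrative sketch: predicting narrative event phases from hypothetical
# per-clause spatial/temporal features. Feature design and classifier choice
# are assumptions for this example, not the study's reported setup.
from sklearn.ensemble import RandomForestClassifier

# Feature vectors: [spatial_prep_count, motion_verb_count, temporal_marker_count]
X = [
    [2, 1, 0], [1, 1, 0],  # Pre-Crime clauses (travel to the scene)
    [0, 0, 1], [1, 0, 1],  # Crime clauses (the event itself)
    [3, 2, 1], [2, 2, 0],  # Post-Crime clauses (flight from the scene)
]
y = ["Pre-Crime", "Pre-Crime", "Crime", "Crime", "Post-Crime", "Post-Crime"]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[2, 1, 0]]))  # -> ['Pre-Crime'] on this toy data
```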

    The transformation of spatial experience in narrative discourse

    This dissertation investigates the status of spatial information as a structural element of narratives of personal experience. Traditionally, event, temporal and rhetorical relation information is considered structural – i.e., minimally necessary to define local and textual elements of narrative discourse. While this information is readily apparent from surface linguistic forms, spatial information, and its status as structural, is less straightforward. To uncover correspondences between spatial information and structural elements of narrative discourse, I rely on a series of machine learning experiments to analyze morpho-syntactically, formal-semantically and cognitive-semantically encoded spatial information, indexed by spatial prepositions and verbs from a particular frame of reference, relative to events, rhetorical relations, tense, aspect, explicit temporal reference and text sequence in three corpora of narrative discourse (conversational, adventure travel, and criminal activity narratives). Based on strength of prediction in the machine learning experiments – where statistical classifiers predict spatial, temporal, event and rhetorical information at 60–70% accuracy, rising above 80% when implicit spatial information and text sequence are considered – spatial information is argued to exhibit structural patterns at the clausal and textual levels. These patterns hold across all corpora despite differences in contextual parameters, number of authors, text length and density of spatial information. Further, the results and analysis are compared to existing narrative analysis frameworks (Labov 1972, Herman 2001), from which a more nuanced, but non-contradictory, picture of spatial information in narrative discourse emerges, based on both syntactic and semantic considerations. Additionally, I engage with environmental criminology to bridge interdisciplinary gaps between cognitively informed insights into spatial language and the linguistic conveyance of experiential discourse. In sum, spatial information exhibits structural patterns in narrative discourse that facilitate a deeper practical and theoretical understanding of the cognitive and linguistic organization, and analysis, of experiential discourses.
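
    As a rough illustration of how surface spatial cues might be indexed per clause, the sketch below counts spatial prepositions and motion verbs against hand-picked word lists; the lists, the tokenization, and the feature names are simplifications assumed for the example, not the dissertation's annotation scheme.

```python
import re

# Tiny hand-picked cue inventories; illustrative only, far smaller than
# any real inventory of spatial prepositions or motion verbs.
SPATIAL_PREPS = {"in", "on", "at", "into", "through", "across", "behind", "under"}
MOTION_VERBS = {"go", "went", "drive", "drove", "walk", "walked", "run", "ran"}

def spatial_features(clause: str) -> dict:
    """Count spatial prepositions and motion verbs in one clause."""
    tokens = re.findall(r"[a-z']+", clause.lower())
    return {
        "prep_count": sum(t in SPATIAL_PREPS for t in tokens),
        "motion_count": sum(t in MOTION_VERBS for t in tokens),
    }

print(spatial_features("I drove through the tunnel and walked into the lobby"))
# -> {'prep_count': 2, 'motion_count': 2}
```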

    Identifying Authorship by Byte-Level N-Grams: The Source Code Author Profile (SCAP) Method

    Source code author identification deals with identifying the most likely author of a computer program, given a set of predefined author candidates. There are several scenarios where digital evidence of this kind plays a role in investigation and adjudication, such as code authorship disputes, intellectual property infringement, and tracing the source of code left in a system after a cyber attack. As in any identification task, the disputed program is compared to undisputed, known programming samples by the predefined author candidates. We present a new approach, called the SCAP (Source Code Author Profiles) approach, based on byte-level n-gram profiles representing the source code author’s style. The SCAP method extends a method originally applied to natural language text authorship attribution; we show that an n-gram approach also suits the characteristics of source code analysis. The methodological extension includes a simplified profile and a less complicated, but more effective, similarity measure. Experiments on data sets in different programming languages (Java and C++), with and without comments, demonstrate the effectiveness of these extensions. The SCAP approach is programming-language independent. Moreover, it deals surprisingly well with cases where only a limited amount of very short programming samples per author is available for training.
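
    The profile-and-similarity idea summarized above can be sketched in a few lines: build the set of an author's most frequent byte-level n-grams, then score a disputed program by how many n-grams its profile shares with each candidate's. The parameter values (n = 3, profile size L = 1500) and the toy samples below are illustrative assumptions, not the paper's tuned settings.

```python
from collections import Counter

def scap_profile(source: bytes, n: int = 3, L: int = 1500) -> frozenset:
    """Set of the L most frequent byte-level n-grams in a code sample."""
    grams = Counter(source[i:i + n] for i in range(len(source) - n + 1))
    return frozenset(g for g, _ in grams.most_common(L))

def profile_intersection(a: frozenset, b: frozenset) -> int:
    """Simplified similarity: larger profile overlap = more similar style."""
    return len(a & b)

# Toy usage: attribute a disputed program to the closest candidate profile.
candidates = {
    "alice": scap_profile(b"for (int i = 0; i < n; ++i) sum += a[i];"),
    "bob": scap_profile(b"int i=0;\nwhile(i<n){sum=sum+a[i];i=i+1;}"),
}
disputed = scap_profile(b"for (int j = 0; j < m; ++j) total += b[j];")
print(max(candidates, key=lambda who: profile_intersection(candidates[who], disputed)))
```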

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
    Comment: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench
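
    For a sense of what a single-task evaluation involves, the sketch below scores exact-match accuracy over a JSON task file. The {"input": ..., "target": ...} schema is a simplified assumption loosely modeled on BIG-bench's JSON task format, and model_fn is a stand-in for a real language model call; neither reflects the benchmark's full task API.

```python
import json

def exact_match_score(task_path: str, model_fn) -> float:
    """Fraction of examples where the model output equals the target.

    Assumes a simplified schema: {"examples": [{"input": ..., "target": ...}]}.
    Real BIG-bench tasks also allow multiple targets, target_scores, and
    programmatic tasks, which this sketch ignores.
    """
    with open(task_path) as f:
        examples = json.load(f)["examples"]
    hits = sum(model_fn(ex["input"]).strip() == ex["target"] for ex in examples)
    return hits / len(examples)

# Stand-in "model" for demonstration; a real run would query an LM instead.
def echo_last_word(prompt: str) -> str:
    return prompt.split()[-1]

# exact_match_score("some_task.json", echo_last_word)
```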