
    Dataflow Programming and Acceleration of Computationally-Intensive Algorithms

    The volume of unstructured textual information continues to grow due to recent technological advancements. This has resulted in exponential growth in the information generated in various formats, including blogs, posts, social networking, and enterprise documents. Numerous Enterprise Architecture (EA) documents are also created daily, such as reports, contracts, agreements, frameworks, architecture requirements, designs, and operational guides. Processing and computing this massive amount of unstructured information requires substantial computing capabilities and the implementation of new techniques. It is critical to manage this unstructured information through a centralized knowledge management platform. Knowledge management is the process of managing information within an organization; it involves creating, collecting, organizing, and storing information in a way that makes it easily accessible and usable. The research involved the development of a textual knowledge management system, and two use cases were considered for extracting textual knowledge from documents. The first case study focused on the safety-critical documents of a railway enterprise. Safety is of paramount importance in the railway industry, and several EA documents, including manuals, operational procedures, and technical guidelines, contain critical information. Digitalizing these documents is essential for analysing the vast amount of textual knowledge they contain in order to improve the safety and security of railway operations. A case study was conducted between the University of Huddersfield and the Rail Safety and Standards Board (RSSB) to analyse EA safety documents using natural language processing (NLP). A graphical user interface was developed that includes various document processing features such as semantic search, document mapping, text summarization, and visualization of key trends.
For the second case study, open-source data was utilized and textual knowledge was extracted. Several features were also developed, including kernel distribution, analysis of key trends, and sentiment analysis of words (such as unique, positive, and negative words) within the documents. Additionally, a heterogeneous framework was designed using CPUs/GPUs and FPGAs to analyse the computational performance of document mapping.
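Semantic search and document mapping of the kind described above are commonly built on vector-space similarity. The sketch below is a minimal, hypothetical illustration using TF-IDF weighted cosine similarity in pure Python; the document snippets, tokenisation, and query are invented for the example and are not the system developed in the case studies.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Compute sparse TF-IDF vectors (dicts) for tokenised documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency per term
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] / len(doc) * idf[t] for t in tf})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented toy corpus standing in for EA safety documents.
docs = [
    "railway safety procedures for track maintenance".split(),
    "operational guide for signalling equipment".split(),
    "annual financial report of the enterprise".split(),
]
query = "track safety maintenance".split()

vecs = tf_idf_vectors(docs + [query])
scores = [cosine(vecs[-1], v) for v in vecs[:-1]]
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # doc 0 shares the most weighted terms with the query
```

A production system would replace the bag-of-words vectors with learned sentence embeddings, but the ranking step is the same.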

    Structuring the State’s Voice of Contention in Harmonious Society: How Party Newspapers Cover Social Protests in China

    During the Chinese Communist Party's (CCP) campaign of building a 'harmonious society', how do the official newspapers cover instances of social contention on the ground? Answering this question will shed light not only on how the party press works but also on how the state and society interact in today's China. This thesis conceptualises this phenomenon with a multi-faceted and multi-levelled notion of a 'state-initiated contentious public sphere' to capture the complexity of mediated relations between the state and social contention in the party press. Adopting a relational approach, this thesis analyses 1,758 news reports of 'mass incidents' in the People's Daily and the Guangming Daily between 2004 and 2020, employing cluster analysis, qualitative comparative analysis, and social network analysis. The thesis finds significant differences in the patterns of contentious coverage in the party press at the event and provincial levels, and an uneven distribution of attention to social contention across incidents and regions. For 'reported regions', the thesis distinguishes four types of coverage and presents how the party press responds differently to social contention in different scenarios at the provincial level. For 'identified incidents', the thesis distinguishes a cumulative type of visibility based on the quantity of coverage from a relational visibility based on the structure emerging from coverage, and explains how different news-making rationales determine whether instances receive similar amounts of coverage or occupy similar positions within coverage. Ultimately, by demonstrating how the Chinese state strategically uses the party press to respond to social contention, and how social contention is journalistically placed in different positions in the state's eyes, this thesis argues that social contention leads to the establishment of complex state-contention relations channelled through the party press.

    Multidisciplinary perspectives on Artificial Intelligence and the law

    This open access book presents an interdisciplinary, multi-authored, edited collection of chapters on Artificial Intelligence ('AI') and the Law. AI technology has come to play a central role in the modern data economy. Through a combination of increased computing power, the growing availability of data and the advancement of algorithms, AI has now become an umbrella term for some of the most transformational technological breakthroughs of this age. The importance of AI stems from both the opportunities that it offers and the challenges that it entails. While AI applications hold the promise of economic growth and efficiency gains, they also create significant risks and uncertainty. The potential and perils of AI have thus come to dominate modern discussions of technology and ethics, and although AI was initially allowed to develop largely without guidelines or rules, few would deny that the law is set to play a fundamental role in shaping the future of AI. As the debate over AI is far from over, the need for rigorous analysis has never been greater. This book thus brings together contributors from different fields and backgrounds to explore how the law might provide answers to some of the most pressing questions raised by AI. An outcome of the Católica Research Centre for the Future of Law and its interdisciplinary working group on Law and Artificial Intelligence, it includes contributions by leading scholars in the fields of technology, ethics and the law.

    Patterns and Variation in English Language Discourse

    This publication comprises the reviewed post-conference proceedings of the international 9th Brno Conference on Linguistics Studies in English, held on 16–17 September 2021 and organised by the Faculty of Education, Masaryk University in Brno. The papers revolve around the themes of patterns and variation in specialised discourses (namely media, academic, business, tourism, educational and learner discourses), effective interaction between the addressor and addressees, and current trends and developments in specialised discourses. The principal methodological perspectives are the comparative approach involving discourses in English and another language, critical and corpus analysis, and the identification of pragmatic strategies and appropriate rhetorical means. The authors of the papers are researchers from the Czech Republic, Italy, Luxembourg, Serbia and Georgia.

    A Critical Review Of Post-Secondary Education Writing During A 21st Century Education Revolution

    Educational materials are effective instruments that provide information and report new discoveries uncovered by researchers in specific areas of academia. Higher education, like other educational institutions, relies on instructional materials to inform its practice of educating adult learners. In post-secondary education, developmental English programs are tasked with meeting the needs of dynamic populations; thus there is a continuous need for research in this area to support its changing landscape. However, the majority of scholarly thought in this area centers on K-12 reading and writing. This paucity poses a problem for the post-secondary community. This research study uses a qualitative content analysis to examine peer-reviewed journals from 2003 to 2017, developmental education websites, and a government-issued document directed toward reforming post-secondary developmental education programs. These highly relevant sources aid educators in discovering informational support for applying best practices for student success. Developmental education serves the purpose of addressing literacy gaps for students transitioning to college-level work. The findings here illuminate the dearth of material offered to developmental educators. This study suggests the field of literacy research is fragmented and highlights an apparent blind spot in scholarly literature with regard to English writing instruction. This poses a quandary for post-secondary literacy researchers in the 21st century and establishes the necessity for the literacy research community to commit future scholarship toward equipping college educators who teach writing to underprepared adult learners.

    Talking about personal recovery in bipolar disorder: Integrating health research, natural language processing, and corpus linguistics to analyse peer online support forum posts

    Background: Personal recovery, 'living a satisfying, hopeful and contributing life even with the limitations caused by the illness' (Anthony, 1993), is of particular value in bipolar disorder, where symptoms often persist despite treatment. So far, personal recovery has only been studied in researcher-constructed environments (interviews, focus groups). Support forum posts can serve as a complementary naturalistic data source. Objective: The overarching aim of this thesis was to study personal recovery experiences that people living with bipolar disorder have shared in online support forums, through integrating health research, NLP, and corpus linguistics in a mixed-methods approach within a pragmatic research paradigm, while considering ethical issues and involving people with lived experience. Methods: This mixed-methods study analysed: 1) previous qualitative evidence on personal recovery in bipolar disorder from interviews and focus groups; 2) who self-reports a bipolar disorder diagnosis on the online discussion platform Reddit; 3) the relationship of mood and posting in mental health-specific Reddit forums (subreddits); and 4) discussions of personal recovery in bipolar disorder subreddits. Results: A systematic review of qualitative evidence resulted in the first framework for personal recovery in bipolar disorder, POETIC (Purpose & meaning, Optimism & hope, Empowerment, Tensions, Identity, Connectedness). Mainly young or middle-aged US-based adults self-report a bipolar disorder diagnosis on Reddit. Of these, those experiencing more intense emotions appear to be more likely to post in mental health support subreddits. Their personal recovery-related discussions in bipolar disorder subreddits primarily focussed on three domains: Purpose & meaning (particularly reproductive decisions and work), Connectedness (romantic relationships, social support), and Empowerment (self-management, personal responsibility).
Support forum data highlighted personal recovery issues that exclusively or more frequently came up online compared to previous evidence from interviews and focus groups. Conclusion: This project is the first to analyse non-reactive data on personal recovery in bipolar disorder. By indicating the key areas that people focus on in personal recovery when posting freely, and the language they use, it provides a helpful starting point for formal and informal carers to understand the concerns of people diagnosed with bipolar disorder and to consider how best to offer support.

    Exploring Text Mining and Analytics for Applications in Public Security: An in-depth dive into a systematic literature review

    Text mining and related analytics emerge as a technological approach to support human activities in extracting useful knowledge from texts in several formats. From a managerial point of view, it can help organizations in planning and decision-making processes, providing information that was not previously evident in textual materials produced internally or even externally. In this context, within the public/governmental scope, public security agencies are great beneficiaries of the tools associated with text mining, in several respects, from applications in the criminal area to the collection of people's opinions and sentiments about the actions taken to promote their welfare. This article reports the details of a systematic literature review focused on identifying the main areas of text mining application in public security, the most recurrent technological tools, and future research directions. The searches covered four major article databases (Scopus, Web of Science, IEEE Xplore, and ACM Digital Library), selecting 194 publications from 2014 to the first half of 2021, spanning journals, conferences, and book chapters. There were several findings concerning the targets of the literature review, as presented in the results of this article.

    Themes, Lexemes, and "Mnemes": Composite Allusions in the Gospel of John and other Jewish Literature

    This thesis examines composite allusions to the Jewish scriptures in the Gospel of John and compares these to similar phenomena in late Second Temple Jewish literature. Composite allusions are defined in this study as allusions clustered together in a single literary unit that are best interpreted together. To analyze such allusions, I develop a three-fold method integrating 1) literary analysis; 2) Jewish catchword exegesis; and 3) insights from studies in ancient media culture. The passages I examine are, first, six passages from Jewish literature (CD 1:1–3; 1QHa 16:5–12a; Sir. 33:7–15; Exod. 15:3 LXX; Ps. 71:17 LXX; and Isa. 3:9 LXX); secondly, a double citation in John (12:37–40); and, finally, three composite allusions in John (1:29, 7:37–39, 15:1–11). I argue that the composite features across all of these passages function on the basis of common lexemes, common themes, and metonymy. For all the cases in question I offer fresh insights on how different ancient texts and traditions were likely to have become associated with each other, and how, in the Gospel of John, these associations are embedded in the narrative and utilized for the author's theological and literary purposes. In my synthesizing conclusion, I apply the results of my findings to the current debate about the "Jewishness" of John. On the one hand, the Gospel of John demonstrates a sophisticated interaction with its scriptural sources, and thus situates itself squarely within the Jewish exegetical traditions of its day. On the other hand, scriptural allusions are employed above all in the interests of christology, setting it outside of and beyond the compass of other Jewish writings.

    Neural Concept-to-text Generation with Knowledge Graphs

    Modern language models are strong at generating grammatically correct, natural language. However, they still struggle with commonsense reasoning, a task involving making inferences about common everyday situations without explicitly stated information. Prior research into the topic has shown that providing additional information from external sources helps language models generate better outputs. In this thesis, we explore methods of extracting information from knowledge graphs and using it as additional input for a pre-trained generative language model. We do this either by extracting a subgraph relevant to the context or by using graph neural networks to predict which information is relevant. Moreover, we experiment with a post-editing approach and with a model trained in a multi-task setup (generation and consistency classification). Our methods are evaluated on the CommonGen benchmark for generative commonsense reasoning using both automatic metrics and a detailed error analysis on a small sample of outputs. We show that the methods improve over a simple language model fine-tuning baseline, although they do not set a new state of the art.
    Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics
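The first of the two strategies, extracting a context-relevant subgraph, can be illustrated with a minimal sketch. The toy graph, relation names, and hop limit below are invented for illustration; the thesis's actual extraction operates on a real knowledge graph, and the retrieved triples would be verbalised and prepended to the language model's input.

```python
from collections import deque

# Toy knowledge graph: adjacency list of (relation, neighbour) edges.
# All triples are illustrative, not drawn from a real KG.
GRAPH = {
    "dog":     [("IsA", "animal"), ("CapableOf", "bark")],
    "frisbee": [("UsedFor", "play"), ("IsA", "toy")],
    "play":    [("HasSubevent", "throw"), ("Causes", "fun")],
    "animal":  [("CapableOf", "play")],
}

def extract_subgraph(graph, concepts, max_hops=2):
    """Collect all triples reachable within max_hops of the input concepts."""
    triples, seen = set(), set(concepts)
    queue = deque((c, 0) for c in concepts)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # stop expanding past the hop limit
        for rel, nbr in graph.get(node, ()):
            triples.add((node, rel, nbr))
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, depth + 1))
    return triples

# Input concepts as in a CommonGen-style instance.
sub = extract_subgraph(GRAPH, ["dog", "frisbee"])
```

Each retrieved triple such as `("dog", "CapableOf", "bark")` can then be rendered as text ("a dog is capable of barking") and concatenated with the concepts before generation.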

    Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

    The rise in malicious usage of large language models, such as fake content creation and academic plagiarism, has motivated the development of approaches that identify AI-generated text, including those based on watermarking or outlier detection. However, the robustness of these detection algorithms to paraphrases of AI-generated text remains unclear. To stress test these detectors, we build an 11B-parameter paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering. Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking, GPTZero, DetectGPT, and OpenAI's text classifier. For example, DIPPER drops the detection accuracy of DetectGPT from 70.3% to 4.6% (at a constant false positive rate of 1%) without appreciably modifying the input semantics. To increase the robustness of AI-generated text detection to paraphrase attacks, we introduce a simple defense that relies on retrieving semantically similar generations and must be maintained by a language model API provider. Given a candidate text, our algorithm searches a database of sequences previously generated by the API, looking for sequences that match the candidate text within a certain threshold. We empirically verify our defense using a database of 15M generations from a fine-tuned T5-XXL model and find that it can detect 80% to 97% of paraphrased generations across different settings while only classifying 1% of human-written sequences as AI-generated. We open-source our models, code and data. NeurIPS 2023 camera-ready (32 pages). Code, models, and data available at https://github.com/martiansideofthemoon/ai-detection-paraphrase
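The store-then-match mechanism of the retrieval defense can be sketched in a few lines. This is a hypothetical, simplified stand-in: the paper retrieves over semantic embeddings of millions of API generations, whereas the sketch uses bag-of-words cosine similarity with an invented threshold, purely to show why a paraphrase (which preserves most content) still matches its stored source.

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between bag-of-words Counters."""
    dot = sum(cnt * b[t] for t, cnt in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class RetrievalDetector:
    """Stores every sequence the API emits; flags candidates that
    closely match a stored generation. A real deployment would use
    learned semantic embeddings instead of word counts."""
    def __init__(self, threshold=0.7):  # threshold is illustrative
        self.db = []
        self.threshold = threshold

    def record(self, text):
        self.db.append(Counter(text.lower().split()))

    def is_ai_generated(self, candidate):
        cvec = Counter(candidate.lower().split())
        return any(bow_cosine(cvec, v) >= self.threshold for v in self.db)

det = RetrievalDetector()
det.record("the quick brown fox jumps over the lazy dog")
# A paraphrase keeps most content words, so similarity stays high:
print(det.is_ai_generated("the quick brown fox leaps over the lazy dog"))  # True
print(det.is_ai_generated("completely unrelated human written sentence"))  # False
```

The trade-off the abstract notes is visible here: only the API provider holds `db`, so only the provider can run the check.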