
    Personality Dysfunction Manifest in Words: Understanding Personality Pathology Using Computational Language Analysis

    Personality disorders (PDs) are among the most prevalent and high-risk mental health conditions, yet they remain poorly understood. Today, the development of new technologies offers advanced tools that can be used to improve our understanding and treatment of PD. One promising tool, and the focus of this thesis, is computational language analysis. By examining patterns in how people with personality pathology use words, it is possible to gain insight into their constellation of thoughts, feelings, and behaviours. To date, however, there has been little research at the intersection of verbal behaviour and personality pathology. Accordingly, the central goal of this thesis is to demonstrate how PD can be better understood through the analysis of natural language. The thesis presents three research articles, comprising four empirical studies, each of which leverages computational language analysis to better understand personality pathology. Each paper focuses on a distinct core feature of PD while incorporating language analysis methods: Paper 1 (Study 1) focuses on interpersonal dysfunction; Paper 2 (Studies 2 and 3) on emotion dysregulation; and Paper 3 (Study 4) on behavioural dysregulation (i.e., engagement in suicidality and deliberate self-harm). Findings from this research have generated a better understanding of fundamental features of PD, including insight into characterising dimensions of social dysfunction (Paper 1), maladaptive emotion processes that may contribute to emotion dysregulation (Paper 2), and psychosocial dynamics relating to suicidality and deliberate self-harm in PD (Paper 3). Such theoretical knowledge has important implications for clinical practice, particularly its potential to inform psychological therapy. More broadly, this research highlights how language can provide implicit and unobtrusive insight into the personality and psychological processes that underlie personality pathology at scale, using an individualised, naturalistic approach.
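
    The thesis's actual lexica and models are not specified in this abstract, but the core idea of dictionary-based language analysis, counting how often a text draws on psychologically meaningful word categories, can be sketched in a few lines of Python. The categories and word lists below are illustrative assumptions only, not the thesis's instruments:

    ```python
    # Minimal sketch of dictionary-based (LIWC-style) language analysis.
    # The categories and word lists are illustrative assumptions.
    import re
    from collections import Counter

    CATEGORIES = {
        "first_person": {"i", "me", "my", "mine", "myself"},
        "negative_emotion": {"sad", "angry", "hate", "hurt", "afraid"},
        "social": {"friend", "family", "talk", "together", "they"},
    }

    def category_rates(text: str) -> dict:
        """Return each category's share of all tokens in `text`."""
        tokens = re.findall(r"[a-z']+", text.lower())
        counts = Counter(tokens)
        total = max(len(tokens), 1)
        return {
            name: sum(counts[w] for w in words) / total
            for name, words in CATEGORIES.items()
        }

    print(category_rates("I feel sad and afraid when they leave me."))
    ```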

    Dataflow Programming and Acceleration of Computationally-Intensive Algorithms

    The volume of unstructured textual information continues to grow due to recent technological advancements. This has resulted in an exponential growth of information generated in various formats, including blogs, posts, social networks, and enterprise documents. Numerous Enterprise Architecture (EA) documents are also created daily, such as reports, contracts, agreements, frameworks, architecture requirements, designs, and operational guides. Processing and computing over this massive amount of unstructured information requires substantial computing capability and new techniques, so it is critical to manage it through a centralized knowledge management platform. Knowledge management is the process of managing information within an organization: creating, collecting, organizing, and storing information in a way that makes it easily accessible and usable. The research involved the development of a textual knowledge management system, and two use cases were considered for extracting textual knowledge from documents. The first case study focused on the safety-critical documents of a railway enterprise. Safety is of paramount importance in the railway industry, and several EA documents, including manuals, operational procedures, and technical guidelines, contain critical information. Digitalizing these documents is essential for analysing the vast amount of textual knowledge they contain in order to improve the safety and security of railway operations. A case study was conducted between the University of Huddersfield and the Rail Safety and Standards Board (RSSB) to analyse EA safety documents using natural language processing (NLP). A graphical user interface was developed that includes various document processing features such as semantic search, document mapping, text summarization, and visualization of key trends. For the second case study, open-source data was utilized and textual knowledge was extracted. Several features were also developed, including kernel distribution, analysis of key trends, and sentiment analysis of words (such as unique, positive, and negative words) within the documents. Additionally, a heterogeneous framework was designed using CPUs/GPUs and FPGAs to analyse the computational performance of document mapping.
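
    As one illustration of the semantic-search feature mentioned above, the sketch below ranks documents against a query using TF-IDF vectors and cosine similarity from scikit-learn. The corpus and scoring are assumptions for illustration; the abstract does not specify the system's actual implementation:

    ```python
    # Hypothetical semantic search over EA-style documents (TF-IDF + cosine).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [
        "Operational procedure for trackside signal maintenance.",
        "Technical guideline on rolling stock braking systems.",
        "Safety manual for level crossing inspections.",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(docs)

    def search(query: str, top_k: int = 2):
        """Rank documents by cosine similarity to the query."""
        q_vec = vectorizer.transform([query])
        scores = cosine_similarity(q_vec, doc_matrix).ravel()
        ranked = scores.argsort()[::-1][:top_k]
        return [(docs[i], float(scores[i])) for i in ranked]

    print(search("signal inspection safety"))
    ```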

    Self-supervised learning for transferable representations

    Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process: it is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable recent shift toward approaches that leverage only raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and showing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. Our focus thereafter is on the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self-supervised models on a diverse set of downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks move beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalises to real-world transformations. This begins to explain the differing empirical performance achieved by self-supervised learners on different downstream tasks, and it showcases the advantage of specialised representations produced with tailored augmentation. Finally, we introduce a novel self-supervised pre-training algorithm for object detection, aligning pre-training with the downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks.
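
    Many of the contrastive learners evaluated in work like this optimise an InfoNCE-style objective over two augmented views of each image. The snippet below is a simplified, one-directional sketch of that loss in PyTorch; the batch size, temperature, and embedding dimension are illustrative assumptions:

    ```python
    # Simplified InfoNCE/NT-Xent-style contrastive loss (illustrative).
    import torch
    import torch.nn.functional as F

    def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1):
        """z1, z2: (N, D) embeddings of two augmented views of the same images."""
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temperature  # (N, N) pairwise similarities
        targets = torch.arange(z1.size(0))  # matching views sit on the diagonal
        return F.cross_entropy(logits, targets)

    z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
    print(info_nce(z1, z2).item())
    ```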

    Sound Event Detection by Exploring Audio Sequence Modelling

    Everyday sounds in real-world environments are a powerful source of information through which humans interact with their surroundings. Humans can infer what is happening around them by listening to everyday sounds, yet automatically recognising, understanding, and interpreting those sounds remains a challenging task for a computer algorithm in a smart device. Sound event detection (SED) is the process of transcribing an audio recording into sound event tags with onset and offset times, which involves both classification and temporal segmentation of the sound events in the recording. SED has numerous applications in everyday life, including security and surveillance, automation, healthcare monitoring, multimedia information retrieval, and assisted living technologies. SED is to everyday sounds what automatic speech recognition (ASR) is to speech and automatic music transcription (AMT) is to music. The fundamental questions in designing a sound recognition system are which portion of a sound event the system should analyse, and what proportion of a sound event it must process before it can confidently claim a detection of that event. While the classification of sound events has improved considerably in recent years, the temporal segmentation of sound events has not improved to the same extent. The aim of this thesis is to propose and develop methods to improve the segmentation and classification of everyday sound events in SED models. In particular, this thesis explores the segmentation of sound events by investigating audio sequence encoding-based and audio sequence modelling-based methods, in an effort to improve overall sound event detection performance. In the first phase of this thesis, efforts are directed towards improving sound event detection by explicitly conditioning the audio sequence representations of an SED model using sound activity detection (SAD) and onset detection. To achieve this, we propose multi-task learning-based SED models in which SAD and onset detection serve as auxiliary tasks for the SED task; a sketch of this setup appears below. The next part of this thesis explores self-attention-based audio sequence modelling, which aggregates audio representations based on temporal relations within and between sound events, scored by the similarity of sound event portions in audio event sequences. We propose SED models that include memory-controlled, adaptive, dynamic, and source separation-induced self-attention variants, with the aim of improving overall sound recognition.
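
    The multi-task setup described above, SED as the main task with SAD as an auxiliary task, can be sketched as a shared recurrent encoder feeding two frame-wise heads. The layer sizes, loss weighting, and tensor shapes below are assumptions for illustration, not the thesis's actual models:

    ```python
    # Illustrative multi-task SED model: shared encoder, SED + SAD heads.
    import torch
    import torch.nn as nn

    class MultiTaskSED(nn.Module):
        def __init__(self, n_mels: int = 64, n_events: int = 10):
            super().__init__()
            self.encoder = nn.GRU(n_mels, 128, batch_first=True, bidirectional=True)
            self.event_head = nn.Linear(256, n_events)  # frame-wise event activity
            self.sad_head = nn.Linear(256, 1)           # frame-wise any-sound activity

        def forward(self, x):  # x: (batch, frames, n_mels)
            h, _ = self.encoder(x)
            return self.event_head(h), self.sad_head(h)

    model = MultiTaskSED()
    event_logits, sad_logits = model(torch.randn(2, 100, 64))

    # Joint objective: SED is the main task, SAD the weighted auxiliary task.
    bce = nn.BCEWithLogitsLoss()
    event_targets = torch.randint(0, 2, (2, 100, 10)).float()
    sad_targets = event_targets.max(dim=-1, keepdim=True).values
    loss = bce(event_logits, event_targets) + 0.5 * bce(sad_logits, sad_targets)
    print(loss.item())
    ```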

    Cultures of Citizenship in the Twenty-First Century: Literary and Cultural Perspectives on a Legal Concept

    In the early twenty-first century, the concept of citizenship is more contested than ever. As refugees set out to cross the Mediterranean, European nation-states refer to "cultural integrity" and "immigrant inassimilability," revealing citizenship to be much more than a legal concept. The contributors to this volume take an interdisciplinary approach to considering how cultures of citizenship are being envisioned and interrogated in literary and cultural (con)texts. Through this framework, they attend to the tension between the citizen and its spectral others, a tension determined by how a country defines difference at a given moment.

    Dataset And Deep Neural Network Based Approach To Audio Question Answering

    Audio question answering (AQA) is a multimodal task in which a system analyses an audio signal and a question in natural language in order to produce the desired answer in natural language. This thesis presents a new dataset for audio question answering, Clotho-AQA, consisting of 1,991 audio files, each between 15 and 30 seconds in duration. For each audio file in the dataset, six different questions and their corresponding answers were crowdsourced using Amazon Mechanical Turk (AMT), with the questions and their corresponding answers created by different annotators. Of the six questions for each audio file, two were designed to have 'yes' as the answer and two to have 'no', while the remaining two have other single-word answers. For every question, answers were collected from three independent annotators. Two baseline experiments are presented to illustrate the use of the Clotho-AQA dataset: a multimodal binary classifier for 'yes' or 'no' answers and a multimodal multi-class classifier for single-word answers, both based on long short-term memory (LSTM) layers. The binary classifier achieved an accuracy of 62.7%, and the multi-class classifier achieved a top-1 accuracy of 54.2% and a top-5 accuracy of 93.7%. Further, an attention-based model was proposed, which increased the binary classifier's accuracy to 66.2% and the multi-class classifier's top-1 and top-5 accuracy to 57.5% and 99.8%, respectively. Some drawbacks of the Clotho-AQA dataset, such as the same answer word appearing in different tenses or in singular and plural forms and being treated as separate classes in the classification problem, were addressed, and a refined version called Clotho-AQA_v2 is also presented. On this refined dataset, the multimodal baseline model achieved top-1 and top-5 accuracies of 59.8% and 96.6%, respectively, while the attention-based model achieved top-1 and top-5 accuracies of 61.3% and 99.6%, respectively.
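
    A minimal sketch of what an LSTM-based binary yes/no baseline of this kind might look like: one LSTM encodes the audio features, another encodes the question tokens, and the final hidden states are fused by concatenation. The feature dimensions, vocabulary size, and fusion scheme are illustrative assumptions, not the thesis's exact model:

    ```python
    # Illustrative multimodal LSTM baseline for binary (yes/no) AQA.
    import torch
    import torch.nn as nn

    class BinaryAQA(nn.Module):
        def __init__(self, n_audio_feats=64, vocab_size=1000, emb_dim=128):
            super().__init__()
            self.audio_lstm = nn.LSTM(n_audio_feats, 128, batch_first=True)
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.text_lstm = nn.LSTM(emb_dim, 128, batch_first=True)
            self.classifier = nn.Linear(256, 1)  # single logit for "yes"

        def forward(self, audio, question):
            # audio: (batch, frames, n_audio_feats); question: (batch, tokens)
            _, (a_h, _) = self.audio_lstm(audio)
            _, (q_h, _) = self.text_lstm(self.embed(question))
            fused = torch.cat([a_h[-1], q_h[-1]], dim=-1)
            return self.classifier(fused)

    model = BinaryAQA()
    logit = model(torch.randn(2, 300, 64), torch.randint(0, 1000, (2, 12)))
    print(torch.sigmoid(logit))  # probability of a "yes" answer
    ```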

    Software Design Change Artifacts Generation through Software Architectural Change Detection and Categorisation

    Unlike other engineering projects, which are mostly implemented by non-expert workers after being designed by engineers, software is designed, implemented, tested, and inspected entirely by experts. Researchers and practitioners have linked software bugs, security holes, problematic integration of changes, complex-to-understand codebases, unwarranted mental pressure, and similar problems in software development and maintenance to inconsistent and complex design, and to the lack of an easy way to understand what is going on in a software system and what to plan for it. The unavailability of the information and insights that development teams need to make good decisions makes these challenges worse. Software design documents and other insightful information extraction are therefore essential to reduce the above-mentioned anomalies. Moreover, extracting architectural design artifacts is required to build developer profiles that can be made available to the market in many crucial scenarios. To that end, architectural change detection, categorization, and change description generation are crucial, because these are the primary artifacts through which other software artifacts are traced. However, it is not feasible for humans to analyse all the changes in a single release to detect change and impact, as doing so is time-consuming, laborious, costly, and inconsistent. In this thesis, we conduct six studies addressing these challenges to automate architectural change information extraction and document generation in ways that could assist development and maintenance teams. In particular, (1) we detect architectural changes using lightweight techniques leveraging textual and codebase properties (illustrated below), (2) categorize them from intelligent perspectives, and (3) generate design change documents by exploiting precise contexts of component relations and change purposes, which were previously unexplored. Our experiments with 4,000+ architectural change samples and 200+ design change documents suggest that the proposed approaches are promising in accuracy and scalable enough to deploy frequently. Our proposed change detection approach can detect up to 100% of architectural change instances and is very scalable. Our proposed change classifier achieves an F1 score of 70%, which is promising given the challenges, and our proposed system can produce descriptive design change artifacts with 75% significance. Since most of our studies are foundational, our approaches and prepared datasets can serve as baselines for advancing research in design change information extraction and documentation.
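
    The abstract does not detail the lightweight textual techniques, but one simple illustration of text-based change detection is to compare the identifier vocabularies of two versions of a module and flag the module when their overlap drops sharply. The tokenisation and threshold below are assumptions for illustration only, not the thesis's actual technique:

    ```python
    # Illustrative text-based architectural change flagging via Jaccard overlap.
    import re

    def identifier_bag(source: str) -> set:
        """Crude identifier extraction from a source snippet."""
        return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]+", source))

    def is_architectural_change(old_src: str, new_src: str, threshold=0.6) -> bool:
        old_ids, new_ids = identifier_bag(old_src), identifier_bag(new_src)
        union = old_ids | new_ids
        jaccard = len(old_ids & new_ids) / len(union) if union else 1.0
        return jaccard < threshold

    old = "class OrderService { PaymentGateway gateway; void checkout() {} }"
    new = "class OrderService { EventBus bus; void publishOrderCreated() {} }"
    print(is_architectural_change(old, new))  # True: the component's surface shifted
    ```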

    Constructing Colloquial Dataset for Persian Sentiment Analysis of Social Microblogs

    Introduction: Microblogging websites have amassed rich data sources for sentiment analysis and opinion mining. Sentiment classification on microblogs, however, has frequently proven difficult because posts typically lack syntactically consistent terms and representative structures, since users on these social networks do not like to write lengthy statements. There are also limitations for low-resource languages: Persian has distinctive characteristics and demands its own annotated data and models for the sentiment analysis task, distinct from the text features of English. Method: This paper first constructs a user opinion dataset called ITRC-Opinion, built collaboratively and in-source. The dataset contains 60,000 informal and colloquial Persian texts from social microblogs such as Twitter and Instagram. Second, the study proposes a new deep convolutional neural network (CNN) model for more effective sentiment analysis of colloquial text in social microblog posts. The constructed dataset is used to evaluate the presented model. Furthermore, models such as LSTM, CNN-RNN, BiLSTM, and BiGRU, with different word embeddings including fastText, GloVe, and word2vec, were evaluated on the dataset. Results: The results demonstrate the benefit of the dataset and the proposed model (72% accuracy), showing a meaningful improvement in sentiment classification performance.
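
    A minimal sketch of a CNN text classifier in the general style the paper proposes, parallel convolutions over token embeddings followed by max-pooling; the vocabulary size, filter widths, and three-class output are illustrative assumptions, not the paper's actual configuration:

    ```python
    # Illustrative CNN sentiment classifier over token sequences.
    import torch
    import torch.nn as nn

    class SentimentCNN(nn.Module):
        def __init__(self, vocab_size=30000, emb_dim=128, n_classes=3):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.convs = nn.ModuleList(
                [nn.Conv1d(emb_dim, 100, kernel_size=k) for k in (3, 4, 5)]
            )
            self.fc = nn.Linear(300, n_classes)

        def forward(self, tokens):                    # tokens: (batch, seq_len)
            x = self.embed(tokens).transpose(1, 2)    # (batch, emb_dim, seq_len)
            pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
            return self.fc(torch.cat(pooled, dim=1))  # class logits

    model = SentimentCNN()
    print(model(torch.randint(0, 30000, (4, 50))).shape)  # torch.Size([4, 3])
    ```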