94 research outputs found

    Classifying distinct data types: textual streams, protein sequences, and genomic variants

    Artificial Intelligence (AI) is an interdisciplinary field combining different research areas with the goal of automating processes in everyday life and industry. The fundamental components of AI models are an “intelligent” model and a functional component defined by the end application. That is, the intelligent model can be a statistical model that recognizes patterns in data instances in order to distinguish between these instances. For example, if the AI is applied in car manufacturing, then based on an image of a car part the model can categorize whether the part belongs to the front, middle or rear compartment of the car, as a human brain would do. In the same example application, the statistical model informs a mechanical arm, the functional component, of the current car compartment, and the arm in turn assembles this compartment of the car based on predefined instructions, much as a human hand would follow the brain’s neural signals. A crucial step of AI applications is the classification of input instances by the intelligent model. The classification step in the intelligent model pipeline allows the subsequent steps to act in a similar fashion for instances belonging to the same category. We define classification as the module of the intelligent model that categorizes the input instances based on patterns that are either predefined by human experts or derived from data. Irrespective of the method used to find patterns in data, classification is composed of four distinct steps: (i) input representation, (ii) model building, (iii) model prediction and (iv) model assessment. Based on these classification steps, we argue that applying classification to distinct data types poses different challenges. In this thesis, I focus on the challenges of three distinct classification scenarios: (i) Textual Streams: how can the model building step, commonly designed for a static data distribution, be advanced to classify textual posts with a transient data distribution? (ii) Protein Prediction: which biologically meaningful information can be used in the input representation step to overcome the challenge of limited training data? (iii) Human Variant Pathogenicity Prediction: how can a classification system for the functional impact of human variants be developed that provides standardized and well-accepted evidence for the classification outcome, thus enabling the model assessment step? To answer these research questions, I present my contributions to classifying these different types of data. temporalMNB: I adapt the sequential prediction with expert advice paradigm to optimally aggregate complementary distributions, enhancing a Naive Bayes model so that it adapts to the drifting distribution of the characteristics of textual posts. dom2vec: our proposal to learn embedding vectors for protein domains using self-supervision. Based on the high performance achieved by the dom2vec embeddings in a quantitative intrinsic assessment of the captured biological information, I provide example evidence for an analogy between local linguistic features in natural languages and the domain structure and function information in domain architectures. Last, I describe the GenOtoScope bioinformatics software tool, which automates standardized evidence-based criteria for the pathogenicity impact of variants associated with hearing loss.
Finally, to increase the practical use of our last contribution, I develop easy-to-use software interfaces to be used, in research settings, by clinical diagnostics personnel.
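    The abstract gives no implementation details, but the dom2vec idea can be illustrated with a minimal Python sketch, assuming each protein’s domain architecture is treated as an ordered “sentence” of InterPro domain identifiers and embeddings are trained word2vec-style with gensim. The file name, example identifiers, and hyperparameters below are illustrative assumptions, not values taken from the thesis.

        # Hedged sketch (not the thesis code): self-supervised embeddings for
        # protein domains, treating each protein's ordered domain architecture
        # as a "sentence" of domain identifiers, word2vec-style.
        from gensim.models import Word2Vec

        # Hypothetical corpus file: one protein per line, InterPro domain IDs
        # in sequence order, e.g. "IPR000719 IPR011009 IPR008271".
        with open("domain_architectures.txt") as f:
            architectures = [line.split() for line in f if line.strip()]

        model = Word2Vec(
            sentences=architectures,
            vector_size=100,   # embedding dimension (illustrative)
            window=5,          # neighbouring domains used as context
            min_count=1,       # keep rare domains
            sg=1,              # skip-gram objective
            epochs=10,
        )

        # Intrinsic sanity check: domains with related structure or function
        # should appear among each other's nearest neighbours.
        print(model.wv.most_similar("IPR000719", topn=5))

    Nearest-neighbour queries of this kind are one simple form of the intrinsic assessment of captured biological information mentioned above.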

    Twitter and Research: A Systematic Literature Review Through Text Mining

    Researchers have collected Twitter data to study a wide range of topics. This growing body of literature, however, has not yet been reviewed systematically to synthesize Twitter-related papers. Existing literature reviews have been limited by the constraints of traditional methods, which manually select and analyze samples of topically related papers. The goals of this retrospective study are to identify the dominant topics of Twitter-based research, summarize the temporal trend of topics, and interpret the evolution of topics within the last ten years. This study systematically mines a large number of Twitter-based studies to characterize the relevant literature with an efficient and effective approach. It collected relevant papers from three databases and applied text mining and trend analysis to detect semantic patterns and explore the yearly development of research themes across a decade. We found 38 topics in more than 18,000 manuscripts published between 2006 and 2019. By quantifying temporal trends, this study found that while 23.7% of topics did not show a significant trend (P ≥ 0.05), 21% of topics had increasing trends and 55.3% had decreasing trends; these hot and cold topics represent three categories: application, methodology, and technology. The contributions of this paper can be utilized in the growing field of Twitter-based research and are beneficial to researchers, educators, and publishers
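    The abstract does not name the specific text-mining algorithms, so the following Python sketch rests on assumed choices: an LDA-style topic model over paper abstracts and a per-topic linear trend test on yearly topic prevalence. The file name, column names, and significance threshold are illustrative; only the 38-topic count comes from the abstract.

        # Hedged sketch: LDA topics over paper abstracts plus a per-year trend test.
        import pandas as pd
        from scipy.stats import linregress
        from sklearn.decomposition import LatentDirichletAllocation
        from sklearn.feature_extraction.text import CountVectorizer

        papers = pd.read_csv("twitter_papers.csv")      # assumed columns: year, abstract
        vectorizer = CountVectorizer(stop_words="english", max_features=5000)
        dtm = vectorizer.fit_transform(papers["abstract"])

        lda = LatentDirichletAllocation(n_components=38, random_state=0)
        doc_topics = lda.fit_transform(dtm)             # document-topic proportions

        # Yearly mean prevalence per topic, then a linear trend test:
        # significant positive slope -> "hot" topic, significant negative -> "cold".
        yearly = (pd.DataFrame(doc_topics)
                    .assign(year=papers["year"].values)
                    .groupby("year").mean())
        for topic in yearly.columns:
            slope, _, _, p, _ = linregress(yearly.index, yearly[topic])
            label = ("hot" if (p < 0.05 and slope > 0)
                     else "cold" if (p < 0.05 and slope < 0)
                     else "flat")
            print(f"topic {topic}: slope={slope:.4f}, p={p:.3f} -> {label}")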

    Developing natural language processing instruments to study sociotechnical systems

    Identifying temporal linguistic patterns and tracing social amplification across communities has always been vital to understanding modern sociotechnical systems. Now, well into the age of information technology, the growing digitization of text archives powered by machine learning systems has enabled an enormous number of interdisciplinary studies to examine the coevolution of language and culture. However, most research in that domain investigates formal textual records, such as books and newspapers. In this work, I argue that the study of conversational text derived from social media is just as important. I present four case studies to identify and investigate societal developments in longitudinal social media streams with high temporal resolution spanning over 100 languages. These case studies show how everyday conversations on social media encode a unique perspective that is often complementary to observations derived from more formal texts. This unique perspective improves our understanding of modern sociotechnical systems and enables future research in computational linguistics, social science, and behavioral science

    Overlapping dialogues: the role of interpretation design in communicating Australia’s natural and cultural heritage

    This research investigates the development of interpretation design in Australia during the period 1980–2006, and its role in presenting natural and cultural heritage to audiences in visitor settings. It establishes Australian interpretation design at the intersection of two professional fields, interpretation and design. Where heritage interpretation originates from a background of spoken language, through narrative and storytelling, graphic and communication design have origins in visual language, communicated through images and text. This research positions interpretation design as a new field within design and traces its emergence as a hybrid of spoken and visual traditions of communication.

    The study gives visibility to this previously undocumented and un-theorised hybrid field of design and creates a thematic conceptual framework within which to locate its historical, conceptual and practical origins. In substantiating interpretation design as a new field, three avenues of enquiry were considered: documentation and analysis of the visual artefacts of interpretation design, locating interpretation design in a wider conceptual and professional context through literature reviews, and consultation with designers in order to understand the challenges and problems in this new mode of design. Further, to help designers continue to work effectively in highly collaborative, complex and cross-disciplinary professional environments, a conceptual collaborative tool was developed for use by interpretation design project teams. The conceptual tool integrates the theoretical and practical findings from this research and is based on a pattern language approach first developed by Christopher Alexander et al. (1977).

    The research is conducted from a design perspective, and integrates theoretical and professional knowledge from related fields into interpretation design practice. Through a progressively widening interrogation of the literature, professional contexts, and designed artefacts of interpretation design, this new area of design is examined from a number of perspectives, building up a multi-faceted framework for understanding its historical, conceptual and practical dimensions. A Grounded Theory methodology was adapted to develop the theoretical framework of this study and to gather a wide range of relevant data. The practical outcome of the research was developed using a Pattern Language methodology originating from a problem-based design approach in architecture (Alexander et al. 1977), which underpinned the interpretation of data.

    The research concludes that, despite their invisibility within the discourse of Australian design, designers working in this specialised field of practice have, since the early 1980s, contributed to projects which shape ideas, attitudes and visual representations of natural and cultural heritage in Australia’s most widely visited and valued sites. Designers’ practice is identified as part of an ongoing process of both contributing to the Australian cultural narrative and being influenced by the legacy of culture. Contemporary interpretation design is highly cross-disciplinary and collaborative, characterised by a differentiated professional practice with dispersed networks of stakeholders. While interpretation design is located within a larger framework of the professional practice of interpretation, there exist many opportunities to enrich and better inform designers by integrating wider pools of knowledge that intersect the activities of interpretation, including education, tourism, visitor studies and psychology

    Trolling Aesthetics: The Lulz as Creative Practice

    “The LULZ” became common Internet parlance in the mid-2000s to describe a wide array of online phenomena, from childish pranks, to the peculiar discourse of anonymous message boards, to a shadowy and subversive ideology. By the end of the decade, the canon of images and icons associated with the LULZ entered into artistic practice, and along with it a certain dark understanding of the “digital condition” of online mediation. “Trolling Aesthetics: The LULZ as Creative Practice” charts how the LULZ began as an aesthetic sense and sensibility on the notorious message board 4chan. Akin to most online content, it quickly morphed into a multitude of new forms, including, for example, the video remix practice YouTube Poop, which takes the aesthetic logic of 4chan but changes its creative systems and output. The result is both a discordant bric-a-brac of absurd digital art and an example of how the LULZ functions, beyond idle message boards, as purposeful creative work. The final chapter follows this trajectory into direct artistic practices. Unlike many of the earlier iterations, which sputter rather than comment fully on what such digital culture means, artist projects like Brad Troemel’s The Jogging mobilize the LULZ to reflect on a network of technology obsessed with speed, time, identity, and representation. Through a blend of material, expressive, and aesthetic approaches, this dissertation is both a historical analysis of the emergence of the LULZ and a socio-historical critique of an online world willing to foster, participate in, and partake of such an ethos

    Atmosphere(s) for Architects: Between Phenomenology and Cognition

    Interfaces 5 was created to house the dialogue that the neuroscientist Michael A. Arbib and the philosopher Tonino Griffero started at the end of 2021 about atmospheric experiences, striving to bridge the gap between the perspective of cognitive science and the (neo)phenomenological one. This conversation progressed thanks to Pato Paez’s offer to participate in the webinar “Architectural Atmospheres: Phenomenology, Cognition, and Feeling,” a roundtable hosted by The Commission Project (TCP) within the Applied Neuroaesthetics initiative. The event ran online on May 20, 2022. Bob Condia moderated the panel discussion between Suchi Reddy, Michael A. Arbib, and Tonino Griffero. This volume collects nine essays: the target chapter is “A Dialogue on Affordances, Atmospheres, and Architecture” by Michael A. Arbib and Tonino Griffero; there are four commentaries on this text by Federico De Matteis, Robert Lamb Hart, Mark Alan Hewitt, and Suchi Reddy; Michael A. Arbib and Tonino Griffero have independently responded to the commentaries, emphasizing the opportunities and challenges of their respective approaches, cognitive science/neuroscience and atmospherology applied to architecture; Elisabetta Canepa offers “An Essential Vocabulary of Atmospheric Architecture,” developing an atmospherological critique of the Marianna Kistler Beach Museum of Art on the Kansas State University campus in Manhattan to evaluate the accuracy, coherence, and adaptability of her lexicon; and Bob Condia and Mikaela Wynne provide an introduction entitled “On Becoming an Atmospherologist: A Praxis of Atmospheres.”