66 research outputs found

    Biosensor approach to psychopathology classification.

    We used a multi-round, two-party exchange game in which a healthy subject played a subject diagnosed with a DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition) disorder, and applied a Bayesian clustering approach to the behavior exhibited by the healthy subject. The goal was to quantitatively characterize the style of play elicited in the healthy subject (the proposer) by their DSM-diagnosed partner (the responder). The approach exploits the dynamics of the behavior elicited in the healthy proposer as a biosensor for cognitive features that characterize the psychopathology group on the other side of the interaction. Using a large cohort of subjects (n = 574), we found statistically significant clustering of proposers' behavior overlapping with a range of DSM-IV disorders, including autism spectrum disorder, borderline personality disorder, attention deficit hyperactivity disorder, and major depressive disorder. To further validate these results, we developed a computer agent to replace the human subject in the proposer role (the biosensor) and show that it can also detect these same four DSM-defined disorders. These results suggest that the highly developed social sensitivities that humans bring to a two-party social exchange can be exploited and automated to detect important psychopathologies, using an interpersonal behavioral probe not directly related to the defining diagnostic criteria.
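
    As a rough illustration of the clustering step described above, the sketch below fits a variational Bayesian Gaussian mixture (scikit-learn's BayesianGaussianMixture) to synthetic per-proposer behaviour features; the feature definitions and numbers are hypothetical stand-ins, not the paper's exchange-game data.

```python
# Minimal sketch: Bayesian (variational) mixture clustering of proposer
# behaviour vectors. The features and cohort are synthetic stand-ins for
# the paper's multi-round exchange-game data.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)

# Hypothetical per-proposer features: mean offer, offer volatility, and
# average round-to-round adjustment across the rounds of the game.
n_subjects = 200
features = np.column_stack([
    rng.normal(0.4, 0.1, n_subjects),   # mean fraction offered
    rng.gamma(2.0, 0.02, n_subjects),   # volatility of offers
    rng.normal(0.0, 0.05, n_subjects),  # average round-to-round change
])

# Variational Bayesian mixture: superfluous components are pruned
# automatically, so only an upper bound on the number of behavioural
# styles is needed.
model = BayesianGaussianMixture(n_components=8, random_state=0)
cluster = model.fit_predict(features)
print(np.bincount(cluster))  # occupancy of the inferred clusters
```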

    (Digitally entangled) touristic placemaking: locative media, algorithmic navigation & affective orderings

    In this thesis, I explore the ways in which the locative platform TripAdvisor mediates touristic placemaking, through a case study centred on Edinburgh’s Harry Potter tourism scene. The case study is based on three years (2018-2021) of digital ethnographic research and is illustrative of a setting in which algorithmic navigation is essential to maintaining and/or establishing one’s touristic service in time and space. Drawing on ‘relational materialist’ histories of tourism, work which elaborates on Foucauldian notions of governance, ANT/STS, and digital sociological scholarship, I put forward an imagining of this genre of platform as ‘geo-pastoral technologies’ and ‘social partners’ to the cultural-economic actors who accommodate tourists in the destinations travelled. This conceptualisation is useful for making sense of the specific qualities of this partnership which emerged in my corpus of data, including its functioning as a ‘promotional partner’ and its use as a ‘thinking partner’, and enables me to position these qualities as ongoing accomplishments which require work on the part of the touristic organisations mapped on the platform. This work, and particularly the ‘socio-technological techniques’ developed and mobilised to maintain the partnership, demonstrates how the algorithmic navigation of locative media platforms is a complex, collective, and more-than-digital endeavour. In particular, I argue that the algorithmic navigation of TripAdvisor can be understood as a form of ‘affective ordering’ which involves attempting to translate affects onto the platform, attending to the content which accumulates there, and sometimes assembling a digital response and/or re-ordering the collective of things and factors understood to be preventing a “good experience” from being assembled, in doing so attempting to differently affect future touristic audiences. I conclude by reflecting on what this ethnographic case study can contribute to our understanding of platform governance, algorithmic navigation, touristic working practice, and orderings.

    Complex queries and complex data

    With the widespread availability of wearable computers, equipped with sensors such as GPS or cameras, and with the ubiquitous presence of micro-blogging platforms, social media sites and digital marketplaces, data can be collected and shared on a massive scale. A necessary building block for taking advantage of this vast amount of information is efficient and effective similarity search: algorithms able to find the objects in a database that are similar to a query object. Owing to the general applicability of similarity search across data types and applications, the formalization of this concept and the development of strategies for evaluating similarity queries have evolved into an important field of research in the database and spatio-temporal database communities, as well as in neighbouring fields such as information retrieval and computer vision. This thesis concentrates on a special instance of similarity queries, namely k-Nearest Neighbor (kNN) queries and their close relative, Reverse k-Nearest Neighbor (RkNN) queries. As a first contribution, we provide an in-depth analysis of the RkNN join. While the problem of reverse nearest neighbor queries has received a vast amount of research interest, the problem of performing such queries in bulk has not seen an in-depth analysis so far. We first formalize the RkNN join, identifying its monochromatic and bichromatic versions and their self-join variants. After pinpointing the monochromatic RkNN join as an important and interesting instance, we develop solutions for this class, including a self-pruning and a mutual pruning algorithm, and evaluate these algorithms extensively on a variety of synthetic and real datasets. From this starting point of similarity queries on certain data, we shift our focus to uncertain data, addressing nearest neighbor queries in uncertain spatio-temporal databases. Starting from the traditional definition of nearest neighbor queries and a data model for uncertain spatio-temporal data, we develop efficient query mechanisms that consider temporal dependencies during query evaluation. We define intuitive query semantics, aiming not only at returning the objects closest to the query but also their probability of being a nearest neighbor. After theoretically evaluating these query predicates, we develop efficient querying algorithms for them, and we then extend these results to reverse nearest neighbor queries. Finally, we address the problem of querying large datasets of set-based objects, namely image databases in which images are represented by (multi-)sets of vectors plus metadata describing the position of features in the image. We aim at reducing the number of kNN queries performed during query processing and evaluate a modified pipeline that optimizes query accuracy with a small number of kNN queries. Additionally, as feature representations in object recognition are moving more and more from the real-valued domain to the binary domain, we evaluate efficient indexing techniques for binary feature vectors.
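
    To make the two query types concrete, here is a minimal brute-force sketch of a kNN query and a monochromatic RkNN query; the index structures and pruning strategies the thesis develops replace these linear scans in practice.

```python
# Minimal sketch of the two query types the thesis studies: a brute-force
# k-nearest-neighbour (kNN) query and a monochromatic reverse kNN (RkNN)
# query, which returns the points that have the query among their own k
# nearest neighbours.
import numpy as np

def knn(data: np.ndarray, q: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k points in `data` closest to query point `q`."""
    dist = np.linalg.norm(data - q, axis=1)
    return np.argsort(dist)[:k]

def rknn(data: np.ndarray, q: np.ndarray, k: int) -> list[int]:
    """Indices i such that q is among the k nearest neighbours of data[i],
    measured against the rest of the dataset plus q (monochromatic case)."""
    result = []
    for i, p in enumerate(data):
        others = np.delete(data, i, axis=0)
        candidates = np.vstack([others, q])
        # q was appended last, so it sits at index len(candidates) - 1
        if len(candidates) - 1 in knn(candidates, p, k):
            result.append(i)
    return result

points = np.random.default_rng(1).random((100, 2))
query = np.array([0.5, 0.5])
print(knn(points, query, k=3), rknn(points, query, k=3))
```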

    Developing a framework for semi-automated rule-based modelling for neuroscience research

    Dynamic modelling has significantly improved our understanding of the complex molecular mechanisms underpinning neurobiological processes. The detailed mechanistic insights these models offer depend on the availability of a diverse range of experimental observations. Despite the huge increase in biomolecular data generation from novel high-throughput technologies and extensive research in bioinformatics and dynamical modelling, the efficient creation of accurate dynamical models remains highly challenging. To study this problem, three perspectives are considered: comparison of modelling methods, prioritisation of results, and analysis of primary data sets. Firstly, I compare two models of the DARPP-32 signalling network: a classically defined model using ordinary differential equations (ODEs) and its equivalent defined in a novel rule-based (RB) paradigm. The RB model recapitulates the results of the ODE model, but offers a more expressive and flexible syntax that can efficiently handle the “combinatorial complexity” commonly found in signalling networks, and allows ready access to fine-grained details of the emerging system. RB modelling is particularly well suited to encoding protein-centred features such as domain information and post-translational modification sites. Secondly, I propose a new pipeline for prioritising the molecular species that arise during model simulation, using a recently developed algorithm based on multivariate mutual information (CorEx) coupled with global sensitivity analysis (GSA) using the RKappa package. To evaluate the importance of parameters efficiently, Hilbert-Schmidt Independence Criterion (HSIC)-based indices are aggregated into a weighted network that allows compact analysis of the model across conditions. Finally, I describe an approach for developing disease-specific dynamical models, using genes known to be associated with Attention Deficit Hyperactivity Disorder (ADHD) as an exemplar. Candidate disease genes are mapped to a selection of datasets that are potentially relevant to the modelling process (e.g. protein-protein interactions, protein-domain and kinase-substrate mappings), and these are jointly analysed using network clustering and pathway enrichment analyses to evaluate their coverage and utility in developing rule-based models.
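
    As a small illustration of the sensitivity machinery mentioned above, the following sketch computes the standard biased HSIC estimate with Gaussian kernels on synthetic parameter and output samples; it illustrates the criterion itself, not the thesis's RKappa/CorEx pipeline.

```python
# Minimal sketch of the Hilbert-Schmidt Independence Criterion (HSIC)
# used to score parameter importance in global sensitivity analysis.
# Standard biased estimator with Gaussian kernels; the parameter samples
# and model outputs below are synthetic stand-ins.
import numpy as np

def gaussian_gram(x: np.ndarray, sigma: float) -> np.ndarray:
    sq = (x[:, None] - x[None, :]) ** 2
    return np.exp(-sq / (2 * sigma ** 2))

def hsic(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased HSIC estimate: (1/n^2) * trace(K H L H)."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K = gaussian_gram(x, sigma)
    L = gaussian_gram(y, sigma)
    return float(np.trace(K @ H @ L @ H)) / n ** 2

rng = np.random.default_rng(0)
theta = rng.normal(size=500)                          # sampled parameter
output = np.tanh(theta) + 0.1 * rng.normal(size=500)  # dependent output
noise = rng.normal(size=500)                          # independent control
print(hsic(theta, output), hsic(theta, noise))  # first should be larger
```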

    Criminal data analysis based on low rank sparse representation

    Finding effective clustering methods for high-dimensional datasets is challenging due to the curse of dimensionality, which causes many basic, commonly used algorithms to fail in high-dimensional spaces when confronted with problems such as a large number of groups and overlapping clusters. Most domains use a set of parameters to describe the appearance, geometry and dynamics of a scene, which has motivated a range of techniques for finding a low-dimensional space that represents high-dimensional data. Many proposed methods fail to overcome these challenges, especially when the input is high-dimensional and the clusters have a complex structure. In high-dimensional data, many dimensions are often irrelevant and can mask the existing clusters in noise, yet such data frequently reside on low-dimensional subspaces; the task of subspace clustering algorithms is to uncover how objects that are related in one subset of dimensions relate in other subsets of dimensions. The state-of-the-art methods for subspace segmentation include Low-Rank Representation (LRR) and Sparse Representation (SR). The former seeks the global lowest-rank representation but restrictively assumes independence among subspaces, whereas the latter seeks to cluster disjoint or overlapping subspaces through a locality measure, which, however, fails in the presence of large noise. This thesis aims to identify the key problems and obstacles that have challenged researchers in recent years in clustering high-dimensional data, and to implement effective subspace clustering methods for high-dimensional crime data, covering both real events and synthetic data with a complex structure spanning 168 different offence types, while overcoming the disadvantages of existing subspace clustering techniques. To this end, a Low-Rank Sparse Representation (LRSR) theory, hereafter referred to as Criminal Data Analysis Based on LRSR, is examined and used to recover and segment embedded subspaces. The results of these methods are discussed and compared with previously examined approaches such as K-means and PCA-based segmentation with K-means; these earlier approaches guided the choice of subspace clustering method. The proposed method, Low-Rank Subspace Sparse Representation (LRSR), not only recovers the low-rank subspaces but also obtains a relatively sparse segmentation with respect to disjoint or even overlapping subspaces. Both the UCI Machine Learning Repository and the crime database are used to find and compare the subspace clustering algorithms best suited to high-dimensional data. We used several open-source machine learning frameworks and tools for preparing, transforming, clustering and visualising the high-dimensional crime dataset, principally the scikit-learn library for the Python programming language, as well as R and Matlab in earlier experiments.
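
    For a flavour of sparse-representation subspace clustering, here is a simplified sketch (a relative of LRSR, closer to classic sparse subspace clustering, not the thesis's exact method): each point is regressed on the others with an l1 penalty, and the sparse coefficients define an affinity for spectral clustering. The two synthetic subspaces are stand-ins for the crime data.

```python
# Minimal sketch of sparse-representation subspace clustering: each point
# is expressed as a sparse combination of the other points, the sparse
# coefficients form an affinity matrix, and spectral clustering segments
# the subspaces.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

# 60 points drawn from two different 2-D subspaces of R^6 (columns = points).
basis1, basis2 = rng.normal(size=(6, 2)), rng.normal(size=(6, 2))
X = np.hstack([basis1 @ rng.normal(size=(2, 30)),
               basis2 @ rng.normal(size=(2, 30))])

n = X.shape[1]
C = np.zeros((n, n))
for i in range(n):
    others = np.delete(X, i, axis=1)
    lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=5000)
    lasso.fit(others, X[:, i])                 # x_i ~ sparse combo of rest
    C[i, np.arange(n) != i] = lasso.coef_

affinity = np.abs(C) + np.abs(C).T             # symmetrise
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(labels)  # should split roughly into the two 30-point subspaces
```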

    Automatic concept learning via information lattices

    Concept learning is about distilling interpretable rules and concepts from data, a prelude to more advanced knowledge discovery and problem solving in creative domains such as art and science. While concept learning is pervasive in humans, current artificial intelligence (AI) systems are mostly good at either applying human-distilled rules (rule-based AI) or capturing patterns in a task-driven fashion (pattern recognition), but not at learning concepts in a human-interpretable way comparable to human-induced rules and theories. This thesis introduces a new learning problem, Automatic Concept Learning (ACL), targeting self-explanation and self-exploration as the two principal pursuits; correspondingly, it proposes a new learning model, Information Lattice Learning (ILL), combining computational abstraction and probabilistic rule learning as the two principal components. Woven around the core idea of abstraction, the entire ACL framework is presented as a generalization of Shannon's information lattice that further brings learning into the picture. Abstraction itself is cast as a hierarchical, interpretable, data-free, and task-free clustering problem, seeded from universal priors such as simple algebra and symmetries. The main body of the thesis comprises three self-contained yet close-knit parts: theory, algorithms, and applications. The theory part presents the mathematical exposition of ACL, formalizing the key notions of abstraction, concept, and probabilistic rule, and with them the entire concept learning problem; the goal is to lay down a solid path towards algorithmic means that are computationally feasible, reliable, and human-interpretable. The algorithms part presents the computational development of ACL, that is, ILL. It puts computational abstraction and statistical learning together in the same algorithmic picture, creating a bridge between deductive (rule-based) and inductive (data-driven) approaches in AI. Aiming for human interpretability and model transparency in particular, ILL in many ways mimics human learning, including mechanism-driven abstraction generation and a "teacher-student" loop that can distill customizable traces of rules for data summarization and explanation. The applications part recapitulates the theory and algorithms through concrete examples: music is used for demonstration, and automatic music concept learning is studied thoroughly. This part details the implementation of MUS-ROVER, an automatic music theorist that distills music composition rules from sheet music. To better support music ACL, and music AI in general, the twin system MUS-NET is built as a crowdsourcing platform for making and serving digital sheet music data sets.
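
    A toy sketch may help fix the core ILL idea: an abstraction is a partition of the raw domain induced by a simple symmetry, and a probabilistic rule is the data distribution pushed through that partition. The melody below is invented, not MUS-ROVER's sheet-music input.

```python
# Minimal sketch: the raw domain is MIDI pitch, the symmetry is octave
# shift (mod 12), and the "rule" is the distribution of the melody over
# the cells of the induced partition.
from collections import Counter

melody = [60, 62, 64, 65, 67, 69, 71, 72, 71, 67, 64, 60]  # MIDI pitches

def abstraction(pitch: int) -> int:
    """Octave-shift symmetry: pitches an octave apart share one cell."""
    return pitch % 12

# The probabilistic rule: a distribution over the partition's cells.
counts = Counter(abstraction(p) for p in melody)
total = sum(counts.values())
rule = {cell: count / total for cell, count in sorted(counts.items())}
print(rule)  # e.g. pitch class 0 (C) occurs with probability 1/4
```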

    AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model

    The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. Previous methods, such as the Bayesian and genetic optimisation implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them; pipeline composition and optimisation with these methods therefore requires a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To explore this research challenge further, we conducted experiments showing that many of the generated pipelines are invalid, and that executing them to discover whether they are good pipelines is unnecessary. To address this issue, we propose a novel method, AVATAR, that evaluates the validity of ML pipelines using a surrogate model. AVATAR accelerates automatic ML pipeline composition and optimisation by quickly discarding invalid pipelines. Our experiments show that AVATAR evaluates complex pipelines more efficiently than traditional evaluation approaches that require their execution.
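
    The surrogate idea can be sketched as propagating data properties through a pipeline instead of executing it; the component capability table below is a hypothetical illustration, not AVATAR's actual surrogate model.

```python
# Minimal sketch: each component declares the data properties it requires
# and produces, so a pipeline's validity is checked by propagating a
# property set rather than by running the pipeline.
COMPONENTS = {
    # name: (required properties, properties of the output data)
    "imputer":    ({"missing_ok"}, {"numeric", "complete"}),
    "scaler":     ({"numeric", "complete"}, {"numeric", "complete"}),
    "classifier": ({"numeric", "complete"}, {"predictions"}),
}

def is_valid(pipeline: list[str], data_props: set[str]) -> bool:
    """Propagate data properties through the pipeline without running it."""
    props = set(data_props)
    for name in pipeline:
        required, produced = COMPONENTS[name]
        if not required <= props:       # an unmet requirement: invalid
            return False
        props = set(produced)
    return True

raw = {"numeric", "missing_ok"}  # numeric data containing missing values
print(is_valid(["imputer", "scaler", "classifier"], raw))  # True
print(is_valid(["scaler", "classifier"], raw))             # False: NaNs
```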

    Causally-Inspired Generalizable Deep Learning Methods under Distribution Shifts

    Deep learning methods have achieved remarkable success in various areas of artificial intelligence, thanks to their powerful distribution-matching capabilities. However, these successes rely heavily on the i.i.d. assumption, i.e., that the data distributions of the training and test datasets are the same. As a result, current deep learning methods typically generalize poorly under distribution shift, performing badly on test data whose distribution differs from the training data. This significantly hinders the application of deep learning to real-world scenarios, as the distribution of test data is not always the same as the training distribution in our rapidly evolving world. This thesis discusses how to construct deep learning methods that generalize under distribution shifts. To achieve this, the thesis first models a prediction task as a structural causal model (SCM), which establishes the relationships between variables using a directed acyclic graph. In an SCM, some variables change easily across domains while others do not; deep learning methods often unintentionally mix the invariant variables with the easily changed ones, causing the learned model to deviate from the true one and generalize poorly under distribution shift. To remedy this, we propose specific algorithms that model the invariant part of the SCM with deep learning methods, and we show experimentally that doing so helps the trained model generalize to different distributions of the same task. Finally, we propose to identify and model the variant information in a new test distribution so that the trained deep learning model can be fully adapted to it. We show the method extends to several practical applications, such as classification under label shift, image translation under semantics shift, robotics control under dynamics generalization, and generalizing large language models to visual question-answering tasks.
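
    As one concrete instance of invariance-based training under distribution shift, the sketch below applies an IRM-style penalty across two synthetic environments; it illustrates the general idea, not the thesis's specific algorithms.

```python
# Minimal sketch of an IRM-style invariance penalty: penalise predictors
# whose optimal per-environment scaling differs, nudging the model toward
# features whose relationship to the label is stable across environments.
import torch

def irm_penalty(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Squared gradient of the loss w.r.t. a dummy scale fixed at 1.0."""
    w = torch.tensor(1.0, requires_grad=True)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits * w, y)
    grad, = torch.autograd.grad(loss, w, create_graph=True)
    return grad ** 2

torch.manual_seed(0)
model = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Two environments sharing the invariant mechanism but differing in noise.
envs = []
for noise in (0.1, 1.0):
    x = torch.randn(256, 2)
    y = (x[:, :1] + noise * torch.randn(256, 1) > 0).float()
    envs.append((x, y))

for step in range(200):
    risk = penalty = 0.0
    for x, y in envs:
        logits = model(x)
        risk = risk + torch.nn.functional.binary_cross_entropy_with_logits(
            logits, y)
        penalty = penalty + irm_penalty(logits, y)
    opt.zero_grad()
    (risk + 10.0 * penalty).backward()
    opt.step()
```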
    • 

    corecore