813 research outputs found

    Quantifying joint behavioral states in zebrafish (Danio rerio) dyadic contests through interpretable variables

    Get PDF
    Tese de Mestrado, Engenharia Biomédica e Biofísica, 2021, Universidade de Lisboa, Faculdade de CiênciasO comportamento animal é uma área fascinante do ponto de vista físico, no entanto ainda existem vários desafios associados à construção de modelos ou ao desenvolvimento de teorias do comportamento em física. Um dos desafios é desenvolver modelos diretamente dos dados, eliminando o viés antropocêntrico que existe na definição de estados comportamentais. Um bom exemplo da complexidade associada ao comportamento pode ser encontrado em interações sociais, nomeadamente interações agonistas entre peixes-zebra (Danio rerio). Estas interações são bem compreendidas e estereotípicas, e existem catálogos a descrever os estados comportamentais associados a cada fase da interação. Isto e a versatilidade genética a que o peixe zebra se encontra associado, tornam esta interação ideal para o nosso estudo. O nosso objetivo principal consiste na tentativa de derivar um conjunto de estados comportamentais diretamente a partir dos dados experimentais obtidos, sendo estes estados definidos para o conjunto, e não individualmente. Fazemos isso sob a assunção de em interações sociais, estados comportamentais dependem dos elementos envolvidos nessa interação (neste caso, são peixes-zebra) e que esta não é completamente descrita, exceto se levar ambos em conta simultaneamente. Os dados são esqueletos tridimensionais dos 2 peixes-zebra num volume. O processo de aquisição desses dados consiste na aquisição de imagens em 3 planos bidimensionais com câmaras de alta definição, e um pipeline de processamento, que combina várias redes neuronais para a identificação de pontos corporais, a atribuição de identidade temporal aos peixes envolvidos, e a interpolação das imagens nos diferentes planos. Este processo permite a conversão de vários vídeos em sinais temporais que podem ser manipulados e processados de forma adequada. Usamos variáveis interpretáveis, no caso, a distância, os alinhamentos de direcção e aceleração, e os ritmos de batimento de cauda. Essas variáveis embora sejam simples, podem dizer bastante informação sobre a natureza do comportamento, sendo úteis numa exploração inicial. Definimos estados comportamentais compostos (colecção de vários comportamentos efectuados pelos peixes ao longo de um determinado período de tempo) e exploramos a dinâmica de uma luta nesta descrição simplificada. O sistema que resulta das variáveis definidas possui 6 dimensões, projectamos esse sistema para um plano bidimensional para melhor análise. Efectua-se um histograma das novas variáveis, e ter uma estimativa da densidade de probabilidade através da convolução do mesmo com uma gaussiana bidimensional. Detecta-se os picos de densidade, que neste sistema podem ser interpretados como estados comportamentais. Com essa descrição é possível gerar uma sequência simbólica que representa a dinâmica da interacção como sendo a transição entre vários estados comportamentais discretos. Constrói-se uma matriz que representa a transição entre os vários estados, e por decomposição espectral pode-se observar o comportamento dos valores próprios em função do número de transições e é possível decompor os estados em vários conjuntos através dos vectores próprios, cuja dinâmica entre eles é representada pelo valor próprio associado. Através da sequência simbólica é possível uma descrição da interacção entre os elementos, tendo inclusive informação sobre a escala temporal associada à dinâmica entre esses estados. Ao associar os clusters aos diferentes estados comportamentais compostos definidos previamente, é possível ver que certos clusters se encontram associados, e apresenta uma certa estrutura, que pode ser representativa da dinâmica real. Também é possível determinar a escala temporal de interações entre diferentes conjuntos de clusters. Foi possível determinar que os comportamentos ocorrem em escalas temporais maiores do que a escala típica para processo de Markov, e a escala temporal mais elevada se encontra associada a transição entre estados associados à agressão entre o par, e estados associados aos períodos entre lutas. Mostramos que é possível obter uma estrutura comportamental da luta entre dois peixes-zebra utilizando as variáveis simples que definimos. Isto é um framework que permite explorar a dinâmica da sua interação em maior detalhe, a utilizar variáveis ou representações mais precisas, que podem não ser interpretáveis

    Embedding Approaches for Relational Data

    Get PDF
    ​Embedding methods for searching latent representations of the data are very important tools for unsupervised and supervised machine learning as well as information visualisation. Over the years, such methods have continually progressed towards the ability to capture and analyse the structure and latent characteristics of larger and more complex data. In this thesis, we examine the problem of developing efficient and reliable embedding methods for revealing, understanding, and exploiting the different aspects of the relational data. We split our work into three pieces, where each deals with a different relational data structure. In the first part, we are handling with the weighted bipartite relational structure. Based on the relational measurements between two groups of heterogeneous objects, our goal is to generate low dimensional representations of these two different types of objects in a unified common space. We propose a novel method that models the embedding of each object type symmetrically to the other type, subject to flexible scale constraints and weighting parameters. The embedding generation relies on an efficient optimisation despatched using matrix decomposition. And we have also proposed a simple way of measuring the conformity between the original object relations and the ones re-estimated from the embeddings, in order to achieve model selection by identifying the optimal model parameters with a simple search procedure. We show that our proposed method achieves consistently better or on-par results on multiple synthetic datasets and real world ones from the text mining domain when compared with existing embedding generation approaches. In the second part of this thesis, we focus on the multi-relational data, where objects are interlinked by various relation types. Embedding approaches are very popular in this field, they typically encode objects and relation types with hidden representations and use the operations between them to compute the positive scalars corresponding to the linkages' likelihood score. In this work, we aim at further improving the existing embedding techniques by taking into account the multiple facets of the different patterns and behaviours of each relation type. To the best of our knowledge, this is the first latent representation model which considers relational representations to be dependent on the objects they relate in this field. The multi-modality of the relation type over different objects is effectively formulated as a projection matrix over the space spanned by the object vectors. Two large benchmark knowledge bases are used to evaluate the performance with respect to the link prediction task. And a new test data partition scheme is proposed to offer a better understanding of the behaviour of a link prediction model. In the last part of this thesis, a much more complex relational structure is considered. In particular, we aim at developing novel embedding methods for jointly modelling the linkage structure and objects' attributes. Traditionally, link prediction task is carried out on either the linkage structure or the objects' attributes, which does not aware of their semantic connections and is insufficient for handling the complex link prediction task. Thus, our goal in this work is to build a reliable model that can fuse both sources of information to improve the link prediction problem. The key idea of our approach is to encode both the linkage validities and the nodes neighbourhood information into embedding-based conditional probabilities. Another important aspect of our proposed algorithm is that we utilise a margin-based contrastive training process for encoding the linkage structure, which relies on a more appropriate assumption and dramatically reduces the number of training links. In the experiments, our proposed method indeed improves the link prediction performance on three citation/hyperlink datasets, when compared with those methods relying on only the nodes' attributes or the linkage structure, and it also achieves much better performances compared with the state-of-arts

    On deep generative modelling methods for protein-protein interaction

    Get PDF
    Proteins form the basis for almost all biological processes, identifying the interactions that proteins have with themselves, the environment, and each other are critical to understanding their biological function in an organism, and thus the impact of drugs designed to affect them. Consequently a significant body of research and development focuses on methods to analyse and predict protein structure and interactions. Due to the breadth of possible interactions and the complexity of structures, \textit{in sillico} methods are used to propose models of both interaction and structure that can then be verified experimentally. However the computational complexity of protein interaction means that full physical simulation of these processes requires exceptional computational resources and is often infeasible. Recent advances in deep generative modelling have shown promise in correctly capturing complex conditional distributions. These models derive their basic principles from statistical mechanics and thermodynamic modelling. While the learned functions of these methods are not guaranteed to be physically accurate, they result in a similar sampling process to that suggested by the thermodynamic principles of protein folding and interaction. However, limited research has been applied to extending these models to work over the space of 3D rotation, limiting their applicability to protein models. In this thesis we develop an accelerated sampling strategy for faster sampling of potential docking locations, we then address the rotational diffusion limitation by extending diffusion models to the space of SO(3)SO(3) and finally present a framework for the use of this rotational diffusion model to rigid docking of proteins

    Tensor-variate machine learning on graphs

    Get PDF
    Traditional machine learning algorithms are facing significant challenges as the world enters the era of big data, with a dramatic expansion in volume and range of applications and an increase in the variety of data sources. The large- and multi-dimensional nature of data often increases the computational costs associated with their processing and raises the risks of model over-fitting - a phenomenon known as the curse of dimensionality. To this end, tensors have become a subject of great interest in the data analytics community, owing to their remarkable ability to super-compress high-dimensional data into a low-rank format, while retaining the original data structure and interpretability. This leads to a significant reduction in computational costs, from an exponential complexity to a linear one in the data dimensions. An additional challenge when processing modern big data is that they often reside on irregular domains and exhibit relational structures, which violates the regular grid assumptions of traditional machine learning models. To this end, there has been an increasing amount of research in generalizing traditional learning algorithms to graph data. This allows for the processing of graph signals while accounting for the underlying relational structure, such as user interactions in social networks, vehicle flows in traffic networks, transactions in supply chains, chemical bonds in proteins, and trading data in financial networks, to name a few. Although promising results have been achieved in these fields, there is a void in literature when it comes to the conjoint treatment of tensors and graphs for data analytics. Solutions in this area are increasingly urgent, as modern big data is both large-dimensional and irregular in structure. To this end, the goal of this thesis is to explore machine learning methods that can fully exploit the advantages of both tensors and graphs. In particular, the following approaches are introduced: (i) Graph-regularized tensor regression framework for modelling high-dimensional data while accounting for the underlying graph structure; (ii) Tensor-algebraic approach for computing efficient convolution on graphs; (iii) Graph tensor network framework for designing neural learning systems which is both general enough to describe most existing neural network architectures and flexible enough to model large-dimensional data on any and many irregular domains. The considered frameworks were employed in several real-world applications, including air quality forecasting, protein classification, and financial modelling. Experimental results validate the advantages of the proposed methods, which achieved better or comparable performance against state-of-the-art models. Additionally, these methods benefit from increased interpretability and reduced computational costs, which are crucial for tackling the challenges posed by the era of big data.Open Acces

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Get PDF
    Despite substantial increases in R&D spending within the pharmaceutical industry, denovo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform, LigNFam enables users to interactively explore similarity relationships between ‘drug like’ molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state-of-the-art over benchmark datasets. The models have the ability to predict offtarget interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster

    Current overview and way forward for the use of machine learning in the field of petroleum gas hydrates

    Get PDF
    Gas hydrates represent one of the main flow assurance challenges in the oil and gas industry as they can lead to plugging of pipelines and process equipment. In this paper we present a literature study performed to evaluate the current state of the use of machine learning methods within the field of gas hydrates with specific focus on the oil chemistry. A common analysis technique for crude oils is Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FT-ICR MS) which could be a good approach to achieving a better understanding of the chemical composition of hydrates, and the use of machine learning in the field of FT-ICR MS was therefore also examined. Several machine learning methods were identified as promising, their use in the literature was reviewed and a text analysis study was performed to identify the main topics within the publications. The literature search revealed that the publications on the combination of FT-ICR MS, machine learning and gas hydrates is limited to one. Most of the work on gas hydrates is related to thermodynamics, while FT-ICR MS is mostly used for chemical analysis of oils. However, with the combination of FT-ICR MS and machine learning to evaluate samples related to gas hydrates, it could be possible to improve the understanding of the composition of hydrates and thereby identify hydrate active compounds responsible for the differences between oils forming plugging hydrates and oils forming transportable hydrates.Current overview and way forward for the use of machine learning in the field of petroleum gas hydratespublishedVersio

    Investigating Information Flows in Spiking Neural Networks With High Fidelity

    Get PDF
    The brains of many organisms are capable of a wide variety of complex computations. This capability must be undergirded by a more general purpose computational capacity. The exact nature of this capacity, how it is distributed across the brains of organisms and how it arises throughout the course of development is an open topic of scientific investigation. Individual neurons are widely considered to be the fundamental computational units of brains. Moreover, the finest scale at which large scale recordings of brain activity can be performed is the spiking activity of neurons and our ability to perform these recordings over large numbers of neurons and with fine spatial resolution is increasing rapidly. This makes the spiking activity of individual neurons a highly attractive data modality on which to study neural computation. The framework of information dynamics has proven to be a successful approach towards interrogating the capacity for general purpose computation. It does this by revealing the atomic information processing operations of information storage, transfer and modification. Unfortunately, the study of information flows and other information processing operations from the spiking activity of neurons has been severely hindered by the lack of effective tools for estimating these quantities on this data modality. This thesis remedies this situation by presenting an estimator for information flows, as measured by Transfer Entropy (TE), that operates in continuous time on event-based data such as spike trains. Unlike the previous approach to the estimation of this quantity, which discretised the process into time bins, this estimator operates on the raw inter-spike intervals. It is demonstrated to be far superior to the previous discrete-time approach in terms of consistency, rate of convergence and bias. Most importantly, unlike the discrete-time approach, which requires a hard tradeoff between capturing fine temporal precision or history effects occurring over reasonable time intervals, this estimator can capture history effects occurring over relatively large intervals without any loss of temporal precision. This estimator is applied to developing dissociated cultures of cortical rat neurons, therefore providing the first high-fidelity study of information flows on spiking data. It is found that the spatial structure of the flows locks in to a significant extent. at the point of their emergence and that certain nodes occupy specialised computational roles as either transmitters, receivers or mediators of information flow. Moreover, these roles are also found to lock in early. In order to fully understand the structure of neural information flows, however, we are required to go beyond pairwise interactions, and indeed multivariate information flows have become an important tool in the inference of effective networks from neuroscience data. These are directed networks where each node is connected to a minimal set of sources which maximally reduce the uncertainty in its present state. However, the application of multivariate information flows to the inference of effective networks from spiking data has been hampered by the above-mentioned issues with preexisting estimation techniques. Here, a greedy algorithm which iteratively builds a set of parents for each target node using multivariate transfer entropies, and which has already been well validated in the context of traditional discretely sampled time series, is adapted to use in conjunction with the newly developed estimator for event-based data. The combination of the greedy algorithm and continuous-time estimator is then validated on simulated examples for which the ground truth is known. The new capabilities in the estimation of information flows and the inference of effective networks on event-based data presented in this work represent a very substantial step forward in our ability to perform these analyses on the ever growing set of high resolution, large scale recordings of interacting neurons. As such, this work promises to enable substantial quantitative insights in the future regarding how neurons interact, how they process information, and how this changes under different conditions such as disease
    • …
    corecore