16,835 research outputs found

    Bioinformatics

    This book is divided into research areas relevant to bioinformatics, such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics, and intelligent data analysis. Each section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research presented here.

    Novel modeling of task versus rest brain state predictability using a dynamic time warping spectrum: comparisons and contrasts with other standard measures of brain dynamics

    Dynamic time warping, or DTW, is a powerful and domain-general sequence alignment method for computing a similarity measure. Dynamic programming-based techniques such as DTW are now the backbone and driver of most bioinformatics methods and discoveries. In neuroscience DTW has seen far less use, though this has begun to change. We wanted to explore new ways of applying DTW, not simply as a measure with which to cluster or compare similarity between features but in a conceptually different way. We have used DTW to provide a more interpretable spectral description of the data than standard approaches such as the Fourier and related transforms. The DTW approach and the standard discrete Fourier transform (DFT) are assessed against benchmark measures of neural dynamics. These include EEG microstates, EEG avalanches, the sum squared error (SSE) from a multilayer perceptron (MLP) prediction of the EEG time series, and the simultaneously acquired fMRI BOLD signal. We explored the relationships between these variables of interest in an EEG-fMRI dataset acquired during a standard cognitive task, which allowed us to examine how DTW performs across different task settings. We found that despite strong correlations between DTW and DFT spectra, DTW was a better predictor for almost every measure of brain dynamics. Using these DTW measures, we show that predictability is almost always higher in task than in rest states, which is consistent with other theoretical and empirical findings and provides additional evidence for the utility of the DTW approach.
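    For readers unfamiliar with the alignment method this abstract builds on, the following is a minimal sketch of the classic dynamic-programming DTW distance between two numeric sequences. It is not the DTW spectrum proposed in the paper; the function name `dtw_distance` and the absolute-difference local cost are assumptions chosen for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    Assumption: the local cost is the absolute difference between samples.
    Runs in O(len(a) * len(b)) time and memory.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again to an earlier b sample
                cost[i][j - 1],      # b[j-1] matched again to an earlier a sample
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

    Because one sample may be matched to several samples in the other sequence, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the sequences differ in length.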

    ProtNN: Fast and Accurate Nearest Neighbor Protein Function Prediction based on Graph Embedding in Structural and Topological Space

    Studying the function of proteins is important for understanding the molecular mechanisms of life. The number of publicly available protein structures has grown very large. Still, determining the function of a protein structure remains a difficult, costly, and time-consuming task. The difficulties are often due to the essential role of spatial and topological structures in determining protein functions in living cells. In this paper, we propose ProtNN, a novel approach for protein function prediction. Given an unannotated protein structure and a set of annotated proteins, ProtNN finds the nearest neighbor annotated structures based on protein-graph pairwise similarities. More specifically, given a query protein, ProtNN finds the nearest neighbor reference proteins based on a graph representation model and a pairwise similarity between vector embeddings of the query and reference protein-graphs in structural and topological spaces. ProtNN assigns to the query protein the function with the highest number of votes across the set of k nearest neighbor reference proteins, where k is a user-defined parameter. Experimental evaluation demonstrates that ProtNN accurately classifies several datasets with an extremely fast runtime compared to state-of-the-art approaches. We further show that ProtNN scales up to a whole PDB dataset in a single-process mode with no parallelization, with a runtime gain on the order of thousands compared to state-of-the-art approaches.
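    The voting step described in this abstract can be sketched as a plain k-nearest-neighbor majority vote over embedding vectors. This is an illustrative sketch only: the graph embedding itself is not shown, cosine similarity is an assumed stand-in for the paper's pairwise similarity, and the names `cosine_similarity` and `knn_vote` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero embedding as dissimilar to everything
    return dot / (norm_u * norm_v)


def knn_vote(query, references, k):
    """Return the majority label among the k references most similar to query.

    references is a list of (embedding, label) pairs; k is user-defined.
    """
    if k < 1 or not references:
        raise ValueError("need k >= 1 and at least one reference")
    ranked = sorted(
        references,
        key=lambda ref: cosine_similarity(query, ref[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy embeddings and function labels, for illustration only.
references = [
    ([1.0, 0.0], "A"),
    ([0.9, 0.1], "A"),
    ([0.0, 1.0], "B"),
    ([0.1, 0.9], "B"),
    ([0.8, 0.3], "B"),
]
predicted = knn_vote([1.0, 0.05], references, k=3)
```

    With k = 3 the three most similar references carry the labels A, A and B, so the query is assigned A. Sorting every reference is the simplest way to rank them; a real system operating at PDB scale would likely need an indexed nearest-neighbor search instead.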

    Flexible Time Series Matching for Clinical and Behavioral Data

    Time series data have become widely used by the research community in recent decades, following a massive increase in their availability. This growth has required improvements to existing analysis techniques, which, in the medical domain, would help specialists evaluate their patients' condition. One of the key tasks in time series analysis is pattern recognition (segmentation and classification). Traditional methods typically perform subsequence matching, using a pattern template and a similarity metric to search for similar sequences throughout a time series. However, real-world data is noisy and variable (subject to morphological distortions), which makes exact template-based matching a rudimentary approach. To increase flexibility and generalize pattern-searching tasks across domains, this dissertation proposes two Deep Learning-based frameworks for pattern segmentation and anomaly detection. For pattern segmentation, a Convolution/Deconvolution Neural Network is proposed that learns to distinguish, point by point, desired sub-patterns from background content within a time series. The proposed framework was validated in two use cases: electrocardiogram (ECG) signals and inertial sensor-based human activity (IMU) signals. It outperformed two conventional matching techniques and was able to detect the targeted cycles even in noise-corrupted or extremely distorted signals, without using any reference template or hand-coded similarity scores. For anomaly detection, the proposed unsupervised framework uses the reconstruction ability of Variational Autoencoders and a local similarity score to identify unlabeled abnormalities. The proposal was validated on two public ECG datasets (MIT-BIH Arrhythmia and ECG5000) for cardiac arrhythmia identification. Results indicated competitiveness relative to recent techniques, achieving detection AUC scores of 98.84% (ECG5000) and 93.32% (MIT-BIH Arrhythmia).
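    The traditional baseline mentioned in this abstract, subsequence matching with a pattern template and a similarity metric, can be sketched as a sliding-window search. This is not the dissertation's deep learning method; the z-normalized Euclidean distance and the names `znorm` and `best_match` are assumptions chosen for illustration.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(v - mean) / std for v in window]


def best_match(series, template):
    """Return (start_index, distance) of the window most similar to template.

    Similarity metric (assumed): Euclidean distance between z-normalized
    windows, which ignores differences in offset and amplitude.
    """
    m = len(template)
    if m == 0 or m > len(series):
        raise ValueError("template must be non-empty and no longer than series")
    target = znorm(template)
    best_start, best_dist = -1, float("inf")
    for start in range(len(series) - m + 1):
        window = znorm(series[start:start + m])
        dist = math.sqrt(sum((w - t) ** 2 for w, t in zip(window, target)))
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist


# Hypothetical toy signal: a single peak, searched with a scaled template.
start, dist = best_match([0, 0, 0, 1, 3, 1, 0, 0], [2, 6, 2])
```

    The search locates the peak at index 3 even though the template has twice its amplitude, because both are z-normalized first. A template that is time-stretched or distorted in shape would not match this cleanly, which is the limitation the abstract raises.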

    The SP theory of intelligence: benefits and applications

    This article describes existing and expected benefits of the "SP theory of intelligence", and some potential applications. The theory aims to simplify and integrate ideas across artificial intelligence, mainstream computing, and human perception and cognition, with information compression as a unifying theme. It combines conceptual simplicity with descriptive and explanatory power across several areas of computing and cognition. In the "SP machine" -- an expression of the SP theory which is currently realized in the form of a computer model -- there is potential for an overall simplification of computing systems, including software. The SP theory promises deeper insights and better solutions in several areas of application including, most notably, unsupervised learning, natural language processing, autonomous robots, computer vision, intelligent databases, software engineering, information compression, medical diagnosis and big data. There is also potential in areas such as the semantic web, bioinformatics, structuring of documents, the detection of computer viruses, data fusion, new kinds of computer, and the development of scientific theories. The theory promises seamless integration of structures and functions within and between different areas of application. The potential value, worldwide, of these benefits and applications is at least $190 billion each year. Further development would be facilitated by the creation of a high-parallel, open-source version of the SP machine, available to researchers everywhere. Comment: arXiv admin note: substantial text overlap with arXiv:1212.022