
    Discovering study-specific gene regulatory networks

    This article has been made available through the Brunel Open Access Publishing Fund. Microarrays are commonly used in biology because they can measure the expression of thousands of genes simultaneously under different conditions. Because microarray datasets typically contain a large number of variables but far fewer samples, scalable network analysis techniques are often employed. In particular, consensus approaches that combine multiple microarray studies have recently been used to find more robust networks. The purpose of this paper, however, is to combine multiple microarray studies to automatically identify subnetworks that are distinctive to specific experimental conditions rather than common to them all. To better understand key regulatory mechanisms and how they change under different conditions, we derive unique networks from multiple independent networks built using glasso (the graphical lasso), which goes beyond standard correlations. This involves calculating cluster prediction accuracies to detect the most predictive genes for a specific set of conditions. We differentiate between accuracies calculated using cross-validation within a selected cluster of studies (the intra prediction accuracy) and those calculated on a set of independent studies belonging to different study clusters (the inter prediction accuracy). Finally, we compare our method's results with related state-of-the-art techniques. We explore how the proposed pipeline performs on both synthetic data and real data (wheat and Fusarium). Our results show that subnetworks specific to subsets of studies can be identified reliably, and that these networks reflect key mechanisms that are fundamental to the experimental conditions in each of those subsets.
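To make the idea of a study-specific subnetwork concrete at the level of edges, the sketch below assumes that each study's glasso network has already been reduced to a set of undirected gene-gene edges and that the studies have already been grouped into clusters. It keeps edges that appear in at least a fraction `min_in` of the target cluster's networks and in at most a fraction `max_out` of the networks of every other cluster. The thresholds, function names, cluster names, and gene labels are all hypothetical, and this frequency filter is a simplified stand-in: the paper itself selects genes through intra and inter prediction accuracies, which are not implemented here.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Edge = FrozenSet[str]  # an undirected edge between two genes


def make_edge(a: str, b: str) -> Edge:
    """Build an undirected edge; frozenset makes (a, b) and (b, a) equal."""
    return frozenset((a, b))


def edge_frequency(networks: List[Set[Edge]]) -> Dict[Edge, float]:
    """Fraction of networks (one per study) in which each edge appears."""
    counts = Counter(edge for net in networks for edge in net)
    return {edge: c / len(networks) for edge, c in counts.items()}


def cluster_specific_edges(
    clusters: Dict[str, List[Set[Edge]]],
    target: str,
    min_in: float = 0.75,   # hypothetical threshold: frequent inside the target cluster
    max_out: float = 0.25,  # hypothetical threshold: rare in every other cluster
) -> Set[Edge]:
    """Edges frequent within `target` but rare in all other study clusters."""
    inside = edge_frequency(clusters[target])
    others = [edge_frequency(nets) for name, nets in clusters.items() if name != target]
    return {
        edge
        for edge, freq in inside.items()
        if freq >= min_in and all(o.get(edge, 0.0) <= max_out for o in others)
    }


# Hypothetical toy input: two study clusters, two studies (networks) each.
clusters = {
    "infected": [
        {make_edge("g1", "g2"), make_edge("g2", "g3")},
        {make_edge("g1", "g2"), make_edge("g3", "g4")},
    ],
    "control": [
        {make_edge("g2", "g3")},
        {make_edge("g2", "g3"), make_edge("g3", "g4")},
    ],
}
print(cluster_specific_edges(clusters, "infected"))  # only the g1-g2 edge qualifies
```

Requiring an edge to be rare in every other cluster, rather than rare on average, errs toward fewer but more clearly condition-specific edges.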

    Profile Likelihood Biclustering

    Biclustering, the process of simultaneously clustering the rows and columns of a data matrix, is a popular and effective tool for finding structure in a high-dimensional dataset. Many biclustering procedures appear to work well in practice, but most do not have associated consistency guarantees. To address this shortcoming, we propose a new biclustering procedure based on profile likelihood. The procedure applies to a broad range of data modalities, including binary, count, and continuous observations. We prove that the procedure recovers the true row and column classes when the dimensions of the data matrix tend to infinity, even if the functional form of the data distribution is misspecified. The procedure requires a combinatorial search, which can be expensive in practice. Rather than performing this search directly, we propose a new heuristic optimization procedure based on the Kernighan-Lin heuristic, which has favorable computational properties and performs well in simulations. We demonstrate our procedure with applications to congressional voting records and microarray analysis. Comment: 40 pages, 11 figures; R package in development at https://github.com/patperry/biclustp
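For continuous data, it helps to see what a profile-likelihood objective rewards. Under a Gaussian working model in which every (row class, column class) block has its own mean and all entries share one variance, profiling out the block means and the variance leaves an objective that is a decreasing function of the within-block residual sum of squares. The sketch below covers only that Gaussian case (the paper also handles binary and count data) and, instead of the authors' Kernighan-Lin based procedure, uses a plain greedy loop that moves one row or column at a time and keeps the move only if the residual sum of squares drops. The function names `within_block_sse` and `greedy_biclustering` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def within_block_sse(x: Matrix, rows: List[int], cols: List[int], k: int, l: int) -> float:
    """Residual sum of squares after fitting one mean per (row class, column class) block."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    total_sq = 0.0
    for i, row in enumerate(x):
        for j, value in enumerate(row):
            sums[rows[i]][cols[j]] += value
            counts[rows[i]][cols[j]] += 1
            total_sq += value * value
    # SSE = sum of x^2 minus, over non-empty blocks, (block sum)^2 / (block size)
    explained = sum(
        s * s / n
        for sum_row, count_row in zip(sums, counts)
        for s, n in zip(sum_row, count_row)
        if n > 0
    )
    return total_sq - explained


def greedy_biclustering(
    x: Matrix, rows: List[int], cols: List[int], k: int, l: int, max_sweeps: int = 20
) -> Tuple[List[int], List[int], float]:
    """Hypothetical simplified search: relabel single rows/columns while the SSE strictly drops."""
    rows, cols = list(rows), list(cols)  # work on copies of the initial labels
    best = within_block_sse(x, rows, cols, k, l)
    for _ in range(max_sweeps):
        improved = False
        for labels, n_classes in ((rows, k), (cols, l)):
            for idx in range(len(labels)):
                for cand in range(n_classes):
                    old = labels[idx]
                    if cand == old:
                        continue
                    labels[idx] = cand
                    score = within_block_sse(x, rows, cols, k, l)
                    if score < best - 1e-12:
                        best, improved = score, True  # keep the improving move
                    else:
                        labels[idx] = old  # undo the move
        if not improved:
            break
    return rows, cols, best


# Toy matrix with a planted 2 x 2 block structure; row 3 starts in the wrong class.
x = [
    [5.0, 5.0, 0.0, 0.0],
    [5.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 5.0],
    [0.0, 0.0, 5.0, 5.0],
]
rows, cols, sse = greedy_biclustering(x, rows=[0, 0, 1, 0], cols=[0, 0, 1, 1], k=2, l=2)
print(rows, cols, sse)  # → [0, 0, 1, 1] [0, 0, 1, 1] 0.0
```

Each candidate move recomputes the objective from scratch at O(mn) cost; a practical implementation would update block sums incrementally. Because only strictly improving moves are accepted, the loop can stall at a local optimum, which is the weakness Kernighan-Lin style searches address by tentatively accepting worsening moves and keeping the best prefix of a move sequence.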

    Biclustering fMRI time series

    Master's thesis, Data Science, Universidade de Lisboa, Faculdade de Ciências, 2020. The effectiveness of biclustering, the simultaneous clustering of both rows and columns of a data matrix, has been shown primarily in gene expression data analysis, and several researchers recognize its potential in other research areas. Nevertheless, the last two decades have seen many biclustering algorithms targeting gene expression data analysis and a lack of consistent studies exploring the capabilities of biclustering outside this traditional application domain. Following hints that suggest potential for biclustering on spatiotemporal data, particularly in neuroscience, this thesis explores biclustering's capacity to extract knowledge from fMRI time series.
This thesis proposes a methodology to evaluate the feasibility of biclustering algorithms for studying the fMRI signal, considering both synthetic and real-world fMRI datasets. In the absence of ground truth against which to compare bicluster solutions, we used internal evaluation metrics. A comparison of search strategies showed the superiority of exhaustive approaches, which obtained the most homogeneous biclusters. However, their high computational cost is still a challenge, and further work is needed for the efficient use of biclustering in fMRI data analysis. We propose a new methodology for analyzing biclusters that applies pattern mining algorithms to determine the most frequent patterns present in the generated biclustering solutions. A bicluster is nothing more than a hyperlink in a hypergraph, so extracting frequent patterns from a biclustering solution amounts to extracting the most significant hyperlinks. As a first step, this makes it possible to understand relationships between brain regions and to draw temporal profiles that traditional correlation studies cannot detect. Additionally, the process of generating biclusters filters out uninteresting links, which could allow hypergraphs to be generated efficiently. The final question is what we can do with this knowledge. Understanding the relationships between brain regions is the central objective of neuroscience, and understanding the connections between brain regions across multiple subjects makes it possible to draw profiles. To this end, we propose a methodology to extend biclusters to three-dimensional data and perform triclustering. Additionally, understanding the links between brain regions makes it possible to identify diseases such as schizophrenia, dementia, or Alzheimer's disease. This work points out avenues for the use of biclustering in spatiotemporal data analysis, in particular in neuroscience applications.
The proposed evaluation methodology showed evidence of biclustering's effectiveness in finding local patterns in fMRI data, although further work on scalability is needed to promote its application in real scenarios.
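The thesis treats each bicluster as a hyperlink in a hypergraph and mines the most frequent patterns across a biclustering solution. A minimal sketch of that idea, assuming each bicluster has already been reduced to the set of brain regions it spans (the labels `ROI_1` to `ROI_4` and the function name `frequent_region_sets` are hypothetical): it counts how many biclusters contain each subset of regions of a given size and keeps the subsets that reach a support threshold. This brute-force enumeration stands in for a dedicated frequent-itemset miner; it is not the thesis's specific pattern mining algorithm.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_region_sets(
    biclusters: List[Set[str]], size: int, min_support: int
) -> Dict[FrozenSet[str], int]:
    """Return every `size`-subset of regions contained in at least `min_support` biclusters.

    Each bicluster is treated as one hyperedge (a set of brain regions), so the
    support of a subset is the number of hyperedges that contain it.
    """
    counts: Counter = Counter()
    for regions in biclusters:
        for combo in combinations(sorted(regions), size):
            counts[frozenset(combo)] += 1
    return {items: c for items, c in counts.items() if c >= min_support}


# Hypothetical biclustering solution: each set lists the regions one bicluster spans.
biclusters = [
    {"ROI_1", "ROI_2", "ROI_3"},
    {"ROI_1", "ROI_2"},
    {"ROI_2", "ROI_3", "ROI_4"},
    {"ROI_1", "ROI_2", "ROI_4"},
]
pairs = frequent_region_sets(biclusters, size=2, min_support=3)
print(pairs)  # only the pair ROI_1, ROI_2 (support 3) survives
```

Enumerating every subset grows combinatorially with bicluster size; for larger solutions an Apriori or FP-growth implementation, which prunes candidates whose subsets are already infrequent, would be the usual choice.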