675 research outputs found

    Active transitivity clustering of large-scale biomedical datasets

    Get PDF
    Clustering is a popular computational approach for partitioning data sets into groups of objects that share common traits. Due to recent advances in wet-lab technology, the amount of available biological data grows exponentially and increasingly poses problems of computational complexity for current clustering approaches. In this thesis, we introduce two novel approaches, TransClustMV and ActiveTransClust, that enable the handling of large-scale datasets by drastically reducing the amount of required information through the exploitation of missing values. Furthermore, there exists a plethora of different clustering tools and standards, making it very difficult for researchers to choose the correct method for a given problem. To clarify this multifarious field, we developed ClustEval, which streamlines the clustering process and enables practitioners to conduct large-scale cluster analyses in a standardized and bias-free manner. We conclude the thesis by demonstrating the power of clustering tools, and the need for the previously developed methods, through real-world analyses. We transferred the regulatory network of E. coli K-12 to pathogenic EHEC organisms based on evolutionary conservation, thereby avoiding tedious and potentially dangerous wet-lab experiments. In another example, we identify pathogenicity-specific core genomes of actinobacteria in order to identify potential drug targets.
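The core idea behind transitivity clustering, namely that pairwise similarities above a user-defined threshold should ideally partition objects into disjoint cliques, can be illustrated with a toy greedy merge heuristic. This is an illustrative sketch only, not the TransClust or TransClustMV implementation; the function and data layout are our own assumptions:

```python
# Toy sketch of the transitivity-clustering idea: merge groups whose
# average pairwise similarity exceeds a threshold, so that the resulting
# clusters approximate disjoint cliques in the thresholded similarity graph.
# Missing pairs (as exploited by TransClustMV) default to similarity 0.

def transitivity_cluster(sim, threshold):
    """sim: dict mapping frozenset({a, b}) -> similarity score."""
    objects = set()
    for pair in sim:
        objects |= pair
    clusters = [{o} for o in objects]  # start with singleton clusters

    def avg_sim(c1, c2):
        scores = [sim.get(frozenset({a, b}), 0.0) for a in c1 for b in c2]
        return sum(scores) / len(scores)

    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if avg_sim(clusters[i], clusters[j]) >= threshold:
                    clusters[i] |= clusters.pop(j)  # merge j into i
                    merged = True
                    break
            if merged:
                break
    return clusters
```

With a similarity matrix containing two tight groups, the heuristic recovers them as two clusters regardless of merge order.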

    Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data

    Full text link
    Abstract Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analyzing of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy, and its hallmark will be ‘team science’.
    http://deepblue.lib.umich.edu/bitstream/2027.42/134522/1/13742_2016_Article_117.pd

    Reliable transfer of transcriptional gene regulatory networks between taxonomically related organisms

    Get PDF
    Baumbach J, Rahmann S, Tauch A. Reliable transfer of transcriptional gene regulatory networks between taxonomically related organisms. BMC Systems Biology. 2009;3(1):8.
    Background: Transcriptional regulation of gene activity is essential for any living organism. Transcription factors recognize specific binding sites within the DNA to regulate the expression of particular target genes. The genome-scale reconstruction of the emerging regulatory networks is important for biotechnology and human medicine but cost-intensive, time-consuming, and infeasible to perform for every species separately. By using bioinformatics methods, one can partially transfer networks from well-studied model organisms to closely related species. However, the prediction quality is limited by the low level of evolutionary conservation of the transcription factor binding sites, even within organisms of the same genus.
    Results: Here we present an integrated bioinformatics workflow that assures the reliability of transferred gene regulatory networks. Our approach combines three methods that can be applied on a large scale: re-assessment of annotated binding sites, subsequent binding site prediction, and homology detection. A gene regulatory interaction is considered to be conserved if (1) the transcription factor, (2) the adjusted binding site, and (3) the target gene are conserved. The power of the approach is demonstrated by transferring gene regulations from the model organism Corynebacterium glutamicum to the human pathogens C. diphtheriae, C. jeikeium, and the biotechnologically relevant C. efficiens. For these three organisms we identified reliable transcriptional regulations for ~40% of the common transcription factors, compared to ~5% for which knowledge was available before.
    Conclusion: Our results suggest that trustworthy genome-scale transfer of gene regulatory networks between organisms is feasible in general but still limited by the level of evolutionary conservation.
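The three conservation criteria above translate naturally into a simple filter over candidate regulations. The sketch below is illustrative only; the ortholog map, the site-conservation test, and all identifiers are hypothetical placeholders rather than the paper's actual pipeline:

```python
# Illustrative filter for the three conservation criteria: a regulation
# (tf, site, target) is transferred only if the transcription factor,
# the (adjusted) binding site, and the target gene are all conserved.
# The identifiers below are hypothetical, not real locus tags.

def transfer_network(interactions, orthologs, site_conserved):
    """interactions: list of (tf, site, target) in the model organism.
    orthologs: dict mapping model-organism genes to target-organism genes.
    site_conserved: predicate testing binding-site conservation."""
    transferred = []
    for tf, site, target in interactions:
        if tf in orthologs and target in orthologs and site_conserved(site):
            transferred.append((orthologs[tf], site, orthologs[target]))
    return transferred
```

A regulation whose transcription factor lacks an ortholog, or whose binding site fails the conservation test, is simply dropped rather than transferred with low confidence.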

    MULTIVARIATE MODELING OF COGNITIVE PERFORMANCE AND CATEGORICAL PERCEPTION FROM NEUROIMAGING DATA

    Get PDF
    State-of-the-art cognitive neuroscience mainly uses hypothesis-driven statistical testing to characterize and model neural disorders and diseases. While such techniques have proven powerful in understanding diseases and disorders, they are inadequate for explaining causal relationships as well as individuality and variation. In this study, we proposed multivariate data-driven approaches for predictive modeling of cognitive events and disorders. We developed network descriptions of both structural and functional connectivity that are critical in multivariate modeling of cognitive performance (i.e., fluency, attention, and working memory) and categorical perception (i.e., emotion, speech perception). We also performed dynamic network analysis on brain connectivity measures to determine the role of different functional areas in relation to categorical perception and cognitive events. Our empirical studies of structural connectivity were performed using Diffusion Tensor Imaging (DTI). The main objective was to discover the role of structural connectivity in selecting clinically interpretable features that are consistent over a large range of model parameters in classifying cognitive performance in relation to Acute Lymphoblastic Leukemia (ALL). The proposed approach substantially improved accuracy (13%-26%) over existing models and also selected a relevant, small subset of features that were verified by domain experts. In summary, the proposed approach produced interpretable models with better generalization.
    Functional connectivity refers to similar patterns of activation in different brain regions regardless of the apparent physical connectedness of the regions. The proposed data-driven approach to source-localized electroencephalogram (EEG) data includes an array of tools such as graph mining, feature selection, and multivariate analysis to determine the functional connectivity in categorical perception. We used the network description to correctly classify listeners' behavioral responses with an accuracy of over 92% on 35 participants. State-of-the-art network descriptions of the human brain assume static connectivity. However, brain networks in relation to perception and cognition are complex and dynamic. Analysis of transient functional networks with spatiotemporal variations to understand cognitive functions remains challenging. One of the critical missing links is the lack of sophisticated methodologies for understanding dynamic neural activity patterns. We proposed a clustering-based complex dynamic network analysis of source-localized EEG data to understand the commonalities and differences in gender-specific emotion processing. We also adopted a Bayesian nonparametric framework for segmenting neural activity into a finite number of microstates. This approach enabled us to find the default network and transient patterns of the underlying neural mechanism in relation to categorical perception. In summary, the multivariate and dynamic network analysis methods developed in this dissertation to analyze structural and functional connectivity will have a far-reaching impact on computational neuroscience, helping to identify meaningful changes in spatiotemporal brain activity.
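A minimal version of building a functional connectivity network from multichannel recordings is a thresholded correlation graph: channels whose signals co-vary strongly are linked. This sketch assumes NumPy and uses plain Pearson correlation in place of the dissertation's source-localized EEG pipeline; it is illustrative only:

```python
import numpy as np

def functional_connectivity(eeg, threshold=0.7):
    """eeg: array of shape (channels, samples).
    Returns a binary adjacency matrix linking channel pairs whose
    absolute Pearson correlation meets the threshold (a crude stand-in
    for more elaborate connectivity estimators)."""
    corr = np.corrcoef(eeg)                      # channel-by-channel correlations
    adj = (np.abs(corr) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)                     # no self-loops
    return adj
```

The resulting adjacency matrix can then be fed into standard graph-mining tools (degree, clustering coefficient, community detection) as features for a downstream classifier.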

    Brain connectivity analysis from EEG signals using stable phase-synchronized states during face perception tasks

    Get PDF
    This is the author accepted manuscript. The final version is available from Elsevier via the DOI in this record.
    The degree of phase synchronization between different electroencephalogram (EEG) channels is known to be a manifestation of the underlying mechanism of information coupling between different brain regions. In this paper, we apply a continuous wavelet transform (CWT) based analysis technique to EEG data, captured during face perception tasks, to explore the temporal evolution of phase synchronization from the onset of a stimulus. Our explorations show that there exists a small set (typically 3-5) of unique synchronized patterns or synchrostates, each of which is stable on the order of milliseconds. In particular, in the beta (ÎČ) band, which has been reported to be associated with visual processing tasks, the number of such stable states has consistently been found to be three. During processing of the stimulus, the switching between these states occurs abruptly, but the switching characteristic follows a well-behaved and repeatable sequence. This is observed in a single-subject analysis as well as a multiple-subject group analysis in adults during face perception. We also show that although these patterns remain topographically similar for the general category of face perception tasks, the sequence of their occurrence and their temporal stability vary markedly between different face perception scenarios (stimuli), indicating different dynamical characteristics of information processing that are stimulus-specific in nature. Subsequently, we translated these stable states into brain complex networks and derived informative network measures for characterizing the degree of segregated processing and information integration in those synchrostates, leading to a new methodology for characterizing information processing in the human brain.
    The proposed methodology of modeling functional brain connectivity through synchrostates may be viewed as a new way of quantitatively characterizing the cognitive ability of the subject, the stimuli, and information integration/segregation capability.
    The work presented in this paper was supported by the FP7 EU funded MICHELANGELO project, Grant Agreement #288241. Website: www.michelangelo-project.eu/
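The synchrostate idea, grouping time points by their pairwise phase-difference topography, can be sketched with plain k-means over circularly embedded phase differences. The following assumes NumPy, uses a simple deterministic initialization, and stands in for the paper's CWT-based pipeline rather than reproducing it:

```python
import numpy as np

def synchrostates(phases, k, iters=20):
    """phases: (channels, samples) instantaneous phases, e.g. extracted
    from a CWT at one frequency band. Clusters samples into k phase-
    topography states (a plain k-means sketch; the published pipeline
    is considerably more elaborate)."""
    ch, n = phases.shape
    i, j = np.triu_indices(ch, 1)
    diff = phases[i] - phases[j]                 # pairwise phase gaps, (pairs, samples)
    # embed angles on the unit circle so that -pi and +pi coincide
    feats = np.concatenate([np.cos(diff), np.sin(diff)]).T  # (samples, 2*pairs)
    # deterministic initialization: centers spread evenly over time
    centers = feats[np.linspace(0, n - 1, k).astype(int)].copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(iters):
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)            # assign each sample to nearest state
        for c in range(k):
            if np.any(labels == c):
                centers[c] = feats[labels == c].mean(axis=0)
    return labels
```

Runs of identical labels correspond to the millisecond-scale stable states described above, and label transitions mark the abrupt switching between them.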
    • 

    corecore