
    An evolutionary Monte Carlo algorithm for identifying short adjacent repeats in multiple sequences

    The evolutionary Monte Carlo (EMC) algorithm is an effective and powerful method for sampling from complicated distributions. The short adjacent repeats identification problem (SARIP), i.e., searching for common sequence patterns in multiple DNA sequences, is considered one of the key challenges in bioinformatics. A recently proposed Markov chain Monte Carlo (MCMC) algorithm has demonstrated its effectiveness in solving SARIP. However, its high computation time and susceptibility to local optima hinder its wide application. In this paper, we apply EMC to parallelize the MCMC algorithm for SARIP. Our proposed EMC scheme is implemented on a parallel platform, and simulation results show that, compared with the conventional MCMC algorithm, EMC both improves the quality of the final solution and reduces the computation time. ©2010 IEEE. The 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Hong Kong, China, 18-21 December 2010. In Proceedings of BIBM, 2010, p. 643-64
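To make the EMC idea concrete, here is a minimal sketch on a toy problem, not the authors' SARIP implementation. It assumes bit-string states and a user-supplied `energy` function (lower is better) standing in for the SARIP posterior. The function name `emc`, the temperature ladder, `p_mutation`, and uniform pair selection for crossover are all hypothetical choices. A population of chains, one per temperature, is updated by Metropolis mutation, one-point crossover between two chains, and state exchange between adjacent temperatures.

```python
import math
import random
from typing import Callable, List, Tuple

State = List[int]


def emc(energy: Callable[[State], float], n_bits: int, temps: List[float],
        n_iter: int, seed: int = 0, p_mutation: float = 0.7) -> Tuple[State, float]:
    """Toy evolutionary Monte Carlo over bit strings (hypothetical settings).

    One chain runs at each temperature in `temps` (ascending order assumed);
    `energy` stands in for a negative log-posterior. Requires n_bits >= 2.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in temps]
    energies = [energy(x) for x in pop]
    best_e = min(energies)
    best = pop[energies.index(best_e)][:]

    def accept(log_ratio: float) -> bool:
        # Metropolis rule; the short-circuit avoids overflow in math.exp.
        return log_ratio >= 0 or rng.random() < math.exp(log_ratio)

    for _ in range(n_iter):
        if len(pop) < 2 or rng.random() < p_mutation:
            # Mutation: flip one random bit in every chain, accepted at that chain's temperature.
            for i, t in enumerate(temps):
                y = pop[i][:]
                y[rng.randrange(n_bits)] ^= 1
                e_y = energy(y)
                if accept((energies[i] - e_y) / t):
                    pop[i], energies[i] = y, e_y
        else:
            # Crossover: one-point crossover of two uniformly chosen chains (a symmetric proposal).
            i, j = rng.sample(range(len(pop)), 2)
            k = rng.randrange(1, n_bits)
            yi, yj = pop[i][:k] + pop[j][k:], pop[j][:k] + pop[i][k:]
            ei, ej = energy(yi), energy(yj)
            log_r = (energies[i] - ei) / temps[i] + (energies[j] - ej) / temps[j]
            if accept(log_r):
                pop[i], pop[j], energies[i], energies[j] = yi, yj, ei, ej
        if len(pop) >= 2:
            # Exchange: propose swapping the states of two adjacent temperature levels.
            i = rng.randrange(len(pop) - 1)
            log_r = (energies[i] - energies[i + 1]) * (1 / temps[i] - 1 / temps[i + 1])
            if accept(log_r):
                pop[i], pop[i + 1] = pop[i + 1], pop[i]
                energies[i], energies[i + 1] = energies[i + 1], energies[i]
        # Track the lowest-energy state seen in any chain.
        for x, e in zip(pop, energies):
            if e < best_e:
                best, best_e = x[:], e
    return best, best_e
```

For example, with a Hamming-distance energy to a hidden 12-bit pattern, `emc(hamming, 12, [0.1, 0.3, 1.0, 3.0], 2000)` should recover the pattern or come very close. Because each chain's mutation step is independent of the others, it is the natural unit to distribute across processors, which is the kind of parallelism the abstract refers to.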

    Graph Theoretic and Pearson Correlation-Based Discovery of Network Biomarkers for Cancer

    Two graph theoretic concepts, cliques and bipartite graphs, are explored to identify network biomarkers for cancer at the gene network level. The rationale is that a group of genes works together, forming a cluster or clique-like structure, to initiate a cancer. After initiation, the disease signal passes to the next group of genes, related to the second stage of the cancer, which can be represented as a bipartite graph. In other words, bipartite graphs represent the cross-talk among genes between two disease stages. To test this hypothesis, gene expression values for three cancers are analyzed: breast invasive carcinoma (BRCA), colorectal adenocarcinoma (COAD), and glioblastoma multiforme (GBM). First, a co-expression gene network is generated from highly correlated gene pairs with a Pearson correlation coefficient ≥ 0.9. Second, clique structures of all sizes are isolated from the co-expression network. Then, by combining these cliques, three different biomarker modules are developed: maximal clique-like modules, 2-clique-1-bipartite modules, and 3-clique-2-bipartite modules. The biomarker genes discovered from these network modules are validated as essential genes for causing cancer in terms of network properties and survival analysis. This list of biomarker genes will help biologists design wet-lab experiments to further elucidate the complex mechanisms of cancer.
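The first two steps (thresholded Pearson co-expression network, then clique extraction) can be sketched as follows. This is an illustrative sketch only: the gene names and expression values are made up, and Bron–Kerbosch with pivoting is one standard way to enumerate maximal cliques, assumed here rather than taken from the paper. The functions `pearson`, `coexpression_graph`, and `maximal_cliques` are hypothetical helpers.

```python
import math
from itertools import combinations
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant (undefined case)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpression_graph(expr: Dict[str, List[float]], threshold: float = 0.9) -> Dict[str, Set[str]]:
    """Connect every gene pair whose Pearson coefficient is >= threshold."""
    adj: Dict[str, Set[str]] = {g: set() for g in expr}
    for g1, g2 in combinations(expr, 2):
        if pearson(expr[g1], expr[g2]) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate all maximal cliques (including singletons) via Bron-Kerbosch with pivoting."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques
```

On four toy genes where A, B, and C rise together and D falls, the thresholded network contains the clique {A, B, C} plus the isolated gene D. Building the bipartite links between cliques would be a further step not shown here.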

    Predicting the hosts of prokaryotic viruses using GCN-based semi-supervised learning

    Background: Prokaryotic viruses, which infect bacteria and archaea, are the most abundant and diverse biological entities in the biosphere. To understand their regulatory roles in various ecosystems and to harness the potential of bacteriophages for use in therapy, more knowledge of virus-host relationships is required. High-throughput sequencing and its application to the microbiome have offered new opportunities for computational approaches to predicting which hosts particular viruses can infect. However, computational host prediction faces two main challenges. First, the empirically known virus-host relationships are very limited. Second, although sequence similarity between viruses and their prokaryotic hosts has been used as a major feature for host prediction, the alignment is either missing or ambiguous in many cases. Thus, there is still a need to improve the accuracy of host prediction. Results: In this work, we present a semi-supervised learning model, named HostG, to conduct host prediction for novel viruses. We construct a knowledge graph by utilizing both virus-virus protein similarity and virus-host DNA sequence similarity. Then a graph convolutional network (GCN) is adopted to exploit viruses both with and without known hosts during training, enhancing its learning ability. During GCN training, we minimize the expected calibration error (ECE) so that the confidence scores of the predictions are reliable. We tested HostG on both simulated and real sequencing data and compared its performance with other state-of-the-art methods specifically designed for virus host classification (VHM-net, WIsH, PHP, HoPhage, RaFAH, vHULK, and VPF-Class). Conclusion: HostG outperforms other popular methods, demonstrating the efficacy of a GCN-based semi-supervised learning approach. A particular advantage of HostG is its ability to predict hosts from new taxa. Comment: 16 pages, 14 figures
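The abstract names two standard ingredients: a GCN layer and the expected calibration error. The sketch below shows textbook versions of both, not HostG's actual architecture or loss. It assumes the common propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) and the usual equal-width binned ECE. The helper names are hypothetical. Binned ECE as written is a non-differentiable evaluation metric, so a model that minimizes calibration error during training would need a differentiable surrogate not shown here.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (a: n x k, b: k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2; self-loops keep every degree >= 1."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[d_inv_sqrt[i] * a_hat[i][j] * d_inv_sqrt[j] for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: aggregate neighbour features, project with W, apply ReLU."""
    z = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, v) for v in row] for row in z]


def expected_calibration_error(confidences: Sequence[float], correct: Sequence[int],
                               n_bins: int = 10) -> float:
    """Binned ECE: sum over bins of (bin size / n) * |accuracy - mean confidence|."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; a confidence of exactly 0 goes into the first bin.
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece
```

In a host-prediction graph, `adj` would encode the virus-virus and virus-host similarity edges and `h` the node features. With two linked nodes whose features are 1.0 and 3.0, one layer with identity weights averages them to 2.0 for both nodes.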

    Annual Report, 2017-2018


    RF sensing and processing methods for noninvasive health monitoring

    Vulnerable populations include groups of people with a higher risk of poor health as a result of limitations due to illness or disability. The health issues of vulnerable populations fall into three categories: physical, psychological, and social. People with physical issues include high-risk mothers and infants, older adults and others with chronic illnesses, and people with disabilities. Psychological issues include chronic mental conditions, such as bipolar disorder, major depression, and hyperactivity disorder, as well as substance abuse and suicidality. Social issues affect, for example, people living in abusive families and people who are homeless. This dissertation concentrates on methods for helping two vulnerable groups, namely frail older adults and psychiatric hospital patients, by monitoring their activity level, respiration rate, sleep quality, and restless time in bed. In the first part of our work, we investigate a contactless monitoring system for psychiatric patients in a naturalistic hospital setting that tracks their motion in bed, estimates their breathing rate during peaceful sleeping periods, and can be used to estimate a patient's restless time and sleep quality. Specifically, the system uses a Vayyar radar with a carrier frequency of 6.014 GHz to capture all reflections of the FMCW (frequency-modulated continuous wave) signal. The Vayyar radar was installed in a psychiatric center and captured over 135 hours of data from 7 patients across 12 nights. A depth camera and a thermal camera were also installed and are used as the ground truth. The goal is to classify in-bed and out-of-bed states, quantify restlessness in bed, and determine the breathing rate while patients are lying in bed.
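The abstract does not say how the breathing rate is computed. One common approach is to pick the strongest spectral peak of a chest-displacement signal within a plausible breathing band. The sketch below illustrates that approach and is not the dissertation's method. It assumes a real-valued displacement signal (for example, derived from radar phase) sampled at `fs` Hz, and uses a plain DFT for clarity. The 0.1 to 0.5 Hz band (6 to 30 breaths per minute) and the name `respiration_rate_bpm` are hypothetical.

```python
import math
from typing import Sequence


def respiration_rate_bpm(signal: Sequence[float], fs: float,
                         f_min: float = 0.1, f_max: float = 0.5) -> float:
    """Breaths per minute from the strongest DFT bin in [f_min, f_max] Hz.

    Returns 0.0 if no DFT bin falls inside the band (signal too short).
    Frequency resolution is fs / len(signal), so longer windows give finer estimates.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC offset (static chest position)
    best_f, best_p = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if f < f_min or f > f_max:
            continue
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        power = re * re + im * im
        if power > best_p:
            best_f, best_p = f, power
    return best_f * 60.0
```

For a 60 s window sampled at 10 Hz, a 0.25 Hz breathing component plus a weaker 1.2 Hz heartbeat-like component yields 15 breaths per minute. The heartbeat term falls outside the band and is ignored.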
We simulated the psychiatric hospital setup in the lab, using a respiration belt for ground truth, and tested the system with the body postures observed in the psychiatric hospital. We estimated the respiration rate for different sleep postures, with the aim of investigating a contactless monitoring system that can estimate patients' breathing rate in typical sleeping postures and locate the torso area when patients adopt other postures, such as reading in bed or lying reversed on the bed. In the second part of our work, we investigate two methods for learning the room structure from radio wave reflections for longitudinal health monitoring of older adults in a naturalistic home setting. The goal is to use these data as part of a monitoring system that can be easily installed in a home with minimal configuration, in order to detect very early signs of illness and functional decline. Two studies are conducted using RF (radio frequency) sensing. The first method learns the structure from RF clutter patterns and uses the beat frequency of the maximum peak in each chirp to calculate the wall position. The second method learns the room structure from active movement patterns and uses the open space between clusters of active movement to estimate possible wall locations. Comparing the results of the two methods yields a more robust wall location estimate. In addition, a background filter is designed based on the wall position, and the activity level of people in different rooms is estimated by applying a fuzzy rule system to the RF motion data.
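The first wall-finding method converts the beat frequency of each chirp's strongest peak into a distance. For a linear FMCW chirp with bandwidth B swept over duration T, the slope is S = B / T, and a reflector at range R produces a beat frequency f_b = 2RS / c, so R = c f_b / (2S). The sketch below applies that relation. It assumes real-valued beat samples of one chirp sampled at `fs` Hz. The bandwidth and chirp duration are hypothetical parameters, since the abstract gives only the 6.014 GHz carrier. The function names are illustrative.

```python
import math
from typing import Sequence

C = 299_792_458.0  # speed of light in m/s


def beat_to_range(f_beat: float, bandwidth_hz: float, chirp_s: float) -> float:
    """Range in metres for a beat frequency, using R = c * f_b / (2 * slope)."""
    slope = bandwidth_hz / chirp_s  # chirp slope in Hz per second
    return C * f_beat / (2.0 * slope)


def wall_range_from_chirp(beat_samples: Sequence[float], fs: float,
                          bandwidth_hz: float, chirp_s: float) -> float:
    """Range of the strongest reflector: take the maximum DFT peak, then convert it to metres."""
    n = len(beat_samples)
    mean = sum(beat_samples) / n
    x = [s - mean for s in beat_samples]
    best_k, best_p = 0, -1.0
    for k in range(1, n // 2):  # skip DC and the Nyquist bin
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        power = re * re + im * im
        if power > best_p:
            best_k, best_p = k, power
    return beat_to_range(best_k * fs / n, bandwidth_hz, chirp_s)
```

With a hypothetical 1 GHz sweep over 100 µs (slope 10^13 Hz/s), a wall 3 m away gives a beat frequency of about 200 kHz. In the clutter-based method, repeating this over many chirps and keeping the stable peak would give the wall estimate. The range resolution c / (2B) is about 15 cm for a 1 GHz sweep.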