Efficient algorithms for reconstructing gene content by co-evolution
<p>Abstract</p> <p>Background</p> <p>In a previous study we demonstrated that co-evolutionary information can be utilized for improving the accuracy of ancestral gene content reconstruction. To this end, we defined a new computational problem, the Ancestral Co-Evolutionary (ACE) problem, and developed algorithms for solving it.</p> <p>Results</p> <p>In the current paper we generalize our previous study in various ways. First, we describe new efficient computational approaches for solving the ACE problem. The new approaches are based on reductions to classical methods such as linear programming relaxation, quadratic programming, and min-cut. Second, we report new computational hardness results related to the ACE problem, including practical cases where it can be solved in polynomial time.</p> <p>Third, we generalize the ACE problem and demonstrate how our approach can be used for inferring parts of the genomes of <it>non-ancestral</it> organisms. To this end, we describe a heuristic for finding the portion of the genome (the 'dominant set') that can be used to reconstruct the rest of the genome with the lowest error rate. This heuristic utilizes both evolutionary information and co-evolutionary information.</p> <p>We applied these algorithms to a large instance of the ACE problem (95 unicellular organisms, 4,873 protein families, and 10,576 co-evolutionary relations), demonstrating that some of these algorithms can outperform the algorithm used in our previous study. In addition, we show that, based on our approach, a 'dominant set' can be used to reconstruct a major fraction of a genome (up to 79%) with a relatively low error rate (<it>e.g.</it> 0.11). We find that the 'dominant set' tends to include metabolic and regulatory genes with high evolutionary rates, low protein abundance, and few protein-protein interactions.</p> <p>Conclusions</p> <p>The <it>ACE</it> problem can be efficiently extended to infer the genomes of organisms that exist today. 
In addition, it may be solved in polynomial time in many practical cases. Metabolic and regulatory genes were found to be the most important groups of genes for reconstructing the gene content of an organism from other related genomes.</p>
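The abstract names min-cut as one of the reductions but gives no formulation. As a hedged illustration only, suppose each protein family carries a hypothetical cost `cost0[i]` for being inferred absent and `cost1[i]` for being inferred present at one ancestral node (for example, derived from the phylogenetic evidence), and each co-evolving pair `(i, j, w)` pays `w` when the two families receive different labels. That objective is a submodular binary energy, so the textbook s-t graph-cut construction minimizes it exactly. The function `min_cut_labels` and its inputs are assumptions for illustration, not the authors' reduction; max-flow is computed with a plain Edmonds-Karp search to keep the sketch dependency-free.

```python
from collections import deque
from typing import List, Sequence, Tuple

def min_cut_labels(cost0: Sequence[float], cost1: Sequence[float],
                   pairs: Sequence[Tuple[int, int, float]]) -> Tuple[List[int], float]:
    """Minimize sum_i cost(x_i) + sum_(i,j,w) w * [x_i != x_j] over x in {0,1}^n.

    Hypothetical encoding: x_i = 1 means family i is present, 0 means absent.
    Returns (labels, minimum total cost).
    """
    n = len(cost0)
    s, t = n, n + 1  # source side = "present", sink side = "absent"
    cap = {u: {} for u in range(n + 2)}  # residual capacities

    def add_edge(u: int, v: int, c: float) -> None:
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse residual edge

    for i in range(n):
        add_edge(s, i, cost0[i])  # cut (paid) when i ends up absent
        add_edge(i, t, cost1[i])  # cut (paid) when i ends up present
    for i, j, w in pairs:
        add_edge(i, j, w)         # exactly one of these is cut when labels differ
        add_edge(j, i, w)

    def bfs() -> dict:
        """Parent map of nodes reachable from s through positive residual capacity."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    if v == t:
                        return parent
                    queue.append(v)
        return parent

    flow = 0.0
    while True:
        parent = bfs()
        if t not in parent:
            break
        bottleneck, v = float("inf"), t
        while parent[v] is not None:  # find the bottleneck on the augmenting path
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:  # push flow along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    reachable = bfs()  # source side of a minimum cut
    return [1 if i in reachable else 0 for i in range(n)], flow
```

For the toy input `min_cut_labels([5, 0, 0], [0, 1, 2], [(0, 1, 3)])`, the co-evolution penalty pulls family 1 to "present" even though its own evidence mildly favors "absent", giving labels `[1, 1, 0]` at a total cost of 1.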
Discovering local patterns of co-evolution: computational aspects and biological examples
<p>Abstract</p> <p>Background</p> <p>Co-evolution is the process in which two (or more) sets of orthologs exhibit a similar or correlated pattern of evolution. Co-evolution is a powerful way to learn about the functional interdependencies between sets of genes and cellular functions and to predict physical interactions. More generally, it can be used to answer fundamental questions about the evolution of biological systems.</p> <p>Orthologs that exhibit a strong signal of co-evolution in one part of the evolutionary tree may show only a mild signal of co-evolution in other branches of the tree. The major reasons for this phenomenon are noise in the biological input, genes that gain or lose functions, and the fact that some measures of co-evolution relate to rare events such as positive selection. Previous publications in the field dealt with the problem of finding sets of genes that co-evolved along an entire underlying phylogenetic tree, without considering that co-evolution is often local.</p> <p>Results</p> <p>In this work, we describe a new set of biological problems that are related to finding patterns of <it>local</it> co-evolution. We discuss their computational complexity and design algorithms for solving them. These algorithms outperform other bi-clustering methods because they are designed specifically for solving this set of problems.</p> <p>We use our approach to trace the co-evolution of fungal, eukaryotic, and mammalian genes at high resolution across the different parts of the corresponding phylogenetic trees. Specifically, we discover regions in the fungi tree that are enriched with positive selection. 
We show that metabolic genes exhibit a remarkable level of co-evolution and different patterns of co-evolution in various biological datasets.</p> <p>In addition, we find that protein complexes that are related to gene expression exhibit non-homogeneous levels of co-evolution across different parts of the <it>fungi</it> evolutionary line. In the case of mammalian evolution, signaling pathways that are related to <it>neurotransmission</it> exhibit a relatively higher level of co-evolution along the <it>primate</it> subtree.</p> <p>Conclusions</p> <p>We show that finding local patterns of co-evolution is a computationally challenging task, and we offer novel algorithms that allow us to solve this problem, thus opening a new approach for analyzing the evolution of biological systems.</p>
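To make "local" concrete, here is a deliberately simplified sketch, not the authors' algorithm. It assumes genes are represented by hypothetical presence/absence profiles over the leaves of a tree (encoded as nested tuples), scores co-evolution crudely as the fraction of leaves on which two profiles agree, and scans every clade with at least `min_size` leaves for scores reaching `threshold`. The paper's problems search over many genes and clades jointly (a bi-clustering setting); this sketch only scores a single pair.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees (hypothetical encoding).
Tree = Union[str, tuple]

def leaves_of(tree: Tree) -> List[str]:
    """All leaf names under a node."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves_of(child)]

def clades(tree: Tree):
    """Yield the leaf set of every internal node, root included."""
    if isinstance(tree, str):
        return
    yield leaves_of(tree)
    for child in tree:
        yield from clades(child)

def agreement(a: Dict[str, int], b: Dict[str, int], leaves: List[str]) -> float:
    """Crude co-evolution score: fraction of the given leaves where both profiles agree."""
    return sum(a[x] == b[x] for x in leaves) / len(leaves)

def local_coevolution(a: Dict[str, int], b: Dict[str, int], tree: Tree,
                      min_size: int = 3, threshold: float = 0.9) -> List[Tuple[List[str], float]]:
    """Clades with at least min_size leaves whose agreement score reaches threshold."""
    hits = []
    for clade in clades(tree):
        if len(clade) >= min_size:
            score = agreement(a, b, clade)
            if score >= threshold:
                hits.append((sorted(clade), score))
    return hits

# Toy example with made-up profiles (1 = gene present, 0 = absent).
TREE = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
GENE_A = dict(A=1, B=1, C=0, D=0, E=1, F=0, G=1, H=0)
GENE_B = dict(A=1, B=1, C=0, D=0, E=0, F=1, G=0, H=1)

print(agreement(GENE_A, GENE_B, leaves_of(TREE)))  # 0.5 over the whole tree
print(local_coevolution(GENE_A, GENE_B, TREE))     # [(['A', 'B', 'C', 'D'], 1.0)]
```

The two toy genes agree on only half the leaves of the whole tree but perfectly within the clade (A, B, C, D), which is the kind of signal a whole-tree score dilutes.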
HIV Prevention in Care and Treatment Settings: Baseline Risk Behaviors among HIV Patients in Kenya, Namibia, and Tanzania.
HIV care and treatment settings provide an opportunity to reach people living with HIV/AIDS (PLHIV) with prevention messages and services. Population-based surveys in sub-Saharan Africa have identified HIV risk behaviors among PLHIV, yet data are limited regarding the HIV risk behaviors of PLHIV in clinical care. This paper describes the baseline sociodemographic characteristics, HIV transmission risk behaviors, and clinical data of participants in a study evaluating an HIV prevention intervention package for HIV care and treatment clinics in Africa. The study was a longitudinal group-randomized trial in 9 intervention clinics and 9 comparison clinics in Kenya, Namibia, and Tanzania (N = 3538). Most baseline participants were female and married, had less than a primary education, and had been diagnosed with HIV relatively recently. Fifty-two percent of participants had a partner of negative or unknown HIV status, 24% were not using condoms consistently, and 11% reported STI symptoms in the last 6 months. Demographic and HIV transmission risk variables differed by country, indicating the need to consider local context when designing studies and to use caution when generalizing findings across African countries. Baseline data from this study indicate that participants often engaged in HIV transmission risk behaviors, which supports the need for prevention with PLHIV (PwP). TRIAL REGISTRATION: ClinicalTrials.gov NCT01256463
Predictors of linkage to care following community-based HIV counseling and testing in rural Kenya
Despite innovations in HIV counseling and testing (HCT), important gaps remain in understanding linkage to care. We followed a cohort diagnosed with HIV through a community-based HCT campaign that trained persons living with HIV/AIDS (PLHA) as navigators. Individual, interpersonal, and institutional predictors of linkage were assessed using survival analysis of self-reported time to enrollment. Of 483 persons consenting to follow-up, 305 (63.2%) enrolled in HIV care within 3 months. Proportions linking to care were similar across sexes, barring a sub-sample of men aged 18–25 years who were highly unlikely to enroll. Men were more likely to enroll if they had disclosed to their spouse, and women if they had disclosed to family. Women who anticipated violence or relationship breakup were less likely to link to care. Enrolment rates were significantly higher among participants receiving a PLHA visit, suggesting that a navigator approach may improve linkage from community-based HCT campaigns.
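The abstract does not say which survival estimator was used. As a hedged sketch, the Kaplan-Meier product-limit estimator is one standard way to turn self-reported times to enrollment, with participants lost to follow-up treated as censored, into the estimated share linked to care by a given day (1 - S(t)). The function names `kaplan_meier` and `linked_by` and the toy data are hypothetical and are not the study's data.

```python
from typing import List, Sequence, Tuple

def kaplan_meier(times: Sequence[float], enrolled: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate of S(t) = P(not yet enrolled by time t).

    enrolled[i] is True if person i enrolled at times[i], and False if they were
    censored (e.g. lost to follow-up) at that time.
    Returns (t, S(t)) at each distinct enrollment time.
    """
    data = sorted(zip(times, enrolled))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        while i < len(data) and data[i][0] == t:  # group records sharing this time
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            surv *= 1.0 - events / at_risk
            curve.append((t, surv))
        at_risk -= events + censored  # both groups leave the risk set after t
    return curve

def linked_by(curve: List[Tuple[float, float]], day: float) -> float:
    """Estimated fraction linked to care by `day`, i.e. 1 - S(day)."""
    s = 1.0
    for t, surv in curve:
        if t <= day:
            s = surv
    return 1.0 - s

# Hypothetical toy cohort: days to enrollment, False = censored at that day.
curve = kaplan_meier([10, 20, 20, 30, 40, 120], [True, True, False, True, False, True])
print(linked_by(curve, 90))
```

On this toy cohort, about 5/9 of participants are estimated to have linked by day 90; censored records shrink the risk set without being counted as failures to link, which is the point of using a survival estimator rather than a raw proportion.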
Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization
<p>Abstract</p> <p>Background</p> <p>Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequences for specific applications. To facilitate gene optimization, we have developed a stand-alone software package called Visual Gene Developer.</p> <p>Results</p> <p>The software not only provides general functions for gene analysis and optimization along with an interactive, user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complex strategy. As platform software, it is extensible through modules written in popular scripting languages such as VBScript and JScript within its programming environment.</p> <p>Conclusion</p> <p>Visual Gene Developer is useful both for researchers who want to quickly analyze and optimize genes and for those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at <it><url>http://www.visualgenedeveloper.net</url></it>.</p>
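Visual Gene Developer's own modules are written in VBScript or JScript inside the program, and the abstract does not describe its optimization algorithms. Purely as a hedged illustration of what a simple predefined strategy can look like, the Python sketch below back-translates a protein using the most frequent codon for each amino acid (the common "one amino acid, one codon" approach) and reports GC content. `CODON_USAGE`, `optimize`, and `gc_content` are hypothetical names, the table values are placeholders, and none of this is the program's API.

```python
# Hypothetical codon-usage table (relative frequencies per amino acid).
# The numbers are illustrative placeholders, not measured values for any organism.
CODON_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "L": {"CTG": 0.5, "TTA": 0.1, "TTG": 0.1, "CTT": 0.1, "CTC": 0.1, "CTA": 0.1},
    "*": {"TAA": 0.6, "TGA": 0.3, "TAG": 0.1},
}

def optimize(protein: str) -> str:
    """Back-translate a protein, picking the most frequent codon for each residue.

    Raises KeyError for residues missing from the (partial) table.
    """
    return "".join(max(CODON_USAGE[aa], key=CODON_USAGE[aa].get) for aa in protein)

def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)

gene = optimize("MKL*")
print(gene, gc_content(gene))  # ATGAAACTGTAA 0.25
```

A greedy per-codon choice ignores other objectives, such as the mRNA secondary structure the software can also predict; combining several algorithms into one configurable strategy is one way to balance them.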
Prediction of acute multiple sclerosis relapses by transcription levels of peripheral blood cells
<p>Abstract</p> <p>Background</p> <p>The ability to predict the temporal frequency of relapses in multiple sclerosis (MS) would enable physicians to decide when to intervene more aggressively and to plan clinical trials more accurately.</p> <p>Methods</p> <p>In the current study our objective was to determine whether subsets of genes can predict the time to the next acute relapse in patients with MS. Data-mining and predictive modeling tools were used to analyze a gene-expression dataset of 94 untreated patients: 62 with definite MS and 32 with clinically isolated syndrome (CIS). The dataset included the expression levels of 10,594 genes and annotated sequences, corresponding to 22,215 gene transcripts on the microarray.</p> <p>Results</p> <p>We designed a two-stage predictor. The first-stage predictor was based on the expression levels of 10 genes and predicted the time to the next relapse at a resolution of 500 days (error rate 0.079, p < 0.001). If the relapse was predicted to occur in less than 500 days, a second-stage predictor based on a different set of 9 additional genes was used to estimate the time to the next relapse more precisely (at a resolution of 50 days). The error rate of the second-stage predictor was 2.3-fold lower than that of random predictions (error rate = 0.35, p < 0.001). The predictors were further evaluated and found effective both for untreated MS patients and for MS patients who subsequently received immunomodulatory treatment after the initial testing (the error rate of the first-stage predictor was < 0.18 with p < 0.001 for all patient groups).</p> <p>Conclusion</p> <p>We conclude that gene expression analysis is a valuable tool that can be used in clinical practice to predict future MS disease activity. A similar approach may also be useful for other autoimmune diseases characterized by a relapsing-remitting course.</p>
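The abstract describes the cascade but not the classifiers inside it. The sketch below shows only the control flow, under stated assumptions: `stage1` and `stage2` are hypothetical placeholders for trained models (each would read its own gene subset), stage 1 answers "relapse within 500 days or not", and stage 2 returns one of ten 50-day bins. The toy models are stand-ins and are not the published 10-gene and 9-gene predictors.

```python
from typing import Callable, Sequence, Tuple

Profile = Sequence[float]  # expression levels for one patient (hypothetical encoding)

def two_stage_predict(
    x: Profile,
    stage1: Callable[[Profile], bool],
    stage2: Callable[[Profile], int],
    coarse_days: int = 500,
    fine_days: int = 50,
) -> Tuple[float, float]:
    """Return a (lower, upper) interval in days for the time to the next relapse.

    stage1(x) -> True if the relapse is predicted within coarse_days.
    stage2(x) -> bin index k, meaning [k * fine_days, (k + 1) * fine_days).
    """
    if not stage1(x):
        return (coarse_days, float("inf"))  # only the coarse answer is given
    n_bins = coarse_days // fine_days
    k = stage2(x)
    if not 0 <= k < n_bins:
        raise ValueError(f"stage2 returned bin {k}, expected 0..{n_bins - 1}")
    return (k * fine_days, (k + 1) * fine_days)

def toy_stage1(x: Profile) -> bool:
    """Toy stand-in: threshold on one hypothetical expression value."""
    return x[0] > 0.5

def toy_stage2(x: Profile) -> int:
    """Toy stand-in: map another hypothetical value onto bins 0..9."""
    return min(int(x[1] * 10), 9)

print(two_stage_predict([0.9, 0.34], toy_stage1, toy_stage2))  # (150, 200)
print(two_stage_predict([0.1, 0.34], toy_stage1, toy_stage2))  # (500, inf)
```

Keeping the stages as separate callables mirrors the cascade in the abstract: the second, finer model is consulted only for patients the first model places within 500 days.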
HIV Promoter Integration Site Primarily Modulates Transcriptional Burst Size Rather Than Frequency
Mammalian gene expression patterns, and their variability across populations of cells, are regulated by factors specific to each gene in concert with its surrounding cellular and genomic environment. Lentiviruses such as HIV integrate their genomes into semi-random genomic locations in the cells they infect, and the resulting viral gene expression provides a natural system to dissect the contributions of genomic environment to transcriptional regulation. Previously, we showed that expression heterogeneity and its modulation by specific host factors at HIV integration sites are key determinants of infected-cell fate and a possible source of latent infections. Here, we assess the integration context dependence of expression heterogeneity from diverse single integrations of an HIV-promoter/GFP-reporter cassette in Jurkat T-cells. Systematically fitting a stochastic model of gene expression to our data reveals an underlying transcriptional dynamic, in which multiple transcripts are produced during short, infrequent bursts, that quantitatively accounts for the wide, highly skewed protein expression distributions observed in each of our clonal cell populations. Interestingly, we find that the size of transcriptional bursts is the primary systematic covariate over integration sites, varying from a few to tens of transcripts across integration sites, and correlating well with mean expression. In contrast, burst frequencies are scattered about a typical value of several per cell-division time and demonstrate little correlation with the clonal means. This pattern of modulation generates consistently noisy distributions over the sampled integration positions, with large expression variability relative to the mean maintained even for the most productive integrations, and could contribute to specifying heterogeneous, integration-site-dependent viral production patterns in HIV-infected cells. 
Genomic environment thus emerges as a significant control parameter for gene expression variation that may contribute to structuring mammalian genomes and may be exploited for survival by integrating viruses.
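A standard way to see how burst size and burst frequency affect expression differently is the bursty-production model: bursts arrive at rate `k_burst`, each adds a geometrically distributed number of molecules with mean `b`, and molecules decay at rate `gamma`. Its steady state has mean `k_burst * b / gamma` and Fano factor (variance/mean) `1 + b`. The Gillespie sketch below is a one-species simplification with hypothetical parameter names; it is not the authors' fitted model, which describes transcription and protein expression.

```python
import math
import random
from typing import Tuple

def simulate_bursts(k_burst: float, mean_burst: float, gamma: float,
                    t_end: float, seed: int = 0) -> Tuple[float, float]:
    """Gillespie simulation of bursty production with first-order decay.

    Bursts arrive as a Poisson process with rate k_burst; each burst adds a
    geometrically distributed number of molecules (mean mean_burst > 0); each
    molecule decays independently with rate gamma. Starts from zero molecules.
    Returns the time-averaged mean and variance of the molecule count.
    """
    rng = random.Random(seed)
    q = mean_burst / (1.0 + mean_burst)  # geometric "continue" probability
    n, t = 0, 0.0
    w_total = w_n = w_n2 = 0.0           # time-weighted accumulators
    while t < t_end:
        total_rate = k_burst + gamma * n
        dt = min(rng.expovariate(total_rate), t_end - t)
        w_total += dt
        w_n += dt * n
        w_n2 += dt * n * n
        t += dt
        if t >= t_end:
            break
        if rng.random() * total_rate < k_burst:
            # inverse-CDF draw: P(size >= j) = q**j, so the mean size is mean_burst
            n += int(math.log(1.0 - rng.random()) / math.log(q))
        else:
            n -= 1                       # one molecule decays (only chosen when n > 0)
    mean = w_n / w_total
    return mean, w_n2 / w_total - mean * mean

# Theory for comparison (gamma = 1): mean = k * b, Fano = 1 + b.
for k, b in [(2.0, 5.0), (2.0, 20.0), (8.0, 5.0)]:
    m, v = simulate_bursts(k, b, gamma=1.0, t_end=2000.0, seed=1)
    print(f"k={k:g} b={b:g}: mean {m:.1f} (theory {k * b:g}), Fano {v / m:.1f} (theory {1 + b:g})")
```

Raising `mean_burst` raises both the mean and the Fano factor, whereas raising `k_burst` raises the mean at a fixed Fano factor. Because CV² = (1 + b)γ/(k b), which approaches γ/k once bursts exceed a few molecules, modulating burst size alone leaves relative variability roughly unchanged as the mean grows.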