
    A general framework for optimization of probes for gene expression microarray and its application to the fungus Podospora anserina

    Background: The development of new microarray technologies makes custom long oligonucleotide arrays affordable for many experimental applications, notably gene expression analyses. Reliable results depend on probe design quality and selection. Probe design strategy should cope with the limited accuracy of de novo gene prediction programs, and annotation up-dating. We present a novel in silico procedure which addresses these issues and includes experimental screening, as an empirical approach is the best strategy to identify optimal probes in the in silico outcome. Findings: We used four criteria for in silico probe selection: cross-hybridization, hairpin stability, probe location relative to coding sequence end and intron position. This latter criterion is critical when exon-intron gene structure predictions for intron-rich genes are inaccurate. For each coding sequence (CDS), we selected a sub-set of four probes. These probes were included in a test microarray, which was used to evaluate the hybridization behavior of each probe. The best probe for each CDS was selected according to three experimental criteria: signal-to-noise ratio, signal reproducibility, and representative signal intensities. This procedure was applied for the development of a gene expression Agilent platform for the filamentous fungus Podospora anserina and the selection of a single 60-mer probe for each of the 10,556 P. anserina CDS. Conclusions: A reliable gene expression microarray version based on the Agilent 44K platform was developed with four spot replicates of each probe to increase statistical significance of analysis.
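The experimental selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `select_probe`, the thresholds `min_snr` and `max_cv`, and the use of the coefficient of variation for reproducibility and of the median for "representative" intensity are all assumptions made for illustration.

```python
from statistics import mean, median, stdev


def select_probe(candidates, background, min_snr=3.0, max_cv=0.2):
    """Pick one probe per CDS from replicate signals (hypothetical rule).

    candidates: dict mapping probe id -> list of at least two replicate signals.
    background: background intensity used for the signal-to-noise ratio.
    The thresholds are illustrative assumptions, not values from the abstract.
    """
    passing = {}
    for probe_id, signals in candidates.items():
        avg = mean(signals)
        if avg <= 0:
            continue  # no usable signal
        snr = avg / background        # signal-to-noise ratio
        cv = stdev(signals) / avg     # reproducibility across replicates
        if snr >= min_snr and cv <= max_cv:
            passing[probe_id] = avg
    if not passing:
        return None
    # "Representative" intensity: closest to the median of the passing probes.
    target = median(passing.values())
    return min(passing, key=lambda pid: abs(passing[pid] - target))


# Four candidate probes for one CDS, with made-up replicate signals.
cds_probes = {
    "p1": [500, 510, 495, 505],
    "p2": [400, 410, 395, 405],
    "p3": [900, 300, 1500, 600],   # poorly reproducible
    "p4": [800, 790, 810, 805],
}
best = select_probe(cds_probes, background=100)
```

Here `p3` is rejected for poor reproducibility, and `p1` is chosen because its mean signal sits at the median of the remaining probes.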

    FoxO gene family evolution in vertebrates

    Background: Forkhead box, class O (FoxO) belongs to the large family of forkhead transcription factors that are characterized by a conserved forkhead box DNA-binding domain. To date, the FoxO group has four mammalian members: FoxO1, FoxO3a, FoxO4 and FoxO6, which are orthologs of DAF16, an insulin-responsive transcription factor involved in regulating longevity of worms and flies. The degree of homology between these four members is high, especially in the forkhead domain, which contains the DNA-binding interface. Yet, mouse FoxO knockouts have revealed that each FoxO gene has its own unique physiological role. Whether the functional divergences are primarily due to adaptive selection pressure or relaxed selective constraint remains an open question. As such, this study aims to address the evolutionary mode of FoxO, which may lead to the functional divergence. Results: Sequence similarity searches were performed in genome and scaffold data to identify homologues of FoxO in vertebrates. Phylogenetic analysis was used to characterize the evolutionary history of the family, identifying two duplications early in vertebrate evolution. To determine the mode of evolution in vertebrates, we performed a rigorous statistical analysis with FoxO gene sequences, including relative rate ratio tests, branch-specific dN/dS ratio tests, site-specific dN/dS ratio tests, branch-site dN/dS ratio tests and clade-level analysis of amino acid conservation/variation patterns. Our results suggest that FoxO is constrained by strong purifying selection, except for four sites in FoxO6, which have undergone positive Darwinian selection. The functional divergence in this family is best explained by either relaxed purifying selection or positive selection. Conclusion: We present a phylogeny describing the evolutionary history of the FoxO gene family and show that the genes have evolved through duplications followed by purifying selection, except for four sites in FoxO6 fixed by positive selection, which lie mostly within the non-conserved optimal PKB motif in the C-terminal part. Relaxed selection may also play an important role in the functional differentiation that followed the gene duplications.

    Benchmark of algorithms for multiple DNA sequence alignment across livestock species

    Background: Due to the growing amount of biological data, it is often necessary to select the optimal estimation method for DNA sequence alignment across livestock species. One of the most important branches of genomics is modelling homology between the DNA sequences considered. Multiple sequence alignment is a potent tool for molecular and evolutionary biology, and several programs and algorithms are available for this purpose. The purpose of this paper was to study the most commonly used DNA alignment algorithms in order to select the optimal tool for short sequences. Methods: Four pipeline steps were used to benchmark the algorithms for multiple DNA sequence alignment across livestock species: 1) selection of reference genome sequences from ARS1.2 for cattle, EquCab3.0 for horse and vicPac2 for alpaca with a low E-value using TBLASTn; 2) removal of gaps from these sequences; 3) alignment of the obtained sequences using the examined algorithms; 4) comparison of the quality of the aligned sequences with the reference genome sequences using additional software. The computation time was recorded for the whole analysis. Seven programs were used, each based on a different alignment algorithm, namely: ClustalO, ClustalW, Kalign, MAFFT, MUSCLE, Probcons and T-Coffee. Results: The results obtained in this study showed that the fastest are progressive algorithms such as Kalign or MUSCLE-FAST. Moreover, iterative algorithms such as MAFFT and MUSCLE produced higher-quality alignments. The T-Coffee and Probcons programs were computationally cost-effective, yet they produced medium-quality results in a relatively long time. The best alignment quality was shown by iterative variants of the MAFFT program; however, their calculation speed was relatively low. The fastest algorithm was Kalign, which aligned much faster than its competitors but achieved average alignment quality. The progressive version of MAFFT, NS1, showed an average ratio of speed to alignment quality among the analyzed algorithms. Conclusions: We conclude that the results of this study can be used for re-alignment of variant primers in new livestock genome releases.

    Role of APOBEC3 in Genetic Diversity among Endogenous Murine Leukemia Viruses

    The ability of human and murine APOBECs (specifically, APOBEC3) to inhibit infecting retroviruses and retrotransposition of some mobile elements is becoming established. Less clear is the effect that they have had on the establishment of the endogenous proviruses resident in the human and mouse genomes. We used the mouse genome sequence to study diversity and genetic traits of nonecotropic murine leukemia viruses (polytropic [Pmv], modified polytropic [Mpmv], and xenotropic [Xmv] subgroups), the best-characterized large set of recently integrated proviruses. We identified 49 proviruses. In phylogenetic analyses, Pmvs and Mpmvs were monophyletic, whereas Xmvs were divided into several clades, implying a greater number of replication cycles between the integration events. Four distinct primer binding site types (Pro, Gln1, Gln2 and Thr) were dispersed within the phylogeny, indicating frequent mispriming. We analyzed the frequency and context of G-to-A mutations for the role of mA3 in formation of these proviruses. In the Pmv and Mpmv (but not Xmv) groups, mutations attributable to mA3 constituted a large fraction of the total. A significant number of nonsense mutations suggests the absence of purifying selection following mutation. A strong bias of G-to-A relative to C-to-T changes was seen, implying a strand specificity that can only have occurred prior to integration. The optimal sequence context of G-to-A mutations, TTC, was consistent with mA3. At least in the Pmv group, a significant 5′ to 3′ gradient of G-to-A mutations was consistent with mA3 editing. Altogether, our results for the first time suggest mA3 editing immediately preceding the integration event that led to retroviral endogenization, contributing to inactivation of infectivity
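The comparison of G-to-A against C-to-T changes, and the sequence context of the G-to-A changes, can be sketched as follows. This is an illustrative sketch, not the authors' pipeline. It assumes a provirus already aligned to a reference on the plus strand. It also assumes that the TTC context refers to the minus strand, so the context is read as the reverse complement of the plus-strand G and the two bases that follow it. The function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_transitions(reference, provirus):
    """Count G-to-A and C-to-T changes between two aligned plus-strand sequences.

    Also tallies the minus-strand trinucleotide context of each G-to-A change
    (assumption: the edited C is the last base of the reported context).
    """
    if len(reference) != len(provirus):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    contexts = Counter()
    for i, (ref_base, pro_base) in enumerate(zip(reference, provirus)):
        if ref_base == "G" and pro_base == "A":
            counts["G_to_A"] += 1
            window = reference[i:i + 3]
            # Skip windows that run off the end or contain gaps/ambiguity codes.
            if len(window) == 3 and set(window) <= set("ACGT"):
                contexts[reverse_complement(window)] += 1
        elif ref_base == "C" and pro_base == "T":
            counts["C_to_T"] += 1
    return counts, contexts


# Toy aligned sequences (made up for illustration).
counts, contexts = count_transitions("TGAACCGAAT", "TAAACTGAAT")
```

A strong excess of `G_to_A` over `C_to_T` in such counts is the kind of strand bias the abstract describes.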

    Context based mixture model for cell phase identification in automated fluorescence microscopy

    BACKGROUND: Automated identification of the cell cycle phases of individual live cells in a large population captured via automated fluorescence microscopy is important for cancer drug discovery and cell cycle studies. Time-lapse fluorescence microscopy images provide an important means of studying the cell cycle process under different conditions of perturbation. Existing methods are limited in dealing with such time-lapse data sets, while manual analysis is not feasible. This paper presents statistical data analysis and statistical pattern recognition to perform this task. RESULTS: The data were generated from HeLa H2B GFP cells imaged during a 2-day period, with images acquired 15 minutes apart using automated time-lapse fluorescence microscopy. The patterns are described with four kinds of features: twelve general features, Haralick texture features, Zernike moment features, and wavelet features. To generate a new set of features with more discriminative power, commonly used feature reduction techniques are applied, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Maximum Margin Criterion (MMC), Stepwise Discriminant Analysis based Feature Selection (SDAFS), and Genetic Algorithm based Feature Selection (GAFS). Then, we propose a Context Based Mixture Model (CBMM) for dealing with the time-series cell sequence information and compare it to other traditional classifiers: Support Vector Machine (SVM), Neural Network (NN), and K-Nearest Neighbor (KNN). As is standard practice in machine learning, we systematically compare the performance of a number of common feature reduction techniques and classifiers to select an optimal combination of a feature reduction technique and a classifier. A cellular database containing 100 manually labelled subsequences was built for evaluating the performance of the classifiers. The generalization error is estimated using cross-validation. The experimental results show that CBMM outperforms all other classifiers in identifying prophase and has the best overall performance. CONCLUSION: The application of feature reduction techniques can improve the prediction accuracy significantly. CBMM can effectively utilize the contextual information and has the best overall performance when combined with any of the previously mentioned feature reduction techniques.
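The idea of estimating generalization error by cross-validation can be sketched with one of the baseline classifiers named in this abstract, K-Nearest Neighbor. The sketch does not implement CBMM or any of the feature reduction steps. The fold scheme, the value of `k`, the two-dimensional toy features, and the phase labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training samples.

    train: list of (feature_tuple, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_validation_error(samples, n_folds=5, k=3):
    """Estimate the misclassification rate with n-fold cross-validation."""
    errors = 0
    for fold in range(n_folds):
        # Every n_folds-th sample is held out; the rest form the training set.
        test = samples[fold::n_folds]
        train = [s for i, s in enumerate(samples) if i % n_folds != fold]
        errors += sum(knn_predict(train, x, k) != y for x, y in test)
    return errors / len(samples)


# Toy, well-separated feature vectors with hypothetical phase labels.
samples = (
    [((0.1 * i, 0.0), "interphase") for i in range(5)]
    + [((10.0 + 0.1 * i, 10.0), "prophase") for i in range(5)]
)
error_rate = cross_validation_error(samples)
```

Each sample is held out exactly once, so the error rate is the fraction of held-out samples misclassified by a model trained without them.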

    Neural Network and Bioinformatic Methods for Predicting HIV-1 Protease Inhibitor Resistance

    This article presents a new method for predicting viral resistance to seven protease inhibitors from the HIV-1 genotype, and for identifying the positions in the protease gene at which the specific nature of the mutation affects resistance. The neural network Analog ARTMAP predicts protease inhibitor resistance from viral genotypes. A feature selection method detects genetic positions that contribute to resistance both alone and through interactions with other positions. This method has identified positions 35, 37, 62, and 77, where traditional feature selection methods have not detected a contribution to resistance. At several positions in the protease gene, mutations confer differing degrees of resistance, depending on the specific amino acid to which the sequence has mutated. To find these positions, an Amino Acid Space is introduced to represent genes in a vector space that captures the functional similarity between amino acid pairs. Feature selection identifies several new positions, including 36, 37, and 43, with amino acid-specific contributions to resistance. Analog ARTMAP networks applied to inputs that represent specific amino acids at these positions perform better than networks that use only mutation locations. Air Force Office of Scientific Research (F49620-01-1-0423); National Geospatial-Intelligence Agency (NMA 201-01-1-2016); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
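The difference between inputs that record only mutation locations and inputs that record the specific amino acid can be illustrated with a minimal sketch. A plain one-hot encoding stands in here for the Amino Acid Space, whose actual coordinates are not given in the abstract. The function names and the toy sequences are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def encode_locations(reference, sequence, positions):
    """Binary input: 1 where the sequence differs from the reference."""
    return [int(sequence[p - 1] != reference[p - 1]) for p in positions]


def encode_amino_acids(sequence, positions):
    """One-hot input: records which residue is present at each position.

    Stand-in for a similarity-aware amino acid vector space (assumption).
    """
    vector = []
    for p in positions:
        one_hot = [0] * len(AMINO_ACIDS)
        one_hot[AMINO_ACIDS.index(sequence[p - 1])] = 1
        vector.extend(one_hot)
    return vector


# Toy sequences: two variants mutated at the same position (1-based position 3)
# but to different residues.
reference = "PQITLW"
variant_a = "PQVTLW"
variant_b = "PQLTLW"

same_location_code = (
    encode_locations(reference, variant_a, [3])
    == encode_locations(reference, variant_b, [3])
)
different_residue_code = (
    encode_amino_acids(variant_a, [3]) != encode_amino_acids(variant_b, [3])
)
```

The two variants are indistinguishable under the location-only encoding but distinct under the residue-level encoding, which is the kind of information the abstract reports as improving prediction.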