89 research outputs found

    PRISE2: software for designing sequence-selective PCR primers and probes.

    Background: PRISE2 is a new software tool for designing sequence-selective PCR primers and probes. To achieve a high level of selectivity, PRISE2 allows the user to specify a collection of target sequences that the primers are supposed to amplify, as well as non-target sequences that should not be amplified. The program emphasizes primer selectivity on the 3' end, which is crucial for selective amplification of conserved sequences such as rRNA genes. In PRISE2, users can specify desired properties of primers, including length, GC content, and others. They can interactively manipulate the list of candidate primers to choose the primer pairs that are best suited to their needs. A similar process is used to add probes to selected primer pairs. More advanced features include, for example, the capability to define a custom mismatch penalty function. PRISE2 is equipped with a graphical, user-friendly interface, and it runs on Windows, Macintosh or Linux machines. Results: PRISE2 has been tested on two very similar strains of the fungus Dactylella oviparasitica, and it was able to create highly selective primers and probes for each of them, demonstrating the ability to create useful sequence-selective assays. Conclusions: PRISE2 is a user-friendly, interactive software package that can be used to design high-quality selective primers for PCR experiments. In addition to choosing primers, users have the option to add a probe to any selected primer pair, enabling the design of TaqMan and other primer-probe based assays. PRISE2 can also be used to design probes for FISH and other hybridization-based assays.
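The idea of weighting mismatches near the 3' end more heavily can be illustrated with a minimal sketch. This is not PRISE2's actual penalty function: the name `mismatch_penalty`, the window size, and the weights are hypothetical values chosen only for illustration, and both strings are assumed to be written 5' to 3' and to have equal length.

```python
def mismatch_penalty(primer: str, site: str, end_window: int = 4,
                     end_weight: float = 3.0, base_weight: float = 1.0) -> float:
    """Sum mismatch costs between a primer and an equal-length binding site.

    Mismatches in the last `end_window` positions (the 3' end) cost
    `end_weight`; all other mismatches cost `base_weight`.
    All parameter values are illustrative assumptions.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    n = len(primer)
    penalty = 0.0
    for i, (p, s) in enumerate(zip(primer.upper(), site.upper())):
        if p != s:
            # Positions n - end_window .. n - 1 form the 3' end window.
            penalty += end_weight if i >= n - end_window else base_weight
    return penalty
```

Under this kind of scheme, a primer would be considered selective when its penalty is low against every target sequence and high against every non-target sequence.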

    Improving probe set selection for microbial community analysis by leveraging taxonomic information of training sequences

    Background: Population levels of microbial phylotypes can be examined using a hybridization-based method that utilizes a small set of computationally-designed DNA probes targeted to a gene common to all. Our previous algorithm attempts to select a set of probes such that each training sequence manifests a unique theoretical hybridization pattern (a binary fingerprint) to a probe set. It does so, however, without taking into account similarity between training gene sequences or their putative taxonomic classifications. We present an improved algorithm for probe set selection that utilizes the available taxonomic information of training gene sequences and attempts to choose probes such that the resultant binary fingerprints cluster into real taxonomic groups. Results: Gene sequences manifesting identical fingerprints with probes chosen by the new algorithm are more likely to be from the same taxonomic group than those manifesting identical fingerprints with probes chosen by the previous algorithm. In cases where they are from different taxonomic groups, the underlying DNA sequences of identical fingerprints are more similar to each other in probe sets made with the new algorithm than in those made with the previous one. Complete removal of large taxonomic groups from training data does not greatly decrease the ability of probe sets to distinguish those groups. Conclusions: Probe sets made from the new algorithm create fingerprints that more reliably cluster into biologically meaningful groups. The method can readily distinguish microbial phylotypes that were excluded from the training sequences, suggesting novel microbes can also be detected.
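A binary fingerprint of this kind can be sketched in a few lines. The hybridization model here is a deliberate simplification and an assumption of this example: a probe is taken to "hybridize" when it occurs as an exact substring of the sequence, with probe and sequence written in the same strand orientation. The function names are hypothetical, not the authors' implementation.

```python
from collections import defaultdict

def fingerprint(sequence: str, probes: list[str]) -> tuple[int, ...]:
    """Return one bit per probe: 1 if the probe occurs in the sequence, else 0."""
    return tuple(int(probe in sequence) for probe in probes)

def group_by_fingerprint(sequences: dict[str, str],
                         probes: list[str]) -> dict[tuple[int, ...], list[str]]:
    """Map each fingerprint to the names of the sequences that produce it."""
    groups = defaultdict(list)
    for name, seq in sequences.items():
        groups[fingerprint(seq, probes)].append(name)
    return dict(groups)
```

Sequences that end up in the same group are indistinguishable to the probe set; the improved algorithm described above aims to make such groups coincide with taxonomic groups.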

    Development of a simple artificial intelligence method to accurately subtype breast cancers based on gene expression barcodes

    Magister Scientiae - MSc. INTRODUCTION: Breast cancer is a highly heterogeneous disease. The complexity of achieving an accurate diagnosis and an effective treatment regimen lies within this heterogeneity. Subtypes of the disease are not simply molecular, i.e. hormone receptor over-expression or absence, but the tumour itself is heterogeneous in terms of tissue of origin, metastases, and histopathological variability. Accurate tumour classification vastly improves treatment decisions, patient outcomes and 5-year survival rates. Gene expression studies aided by transcriptomic technologies such as microarrays and next-generation sequencing (e.g. RNA-Sequencing) have aided oncology researcher and clinician understanding of the complex molecular portraits of malignant breast tumours. Mechanisms governing cancers, which include tumorigenesis, gene fusions, gene over-expression and suppression, and cellular process and pathway involvement, have been elucidated through comprehensive analyses of the cancer transcriptome. Over the past 20 years, gene expression signatures discovered with both microarray and RNA-Seq have reached clinical and commercial application through the development of tests such as Mammaprint®, OncotypeDX®, and FoundationOne® CDx, all of which focus on chemotherapy sensitivity, prediction of cancer recurrence, and tumour mutational level. The Gene Expression Barcode (GExB) algorithm was developed to allow for easy interpretation and integration of microarray data through data normalization with frozen RMA (fRMA) preprocessing and conversion of relative gene expression to a sequence of 1's and 0's. Unfortunately, the algorithm has not yet been developed for RNA-Seq data. However, implementation of the GExB with feature-selection would contribute to a robust machine-learning based breast cancer and subtype classifier. METHODOLOGY: For microarray data, we applied the GExB algorithm to generate barcodes for normal breast and breast tumour samples.
A two-class classifier for malignancy was developed through feature-selection on barcoded samples by selecting for genes with 85% stable absence or presence within a tissue type, and differentially stable between tissues. A multi-class feature-selection method was employed to identify genes with variable expression in one subtype, but 80% stable absence or presence in all other subtypes, i.e. 80% in n-1 subtypes. For RNA-Seq data, a barcoding method needed to be developed which could mimic the GExB algorithm for microarray data. A z-score-to-barcode method was implemented, together with differential gene expression analysis in which the top 100 genes were selected as informative features for classification. The accuracy and discriminatory capability of both the microarray-based and the RNA-Seq-based gene signatures were assessed through unsupervised and supervised machine-learning algorithms, i.e., K-means and hierarchical clustering, as well as binary and multi-class Support Vector Machine (SVM) implementations. RESULTS: The GExB-FS method for microarray data yielded an 85-probe and a 346-probe informative set for the two-class and multi-class classifiers, respectively. The two-class classifier predicted samples as either normal or malignant with 100% accuracy and the multi-class classifier predicted molecular subtype with 96.5% accuracy with SVM. Combining RNA-Seq DE analysis for feature-selection with the z-score-to-barcode method resulted in a two-class classifier for malignancy, and a multi-class classifier for normal-from-healthy, normal-adjacent-tumour (from cancer patients), and breast tumour samples with 100% accuracy. Most notably, a normal-adjacent-tumour gene expression signature emerged, which differentiated it from normal breast tissues in healthy individuals. CONCLUSION: A potentially novel method for microarray and RNA-Seq data transformation, feature selection and classifier development was established.
The universal application of the microarray signatures and the validity of the z-score-to-barcode method were demonstrated by the 95% accurate classification of RNA-Seq barcoded samples with a microarray-discovered gene expression signature. The results from this comprehensive study into the discovery of robust gene expression signatures hold immense potential for further R&D towards implementation at the clinical endpoint, and translation to simpler and cost-effective laboratory methods such as qtPCR-based tests.
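The two barcoding ideas in this abstract can be sketched as follows. This is an assumption-laden illustration, not the thesis code: the abstract does not state how z-scores are computed or thresholded, so the per-gene population z-score, the default `threshold` of 0.0, and the function names are all hypothetical. Only the 85% stability cut-off comes from the text.

```python
from statistics import mean, pstdev

def zscore_barcode(expression: dict[str, list[float]],
                   threshold: float = 0.0) -> dict[str, list[int]]:
    """Convert per-gene expression values across samples into 0/1 barcodes.

    A sample gets 1 for a gene when its z-score (relative to that gene's
    mean and standard deviation across samples) exceeds `threshold`.
    The threshold is an assumed, illustrative value.
    """
    barcodes = {}
    for gene, values in expression.items():
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            # No variation across samples: call the gene absent everywhere.
            barcodes[gene] = [0] * len(values)
        else:
            barcodes[gene] = [int((v - mu) / sigma > threshold) for v in values]
    return barcodes

def stable_call(bits: list[int], min_fraction: float = 0.85):
    """Return 1 or 0 if that state holds in at least `min_fraction` of samples, else None."""
    present = sum(bits) / len(bits)
    if present >= min_fraction:
        return 1
    if 1 - present >= min_fraction:
        return 0
    return None
```

A gene would then be a candidate feature for the two-class classifier when `stable_call` returns 1 in one tissue type and 0 in the other.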

    Natural variation in Drosophila melanogaster

    This work is dedicated to studying natural variation in D. melanogaster at the DNA sequence and gene expression level. In addition I present a new version of the DNA polymorphism analysis program VariScan, which includes significant improvements. In CHAPTER 1 I describe a genome scan of single nucleotide polymorphism in two natural D. melanogaster populations (from Africa and Europe) on the third chromosome. Together with polymorphism data previously published for the X chromosome of the same populations, this allows a comparative study of the polymorphism patterns of the X chromosome and an autosome. The frequency spectrum of mutations and the patterns of linkage disequilibrium are investigated. The observed patterns indicate that there is a significant difference in the behavior of the two chromosomes, as has already been suggested by previous studies. To uncover the reasons for this a coalescent based maximum likelihood method is applied that incorporates the effects of demographic history and unequal sex ratios. For the African population the differential behavior of the chromosomes can be explained by its demographic history and an excess of females. In Europe, a population bottleneck and an excess of males alone cannot explain the patterns we observe. The additional action of positive selection in this population is proposed as a possible explanation. In CHAPTER 2 I investigate the variation in gene expression of the two aforementioned populations. Whole-genome microarrays are used to study levels of expression for 88% of all known genes in eight adult males from both populations. The observed levels of expression variation are equal in Africa and Europe, despite the fact that DNA sequence variation is much higher in Africa. This is evidence for the action of stabilizing selection governing levels of expression polymorphism. 
Supporting this view, genes that are involved in many different functions, and are therefore under strong selective constraint, show less variation than do genes with only a few functions. The experimental design allows the search for genes which differ in their expression patterns between Europe and Africa and might therefore have undergone adaptive evolution. Detected candidates include genes putatively involved in insecticide resistance and food choice. Surprisingly, many genes over-expressed in Africa are involved in the formation and function of the flying apparatus. In CHAPTER 3 I present version 2 of the program VariScan. This program was designed to analyse patterns of DNA sequence polymorphism on a chromosomal scale. The functionality of the core analysis tool, the wavelet decomposition, is described. In addition, multiple improvements to the previous version are presented. The program now supports the “pairwise deletion” option. This is essential for analysing data at the chromosome scale, since such data often contain incomplete information. It is now possible to add outgroup information, which allows the calculation of additional statistics. Furthermore, the separate analysis of different predefined chromosomal regions is added as an option. To increase the user friendliness, a graphical user interface is now included as part of the software package. Finally, VariScan is applied to published and computer-generated data and the ability of the wavelet-based analysis to uncover chromosomal regions with interesting DNA polymorphism patterns is demonstrated.
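The general idea behind pairwise deletion can be sketched as below. This shows the generic technique only and is not VariScan's implementation: the function name, the set of characters treated as missing, and the choice of statistic (mean pairwise per-site difference) are assumptions made for illustration.

```python
from itertools import combinations

# Characters treated as missing data (an assumption of this sketch).
MISSING = {"N", "-", "?"}

def pairwise_diversity(alignment: list[str]) -> float:
    """Mean per-site difference over all sequence pairs.

    For each pair, sites where either sequence has missing data are
    skipped for that pair only, instead of being dropped from the
    whole alignment (which would be complete deletion).
    """
    per_pair = []
    for a, b in combinations(alignment, 2):
        compared = diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in MISSING or y in MISSING:
                continue
            compared += 1
            diffs += x != y
        if compared:
            per_pair.append(diffs / compared)
    return sum(per_pair) / len(per_pair) if per_pair else 0.0
```

With complete deletion, a single missing base would remove a column for every sequence, which is why a per-pair treatment matters for chromosome-scale data with many gaps.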

    Mapping and Functional Analysis of cis-Regulatory Elements in Mouse Photoreceptors

    Photoreceptors are light-sensitive neurons that mediate vision, and they are the most commonly affected cell type in genetic forms of blindness. In mice, there are two basic types of photoreceptors, rods and cones, which mediate vision in dim and bright environments, respectively. The transcription factors (TFs) that control rod and cone development have been studied in detail, but the cis-regulatory elements (CREs) through which these TFs act are less well understood. To comprehensively identify photoreceptor CREs in mice and to understand their relationship with gene expression, we performed open chromatin (ATAC-seq) and transcriptome (RNA-seq) profiling of FACS-purified rods and cones. We find that rods have significantly fewer regions of open chromatin than cones (as well as >60 additional cell types and tissues), and we demonstrate that this uniquely closed chromatin architecture depends on the rod master regulator Nrl. Finally, we find that regions of rod- and cone-specific open chromatin are enriched for distinct sets of TF binding sites, providing insight into the cis-regulatory grammar of these cell types. We also sought to understand how the regulatory activity of rod and cone open chromatin regions is encoded in DNA sequence. Cone-rod homeobox (CRX) is a paired-like homeodomain TF and master regulator of both rod and cone development, and CRX binding sites are by far the most enriched TF binding sites in photoreceptor CREs. The in vitro DNA binding preferences of CRX have been extensively characterized, but how well in vitro models of TF binding site affinity predict in vivo regulatory activity is not known. In addition, paired-class homeodomain TFs bind DNA as both monomers and dimers, but whether monomeric and dimeric CRX binding sites have distinct regulatory activities is not known.
To address these questions, we used a massively parallel reporter assay to quantify the activity of thousands of native and mutant CRX binding sites in explanted mouse retinas. These data reveal that dimeric CRX binding sites encode stronger enhancers than monomeric CRX binding sites. Moreover, the activity of half-sites within dimeric CRX binding sites is cooperative and spacing-dependent. In addition, saturating mutagenesis of 195 CRX binding sites reveals that, while TF binding site affinity and activity are moderately correlated across mutations within individual CREs, they are poorly correlated across mutations from distinct CREs. Accordingly, we show that accounting for baseline CRE activity improves the prediction of the effects of mutations in regulatory DNA from sequence-based models. Taken together, these data demonstrate that the activity of CRX binding sites depends on multiple layers of sequence context, providing insight into photoreceptor gene regulation and illustrating functional principles of homeodomain TF binding sites.

    Bayesian clustering of curves and the search of the partition space

    This thesis is concerned with the study of a Bayesian clustering algorithm, proposed by Heard et al. (2006), used successfully for microarray experiments over time. It focuses not only on the development of new ways of setting hyperparameters so that inferences both reflect the scientific needs and contribute to the inferential stability of the search, but also on the design of new fast algorithms for the search over the partition space. First we use the explicit forms of the associated Bayes factors to demonstrate that such methods can be unstable under common settings of the associated hyperparameters. We then prove that the regions of instability can be removed by setting the hyperparameters in an unconventional way. Moreover, we demonstrate that MAP (maximum a posteriori) search is satisfied when a utility function is defined according to the scientific interest of the clusters. We then focus on the search over the partition space. In model-based clustering a comprehensive search for the highest scoring partition is usually impossible, due to the huge number of partitions of even a moderately sized dataset. We propose two methods for the partition search. One method encodes the clustering as a weighted MAX-SAT problem, while the other views clusterings as elements of the lattice of partitions. Finally, this thesis includes the full analysis of two microarray experiments for identifying circadian genes
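To give a sense of why an exhaustive partition search is usually impossible, the number of partitions of a set of n items is the Bell number B_n. The sketch below computes Bell numbers with the Bell triangle; it illustrates the growth of the search space only and is not part of the thesis's search algorithms. The function name is hypothetical.

```python
def bell_numbers(n_max: int) -> list[int]:
    """Return [B_0, ..., B_n_max], the number of partitions of 0..n_max items."""
    bells = [1]   # B_0 = 1: the empty set has exactly one partition
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to its left neighbour.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        bells.append(new_row[0])
        row = new_row
    return bells
```

For example, `bell_numbers(10)[10]` is 115975, so even ten curves already admit more than a hundred thousand candidate clusterings, and the count grows faster than exponentially from there.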