    Gene expression profiling in prepubertal and adult male mice using cDNA and oligonucleotide microarrays

    Variations in gene expression are the basis of differences in cell and tissue function, response to DNA damaging agents, susceptibility to genetic disease, and cellular differentiation. The purpose of this dissertation research was to characterize variation in basal gene expression among adult mouse tissues for selected stress response, DNA repair and damage control genes and to utilize variation in temporal gene expression patterns to identify candidate genes associated with germ cell differentiation from mitosis through meiosis in the prepubertal mouse testis. To accomplish these goals, high throughput analyses of gene expression were performed using custom cDNA and random oligonucleotide microarrays. cDNA microarray technology was optimized by evaluating the effects of multiple hybridization and image analysis methodologies on the magnitude of background-subtracted hybridization signal intensities. The results showed that hybridizing lower probe quantities in a buffer developed at Lawrence Livermore National Laboratory to tryptone-blocked microarrays improved signal intensities. In addition, the error in expression ratio measurements was significantly reduced when microarray images were preprocessed. A custom cDNA microarray comprising 417 genes and enriched for stress response, DNA repair, and damage control genes was used to investigate basal gene expression differences among adult mouse testis, brain, liver, spleen, and heart. Genes with functions related to stress response exhibited the most variation in expression among tissues, whereas DNA repair-associated gene expression varied the least. Random oligonucleotide microarrays comprising ∼10,000 genes were used to profile changes in gene expression during the first wave of spermatogenesis in the prepubertal mouse testis. Approximately 550 genes were differentially expressed as male germ cells differentiated from spermatogonia to primary spermatocytes. These findings suggest that the 313 unannotated sequences and 178 genes with known functions in other biological pathways have spermatogenesis-associated roles. This dissertation research showed that microarrays are a useful tool for quantitating the expression of large numbers of genes in parallel under normal physiological conditions and during differentiation. It has also provided candidate genes for future investigations of the molecular mechanisms underlying (1) tissue-specific DNA damage response and genetic disease susceptibility and (2) cellular differentiation during the onset and progression of spermatogenesis.
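
The abstract above refers to background-subtracted signal intensities and expression ratios without giving a formula. As a rough illustration only, the following minimal Python sketch shows one common way such a ratio is computed for a two-channel spot. The function name `log2_ratio`, the `floor` parameter, and the sample intensities are assumptions made for this example and are not taken from the dissertation.

```python
import math


def log2_ratio(fg_test, bg_test, fg_ref, bg_ref, floor=1.0):
    """Return log2(test / reference) after per-channel background subtraction.

    `floor` (an assumed safeguard) keeps a channel positive when the local
    background is brighter than the spot itself.
    """
    test = max(fg_test - bg_test, floor)
    ref = max(fg_ref - bg_ref, floor)
    return math.log2(test / ref)


# Hypothetical spot intensities in arbitrary scanner units.
ratio = log2_ratio(fg_test=1200.0, bg_test=200.0, fg_ref=700.0, bg_ref=200.0)
print(ratio)
```

A positive value indicates higher signal in the test channel than in the reference channel, and a negative value indicates the reverse.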

    Estimating Gene Signals From Noisy Microarray Images

    In oligonucleotide microarray experiments, noise is a challenging problem, as biologists now are studying their organisms not in isolation but in the context of a natural environment. In low photomultiplier tube (PMT) voltage images, weak gene signals and their interactions with the background fluorescence noise are most problematic. In addition, nonspecific sequences bind to array spots intermittently, causing inaccurate measurements. Conventional techniques cannot precisely separate the foreground and the background signals. In this paper, we propose an analytically based estimation technique. We assume a priori spot-shape information using a circular outer periphery with an elliptical center hole. We assume Gaussian statistics for modeling both the foreground and background signals. The mean of the foreground signal quantifies the weak gene signal corresponding to the spot, and the variance gives the measure of the undesired binding that causes fluctuation in the measurement. We propose a foreground-signal and shape-estimation algorithm using the Gibbs sampling method. We compare our developed algorithm with the existing Mann–Whitney (MW)- and expectation maximization (EM)/iterated conditional modes (ICM)-based methods. Our method outperforms the existing methods with considerably smaller mean-square error (MSE) for all signal-to-noise ratios (SNRs) in computer-generated images and gives better qualitative results in low-SNR real-data images. Our method is computationally relatively slow because of its inherent sampling operation and hence only applicable to very noisy-spot images. In a realistic example using our method, we show that the gene-signal fluctuations on the estimated foreground are better observed for the input noisy images with relatively higher undesired bindings.
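
To make the Gibbs sampling idea in the abstract above concrete, here is a much-reduced Python sketch. It is not the paper's algorithm: it ignores the spot-shape model and the background entirely and only estimates the mean and variance of Gaussian foreground pixels. The function name `gibbs_gaussian`, the flat prior on the mean, and the 1/variance prior on the variance are all assumptions chosen for illustration.

```python
import random


def gibbs_gaussian(pixels, n_iter=2000, burn_in=500, seed=0):
    """Estimate the mean and variance of Gaussian pixel intensities by Gibbs sampling.

    Assumed priors: flat on the mean, proportional to 1/variance on the variance.
    Returns the posterior means of (mean, variance) over the retained draws.
    """
    rng = random.Random(seed)
    n = len(pixels)
    xbar = sum(pixels) / n
    sigma2 = 1.0  # arbitrary starting value for the variance
    mu_draws, var_draws = [], []
    for it in range(n_iter):
        # Draw the mean given the current variance: Normal(xbar, sigma2 / n).
        mu = rng.gauss(xbar, (sigma2 / n) ** 0.5)
        # Draw the variance given the mean: inverse-gamma(n / 2, ss / 2),
        # sampled as the reciprocal of a gamma draw with scale 2 / ss.
        ss = sum((x - mu) ** 2 for x in pixels)
        sigma2 = 1.0 / rng.gammavariate(n / 2.0, 2.0 / ss)
        if it >= burn_in:
            mu_draws.append(mu)
            var_draws.append(sigma2)
    return sum(mu_draws) / len(mu_draws), sum(var_draws) / len(var_draws)


# Hypothetical foreground pixels: true mean 50, true standard deviation 4.
data_rng = random.Random(1)
pixels = [data_rng.gauss(50.0, 4.0) for _ in range(200)]
mean_est, var_est = gibbs_gaussian(pixels)
print(mean_est, var_est)
```

In the abstract's terms, the estimated mean plays the role of the gene signal and the estimated variance the role of the fluctuation attributed to undesired binding.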

    Topics in genomic image processing

    Image processing methodologies that have been actively studied and developed now play a very significant role in flourishing biotechnology research. This work studies, develops and implements several image processing techniques for M-FISH and cDNA microarray images. In particular, we focus on three important areas: M-FISH image compression, microarray image processing and expression-based classification. Two schemes, embedded M-FISH image coding (EMIC) and Microarray BASICA: Background Adjustment, Segmentation, Image Compression and Analysis, have been introduced for M-FISH image compression and microarray image processing, respectively. In the expression-based classification area, we investigate the relationship between optimal number of features and sample size, either analytically or through simulation, for various classifiers.

    Computational Methods for the Analysis of Array Comparative Genomic Hybridization

    Array comparative genomic hybridization (array CGH) is a technique for assaying the copy number status of cancer genomes. The widespread use of this technology has led to a rapid accumulation of high throughput data, which in turn has prompted the development of computational strategies for the analysis of array CGH data. Here we explain the principles behind array image processing, data visualization and genomic profile analysis, review currently available software packages, and raise considerations for future software development.

    Bioinformatics: a promising field for case-based reasoning

    Case-Based Reasoning (CBR) has been applied in different fields such as medicine, industry, tutoring systems and others, but many areas of CBR remain to be explored. Some current research in Bioinformatics attempts to use CBR as a tool for classifying DNA genes. In particular, microarrays have been applied increasingly to improve medical decision-making and the diagnosis of diseases such as cancer. This research work analyzes the structure of microarrays and the initial concepts needed to understand how DNA structure is studied in the field of Bioinformatics. In recent years CBR has been related to Bioinformatics and microarrays. In this report, our interest is to find out how the microarray technique could help in the CBR field, and especially in Case-Based Maintenance policies.

    Algorithmic Techniques in Gene Expression Processing. From Imputation to Visualization

    The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable of producing massive amounts of biomedical data in a single experiment. As the amount of data is rapidly growing, there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially that obtained from gene expression microarray experiments. First, we will study the ways to improve the quality of microarray data by replacing (imputing) the missing data entries with the estimated values for these entries. Missing value imputation is a method which is commonly used to make the original incomplete data complete, thus making it easier to analyze with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing value imputation. Secondly, we studied the effect of missing value imputation on downstream data analysis methods such as clustering. We compared multiple recent imputation algorithms against 8 publicly available microarray data sets. It was observed that missing value imputation is indeed a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but there were also needs for more advanced imputation methods, such as Bayesian Principal Component Algorithm (BPCA). Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments, such as those using gene microarray techniques. Such networks are typically very large and highly connected, so there is a need for fast algorithms for producing visually pleasing layouts. A computationally efficient way to produce layouts of large biological interaction networks was developed. The algorithm uses multilevel optimization within the regular force directed graph layout algorithm.
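
The k-NN imputation mentioned in the abstract above can be illustrated with a short Python sketch. This is a generic version of the idea, not the thesis's implementation: the function name `knn_impute`, the use of `None` for missing entries, the root-mean-square distance over shared columns, and the toy expression matrix are all assumptions made for the example.

```python
import math


def knn_impute(rows, k=2):
    """Fill missing entries (None) with the mean of the k nearest rows' values.

    Distance between two rows is the root-mean-square difference over the
    columns that are observed in both rows.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        if all(v is not None for v in row):
            continue
        # Rank every other row by its distance to row i.
        neighbours = []
        for j, other in enumerate(rows):
            if j == i:
                continue
            shared = [(a, b) for a, b in zip(row, other)
                      if a is not None and b is not None]
            if not shared:
                continue
            dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
            neighbours.append((dist, j))
        neighbours.sort()
        # Average the k closest rows that actually have a value in the column.
        for col, value in enumerate(row):
            if value is not None:
                continue
            donors = [rows[j][col] for _, j in neighbours
                      if rows[j][col] is not None][:k]
            if donors:
                filled[i][col] = sum(donors) / len(donors)
    return filled


# Hypothetical expression matrix: rows are genes, columns are arrays.
expr = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [5.0, 6.0, 9.0],
]
print(knn_impute(expr, k=2)[0])
```

Here the missing value in the first gene is filled from the two genes with the most similar profiles, while the dissimilar fourth gene is ignored.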

    Data Clustering and Partial Supervision with Some Parallel Developments

    By Sameh A. Salem. Clustering is an important and irreplaceable step towards the search for structures in the data. Many different clustering algorithms have been proposed. Yet, the sources of variability in most clustering algorithms affect the reliability of their results. Moreover, the majority tend to be based on the knowledge of the number of clusters as one of the input parameters. Unfortunately, there are many scenarios where this knowledge may not be available. In addition, clustering algorithms are very computationally intensive, which makes scaling up to large datasets a major challenge. This thesis gives possible solutions for such problems. First, new measures - called clustering performance measures (CPMs) - for assessing the reliability of a clustering algorithm are introduced. These CPMs can be used to evaluate: 1) clustering algorithms that have a structure bias to a certain type of data distribution as well as those that have no such biases, 2) clustering algorithms that have initialisation dependency as well as the clustering algorithms that have a unique solution for a given set of parameter values with no initialisation dependency. Then, a novel clustering algorithm, which is a RAdius based Clustering ALgorithm (RACAL), is proposed. RACAL uses a distance based principle to map the distributions of the data assuming that clusters are determined by a distance parameter, without having to specify the number of clusters. Furthermore, RACAL is enhanced by a validity index to choose the best clustering result, i.e. one with compact clusters and wide cluster separations, for a given input parameter. Comparisons with other clustering algorithms indicate the applicability and reliability of the proposed clustering algorithm. Additionally, an adaptive partial supervision strategy is proposed for use in conjunction with RACAL to make it act as a classifier. Results from RACAL with partial supervision, RACAL-PS, indicate its robustness in classification. Additionally, a parallel version of RACAL (P-RACAL) is proposed. The parallel evaluations of P-RACAL indicate that P-RACAL is scalable in terms of speedup and scaleup, which gives the ability to handle large datasets of high dimensions in a reasonable time. Next, a novel clustering algorithm, which achieves clustering without any control of cluster sizes, is introduced. This algorithm, which is called Nearest Neighbour Clustering Algorithm (NNCA), uses the same concept as the K-Nearest Neighbour (KNN) classifier with the advantage that the algorithm needs no training set and is completely unsupervised. Additionally, NNCA is augmented with a partial supervision strategy, NNCA-PS, to act as a classifier. Comparisons with other methods indicate the robustness of the proposed method in classification. Additionally, experiments in a parallel environment indicate the suitability and scalability of the parallel NNCA, P-NNCA, in handling large datasets. Further investigations on more challenging data are carried out. In this context, microarray data is considered. In such data, the number of clusters is not clearly defined. This points directly towards clustering algorithms that do not require knowledge of the number of clusters. Therefore, the efficacy of one of these algorithms is examined. Finally, a novel integrated clustering performance measure (ICPM) is proposed to be used as a guideline for choosing the proper clustering algorithm that has the ability to extract useful biological information in a particular dataset. Supplied by The British Library - 'The world's knowledge'.

    Postgenomics: Proteomics and Bioinformatics in Cancer Research

    Now that the human genome is completed, the characterization of the proteins encoded by the sequence remains a challenging task. The study of the complete protein complement of the genome, the “proteome,” referred to as proteomics, will be essential if new therapeutic drugs and new disease biomarkers for early diagnosis are to be developed. Research efforts are already underway to develop the technology necessary to compare the specific protein profiles of diseased versus nondiseased states. These technologies provide a wealth of information and rapidly generate large quantities of data. Processing the large amounts of data will lead to useful predictive mathematical descriptions of biological systems which will permit rapid identification of novel therapeutic targets and identification of metabolic disorders. Here, we present an overview of the current status and future research approaches in defining the cancer cell's proteome in combination with different bioinformatics and computational biology tools toward a better understanding of health and disease.