49 research outputs found
Probing the endosperm gene expression landscape in Brassica napus
<p>Abstract</p> <p>Background</p> <p>In species with exalbuminous seeds, the endosperm is eventually consumed and its space occupied by the embryo during seed development. However, the main constituent of the early developing seed is the liquid endosperm, and a significant portion of the carbon resources for the ensuing stages of seed development arrives at the embryo through the endosperm. In contrast to the extensive study of species with persistent endosperm, little is known about the global gene expression pattern in the endosperm of exalbuminous seed species such as crucifer oilseeds.</p> <p>Results</p> <p>We took a multiparallel approach that combines ESTs, protein profiling and microarray analyses to examine the gene expression landscape in the endosperm of the oilseed crop <it>Brassica napus</it>. An EST collection of over 30,000 entries allowed us to detect close to 10,000 unisequences expressed in the endosperm. A protein profile analysis of more than 800 proteins corroborated several signature pathways uncovered by abundant ESTs. Using microarray analyses, we identified genes that are differentially or highly expressed across all developmental stages. These complementary analyses provided insight into several prominent metabolic pathways in the endosperm. We also discovered that a transcription factor <it>LEAFY COTYLEDON</it> (<it>LEC1</it>) was highly expressed in the endosperm and that the regulatory cascade downstream of <it>LEC1</it> operates in the endosperm.</p> <p>Conclusion</p> <p>The endosperm EST collection and the microarray dataset provide a basic genomic resource for dissecting metabolic and developmental events important for oilseed improvement. Our findings on the featured metabolic processes and the <it>LEC1</it> regulatory cascade offer new angles for investigating the integration of endosperm gene expression with embryo development and storage product deposition in seed development.</p>
Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm
<p>Abstract</p> <p>Background</p> <p>It is now possible to collect expression levels of a set of genes from a set of biological samples over a series of time points. Such data have three dimensions, gene-sample-time (GST), and are therefore called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space.</p> <p>Results</p> <p>We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster) for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (<it>Plasmodium chabaudi</it>), systemic acquired resistance in <it>Arabidopsis thaliana</it>, similarities and differences between inner and outer cotyledon in <it>Brassica napus</it> during seed development, and <it>Brassica napus</it> whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples.</p> <p>Conclusions</p> <p>Our analysis showed that OPTricluster generally outperforms other well-known clustering algorithms such as TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in 3D short time-series gene expression data.</p>
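The order-preserving idea in the abstract above can be illustrated with a toy sketch. This is not the published OPTricluster algorithm: the function names (`order_pattern`, `toy_triclusters`), the data layout, and the tie-breaking rule are all assumptions made for illustration. The sketch groups genes whose time points rank in the same order within every sample of a sample subset, and enumerates the sample subsets by brute force.

```python
from collections import defaultdict
from itertools import combinations


def order_pattern(series):
    """Return the time-point indices sorted by expression value.

    Two series share a pattern when their time points rank in the same
    order. Ties are broken by time index (an assumption of this sketch).
    """
    return tuple(sorted(range(len(series)), key=lambda t: series[t]))


def toy_triclusters(data, min_samples=2):
    """Group genes by a shared order pattern across subsets of samples.

    data: dict mapping gene -> dict mapping sample -> list of expression
    values over time. Returns a dict keyed by (sample_subset, pattern)
    whose values are lists of genes following that pattern in every
    sample of the subset.
    """
    samples = sorted(next(iter(data.values())))
    clusters = {}
    # Brute-force enumeration of sample subsets (hypothetical stand-in
    # for the combinatorial step on the sample dimension).
    for k in range(min_samples, len(samples) + 1):
        for subset in combinations(samples, k):
            groups = defaultdict(list)
            for gene, per_sample in data.items():
                patterns = {order_pattern(per_sample[s]) for s in subset}
                if len(patterns) == 1:  # same ordering in all chosen samples
                    groups[patterns.pop()].append(gene)
            for pattern, genes in groups.items():
                clusters[(subset, pattern)] = genes
    return clusters
```

Because only the rank order of the time points is compared, the grouping ignores the magnitude of expression changes, which is one plausible reading of why an order-preserving approach suits short, noisy time series.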
An Efficient Memory Management System for MicroNATAL Run-Time System
NRC publication: Ye
Guide to Threshold Selection for Motif Prediction Using Positional Weight Matrix
Abstract—In biological sequence research, the positional weight matrix (PWM) is often used to search for putative transcription factor binding sites. A log-odds score is usually applied to measure the closeness of a subsequence to the PWM. However, the log-odds score is motif-length-dependent and thus there is no universally applicable threshold. In this paper, we propose an alternative scoring index (G) varying from zero, where the subsequence is not much different from the background, to one, where the subsequence fits best to the PWM. We also propose a measure evaluating the statistical expectation at each G index. We investigated PWMs from the TRANSFAC database and found that the statistical expectation is significantly (p < 0.0001) correlated with both the length of the PWMs and the threshold G value. We applied this method to two PWMs (GCN4_C and ROX1_Q6) of yeast transcription factor binding sites and two PWMs (HIC1-02, HIC1_03) of the human tumor suppressor (HIC-1) binding sites from the TRANSFAC database. Finally, our method compares favorably with the broadly used Match method. The results indicate that our method is more flexible and can provide better confidence
Threshold for Positional Weight Matrix
Abstract—In biological sequence research, the positional weight matrix (PWM) is often used to search for putative transcription factor binding sites. A set of experimentally verified oligonucleotides known to be functional motifs is collected and aligned. The frequency of each nucleotide A, C, G, or T at each column of the alignment is recorded in the matrix. Once a PWM is constructed, it can be used to search a nucleotide sequence for subsequences that may perform the same function. The match between a subsequence and a PWM is usually described by a score function, which measures the closeness of the subsequence to the PWM as compared with the given background. Nevertheless, the score function is usually motif-length-dependent and thus there is no universally applicable threshold. In this paper, we propose an alternative scoring index (G) varying from zero, where the subsequence is not much different from the background, to one, where the subsequence fits best to the PWM. We also propose a measure evaluating the statistical expectation at each G index. We investigated PWMs from the TRANSFAC database and found that the statistical expectation is significantly (p < 0.0001) correlated with both the length of the PWMs and the threshold G value. We applied this method to two PWMs (GCN4_C and ROX1_Q6) of yeast transcription factor binding sites and two PWMs (HIC1-02, HIC1_03) of the human tumor suppressor (HIC-1) binding sites from the TRANSFAC database. Finally, our method compares favorably with the broadly used Match method. The results indicate that our method is more flexible and can provide better confidence
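The construction and scoring steps described above can be sketched as follows. This is a minimal illustration of a conventional log-odds score, not the G index proposed in the paper. The function names, the pseudocount, and the uniform background are assumptions, and any aligned sites passed in are expected to be equal-length strings over A, C, G, T.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Column-wise nucleotide frequencies from aligned, equal-length sites.

    The pseudocount (an assumption of this sketch) keeps every frequency
    above zero so that the log-odds score below is always defined.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm


def log_odds(pwm, subseq, background=None):
    """Sum over positions of log2(PWM frequency / background frequency)."""
    background = background or {b: 0.25 for b in BASES}
    return sum(math.log2(col[base] / background[base])
               for col, base in zip(pwm, subseq))


def scan(pwm, sequence, threshold, background=None):
    """Return (position, subsequence, score) for windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = log_odds(pwm, window, background)
        if score >= threshold:
            hits.append((i, window, score))
    return hits
```

Since the score is a sum with one term per column, the highest attainable value grows with the number of columns. A threshold tuned for one PWM therefore does not transfer to a PWM of a different length, which is the length dependence the abstract points to.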
Positional Weight Matrix as a Sequence Motif Detector
In biological sequence research, the positional weight matrix (PWM) is often used for motif signal detection. A set of experimentally verified oligonucleotides known to be functional subsequences (for example, transcription factor (TF) binding sites, translational initiation sites, or pre-mRNA splicing sites) is collected and aligned. The frequency of each nucleotide A, C, G, or T at each column of the alignment is recorded in the matrix. Once a PWM is constructed, it can be used to search a nucleotide sequence for subsequences that may perform the same function. The match between a subsequence and a PWM is usually described by a score function, which measures the closeness of the subsequence to the PWM as compared with the given background. However, selecting threshold scores that legitimately qualify a functional subsequence has been a great challenge. Many laboratories have attempted to tackle this problem, but there has been no significant breakthrough so far. In this chapter, we discuss the characteristics of a PWM and the factors that affect motif predictions, and propose a new score function that is tied to the information content and statistical expectation of a PWM. We also apply this score function to PWMs from public databases and find that it compares favorably with the broadly used Match method.
NRC publication: Ye
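For readers unfamiliar with the information content of a PWM, the commonly used definition can be sketched as below. This is the standard per-column measure relative to a uniform background, stated here as an assumption. It is not the score function proposed in the chapter, and the function names are hypothetical.

```python
import math


def column_information(column):
    """Information content in bits of one PWM column.

    Uses the common definition 2 + sum(p * log2(p)) for four nucleotides
    under a uniform background; zero-frequency entries contribute nothing.
    """
    return 2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)


def pwm_information(pwm):
    """Total information content of a PWM as the sum over its columns."""
    return sum(column_information(col) for col in pwm)
```

Under this definition a column with equal frequencies carries 0 bits and a fully conserved column carries 2 bits, so the total increases with both conservation and motif length.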
Fault Identification and Prevention for PVC Management in ATM Networks
To meet the network management needs of emerging large, complex, heterogeneous communication networks, a distributed proactive self-adjusting management (DPSAM) framework was developed. The framework facilitates the incorporation of artificial intelligence and distributed computing technologies in building advanced network management systems. PMS, a PVC (Permanent Virtual Circuit) management system for ATM networks, was developed based on the DPSAM framework. PMS provides a scalable, end-to-end path management solution required for today's ATM network and service management. It aims to assist network operators in performing PVC operations with simplified procedures and automatic optimum route selection. It also provides effective decision-making support for PVC fault identification and prevention. This paper presents PVC fault identification and prevention, along with an overview of the DPSAM framework and PMS.
NRC publication: Ye
A Specification for the NATAL Symbolic Instruction Language
NRC publication: Ye