174 research outputs found

    Identitag, a relational database for SAGE tag identification and interspecies comparison of SAGE libraries

    BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a method of large-scale gene expression analysis that has the potential to generate the full list of mRNAs present within a cell population at a given time, together with their frequencies. An essential step in SAGE library analysis is the unambiguous assignment of each 14 bp tag to the transcript from which it was derived. This process, called tag-to-gene mapping, is a step in the analysis of SAGE libraries that still needs improvement. Indeed, the existing web sites that provide correspondence between tags and transcripts do not cover all species for which numerous ESTs and cDNAs have already been sequenced. RESULTS: For this reason, we designed and implemented a freely available tool for tag identification, called Identitag, that can be used in any species for which transcript sequences are available. Identitag is based on a relational database structure, which allows rapid and easy storage and updating of data and, most importantly, makes it possible to define identification parameters precisely. This structure can be viewed as three interconnected modules: the first stores virtual tags extracted from a given list of transcript sequences, the second stores experimental tags observed in SAGE experiments, and the third allows the annotation of the transcript sequences used for virtual tag extraction. It therefore connects an observed tag to a virtual tag and to the sequence it comes from, and then to its functional annotation when available. Databases built for different species can be connected according to orthology relationships, thus allowing the comparison of SAGE libraries between species. We successfully used Identitag to identify tags from our chicken SAGE libraries and for an interspecies comparison of chicken and human SAGE tags. Identitag sources are freely available online. CONCLUSIONS: Identitag is a flexible and powerful tool for tag identification in any single species and for interspecies comparison of SAGE libraries. It opens the way to comparative transcriptomic analysis, an emerging branch of biology.
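
The virtual-tag and observed-tag modules described in this abstract can be illustrated with a minimal Python sketch. It assumes the usual SAGE convention, which the abstract does not spell out, that a tag is the 14 bp starting at the 3'-most CATG (NlaIII) site of a transcript. The function names `virtual_tag`, `build_tag_index` and `identify` are hypothetical and are not taken from Identitag, which is built on a relational database rather than in-memory dictionaries.

```python
from collections import defaultdict

ANCHOR = "CATG"   # NlaIII recognition site (assumed anchoring enzyme)
TAG_LENGTH = 14   # tag length stated in the abstract, anchor included


def virtual_tag(transcript):
    """Return the 14 bp tag at the 3'-most anchor site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    # No anchor site, or not enough sequence after it for a full tag.
    if pos == -1 or pos + TAG_LENGTH > len(seq):
        return None
    return seq[pos:pos + TAG_LENGTH]


def build_tag_index(transcripts):
    """Map each virtual tag to the names of the transcripts that yield it."""
    index = defaultdict(list)
    for name, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            index[tag].append(name)
    return dict(index)


def identify(observed_tags, index):
    """Attach candidate transcripts to each observed tag (empty if unknown)."""
    return {tag: index.get(tag, []) for tag in observed_tags}
```

A tag that maps to several transcripts stays ambiguous in this sketch; deciding how to resolve such cases is the kind of identification parameter the abstract says the database structure lets the user define.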

    Learning Constrained Edit State Machines

    Learning the parameters of the edit distance has been increasingly studied in recent years to improve the assessment of similarities between structured data such as strings, trees, or graphs. Often based on optimizing the likelihood of pairs of data, the learned models usually take the form of probabilistic state machines, such as pair-Hidden Markov Models (pair-HMMs), stochastic transducers, or probabilistic deterministic automata. Although the use of such models has led to significant improvements in edit distance-based classification tasks, a new challenge has appeared: how can background knowledge be integrated during the learning process? This paper addresses that question for (input, output) pairs of strings. We present a generalization of the pair-HMM in the form of a constrained state machine, in which a transition between two states is driven by constraints fulfilled on the input string. Experimental results are provided on a task in molecular biology that aims to detect transcription factor binding sites.
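
As background for this abstract, the sketch below computes the classical edit distance with explicit operation costs. These costs are the kind of parameter that learning approaches estimate from data instead of fixing by hand. This is not the constrained state machine of the paper, only the baseline quantity; the function name `weighted_edit_distance` and the default costs are illustrative assumptions.

```python
def weighted_edit_distance(x, y, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Edit distance between strings x and y with per-operation costs."""
    n, m = len(x), len(y)
    # d[i][j] is the cheapest way to turn x[:i] into y[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * del_cost
    for j in range(1, m + 1):
        d[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = x[i - 1] == y[j - 1]
            d[i][j] = min(
                d[i - 1][j - 1] + (0.0 if same else sub_cost),  # match or substitute
                d[i - 1][j] + del_cost,                         # delete x[i-1]
                d[i][j - 1] + ins_cost,                         # insert y[j-1]
            )
    return d[n][m]
```

With unit costs this is the Levenshtein distance. Raising `sub_cost` above `ins_cost + del_cost` makes the algorithm prefer a deletion followed by an insertion, which shows how the chosen costs change the notion of similarity.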

    Model-Based Assessment of the Role of Uneven Partitioning of Molecular Content on Heterogeneity and Regulation of Differentiation in CD8 T-Cell Immune Responses

    Activation of naive CD8 T-cells can lead to the generation of multiple effector and memory subsets. Multiple parameters associated with activation conditions are involved in generating this diversity, which is associated with heterogeneous molecular contents of activated cells. Although naive cell polarisation upon antigenic stimulation and the resulting asymmetric division are known to be a major source of heterogeneity and cell fate regulation, the consequences of stochastic uneven partitioning of molecular content upon subsequent divisions remain unclear. Here we aim to study the impact of uneven partitioning on molecular-content heterogeneity and, in turn, on the immune response dynamics at the cellular level. To do so, we introduce a multiscale mathematical model of the CD8 T-cell immune response in the lymph node. In the model, cells are described as agents evolving and interacting in a 2D environment, while a set of differential equations, embedded in each cell, models the regulation of intracellular and extracellular proteins involved in cell differentiation. Based on the analysis of in silico data at the single-cell level, we show that immune response dynamics can be explained by the molecular-content heterogeneity generated by uneven partitioning at cell division. In particular, uneven partitioning acts as a regulator of cell differentiation and induces the emergence of two coexisting sub-populations of cells exhibiting antagonistic fates. We show that the degree of unevenness of molecular partitioning, along all cell divisions, affects the outcome of the immune response and can promote the generation of memory cells.
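
The notion of uneven partitioning at division can be made concrete with a toy Python sketch. It is not the multiscale model of the paper: it ignores the 2D environment, cell interactions and the embedded differential equations, and it assumes, purely for illustration, that one daughter receives a fraction of the mother's content drawn uniformly from `0.5 - unevenness` to `0.5 + unevenness`. The names `divide` and `simulate` are hypothetical.

```python
import random


def divide(content, unevenness, rng):
    """Split a mother cell's content between two daughters."""
    # unevenness = 0 gives an even split; larger values (up to 0.5) allow
    # increasingly lopsided splits. The uniform law is an assumption.
    fraction = 0.5 + rng.uniform(-unevenness, unevenness)
    return content * fraction, content * (1.0 - fraction)


def simulate(initial_content, divisions, unevenness, seed=0):
    """Return the content of every cell after a number of synchronous divisions."""
    rng = random.Random(seed)
    cells = [initial_content]
    for _ in range(divisions):
        cells = [daughter for cell in cells
                 for daughter in divide(cell, unevenness, rng)]
    return cells
```

Comparing `simulate(1.0, 5, 0.0)` with `simulate(1.0, 5, 0.3)` shows the spread in content across cells growing with the degree of unevenness, which is the source of heterogeneity the abstract refers to.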

    Modeling the emergence of multi-protein dynamic structures by principles of self-organization through the use of 3DSpi, a multi-agent-based software

    BACKGROUND: There is an increasing need for computer-generated models that can be used for explaining the emergence and predicting the behavior of multi-protein dynamic structures in cells. Multi-agent systems (MAS) have been proposed as good candidates to achieve this goal. RESULTS: We have created 3DSpi, a multi-agent-based software that we used to explore the generation of multi-protein dynamic structures. Because it relies on a very restricted set of parameters, it is well suited for exploring the minimal set of rules needed to generate large multi-protein structures. It can therefore be used to test the hypothesis that such structures are formed and maintained by principles of self-organization. We observed that multi-protein structures emerge and that the system behavior is very robust in terms of the number and size of the structures generated. Furthermore, the generated structures very closely mimic the spatial organization of real-life multi-protein structures. CONCLUSION: The behavior of 3DSpi confirms the considerable potential of MAS for modeling subcellular structures. It demonstrates that robust multi-protein structures can emerge using a restricted set of parameters and allows the exploration of the dynamics of such structures. A number of easy-to-implement modifications should make 3DSpi the virtual simulator of choice for scientists wishing to explore how topology interacts with time to regulate the function of interacting proteins in living cells.

    Extraction sous Contraintes d'Ensembles de Cliques Homogènes (Constrained Extraction of Sets of Homogeneous Cliques)

    Document on the LIRIS site: http://liris.cnrs.fr/Documents/Liris-4915.pdf. We propose a data mining method for graphs in which a set of labels is associated with each vertex. One application is, for example, the analysis of a social network of co-authoring researchers in which the labels indicate the conferences where they publish. We define the constrained extraction of sets of cliques such that every vertex of the cliques involved shares sufficiently many labels. We propose a method to compute all the Maximal Sets of so-called Homogeneous Cliques that satisfy a conjunction of constraints set by the analyst and concerning the number of separate cliques, the size of the cliques, and the number of shared labels. Experiments show that the approach works on large graphs built from real data and brings out interesting structures.
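
The constraints named in this abstract can be illustrated by a small Python check on a single candidate vertex set. The sketch assumes that "shared labels" means the intersection of the label sets of all vertices in the clique, which is one reading of the abstract and may differ from the paper's exact definition. It does not enumerate maximal sets of cliques, and the function names are hypothetical.

```python
from itertools import combinations


def is_clique(vertices, edges):
    """True if every pair of vertices is connected; edges holds frozenset pairs."""
    return all(frozenset(pair) in edges for pair in combinations(vertices, 2))


def shared_labels(vertices, labels):
    """Labels common to all given vertices (assumed meaning of 'shared')."""
    sets = [labels[v] for v in vertices]
    return set.intersection(*sets) if sets else set()


def is_homogeneous_clique(vertices, edges, labels, min_size, min_shared):
    """Check the size, clique and shared-label constraints on one vertex set."""
    return (len(vertices) >= min_size
            and is_clique(vertices, edges)
            and len(shared_labels(vertices, labels)) >= min_shared)
```

In the co-authorship example, vertices would be researchers, edges co-authorship links, and labels the conferences in which each researcher publishes.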

    A combination of transposable elements and magnetic cell sorting provides a very efficient transgenesis system for chicken primary erythroid progenitors

    BACKGROUND: Stable transgenesis is an undeniable key to understanding any genetic system. Retrovirus-based insertional strategies, which feature several technical challenges when they are used, are often limited to one particular species, and even sometimes to a particular cell type as the infection depends on certain cellular receptors. A universal-like system, which would allow both stable transgene expression independent of the cell type and an efficient sorting of transfected cells, is required when handling cellular models that are incompatible with retroviral strategies. RESULTS: We report here on the combination of a stable insertional transgenesis technique, based on the Tol2 transposon system together with the magnetic cell sorting (MACS) technique, which allows specific selection of cells carrying the transgene in an efficient, reliable and rapid way. CONCLUSION: This new Tol2/MACS system leads to stable expression in a culture of primary chicken erythroid cells highly enriched in cells expressing the transgene of interest. This system could be used in a wide variety of vertebrate species.

    Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts

    BACKGROUND: Discovering gene interactions and their characterizations from biological text collections is a crucial issue in bioinformatics. Indeed, text collections are large, and it is very difficult for biologists to take full advantage of this amount of knowledge. Natural Language Processing (NLP) methods have been applied to extract background knowledge from biomedical texts. Some existing NLP approaches are based on handcrafted rules and are thus time-consuming and often tied to a specific corpus. Machine-learning-based NLP methods give good results but generate outcomes that are not readily understandable by a user. RESULTS: We take advantage of a hybridization of data mining and natural language processing to propose an original symbolic method that automatically produces patterns conveying gene interactions and their characterizations. Our method therefore detects not only gene interactions but also semantic information about the extracted interactions (e.g., modalities, biological contexts, interaction types). Only a limited resource is required: the text collection used as a training corpus. Our approach gives results comparable to those of state-of-the-art methods and performs even better for gene interaction detection on AIMed. CONCLUSIONS: Experiments show how our approach enables the discovery of interactions and their characterizations. To the best of our knowledge, there are few methods that automatically extract both the interactions and the associated semantic information. The gene interactions extracted from PubMed are available through a simple web interface at https://bingotexte.greyc.fr/. The software is available at https://bingo2.greyc.fr/?q=node/22.

    A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome

    BACKGROUND: "Open" transcriptome analysis methods allow the study of gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the most widely used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 bp tag sequences from each transcript. In contrast to LongSAGE, the high-throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several million tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. RESULTS: In order to perform a deep analysis of the mouse hypothalamus transcriptome while avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 million tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. CONCLUSION: We found that Solexa sequencing technology combined with LongSAGE is well suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of the transcriptome that is as reliable as a LongSAGE library sequenced by the Sanger method.