30 research outputs found
A Bayesian Model for Gene Family Evolution
Background
A birth and death process is frequently used for modeling the size of a gene family that may vary along the branches of a phylogenetic tree. Under the birth and death model, maximum likelihood methods have been developed to estimate the birth and death rate and the sizes of ancient gene families (numbers of gene copies at the internodes of the phylogenetic tree). This paper aims to provide a Bayesian approach for estimating parameters in the birth and death model.
Results
We develop a Bayesian approach for estimating the birth and death rate and other parameters in the birth and death model. In addition, a Bayesian hypothesis test is developed to identify the gene families that are unlikely under the birth and death process. Simulation results suggest that the Bayesian estimate is more accurate than the maximum likelihood estimate of the birth and death rate. The Bayesian approach was applied to a real dataset of 3517 gene families across genomes of five yeast species. The results indicate that the Bayesian model assuming a constant birth and death rate among branches of the phylogenetic tree cannot adequately explain the observed pattern of the sizes of gene families across species. The yeast dataset was thus analyzed with a Bayesian heterogeneous rate model that allows the birth and death rate to vary among the branches of the tree. The unlikely gene families identified by the Bayesian heterogeneous rate model are different from those given by the maximum likelihood method.
Conclusions
Compared to the maximum likelihood method, the Bayesian approach can produce more accurate estimates of the parameters in the birth and death model. In addition, the Bayesian hypothesis test is able to identify unlikely gene families based on Bayesian posterior p-values. As a powerful statistical technique, the Bayesian approach can effectively extract information from gene family data and thereby provide useful information regarding the evolutionary process of gene families across genomes.
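The birth and death process described in the abstract above can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: it assumes a linear process in which every gene copy is duplicated or lost at the same per-copy rate, and the function name `simulate_family_size`, the starting size, the rate, and the branch length are hypothetical values chosen only for illustration.

```python
import random


def simulate_family_size(n0, rate, t, rng):
    """Simulate the gene count at the end of a branch of length t.

    Assumption: a linear birth and death process where each copy is
    gained or lost at the same per-copy `rate`.
    """
    if rate <= 0 or t <= 0:
        return n0  # no events can occur
    n, elapsed = n0, 0.0
    while n > 0:  # a family of size 0 is extinct and stays at 0
        total = 2.0 * rate * n  # birth rate + death rate over all copies
        elapsed += rng.expovariate(total)  # waiting time to the next event
        if elapsed > t:
            break
        # births and deaths are equally likely because the rates are equal
        n += 1 if rng.random() < 0.5 else -1
    return n


rng = random.Random(0)
# Hypothetical setting: 5 ancestral copies, rate 0.1, branch length 1.0
sizes = [simulate_family_size(5, 0.1, 1.0, rng) for _ in range(2000)]
mean_size = sum(sizes) / len(sizes)
```

With equal gain and loss rates the expected family size stays at its starting value, so `mean_size` should remain close to 5 even though individual families grow, shrink, or go extinct.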
Identification of expressed resistance gene-like sequences by data mining in 454-derived transcriptomic sequences of common bean (Phaseolus vulgaris L.)
Background
Common bean (Phaseolus vulgaris L.) is one of the most important legumes in the world. Several diseases severely reduce bean production and quality; therefore, it is very important to better understand disease resistance in common bean in order to prevent these losses. More than 70 resistance (R) genes which confer resistance against various pathogens have been cloned from diverse plant species. Most R genes share highly conserved domains which facilitates the identification of new candidate R genes from the same species or other species. The goals of this study were to isolate expressed R gene-like sequences (RGLs) from 454-derived transcriptomic sequences and expressed sequence tags (ESTs) of common bean, and to develop RGL-tagged molecular markers.
Results
A data-mining approach was used to identify tentative P. vulgaris R gene-like sequences from approximately 1.69 million 454-derived sequences and 116,716 ESTs deposited in GenBank. A total of 365 non-redundant sequences were identified and named as common bean (P. vulgaris = Pv) resistance gene-like sequences (PvRGLs). Among the identified PvRGLs, about 60% (218 PvRGLs) were from 454-derived sequences. Reverse transcriptase-polymerase chain reaction (RT-PCR) analysis confirmed that PvRGLs were actually expressed in the leaves of common bean. Upon comparison to P. vulgaris genomic sequences, 105 (28.77%) of the 365 tentative PvRGLs could be integrated into the existing common bean physical map. Based on the syntenic blocks between common bean and soybean, 237 (64.93%) PvRGLs were anchored on the P. vulgaris genetic map and will need to be mapped to determine order. In addition, 11 sequence-tagged-site (STS) and 19 cleaved amplified polymorphic sequence (CAPS) molecular markers were developed for 25 unique PvRGLs.
Conclusions
In total, 365 PvRGLs were successfully identified from 454-derived transcriptomic sequences and ESTs available in GenBank and about 65% of PvRGLs were integrated into the common bean genetic map. A total of 30 RGL-tagged markers were developed for 25 unique PvRGLs, including 11 STS and 19 CAPS markers. The expressed PvRGLs identified in this study provide a large sequence resource for development of RGL-tagged markers that could be used further for genetic mapping of disease resistant candidate genes and quantitative trait locus/loci (QTLs). This work also represents an additional method for identifying expressed RGLs from next generation sequencing data.
Genome-Wide Identification and Expression Analyses of the Cotton AGO Genes and Their Potential Roles in Fiber Development and Stress Response
Argonaute proteins (AGOs) are indispensable components of RNA silencing. However, a systematic characterization of the AGO genes has not been completed in cotton until now. In this study, cotton AGO genes were identified and analyzed with respect to their evolution and expression profile during biotic and abiotic stresses. We identified 14 GaAGO, 14 GrAGO, and 28 GhAGO genes in the genomes of Gossypium arboreum, Gossypium raimondii, and Gossypium hirsutum, respectively. Cotton AGO proteins were classified into four subgroups. Structural and functional conservation was observed within the same subgroups based on the analysis of the gene structure and conserved domains. Twenty-four duplicated gene pairs were identified in GhAGO genes, and all of them exhibited strong purifying selection during evolution. Moreover, RNA-seq analysis showed that most of the GhAGO genes exhibit high expression levels in the fiber initiation and elongation processes. Furthermore, the expression profiles of GhAGO genes tested by quantitative real-time polymerase chain reaction (qPCR) demonstrated that they were sensitive to Verticillium wilt infection and salt and drought stresses. Overall, our results will pave the way for further functional investigation of the cotton AGO gene family, which may be involved in fiber development and stress response.
Research on network security technology of industrial control system
The relationship between industrial control systems and the Internet is becoming closer and closer, and the network security of these systems has attracted much attention. Penetration testing is an active network intrusion detection technology, which plays an indispensable role in protecting the security of the system. This paper mainly introduces the principle of penetration testing, summarizes the current cutting-edge penetration testing technology, and discusses prospects for its future development.
Identification and analysis of common bean (Phaseolus vulgaris L.) transcriptomes by massively parallel pyrosequencing
Background
Common bean (Phaseolus vulgaris) is the most important food legume in the world. Although this crop is very important to both the developed and developing world as a means of dietary protein supply, resources available in common bean are limited. Global transcriptome analysis is important to better understand gene expression, genetic variation, and gene structure annotation in addition to other important features. However, the number and description of common bean sequences are very limited, which greatly inhibits genome and transcriptome research. Here we used 454 pyrosequencing to obtain a substantial transcriptome dataset for common bean.
Results
We obtained 1,692,972 reads with an average read length of 207 nucleotides (nt). These reads were assembled into 59,295 unigenes including 39,572 contigs and 19,723 singletons, in addition to 35,328 singletons less than 100 bp. Comparing the unigenes to common bean ESTs deposited in GenBank, we found that 53.40% or 31,664 of these unigenes had no matches to this dataset and can be considered as new common bean transcripts. Functional annotation of the unigenes carried out by Gene Ontology assignments from hits to Arabidopsis and soybean indicated coverage of a broad range of GO categories. The common bean unigenes were also compared to the bean bacterial artificial chromosome (BAC) end sequences, and a total of 21% of the unigenes (12,724) including 9,199 contigs and 3,256 singletons match to the 8,823 BAC-end sequences. In addition, a large number of simple sequence repeats (SSRs) and transcription factors were also identified in this study.
Conclusions
This work provides the first large scale identification of the common bean transcriptome derived by 454 pyrosequencing. This research has resulted in a 150% increase in the number of Phaseolus vulgaris ESTs. The dataset obtained through this analysis will provide a platform for functional genomics in common bean and related legumes and will aid in the development of molecular markers that can be used for tagging genes of interest. Additionally, these sequences will provide a means for better annotation of the on-going common bean whole genome sequencing.
Analysis on Risk Characteristics of Traffic Accidents in Small-Spacing Expressway Interchange
Many small-spacing interchanges (SSI) appear as the density of expressway interchanges increases. However, the characteristics of traffic accidents in SSI have not been clearly explained. Therefore, this paper takes the G3001 expressway in Xi’an as the research object to explore the accident characteristics of SSI. First, the expressway is divided into four sections, and their safety is evaluated by the number of accidents per unit distance of 100 million vehicles (NAP). Eight indexes, such as mean spacing distance (MSD), are then selected to explain the factors affecting expressway safety by developing a least squares support vector machine (LSSVM). Second, the difference between SSI and normal-spacing interchanges (NSI) is clarified by statistical analysis. Finally, LSSVM, random forest, and logistic regression models are built using 12 indicators, such as time, to explore the causes of serious accidents. The results show that the inner ring NAP in Sections I and II with SSI is 27.2 and 33.7, respectively, higher than in other sections. The density, annual average daily traffic, and MSD adversely affect expressway traffic safety. Serious traffic accidents in SSI are mainly influenced by road conditions. This study can provide a theoretical basis for traffic management and accident prevention in the SSI of expressways.
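The NAP index in the abstract above is an exposure-normalized accident rate. The sketch below shows one common way such a rate is computed; it is an assumption, not the paper's stated formula. It reads "per unit distance of 100 million vehicles" as accidents per 100 million vehicle-kilometres, and the function name `accidents_per_100m_vehicle_km` and its parameters are hypothetical.

```python
def accidents_per_100m_vehicle_km(accidents, aadt, length_km, days=365):
    """Accident rate per 100 million vehicle-kilometres.

    Assumed definition (not taken from the paper):
        rate = accidents * 1e8 / (AADT * days * section length in km)
    where AADT is the annual average daily traffic in vehicles per day.
    """
    exposure = aadt * days * length_km  # vehicle-kilometres travelled
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return accidents * 1e8 / exposure


# Hypothetical section: 10 accidents over 100 days on 10 km with AADT 10,000
rate = accidents_per_100m_vehicle_km(10, 10_000, 10, days=100)
```

Normalizing by exposure lets sections of different length and traffic volume be compared on the same scale, which is the role NAP plays in the section comparison.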
Localization of transcription start sites of the peanut AhLEC1B gene using 5′ RACE.
P1: product of the first-round PCR; P2: product of the second-round PCR.