
    A double classification tree search algorithm for index SNP selection

    BACKGROUND: In population-based studies, it is generally recognized that single nucleotide polymorphism (SNP) markers are not independent. Rather, they are carried by haplotypes, groups of SNPs that tend to be coinherited. It is thus possible to choose a much smaller number of SNPs to use as indices for identifying haplotypes or haplotype blocks in genetic association studies. We refer to these characteristic SNPs as index SNPs. In order to reduce costs and work, a minimum number of index SNPs that can distinguish all SNP and haplotype patterns should be chosen. Unfortunately, this is an NP-complete problem, requiring brute force algorithms that are not feasible for large data sets. RESULTS: We have developed a double classification tree search algorithm to generate index SNPs that can distinguish all SNP and haplotype patterns. This algorithm runs very rapidly and generates very good, though not necessarily minimum, sets of index SNPs, as is to be expected for such NP-complete problems. CONCLUSIONS: A new algorithm for index SNP selection has been developed. A webserver for index SNP selection is available a
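
The selection problem described in this abstract can be illustrated with a minimal sketch. It is not the authors' double classification tree search; it is a plain greedy heuristic, assumed here only to show what "a set of index SNPs that distinguishes all haplotype patterns" means. The function name `greedy_index_snps` and the string encoding of haplotypes (one character per SNP) are hypothetical.

```python
from itertools import combinations


def greedy_index_snps(haplotypes):
    """Greedily pick SNP columns that separate every pair of distinct haplotypes.

    haplotypes: equal-length strings, one character per SNP allele (assumed encoding).
    Returns a sorted list of column indices; the set is good, not guaranteed minimum.
    """
    unique = sorted(set(haplotypes))
    n_snps = len(unique[0]) if unique else 0
    # Pairs of distinct haplotypes that no chosen SNP separates yet.
    unresolved = set(combinations(range(len(unique)), 2))
    chosen = []
    while unresolved:
        def split_count(j):
            # Number of unresolved pairs that SNP column j would separate.
            return sum(1 for a, b in unresolved if unique[a][j] != unique[b][j])

        # Distinct haplotypes differ somewhere, so the best column always
        # separates at least one remaining pair and the loop terminates.
        best = max(range(n_snps), key=split_count)
        unresolved = {(a, b) for a, b in unresolved
                      if unique[a][best] == unique[b][best]}
        chosen.append(best)
    return sorted(chosen)
```

For four haplotypes over three SNPs such as `["000", "011", "101", "110"]`, two columns are enough to tell all four apart, and the greedy pass finds such a pair. On larger inputs a greedy choice of this kind can return more SNPs than the true minimum, which is consistent with the abstract's remark about NP-complete problems.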

    Gene functional similarity search tool (GFSST)

    BACKGROUND: With the completion of the genome sequences of human, mouse, and other species and the advent of high throughput functional genomic research technologies such as biomicroarray chips, more and more genes and their products have been discovered and their functions have begun to be understood. Increasing amounts of data about genes, gene products and their functions have been stored in databases. To facilitate selection of candidate genes for gene-disease research, genetic association studies, biomarker and drug target selection, and animal models of human diseases, it is essential to have search engines that can retrieve genes by their functions from proteome databases. In recent years, the development of Gene Ontology (GO) has established structured, controlled vocabularies describing gene functions, which makes it possible to develop novel tools to search genes by functional similarity. RESULTS: By using a statistical model to measure the functional similarity of genes based on the Gene Ontology directed acyclic graph, we developed a novel Gene Functional Similarity Search Tool (GFSST) to identify genes with related functions from annotated proteome databases. This search engine lets users design their search targets by gene functions. CONCLUSION: An implementation of GFSST which works on the UniProt (Universal Protein Resource) for the human and mouse proteomes is available at GFSST Web Server. GFSST provides functions not only for similar gene retrieval but also for gene search by one or more GO terms. This represents a powerful new approach for selecting similar genes and gene products from proteome databases according to their functions
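
As a rough illustration of searching genes by functional similarity over a directed acyclic graph of terms, the sketch below ranks genes by how much ancestry their annotation terms share with a query. The abstract does not specify GFSST's statistical model, so a Jaccard overlap of ancestor sets is swapped in here as a simple stand-in. The term names, gene names, and all function names are hypothetical toy values, not real GO identifiers or UniProt entries.

```python
# Toy ontology: each term maps to its parent terms (hypothetical names).
PARENTS = {
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
    "kinase": ["catalysis"],
}

# Toy annotations: gene -> set of terms (hypothetical).
ANNOTATIONS = {
    "geneA": {"dna_binding"},
    "geneB": {"rna_binding"},
    "geneC": {"kinase"},
}


def ancestors(term, parents):
    """Return the term itself plus every term reachable through parent links."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def term_similarity(a, b, parents):
    """Jaccard overlap of ancestor sets (a stand-in, not GFSST's model)."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def gene_similarity(terms_a, terms_b, parents):
    """Best-match average between two annotation sets."""
    if not terms_a or not terms_b:
        return 0.0
    best_a = [max(term_similarity(a, b, parents) for b in terms_b) for a in terms_a]
    best_b = [max(term_similarity(a, b, parents) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


def rank_genes(query_terms, annotations, parents):
    """Return gene names ordered from most to least similar to the query terms."""
    scored = [(gene_similarity(query_terms, terms, parents), gene)
              for gene, terms in annotations.items()]
    return [gene for _, gene in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

Calling `rank_genes({"dna_binding"}, ANNOTATIONS, PARENTS)` places `geneA` first, then `geneB` (which shares the `binding` branch), then `geneC`. Passing a set of one or more terms as the query mirrors the "search by one or more GO terms" mode mentioned in the abstract.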

    Computational Identification of cis-Regulatory Elements and Prediction of Gene Expression Level

    The dissertation focuses on developing computational methods to discover cis-elements in the promoter regions of co-regulated genes and to predict gene expression level using the identified cis-elements. Discovering cis-elements in the promoter regions of co-regulated genes is important in molecular biology research and has recently received extensive attention. In my Ph.D. research I developed an algorithm that is faster and more accurate than well-known tools currently in use to identify cis-elements. The HAMMER algorithm searches for subsequences of a desired length whose frequency of occurrence is relatively high, while accounting for slightly perturbed variants using a hash table and modulo arithmetic. Candidate cis-elements are evaluated using profile matrices and a higher-order Markov background model. Simulation results show that the HAMMER algorithm discovers more of the cis-elements present in the test sequences than two widely used motif-discovery tools (MDScan and AlignACE). The HAMMER algorithm also produces very promising results on real data sets that contain many known cis-elements. Based on the cis-elements found by the HAMMER algorithm, I further developed an algorithm to identify structured motifs, which consist of two simpler patterns (half-sites) separated by a gap, with no restriction on the number of nucleotides that may occur within the gap. First, the HAMMER algorithm is used to search for individual cis-elements, which serve as half-sites for building structured motifs. These structured motifs are then evaluated based on the relative frequency of the half-sites as well as the distribution of gap lengths. Unlike other recent structured motif detection algorithms, the new algorithm does not require the gap length to be prespecified. The algorithm has successfully extracted structured motifs from synthetic data and real test data. Gene expression level is influenced significantly by the presence or absence of cis-elements.
    I developed several classification systems in which the occurrences of both activator and repressor motifs constitute important inputs in predicting whether a gene will be up-regulated, down-regulated, or neither. I experimented with several classification approaches; the best performance was obtained using Support Vector Machine models with linear kernels and a hierarchical structure. On Saccharomyces cerevisiae data, the SVM models yielded 71% accuracy for 3-category classification (up-regulated, down-regulated, neutral) and 85% accuracy for 2-category classification (up-regulated, down-regulated).
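
The core idea of counting frequent fixed-length subsequences while tolerating slightly perturbed variants can be sketched as follows. This is not the HAMMER algorithm itself: the abstract does not give its hashing scheme, so the sketch assumes a plain hash table keyed by the k-mer string, treats "slightly perturbed" as a single substitution, and omits the modulo arithmetic, profile matrices, and Markov background model. All function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def count_kmers(sequences, k):
    """Count every length-k substring across the sequences in a hash table."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def neighbors(kmer):
    """Yield every k-mer that differs from `kmer` at exactly one position."""
    for i, base in enumerate(kmer):
        for alt in BASES:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def top_candidates(sequences, k, n=3):
    """Rank observed k-mers by their own count plus the counts of their
    one-substitution variants (an assumed scoring rule)."""
    counts = count_kmers(sequences, k)
    scores = {
        kmer: c + sum(counts.get(nb, 0) for nb in neighbors(kmer))
        for kmer, c in counts.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

With the toy promoters `["TTACGTAA", "GGACGTCC", "CCACTTGG"]` and `k=4`, `ACGT` occurs twice exactly and once as the variant `ACTT`, so it ranks first. A real motif finder would follow this candidate step with a statistical evaluation against background sequence, as the abstract describes.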

    Thermomechanical Fatigue Behavior of Spray-Deposited SiCp/Al-Si Composite Applied in the High-Speed Railway Brake Disc

    The thermomechanical fatigue (TMF) behavior of spray-deposited SiCp-reinforced Al-Si alloy was investigated in terms of the size of Si particles and the Si content. TMF experiments were conducted in the temperature range of 150-400°C. All materials exhibited continuous cyclic softening, and increases in SiC particle size and Si content aggravated the degree of softening, which was attributed to dislocation generation due to differential thermal contraction at the Al matrix/Si phase interface or the Al matrix/SiC particle interface. Meanwhile, the TMF life and stress amplitude of SiCp/Al-7Si composites were greater than those of Al-7Si alloy and increased with increasing SiC particle size, which was associated with the “load sharing” direct strengthening mechanism. The stress amplitude of the 4.5 μm SiCp/Al-Si composite increased as the Si content increased; however, the influence of Si content on the TMF life was not so significant. Analysis of the TMF failure mechanism revealed that cracks mainly initiated at agglomerations of small SiC particles and at broken large SiC particles, and that broken primary Si and exfoliated eutectic Si accelerated crack propagation.

    The First High-quality Reference Genome of Sika Deer Provides Insights into High-tannin Adaptation

    Sika deer are known to prefer oak leaves, which are rich in tannins and toxic to most mammals; however, the genetic mechanisms underlying their unique ability to adapt to living in the jungle are still unclear. In identifying the mechanism responsible for the tolerance of a highly toxic diet, we have made a major advancement by explaining the genome of sika deer. We generated the first high-quality, chromosome-level genome assembly of sika deer and measured the correlation between tannin intake and RNA expression in 15 tissues through 180 experiments. Comparative genome analyses showed that the UGT and CYP gene families are functionally involved in the adaptation of sika deer to high-tannin food, especially the expansion of the UGT family 2 subfamily B of UGT genes. The first chromosome-level assembly and genetic characterization of the tolerance to a highly toxic diet suggest that the sika deer genome may serve as an essential resource for understanding evolutionary events and tannin adaptation. Our study provides a paradigm of comparative expressive genomics that can be applied to the study of unique biological features in non-model animals