56 research outputs found

    NCNet: Deep Learning Network Models for Predicting Function of Non-coding DNA

    About 98.5% of the human genome is non-coding DNA, most of which has no known function. However, the majority of disease-associated variants lie in these regions, so predicting the function of non-coding DNA is critical. We therefore propose NCNet, which integrates deep residual learning and sequence-to-sequence learning networks to predict transcription factor (TF) binding sites, which can then be used to predict non-coding functions. In NCNet, deep residual learning networks improve the identification of regulatory motif patterns, so that the sequence-to-sequence learning network can make the most of the sequential dependencies between the patterns. With the identity shortcut technique and deep network architectures, NCNet achieves a significant improvement over the original hybrid model in identifying regulatory markers.
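    The identity shortcut named in this abstract can be illustrated with a minimal sketch. It assumes small dense layers in place of the convolutional layers a model such as NCNet would more likely use, and the function names and weights are hypothetical; it shows only the general residual form y = F(x) + x, not the authors' architecture.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def linear(weights, values):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU between.

    The '+ x' term is the identity shortcut: the input bypasses F, so the
    block only has to learn a correction to the identity mapping.
    """
    hidden = relu(linear(w1, x))
    transformed = linear(w2, hidden)
    return relu([f + xi for f, xi in zip(transformed, x)])


# With all-zero weights F(x) is zero, so the block passes its input through.
zero = [[0.0, 0.0], [0.0, 0.0]]
passthrough = residual_block([1.0, 2.0], zero, zero)
```

    Because the shortcut carries the input forward unchanged, stacking many such blocks does not force every layer to relearn the identity, which is the usual argument for why deep residual networks remain trainable.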

    Cloud Computing-Based TagSNP Selection Algorithm for Human Genome Data

    Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. On chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-parallelized combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used.

    Disease burden and related medical costs of rotavirus infections in Taiwan

    BACKGROUND: The disease burden and associated medical costs of rotavirus infections in inpatient and outpatient sectors in Taiwan were examined in anticipation of the availability of new rotavirus vaccines. METHODS: The yearly national case numbers and medical costs for all inpatients and outpatients with acute gastroenteritis (AGE) were extracted from the Bureau of National Health Insurance database in Taiwan according to ICD-9-CM codes. A retrospective study was also performed using records of children with AGE seen at three hospitals in Taiwan in 2001 to identify laboratory-confirmed rotavirus infection cases. The annual incidence and related medical costs of AGE due to rotavirus infection were then estimated. RESULTS: Children <5 years old comprised 83.6% of inpatient and 62.0% of outpatient pediatric AGE cases in Taiwan in 2001. Rotavirus was the most common agent detected among AGE patients in this age group in the three hospitals, and was detected in 32.9% (221/672) of inpatient and 24% (23/96) of outpatient stool specimens tested for microbial etiologies. An estimated 277,400 to 624,892 cases of rotavirus infection sought medical care in Taiwan in 2001, meaning that one in 2 to 5 children <5 years old required medical care due to rotavirus infection. The incidence of hospitalization due to rotavirus infection was 1,528–1,997/100,000 for children <5 years old. The total associated medical costs due to rotavirus infection were estimated at US $10–16 million in Taiwan in 2001. Although the per-capita medical cost of rotavirus infection was lower in Taiwan than in the United States or Hong Kong, the personal economic burden was similar among the three places when normalized for gross national income per capita. CONCLUSION: Infections caused by rotavirus constitute an important human and economic burden among young children in Taiwan. A safe and effective vaccine is urgently needed.
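    The detection rates quoted in the RESULTS can be rechecked from the specimen counts given in parentheses. The short sketch below does only that arithmetic; the function name is hypothetical and no figures beyond those in the abstract are used.

```python
def detection_rate(positive, tested):
    """Return the share of positive specimens as a percentage, to one decimal."""
    return round(100 * positive / tested, 1)


# Counts taken directly from the abstract.
inpatient_rate = detection_rate(221, 672)   # rotavirus-positive inpatient specimens
outpatient_rate = detection_rate(23, 96)    # rotavirus-positive outpatient specimens
```

    Both values agree with the reported 32.9% and 24% after rounding.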

    Molecular imprinting science and technology: a survey of the literature for the years 2004-2011

    Feature amplified voting algorithm for functional analysis of protein superfamily

    Background: Identifying the regions associated with protein function is a singularly important task in the post-genomic era. Biological studies often identify functional enzyme residues from amino acid sequences, particularly when related structural information is unavailable. In some protein superfamilies, functional residues are difficult to detect with current alignment tools or evolutionary strategies because phylogenetic relationships do not parallel protein functions. The solution proposed in this study is the Feature Amplified Voting Algorithm with Three-profile alignment (FAVAT). The core concept of FAVAT is to reveal the desired features of a target enzyme or protein by voting on three different property groups aligned by a three-profile alignment method. Functional residues of a target protein can then be retrieved by FAVAT analysis. In this study, the amidohydrolase superfamily was an interesting case for verifying the proposed approach because it contains divergent enzymes and proteins. Results: FAVAT was used to identify critical residues of mammalian imidase, a member of the amidohydrolase superfamily. Members of this superfamily were first classified by their functional properties and source organisms. After FAVAT analysis, candidate residues were identified and compared to a bacterial hydantoinase whose crystal structure (1GKQ) has been fully elucidated. One modified lysine, three histidines and one aspartate were found to participate in the coordination of metal ions in the active site. The FAVAT analysis also corrected the misrecognition of the metal coordinator Asp57 by the multiple sequence alignment (MSA) method. Several other amino acid residues known to be related to the function or structure of mammalian imidase were also identified. Conclusions: FAVAT is shown to predict functionally important amino acids in the amidohydrolase superfamily. This strategy effectively identifies functionally important residues by analyzing the discrepancy between the sequence and functional properties of related proteins in a superfamily, and it should be applicable to other protein families.

    Introducing variable gap penalties into three-sequence alignment for protein sequences

    The commonly used gap penalty strategies, constant penalty and affine gap penalty, have been adopted in the traditional three-sequence alignment algorithm, which considers insertion, deletion and substitution. However, these strategies are not well suited to protein sequence alignments. The gap penalty is a major determinant of the alignment accuracy of protein sequences. Incorporating protein structure information to vary the gap penalties can lead to more biologically correct alignments. Here, we present an algorithm that finds an optimal global alignment among three protein sequences by using position-specific gap penalties, which allow gap penalties to vary along each sequence. Thus, residue-dependent information and protein structure information can be applied to the three-sequence alignment. The experimental results show that, for protein sequences, our algorithm achieves a significant improvement in alignment accuracy over the three-sequence alignment algorithm with the affine gap penalty.
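    The idea of position-specific gap penalties in a three-sequence global alignment can be sketched with a small dynamic program. This is an illustration under stated assumptions, not the authors' algorithm: it uses sum-of-pairs scoring with a simple match/mismatch score, charges a linear (not affine) penalty per gap column, and the function and parameter names are hypothetical. Each penalty list has one entry per position, so `gap_a[i]` is the cost of a gap placed in sequence `a` after `i` of its residues have been aligned.

```python
from itertools import product


def align3_score(a, b, c, gap_a, gap_b, gap_c, match=1, mismatch=-1):
    """Best global alignment score of three sequences (sum-of-pairs).

    gap_x must have len(x) + 1 entries; gap_x[i] is the penalty for one gap
    column in sequence x once i residues of x have been consumed.
    """
    seqs = (a, b, c)
    gaps = (gap_a, gap_b, gap_c)
    n, m, p = len(a), len(b), len(c)
    neg_inf = float("-inf")
    dp = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    dp[0][0][0] = 0
    # Each column consumes a residue (1) or places a gap (0) in each sequence;
    # the all-gap column is excluded, leaving seven possible moves.
    moves = [mv for mv in product((0, 1), repeat=3) if any(mv)]
    for pos in product(range(n + 1), range(m + 1), range(p + 1)):
        for mv in moves:
            prev = tuple(x - d for x, d in zip(pos, mv))
            if min(prev) < 0:
                continue
            before = dp[prev[0]][prev[1]][prev[2]]
            if before == neg_inf:
                continue
            column = 0
            for s in range(3):
                if not mv[s]:
                    # Gap in sequence s: penalty depends on its current position.
                    column -= gaps[s][pos[s]]
            for s in range(3):
                for t in range(s + 1, 3):
                    if mv[s] and mv[t]:
                        same = seqs[s][pos[s] - 1] == seqs[t][pos[t] - 1]
                        column += match if same else mismatch
            i, j, k = pos
            dp[i][j][k] = max(dp[i][j][k], before + column)
    return dp[n][m][p]


# Uniform penalties versus a cheap gap at the start of the third sequence.
uniform = align3_score("AC", "AC", "A", [2, 2, 2], [2, 2, 2], [2, 2])
varied = align3_score("AC", "AC", "A", [2, 2, 2], [2, 2, 2], [0, 5])
```

    Changing only the penalty list for the third sequence changes which alignment is optimal, which is the effect the abstract attributes to structure-informed, position-specific penalties. The table has (n+1)(m+1)(p+1) cells, so this sketch is practical only for short sequences.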