6,614 research outputs found

    Identification of exonic regions in DNA sequences using cross-correlation and noise suppression by discrete wavelet transform

    Background: The identification of protein coding regions (exons) in DNA sequences using signal processing techniques is an important component of bioinformatics and biological signal processing. In this paper, a new method is presented for the identification of exonic regions in DNA sequences. This method is based on the cross-correlation technique, which can identify periodic regions in DNA sequences. Results: The method reduces the dependence of identification accuracy on window length. The proposed algorithm is applied to different eukaryotic datasets and its results are compared with those of other established methods. The proposed method increased the accuracy of exon detection by 4% to 41% relative to the most common digital signal processing methods for exon prediction. Conclusions: We demonstrated that periodic signals can be estimated using cross-correlation. In addition, the discrete wavelet transform (DWT) can minimise noise while maintaining the signal. The proposed algorithm, which combines cross-correlation and DWT, significantly increases the accuracy of exonic region identification.
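The idea of detecting periodicity by correlation and then suppressing noise can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes binary indicator sequences per nucleotide, uses the lag-3 correlation of each sequence with a shifted copy of itself as a stand-in for the cross-correlation step, and replaces a full DWT denoiser with a one-level Haar transform whose detail coefficients are zeroed. All function names, the window length and the step size are hypothetical.

```python
def indicator(seq, base):
    """Binary indicator sequence: 1.0 where seq has `base`, else 0.0."""
    return [1.0 if ch == base else 0.0 for ch in seq.upper()]


def lag_correlation(x, lag):
    """Mean product of the mean-removed signal with itself shifted by `lag`."""
    n = len(x) - lag
    if n <= 0:
        return 0.0
    mean = sum(x) / len(x)
    return sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n)) / n


def period3_score(seq):
    """Lag-3 correlation minus the average of lag-1 and lag-2, summed over bases.

    Coding regions tend to show a period-3 pattern, so a larger score
    suggests stronger codon-like periodicity (an assumption of this sketch).
    """
    score = 0.0
    for base in "ACGT":
        x = indicator(seq, base)
        background = 0.5 * (lag_correlation(x, 1) + lag_correlation(x, 2))
        score += lag_correlation(x, 3) - background
    return score


def sliding_scores(seq, window=90, step=3):
    """Period-3 score for each window along the sequence."""
    return [period3_score(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def haar_smooth(signal):
    """One-level Haar DWT with every detail coefficient set to zero.

    Zeroing the details and inverting the transform replaces each
    adjacent pair by its average; an unpaired last sample is kept as is.
    """
    out = list(signal)
    for i in range(0, len(signal) - 1, 2):
        avg = (signal[i] + signal[i + 1]) / 2.0
        out[i] = out[i + 1] = avg
    return out
```

A possible use is `haar_smooth(sliding_scores(dna))`, followed by thresholding the smoothed profile to mark candidate exonic windows; the threshold would have to be tuned on labelled data.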

    Deep-coverage whole genome sequences and blood lipids among 16,324 individuals.

    Large-scale deep-coverage whole-genome sequencing (WGS) is now feasible and offers potential advantages for locus discovery. We perform WGS in 16,324 participants from four ancestries at mean depth >29X and analyze genotypes with four quantitative traits: plasma total cholesterol, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol, and triglycerides. Common variant association yields known loci except for a few variants previously poorly imputed. Rare coding variant association yields known Mendelian dyslipidemia genes, but rare non-coding variant association detects no signals. A high 2M-SNP LDL-C polygenic score (top 5th percentile) confers a similar effect size to a monogenic mutation (~30 mg/dl higher for each); however, among those with severe hypercholesterolemia, 23% have a high polygenic score and only 2% carry a monogenic mutation. At these sample sizes and for these phenotypes, the incremental value of WGS for discovery is limited, but WGS permits simultaneous assessment of monogenic and polygenic models for severe hypercholesterolemia.
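As a rough illustration of what a polygenic score is, the sketch below computes a weighted sum of allele dosages and flags individuals in the top 5% of scores. The abstract does not describe how the 2M-SNP score was constructed, so the weights, dosages and function names here are hypothetical; only the top-5th-percentile cutoff comes from the text.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 copies per variant).

    `weights` are hypothetical per-variant effect sizes.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def top_fraction_flags(scores, fraction=0.05):
    """Return True for each score that falls in the top `fraction` of the cohort."""
    k = max(1, int(round(len(scores) * fraction)))
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [s >= cutoff for s in scores]
```

With a cohort of 20 distinct scores and `fraction=0.05`, exactly one individual is flagged as having a high polygenic score.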

    Acceptor splice site prediction

    Gene finding is an important aspect of biological research. Many approaches to gene finding exist, yet the problem itself is still largely unsolved. The various signals involved in gene location and modification offer a window of opportunity for the accurate prediction of genes. Many algorithms attempt to break down the problem of gene prediction into smaller portions focusing on various signals and properties, so the individual study of these signals is warranted. This work focuses on splice site prediction and, more specifically, acceptor splice site prediction. Several current approaches (weight matrix models and Markov models) are utilized, as well as a novel approach known as the log odds ratio. The log odds ratio is found to double the positive predictive value obtained through the other methods. In agreement with similar work performed by Lukas Habegger, the log odds ratio models that incorporate 2nd order Markov models perform favorably. A maximum dependency decomposition is also performed which, in congruence with Lukas Habegger’s findings, highlights a position close to that of the branch point sequence as being a position of maximum dependency. These results suggest that maximum dependency decompositions may be a novel method for examining the elusive branch point sequence in eukaryotic organisms. Lukas Habegger observed a stronger maximum dependency in Leishmania major, most likely because of differences between spliceosome function in lower and upper eukaryotes.
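A weight matrix model and a log odds score can be sketched as follows. The abstract does not define its log odds ratio precisely, so this example assumes it compares a model trained on true acceptor sites with one trained on decoy sites. It uses position-independent weight matrices rather than the 2nd order Markov models mentioned above, and the function names and pseudocount are hypothetical.

```python
import math

ALPHABET = "ACGT"


def train_wmm(sites, pseudocount=1.0):
    """Estimate per-position base frequencies from equal-length aligned sites."""
    length = len(sites[0])
    model = []
    for pos in range(length):
        # Pseudocounts keep every probability non-zero.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1.0
        total = sum(counts.values())
        model.append({base: counts[base] / total for base in ALPHABET})
    return model


def log_odds(candidate, true_model, false_model):
    """Sum over positions of log P(base | true sites) / P(base | decoy sites)."""
    return sum(math.log(true_model[i][ch] / false_model[i][ch])
               for i, ch in enumerate(candidate))
```

A candidate window scoring above zero looks more like the true-site training set than the decoy set; in practice the decision threshold would be chosen on held-out data.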

    The Roles of Gene Duplication, Gene Conversion and Positive Selection in Rodent Esp and Mup Pheromone Gene Families with Comparison to the Abp Family

    Three proteinaceous pheromone families, the androgen-binding proteins (ABPs), the exocrine-gland secreting peptides (ESPs) and the major urinary proteins (MUPs) are encoded by large gene families in the genomes of Mus musculus and Rattus norvegicus. We studied the evolutionary histories of the Mup and Esp genes and compared them with what is known about the Abp genes. Apparently gene conversion has played little if any role in the expansion of the mouse Class A and Class B Mup genes and pseudogenes, and the rat Mups. By contrast, we found evidence of extensive gene conversion in many Esp genes although not in all of them. Our studies of selection identified at least two amino acid sites in β-sheets as having evolved under positive selection in the mouse Class A and Class B MUPs and in rat MUPs. We show that selection may have acted on the ESPs by determining Ka/Ks for Exon 3 sequences with and without the converted sequence segment. While it appears that purifying selection acted on the ESP signal peptides, the secreted portions of the ESPs probably have undergone much more rapid evolution. When the inner gene converted fragment sequences were removed, eleven Esp paralogs were present in two or more pairs with Ka/Ks >1.0 and thus we propose that positive selection is detectable by this means in at least some mouse Esp paralogs. We compare and contrast the evolutionary histories of all three mouse pheromone gene families in light of their proposed functions in mouse communication.