30 research outputs found

    Grammar-based distance in progressive multiple sequence alignment

    Background: We propose a multiple sequence alignment (MSA) algorithm and compare its alignment quality and execution time with those of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are to be pairwise aligned. The progressive alignment proceeds by pairwise aligning each new sequence with an ensemble of the previously aligned sequences. Results: The performance of the proposed algorithm is validated via comparison to the popular progressive multiple alignment approaches ClustalW and T-Coffee, and to the more recently developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign, using the BAliBASE 3.0 database of amino acid alignment files and a set of longer sequences generated by the Rose software. The proposed algorithm built multiple alignments comparable to those of the other programs with significant improvements in running time. The results are especially striking for large datasets. Conclusion: We introduce a computationally efficient progressive alignment algorithm using a grammar-based sequence distance that is particularly useful for aligning large datasets.

    NR-2L: A Two-Level Predictor for Identifying Nuclear Receptor Subfamilies Based on Sequence-Derived Features

    Nuclear receptors (NRs) are one of the most abundant classes of transcriptional regulators in animals. They regulate diverse functions, such as homeostasis, reproduction, development and metabolism. Therefore, NRs are a very important target for drug development. Nuclear receptors form a superfamily of phylogenetically related proteins and have been subdivided into different subfamilies due to their domain diversity. In this study, a two-level predictor, called NR-2L, was developed that can be used to identify whether a query protein is a nuclear receptor based on its sequence information alone; if it is, the prediction automatically continues to identify it among the following seven subfamilies: (1) thyroid hormone like (NR1), (2) HNF4-like (NR2), (3) estrogen like (NR3), (4) nerve growth factor IB-like (NR4), (5) fushi tarazu-F1 like (NR5), (6) germ cell nuclear factor like (NR6), and (7) knirps like (NR0). The identification was made by the fuzzy K-nearest neighbor (FK-NN) classifier based on the pseudo amino acid composition formed by incorporating various physicochemical and statistical features derived from the protein sequences, such as amino acid composition, dipeptide composition, complexity factor, and low-frequency Fourier spectrum components. As a demonstration, it was shown on benchmark datasets with low redundancy derived from NucleaRDB and UniProt that the overall success rates achieved by the jackknife test were about 93% and 89% at the first and second level, respectively. The high success rates indicate that the novel two-level predictor can be a useful vehicle for identifying NRs and their subfamilies. As a user-friendly web server, NR-2L is freely accessible at either http://icpr.jci.edu.cn/bioinfo/NR2L or http://www.jci-bioinfo.cn/NR2L. Each job submitted to NR-2L can contain up to 500 query protein sequences and be finished in less than 2 minutes. The fewer the query proteins, the shorter the processing time will usually be. All program code for NR-2L is available for non-commercial purposes upon request.
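
The two simplest sequence-derived features named in this abstract, amino acid composition and dipeptide composition, can be sketched as follows. This is a minimal illustration and not the NR-2L code: the function names and the choice to ignore non-standard residues are assumptions, and the complexity factor, Fourier spectrum components, and FK-NN classifier are not shown.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue frequencies (hypothetical helper)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)  # non-standard residues are ignored
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair frequencies (hypothetical helper)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DIPEPTIDES)  # pairs with non-standard residues are ignored
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

Concatenating the two vectors gives a 420-dimensional descriptor per protein; a real predictor would add further features before classification.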

    Complexity Analysis of Resting-State MEG Activity in Early-Stage Parkinson's Disease Patients

    The aim of the present study was to analyze resting-state brain activity in patients with Parkinson's disease (PD), a degenerative disorder of the nervous system. Magnetoencephalography (MEG) signals were recorded with a 151-channel whole-head radial gradiometer MEG system in 18 early-stage untreated PD patients and 20 age-matched control subjects. Artifact-free epochs of 4 s (1250 samples) were analyzed with Lempel-Ziv complexity (LZC), applying two- and three-symbol sequence conversion methods. The results showed that MEG signals from PD patients are less complex than control subjects' recordings. We found significant group differences (p-values < 0.01) for the 10 major cortical areas analyzed (e.g., bilateral frontal, central, temporal, parietal, and occipital regions). In addition, using receiver-operating characteristic curves with a leave-one-out cross-validation procedure, a classification accuracy of 81.58% was obtained. In order to investigate the best combination of LZC results for classification purposes, a forward stepwise linear discriminant analysis with leave-one-out cross-validation was employed. LZC results (three-symbol sequence conversion) from right parietal and temporal brain regions were automatically selected by the model. With this procedure, an accuracy of 84.21% (77.78% sensitivity, 90.0% specificity) was achieved. Our findings demonstrate the usefulness of LZC to detect an abnormal type of dynamics associated with PD.
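
For readers unfamiliar with LZC, the following is a minimal sketch of the two-symbol variant. The abstract does not state the threshold or the normalization; thresholding at the median and scaling the phrase count by log2(n)/n are common choices in the LZC literature and are assumed here. All function names are hypothetical.

```python
import math
from statistics import median


def to_two_symbols(signal):
    """Two-symbol conversion: '1' where a sample exceeds the median, else '0' (assumed threshold)."""
    m = median(signal)
    return "".join("1" if x > m else "0" for x in signal)


def lz76_phrase_count(s):
    """Count the phrases in the Lempel-Ziv (1976) parsing of the string s."""
    n = len(s)
    i = 0
    count = 0
    while i < n:
        length = 1
        # Grow the phrase while it still occurs in the text before its own last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1   # a phrase cut short by the end of the string still counts
        i += length
    return count


def normalized_lzc(signal):
    """Phrase count scaled by log2(n)/n, so values are comparable across epoch lengths."""
    s = to_two_symbols(signal)
    n = len(s)
    if n < 2:
        return 0.0
    return lz76_phrase_count(s) * math.log2(n) / n
```

As a check, `lz76_phrase_count("0001101001000101")` parses the string as 0 · 001 · 10 · 100 · 1000 · 101 and returns 6. A three-symbol conversion would use two thresholds and log base 3 in the normalization.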

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.

    The background in the 0νββ experiment Gerda

    Promoter shuffling has occurred during the evolution of the vertebrate growth hormone gene

    Comparative studies of vertebrate gene promoter regions seldom detect gross rearrangements ('promoter shuffling') since such analyses usually employ relatively similar DNA sequences. Conversely, attempts to compare evolutionarily more divergent promoter sequences have been largely unsuccessful owing to the inability of conventional alignment procedures to deal with gross rearrangements. These limitations have been circumvented in the present study by using the novel technique of complexity analysis to identify modular components ('blocks') in the growth hormone (GH) gene promoter sequences of some 22 vertebrate species, from salmon to human. Significant rearrangement of blocks was found to have occurred, indicating that they have evolved as independent units. Some blocks appear to be ubiquitous, whereas others are restricted to a specific taxon. Considerable variation between orthologous GH gene promoters was apparent in terms of block length, copy number and relative location. It may be inferred that a wide variety of different mutational mechanisms have operated upon the GH gene promoter over evolutionary time. These include gross changes such as deletion, duplication, amplification, elongation, contraction, transposition, inversion and fusion, as well as the slow, steady accumulation of single base-pair substitutions. Thus the patchwork structure of the modular GH promoter region, and those of its paralogous GH2 and prolactin (PRL) counterparts, have continually been shuffled into new combinations through the rearrangement of pre-existing blocks. Although some of these changes may have had no influence on promoter function, others could have served to alter either the level of gene expression or the responsiveness of the promoter to external stimuli.