68 research outputs found

    Recent Advances on the Machine Learning Methods in Identifying DNA Replication Origins in Eukaryotic Genomics

    The site at which DNA replication initiates is called the origin of replication (ORI). It is regulated by a set of regulatory proteins and plays an important role in the basic biochemical processes of cell growth and division in all living organisms. The study of ORIs is therefore essential for understanding the cell-division cycle and the regulation of gene expression, so that scholars can develop new strategies against genetic diseases using knowledge of DNA replication. Accurate identification of ORIs will thus provide key clues for DNA replication research and clinical medicine. Although conventional experiments can provide accurate results, they are time-consuming and costly. Bioinformatics-based methods can overcome these shortcomings. In particular, with the growth of DNA sequence data in the post-genomic era, high-throughput tools that identify ORIs from sequence information are highly desirable. In this review, we summarize current progress in the computational prediction of eukaryotic ORIs, including the collection of benchmark datasets, the application of machine learning-based techniques, the results obtained by these methods, and the construction of web servers. Finally, we give future perspectives on ORI prediction. The review provides readers with the full background of ORI prediction based on machine learning methods, which will help researchers study DNA replication in depth and pursue drug therapy for genetic defects.

    Spin-resolved imaging of atomic-scale helimagnetism in monolayer NiI2

    Identifying intrinsic noncollinear magnetic order in monolayer van der Waals (vdW) crystals is highly desirable for understanding the delicate magnetic interactions under reduced spatial constraints and for miniaturized spintronic applications, but it remains elusive in experiments. Here, we achieve spin-resolved imaging of helimagnetism at the atomic scale in monolayer NiI2 crystals, grown on a graphene-covered SiC(0001) substrate, using spin-polarized scanning tunneling microscopy. Our experiments identify the existence of a spin spiral state with a canted plane in monolayer NiI2. The spin modulation Q vector of the spin spiral is determined as (0.2203, 0, 0), which differs from its bulk value and from its in-plane projection, but agrees well with our first-principles calculations. The spin spiral surprisingly exhibits collective spin switching behavior under a magnetic field, whose origin is ascribed to the incommensurability between the spin spiral and the crystal lattice. Our work unambiguously identifies the helimagnetic state in monolayer NiI2, paving the way for illuminating its expected type-II multiferroic order and developing spintronic devices based on vdW magnets.
    Comment: 22 pages, 4 figures

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that each of these methods tends to produce a distinct collection of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for downloading annotation data and for blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of powerful new techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
    Peer reviewed

    Recent Advances in Conotoxin Classification by Using Machine Learning Methods

    Conotoxins are small, disulfide-rich peptides that target ion channels and neuronal receptors. They have been shown to be potent pharmaceuticals in the treatment of a series of diseases, such as Alzheimer’s disease, Parkinson’s disease, and epilepsy. In addition, conotoxins are ideal molecular templates for the development of new drug lead compounds and play important roles in neurobiological research. Thus, the accurate identification of conotoxin types will provide key clues for biological research and clinical medicine. Generally, conotoxin types are confirmed when their sequence, structure, and function are experimentally validated. However, acquiring structural and functional information through biochemical experiments is time-consuming and costly. Therefore, it is important to develop computational tools that efficiently and effectively recognize conotoxin types based on sequence information. In this work, we review current progress in the computational identification of conotoxins in the following aspects: (i) construction of benchmark datasets; (ii) strategies for extracting sequence features; (iii) feature selection techniques; (iv) machine learning methods for classifying conotoxins; (v) the results obtained by these methods and the published tools; and (vi) future perspectives on conotoxin classification. The paper provides a basis for in-depth study of conotoxins and for drug therapy research.

    Accurate identification of DNA replication origin by fusing epigenomics and chromatin interaction information

    DNA replication initiation is a complex process involving various genetic and epigenomic signatures. The correct identification of replication origins (ORIs) could provide important clues for the study of a variety of diseases caused by replication. Here, we design a computational approach named iORI-Epi to recognize ORIs by incorporating epigenome-based features, sequence-based features, and 3D genome-based features. iORI-Epi displays excellent robustness and generalization ability on both training datasets and independent datasets of the K562 cell line. Further experiments confirm that iORI-Epi is highly scalable to other cell lines (MCF7 and HCT116). We also analyze and clarify the regulatory role of epigenomic marks, DNA motifs, and chromatin interactions in DNA replication initiation in eukaryotic genomes. Finally, we discuss gene enrichment pathways from the perspective of ORIs in different replication timing states and heuristically dissect the effect of promoters on replication initiation. Our computational methodology is worth extending to ORI identification in other eukaryotic species.
    Published version

    Detection of QTLs for Plant Height Related Traits in Brassica napus


    Identifying Phage Virion Proteins by Using Two-Step Feature Selection Methods

    Accurate identification of phage virion proteins is not only a key step toward understanding their function but also helps clarify the lysis mechanism of the bacterial cell. Since traditional experimental methods for identifying phage virion proteins are time-consuming and costly, there is an urgent need for machine learning methods that identify them accurately and efficiently. In this work, a support vector machine (SVM) based method was proposed by mixing multiple sets of optimal g-gap dipeptide compositions. The analysis of variance (ANOVA) and minimal-redundancy-maximal-relevance (mRMR) methods, combined with incremental feature selection (IFS), were applied to single out the optimal feature set. In the five-fold cross-validation test, the proposed method achieved an overall accuracy of 87.95%. We believe that the proposed method will become an efficient and powerful tool for scientists studying phage virion proteins.
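
    As a rough illustration of the g-gap dipeptide composition named in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes the common definition in which residue pairs separated by g intervening positions are counted and normalized by the number of valid pairs; the function name and the handling of non-standard residues are hypothetical choices made here.

```python
from itertools import product

# The 20 standard amino acids; any other symbol is skipped (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(sequence, g):
    """Return a 400-dimensional frequency vector of residue pairs
    separated by g intervening residues (g = 0 gives adjacent pairs)."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    # Positions i and i + g + 1 form one g-gap dipeptide.
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(pairs)
    # Normalize by the number of valid pairs so the vector sums to 1.
    return [counts[p] / total for p in pairs]


if __name__ == "__main__":
    vector = g_gap_dipeptide_composition("ACAC", 0)
    print(len(vector), sum(vector))
```

    Vectors for several values of g could be concatenated before feature selection; which gaps to mix is a modelling decision that the abstract does not specify.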

    Deep-4mCGP: A Deep Learning Approach to Predict 4mC Sites in Geobacter pickeringii by Using Correlation-Based Feature Selection Technique

    4mC is a type of DNA modification that helps coordinate multiple biological processes, for example, DNA replication, gene expression, and transcriptional regulation. Accurate prediction of 4mC sites can provide precise information about their genetic functions. The purpose of this study was to establish a robust deep learning model to recognize 4mC sites in Geobacter pickeringii. In the proposed model, two kinds of feature descriptors, namely binary and k-mer composition, were used to encode the DNA sequences of Geobacter pickeringii. The features obtained from their fusion were optimized using a correlation and gradient-boosting decision tree (GBDT)-based algorithm with the incremental feature selection (IFS) method. These optimized features were then fed into a 1D convolutional neural network (CNN) to distinguish 4mC sites from non-4mC sites in Geobacter pickeringii. On independent data, the proposed model achieved an accuracy of 0.868, which was 4.2% higher than that of the existing model.
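
    The two descriptors mentioned in the abstract above can be sketched as follows. This is an illustrative example under stated assumptions, not the published pipeline: binary encoding is taken to be a one-hot code per nucleotide, k-mer composition is taken to be sliding-window k-mer frequencies, and fusion is plain concatenation. All function names are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def binary_encode(sequence):
    """One-hot encode each base (A=1000, C=0100, G=0010, T=0001).
    Unknown symbols such as N become 0000 (an assumption)."""
    encoded = []
    for base in sequence:
        encoded.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return encoded


def kmer_composition(sequence, k):
    """Return the frequency of each of the 4**k possible k-mers,
    counted with a sliding window of step 1."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(kmers)
    return [counts[m] / windows for m in kmers]


def fused_features(sequence, k=2):
    """Concatenate both descriptors into a single feature vector."""
    return binary_encode(sequence) + kmer_composition(sequence, k)


if __name__ == "__main__":
    print(len(fused_features("ACGTACGT", k=2)))
```

    A fused vector of this kind would then pass through feature selection before reaching a classifier; the selection and CNN stages are not reproduced here.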

    Dispersal of the Japanese pine sawyer, Monochamus alternatus (Coleoptera: Cerambycidae), in mainland China as inferred from molecular data and associations to indices of human activity.

    The Japanese pine sawyer, Monochamus alternatus Hope (Coleoptera: Cerambycidae), is an important forest pest as well as the principal vector of the pinewood nematode (PWN), Bursaphelenchus xylophilus (Steiner et Buhrer), in mainland China. Despite the economic importance of this insect-disease complex, only a few studies are available on the population genetic structure of M. alternatus and the relationship between its historic dispersal pattern and various human activities. The aim of the present study was to further explore the effects of human activity on the population genetic structure of M. alternatus in mainland China. The molecular data based on the combined mitochondrial cox1 and cox2 gene fragments from 140 individuals representing 14 Chinese populations yielded 54 haplotypes. Overall, the haplotype network revealed a historical (natural) expansion from China's eastern coast to the western interior, as well as several recent, long-distance population exchanges. Correlation analysis suggested that regional economic status and proximity to marine ports significantly influenced the population genetic structure of M. alternatus, as indicated by both the ratio of shared haplotypes and the haplotype diversity; however, the PWN distribution in China was significantly correlated with only the ratio of shared haplotypes. Our results suggest that the modern logistical network (i.e., the transportation system) in China is a key medium by which humans have brought about population exchange of M. alternatus in mainland China, likely through inadvertent movement of infested wood packaging material associated with trade, and that this genetic exchange was primarily from the economically well-developed east coast of China westward to the less-developed interior. In addition, this study demonstrated the existence of non-local M. alternatus in new PWN-infested localities in China, but not all sites with non-local M. alternatus were infested with PWN.

    Human C-reactive Protein (CRP) Gene 1059 G > C Polymorphism is Associated with Plasma CRP Concentration in Patients Receiving Coronary Angiography

    Elevation of C-reactive protein (CRP) level is associated with increased risk of cardiovascular events. The 1059 G > C polymorphism in exon 2 of the CRP gene has been shown to affect plasma concentration of CRP. We sought to elucidate the effect of this polymorphism on the development of coronary artery disease (CAD) among the Chinese population in Taiwan. Methods: We studied 536 patients undergoing coronary angiography (365 patients with CAD and 171 controls with patent coronaries) and evaluated the association of the CRP gene 1059 G > C polymorphism with CAD. Genotyping of the polymorphism was performed by polymerase chain reaction and MaeIII restriction enzyme digestion. Results: The CC genotype was associated with lower plasma CRP concentration (GG, 6.5 ± 5.8; GC, 3.3 ± 4.4; CC, 2.3 ± 3.1 mg/L; p = 0.02). Subjects with CAD or myocardial infarction (MI) had significantly higher plasma CRP concentrations than controls (CAD vs. controls, 8.9 ± 18.9 vs. 3.3 ± 7.2 mg/L; p < 0.001), while patients with MI showed higher CRP than those with chronic stable angina (13.5 ± 22.9 vs. 5.2 ± 14.1 mg/L; p < 0.001). However, this polymorphism was not associated with CAD in our population. Conclusion: Our data suggest that the human CRP gene 1059 G > C polymorphism is associated with plasma CRP concentration among Chinese in Taiwan receiving coronary angiography.