    Data Mining for Simple Sequence Repeats in Oil Palm Expressed Sequence Tags

    Expressed Sequence Tags (ESTs) are short DNA sequences generated by sequencing one or both ends of an expressed gene. ESTs provide researchers with a quick and inexpensive route for discovering new genes, obtaining data on gene expression and regulation, and constructing genome maps. Oil palm EST sequences available in the public domain were downloaded, clustered, and assembled into contigs using CAP3 and Phrap. Microsatellite repeats were located using five software tools (MISA, TRA, TROLL, SSRIT, and SSR primer). Of the five, MISA was found to be the best; it can also identify compound repeats. The frequency and total number (202) of SSRs were determined. Mononucleotide repeats, especially 'A/T' repeats, are the most abundant in oil palm. Flanking primers were designed using Primer3 and SSR primer. The results of the study are provided as an online database, 'MEMCO', to help oil palm researchers.
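
    To make the microsatellite search step concrete, here is a minimal sketch of perfect-repeat detection with regular expressions. It is not the code of MISA or of any other tool named in the abstract; the function name `find_ssrs` and the thresholds in `MIN_REPEATS` are assumptions chosen for illustration, and compound repeats are not handled.

```python
import re

# Hypothetical thresholds (an assumption): motif length -> minimum repeat count.
MIN_REPEATS = {1: 10, 2: 6, 3: 5}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one motif; \1{n,} demands n or more further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs built from a shorter unit, e.g. "AA" reported as a dinucleotide.
            if any(
                motif == motif[:d] * (unit_len // d)
                for d in range(1, unit_len)
                if unit_len % d == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)

if __name__ == "__main__":
    demo = "GGC" + "A" * 12 + "CCT" + "AG" * 7 + "TTC"
    for start, motif, count in find_ssrs(demo):
        print(start, motif, count)
```

    Start positions are 0-based. A real pipeline would also need to merge neighbouring repeats into compound SSRs and extract flanking regions for primer design, which this sketch leaves out.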

    Mining frequent biological sequences based on bitmap without candidate sequence generation

    Biological sequences carry a great deal of important genetic information about organisms. Furthermore, there is an inheritance law related to protein function and structure that is useful for applications such as disease prediction. Frequent sequence mining is a core technique for association rule discovery, but existing algorithms suffer from low efficiency or high error rates because biological sequences differ from general sequences in several characteristics. In this paper, an algorithm for mining Frequent Biological Sequences based on Bitmap, FBSB, is proposed. FBSB uses bitmaps as its simple data structure and transforms each row into a quicksort list (QS-list) for sequence growth. To meet the continuity and accuracy requirements of biological sequence mining, the sequences tested during the mining process of FBSB are real ones rather than generated candidates, and all frequent sequences can be mined without any errors. Compared with other algorithms, the experimental results show that FBSB achieves better performance in both run time and scalability.
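
    The bitmap idea can be illustrated with a simplified sketch. This is not FBSB itself: it has no QS-list, it tries every alphabet symbol as an extension, and the name `mine_frequent_substrings` is an assumption. It only shows how per-symbol position bitmaps let a contiguous pattern grow with one shift and one AND per sequence.

```python
from collections import defaultdict

def mine_frequent_substrings(sequences, min_support):
    """Return {pattern: support} for contiguous patterns in >= min_support sequences."""
    # occ[symbol][i] is an int bitmap: bit p is set when sequences[i][p] == symbol.
    occ = defaultdict(lambda: [0] * len(sequences))
    for i, seq in enumerate(sequences):
        for p, sym in enumerate(seq):
            occ[sym][i] |= 1 << p

    def support(bitmaps):
        # A sequence supports a pattern if at least one end position is set.
        return sum(1 for b in bitmaps if b)

    frequent = {}
    # Each stack entry holds a pattern and, per sequence, a bitmap of its end positions.
    stack = [(sym, maps) for sym, maps in occ.items() if support(maps) >= min_support]
    while stack:
        pattern, ends = stack.pop()
        frequent[pattern] = support(ends)
        for sym, maps in occ.items():
            # Shift end positions one step right and keep those where sym occurs.
            grown = [(e << 1) & m for e, m in zip(ends, maps)]
            if support(grown) >= min_support:
                stack.append((pattern + sym, grown))
    return frequent

if __name__ == "__main__":
    print(mine_frequent_substrings(["ACGTAC", "TACGA", "GGACG"], 3))
```

    Because every grown bitmap is derived from positions that exist in the input, a pattern is only kept if it really occurs contiguously in enough sequences.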

    An Efficient Algorithm for Mining Frequent Sequence with Constraint Programming

    The main advantage of Constraint Programming (CP) approaches for sequential pattern mining (SPM) is their modularity, which includes the ability to add new constraints (regular expressions, length restrictions, etc.). The current best CP approach for SPM uses a global constraint (module) that computes the projected database and enforces the minimum frequency; it does this with a filtering algorithm similar to the PrefixSpan method. However, the resulting system is not as scalable as some of the most advanced mining systems, such as Zaki's cSPADE. We show how, using techniques from both data mining and CP, one can use a generic constraint solver and yet outperform existing specialized systems. This is mainly due to two improvements in the module that computes the projected frequencies. First, computing the projected database is sped up by pre-computing the positions at which a symbol can become unsupported by a sequence, which avoids scanning the full sequence each time. Second, a backtracking-aware data structure, inspired by the trailing used in CP solvers, allows fast incremental storing and restoring of the projected database. Detailed experiments show that this approach outperforms existing CP as well as specialized systems for SPM, and that the gain in efficiency translates directly into increased efficiency for other settings such as mining with regular expressions.
    Comment: frequent sequence mining, constraint programming
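
    The first improvement can be sketched in plain Python, outside any CP solver. The code below is an assumption-laden illustration, not the authors' global constraint: it is a PrefixSpan-style miner in which a precomputed table of last occurrences (`last_pos`, a hypothetical name) decides whether a symbol is still supported by a projected sequence without scanning the suffix. The trailing-based storage of the projected database is not modelled.

```python
def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for frequent subsequences (gaps allowed)."""
    # last_pos[sid][sym]: index of the final occurrence of sym in sequence sid.
    last_pos = [{sym: i for i, sym in enumerate(seq)} for seq in sequences]
    results = {}

    def grow(prefix, projected):
        # projected: (sid, start) pairs; sequences[sid][start:] is the suffix
        # left after the earliest match of prefix in that sequence.
        counts = {}
        for sid, start in projected:
            for sym, last in last_pos[sid].items():
                # sym is supported iff its last occurrence lies in the suffix,
                # so no scan of the suffix is needed to count support.
                if last >= start:
                    counts[sym] = counts.get(sym, 0) + 1
        for sym, count in sorted(counts.items()):
            if count < min_support:
                continue
            pattern = prefix + (sym,)
            results[pattern] = count
            # Project: move each start just past the first occurrence of sym.
            new_projected = [
                (sid, sequences[sid].index(sym, start) + 1)
                for sid, start in projected
                if last_pos[sid].get(sym, -1) >= start
            ]
            grow(pattern, new_projected)

    grow((), [(sid, 0) for sid in range(len(sequences))])
    return results

if __name__ == "__main__":
    for pattern, count in prefixspan(["abcb", "acb", "bca"], 2).items():
        print("".join(pattern), count)
```

    Sequences may be strings or lists of hashable symbols, since both support `index(value, start)`. The projection step here still searches for the first occurrence of the symbol; only the support test benefits from the precomputed positions.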

    Data Mining Approach for Amino Acid Sequence Classification

    As computerized applications are employed all around the world, an enormous amount of data is collected. The essential information contained in these large amounts of data is attracting scholars from a variety of disciplines to examine how to extract the hidden knowledge inside them. The technique of obtaining, or mining, usable and valuable knowledge from enormous amounts of data is known as data mining. Text mining, image mining, sequential pattern mining, and web mining are all examples of data mining fields. Sequence mining is one of the most important technologies in this field, as it aids in the discovery of sequential connections in data. Sequence mining is used in a variety of applications, including analysis of customers' buying trends, analysis of web access trends, atmospheric observation, amino acid sequences, and gene sequencing. Sequence mining techniques are used in protein and DNA analysis for sequence alignment, pattern searching, and pattern categorization. Within amino acid sequence analysis, researchers are showing growing interest in amino acid sequence categorization, which can find recurrent patterns in homologous proteins. This study describes the methods that various studies have used to categorize proteins and gives an overview of the most important sequence classification techniques.