17,346 research outputs found

    Chi-square-based scoring function for categorization of MEDLINE citations

    Objectives: Text categorization has been used in biomedical informatics to identify documents containing relevant topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood that a MEDLINE citation contains genetically relevant topics. Methods: Our procedure requires construction of a genetic and a nongenetic domain document corpus. We used the MeSH descriptors assigned to MEDLINE citations for this categorization task. We compared the frequencies of MeSH descriptors between the two corpora by applying the chi-square test. A MeSH descriptor was considered a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all the citations, with the highest scores given to citations containing MeSH descriptors typical of the genetic domain. Results: Validation was done on a set of 734 manually annotated MEDLINE citations. The method achieved a predictive accuracy of 0.87, with 0.69 recall and 0.64 precision. We evaluated the method by comparing it to three machine learning algorithms (support vector machines, decision trees, naïve Bayes). Although the differences were not statistically significant, the results showed that our chi-square scoring performs as well as the compared machine learning algorithms. Conclusions: We suggest that chi-square scoring is an effective solution to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for the gene symbol disambiguation process. Comment: 34 pages, 2 figures
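The scoring idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how per-descriptor statistics are combined into a citation score, so summing signed chi-square values is an assumption, and the function names (`chi_square_2x2`, `descriptor_weights`, `score_citation`) and the toy corpora are hypothetical.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def descriptor_weights(genetic_docs, nongenetic_docs):
    """Signed chi-square weight per descriptor.

    Each document is a list of MeSH descriptors; both corpora must be non-empty.
    The sign is positive when the descriptor is relatively more frequent in the
    genetic corpus, as the abstract describes.
    """
    n_gen, n_non = len(genetic_docs), len(nongenetic_docs)
    gen_counts = Counter(d for doc in genetic_docs for d in set(doc))
    non_counts = Counter(d for doc in nongenetic_docs for d in set(doc))
    weights = {}
    for desc in set(gen_counts) | set(non_counts):
        a, c = gen_counts[desc], non_counts[desc]
        chi2 = chi_square_2x2(a, n_gen - a, c, n_non - c)
        sign = 1 if a / n_gen > c / n_non else -1
        weights[desc] = sign * chi2
    return weights


def score_citation(descriptors, weights):
    """Assumed aggregation: sum of signed weights over the citation's descriptors."""
    return sum(weights.get(d, 0.0) for d in set(descriptors))


# Toy corpora, invented purely for illustration.
genetic = [["Genes", "Humans"], ["Genes", "Mutation"]]
nongenetic = [["Humans", "Surgery"], ["Surgery"]]
weights = descriptor_weights(genetic, nongenetic)
```

Ranking citations by `score_citation` in descending order would then put those with descriptors typical of the genetic corpus first. Any cutoff for a binary decision would have to be tuned on annotated data.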

    Computer Aided Simulation of DNA Fingerprint Amplified Fragment Length Polymorphism (AFLP) Using Suffix Tree Indexing and Data Mining

    AFLP is a DNA fingerprinting technique with broad application as a genetic marker in various fields. It begins with digestion of the DNA sequence using one or more restriction enzymes, followed by ligation of adapters to the overhanging sticky ends and amplification of the DNA fragments using PCR. The PCR uses primers that match the adapter sequence and carry 1 to 3 additional “selective” bases, which may be any bases; this reduces the number of bands that will be amplified. The technique is intended to increase the distinctiveness of the amplified fragments so that the polymorphism of the organism being studied can be clearly visualized by gel electrophoresis. The computer-aided AFLP simulation developed in this research aims to predict this electrophoresis result by simulating the digestion, ligation, and PCR processes using pattern recognition algorithms applied to DNA sequences from online databases. Through this simulation, researchers can determine the best combination of restriction enzymes and selective bases for their laboratory experiments. Suffix tree indexing is used while exploring the genome sequence (in FASTA format) to find restriction sites rapidly and generate the resulting fragments. Data modeling enables the system to draw the fragments as a virtual DNA electrophoresis pattern. Data mining completes the simulation by exploring all possible virtual electrophoresis patterns and determining the best combination of restriction enzyme and selective bases by calculating certain quantitative criteria
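The digestion and selective-amplification steps described above might be sketched as below. This is a heavily simplified illustration under stated assumptions, not the system from the paper: it uses plain substring search instead of suffix tree indexing, handles a single enzyme on one strand, and checks selective bases at only one end of each fragment. The enzyme shown (EcoRI, site GAATTC, cut after the first G) is only an example, since the abstract names no enzymes, and the function names and toy sequence are hypothetical.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the number of site bases left on the upstream fragment
    (1 for EcoRI, which cuts G^AATTC). Uses str.find, not a suffix tree.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[i:j] for i, j in zip(bounds, bounds[1:]) if j > i]


def select_fragments(fragments, site_remainder, selective_bases):
    """Keep fragments whose start matches the site remainder plus selective bases.

    Simplification: a real AFLP primer pair constrains both fragment ends.
    """
    prefix = site_remainder + selective_bases
    return [f for f in fragments if f.startswith(prefix)]


def band_sizes(fragments):
    """Fragment lengths in ascending order, a stand-in for a virtual gel lane."""
    return sorted(len(f) for f in fragments)


# Toy sequence invented for illustration, containing three EcoRI sites.
seq = "CCGAATTCAGGGAATTCTTTGAATTCAAA"
fragments = digest(seq, "GAATTC", 1)
amplified = select_fragments(fragments, "AATTC", "A")
lane = band_sizes(amplified)
```

Comparing `lane` across different enzyme and selective-base choices is the kind of exploration the abstract attributes to its data mining step; the quantitative criteria used there are not specified in the abstract.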

    A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing

    The rapidly growing volume of stored data has spurred researchers to seek methods for exploiting it optimally, but most of these methods face a response-time problem caused by the sheer size of the data. Most proposed solutions favour materialization. However, materialization cannot deliver real-time answers. In this paper we propose a framework that illustrates the barriers to, and suggested solutions for, achieving real-time OLAP answers, which are widely used in decision support systems and data warehouses