79,604 research outputs found

    A comparison study on algorithms of detecting long forms for short forms in biomedical text

    Get PDF
    <p>Abstract</p> <p>Motivation</p> <p>With more and more research dedicated to literature mining in the biomedical domain, more and more systems are available for people to choose from when building literature mining applications. In this study, we focus on one specific kind of literature mining task, i.e., detecting definitions of acronyms, abbreviations, and symbols in biomedical text. We denote acronyms, abbreviations, and symbols as short forms (SFs) and their corresponding definitions as long forms (LFs). The study was designed to answer the following questions; i) how well a system performs in detecting LFs from novel text, ii) what the coverage is for various terminological knowledge bases in including SFs as synonyms of their LFs, and iii) how to combine results from various SF knowledge bases.</p> <p>Method</p> <p>We evaluated the following three publicly available detection systems in detecting LFs for SFs: i) a handcrafted pattern/rule based system by Ao and Takagi, ALICE, ii) a machine learning system by Chang et al., and iii) a simple alignment-based program by Schwartz and Hearst. In addition, we investigated the conceptual coverage of two terminological knowledge bases: i) the UMLS (the Unified Medical Language System), and ii) the BioThesaurus (a thesaurus of names for all UniProt protein records). We also implemented a web interface that provides a virtual integration of various SF knowledge bases.</p> <p>Results</p> <p>We found that detection systems agree with each other on most cases, and the existing terminological knowledge bases have a good coverage of synonymous relationship for frequently defined LFs. The web interface allows people to detect SF definitions from text and to search several SF knowledge bases.</p> <p>Availability</p> <p>The web site is <url>http://gauss.dbb.georgetown.edu/liblab/SFThesaurus</url>.</p

    FuzzyRULES-II: A new approach to fuzzy rule induction from numerical data

    Get PDF
    © 2016 The authors and IOS Press. All rights reserved. Data mining is a broad area that integrates research efforts from several fields with the aim of processing large volumes of data into knowledge bases for better decision making. Since numerical and nominal data are equally important in practical data mining applications, dealing with different types of data items are among the most important problems in data mining research and development. This paper introduces a new fuzzy rule induction algorithm, able to deal properly with either numerical or nominal attributes, for the creation of classification and predictive models. To better handle numerical data, fuzzy sets are used to represent intervals in the domains of numerical attributes. Experimental results have shown that the proposed algorithm produces robust and general models that can be used for prediction as well as for classification

    Computing iceberg concept lattices with Titanic

    Get PDF
    International audienceWe introduce the notion of iceberg concept lattices and show their use in knowledge discovery in databases. Iceberg lattices are a conceptual clustering method, which is well suited for analyzing very large databases. They also serve as a condensed representation of frequent itemsets, as starting point for computing bases of association rules, and as a visualization method for association rules. Iceberg concept lattices are based on the theory of Formal Concept Analysis, a mathematical theory with applications in data analysis, information retrieval, and knowledge discovery. We present a new algorithm called TITANIC for computing (iceberg) concept lattices. It is based on data mining techniques with a level-wise approach. In fact, TITANIC can be used for a more general problem: Computing arbitrary closure systems when the closure operator comes along with a so-called weight function. The use of weight functions for computing closure systems has not been discussed in the literature up to now. Applications providing such a weight function include association rule mining, functional dependencies in databases, conceptual clustering, and ontology engineering. The algorithm is experimentally evaluated and compared with Ganter's Next-Closure algorithm. The evaluation shows an important gain in efficiency, especially for weakly correlated data

    An Overview of the Use of Neural Networks for Data Mining Tasks

    Get PDF
    In the recent years the area of data mining has experienced a considerable demand for technologies that extract knowledge from large and complex data sources. There is a substantial commercial interest as well as research investigations in the area that aim to develop new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies, whose classification, prediction and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights from a data mining perspective the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks

    Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules

    Full text link
    Association rules are among the most widely employed data analysis methods in the field of Data Mining. An association rule is a form of partial implication between two sets of binary variables. In the most common approach, association rules are parameterized by a lower bound on their confidence, which is the empirical conditional probability of their consequent given the antecedent, and/or by some other parameter bounds such as "support" or deviation from independence. We study here notions of redundancy among association rules from a fundamental perspective. We see each transaction in a dataset as an interpretation (or model) in the propositional logic sense, and consider existing notions of redundancy, that is, of logical entailment, among association rules, of the form "any dataset in which this first rule holds must obey also that second rule, therefore the second is redundant". We discuss several existing alternative definitions of redundancy between association rules and provide new characterizations and relationships among them. We show that the main alternatives we discuss correspond actually to just two variants, which differ in the treatment of full-confidence implications. For each of these two notions of redundancy, we provide a sound and complete deduction calculus, and we show how to construct complete bases (that is, axiomatizations) of absolutely minimum size in terms of the number of rules. We explore finally an approach to redundancy with respect to several association rules, and fully characterize its simplest case of two partial premises.Comment: LMCS accepted pape

    On the role of pre and post-processing in environmental data mining

    Get PDF
    The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed

    Data mining as a tool for environmental scientists

    Get PDF
    Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogenous
    corecore