128,701 research outputs found

    Unsupervised Extraction of Representative Concepts from Scientific Literature

    Full text link
    This paper studies the automated categorization and extraction of scientific concepts from titles of scientific articles, in order to gain a deeper understanding of their key contributions and facilitate the construction of a generic academic knowledgebase. Towards this goal, we propose an unsupervised, domain-independent, and scalable two-phase algorithm to type and extract key concept mentions into aspects of interest (e.g., Techniques, Applications, etc.). In the first phase of our algorithm we propose PhraseType, a probabilistic generative model which exploits textual features and limited POS tags to broadly segment text snippets into aspect-typed phrases. We extend this model to simultaneously learn aspect-specific features and identify academic domains in multi-domain corpora, since the two tasks mutually enhance each other. In the second phase, we propose an approach based on adaptor grammars to extract fine grained concept mentions from the aspect-typed phrases without the need for any external resources or human effort, in a purely data-driven manner. We apply our technique to study literature from diverse scientific domains and show significant gains over state-of-the-art concept extraction techniques. We also present a qualitative analysis of the results obtained.Comment: Published as a conference paper at CIKM 201

    Temporal fuzzy association rule mining with 2-tuple linguistic representation

    Get PDF
    This paper reports on an approach that contributes towards the problem of discovering fuzzy association rules that exhibit a temporal pattern. The novel application of the 2-tuple linguistic representation identifies fuzzy association rules in a temporal context, whilst maintaining the interpretability of linguistic terms. Iterative Rule Learning (IRL) with a Genetic Algorithm (GA) simultaneously induces rules and tunes the membership functions. The discovered rules were compared with those from a traditional method of discovering fuzzy association rules and results demonstrate how the traditional method can loose information because rules occur at the intersection of membership function boundaries. New information can be mined from the proposed approach by improving upon rules discovered with the traditional method and by discovering new rules

    Data-Mining a Large Digital Sky Survey: From the Challenges to the Scientific Results

    Get PDF
    The analysis and an efficient scientific exploration of the Digital Palomar Observatory Sky Survey (DPOSS) represents a major technical challenge. The input data set consists of 3 Terabytes of pixel information, and contains a few billion sources. We describe some of the specific scientific problems posed by the data, including searches for distant quasars and clusters of galaxies, and the data-mining techniques we are exploring in addressing them. Machine-assisted discovery methods may become essential for the analysis of such multi-Terabyte data sets. New and future approaches involve unsupervised classification and clustering analysis in the Giga-object data space, including various Bayesian techniques. In addition to the searches for known types of objects in this data base, these techniques may also offer the possibility of discovering previously unknown, rare types of astronomical objects.Comment: Invited paper, to appear in Applications of Digital Image Processing XX, ed. A. Tescher, Proc. S.P.I.E. vol. 3164, in press; 10 pages, a self-contained TeX file, and 3 separate postscript figure
    corecore