44,710 research outputs found

    Managing large collections of data mining models

    Meeting of the MINDS: an information retrieval research agenda

    Since its inception in the late 1950s, the field of Information Retrieval (IR) has developed tools that help people find, organize, and analyze information. The key early influences on the field are well known. Among them are H. P. Luhn's pioneering work, the development of the vector space retrieval model by Salton and his students, Cleverdon's development of the Cranfield experimental methodology, Spärck Jones' development of idf, and a series of probabilistic retrieval models by Robertson and Croft. Until the development of the World Wide Web (Web), IR was of greatest interest to professional information analysts such as librarians, intelligence analysts, the legal community, and the pharmaceutical industry.

    Illinois Digital Scholarship: Preserving and Accessing the Digital Past, Present, and Future

    Since the University's establishment in 1867, its scholarly output has been issued primarily in print, and the University Library and Archives have been readily able to collect, preserve, and provide access to that output. Today, technological, economic, political and social forces are buffeting all means of scholarly communication. Scholars, academic institutions and publishers are engaged in debate about the impact of digital scholarship and open access publishing on the promotion and tenure process. The upsurge in digital scholarship affects many aspects of the academic enterprise, including how we record, evaluate, preserve, organize and disseminate scholarly work. The result has left the Library with no ready means by which to archive digitally produced publications, reports, presentations, and learning objects, many of which cannot be adequately represented in print form. In this incredibly fluid environment of digital scholarship, the critical question of how we will collect, preserve, and manage access to this important part of the University scholarly record demands a rational and forward-looking plan - one that includes perspectives from diverse scholarly disciplines, incorporates significant research breakthroughs in information science and computer science, and makes effective projections for future integration within the Library and computing services as a part of the campus infrastructure. Prepared jointly by the University of Illinois Library and CITES at the University of Illinois at Urbana-Champaign.

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Using Biotic Interaction Networks for Prediction in Biodiversity and Emerging Diseases

    Networks offer a powerful tool for understanding and visualizing inter-species interactions within an ecology. Previously considered examples, such as trophic networks, are just representations of experimentally observed direct interactions. However, species interactions are so rich and complex that it is not feasible to directly observe more than a small fraction of them. In this paper, using data mining techniques, we show how potential interactions can be inferred from geographic data, rather than by direct observation. An important application area for such a methodology is that of emerging diseases, where often little is known about inter-species interactions, such as those between vectors and reservoirs. Here, we show how, using geographic data, biotic interaction networks that model statistical dependencies between species distributions can be used to infer and understand inter-species interactions. Furthermore, we show how such networks can be used to build prediction models, for example for predicting the most important reservoirs of a disease or the degree of disease risk associated with a geographical area. We illustrate the general methodology by considering an important emerging disease - Leishmaniasis. This data mining approach allows for the use of geographic data to construct inferential biotic interaction networks, which can then be used to build prediction models with a wide range of applications in ecology, biodiversity and emerging diseases.

    Managing the KM Trade-Off: Knowledge Centralization versus Distribution

    KM is more an archipelago of theories and practices than a monolithic approach. We propose a conceptual map that organizes some major approaches to KM according to their assumptions on the nature of knowledge. The paper introduces the two major views on knowledge - objectivist and subjectivist - and divides each of them into two major approaches to KM: knowledge as a market, and knowledge as intellectual capital (the objectivist perspective); knowledge as mental models, and knowledge as practice (the subjectivist perspective). We argue that the dichotomy between objective and subjective approaches is intrinsic to KM within complex organizations, as each side of the dichotomy responds to different, and often conflicting, needs: on the one hand, the need to maximize the value of knowledge through its replication; on the other hand, the need to keep knowledge appropriate to an increasingly complex and changing environment. Moreover, as a proposal for a deeper discussion, this trade-off is suggested as the origin of other relevant KM-related trade-offs, which are listed. Managing these trade-offs is proposed as a main challenge of KM.

    Random Indexing K-tree

    Random Indexing (RI) K-tree is the combination of two algorithms for clustering. Many large scale problems exist in document clustering. RI K-tree scales well with large inputs due to its low complexity. It also exhibits features that are useful for managing a changing collection. Furthermore, it solves previous issues with sparse document vectors when using K-tree. The algorithms and data structures are defined, explained and motivated. Specific modifications to K-tree are made for use with RI. Experiments have been executed to measure quality. The results indicate that RI K-tree improves document cluster quality over the original K-tree algorithm. Comment: 8 pages, ADCS 2009; Hyperref and cleveref LaTeX packages conflicted. Removed cleveref.
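
    The abstract does not spell out how Random Indexing builds document vectors, so the following is only a minimal sketch of the general technique, not the paper's implementation. It assumes the common formulation in which each term gets a sparse ternary index vector and a document vector is the frequency-weighted sum of its terms' index vectors. The function names and the `dim`, `nonzero`, and `seed` parameters are hypothetical, and the K-tree clustering step is not shown.

```python
import random
from collections import Counter


def index_vector(term, dim=64, nonzero=4, seed=0):
    """Return a sparse ternary index vector for a term (illustrative only).

    `nonzero` positions are set to +1 or -1 (alternating); the rest stay 0.
    Seeding the generator with the term makes the vector reproducible.
    """
    rng = random.Random(f"{seed}:{term}")
    vec = [0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def document_vector(tokens, dim=64, nonzero=4, seed=0):
    """Sum the index vectors of a document's terms, weighted by term count."""
    doc = [0] * dim
    for term, count in Counter(tokens).items():
        for i, value in enumerate(index_vector(term, dim, nonzero, seed)):
            doc[i] += count * value
    return doc


if __name__ == "__main__":
    vec = document_vector("random indexing gives dense document vectors".split())
    print(len(vec), sum(1 for v in vec if v != 0))
```

    The resulting vectors have a fixed, small dimensionality regardless of vocabulary size, which is the property that would make them convenient inputs for a clustering structure that struggles with sparse term vectors.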