
    Set-Oriented Mining for Association Rules in Relational Databases

    We describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss the optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. SETM uses only simple database primitives, viz. sorting and merge-scan join. SETM is simple, fast and stable over the range of parameter values. The major contribution of this paper is to show that at least some aspects of data mining can be carried out using general query languages such as SQL, rather than by developing specialized black-box algorithms. The set-oriented nature of SETM facilitates the development of extension

    Set-oriented data mining in relational databases

    Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing. In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases.
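The idea of expressing association-rule mining as SQL queries can be illustrated with a minimal sketch. This is not the SETM algorithm as published: it covers only itemsets of size one and two, and it leaves the join strategy to SQLite's planner rather than performing an explicit sort and merge-scan join. The table name `sales`, its columns `tid` and `item`, and the function name `frequent_pairs` are all assumptions made for this illustration.

```python
import sqlite3


def frequent_pairs(transactions, min_support):
    """Count item pairs that occur together in at least min_support transactions.

    transactions: dict mapping a transaction id to an iterable of items.
    Returns a dict {(item_a, item_b): support} with item_a < item_b.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per (transaction, item) occurrence.
    conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        # set(items) removes duplicate items within a single transaction.
        [(tid, item) for tid, items in transactions.items() for item in set(items)],
    )
    query = """
        WITH frequent AS (            -- items frequent on their own
            SELECT item FROM sales
            GROUP BY item
            HAVING COUNT(*) >= :minsup
        ),
        r1 AS (                       -- rows restricted to frequent items
            SELECT s.tid, s.item
            FROM sales s JOIN frequent f ON f.item = s.item
        )
        SELECT a.item, b.item, COUNT(*) AS support
        FROM r1 a JOIN r1 b           -- self-join on transaction id
          ON a.tid = b.tid AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= :minsup
        ORDER BY a.item, b.item
    """
    rows = conn.execute(query, {"minsup": min_support}).fetchall()
    conn.close()
    return {(a, b): support for a, b, support in rows}


if __name__ == "__main__":
    baskets = {1: ["A", "B", "C"], 2: ["A", "B"], 3: ["A", "C"], 4: ["B", "D"]}
    print(frequent_pairs(baskets, min_support=2))
```

Larger itemsets would follow the same pattern by joining the surviving rows against the transaction table again, which is where the repeated joins mentioned in the abstract come from.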

    ALIA LIS research environmental scan report

    Executive summary: An environmental scan of Australian Library and Information Studies (LIS) research was undertaken focusing on the period 2005–2013. This was in response to a brief from ALIA that sought such an analysis to inform its decisions in relation to content of a future research agenda, support, advocacy, and future funding. The investigation was expected to include research priorities of other library and information organisations, topics of research undertaken in Australia, types of research, persons/organisations undertaking research, and how research activities are funded, communicated and applied. The report took into account: research priorities of LIS professional associations both within and outside Australia; production of higher degree theses over the period; publication by practitioners and academics in both Australian and international publications; and grant or other support for research or investigatory projects. Methodology and limitations: Methodologies employed included: website analysis for research priorities of LIS organisations; database searching using Trove for higher degree theses; database searching using multiple databases for publications; and, in the case of research in progress and resourcing via grants, database searching, consultation and survey methods. The limitations in these approaches are explained in each related Section or Appendix. However, the major limitations were: poor response to the online survey despite its wide dissemination through ALIA and other associations; inconsistent responses to individual surveys directed specifically at academic departments; coverage of publications by databases, particularly of material outside periodicals; difficulties in categorising document