    Set-oriented data mining in relational databases

    Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing. In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases.
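The set-oriented idea in this abstract can be illustrated with a small sketch. It is not the SETM algorithm as published: the function name `frequent_itemsets`, the `(trans_id, item)` row layout, and the `max_size` cut-off are assumptions made for illustration, and a dictionary lookup stands in for the merge-scan join named in the abstract. Each pass counts candidate patterns, keeps those that meet the minimum support, and joins the surviving rows back against the transaction table to extend them by one item.

```python
from collections import Counter


def frequent_itemsets(sales, min_support, max_size=3):
    """Return {itemset tuple: support} for a table of (trans_id, item) rows.

    Illustrative sketch only; names and layout are assumptions.
    """
    # Items per transaction, sorted, so patterns are built in one fixed order.
    by_tid = {}
    for tid, item in sorted(set(sales)):
        by_tid.setdefault(tid, []).append(item)

    # Pass 1 relation: one row per (trans_id, 1-item pattern).
    rows = [(tid, (item,)) for tid, items in by_tid.items() for item in items]

    result = {}
    size = 1
    while rows and size <= max_size:
        # Group by pattern and count supporting transactions.
        counts = Counter(pattern for _, pattern in rows)
        frequent = {p: c for p, c in counts.items() if c >= min_support}
        result.update(frequent)

        # Keep only rows whose pattern met the minimum support.
        rows = [(tid, p) for tid, p in rows if p in frequent]

        # Join on trans_id, extending each pattern with a larger item.
        # (Dictionary lookup used here in place of a merge-scan join.)
        rows = [
            (tid, p + (item,))
            for tid, p in rows
            for item in by_tid[tid]
            if item > p[-1]
        ]
        size += 1
    return result
```

Calling `frequent_itemsets(rows, min_support=2)` on a list of `(trans_id, item)` tuples returns every itemset of up to three items that occurs in at least two transactions.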

    MODELS AND SOLUTIONS FOR THE IMPLEMENTATION OF DISTRIBUTED SYSTEMS

    Software applications vary in complexity depending on the problems they try to solve, and can integrate very complex elements that bring together functionalities that are sometimes competing or conflicting. Take, for example, a mobile communications system. The functionalities of such a system are difficult to understand, and they come on top of non-functional requirements such as practical use, performance, cost, durability and security. The transition from local computer networks to large networks that connect millions of machines around the world at speeds exceeding one gigabit per second has allowed universal access to data and the design of applications that require the simultaneous use of the computing power of several interconnected systems. These technologies have enabled the evolution from centralized systems to distributed systems that connect a large number of computers. To exploit the advantages of distributed systems, software and communications tools have been developed that enable the implementation of distributed processing for complex solutions. The objective of this document is to present the hardware, software and communication tools, in close relation to the possibility of applying them at an integrated social and economic level as a result of globalization and the evolution of e-society. These objectives and national priorities are based on the current needs and realities of Romanian society, while being consistent with the requirements of Romania's European orientation towards the knowledge society and the strengthening of the information society, the target goal being the accomplishment of e-Romania, with its strategic e-government component. Achieving this objective repositions Romania and gives it an advantage for sustainable growth, a positive international image, rapid convergence in Europe, inclusion and the strengthening of areas of high competence, in line with Europe 2020, launched by the European Council in June 2010.
    Keywords: information society, databases, distributed systems, e-society, implementation of distributed systems

    Analysis, classification and comparison of scheduling techniques for software transactional memories

    Transactional Memory (TM) is a practical programming paradigm for developing concurrent applications. Performance is a critical factor for TM implementations, and various studies have demonstrated that specialised transaction/thread scheduling support is essential for implementing performance-effective TM systems. After one decade of research, this article reviews the wide variety of scheduling techniques proposed for Software Transactional Memories. Based on the peculiarities and differences of the adopted scheduling strategies, we propose a classification of the existing techniques, and we discuss the specific characteristics of each technique. We also analyse the results of previous evaluation and comparison studies, and we present the results of a new experimental study encompassing techniques based on different scheduling strategies. Finally, we identify potential strengths and weaknesses of the different techniques, as well as the issues that require further investigation.

    A new taxonomy for distributed computer systems based upon operating system structure

    Characteristics of the resource structure found in the operating system are considered as a mechanism for classifying distributed computer systems. Since the operating system resources themselves are too diversified to provide a consistent classification, the structure upon which resources are built and shared is examined. The location and control character of this indivisibility provides the taxonomy for separating uniprocessors, computer networks, network computers (fully distributed processing systems or decentralized computers) and algorithm and/or data control multiprocessors. The taxonomy is important because it divides machines into a classification that is relevant to the client rather than to the hardware architect. It also defines the character of the kernel O/S structure needed for future computer systems. What constitutes an operating system for a fully distributed processor is discussed in detail.

    A grid-based approach for processing group activity log files

    The information collected regarding group activity in a collaborative learning environment requires classifying, structuring and processing. The aim is to process this information in order to extract, reveal and provide students and tutors with valuable knowledge, awareness and feedback in order to successfully perform the collaborative learning activity. However, the large amount of information generated during online group activity may be time-consuming to process and, hence, can hinder the real-time delivery of the information. In this study we show how a Grid-based paradigm can be used to effectively process and present the information regarding group activity gathered in the log files under a collaborative environment. The computational power of the Grid makes it possible to process a huge amount of event information, compute statistical results and present them, when needed, to the members of the online group and the tutors, who are geographically distributed.

    An efficient parallel method for mining frequent closed sequential patterns

    Mining frequent closed sequential patterns (FCSPs) has attracted a great deal of research attention, because it is an important task in sequence mining. Recently, many studies have focused on mining frequent closed sequential patterns because such patterns have proved to be more efficient and compact than frequent sequential patterns. Information can be fully extracted from frequent closed sequential patterns. In this paper, we propose an efficient parallel approach called parallel dynamic bit vector frequent closed sequential patterns (pDBV-FCSP), using multi-core processor architecture, for mining FCSPs from large databases. The pDBV-FCSP approach divides the search space to reduce the required storage space and performs closure checking of prefix sequences early to reduce the execution time for mining frequent closed sequential patterns. This approach overcomes the problems of parallel mining such as the overhead of communication, synchronization, and data replication. It also solves the load-balancing issues between the processors with a dynamic mechanism that redistributes the work when some processes are out of work, to minimize idle CPU time.
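The notion of a closed pattern used in this abstract can be illustrated with a naive filter. This is not pDBV-FCSP, which checks closure early on prefix sequences and runs in parallel; the sketch assumes the frequent patterns and their supports are already known, treats each pattern as a tuple of single items, and uses the hypothetical names `is_subsequence` and `closed_patterns`. A pattern is kept only if no longer pattern containing it as a subsequence has the same support.

```python
def is_subsequence(short, long):
    """True if `short` occurs in `long` in order, not necessarily contiguously."""
    remaining = iter(long)
    # Each membership test consumes the iterator up to the match,
    # so the order of items is respected.
    return all(item in remaining for item in short)


def closed_patterns(patterns):
    """Keep patterns that have no super-sequence with equal support.

    `patterns` maps a tuple of items to its support count
    (an assumed input layout for this sketch).
    """
    closed = {}
    for pattern, support in patterns.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_support == support
            and is_subsequence(pattern, other)
            for other, other_support in patterns.items()
        )
        if not absorbed:
            closed[pattern] = support
    return closed
```

The filter compares every pair of patterns, so its cost grows quadratically with the number of frequent patterns; it is meant only to make the definition concrete.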