
    Concurrent software architectures for exploratory data analysis

    Decades ago, the increasing volume of data made manual analysis obsolete and prompted the use of computational tools with interactive user interfaces and a rich palette of data visualizations. Yet their classic, desktop-based architectures can no longer cope with the ever-growing size and complexity of data. Next-generation systems for exploratory data analysis will be built on client–server architectures, which already run concurrent software for data analytics but are not tailored for an engaged, interactive analysis of data and models. In exploratory data analysis, the key is the responsiveness of the system and the prompt construction of interactive visualizations that can guide users to uncover interesting data patterns. In this study, we review current software architectures for distributed data analysis and propose a list of features to be included in next-generation frameworks for exploratory data analysis. The new generation of tools will need to address integrated data storage and processing, fast prototyping of data analysis pipelines supported by machine-proposed analysis workflows, preemptive analysis of data, interactivity, and user interfaces for intelligent data visualizations. These systems will rely on a mixture of concurrent software architectures to meet the challenge of seamlessly integrating exploratory data interfaces on the client side with the management of concurrent data mining procedures on the servers.
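One requirement named above, responsiveness through preemptive analysis, can be illustrated with a small concurrency sketch: a server object that cancels a still-running analysis as soon as a newer interactive query supersedes it. This is a minimal illustration in Python's asyncio, assuming a hypothetical `ExploratoryServer` class and toy timings; it is not taken from any of the reviewed frameworks.

```python
import asyncio

class ExploratoryServer:
    """Sketch: each new user interaction preempts the still-running analysis."""

    def __init__(self):
        self._task = None

    async def _analyze(self, query):
        # Stand-in for an expensive server-side data mining step.
        await asyncio.sleep(0.05)
        return f"result for {query}"

    async def submit(self, query):
        # Cancel the previous analysis if it is still running,
        # so the server stays responsive to the latest interaction.
        if self._task and not self._task.done():
            self._task.cancel()
        self._task = asyncio.ensure_future(self._analyze(query))
        try:
            return await self._task
        except asyncio.CancelledError:
            return None  # superseded by a newer query

async def demo():
    server = ExploratoryServer()
    first = asyncio.ensure_future(server.submit("histogram"))
    await asyncio.sleep(0.01)                      # let the first query start
    second = await server.submit("scatter plot")   # preempts the first
    return await first, second

stale, fresh = asyncio.run(demo())
print(stale, fresh)  # None result for scatter plot
```

The stale query returns `None` instead of an outdated visualization, which is the behavior an interactive client would want when the user has already moved on.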

    Parallel and Distributed Data Mining



    Proof-of-Concept Application - Annual Report Year 2

    This document first gives an introduction to application-layer networks and then presents the catallactic resource allocation model and its integration into the middleware architecture of the developed prototype. Use cases for the employed service models are presented, both as general application scenarios and as two detailed cases: query services and data mining services. The work concludes by describing the middleware implementation and evaluation, as well as future work in this area.
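A catallactic allocation model is market-based: services obtain resources by bidding rather than by central scheduling. The following toy sketch shows one sealed-bid, uniform-price allocation round; the service names, the pricing rule, and the `allocate` function are illustrative assumptions, not the middleware's actual interfaces.

```python
def allocate(bids, capacity):
    """Allocate `capacity` service slots to the highest bidders.

    bids: dict mapping client name -> offered price.
    Returns (winners, clearing_price), where the clearing price is the
    lowest winning bid, as in a simple uniform-price auction.
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winners = [name for name, _ in ranked[:capacity]]
    clearing_price = ranked[:capacity][-1][1] if winners else None
    return winners, clearing_price

# Two slots, three competing services (hypothetical names and prices).
winners, price = allocate({"query-svc": 8, "mining-svc": 5, "batch": 3},
                          capacity=2)
print(winners, price)  # ['query-svc', 'mining-svc'] 5
```

In a real catallactic middleware the bids would be negotiated between agents rather than collected in one dictionary, but the allocation decision per round has this shape.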

    Enhanced Living by Assessing Voice Pathology Using a Co-Occurrence Matrix.

    A large proportion of the world's population suffers from various disabilities, which affect not only children but also adults of different professions. Smart technology can assist disabled people and support a comfortable life in an enhanced living environment (ELE). In this paper, we propose an effective voice pathology assessment system that works within a smart home framework. The proposed system takes input from various sensors and processes the acquired voice signals and electroglottography (EGG) signals. Co-occurrence matrices in different directions and neighborhoods were obtained from the spectrograms of these signals. Several features, such as energy, entropy, contrast, and homogeneity, were calculated from these matrices and fed into a Gaussian mixture model-based classifier. Experiments were performed with a publicly available database, namely the Saarbrücken Voice Database. The results demonstrate the feasibility of the proposed system in light of its high accuracy and speed. The proposed system can be extended to assess other disabilities in an ELE.
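The feature-extraction step described above can be sketched compactly: a gray-level co-occurrence matrix (GLCM) computed over a quantized spectrogram, followed by the energy, entropy, contrast, and homogeneity features the abstract names. This is a minimal NumPy sketch; the 8-level quantization, the horizontal neighbor offset, and the random stand-in spectrogram are illustrative choices, not the paper's parameters.

```python
import numpy as np

def glcm(img, levels=8, offset=(0, 1)):
    """Normalized co-occurrence counts for pixel pairs at the given offset."""
    q = np.floor(img / img.max() * (levels - 1)).astype(int)
    dy, dx = offset
    mat = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            mat[q[y, x], q[y + dy, x + dx]] += 1
    return mat / mat.sum()  # joint probabilities, sums to 1

def features(p):
    i, j = np.indices(p.shape)
    return {
        "energy": float((p ** 2).sum()),
        "entropy": float(-(p[p > 0] * np.log2(p[p > 0])).sum()),
        "contrast": float((p * (i - j) ** 2).sum()),
        "homogeneity": float((p / (1 + np.abs(i - j))).sum()),
    }

rng = np.random.default_rng(0)
spec = np.abs(rng.normal(size=(32, 32)))  # stand-in for a spectrogram
f = features(glcm(spec))
print(sorted(f))  # ['contrast', 'energy', 'entropy', 'homogeneity']
```

In the paper's pipeline such feature vectors, computed for several directions and neighborhoods, would then be fed to the Gaussian mixture model classifier.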

    Service-Oriented Data Mining


    Design and Implementation of a control plane for a L2 virtualized network

    The main objective of this project is to design and implement a virtual network control plane architecture that allows users to make effective use of all network resources at L2. As a first step, an analysis of the current state of the art in management and control architectures for resource virtualization must be done, with the purpose of providing operational support for users of the virtual network infrastructure at L2. The project also develops a tool-bench composed of basic components for managing slicing and virtualized network provisioning at L2. The control plane of the tool-bench will interact with layer-2 devices so that end-to-end virtual network services can be offered at this level. The tool-bench will be used to manage the network infrastructure and may eventually allow users to create and configure their own dedicated, customized network slice at L2. A slice is defined as a network that has been customized to fit the needs of a specific user and that runs on top of the physical infrastructure, sharing it with other parallel slices. Activities:
    * Analyse the status and advances in network control and management planes for virtual network infrastructures.
    * Define the basic components of the architecture (elements, interfaces) that provide virtualized network resources at L2, and define the interfaces of the control plane and of each basic component.
    * Define the set of services that the L2 virtualized network will provide to end users.
    * Implement the L2 tool-bench.
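The slice definition above implies one piece of bookkeeping any such control plane needs: slices share the physical infrastructure, so their resource reservations must not overlap. The sketch below illustrates that invariant for VLAN reservations on shared switches; the `SliceManager` class, its method names, and the switch/VLAN identifiers are hypothetical, not the project's actual component interfaces.

```python
class SliceManager:
    """Toy control-plane bookkeeping: no two slices may hold the
    same (switch, vlan) reservation on the shared infrastructure."""

    def __init__(self):
        self._in_use = {}  # (switch, vlan) -> owning slice name

    def create_slice(self, name, reservations):
        """Reserve a list of (switch, vlan) pairs for a slice, atomically:
        either every reservation succeeds or none is recorded."""
        for key in reservations:
            if key in self._in_use:
                raise ValueError(f"{key} already held by {self._in_use[key]}")
        for key in reservations:
            self._in_use[key] = name
        return name

mgr = SliceManager()
mgr.create_slice("research", [("sw1", 100), ("sw2", 100)])
try:
    mgr.create_slice("teaching", [("sw1", 100)])  # overlap -> rejected
except ValueError as e:
    print("rejected:", e)
```

Checking all reservations before recording any keeps the operation atomic, so a partially admitted slice can never block others.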

    Integration of Data Mining into Scientific Data Analysis Processes

    In recent years, the use of advanced semi-interactive data analysis algorithms, such as those from the field of data mining, has gained more and more importance in the life sciences in general, and in bioinformatics, genetics, medicine, and biodiversity research in particular. Today, there is a trend away from collecting and evaluating data only in the context of a specific problem or study, towards extensively collecting data from different sources in repositories where it is potentially useful for subsequent analysis, e.g. in the Gene Expression Omnibus (GEO) repository of high-throughput gene expression data. At the time the data are collected, they are analysed in a specific context that influences the experimental design; the types of analyses the data will be used for after deposit are not known, and content and data format are geared to the first experiment only, not to future re-use. Thus, complex process chains are needed for the analysis of the data, and such process chains need to be supported by the environments used to set up analysis solutions. Building specialized software for each individual problem is not a solution, as this effort can only be justified for large projects running for several years. Hence, data mining functionality has been packaged into toolkits, which provide it as a collection of different components; depending on the research questions of the users, solutions consist of distinct compositions of these components. Today, existing solutions for data mining processes comprise different components that represent different steps in the analysis process, and graphical or script-based toolkits exist for combining such components. However, the data mining tools that can serve as components in analysis processes are designed for single-computer environments, local data sources, and single users.
    Analysis scenarios in medical informatics and bioinformatics, by contrast, have to deal with multi-computer environments, distributed data sources, and multiple users who have to cooperate. Users need support for integrating data mining into analysis processes in the context of such scenarios, and this support is lacking today. Typically, analysts working in single-computer environments face the problem of large data volumes, since their tools address neither scalability nor access to distributed data sources. Distributed environments such as grid environments provide scalability and access to distributed data sources, but integrating existing components into such environments is complex, and new components often cannot be developed directly in distributed environments. Moreover, in scenarios involving multiple computers, multiple distributed data sources, and multiple users, the reuse of components, scripts, and analysis processes becomes more important, as more steps and more configuration are necessary, and thus much greater effort is needed to develop and set up a solution. In this thesis we introduce an approach for supporting interactive and distributed data mining for multiple users, based on infrastructure principles that allow building on data mining components and processes that are already available, instead of designing a completely new infrastructure, so that users can keep working with their familiar tools. To achieve the integration of data mining into scientific data analysis processes, this thesis proposes a stepwise approach for supporting the user in the development of analysis solutions that include data mining. We see our major contributions as follows. First, we propose an approach for integrating data mining components developed for single-processor environments into grid environments; in this way, we support users in reusing standard data mining components with little effort.
    The approach is based on a metadata schema definition that is used to grid-enable existing data mining components. Second, we describe an approach for interactively developing data mining scripts in grid environments; it efficiently supports users when they need to enhance available components, develop new data mining components, or compose these components. Third, building on that, we present an approach for facilitating the reuse of existing data mining processes based on process patterns. It supports users in scenarios that cover several steps of the data mining process involving multiple components or scripts. The data mining process patterns support the description of data mining processes at different levels of abstraction, between the CRISP model as the most general and executable workflows as the most concrete representation.
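The idea of grid-enabling an unchanged component via a metadata schema can be sketched as follows: the metadata describes what the infrastructure needs to deploy and invoke the component, without touching its code. The field names and the `validate` helper here are illustrative assumptions, not the thesis's actual schema definition.

```python
# Hypothetical minimal metadata schema for grid-enabling a component.
REQUIRED = {"name", "version", "executable", "inputs", "outputs"}

def validate(meta):
    """Reject component descriptions missing required schema fields."""
    missing = REQUIRED - meta.keys()
    if missing:
        raise ValueError(f"missing metadata fields: {sorted(missing)}")
    return meta

# A single-node clustering tool, described but not modified:
clusterer = validate({
    "name": "kmeans",
    "version": "1.0",
    "executable": "bin/kmeans",  # the existing single-processor binary
    "inputs": [{"name": "data", "type": "csv"}],
    "outputs": [{"name": "labels", "type": "csv"}],
})
print(clusterer["name"])  # kmeans
```

With such a description, a grid middleware can stage the inputs, run the executable on a remote node, and collect the outputs, which is what lets standard components be reused with little effort.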

    Exploring distributed computing tools through data mining tasks

    Harnessing the idle CPU cycles, storage space, and other resources of networked computers for collaborative work is the main focus of all major grid computing research projects. Most university computer labs are nowadays equipped with powerful desktop PCs, and it is easy to observe that most of the time these machines lie idle, wasting computing power that is not put to good use. For complex problems and for analysing very large amounts of data, however, substantial computational resources are required. For such problems, one may run the analysis algorithms on very powerful and expensive computers, which reduces the number of users that can afford such data analysis tasks. Instead of using single expensive machines, distributed computing systems offer the possibility of using a set of much less expensive machines to do the same task. The BOINC and Condor projects have been used successfully for real scientific research work around the world at low cost. The main goal of this work is to explore both distributed computing tools, Condor and BOINC, and to use their power to harness idle PC resources for academic researchers to use in their research work. In this thesis, data mining tasks are performed by running several machine learning algorithms in a distributed computing environment.
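The master-worker pattern that BOINC and Condor implement at scale can be sketched in a few lines: a master splits a data mining task into independent work units and farms them out to workers, much as idle lab PCs would each take one unit. The per-chunk statistic and the thread pool standing in for volunteer machines are illustrative simplifications.

```python
from concurrent.futures import ThreadPoolExecutor

def work_unit(chunk):
    # Stand-in for a learning algorithm run on one data partition.
    return sum(x * x for x in chunk)

# Master side: split the data into independent work units.
data = list(range(100))
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

# Four workers play the role of four otherwise-idle machines.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(work_unit, chunks))

# Master side: combine the partial results.
total = sum(partials)
print(total)  # 328350, the sum of squares of 0..99
```

Real BOINC or Condor deployments add what this sketch omits, such as scheduling, checkpointing, and result validation across untrusted volunteer nodes, but the decomposition into independent units is the same.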