    Semantic-Based, Scalable, Decentralized and Dynamic Resource Discovery for Internet-Based Distributed System

    Resource Discovery (RD) is a key issue in Internet-based distributed systems such as grids. RD means locating an appropriate resource or service type that matches a user's application requirements. It is very important, because resource reservation and task scheduling depend on it. Unfortunately, RD in grids is very challenging for several reasons:
- resources and users are distributed;
- resources run on heterogeneous platforms;
- resource status is dynamic, since resources can join or leave the system without prior notice;
- most recently, a new type of grid called the intergrid (a grid of grids) has been introduced, which uses multiple middlewares.

This situation requires an RD system with rich interoperability, scalability, decentralization and dynamism. However, existing grid RD systems have difficulty attaining these features. Moreover, the field lacks review and evaluation studies that could highlight the gaps in achieving them. This work therefore addresses intergrid RD from two perspectives. First, it reviews and classifies current grid RD systems in a way that is useful for discussing and comparing them. Second, it proposes a novel RD framework that has the required features listed above. In the first part, we focus mainly on studies that aim primarily at interoperability, namely RD systems that use semantic information (semantic technology). In particular, we classify such systems by how they qualitatively use semantic information. We evaluate the classified studies by the degree to which they achieve interoperability and the other RD requirements, and we outline future research directions for the field. In the second part, we call the new framework semantic-based scalable decentralized dynamic RD. The framework contains two main components: a service description model, and a service registration and discovery model.
The former consists of a set of ontologies and services. The ontologies serve as the data model for service description, while the services carry out the description process. Service registration is also ontology-based. Service nodes (service providers) are grouped into classes according to the ontology concepts, so each class represents one concept in the ontology. Each class has a head, elected from among its own nodes/members. The head acts as the registry for its class and communicates with the heads of the other classes in a peer-to-peer manner during discovery. We further introduce two intelligent agents to automate discovery: the Request Agent (RA) and the Description Agent (DA). Each node is expected to host both agents. The DA describes service capabilities based on the ontology, and the RA carries service requests, also based on the ontology. We design a service search algorithm for the RA. It starts the lookup in the class where the request originates and then proceeds to the other classes. Finally, we evaluate the performance of our framework with extensive simulation experiments. The results confirm that the proposed system satisfies the required RD features (interoperability, scalability, decentralization and dynamism). In short, our main contributions are:
- a new taxonomy for semantic-based grid RD studies;
- an interoperable semantic description component model for representing intergrid service metadata;
- a semantic distributed registry architecture for indexing service metadata;
- an agent-based service search and selection algorithm.
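The class-head registry and origin-first lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. Ontology concepts are reduced to plain strings, and a node's capabilities to a set of service-type names. The head election rule (the largest node id wins) is hypothetical, because the abstract does not say how heads are elected. The names `ServiceClass` and `discover` are likewise illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServiceClass:
    """One class of service providers, mapped to a single ontology concept."""
    concept: str
    members: dict[str, set[str]] = field(default_factory=dict)  # node id -> service types
    head: str | None = None

    def _elect_head(self) -> None:
        # Hypothetical election rule: the member with the largest id becomes head.
        self.head = max(self.members) if self.members else None

    def join(self, node_id: str, services: set[str]) -> None:
        self.members[node_id] = set(services)
        self._elect_head()  # re-elect so the class always has a registry

    def leave(self, node_id: str) -> None:
        self.members.pop(node_id, None)
        self._elect_head()

    def lookup(self, service_type: str) -> list[str]:
        # The head answers as the registry of its class.
        return sorted(n for n, s in self.members.items() if service_type in s)


def discover(classes: dict[str, ServiceClass], origin: str, service_type: str):
    """Query the requester's own class first, then the heads of the other classes."""
    order = [origin] + sorted(c for c in classes if c != origin)
    for concept in order:
        cls = classes[concept]
        if cls.head is None:  # empty class: no registry to ask
            continue
        hits = cls.lookup(service_type)
        if hits:
            return concept, hits
    return None, []


if __name__ == "__main__":
    classes = {c: ServiceClass(c) for c in ("Compute", "Storage")}
    classes["Compute"].join("n1", {"cpu"})
    classes["Storage"].join("n2", {"disk"})
    classes["Storage"].join("n3", {"disk", "cpu"})
    print(discover(classes, "Storage", "cpu"))  # ('Storage', ['n3'])
```

Re-electing on every join or leave is a simple way to keep a registry available while nodes come and go without notice. The sketch returns the first class with matches and does no ranking, so it stands in only for the "search" half of the thesis's search and selection algorithm.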

    Distributed cloud-edge analytics and machine learning for transportation emissions estimation

    In recent years, IoT and Smart Cities have become a popular computing paradigm. It is based on network-enabled devices that provide different functionalities, from sensor measurements to home-automation (domotic) actions. With this paradigm, stakeholders can receive near-real-time information from the field, e.g., the current pollution level of the city. Alongside these paradigms, Fog Computing enables computation near the sensors where the data is produced, i.e., at Edge nodes. This provides low latency and fault tolerance, given the possible independence of the sensor devices. Pushing computation to the Edge also makes derived results available in near real time. This ability can be beneficial in many situations. However, it also requires the Edge to include the data preparation processes that ensure the data's fitness for use, since incoming data can be erroneous. Here, Machine Learning can be useful both to correct data and to predict future values. There have been studies on the use of data at the Edge, but to our knowledge none evaluates the different modeling situations and the viability of the approach. This thesis therefore aims to evaluate whether a distributed system can be built that does three things in near real time:
- ensures the fitness for use of incoming data through Machine Learning-enabled Data Preparation;
- estimates emissions;
- predicts the future status of the city.

We evaluate viability through three contributions. The first contribution focuses on forecasting in a distributed scenario, using a road traffic dataset for evaluation. It provides a robust solution for building a central model. The approach is based on Federated Learning, which allows models to be trained at the Edge nodes and then merged centrally.
This way, the models at the Edge can be independent but can also be synchronized. The results show the trade-off between accuracy and training time, and compare low-powered devices with server-class machines. These analyses show that Machine Learning is viable with this paradigm. The second contribution focuses on a particular use case: ship emission estimation. Estimating exhaust emissions requires correct data, which is not always available. This contribution explores the techniques available to correct ship registry data and proposes simple Machine Learning techniques to impute missing or erroneous values. It analyzes the different variables and their relationships to give practitioners guidelines for data correction and treatment. The results show that classical Machine Learning can improve on state-of-the-art results. Since these algorithms are simple, they can also run on an Edge device if required. The third contribution focuses on generating new variables from those available in a ship trace dataset obtained from the Automatic Identification System (AIS). A pipeline of two methods, a neural network and a clustering algorithm, groups movements into movement patterns, or behaviors. We test how well these behaviors predict ship type, main engine power and navigational status. The main engine power prediction is compared against the standard technique used in ship emission estimation when ship registry data is missing. Compared with that baseline method, our approach detected 45% of the emissions that would otherwise go undetected. Ship navigational status is prone to error, so the discovered behaviors are proposed as an alternative variable based on robust data.
These contributions form a framework that can distribute the learning processes and withstand network failures on low-powered devices.
Computer architecture
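The first contribution trains models locally at Edge nodes and merges them centrally. The sketch below illustrates one common merging rule: FedAvg-style averaging weighted by each node's sample count. The abstract does not state which rule the thesis uses. The one-variable linear model, learning rate, epoch count and synthetic data are all assumptions chosen to keep the example small and self-contained; it is not a traffic forecaster.

```python
def local_train(weights, data, lr=0.05, epochs=20):
    """Run full-batch gradient descent for y ~ w*x + b on one Edge node's data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum((w * x + b - y) * x for x, y in data)
        grad_b = 2.0 / n * sum((w * x + b - y) for x, y in data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates):
    """Merge (weights, n_samples) pairs into one model, weighting by sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return tuple(sum(wts[i] * n for wts, n in updates) / total for i in range(dim))


def federated_round(global_weights, node_datasets):
    """Each node starts from the global model and trains locally; the server then merges."""
    updates = [(local_train(global_weights, d), len(d)) for d in node_datasets]
    return federated_average(updates)


if __name__ == "__main__":
    # Synthetic data from y = 2x + 1, split across two hypothetical Edge nodes.
    nodes = [[(x, 2.0 * x + 1.0) for x in xs] for xs in ([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])]
    model = (0.0, 0.0)
    for _ in range(50):
        model = federated_round(model, nodes)
    print(model)  # approaches (2.0, 1.0)
```

Only model parameters and sample counts leave each node. This is what lets the Edge models stay independent between rounds yet be synchronized at every merge.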

    ICE-B 2010: proceedings of the International Conference on e-Business

    The International Conference on e-Business, ICE-B 2010, aims to bring together researchers and practitioners interested in e-Business technology and its current applications. This technology involves lower-level technological issues, such as technology platforms and web services. It also involves higher-level issues, such as context awareness and enterprise models, and the particularities of its different possible applications. These are all areas of theoretical and practical importance within the broad scope of e-Business, whose growing importance is reflected in the increasing interest of the IT research community. The areas of this conference are: (i) e-Business applications; (ii) Enterprise engineering; (iii) Mobility; (iv) Business collaboration and e-Services; (v) Technology platforms. Contributions range from research-driven to more practically oriented work, reflecting innovative results in these areas. ICE-B 2010 received 66 submissions, of which 9% were accepted as full papers. Additionally, 27% were presented as short papers and 17% as posters. All papers presented at the conference venue were included in the SciTePress Digital Library. Revised best papers are published by Springer-Verlag in a CCIS Series book.

    Coping with new Challenges in Clustering and Biomedical Imaging

    Recent years have seen a tremendous increase in data acquisition in scientific fields such as molecular biology, bioinformatics and biomedicine. Novel methods are therefore needed for the automatic processing and analysis of these large amounts of data. Data mining is the process of applying methods such as clustering or classification to large databases to uncover hidden patterns. Clustering is the task of partitioning the points of a data set into distinct groups so as to maximize intra-cluster similarity and minimize inter-cluster similarity. Clustering is unsupervised learning. Classification, by contrast, is supervised learning: it predicts the group membership of data objects from rules learned on a training set where group membership is known. Specialized methods have been proposed for hierarchical and partitioning clustering, but these methods suffer from several drawbacks. The first part of this work proposes new clustering methods that address problems of conventional clustering algorithms:
- ITCH (Information-Theoretic Cluster Hierarchies) is a hierarchical clustering method based on a hierarchical variant of the Minimum Description Length (MDL) principle. It finds hierarchies of clusters without requiring input parameters.
- ITCH may converge only to a local optimum, so we propose GACH (Genetic Algorithm for Finding Cluster Hierarchies). It combines the benefits of genetic algorithms with information theory and so explores the search space more effectively.
- INTEGRATE is a novel clustering method for data with mixed numerical and categorical attributes. Supported by the MDL principle, it integrates the information provided by heterogeneous numerical and categorical attributes and thus naturally balances the influence of both sources of information.
A competitive evaluation shows that INTEGRATE is more effective than existing clustering methods for mixed-type data. Besides clustering methods for single data objects, we provide a solution for clustering different data sets that are represented by their skylines. The skyline operator is a well-established database primitive. It finds the database objects that minimize two or more attributes when the weighting between these attributes is unknown. In this thesis, we define a similarity measure called SkyDist for comparing the skylines of different data sets. SkyDist can be integrated directly into data mining tasks such as clustering or classification. The experiments show that SkyDist, combined with different clustering algorithms, can give useful insights in many applications. The second part focuses on the analysis of high-resolution magnetic resonance images (MRI), which are clinically relevant and may allow early detection and diagnosis of several diseases. In particular, we propose a framework for classifying Alzheimer's disease in MR images. It combines the data mining steps of feature selection, clustering and classification. As a result, we identified a set of highly selective features that discriminate patients with Alzheimer's disease from healthy people. However, analyzing high-dimensional MR images is extremely time-consuming. We therefore developed JGrid, a scalable distributed computing solution that enables large-scale MRI analysis and thus optimized diagnostic prediction. In another study, we apply efficient motif discovery algorithms to task-fMRI scans to identify brain patterns characteristic of patients with somatoform pain disorder. We find groups of brain compartments that occur frequently within the brain networks and discriminate well between healthy and diseased people.
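SkyDist builds on the skyline operator, which a short sketch can illustrate. The abstract does not define SkyDist itself, so only the skyline is shown here. It is a naive O(n^2) check that keeps every point not dominated by another point, with all attributes minimized. The function names and the (price, distance) tuples are made-up illustration data, not material from the thesis.

```python
def dominates(a, b):
    """True if a is no worse than b in every attribute and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Return the points that no other point dominates (naive nested-loop check)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (price, distance) pairs; lower is better for both attributes.
    hotels = [(50, 8), (60, 3), (80, 1), (70, 4), (55, 9)]
    print(skyline(hotels))  # [(50, 8), (60, 3), (80, 1)]
```

No weighting between the attributes has to be chosen in advance. (70, 4) and (55, 9) drop out only because another point is at least as good in both attributes and strictly better in one.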