65 research outputs found

    Fouille de données : vers une nouvelle approche intégrant de façon cohérente et transparente la composante spatiale

    In recent decades, geospatial information has become increasingly present within organizations, resulting in the massive storage of such data. This phenomenon, combined with the knowledge potential these data hold, has created the need to learn from them and to extract knowledge that can support an organization's decision-making process. Several approaches have been considered for this purpose, the first being the use of "traditional" data mining tools. Because of the specific nature of geospatial information, however, this approach failed. From this arose the need to establish the process of extracting knowledge from geographic data as a field in its own right: Geographic Knowledge Discovery (GKD). GKD's answer to this problem has taken the form of approaches that fall into two broad categories: so-called pre-processing approaches and approaches based on dynamic processing of spatial information. To address the limitations of these methods and tools, we propose a new integrated approach that builds on existing "traditional" data mining technology. This approach, midway between the two previous ones, has as its main objective support for the geospatial type at every step of the data mining process. To do so, it exploits the usual relationships that geospatial entities maintain with one another. A framework then describes how the approach supports the spatial component by drawing on geospatial data processing libraries and "traditional" mining tools.
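The pre-processing idea sketched above, deriving the usual spatial relationships (containment, distance) as ordinary attributes that a conventional miner can consume unchanged, can be illustrated as follows; the entities, predicates, and feature names are illustrative assumptions, not the thesis's actual framework:

```python
import math

# Hypothetical geospatial entities: point locations and axis-aligned regions.
STORES = {"A": (2.0, 3.0), "B": (8.0, 1.0)}                   # (x, y) points
DISTRICTS = {"north": (0, 2, 10, 6), "south": (0, 0, 10, 2)}  # (xmin, ymin, xmax, ymax)

def contains(box, point):
    """True if the bounding box contains the point (a simple 'within' relation)."""
    xmin, ymin, xmax, ymax = box
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax

def spatial_features(entities, regions):
    """Flatten spatial relations into a plain attribute table that any
    'traditional' data mining tool can process without spatial support."""
    rows = []
    for name, pt in entities.items():
        row = {"entity": name}
        for rname, box in regions.items():
            row[f"within_{rname}"] = contains(box, pt)
        for other, opt in entities.items():   # pairwise distances
            if other != name:
                row[f"dist_to_{other}"] = round(math.dist(pt, opt), 2)
        rows.append(row)
    return rows

table = spatial_features(STORES, DISTRICTS)
print(table[0])
```

The point is the shape of the output: the spatial relations become flat columns, so the downstream mining step needs no geospatial awareness at all.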

    Model analytics and management


    A Smart Products Lifecycle Management (sPLM) Framework - Modeling for Conceptualization, Interoperability, and Modularity

    Autonomy and intelligence have been built into many of today's mechatronic products, taking advantage of low-cost sensors and advanced data analytics technologies. Designing product intelligence (enabled by analytics capabilities) is no longer a trivial or optional part of product development. This research aims to address the challenges raised by the new data-driven design paradigm for smart products development, in which the product itself and its smartness need to be carefully co-constructed. A smart product can be seen as a specific composition and configuration of the physical components that form its body and the analytics models that implement its intelligence, evolving along its lifecycle stages. Based on this view, the contribution of this research is to expand the "Product Lifecycle Management (PLM)" concept, traditionally applied to physical products, to data-based products. As a result, a Smart Products Lifecycle Management (sPLM) framework is conceptualized based on a high-dimensional Smart Product Hypercube (sPH) representation and decomposition. First, the sPLM addresses interoperability issues by developing a Smart Component data model to uniformly represent and compose the physical component models created by engineers and the analytics models created by data scientists. Second, the sPLM implements an NPD3 process model that incorporates a formal data analytics process into the new product development (NPD) process model, in order to support the transdisciplinary information flows and team interactions between engineers and data scientists. Third, the sPLM addresses issues related to product definition, modular design, product configuration, and lifecycle management of analytics models by adapting the theoretical frameworks and methods of traditional product design and development. An sPLM proof-of-concept platform was implemented to validate the concepts and methodologies developed throughout the research work.
The sPLM platform provides a shared data repository to manage the product-, process-, and configuration-related knowledge for smart products development, as well as a collaborative environment that facilitates transdisciplinary collaboration between product engineers and data scientists.
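The idea of a Smart Component data model that uniformly represents and composes both engineers' physical models and data scientists' analytics models can be sketched as below; the field names, `kind` values, and artifact paths are assumptions for illustration, not the thesis's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Literal

@dataclass
class SmartComponent:
    """One uniform record for either a physical model (e.g. a CAD file)
    or an analytics model (e.g. a serialized predictor)."""
    name: str
    kind: Literal["physical", "analytics"]
    artifact_uri: str
    version: str = "1.0"
    children: List["SmartComponent"] = field(default_factory=list)

    def compose(self, child: "SmartComponent") -> None:
        """Compose sub-components regardless of kind, so physical and
        analytics models live in one shared product structure."""
        self.children.append(child)

    def count(self, kind: str) -> int:
        """Count components of a given kind across the whole subtree."""
        return (self.kind == kind) + sum(c.count(kind) for c in self.children)

# A hypothetical smart valve: two physical parts plus one analytics model.
product = SmartComponent("smart_valve", "physical", "cad/valve.step")
product.compose(SmartComponent("housing", "physical", "cad/housing.step"))
product.compose(SmartComponent("wear_predictor", "analytics", "models/wear.onnx"))
print(product.count("physical"), product.count("analytics"))
```

Because both kinds share one type, configuration and lifecycle operations (versioning, composition, counting) apply to analytics models exactly as they do to physical parts, which is the interoperability point the abstract makes.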

    Ferramentas open source de Data Mining

    In a time of financial crisis, open source data mining tools represent a new trend in research, education, and industrial applications, especially for small and medium-sized enterprises. With open source software, these companies can easily start a data mining project using the latest technologies without worrying about acquisition costs, and can invest instead in training their staff. Open source systems provide access to the source code, making it easier for staff to understand the systems and algorithms and allowing them to adapt the software to the needs of their projects. However, some issues are inherent to this type of tool. One of the most important is diversity: discovering too late that the chosen tool is inappropriate for our business goals can be a serious problem, and as the number of data mining tools keeps growing, choosing the one that truly fits our business becomes increasingly difficult. This study reviews a set of data mining tools according to their characteristics and functionality. The tools reviewed come from the KDnuggets listing of Data Mining Software Suites. The ones offering the best working conditions, which are also the most popular in their communities, are then identified and tested in practice with real datasets. The tests aim to determine how the tools behave in different scenarios, such as performance when processing large volumes of data and accuracy of results. Nowadays, open source data mining tools represent an opportunity for their users, particularly small and medium-sized enterprises, so the results of this study are intended to support the decision-making process regarding them.
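The two evaluation criteria the study measures, runtime on larger inputs and accuracy of results, can be captured by a small harness like the one below; the toy majority-vote "tool" and the synthetic dataset are illustrative stand-ins, not the study's actual test setup:

```python
import random
import time

def majority_classifier(train, test):
    """Toy stand-in for a data mining tool: predict the most frequent
    training label for every test instance."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return [majority for _ in test]

def benchmark(tool, train, test):
    """Measure the two criteria used in the study: wall-clock runtime
    and accuracy on held-out data."""
    start = time.perf_counter()
    predictions = tool(train, test)
    elapsed = time.perf_counter() - start
    correct = sum(p == y for p, (_, y) in zip(predictions, test))
    return {"seconds": elapsed, "accuracy": correct / len(test)}

# Synthetic binary-labelled data; the label is True about 70% of the time.
random.seed(0)
data = [((random.random(),), random.random() < 0.7) for _ in range(1000)]
result = benchmark(majority_classifier, data[:800], data[800:])
print(result)
```

In a real comparison the `tool` argument would be a wrapper around each candidate suite run on the same datasets, so the numbers are directly comparable.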

    Proposal of an adaptive infotainment system depending on driving scenario complexity

    The PhD research project is framed within the industrial doctorates plan of the Generalitat de Catalunya. Most of the work was carried out at the facilities of the vehicle manufacturer SEAT, specifically in the information and entertainment (infotainment) department, with continuous cooperation from the telematics department of the UPC. The main objective of the project was the design and validation of an adaptive infotainment system that responds to driving complexity. The system was created to improve the driver's experience while guaranteeing a proper level of road safety. Given the increasing number of applications and services available in current infotainment systems, a system capable of balancing these two concerns becomes necessary. The most relevant parameters for balancing them while driving are the type of services offered, the interfaces available for interacting with those services, the complexity of driving, and the profile of the driver. The study can be divided into two main development phases, each of which produced a real physical block that became part of the final system; the final system was integrated into a vehicle and validated in real driving conditions. The first phase consisted of creating a model capable of estimating driving complexity from a set of driving-related variables. The model was built with machine learning methods, and the dataset needed to create it was collected over several driving routes carried out by different participants. This phase produced a model that estimates road complexity with satisfactory accuracy using variables that are easily extractable in any modern vehicle, which simplifies implementing the algorithm in current vehicles.
The second phase consisted of classifying a set of principles that allow the design of an adaptive infotainment system based on road complexity. These principles are defined on the basis of previous research in the usability and user experience of graphical interfaces. Following these principles, a real adaptive infotainment system with the most commonly used functionalities (navigation, radio, and media) was designed and integrated into a real vehicle. The system adapted the presentation of its content according to the driving-complexity estimate provided by the block developed in phase one. The adaptive system was validated in real driving scenarios by several participants, and the results showed a high level of acceptance of and satisfaction with the adaptive infotainment. As a starting point for future research, a proof of concept was carried out to integrate new interfaces into a vehicle. The interface used as a reference was a head-mounted display that offered information redundant with the instrument cluster. Tests with participants served to understand how users perceive the introduction of new technologies and how objective benefits can be blurred by initial biases.
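The two blocks described above, a model that scores driving complexity from vehicle variables and rules that adapt the infotainment presentation to that score, can be sketched as follows; the input variables, weights, thresholds, and adaptation levels are illustrative assumptions, not the system actually built at SEAT:

```python
def complexity_score(speed_kmh, steering_rate_deg_s, wipers_on):
    """Stand-in for the machine-learning estimator: combine easily
    extractable vehicle variables into a 0..1 driving-complexity score."""
    score = min(speed_kmh / 130.0, 1.0) * 0.5            # faster -> harder
    score += min(steering_rate_deg_s / 90.0, 1.0) * 0.3  # busy steering -> harder
    score += 0.2 if wipers_on else 0.0                   # rain proxy
    return round(score, 2)

def adapt_infotainment(score):
    """Map the complexity estimate to a presentation level, in the spirit
    of the design principles the thesis classifies."""
    if score < 0.3:       # calm driving: full content available
        return {"navigation": "full_map", "radio": "browse", "media": "browse"}
    if score < 0.7:       # moderate: reduced, glanceable content
        return {"navigation": "turn_by_turn", "radio": "presets", "media": "presets"}
    # demanding driving: minimal visual load
    return {"navigation": "voice_only", "radio": "off", "media": "off"}

calm = adapt_infotainment(complexity_score(40, 5, False))
busy = adapt_infotainment(complexity_score(120, 80, True))
print(calm["radio"], busy["radio"])
```

The real system replaces the hand-tuned formula with a model trained on the participants' driving data, but the interface between the two blocks, a scalar complexity estimate driving a discrete presentation level, is the same.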


    Open Source Workflow Engine for Cheminformatics: From Data Curation to Data Analysis

    The recent release of large open access chemistry databases into the public domain generates a demand for flexible tools to process them and discover new knowledge. To support Open Drug Discovery and Open Notebook Science on top of these data resources, it is desirable for the processing tools to be Open Source and available to everyone. The aim of this project was the development of an Open Source workflow engine to solve crucial cheminformatics problems. The CDK-Taverna project developed in the course of this thesis builds a cheminformatics workflow solution by combining different Open Source projects: Taverna (workflow engine), the Chemistry Development Kit (CDK, cheminformatics library), and Pgchem::Tigress (chemistry database cartridge). The work includes the implementation of over 160 different workers focused on cheminformatics tasks; applying the developed methods to real-world problems was the final objective of the project. Validation of Open Source software libraries and of chemical data derived from different databases is mandatory for all cheminformatics workflows. Methods to detect the atom types of chemical structures were used to validate the atom typing of the Chemistry Development Kit and to identify curation problems while processing different public databases, including the EBI drug databases ChEBI and ChEMBL as well as the Chapman & Hall Chemical Database of natural products. The CDK atom typing lacks types for heavier atoms but meets the needs of databases containing organic substances, including natural products. To support combinatorial chemistry, a reaction enumeration workflow was implemented. It is based on generic reactions with lists of reactants and allows the generation of chemical libraries on the order of 1,000 molecules.
Supervised machine learning techniques (perceptron-type artificial neural networks and support vector machines) were used as a proof of concept for quantitative modelling of adhesive polymer kinetics with the Mathematica GNWI.CIP package. This opens the perspective of integrating high-level "experimental mathematics" into CDK-Taverna-based scientific pipelining. A chemical diversity analysis based on two public databases and one proprietary database, together covering over 200,000 molecules, was a large-scale application of the methods developed. For the diversity analysis, different molecular properties were calculated with the Chemistry Development Kit, and the properties were analysed with Adaptive Resonance Theory (the ART 2-A algorithm) for automatic unsupervised classification of open categorical problems. The results show that the two databases containing natural products (one public, one proprietary) cover a similar chemical space, whereas the ChEBI database covers a distinctly different one. These comparisons reveal interesting white spots in the proprietary database, and combining the results with pharmacological annotations of the molecules leads to further research and modelling activities.
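The reaction-enumeration idea mentioned above, applying a generic reaction to lists of reactants to generate a combinatorial library, reduces at its core to a Cartesian product over the reactant lists; the SMILES strings and the string-pairing "reaction" below are illustrative, not CDK-Taverna's actual worker:

```python
from itertools import product

# Hypothetical reactant lists for a generic two-component coupling;
# the strings stand in for real structure representations (SMILES).
ACIDS = ["CC(=O)O", "CCC(=O)O", "c1ccccc1C(=O)O"]
AMINES = ["CN", "CCN", "NCCO", "c1ccccc1N"]

def enumerate_library(*reactant_lists):
    """Enumerate every combination of one reactant from each list.
    A real workflow would apply the reaction transform to each combination
    instead of just collecting the reactant tuples."""
    return list(product(*reactant_lists))

library = enumerate_library(ACIDS, AMINES)
print(len(library))  # 3 acids x 4 amines = 12 combinations
```

This also shows why library size is the product of the list lengths, which is how a workflow with modest reactant lists reaches libraries on the order of 1,000 molecules.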