1,249 research outputs found

    Balancing Exploration and Exploitation: A New Algorithm for Active Machine Learning

    Active machine learning algorithms are used when large numbers of unlabeled examples are available and obtaining labels for them is costly (e.g. requiring consultation with a human expert). Many conventional active learning algorithms focus on refining the decision boundary, at the expense of exploring new regions that the current hypothesis misclassifies. We propose a new active learning algorithm that balances such exploration with refinement of the decision boundary by dynamically adjusting the probability of exploring at each step. Our experimental results demonstrate improved performance on data sets that require extensive exploration while remaining competitive on data sets that do not. Our algorithm also shows significant tolerance of noise.
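    The exploration/exploitation balance described above can be sketched in a few lines. This is a hedged illustration, not the authors' actual algorithm: the uniform random exploration, the margin-based exploitation rule and the multiplicative probability update are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_query(unlabelled_X, model, p_explore):
    """Pick the next example to label: with probability p_explore draw a
    random unlabelled point (exploration); otherwise query the point
    closest to the decision boundary (exploitation)."""
    if rng.random() < p_explore:
        return int(rng.integers(len(unlabelled_X)))
    margins = np.abs(model.decision_function(unlabelled_X))
    return int(np.argmin(margins))

def update_p_explore(p, explored, was_misclassified, gain=0.1):
    """Illustrative multiplicative update of the exploration probability:
    raise it when an exploratory query exposed a misclassified region,
    otherwise decay it towards pure boundary refinement."""
    if explored and was_misclassified:
        return min(1.0, p * (1.0 + gain))
    return max(0.01, p * (1.0 - gain))
```

    In a loop one would alternate `select_query`, obtain the label, and feed the outcome back through `update_p_explore`, so exploration stays high only while it keeps uncovering errors.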

    Advanced physics-based and data-driven strategies

    Simulation Based Engineering Science (SBES) has brought major improvements in optimization, control and inverse analysis, all leading to a deeper understanding of many processes occurring in the real world. These noticeable breakthroughs are present in a vast variety of sectors such as the aeronautic and automotive industries, mobile telecommunications and healthcare, among many other fields. Nevertheless, SBES currently confronts several difficulties in providing accurate results for complex industrial problems. Apart from the high computational costs associated with industrial applications, the errors introduced by constitutive modeling become more and more important when dealing with new materials. Concurrently, an unceasingly growing interest in concepts such as Big Data, Machine Learning and Data Analytics has emerged, intrinsically motivated by extensive development in both data-acquisition and data-storage systems. For instance, an aircraft may produce over 500 GB of data during a single flight. This panorama presents a perfect opportunity for the so-called Dynamic Data Driven Application Systems (DDDAS), whose main objective is to merge classical simulation algorithms with data coming from experimental measurements in a dynamic way. Within this scenario, data and simulations would no longer be uncoupled; rather, exploiting their symbiosis would achieve milestones that were inconceivable until now. Indeed, data will no longer be understood as a static calibration of a given constitutive model; rather, the model will be corrected dynamically as soon as experimental data and simulations tend to diverge. Several numerical algorithms are presented throughout this manuscript, all with the main objective of strengthening the link between data and computational mechanics. The first part of the thesis is mainly focused on parameter identification, data-driven and data completion techniques. The second part is focused on Model Order Reduction (MOR) techniques, since they constitute a fundamental ally in achieving the real-time constraints arising from the DDDAS framework.
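    The dynamic correction idea, adjusting a constitutive parameter the moment simulation and measurement diverge, can be illustrated with a toy example. The linear model, the tolerance and the gradient-style update below are all assumptions made for illustration; the manuscript develops far more sophisticated algorithms.

```python
def simulate(k, t):
    """Toy linear constitutive model: response = k * t."""
    return k * t

def dynamic_correction(times, measurements, k0, tol=0.1, lr=0.5):
    """Hypothetical DDDAS-style loop: run the simulation alongside
    incoming measurements and nudge the model parameter k whenever
    prediction and data diverge beyond tol."""
    k = k0
    for t, y in zip(times, measurements):
        residual = y - simulate(k, t)
        if abs(residual) > tol:
            # least-squares-style update of the calibration parameter
            k += lr * residual * t / max(t * t, 1e-12)
        yield k
```

    With measurements generated by a true parameter, the estimate drifts towards it as data streams in, rather than being fixed by a one-off static calibration.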

    Manifold Learning Approaches to Compressing Latent Spaces of Unsupervised Feature Hierarchies

    Field robots encounter dynamic unstructured environments containing a vast array of unique objects. In order to make sense of the world in which they are placed, they collect large quantities of unlabelled data with a variety of sensors. Producing robust and reliable applications depends entirely on the ability of the robot to understand the unlabelled data it obtains. Deep Learning techniques have had a high level of success in learning powerful unsupervised representations for a variety of discriminative and generative models. Applying these techniques to problems encountered in field robotics remains a challenging endeavour. Modern Deep Learning methods are typically trained with a substantial labelled dataset, while datasets produced in a field robotics context contain limited labelled training data. The primary motivation for this thesis stems from the problem of applying large scale Deep Learning models to field robotics datasets that are label-poor. While the lack of labelled ground truth data drives the desire for unsupervised methods, the need for improved model scaling is driven by two factors: performance and computational requirements. When utilising unsupervised layer outputs as representations for classification, the classification performance increases with layer size. Scaling up models with multiple large layers of features is problematic, as the size of each subsequent hidden layer scales with the size of the previous layer. This quadratic scaling, and the associated time required to train such networks, has prevented adoption of large Deep Learning models beyond cluster computing. The contributions in this thesis are developed from the observation that parameters or filter elements learnt in Deep Learning systems are typically highly structured, and contain related elements. Firstly, the structure of unsupervised filters is utilised to construct a mapping from the high dimensional filter space to a low dimensional manifold.
    This creates a significantly smaller representation for subsequent feature learning. This mapping, and its effect on the resulting encodings, highlights the need for the ability to learn highly overcomplete sets of convolutional features. Driven by this need, the unsupervised pretraining of Deep Convolutional Networks is developed to include a number of modern training and regularisation methods. These pretrained models are then used to provide initialisations for supervised convolutional models trained on small quantities of labelled data. By utilising pretraining, a significant increase in classification performance on a number of publicly available datasets is achieved. In order to apply these techniques to outdoor 3D Laser Illuminated Detection And Ranging data, we develop a set of resampling techniques to provide uniform input to Deep Learning models. The features learnt in these systems outperform the high effort hand engineered features developed specifically for 3D data. The representation of a given signal is then reinterpreted as a combination of modes that exist on the learnt low dimensional filter manifold. From this, we develop an encoding technique that allows the high dimensional layer output to be represented as a combination of low dimensional components. This allows the growth of subsequent layers to depend only on the intrinsic dimensionality of the filter manifold and not on the number of elements contained in the previous layer. Finally, the resulting unsupervised convolutional model, the encoding frameworks and the embedding methodology are used to produce a new unsupervised learning strategy that is able to encode images in terms of overcomplete filter spaces without producing an explosion in the size of the intermediate parameter spaces. This model produces classification results on par with state of the art models, yet requires significantly less computational resources and is suitable for use in the constrained computation environment of a field robot.
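    The core idea, mapping a structured high-dimensional filter bank onto a low-dimensional manifold and re-expressing each filter as a combination of manifold components, can be sketched with plain PCA. This linear embedding is a simplified stand-in for the manifold learning developed in the thesis; all names are illustrative.

```python
import numpy as np

def compress_filters(filters, n_components):
    """Map a bank of high-dimensional filters onto a low-dimensional
    linear manifold via PCA. filters: array of shape
    (n_filters, filter_dim)."""
    mean = filters.mean(axis=0)
    centred = filters - mean
    # SVD yields the principal directions of the filter set
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components]        # (n_components, filter_dim)
    coords = centred @ basis.T       # low-dimensional embedding
    return coords, basis, mean

def reconstruct(coords, basis, mean):
    """Express each filter as a combination of manifold components."""
    return coords @ basis + mean
```

    Subsequent layers then only need to consume `coords`, whose width is the intrinsic dimensionality `n_components` rather than the full filter count, which is the scaling benefit the abstract describes.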

    The characterisation and simulation of 3D vision sensors for measurement optimisation

    The use of 3D Vision is becoming increasingly common in a range of industrial applications including part identification, reverse engineering, quality control and inspection. To facilitate this increased usage, especially in autonomous applications such as free-form assembly and robotic metrology, the capability to deploy a sensor to the optimum pose for a measurement task is essential to reduce cycle times and increase measurement quality. Doing so requires knowledge of the 3D sensor's capabilities on a material-specific basis, as the optical properties of a surface, object shape, pose and even the measurement itself have severe implications for data quality. This need is not reflected in the current state of sensor characterisation standards, which commonly utilise optically compliant artefacts and therefore cannot inform the user of a 3D sensor of the realistic expected performance on non-ideal objects. This thesis presents a method of scoring candidate viewpoints for their ability to perform geometric measurements on an object of arbitrary surface finish. This is achieved by first defining a technology-independent, empirical sensor characterisation method which implements a novel variant of the commonly used point density point cloud quality metric, normalised to isolate the effect of surface finish on sensor performance, as well as the more conventional assessment of point standard deviation. The characterisation method generates a set of performance maps for a sensor per material, which are a function of distance and surface orientation. A sensor simulation incorporates these performance maps to estimate the statistical properties of a point cloud on objects of arbitrary shape and surface finish, provided the sensor has been characterised on the material in question. A framework for scoring measurement-specific candidate viewpoints is presented in the context of the geometric inspection of four artefacts with different surface finishes but identical geometry. Views are scored on their ability to perform each measurement based on a novel view score metric, which incorporates the expected point density, noise and occlusion of measurement-dependent model features. The simulation is able to score the views reliably on all four surface finishes tested, which range from ideal matt white to highly polished aluminium. In 93% of measurements, a set of optimal or nearly optimal views is correctly selected.
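    A minimal sketch of such a view-score metric, assuming per-feature estimates of expected point density, noise and occlusion are already available from the characterised sensor model. The linear weighting scheme here is a hypothetical choice, not the thesis metric.

```python
import numpy as np

def view_score(density, noise_sd, occluded, w_density=1.0, w_noise=1.0):
    """Score one candidate view from per-feature predictions:
    reward expected point density, penalise expected noise, and let
    occluded features contribute nothing (weights are illustrative)."""
    density = np.asarray(density, dtype=float)
    noise_sd = np.asarray(noise_sd, dtype=float)
    occluded = np.asarray(occluded, dtype=bool)
    visible = ~occluded
    if not visible.any():
        return 0.0   # every feature hidden: the view is useless
    per_feature = w_density * density[visible] - w_noise * noise_sd[visible]
    return float(per_feature.mean())

def best_view(candidates):
    """Pick the candidate view (density, noise_sd, occluded triples)
    with the highest score."""
    scores = [view_score(*c) for c in candidates]
    return int(np.argmax(scores)), scores
```

    In the thesis setting the density and noise predictions would come from the per-material performance maps evaluated at each view's distance and surface orientation.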

    Application of Surrogate Based Optimisation in the Design of Automotive Body Structures

    The rapid development of the automotive industry requires manufacturers to continuously reduce development cost and time and to enhance product quality. Thus, modern automotive design pays more attention to using CAE-analysis-based optimisation techniques to drive the entire design flow. This thesis focuses on optimisation design to improve automotive crashworthiness and fatigue performance, aiming to enhance optimisation efficiency, accuracy, reliability and robustness. The detailed contents are as follows: (1) To exploit the potential of crash energy absorbers, the concept of functionally graded structures was introduced and multiobjective designs were applied to this novel type of structure. First, noting that severe deformation takes place at the tube corners, multi-cell tubes with a lateral thickness gradient were proposed to better enhance crashworthiness. The results of crashworthiness analyses and optimisation showed that these functionally graded multi-cell tubes are preferable to a uniform multi-cell tube. Then, functionally graded foam-filled tubes with different gradient patterns were analysed and optimised subject to lateral impact, and the results demonstrated that these structures still behave better than uniform foam-filled structures under lateral loading, which broadens the application scope of functionally graded structures. Finally, dual functionally graded structures, i.e. functionally graded foam-filled tubes with functionally graded thickness walls, were proposed and different combinations of gradients were compared. The results indicated that placing more material at the tube corners and the maximum density in the outermost layer is beneficial to achieving the best performance.
    (2) To make full use of training data, multiple ensembles of surrogate models were proposed to maximise the fatigue life of a truck cab, with the panel thicknesses taken as design variables and the structural mass as the constraint. Meanwhile, particle swarm optimisation was integrated with sequential quadratic programming to avoid premature convergence. The results illustrated that the hybrid of particle swarm optimisation and ensembles of surrogates attains a more competent solution for fatigue optimisation. (3) As conventional surrogate-based optimisation depends largely on the number of initial sample data, sequential surrogate modelling was applied to practical problems in the automotive industry. (a) To maximise the fatigue life of spot-welded joints, an expected-improvement-based sequential surrogate modelling method was utilised. The results showed that, using this method, performance can be significantly improved with only a relatively small number of finite element analyses. (b) A multiobjective sequential surrogate modelling method was proposed to address the multiobjective optimisation of a foam-filled double cylindrical structure. By adding sequential points and updating the Kriging model adaptively, more accurate Pareto solutions are generated. (4) Since various uncertainties are inevitably present in real-life optimisation, conventional deterministic optimisation can lead to violation of constraints and instability of performance. Therefore, nondeterministic optimisation methods were introduced to solve automotive design problems. (a) A multiobjective reliability-based optimisation for the design of a door was investigated. Based on response surface models of the analysis and design responses, the structural mass was minimised and the vertical sag stiffness was maximised subject to a probabilistic constraint.
    The results revealed that the Pareto frontier is divided into a sensitive region and an insensitive region with respect to uncertainties, and the decision maker is recommended to select a solution from the insensitive region. Furthermore, reducing uncertainties can help improve reliability but increases manufacturing cost, so a tradeoff between the reliability target and performance must be made. (b) A multiobjective uncertain optimisation of the foam-filled double cylindrical structure was conducted by considering randomness in the foam density and wall thicknesses. Multiobjective particle swarm optimisation and Monte Carlo simulation were integrated into the optimisation. The results proved that, while the performances of the objectives are sacrificed slightly, nondeterministic optimisation can enhance the robustness of the objectives and maintain the reliability of the constraint. (c) A multiobjective robust optimisation of the truck cab was performed by considering uncertainty in material properties. A general version of the dual response surface model, namely the dual surrogate model, was proposed to approximate the means and standard deviations of the performances. Then, multiobjective particle swarm optimisation was used to generate a well-distributed Pareto frontier. Finally, a hybrid multi-criteria decision-making model was proposed to select the best compromise solution considering both fatigue performance and its robustness. During this PhD study, the following ideas are considered innovative: (1) Surrogate modelling and multiobjective optimisation were integrated to address the design of novel functionally graded structures, aiming to develop more advanced automotive energy absorbers. (2) Ensembles of surrogates and hybrid particle swarm optimisation were proposed for the design of a truck cab, which make full use of training points and have a strong search capacity.
    (3) Sequential surrogate modelling methods were introduced for several optimisation problems in the automotive industry, so that the optimisations are less dependent on the number of initial training points and both efficiency and accuracy are improved. (4) Surrogate-based optimisation methods were implemented to address various uncertainties in real-life applications. Furthermore, a hybrid multi-criteria decision-making model was proposed to reach the best compromise between performance and robustness.
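    The expected-improvement criterion behind the sequential surrogate modelling in item (3)(a) is a standard acquisition function and can be written directly, here in its minimisation form, assuming the surrogate (e.g. Kriging) supplies a predictive mean `mu` and standard deviation `sigma` at each candidate point.

```python
import numpy as np
from math import erf

def expected_improvement(mu, sigma, f_best):
    """Standard expected improvement for minimisation: how much a
    candidate with surrogate mean mu and standard deviation sigma is
    expected to improve on the best observed value f_best."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    ei = np.zeros_like(mu)           # EI is zero wherever sigma == 0
    mask = sigma > 0
    z = (f_best - mu[mask]) / sigma[mask]
    pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)    # normal pdf
    cdf = 0.5 * (1.0 + np.array([erf(v / np.sqrt(2.0)) for v in z]))
    ei[mask] = (f_best - mu[mask]) * cdf + sigma[mask] * pdf
    return ei
```

    Sequential sampling then repeatedly adds the candidate maximising this quantity and refits the surrogate, which is why far fewer finite element analyses are needed than with a fixed initial design.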

    Advances in Condition Monitoring, Optimization and Control for Complex Industrial Processes

    The book collects 25 papers from the Special Issue “Advances in Condition Monitoring, Optimization and Control for Complex Industrial Processes”, highlighting recent research trends in complex industrial processes. The book aims to stimulate the research field and to benefit readers from both academic institutions and industrial sectors.

    A review of laser scanning for geological and geotechnical applications in underground mining

    Laser scanning can provide timely assessments of mine sites despite adverse challenges in the operational environment. Although there are several published articles on laser scanning, there is a need to review them in the context of underground mining applications. To this end, a holistic review of laser scanning is presented, including progress in 3D scanning systems, data capture/processing techniques and primary applications in underground mines. Laser scanning technology has advanced significantly in terms of mobility and mapping, but there are constraints on coherent and consistent data collection at certain mines due to feature deficiency, dynamics, and environmental influences such as dust and water. Studies suggest that laser scanning has matured over the years for change detection, clearance measurements and structure mapping applications. However, there is scope for improvements in lithology identification, surface parameter measurements, logistic tracking and autonomous navigation. Laser scanning has the potential to provide real-time solutions, but the lack of infrastructure in underground mines for data transfer, geodetic networking and processing capacity remains a limiting factor. Nevertheless, laser scanners are becoming an integral part of mine automation thanks to their affordability, accuracy and mobility, which should support their widespread usage in years to come.

    Advanced physics-based and data-driven strategies

    Cotutela (joint supervision): Universitat Politècnica de Catalunya and École Centrale de Nantes. Extraordinary Doctoral Award, class of 2018-2019, in Civil and Environmental Engineering. Award-winning postprint (published version).

    Visualizing Profiles of Large Datasets of Weighted and Mixed Data

    This work provides a procedure with which to construct and visualize profiles, i.e., groups of individuals with similar characteristics, for weighted and mixed data by combining two classical multivariate techniques: multidimensional scaling (MDS) and the k-prototypes clustering algorithm. The well-known drawback of classical MDS in large datasets is circumvented by selecting a small random sample of the dataset, whose individuals are clustered by means of an adapted version of the k-prototypes algorithm and mapped via classical MDS. Gower’s interpolation formula is used to project the remaining individuals onto the previous configuration. Throughout the process, Gower’s distance is used to measure the proximity between individuals. The methodology is illustrated on a real dataset obtained from the Survey of Health, Ageing and Retirement in Europe (SHARE), which was carried out in 19 countries and represents over 124 million aged individuals in Europe. The performance of the method was evaluated through a simulation study, whose results point out that the new proposal solves the high computational cost of classical MDS with low error. This research was funded by the Spanish Ministry of Economy and Competitiveness, grant number MTM2014-56535-R; and the V Regional Plan for Scientific Research and Technological Innovation 2016-2020 of the Community of Madrid, in agreement with Universidad Carlos III de Madrid in the action of "Excellence for University Professors".
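    Gower’s distance, the proximity measure used throughout the procedure, is simple to state: range-normalised absolute differences for numeric variables and 0/1 mismatches for categorical ones, averaged over all variables. A minimal sketch, with function and argument names chosen for illustration:

```python
def gower_distance(x, y, is_numeric, ranges):
    """Gower's distance between two mixed-type records.
    Numeric variables: |xi - yi| / range; categorical variables:
    0 on match, 1 on mismatch; result is the average over variables."""
    total = 0.0
    count = 0
    for xi, yi, numeric, r in zip(x, y, is_numeric, ranges):
        if numeric:
            total += abs(xi - yi) / r if r > 0 else 0.0
        else:
            total += 0.0 if xi == yi else 1.0
        count += 1
    return total / count
```

    The same distance drives both the adapted k-prototypes clustering of the sample and Gower’s interpolation of the remaining individuals; a weighted variant would simply replace the plain average with a weighted one.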