108 research outputs found

    IoT Data Imputation with Incremental Multiple Linear Regression

    In this paper, we address the problem of missing data imputation in the IoT domain. More specifically, we propose an Incremental Space-Time-based Model (ISTM) for repairing missing values in real-time IoT data streams. ISTM is based on incremental multiple linear regression and processes data as follows: upon data arrival, ISTM updates the model by re-reading only an intermediary data matrix instead of accessing all historical information. If a missing value is detected, ISTM estimates it from recent historical data and from the observations of sensors neighboring the faulty one. Experiments conducted with real traffic data show the performance of ISTM in comparison with well-known techniques.
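    The incremental update described above can be sketched by maintaining the sufficient statistics X^T X and X^T y, so each arrival touches only a small intermediary matrix rather than the full history. A minimal illustration (not the authors' ISTM implementation; the class and parameter names are invented):

```python
import numpy as np

class IncrementalLinearImputer:
    """Incremental multiple linear regression for stream imputation.

    Maintains the sufficient statistics X^T X and X^T y so each update
    touches only a small intermediary matrix, never the full history.
    (Illustrative sketch, not the authors' ISTM code.)
    """

    def __init__(self, n_features, ridge=1e-6):
        self.XtX = ridge * np.eye(n_features)  # small ridge keeps XtX invertible
        self.Xty = np.zeros(n_features)

    def update(self, x, y):
        """Fold one fully observed sample into the statistics."""
        x = np.asarray(x, dtype=float)
        self.XtX += np.outer(x, x)
        self.Xty += y * x

    def impute(self, x):
        """Estimate a missing reading from the current model."""
        # Solve (X^T X) w = X^T y, then predict the missing value.
        w = np.linalg.solve(self.XtX, self.Xty)
        return float(np.asarray(x, dtype=float) @ w)

# Neighboring sensors' readings predict the faulty sensor's value.
imputer = IncrementalLinearImputer(n_features=2)
for x, y in [([1.0, 2.0], 5.0), ([2.0, 1.0], 4.0), ([3.0, 3.0], 9.0)]:
    imputer.update(x, y)
estimate = imputer.impute([2.0, 2.0])  # estimate for a missing reading
```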

    Online Machine Learning for Inference from Multivariate Time-series

    Inference and data analysis over networks have become significant areas of research due to the increasing prevalence of interconnected systems and the growing volume of data they produce. Many of these systems generate data in the form of multivariate time series, which are collections of time series data that are observed simultaneously across multiple variables. For example, EEG measurements of the brain produce multivariate time series data that record the electrical activity of different brain regions over time. Cyber-physical systems generate multivariate time series that capture the behaviour of physical systems in response to cybernetic inputs. Similarly, financial time series reflect the dynamics of multiple financial instruments or market indices over time. Through the analysis of these time series, one can uncover important details about the behavior of the system, detect patterns, and make predictions. Therefore, designing effective methods for data analysis and inference over networks of multivariate time series is a crucial area of research with numerous applications across various fields. In this Ph.D. Thesis, our focus is on identifying the directed relationships between time series and leveraging this information to design algorithms for data prediction as well as missing data imputation. This Ph.D. thesis is organized as a compendium of papers, which consists of seven chapters and appendices. The first chapter is dedicated to motivation and literature survey, whereas in the second chapter, we present the fundamental concepts that readers should understand to grasp the material presented in the dissertation with ease. In the third chapter, we present three online nonlinear topology identification algorithms, namely NL-TISO, RFNL-TISO, and RFNL-TIRSO. In this chapter, we assume the data is generated from a sparse nonlinear vector autoregressive model (VAR), and propose online data-driven solutions for identifying nonlinear VAR topology. 
We also provide convergence guarantees in terms of dynamic regret for the proposed algorithm RFNL-TIRSO. Chapters four and five of the dissertation delve into the issue of missing data and explore how the learned topology can be leveraged to address this challenge. Chapter five is distinct from other chapters in its exclusive focus on edge flow data and introduces an online imputation strategy based on a simplicial complex framework that leverages the known network structure in addition to the learned topology. Chapter six of the dissertation takes a different approach, assuming that the data is generated from nonlinear structural equation models. In this chapter, we propose an online topology identification algorithm using a time-structured approach, incorporating information from both the data and the model evolution. The algorithm is shown to have convergence guarantees achieved by bounding the dynamic regret. Finally, chapter seven of the dissertation provides concluding remarks and outlines potential future research directions.
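    The sparse-VAR assumption behind the topology identification chapters lends itself to a simple illustration: an online gradient step on the one-step prediction error, followed by soft-thresholding, keeps the coefficient matrix sparse, and its nonzero entries can be read as directed links between series. A toy linear sketch (not the NL-TISO/RFNL-TIRSO algorithms themselves, which are nonlinear; function names are invented):

```python
import numpy as np

def soft_threshold(A, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def online_var_topology(stream, step=0.1, lam=0.005):
    """Track a sparse VAR(1) coefficient matrix from a multivariate stream.

    Each sample triggers one gradient step on the squared one-step
    prediction error, followed by soft-thresholding; the nonzero entries
    of A are then read as directed links between series.
    (Illustrative sketch; parameters are invented.)
    """
    y_prev, A = None, None
    for y in stream:
        y = np.asarray(y, dtype=float)
        if y_prev is not None:
            if A is None:
                A = np.zeros((y.size, y.size))
            err = y - A @ y_prev  # one-step prediction error
            A = soft_threshold(A + step * np.outer(err, y_prev), step * lam)
        y_prev = y
    return A
```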

    Support matrix machine: A review

    Support vector machine (SVM) is one of the most studied paradigms in the realm of machine learning for classification and regression problems. It relies on vectorized input data. However, a significant portion of real-world data exists in matrix format, which is given as input to SVM by reshaping the matrices into vectors. The process of reshaping disrupts the spatial correlations inherent in the matrix data, and converting matrices into vectors results in input data of high dimensionality, which introduces significant computational complexity. To overcome these issues in classifying matrix input data, the support matrix machine (SMM) has been proposed. It represents one of the emerging methodologies tailored for handling matrix input data. The SMM method preserves the structural information of the matrix data by using the spectral elastic net property, which is a combination of the nuclear norm and the Frobenius norm. This article provides the first in-depth analysis of the development of the SMM model, which can be used as a thorough summary by both novices and experts. We discuss numerous SMM variants, such as robust, sparse, class-imbalance, and multi-class classification models. We also analyze the applications of the SMM model and conclude the article by outlining potential future research avenues and possibilities that may motivate academics to advance the SMM algorithm.
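    The spectral elastic net mentioned above has a closed-form proximal operator: soft-threshold the singular values (the nuclear-norm part, encouraging low rank) and then shrink them (the Frobenius part). A small sketch, with invented function names:

```python
import numpy as np

def spectral_elastic_net_prox(W, lam, eta):
    """Proximal operator of lam*||W||_* + (eta/2)*||W||_F^2 at W.

    Soft-thresholding the singular values realizes the nuclear-norm part
    (encouraging low rank); dividing by 1 + eta realizes the Frobenius
    part. The matrix structure is acted on as a whole, never vectorized.
    (Illustrative sketch of the regularizer used by SMMs.)
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = np.maximum(s - lam, 0.0) / (1.0 + eta)
    return (U * s) @ Vt  # broadcast s over the columns of U
```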

    Towards Name Disambiguation: Relational, Streaming, and Privacy-Preserving Text Data

    In the real world, our DNA is unique, but many people share names. This phenomenon often causes erroneous aggregation of documents of multiple persons who are namesakes of one another. Such mistakes deteriorate the performance of document retrieval and web search, and, more seriously, cause improper attribution of credit or blame in digital forensics. To resolve this issue, the name disambiguation task is designed to partition the documents associated with a name reference such that each partition contains documents pertaining to a unique real-life person. Existing algorithms for this task mainly suffer from the following drawbacks. First, the majority of existing solutions rely substantially on feature engineering, such as biographical feature extraction or construction of auxiliary features from Wikipedia. However, in many scenarios, such features may be costly to obtain or unavailable in privacy-sensitive domains. Instead, we solve the name disambiguation task in a restricted setting by leveraging only the relational data in the form of anonymized graphs. Second, most of the existing works for this task operate in a batch mode, where all records to be disambiguated are initially available to the algorithm. However, more realistic settings require that the name disambiguation task be performed in an online streaming fashion in order to identify records of new ambiguous entities having no preexisting records. Finally, we investigate the potential disclosure risk of textual features used in name disambiguation and propose several algorithms to tackle the task in a privacy-aware scenario. In summary, in this dissertation, we present a number of novel approaches to address the name disambiguation task from the above three aspects independently, namely relational, streaming, and privacy-preserving textual data.
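    The streaming setting described above can be illustrated with a greedy online partitioner: each arriving record joins its best-matching existing cluster or opens a new one, so records of previously unseen namesakes can emerge. A toy sketch (not the dissertation's algorithm; Jaccard similarity over anonymized feature sets is an assumption made here):

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets (e.g. coauthor sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def stream_disambiguate(records, threshold=0.2):
    """Greedy online partitioning of records sharing one name reference.

    Each arriving record joins the best-matching existing cluster, or
    starts a new one, so previously unseen namesakes can emerge.
    (Toy sketch; threshold and similarity choice are invented.)
    """
    clusters = []  # each cluster: list of member feature sets
    labels = []
    for rec in records:
        best, best_sim = None, threshold
        for i, members in enumerate(clusters):
            sim = max(jaccard(rec, m) for m in members)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            clusters.append([rec])       # new real-life entity
            labels.append(len(clusters) - 1)
        else:
            clusters[best].append(rec)   # same entity as an earlier record
            labels.append(best)
    return labels
```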

    Personalized data analytics for internet-of-things-based health monitoring

    The Internet-of-Things (IoT) has great potential to fundamentally alter the delivery of modern healthcare, enabling healthcare solutions outside the limits of conventional clinical settings. It can offer ubiquitous monitoring to at-risk population groups and allow diagnostic care, preventive care, and early intervention in everyday life. These services can have profound impacts on many aspects of health and well-being. However, this field is still at an infancy stage, and the use of IoT-based systems in real-world healthcare applications introduces new challenges. Healthcare applications necessitate satisfactory quality attributes such as reliability and accuracy due to their mission-critical nature, while at the same time, IoT-based systems mostly operate over constrained shared sensing, communication, and computing resources. There is a need to investigate this synergy between the IoT technologies and healthcare applications from a user-centered perspective. Such a study should examine the role and requirements of IoT-based systems in real-world health monitoring applications. Moreover, conventional computing architecture and data analytic approaches introduced for IoT systems are insufficient when used to target health and well-being purposes, as they are unable to overcome the limitations of IoT systems while fulfilling the needs of healthcare applications. This thesis aims to address these issues by proposing an intelligent use of data and computing resources in IoT-based systems, which can lead to a high-level performance and satisfy the stringent requirements. For this purpose, this thesis first delves into the state-of-the-art IoT-enabled healthcare systems proposed for in-home and in-hospital monitoring. The findings are analyzed and categorized into different domains from a user-centered perspective. 
The selection of home-based applications is focused on the monitoring of the elderly who require more remote care and support compared to other groups of people. In contrast, the hospital-based applications include the role of existing IoT in patient monitoring and hospital management systems. Then, the objectives and requirements of each domain are investigated and discussed. This thesis proposes personalized data analytic approaches to fulfill the requirements and meet the objectives of IoT-based healthcare systems. In this regard, a new computing architecture is introduced, using computing resources in different layers of IoT to provide a high level of availability and accuracy for healthcare services. This architecture allows the hierarchical partitioning of machine learning algorithms in these systems and enables an adaptive system behavior with respect to the user's condition. In addition, personalized data fusion and modeling techniques are presented, exploiting multivariate and longitudinal data in IoT systems to improve the quality attributes of healthcare applications. First, a real-time missing data resilient decision-making technique is proposed for health monitoring systems. The technique tailors various data resources in IoT systems to accurately estimate health decisions despite missing data in the monitoring. Second, a personalized model is presented, enabling variations and event detection in long-term monitoring systems. The model evaluates the sleep quality of users according to their own historical data. Finally, the performance of the computing architecture and the techniques are evaluated in this thesis using two case studies. The first case study consists of real-time arrhythmia detection in electrocardiography signals collected from patients suffering from cardiovascular diseases. The second case study is continuous maternal health monitoring during pregnancy and postpartum. 
    It includes a real human-subject trial carried out with twenty pregnant women over seven months.
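    The missing-data-resilient decision-making idea above can be illustrated by keeping one model per sensor subset and falling back to the largest subset that actually reported, so a dropped sensor degrades the decision gracefully instead of blocking it. A hypothetical sketch (names and structure invented; not the thesis technique itself):

```python
def resilient_predict(models, reading):
    """Pick the model trained on a subset of the sensors that reported.

    `models` maps a frozenset of sensor names to a predict function; the
    largest trained subset of the available sensors is used, so a missing
    reading degrades the decision gracefully rather than blocking it.
    (Hypothetical sketch; model keys and fallback rule are invented.)
    """
    available = frozenset(k for k, v in reading.items() if v is not None)
    usable = [s for s in models if s <= available]
    if not usable:
        raise ValueError("no model covers the available sensors")
    best = max(usable, key=len)  # prefer the most informative model
    return models[best]({k: reading[k] for k in best})
```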

    Applications

    Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production, and milling for quality control during manufacturing processes; in traffic and logistics for smart cities; and for mobile communications.


    Probabilistic models for human behavior learning

    The problem of human behavior learning is a popular interdisciplinary research topic that has been explored from multiple perspectives, with a principal branch of study in the context of computer vision systems and activity recognition. However, the statistical methods used in these frameworks typically assume short time scales, usually of minutes or even seconds. The emergence of mobile electronic devices, such as smartphones and wearables, has changed this paradigm, since we are now able to massively collect digital records from users. This collection of smartphone-generated data, whose attributes are obtained in an unobtrusive manner from the devices via multiple sensors and apps, shapes the behavioral footprint that is unique for every one of us. At an individual level, the data projection also differs from person to person, as not all sensors are equal, nor are the apps installed or the devices used in real life. This reflects that learning human behavior from the digital signature of users is an arduous task, one that requires fusing irregular data: for instance, collections of samples that are corrupted, heterogeneous, contain outliers, or exhibit only short-term correlations. The statistical modelling of this sort of object is one of the principal contributions of this thesis, which we study from the perspective of Gaussian processes (gp). In the particular case of humans, as well as many other life species in our world, we are inherently conditioned by the diurnal and nocturnal cycles that shape our behavior every day, and hence our data. We can study these cycles in our behavioral representation to see that there exists a perpetual circadian rhythm in every one of us. This tempo is the 24h periodic component that shapes the baseline temporal structure of our behavior, not the particular patterns that change for every person.
Looking at the trajectories and variabilities that our behavior may take in the data, we can appreciate that there is not a single repetitive behavior. Instead, there are typically several patterns or routines, sampled from our own dictionary, that we choose for every special situation. At the same time, these routines are arbitrary combinations of different timescales, correlations, levels of mobility, social interaction, sleep quality, or willingness to work during the same hours on weekdays. Together, the properties of human behavior already indicate how we should proceed to model its structure: not as unique functions, but as a dictionary of latent behavioral profiles. To discover them, we have considered latent variable models. The main application of the statistical methods developed for human behavior learning appears as we look to medicine. Having a personalized model that is accurately fitted to the behavioral patterns of a patient of interest, sudden changes in those patterns could be early indicators of future relapses. From a technical point of view, the traditional question used to be whether newer observations conform to the expected behavior indicated by the already fitted model. The problem can be analyzed from two interrelated perspectives, one oriented to the characterization of a single object as an outlier, typically named anomaly detection, and another focused on refreshing the learning model if it no longer fits the new sequential data. This last problem, widely known as change-point detection (cpd), is another pillar of this thesis. These methods are oriented to mental health applications, and particularly to the passive detection of crisis events. The final goal is to provide an early detection methodology based on probabilistic modeling for early intervention, e.g.
prevent suicide attempts, in psychiatric outpatients with severe affective disorders of high prevalence, such as depression or bipolar disorder.
Doctoral Program in Multimedia and Communications, Universidad Carlos III de Madrid and Universidad Rey Juan Carlos. Committee: President Pablo Martínez Olmos; Secretary Daniel Hernández Lobato; Member Javier González Hernánde