4,800 research outputs found

    Discovering human activities from binary data in smart homes

    With rapid advances in sensing technology, data mining, and machine learning for human health monitoring, it has become possible to monitor personal motion and vital signs in a manner that minimizes disruption to an individual’s daily routine and helps individuals with difficulties live independently at home. A primary difficulty that researchers confront is acquiring an adequate amount of labeled data for model training and validation. Activity discovery addresses the case where activity labels are not available, using approaches based on sequence mining and clustering. In this paper, we introduce an unsupervised method for discovering activities from a network of motion detectors in a smart home setting. First, we present an intra-day clustering algorithm to find frequent sequential patterns within a day. Second, we present an inter-day clustering algorithm to find the frequent patterns common across days. Furthermore, we refine the patterns to obtain more compact and well-defined cluster characterizations. Finally, we track the occurrences of regular routines to monitor functional health through an individual’s patterns and lifestyle. We evaluate our methods on two public data sets captured in real-life settings from two apartments over seven-month and three-month periods.
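
    The two-stage idea in this abstract (frequent patterns within a day, then patterns shared across days) can be illustrated with a minimal sketch. This is not the paper's clustering algorithm: it substitutes plain n-gram counting for intra-day clustering and set intersection for inter-day clustering. The sensor IDs, `length`, and `min_support` are hypothetical values chosen for illustration.

```python
from collections import Counter


def frequent_patterns(events, length=3, min_support=2):
    """Count contiguous sensor-ID n-grams in one day's event sequence
    and keep those seen at least `min_support` times (assumed threshold)."""
    counts = Counter(
        tuple(events[i:i + length])
        for i in range(len(events) - length + 1)
    )
    return {p: c for p, c in counts.items() if c >= min_support}


def common_patterns(per_day_patterns):
    """Keep only the patterns that are frequent on every day."""
    days = [set(p) for p in per_day_patterns]
    return set.intersection(*days) if days else set()


# Hypothetical motion-detector firings for two days.
day1 = ["M1", "M2", "M3", "M1", "M2", "M3", "M4"]
day2 = ["M4", "M1", "M2", "M3", "M5", "M1", "M2", "M3"]

per_day = [frequent_patterns(d) for d in (day1, day2)]
shared = common_patterns(per_day)
print(shared)
```

    In this toy input the only pattern that recurs within each day and also appears on both days is the sequence M1, M2, M3, which is the kind of regular routine the abstract proposes to track.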

    A survey on online active learning

    Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. The problem of minimizing the cost associated with collecting labeled observations has gained a lot of attention in recent years, particularly in real-world applications where data is only available in an unlabeled form. Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data. To overcome this issue, many active learning strategies have been proposed in recent decades, aiming to select the most informative observations for labeling in order to improve the performance of machine learning models. These approaches can be broadly divided into two categories: static pool-based and stream-based active learning. Pool-based active learning involves selecting a subset of observations from a closed pool of unlabeled data, and it has been the focus of many surveys and literature reviews. However, the growing availability of data streams has led to an increase in the number of approaches that focus on online active learning, which involves continuously selecting and labeling observations as they arrive in a stream. This work aims to provide an overview of the most recently proposed approaches for selecting the most informative observations from data streams in the context of online active learning. We review the various techniques that have been proposed and discuss their strengths and limitations, as well as the challenges and opportunities that exist in this area of research. Our review aims to provide a comprehensive and up-to-date overview of the field and to highlight directions for future work.
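
    As a rough illustration of the stream-based setting described here, the sketch below applies one simple query strategy, uncertainty sampling with a fixed margin, to a small online logistic-regression model. It is not taken from the survey: the class `OnlineLogistic`, the `margin` and `budget` parameters, and the synthetic oracle are all assumptions made for the example.

```python
import math
import random


class OnlineLogistic:
    """Tiny logistic-regression model updated one labeled example at a time."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # One stochastic-gradient step on the log loss.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def stream_active_learning(stream, oracle, model, margin=0.2, budget=50):
    """Query a label only when the model is uncertain, until the budget is spent."""
    queried = 0
    for x in stream:
        if queried >= budget:
            break
        p = model.predict_proba(x)
        if abs(p - 0.5) < margin:      # uncertain: ask the oracle
            model.update(x, oracle(x))
            queried += 1
        # otherwise the observation is discarded without labeling
    return queried


random.seed(0)
stream = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
oracle = lambda x: 1 if x[0] + x[1] > 0 else 0   # hypothetical labeling rule
model = OnlineLogistic(n_features=2)
n_queried = stream_active_learning(stream, oracle, model)
print(f"labels requested: {n_queried} of {len(stream)}")
```

    Each observation is seen once and either labeled or dropped immediately, which is what distinguishes this setting from pool-based selection, where the learner can rank a closed set of candidates before choosing.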

    Supervised cross-modal factor analysis for multiple modal data classification

    In this paper, we study the problem of learning from multi-modal data for the purpose of document classification. In this problem, each document is composed of two different modalities of data, i.e., an image and a text. Cross-modal factor analysis (CFA) has been proposed to project the two modalities into a shared data space, so that the classification of an image or a text can be performed directly in this space. A disadvantage of CFA is that it ignores the supervision information. In this paper, we improve CFA by incorporating the supervision information to represent and classify both the image and text modalities of documents. We project both image and text data into a shared data space by factor analysis, and then train a class label predictor in the shared space to use the class label information. The factor analysis parameters and the predictor parameters are learned jointly by solving a single objective function. With this objective function, we minimize both the distance between the projections of the image and text of the same document and the classification error of the projections, measured by the hinge loss function. The objective function is optimized by an alternating optimization strategy in an iterative algorithm. Experiments on two multi-modal document data sets show the advantage of the proposed algorithm over other CFA methods.
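
    One way to write down an objective of the kind this abstract describes is sketched below for the binary case. The notation is assumed rather than taken from the paper: $x_i$ and $t_i$ are the image and text features of document $i$, $U$ and $V$ are the projection matrices, $w$ is a linear predictor in the shared space, $y_i \in \{-1, +1\}$ is the class label, and $C$ is a trade-off weight. The paper's exact formulation, constraints, and regularization may differ.

```latex
\min_{U,\,V,\,w}\;
\sum_{i=1}^{n} \bigl\| U^{\top} x_i - V^{\top} t_i \bigr\|_2^2
\;+\;
C \sum_{i=1}^{n} \Bigl[
  \max\bigl(0,\; 1 - y_i\, w^{\top} U^{\top} x_i \bigr)
  + \max\bigl(0,\; 1 - y_i\, w^{\top} V^{\top} t_i \bigr)
\Bigr]
```

    The first sum pulls the two projections of the same document together, and the second is the hinge loss of the predictor on each projection. An alternating scheme would fix $w$ while updating $U$ and $V$, then fix $U$ and $V$ while updating $w$, and repeat.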

    Deep Learning in Cardiology

    The medical field is creating large amounts of data that physicians are unable to decipher and use efficiently. Moreover, rule-based expert systems are inefficient at solving complicated medical tasks or creating insights from big data. Deep learning has emerged as a more accurate and effective technology for a wide range of medical problems such as diagnosis, prediction, and intervention. Deep learning is a representation learning method that consists of layers that transform the data non-linearly, thus revealing hierarchical relationships and structures. In this review, we survey deep learning application papers that use structured data, signal, and imaging modalities from cardiology. We discuss the advantages and limitations of applying deep learning in cardiology that also apply to medicine in general, while proposing certain directions as the most viable for clinical use. Comment: 27 pages, 2 figures, 10 tables

    A systematic review of data quality issues in knowledge discovery tasks

    Data volumes are growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

    Sensor-based datasets for human activity recognition - a systematic review of literature

    The research area of ambient assisted living has led to the development of activity recognition systems (ARS) based on human activity recognition (HAR). These systems improve the quality of life and the health care of the elderly and dependent people. However, before making them available to end users, it is necessary to evaluate their performance in recognizing activities of daily living, using data set benchmarks in experimental scenarios. For that reason, the scientific community has developed and provided a large number of data sets for HAR. Consequently, identifying which ones to use in the evaluation process and which techniques are the most appropriate for prediction of HAR in a specific context is not a trivial task and is key to further progress in this area of research. This work presents a systematic review of the literature on the sensor-based data sets used to evaluate ARS. On the one hand, an analysis of different variables taken from indexed publications related to this field was performed. The sources of information are journals, proceedings, and books located in specialized databases. The analyzed variables characterize publications by year, database, type, quartile, country of origin, and destination, using scientometrics, which allowed identification of the data set most used by researchers. On the other hand, the descriptive and functional variables were analyzed for each of the identified data sets: occupation, annotation, approach, segmentation, representation, feature selection, balancing and addition of instances, and classifier used for recognition. This paper provides an analysis of the sensor-based data sets used in HAR to date, identifying the most appropriate data set to evaluate ARS and the classification techniques that generate better results.
