1,257 research outputs found

    Data Mining Techniques for Complex User-Generated Data

    Get PDF
    Nowadays, the amount of collected information is continuously growing in a variety of different domains. Data mining techniques are powerful instruments to effectively analyze these large data collections and extract hidden and useful knowledge. Vast amount of User-Generated Data (UGD) is being created every day, such as user behavior, user-generated content, user exploitation of available services and user mobility in different domains. Some common critical issues arise for the UGD analysis process such as the large dataset cardinality and dimensionality, the variable data distribution and inherent sparseness, and the heterogeneous data to model the different facets of the targeted domain. Consequently, the extraction of useful knowledge from such data collections is a challenging task, and proper data mining solutions should be devised for the problem under analysis. In this thesis work, we focus on the design and development of innovative solutions to support data mining activities over User-Generated Data characterised by different critical issues, via the integration of different data mining techniques in a unified frame- work. Real datasets coming from three example domains characterized by the above critical issues are considered as reference cases, i.e., health care, social network, and ur- ban environment domains. Experimental results show the effectiveness of the proposed approaches to discover useful knowledge from different domains

    T-Patterns Revisited: Mining for Temporal Patterns in Sensor Data

    Get PDF
    The trend to use large amounts of simple sensors as opposed to a few complex sensors to monitor places and systems creates a need for temporal pattern mining algorithms to work on such data. The methods that try to discover re-usable and interpretable patterns in temporal event data have several shortcomings. We contrast several recent approaches to the problem, and extend the T-Pattern algorithm, which was previously applied for detection of sequential patterns in behavioural sciences. The temporal complexity of the T-pattern approach is prohibitive in the scenarios we consider. We remedy this with a statistical model to obtain a fast and robust algorithm to find patterns in temporal data. We test our algorithm on a recent database collected with passive infrared sensors with millions of events

    Discovering user mobility and activity in smart lighting environments

    Full text link
    "Smart lighting" environments seek to improve energy efficiency, human productivity and health by combining sensors, controls, and Internet-enabled lights with emerging ā€œInternet-of-Thingsā€ technology. Interesting and potentially impactful applications involve adaptive lighting that responds to individual occupants' location, mobility and activity. In this dissertation, we focus on the recognition of user mobility and activity using sensing modalities and analytical techniques. This dissertation encompasses prior work using body-worn inertial sensors in one study, followed by smart-lighting inspired infrastructure sensors deployed with lights. The first approach employs wearable inertial sensors and body area networks that monitor human activities with a user's smart devices. Real-time algorithms are developed to (1) estimate angles of excess forward lean to prevent risk of falls, (2) identify functional activities, including postures, locomotion, and transitions, and (3) capture gait parameters. Two human activity datasets are collected from 10 healthy young adults and 297 elder subjects, respectively, for laboratory validation and real-world evaluation. Results show that these algorithms can identify all functional activities accurately with a sensitivity of 98.96% on the 10-subject dataset, and can detect walking activities and gait parameters consistently with high test-retest reliability (p-value < 0.001) on the 297-subject dataset. The second approach leverages pervasive "smart lighting" infrastructure to track human location and predict activities. A use case oriented design methodology is considered to guide the design of sensor operation parameters for localization performance metrics from a system perspective. Integrating a network of low-resolution time-of-flight sensors in ceiling fixtures, a recursive 3D location estimation formulation is established that links a physical indoor space to an analytical simulation framework. Based on indoor location information, a label-free clustering-based method is developed to learn user behaviors and activity patterns. Location datasets are collected when users are performing unconstrained and uninstructed activities in the smart lighting testbed under different layout configurations. Results show that the activity recognition performance measured in terms of CCR ranges from approximately 90% to 100% throughout a wide range of spatio-temporal resolutions on these location datasets, insensitive to the reconfiguration of environment layout and the presence of multiple users.2017-02-17T00:00:00

    All in a twitter: Self-tuning strategies for a deeper understanding of a crisis tweet collection

    Get PDF
    Natural disasters have become more frequent during the past 20 years due to significant climate changes. These natural events are hotly debated on social networks like Twitter and a huge amount of short text messages are continuously and promptly exchanged with personal opinions, descriptions of the natural events and their corresponding consequences. The analysis of these large and complex data could help policy-makers to better understand the event as well as to set priorities. However, the correct configuration of the tweet mining process is still challenging due to variable data distribution and the availability of a large number of algorithms with different specific parameters. The analyst need to perform a large number of experiments to identify the best configuration for the overall knowledge discovery process. Innovative, scalable, and parameter-free solutions need to be explored to streamline the analytics process. This paper presents an enhanced version of PASTA (a distributed self-tuning engine) applied to a crisis tweet collection to group a corpus of tweets into cohesive and well-separated clusters with minimal analyst intervention. Experimental results performed on real data collected during natural disasters show the effectiveness of PASTA in discovering interesting groups of correlated tweets without selecting neither the algorithms nor their parameters
    • ā€¦
    corecore