
    How is a data-driven approach better than random choice in label space division for multi-label classification?

    We propose using five data-driven community detection approaches from social networks to partition the label space for the task of multi-label classification, as an alternative to the random partitioning into equal subsets performed by RAkELd: the modularity-maximizing fastgreedy and leading eigenvector, infomap, walktrap, and label propagation algorithms. We construct a label co-occurrence graph (in both weighted and unweighted versions) from training data and perform community detection to partition the label set. We include Binary Relevance and Label Powerset classification methods for comparison, and use Gini-index-based Decision Trees as the base classifier. We compare these educated approaches to label space division against random baselines on 12 benchmark data sets over five evaluation measures. We show that in almost all cases the seven educated-guess approaches are more likely than not to outperform RAkELd in all measures but Hamming Loss. We show that fastgreedy and walktrap community detection on weighted label co-occurrence graphs are 85-92% more likely to yield better F1 scores than random partitioning. Infomap on unweighted label co-occurrence graphs is, on average, better than random partitioning 90% of the time in terms of Subset Accuracy and 89% in terms of Jaccard similarity, and weighted fastgreedy is better on average than RAkELd in terms of Hamming Loss.
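    As a rough illustration of the idea (not the authors' implementation), the sketch below builds a weighted label co-occurrence graph with networkx, partitions the labels with greedy modularity maximization (a stand-in for the fastgreedy algorithm), and trains one Gini-based decision tree on the label powerset of each community. The data shapes and helper names are assumptions.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from sklearn.tree import DecisionTreeClassifier

def cooccurrence_graph(Y):
    """Build a weighted label co-occurrence graph from a binary label matrix Y."""
    counts = Y.T @ Y                       # counts[i, j] = number of samples with both labels
    G = nx.Graph()
    G.add_nodes_from(range(Y.shape[1]))
    for i in range(Y.shape[1]):
        for j in range(i + 1, Y.shape[1]):
            if counts[i, j] > 0:
                G.add_edge(i, j, weight=int(counts[i, j]))
    return G

def partition_labels(Y):
    """Partition label indices by modularity-maximizing community detection."""
    G = cooccurrence_graph(Y)
    return [sorted(c) for c in greedy_modularity_communities(G, weight="weight")]

def fit_per_community(X, Y, communities):
    """Train one Gini-based decision tree on the label powerset of each community."""
    models = []
    for labels in communities:
        # Each distinct label combination within the community becomes one class.
        powerset = np.array(["".join(map(str, row)) for row in Y[:, labels]])
        models.append((labels, DecisionTreeClassifier(criterion="gini").fit(X, powerset)))
    return models
```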

    Learning robust and efficient point cloud representations

    The abstract is in the attachment.

    Towards Probabilistic and Partially-Supervised Structural Health Monitoring

    One of the most significant challenges for signal processing in data-based structural health monitoring (SHM) is a lack of comprehensive data; in particular, recording labels to describe what each of the measured signals represents. For example, consider an offshore wind turbine, monitored by an SHM strategy. It is infeasible to artificially damage such a high-value asset to collect signals that might relate to the damaged structure in situ; additionally, signals that correspond to abnormal wave loading, or unusually low temperatures, could take several years to be recorded. Regular inspections of the turbine in operation, to describe (and label) what measured data represent, would also prove impracticable -- conventionally, it is only possible to check various components (such as the turbine blades) following manual inspection; this involves travelling to a remote, offshore location, which is a high-cost procedure. Therefore, the collection of labelled data is generally limited by some expense incurred when investigating the signals; this might include direct costs, or loss of income due to down-time. Conventionally, incomplete label information forces a dependence on unsupervised machine learning, limiting SHM strategies to damage (i.e. novelty) detection. However, while comprehensive and fully labelled data can be rare, it is often possible to provide labels for a limited subset of data, given a label budget. In this scenario, partially-supervised machine learning becomes relevant. The associated algorithms offer an alternative approach to monitoring measured data, as they can utilise both labelled and unlabelled signals within a unifying training scheme. In consequence, this work introduces (and adapts) partially-supervised algorithms for SHM; specifically, semi-supervised and active learning methods. Through applications to experimental data, semi-supervised learning is shown to utilise information in the unlabelled signals, alongside a limited set of labelled data, to further update a predictive model. On the other hand, active learning improves the predictive performance by querying specific signals to investigate, which are assumed to be the most informative. Both discriminative and generative methods are investigated, leading towards a novel, probabilistic framework to classify, investigate, and label signals for online SHM. The findings indicate that, through partially-supervised learning, the cost associated with labelling data can be managed, as the information in a selected subset of labelled signals can be combined with larger sets of unlabelled data -- increasing the potential scope and predictive performance for data-driven SHM.
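    As a hedged sketch of the active-learning side of this setting (not the code from the thesis), the loop below retrains a classifier and repeatedly queries the unlabelled signal it is least confident about, spending a fixed label budget. The function and variable names, the logistic-regression classifier, and the oracle callable are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_labelled, y_labelled, X_pool, oracle, budget=20):
    """Uncertainty sampling: query the least-confident unlabelled signal each round.

    `oracle` stands in for the costly inspection that provides a label in practice.
    """
    model = LogisticRegression(max_iter=1000)
    for _ in range(budget):
        if len(X_pool) == 0:
            break
        model.fit(X_labelled, y_labelled)
        proba = model.predict_proba(X_pool)
        query = int(np.argmin(proba.max(axis=1)))   # least-confidence criterion
        y_new = oracle(X_pool[query])               # e.g. a manual inspection providing a label
        X_labelled = np.vstack([X_labelled, X_pool[query]])
        y_labelled = np.append(y_labelled, y_new)
        X_pool = np.delete(X_pool, query, axis=0)
    return model.fit(X_labelled, y_labelled)
```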

    Data science for buildings, a multi-scale approach bridging occupants to smart-city energy planning

    In a context of global carbon emission reduction goals, buildings have been identified as holding valuable energy-saving potential. With the exponential increase of smart, connected building automation systems, massive amounts of data are now accessible for analysis. These, coupled with powerful data science methods and machine learning algorithms, present a unique opportunity to identify untapped energy-saving potential from field information and effectively turn buildings into active assets of the built energy infrastructure. However, the diversity of building occupants and infrastructures, and the disparities in collected information, have produced disjointed scales of analytics that make it difficult for approaches to scale and generalize over the building stock. This, coupled with the lack of standards in the sector, has hindered the broader adoption of data science practices in the field and raised the following question: how can data science facilitate the scaling of approaches and bridge disconnected spatiotemporal scales of the built environment to deliver enhanced energy-saving strategies? This thesis addresses this question by investigating data-driven, scalable, interpretable, and multi-scale approaches across varying types of analytical classes. The work particularly explores descriptive, predictive, and prescriptive analytics to connect occupants, buildings, and urban energy planning for improved energy performance. First, a novel multi-dimensional data-mining framework is developed, producing distinct dimensional outlines that support systematic methodological approaches and refined knowledge discovery. Second, an automated building heat dynamics identification method is put forward, supporting large-scale thermal performance examination of buildings in a non-intrusive manner; across 225 Dutch residential buildings, the method produced 64% good-quality model fits, against 14% close and 22% poor ones. Third, a pioneering hierarchical forecasting method was designed, bridging individual and aggregated building load predictions in a coherent, data-efficient fashion. The approach was evaluated over hierarchies of 37, 140, and 383 nodal elements and showed improved accuracy and coherency against disjointed prediction systems. Finally, building occupants and urban energy planning strategies are investigated under the prism of uncertainty: in a neighbourhood of 41 Dutch residential buildings, occupants were found to significantly impact optimal energy community designs in the context of weather and economic uncertainties. Overall, the thesis demonstrates the added value of multi-scale approaches in all analytical classes while fostering best data-science practices in the sector through benchmarks and open-source implementations.
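    One idea above, forecast coherency across a building hierarchy, can be illustrated with a minimal bottom-up sketch (an assumed toy example, not the thesis' reconciliation method): per-building forecasts are mapped through a summing matrix so that every aggregate equals the sum of its children.

```python
import numpy as np

def bottom_up(S, bottom_forecasts):
    """Aggregate bottom-level (per-building) forecasts through the hierarchy.

    S maps the m bottom series to all n nodes of the hierarchy (n >= m), so the
    returned vector is coherent: each aggregate equals the sum of its children.
    """
    return S @ bottom_forecasts

# Toy hierarchy: 3 buildings feeding one neighbourhood total.
S = np.array([[1, 1, 1],                    # neighbourhood = b1 + b2 + b3
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
per_building = np.array([4.2, 3.1, 5.7])    # illustrative per-building kWh forecasts
coherent = bottom_up(S, per_building)       # [13.0, 4.2, 3.1, 5.7]
```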

    Wireless sensor data processing for on-site emergency response

    This thesis is concerned with the problem of processing data from Wireless Sensor Networks (WSNs) to meet the requirements of emergency responders (e.g. Fire and Rescue Services). A WSN typically consists of spatially distributed sensor nodes that cooperatively monitor physical or environmental conditions. Sensor data about these conditions can then be used as part of the input to predict, detect, and monitor emergencies. Although WSNs have demonstrated great potential in facilitating Emergency Response, sensor data cannot be interpreted directly due to its large volume, noise, and redundancy. In addition, emergency responders are not interested in raw data; they are interested in the meaning it conveys. This thesis presents research on processing and combining data from multiple types of sensors, and combining sensor data with other relevant data, for the purpose of obtaining data of greater quality and information of greater relevance to emergency responders. The current theory and practice in Emergency Response and the existing technology aids were reviewed to identify the requirements from both application and technology perspectives (Chapter 2). The detailed process of information extraction from sensor data and sensor data fusion techniques were reviewed to identify what constitutes suitable sensor data fusion techniques and the challenges presented in sensor data processing (Chapter 3). A study of Incident Commanders’ requirements utilised a goal-driven task analysis method to identify gaps in current means of obtaining relevant information during response to fire emergencies and a list of opportunities for WSN technology to fill those gaps (Chapter 4). A high-level Emergency Information Management System Architecture was proposed, including the main components that are needed, the interaction between components, and system function specification at different incident stages (Chapter 5). A set of state-awareness rules was proposed and integrated with a Kalman filter to improve the performance of filtering. The proposed data pre-processing approach achieved both improved outlier removal and quick detection of real events (Chapter 6). A data storage mechanism was proposed to support timely response to queries regardless of the increase in volume of data (Chapter 7). What can be considered as “meaning” (e.g. events) for emergency responders was identified, and a generic emergency event detection model was proposed to identify patterns present in sensor data and associate those patterns with events (Chapter 8). In conclusion, the added benefits that the technical work can provide to current Emergency Response are discussed, and specific contributions and future work are highlighted (Chapter 9).
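    To make the filtering idea concrete, here is a hedged, minimal sketch of a one-dimensional Kalman filter combined with simple rules that gate outliers while still letting persistent deviations through as real events. It is an assumed illustration rather than the pre-processing approach of Chapter 6, and all parameter names and thresholds are invented.

```python
import numpy as np

def gated_kalman(z, q=1e-3, r=0.5, gate=3.0, persist=3):
    """Filter a 1-D sensor stream, rejecting outliers but accepting persistent changes.

    A reading whose innovation exceeds `gate` standard deviations is skipped as an
    outlier, unless `persist` consecutive readings deviate, in which case the change
    is accepted as a real event (its index is recorded in `events`).
    """
    x, p = float(z[0]), 1.0
    estimates, events, run = [], [], 0
    for k, zk in enumerate(z):
        p = p + q                           # predict (constant-level model)
        innov = zk - x
        s = p + r                           # innovation variance
        if abs(innov) > gate * np.sqrt(s):
            run += 1
            if run < persist:               # treat as an isolated outlier: skip update
                estimates.append(x)
                continue
            events.append(k)                # persistent deviation: treat as a real event
            run = 0
        else:
            run = 0
        kgain = p / s                       # Kalman update
        x = x + kgain * innov
        p = (1 - kgain) * p
        estimates.append(x)
    return np.array(estimates), events
```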

    Evaluating the Resiliency of Industrial Internet of Things Process Control Using Protocol Agnostic Attacks

    Improving and defending our nation's critical infrastructure has been a challenge for quite some time. A malfunction or stoppage of any one of these systems could result in hazardous conditions for the populace it supports, leading to widespread damage, injury, and even death. The protection of such systems has been mandated by the Office of the President of the United States of America in Presidential Policy Directive 21. Current research focuses on securing and improving the management and efficiency of Industrial Control Systems (ICS). The Industrial Internet of Things (IIoT) promises to enhance the efficiency of ICS. However, the presence of IIoT can be a security concern, as it forces ICS processes to rely on network-based devices for process management. In this research, the attack surface of a testbed is evaluated using protocol-agnostic attacks and the SANS ICS Cyber Kill Chain. The results highlight the widening of the ICS attack surface due to reliance on IIoT, but also demonstrate one technique an ICS can use to rely on IIoT securely.

    Ambient-Aware LiDAR Odometry in Variable Terrains

    The flexibility of Simultaneous Localization and Mapping (SLAM) algorithms across different environments has consistently been a significant challenge. To address the issue of LiDAR odometry drift in high-noise settings, integrating clustering methods to filter out unstable features has become an effective module in SLAM frameworks. However, reducing the amount of point cloud data can lead to loss of information and possible degeneration. This research therefore proposes a LiDAR odometry method that can dynamically assess the reliability of the point cloud. The algorithm aims to improve adaptability in diverse settings by selecting important feature points with sensitivity to the level of environmental degeneration. First, a fast adaptive Euclidean clustering algorithm based on range images is proposed, which, combined with depth clustering, extracts the primary structural points of the environment, defined as ambient skeleton points. Then, the environmental degeneration level is computed from the dense normal features of the skeleton points, and the point cloud cleaning is dynamically adjusted accordingly. The algorithm is validated on the KITTI benchmark and in real environments, demonstrating higher accuracy and robustness across different environments.
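    As a rough, assumed illustration of the Euclidean-clustering step only (the paper's adaptive, range-image-based variant is more involved), the sketch below groups LiDAR points by region growing over a fixed radius and drops small clusters as unstable features; the radius and minimum-size parameters are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def euclidean_clusters(points, radius=0.5, min_size=10):
    """Group 3-D points by region growing over a fixed Euclidean radius."""
    tree = cKDTree(points)
    unvisited = np.ones(len(points), dtype=bool)
    clusters = []
    for seed in range(len(points)):
        if not unvisited[seed]:
            continue
        queue, members = [seed], []
        unvisited[seed] = False
        while queue:
            idx = queue.pop()
            members.append(idx)
            for nb in tree.query_ball_point(points[idx], r=radius):
                if unvisited[nb]:
                    unvisited[nb] = False
                    queue.append(nb)
        if len(members) >= min_size:        # drop sparse clusters as unstable features
            clusters.append(np.array(members))
    return clusters
```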