400 research outputs found

    Online structural damage classification methodology for offshore wind turbine foundations using data stream analysis

    Get PDF
    Structural health monitoring (SHM) of wind turbines is crucial to improve maintenance and extend their lifespan. This study develops an online data analysis methodology using data stream analysis to classify damage in the links of an offshore wind turbine foundation. The methodology is validated using a laboratory-scaled jacket-type wind turbine foundation structure. 2460 measurements of the healthy structure were acquired, and a 5mm crack was applied to four different links to determine the four unhealthy classes. 820 measurements were taken for each of the unhealthy structures, resulting in a dataset with 5740 instances. As this is an imbalanced multiclass classification problem, a random sampler approach was used to treat the data. The only data obtained was from eight triaxial accelerometers distributed throughout the structure. Three different tree-based stream data classifiers were compared: Hoeffding Tree classifier, Extremely Fast Decision Tree classifier, and Hoeffding Adaptive Tree classifier. Each classification model underwent a tuning parameter procedure, and high values of the receiving operating characteristic area under the curve (ROC AUC) metric were achieved as a result. It is important to note that stream learning differs from batch learning.Peer ReviewedPostprint (published version

    Incremental Market Behavior Classification in Presence of Recurring Concepts

    Get PDF
    In recent years, the problem of concept drift has gained importance in the financial domain. The succession of manias, panics and crashes have stressed the non-stationary nature and the likelihood of drastic structural or concept changes in the markets. Traditional systems are unable or slow to adapt to these changes. Ensemble-based systems are widely known for their good results predicting both cyclic and non-stationary data such as stock prices. In this work, we propose RCARF (Recurring Concepts Adaptive Random Forests), an ensemble tree-based online classifier that handles recurring concepts explicitly. The algorithm extends the capabilities of a version of Random Forest for evolving data streams, adding on top a mechanism to store and handle a shared collection of inactive trees, called concept history, which holds memories of the way market operators reacted in similar circumstances. This works in conjunction with a decision strategy that reacts to drift by replacing active trees with the best available alternative: either a previously stored tree from the concept history or a newly trained background tree. Both mechanisms are designed to provide fast reaction times and are thus applicable to high-frequency data. The experimental validation of the algorithm is based on the prediction of price movement directions one second ahead in the SPDR (Standard & Poor's Depositary Receipts) S&P 500 Exchange-Traded Fund. RCARF is benchmarked against other popular methods from the incremental online machine learning literature and is able to achieve competitive results.This research was funded by the Spanish Ministry of Economy and Competitiveness under grant number ENE2014-56126-C2-2-R

    Preventing Discriminatory Decision-making in Evolving Data Streams

    Full text link
    Bias in machine learning has rightly received significant attention over the last decade. However, most fair machine learning (fair-ML) work to address bias in decision-making systems has focused solely on the offline setting. Despite the wide prevalence of online systems in the real world, work on identifying and correcting bias in the online setting is severely lacking. The unique challenges of the online environment make addressing bias more difficult than in the offline setting. First, Streaming Machine Learning (SML) algorithms must deal with the constantly evolving real-time data stream. Second, they need to adapt to changing data distributions (concept drift) to make accurate predictions on new incoming data. Adding fairness constraints to this already complicated task is not straightforward. In this work, we focus on the challenges of achieving fairness in biased data streams while accounting for the presence of concept drift, accessing one sample at a time. We present Fair Sampling over Stream (FS2FS^2), a novel fair rebalancing approach capable of being integrated with SML classification algorithms. Furthermore, we devise the first unified performance-fairness metric, Fairness Bonded Utility (FBU), to evaluate and compare the trade-off between performance and fairness of different bias mitigation methods efficiently. FBU simplifies the comparison of fairness-performance trade-offs of multiple techniques through one unified and intuitive evaluation, allowing model designers to easily choose a technique. Overall, extensive evaluations show our measures surpass those of other fair online techniques previously reported in the literature

    Anytime Discovery of a Diverse Set of Patterns with Monte Carlo Tree Search

    Get PDF
    International audienceThe discovery of patterns that accurately discriminate one class label from another remains a challenging data mining task. Subgroup discovery (SD) is one of the frameworks that enables to elicit such interesting patterns from labeled data. A question remains fairly open: How to select an accurate heuristic search technique when exhaustive enumeration of the pattern space is infeasible? Existing approaches make use of beam-search, sampling, and genetic algorithms for discovering a pattern set that is non-redundant and of high quality w.r.t. a pattern quality measure. We argue that such approaches produce pattern sets that lack of diversity: Only few patterns of high quality, and different enough, are discovered. Our main contribution is then to formally define pattern mining as a game and to solve it with Monte Carlo tree search (MCTS). It can be seen as an exhaustive search guided by random simulations which can be stopped early (limited budget) by virtue of its best-first search property. We show through a comprehensive set of experiments how MCTS enables the anytime discovery of a diverse pattern set of high quality. It out-performs other approaches when dealing with a large pattern search space and for different quality measures. Thanks to its genericity, our MCTS approach can be used for SD but also for many other pattern mining tasks

    A survey on machine learning for recurring concept drifting data streams

    Get PDF
    The problem of concept drift has gained a lot of attention in recent years. This aspect is key in many domains exhibiting non-stationary as well as cyclic patterns and structural breaks affecting their generative processes. In this survey, we review the relevant literature to deal with regime changes in the behaviour of continuous data streams. The study starts with a general introduction to the field of data stream learning, describing recent works on passive or active mechanisms to adapt or detect concept drifts, frequent challenges in this area, and related performance metrics. Then, different supervised and non-supervised approaches such as online ensembles, meta-learning and model-based clustering that can be used to deal with seasonalities in a data stream are covered. The aim is to point out new research trends and give future research directions on the usage of machine learning techniques for data streams which can help in the event of shifts and recurrences in continuous learning scenarios in near real-time

    Social, Private, and Trusted Wearable Technology under Cloud-Aided Intermittent Wireless Connectivity

    Get PDF
    There has been an unprecedented increase in the use of smart devices globally, together with novel forms of communication, computing, and control technologies that have paved the way for a new category of devices, known as high-end wearables. While massive deployments of these objects may improve the lives of people, unauthorized access to the said private equipment and its connectivity is potentially dangerous. Hence, communication enablers together with highly-secure human authentication mechanisms have to be designed.In addition, it is important to understand how human beings, as the primary users, interact with wearable devices on a day-to-day basis; usage should be comfortable, seamless, user-friendly, and mindful of urban dynamics. Usually the connectivity between wearables and the cloud is executed through the user’s more power independent gateway: this will usually be a smartphone, which may have potentially unreliable infrastructure connectivity. In response to these unique challenges, this thesis advocates for the adoption of direct, secure, proximity-based communication enablers enhanced with multi-factor authentication (hereafter refereed to MFA) that can integrate/interact with wearable technology. Their intelligent combination together with the connection establishment automation relying on the device/user social relations would allow to reliably grant or deny access in cases of both stable and intermittent connectivity to the trusted authority running in the cloud.The introduction will list the main communication paradigms, applications, conventional network architectures, and any relevant wearable-specific challenges. Next, the work examines the improved architecture and security enablers for clusterization between wearable gateways with a proximity-based communication as a baseline. Relying on this architecture, the author then elaborates on the social ties potentially overlaying the direct connectivity management in cases of both reliable and unreliable connection to the trusted cloud. The author discusses that social-aware cooperation and trust relations between users and/or the devices themselves are beneficial for the architecture under proposal. Next, the author introduces a protocol suite that enables temporary delegation of personal device use dependent on different connectivity conditions to the cloud.After these discussions, the wearable technology is analyzed as a biometric and behavior data provider for enabling MFA. The conventional approaches of the authentication factor combination strategies are compared with the ‘intelligent’ method proposed further. The assessment finds significant advantages to the developed solution over existing ones.On the practical side, the performance evaluation of existing cryptographic primitives, as part of the experimental work, shows the possibility of developing the experimental methods further on modern wearable devices.In summary, the set of enablers developed here for wearable technology connectivity is aimed at enriching people’s everyday lives in a secure and usable way, in cases when communication to the cloud is not consistently available
    corecore