1,384 research outputs found

    Data Mining in Smart Grids

    Effective smart grid operation requires rapid decisions in a data-rich but information-limited environment. In this context, grid sensor data streams cannot provide system operators with the information they need to act within the time frames required to minimize the impact of disturbances. Even if fast models can convert the data into information, the smart grid operator must deal with the challenge of not fully understanding the context of that information, so the information cannot be used with a high degree of confidence. To address this issue, data mining has been recognized as the most promising enabling technology for improving decision-making processes, providing the right information at the right moment to the right decision-maker. This Special Issue focuses on emerging methodologies for data mining in smart grids and addresses many relevant topics, ranging from methods for uncertainty management to advanced dispatching. It not only focuses on methodological breakthroughs and roadmaps for implementing the methodology but also presents much-needed best practices. Topics include, but are not limited to, the following: fuzziness in smart grid computing; emerging techniques for renewable energy forecasting; robust and proactive solutions for optimal smart grid operation; fuzzy-based smart grid monitoring and control frameworks; granular computing for uncertainty management in smart grids; self-organizing and decentralized paradigms for information processing.

    Simultaneous Coherent Structure Coloring facilitates interpretable clustering of scientific data by amplifying dissimilarity

    The clustering of data into physically meaningful subsets often requires assumptions regarding the number, size, or shape of the subgroups. Here, we present a new method, simultaneous coherent structure coloring (sCSC), which accomplishes unsupervised clustering without a priori guidance regarding the underlying structure of the data. sCSC performs a sequence of binary splittings on the dataset such that the most dissimilar data points are required to be in separate clusters. To achieve this, we solve a generalized eigenvalue problem based on the pairwise dissimilarity between the data points to be clustered, which yields a set of orthogonal coordinates along which dissimilarity in the dataset is maximized. This sequence of bifurcations produces a binary tree representation of the system, from which the number of clusters in the data and their interrelationships naturally emerge. To illustrate the effectiveness of the method in the absence of a priori assumptions, we apply it to three exemplary problems in fluid dynamics. We then illustrate its interpretability using a high-dimensional protein folding simulation dataset. While we restrict our examples to dynamical physical systems in this work, we anticipate straightforward translation to other fields, such as genomics and neuroscience, where existing analysis tools require ad hoc assumptions about the data structure, lack the interpretability of the present method, or where the underlying processes are less accessible.
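The core step described above, a generalized eigenvalue problem built from pairwise dissimilarity, can be sketched as a single binary split. This is a simplified illustration, not the authors' sCSC implementation: it assumes Euclidean distance as the dissimilarity, uses the Laplacian-style formulation L v = λ D v (with D the row sums of the dissimilarity matrix and L = D - A) from coherent structure coloring, and splits points by the sign of the leading generalized eigenvector. `csc_split` is a hypothetical name; applying it recursively to each half would give a binary tree, but the stopping rule is not specified here.

```python
import numpy as np

def csc_split(X):
    """One binary split in the spirit of (s)CSC (simplified sketch).

    Assumptions: Euclidean distance as the dissimilarity measure, and a
    split by the sign of the leading generalized eigenvector.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise dissimilarity matrix A (assumed: Euclidean distance)
    A = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d = A.sum(axis=1)          # row sums; must be > 0 (points not all identical)
    L = np.diag(d) - A         # Laplacian-like matrix of the dissimilarity graph
    # Generalized problem L v = lam D v, reduced to a symmetric standard
    # problem via u = D^{1/2} v:  D^{-1/2} L D^{-1/2} u = lam u
    s = 1.0 / np.sqrt(d)
    M = s[:, None] * L * s[None, :]
    lam, U = np.linalg.eigh(M)  # eigenvalues in ascending order
    v = s * U[:, -1]            # leading eigenvector: maximizes dissimilarity
    labels = (v > 0).astype(int)
    return labels, lam[-1]

if __name__ == "__main__":
    pts = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
    print(csc_split(pts))
```

Because the trivial constant eigenvector has eigenvalue 0 and every other eigenvector is D-orthogonal to it, the leading eigenvector takes both signs, which is what makes a sign-based split meaningful in this sketch.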

    A Type-2 Fuzzy Logic Based System for Malaria Epidemic Prediction in Ethiopia

    Malaria is the most prevalent mosquito-borne disease throughout tropical and subtropical regions of the world, with severe medical, economic, and social impact. Malaria has been a serious public health problem in Ethiopia since 1959, although its morbidity and mortality have declined since 2001. Various studies have attempted to predict malaria epidemics using mathematical and statistical approaches; however, these approaches had no learning capabilities. In this paper, we present a Type-2 Fuzzy Logic Based System for malaria epidemic prediction in Ethiopia, trained on real data collected throughout Ethiopia from 2013 to 2017. Fuzzy Logic Based Systems provide a transparent model that employs IF-THEN rules for prediction, which decision-makers can easily analyze and interpret. This transparency is important for fighting the sources of malaria and taking the necessary preventive measures: the rules generated by our system were able to explain the situations and the intensity of the input factors that contributed to malaria epidemic incidence up to three months ahead. The presented Type-2 Fuzzy Logic System (T2FLS) learns its rules and fuzzy set parameters from data and outperformed its counterparts, the type-1 fuzzy logic system (T1FLS) by 2% and ANFIS by 0.33%, in the accuracy of malaria epidemic prediction in Ethiopia. In addition, because of its high level of interpretability, the proposed system sheds light on the main causes behind such outbreaks in Ethiopia.
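To make "interval type-2" and "IF-THEN rules" concrete, here is a minimal inference sketch. It is not the paper's system: the rule base is fixed rather than learned, the two rules and their parameters are hypothetical, the membership functions are assumed to be Gaussians with an uncertain width, and type reduction uses the Nie-Tan closed form (averaging the lower and upper firing strengths) instead of whatever method the authors used.

```python
import math

def it2_gaussian(x, mean, sigma_narrow, sigma_wide):
    """Interval type-2 Gaussian membership with uncertain width.

    Returns (lower, upper); the footprint of uncertainty lies between the
    narrow (lower) and wide (upper) Gaussian curves.
    """
    lower = math.exp(-0.5 * ((x - mean) / sigma_narrow) ** 2)
    upper = math.exp(-0.5 * ((x - mean) / sigma_wide) ** 2)
    return lower, upper

def firing_interval(inputs, antecedents):
    """Product t-norm over the antecedents, applied to lower and upper bounds separately."""
    f_lo, f_hi = 1.0, 1.0
    for x, (mean, s_narrow, s_wide) in zip(inputs, antecedents):
        lo, hi = it2_gaussian(x, mean, s_narrow, s_wide)
        f_lo *= lo
        f_hi *= hi
    return f_lo, f_hi

def predict(inputs, rules):
    """Nie-Tan type reduction: weight each crisp consequent by the midpoint of its firing interval."""
    num, den = 0.0, 0.0
    for antecedents, consequent in rules:
        f_lo, f_hi = firing_interval(inputs, antecedents)
        w = 0.5 * (f_lo + f_hi)
        num += w * consequent
        den += w
    return num / den if den > 0 else float("nan")

# Hypothetical rules on two inputs normalized to [0, 1] (e.g. rainfall, temperature):
# R1: IF input1 is LOW AND input2 is LOW THEN risk = 0.1
# R2: IF input1 is HIGH AND input2 is HIGH THEN risk = 0.9
RULES = [
    ([(0.2, 0.1, 0.2), (0.2, 0.1, 0.2)], 0.1),
    ([(0.8, 0.1, 0.2), (0.8, 0.1, 0.2)], 0.9),
]

if __name__ == "__main__":
    print(predict([0.2, 0.2], RULES), predict([0.5, 0.5], RULES))
```

Each rule stays readable as an IF-THEN statement, which is the interpretability property the abstract emphasizes; learning the means, widths, and consequents from data is outside this sketch.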

    Essays on Predictive Analytics in E-Commerce

    The motivation for this dissertation is twofold. On the one hand, the dissertation is methodologically oriented and develops new statistical approaches and algorithms for machine learning. At the same time, it is practically oriented and focuses on the concrete use case of product returns in online retail. The "data explosion", caused by the fact that the costs of storing and processing large volumes of data have fallen significantly (Bhimani and Willcocks, 2014), and the new technologies resulting from it represent the greatest discontinuity for business practice and business research since the development of the Internet (Agarwal and Dhar, 2014). In particular, business intelligence (BI) has been identified as an important research topic for practitioners and academics in information systems (IS) (Chen et al., 2012). Machine learning has been successfully applied to a range of BI problems, such as sales forecasting (Choi et al., 2014; Sun et al., 2008), wind power forecasting (Wan et al., 2014), forecasting the disease progression of hospital patients (Liu et al., 2015), fraud detection (Abbasi et al., 2012), and recommender systems (Sahoo et al., 2012). However, little research addresses machine learning questions with a specific connection to BI: although existing algorithms are sometimes modified to adapt them to a particular problem (Abbasi et al., 2010; Sahoo et al., 2012), IS research generally confines itself to applying existing algorithms, developed for questions other than BI, to BI questions (Abbasi et al., 2010; Sahoo et al., 2012). The first major goal of this dissertation is to contribute to closing this gap.
To illustrate and apply the proposed concepts in practice, this dissertation focuses on the important BI problem of product returns in online retail. Many online retailers are not profitable (Rigby, 2014), and product returns are a major cause of this problem (Grewal et al., 2004). Besides the cost aspects, product returns are problematic from an ecological perspective. There is broad consensus in logistics research that the "last mile" of the supply chain, namely the delivery of the product to the customer's door, is the most CO2-intensive (Browne et al., 2008; Halldórsson et al., 2010; Song et al., 2009). When products are returned, this energy-intensive step is repeated, which reduces the sustainability and environmental friendliness of online retailers' business model relative to traditional retail. However, online retailers cannot simply prohibit product returns, since returns are an important part of their business model: the option to return products has positive effects on customer satisfaction (Cassill, 1998), purchasing behavior (Wood, 2001), future purchasing behavior (Petersen and Kumar, 2009), and customers' emotional reactions (Suwelack et al., 2011). A promising approach is to focus on impulsive and compulsive (LaRose, 2001) as well as fraudulent purchasing behavior (Speights and Hilinski, 2005; Wachter et al., 2012). The current academic literature on the topic contains no such strategies; most strategies do not distinguish between wanted and unwanted returns (Walsh et al., 2014). The second goal of this dissertation is therefore to develop the basis for a prediction-and-intervention strategy with which purchasing behavior with a high return probability can be detected in advance, so that timely intervention is possible.
In this dissertation, several prediction models are developed, on the basis of which it is demonstrated that, assuming moderately effective interventions, the strategy yields considerable cost savings.
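Why intervention effectiveness matters can be shown with a simple expected-value rule. The sketch below is a hypothetical cost model, not the dissertation's: it assumes a prediction model supplies a return probability per order, that an intervention prevents a return with some probability (its effectiveness), and that both the cost of a return and the cost of intervening are known constants. All function names and numbers are illustrative.

```python
def expected_net_saving(p_return, effectiveness, return_cost, intervention_cost):
    """Expected saving from intervening on one order (hypothetical cost model).

    p_return: predicted probability that the order will be returned
    effectiveness: assumed probability that the intervention prevents the return
    return_cost: assumed cost incurred by one return
    intervention_cost: assumed cost of intervening (e.g. lost margin)
    """
    return p_return * effectiveness * return_cost - intervention_cost

def should_intervene(p_return, effectiveness, return_cost, intervention_cost):
    """Intervene only when the expected saving is positive."""
    return expected_net_saving(p_return, effectiveness, return_cost, intervention_cost) > 0

def break_even_probability(effectiveness, return_cost, intervention_cost):
    """Smallest predicted return probability at which intervening pays off."""
    return intervention_cost / (effectiveness * return_cost)

if __name__ == "__main__":
    # Illustrative numbers only: a 30%-effective intervention, return cost 10, intervention cost 1
    print(break_even_probability(0.3, 10.0, 1.0))
    print(should_intervene(0.8, 0.3, 10.0, 1.0), should_intervene(0.2, 0.3, 10.0, 1.0))
```

Under this model, a more accurate classifier concentrates interventions on orders above the break-even probability, which is one way to see how even moderately effective interventions could produce net savings.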

    A Multi-Gene Genetic Programming Application for Predicting Students Failure at School

    Despite several efforts, accurately predicting the student failure rate (SFR) at school remains a core problem faced by many in the educational sector. Procedures for forecasting SFR are rigid and often require data scaling or conversion into binary form, as in the case of the logistic model, which may lead to loss of information and attenuation of effect size. In addition, the large number of factors, incomplete and unbalanced datasets, and the black-box nature of Artificial Neural Networks and Fuzzy logic systems expose the need for more efficient tools. Genetic Programming (GP) currently holds great promise and has produced very positive results in different sectors. In this regard, this study developed GPSFARPS, a software application that provides a robust solution to SFR prediction using an evolutionary algorithm known as multi-gene genetic programming. The approach is validated by feeding a testing dataset to the evolved GP models. Results obtained from GPSFARPS simulations show its ability to evolve a suitable failure-rate expression, converging quickly at 30 generations out of a specified maximum of 500. The multi-gene system was also able to minimize the evolved model expression and accurately predict the student failure rate using a subset of the original expression.
    Comment: 14 pages, 9 figures, Journal paper. arXiv admin note: text overlap with arXiv:1403.0623 by other authors
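In multi-gene genetic programming, a model is commonly a weighted sum of several evolved expression trees ("genes") plus a bias, with the weights fitted by ordinary least squares. The sketch below shows only that combination step, under the assumption that GPSFARPS follows this usual scheme: the two genes are hypothetical stand-ins for evolved trees, and the evolutionary search itself is omitted.

```python
import numpy as np

# Hypothetical "evolved" genes; in MGGP each gene is a GP expression tree over the inputs.
GENES = [
    lambda X: X[:, 0] * X[:, 1],
    lambda X: np.sin(X[:, 2]),
]

def gene_matrix(X, genes):
    """Bias column of ones followed by one column per gene output."""
    cols = [np.ones(len(X))] + [g(X) for g in genes]
    return np.column_stack(cols)

def fit_weights(X, y, genes):
    """Least-squares weights b0..bn for y ~ b0 + sum_i b_i * gene_i(X)."""
    G = gene_matrix(X, genes)
    b, *_ = np.linalg.lstsq(G, y, rcond=None)
    return b

def predict(X, genes, b):
    """Evaluate the combined multi-gene model."""
    return gene_matrix(X, genes) @ b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(50, 3))
    y = 1.0 + 2.0 * X[:, 0] * X[:, 1] + 3.0 * np.sin(X[:, 2])
    print(fit_weights(X, y, GENES))
```

Fitting the outer weights by least squares lets the evolutionary search concentrate on gene structure, and genes whose weights are near zero can be pruned, which is one route to the smaller model expressions the abstract mentions.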

    Autonomous supervision and optimization of product quality in a multi-stage manufacturing process based on self-adaptive prediction models.

    In modern manufacturing facilities, there are two essential phases for assuring high production quality with low (or even zero) defects and waste in order to save costs for companies. The first phase concerns the early recognition of potential problems in product quality; the second concerns proper reactions once such problems are recognized. In this paper, we propose a holistic approach that handles both issues consecutively within a predictive maintenance framework in an on-line production system. We address multi-stage functionality based on (i) data-driven forecast models for (measurable) product quality criteria (QCs) at a later stage, which are established and executed using process values (and their time-series trends) recorded at an early stage of production (describing its progress), and (ii) process optimization cycles whose outputs are suggestions for proper reactions at an earlier stage in the case of forecasted downtrends or violations of the allowed boundaries in product quality. The data-driven forecast models are established by solving a high-dimensional batch time-series modeling problem. For this, we employ a non-linear version of PLSR (partial least squares regression) that couples PLS with generalized Takagi–Sugeno fuzzy systems (termed PLS-fuzzy). The models are able to self-adapt over time based on recursive parameter adaptation and rule evolution functionalities. Two concepts for increased flexibility during model updates are proposed: (i) a dynamic down-weighting strategy for older samples with an adaptive update of the forgetting factor (steering forgetting intensity) and (ii) an incremental update of the latent variable space spanned by the directions (loading vectors) obtained through PLS. The whole model update approach is termed SAFM-IF (self-adaptive forecast models with increased flexibility).
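Down-weighting older samples with a forgetting factor is typically realized through recursive least squares (RLS) with exponential forgetting. The sketch below shows that mechanism for a plain linear model, as a simplified stand-in for the recursive parameter adaptation in PLS-fuzzy: it uses a fixed forgetting factor `lam`, whereas SAFM-IF adapts the factor over time, and it omits the fuzzy rules and the PLS latent space entirely. `ForgettingRLS` is a hypothetical name.

```python
import numpy as np

class ForgettingRLS:
    """Recursive least squares with a fixed exponential forgetting factor lam.

    A sample that is `age` steps old carries weight lam**age, so the
    estimate can track drifting relationships.
    """

    def __init__(self, n_features, lam=0.98, delta=1000.0):
        self.lam = lam
        self.w = np.zeros(n_features)
        self.P = delta * np.eye(n_features)  # large initial covariance = weak prior

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)   # gain vector
        err = y - x @ self.w           # a-priori prediction error
        self.w = self.w + k * err
        # P is symmetric, so k x^T P equals outer(k, Px)
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return err

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    model = ForgettingRLS(2, lam=0.95)
    for _ in range(200):
        x = rng.normal(size=2)
        model.update(x, 2.0 * x[0] - x[1])
    for _ in range(200):            # process drift: the true relationship changes
        x = rng.normal(size=2)
        model.update(x, -x[0] + 3.0 * x[1])
    print(model.w)
```

A smaller `lam` forgets faster and tracks drift more quickly but is noisier; adapting `lam` on-line, as the abstract describes, is a way to trade these off automatically.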
Process optimization is achieved through multi-objective optimization using evolutionary techniques, where the (trained and updated) forecast models serve as surrogate models that guide the optimization process toward high-quality Pareto fronts (containing solution candidates). A new influence analysis between process values and QCs, based on the PLS-fuzzy forecast models, is proposed in order to reduce the dimensionality of the optimization space and thus to guarantee high(er) quality solutions within a reasonable amount of time (enabling better use in on-line mode). The methodologies have been comprehensively evaluated on real on-line process data from a (micro-fluidic) chip production system, where the early stage comprises the injection molding process and the later stage the bonding process. The results show remarkable performance in terms of low prediction errors of the PLS-fuzzy forecast models (mostly lower than those of other model architectures) as well as in terms of Pareto fronts with individuals (solutions) whose fitness was close to the optimal values of the three most important target QCs used for supervision: flatness, void events, and RMSEs of the chips. Suggestions could thus be provided to experts/operators on how best to change process values and associated machining parameters in the injection molding process in order to achieve significantly higher product quality for the final chips at the end of the bonding process.
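The Pareto fronts mentioned above consist of non-dominated solutions: candidates for which no other candidate is at least as good in every objective and strictly better in one. The filter below illustrates that definition for minimization objectives (for example, deviations of several QCs from their targets). It is a plain O(n^2) check, not the evolutionary optimizer used in the paper, and the example objective values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated points, preserving input order (O(n^2) filter).

    A point never dominates itself, so it is safe to compare against the whole list.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]

if __name__ == "__main__":
    # Hypothetical candidates scored on two QC deviations (lower is better)
    candidates = [(1, 5), (2, 2), (3, 1), (4, 4), (2, 3)]
    print(pareto_front(candidates))
```

Evolutionary multi-objective methods apply this dominance relation repeatedly to rank a population; with surrogate forecast models, each candidate's objectives come from model predictions rather than from running the physical process.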

    A new approach to seasonal energy consumption forecasting using temporal convolutional networks

    There has been a significant increase in the attention paid to resource management in smart grids, and several energy forecasting models have been published in the literature. Energy forecasting plays a crucial role in several smart grid applications, including demand-side management, optimum dispatch, and load shedding. A significant challenge in smart grid models is managing forecasts efficiently while keeping the prediction error as small as possible. Recurrent neural networks, a type of artificial neural network, are frequently used to forecast time series data. However, because recurrent neural networks suffer from limitations such as vanishing gradients and a lack of memory retention, this work models the sequential data with convolutional networks, which can solve complex problems more effectively than recurrent neural networks. In this research, a temporal convolutional network is proposed to handle seasonal short-term energy forecasting. The proposed temporal convolutional network computes outputs in parallel, reducing computation time compared to recurrent neural networks. A further performance comparison with the traditional long short-term memory (LSTM) network in terms of MAD and sMAPE showed that the proposed model outperformed the recurrent neural network.
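The building block of a temporal convolutional network is the dilated causal convolution: the output at time t depends only on inputs at t, t - d, t - 2d, and so on, and because every output position can be computed independently, the computation parallelizes across time. The sketch below shows one such convolution in plain Python, the receptive field of a stack with one convolution per dilation, and one common definition of sMAPE (other definitions exist, and the paper's exact variant is not stated). It is an illustration of the operations, not the authors' model; learned weights, residual blocks, and nonlinearities are omitted.

```python
def dilated_causal_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k*dilation], treating x as zero before t = 0.

    Causal: y[t] never looks at x[t+1:]. Each y[t] is independent of the
    others, which is what allows parallel computation over time.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated causal convolutions (one conv per dilation)."""
    return 1 + (kernel_size - 1) * sum(dilations)

def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of |F - A| / ((|A| + |F|) / 2) times 100.

    Undefined when an actual and its forecast are both zero.
    """
    terms = [abs(f - a) / ((abs(a) + abs(f)) / 2) for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)

if __name__ == "__main__":
    print(dilated_causal_conv([1, 2, 3, 4, 5], [1, 1], dilation=2))
    print(receptive_field(2, [1, 2, 4, 8]))   # doubling dilations grow the window exponentially
    print(smape([100, 200], [110, 180]))
```

Doubling the dilation at each layer lets a few layers cover long seasonal histories, which is the usual argument for TCNs on long sequences.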

    A review of machine learning applications in wildfire science and management

    Artificial intelligence has been applied in wildfire science and management since the 1990s, with early applications including neural networks and expert systems. Since then, the field has progressed rapidly in step with the wide adoption of machine learning (ML) in the environmental sciences. Here, we present a scoping review of ML in wildfire science and management. Our objective is to improve awareness of ML among wildfire scientists and managers, as well as to illustrate the challenging range of problems in wildfire science available to data scientists. We first present an overview of popular ML approaches used in wildfire science to date and then review their use within six problem domains: 1) fuels characterization, fire detection, and mapping; 2) fire weather and climate change; 3) fire occurrence, susceptibility, and risk; 4) fire behavior prediction; 5) fire effects; and 6) fire management. We also discuss the advantages and limitations of various ML approaches and identify opportunities for future advances in wildfire science and management within a data science context. We identified 298 relevant publications, in which the most frequently used ML methods included random forests, MaxEnt, artificial neural networks, decision trees, support vector machines, and genetic algorithms. There are opportunities to apply more current ML methods (e.g., deep learning and agent-based learning) in wildfire science. However, despite the ability of ML models to learn on their own, expertise in wildfire science is necessary to ensure realistic modelling of fire processes across multiple scales, while the complexity of some ML methods requires sophisticated knowledge for their application. Finally, we stress that the wildfire research and management community plays an active role in providing relevant, high-quality data for use by practitioners of ML methods.
    Comment: 83 pages, 4 figures, 3 tables