107,605 research outputs found
Facilitating and Enhancing the Performance of Model Selection for Energy Time Series Forecasting in Cluster Computing Environments
Applying Machine Learning (ML) manually to a given problem setting is a tedious and time-consuming process that brings many challenges, especially in the context of Big Data. In such a context, gaining insight, finding patterns, and extracting knowledge from large datasets are complex tasks. Additionally, the configuration of the underlying Big Data infrastructure makes configuring and running ML tasks more complex still. With the growing interest in ML over the last few years, people without extensive ML expertise in particular have a high demand for frameworks that assist them in applying the right ML algorithm to their problem setting. This is especially true in the field of smart energy system applications, where more and more ML algorithms are used, e.g. for time series forecasting. Generally, two groups of non-expert users who perform energy time series forecasting are distinguished. The first includes users who are familiar with statistics and ML but are not able to write the programming code needed to train and evaluate ML models using the well-known trial-and-error approach. Such an approach is time-consuming and wastes resources on constructing multiple models. The second group is even less experienced in programming and is not knowledgeable in statistics and ML, but wants to apply given ML solutions to its problem settings.
The goal of this thesis is to scientifically explore, in the context of concrete use cases in the energy domain, how such non-expert users can be optimally supported in creating and performing ML tasks in practice on cluster computing environments. To support the first group of non-expert users, an easy-to-use, modular, extendable, microservice-based ML solution for instrumenting and evaluating ML algorithms on top of a Big Data technology stack is conceptualized and evaluated. Our proposed solution facilitates applying the trial-and-error approach by hiding the low-level complexities from the users and provides the best conditions for efficiently performing ML tasks in cluster computing environments.
To support the second group of non-expert users, the first solution is extended to realize meta learning approaches for automated model selection. We evaluate how meta learning technology can be efficiently applied to the problem space of data analytics for smart energy systems to assist energy system experts who are not data analytics experts in applying the right ML algorithms to their data analytics problems. To enhance the predictive performance of meta learning, an efficient characterization of energy time series datasets is required. To this end, Descriptive Statistics Time based Meta Features (DSTMF), a new kind of meta feature set, is designed to accurately capture the deep characteristics of energy time series datasets. We find that DSTMF outperforms the other state-of-the-art meta feature sets introduced in the literature to characterize energy time series datasets, in terms of both the accuracy of the meta learning models and the time needed to extract the features. Further enhancement in the predictive performance of the meta learning classification model is achieved by training the meta learner on new efficient meta examples. To this end, we propose two new approaches to generate new energy time series datasets to be used as training meta examples by the meta learner, depending on the type of time series dataset (i.e. energy generation or energy consumption time series). We find that extending the original training sets with new meta examples generated by our approaches outperforms the case in which the original set is extended by new simulated energy time series datasets.
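As a rough illustration of the meta learning idea described in this abstract, the following minimal Python sketch characterizes each time series by a few descriptive statistics and recommends the model attached to the most similar known series. It is not the DSTMF feature set or the thesis's meta learner: the chosen statistics, the 1-nearest-neighbour rule, the function names, and the labels `model_a` and `model_b` are all assumptions made for this example.

```python
import statistics


def meta_features(series):
    """Illustrative meta-features: plain descriptive statistics of one series.

    Assumption: these four statistics stand in for a real meta-feature set.
    """
    return (
        statistics.fmean(series),
        statistics.pstdev(series),
        min(series),
        max(series),
    )


def recommend(meta_examples, series):
    """1-nearest-neighbour meta-learner (an assumed, deliberately simple choice).

    meta_examples: list of (meta_feature_tuple, best_model_label) pairs.
    Returns the label of the meta-example closest to the query series.
    """
    query = meta_features(series)

    def squared_distance(example):
        features, _label = example
        return sum((a - b) ** 2 for a, b in zip(features, query))

    return min(meta_examples, key=squared_distance)[1]


# Hypothetical meta-examples: each series is paired with the model that is
# assumed to have performed best on it.
flat = [1.0, 1.1, 0.9, 1.0]
spiky = [0.0, 10.0, 0.0, 10.0]
meta_examples = [
    (meta_features(flat), "model_a"),
    (meta_features(spiky), "model_b"),
]

print(recommend(meta_examples, [0.5, 9.5, 0.2, 9.8]))
```

The features are left unscaled here for brevity; with real data, statistics on very different scales would normally be normalized before distances are compared.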
Single-Route and Dual-Route Approaches to Reading Aloud Difficulties Associated with Dysphasia.
The study of reading aloud is currently informed by two main types of theory: modular dual-route and connectionist single-route. One difference between the theories is the type of word classification system which they favour. Dual-route theory employs the regular-irregular dichotomy of classification, whereas single-route theory considers body neighbourhoods to be a more informative approach. This thesis explores the reading aloud performance of a group of people with dysphasia from the two theoretical standpoints by employing a specifically prepared set of real and pseudoword stimuli. As well as being classified according to regularity and body neighbourhood, all the real word stimuli were controlled for frequency. The pseudowords were divided into two groups, common pseudowords and pseudohomophones, and classified according to body neighbourhood. There were two main phases to the study. In the first phase, the stimuli were piloted and the response time performances of a group of people with dysphasia and a group of matched control people were compared. In the second phase, a series of tasks was developed to investigate which means of word classification best explained the visual lexical decision and reading aloud performance of people with dysphasia. The influence of word knowledge was also considered. The data was analysed both quantitatively and qualitatively. The quantitative analysis of the number of errors made indicated that classification of items by body neighbourhood and frequency provided the more comprehensive explanation of the data. Investigation of the types of errors that were made did not find a significant relationship between word type and error type, but again the results indicated that the influence of frequency and body neighbourhood was stronger than that of regularity. The findings are discussed both in terms of their implications for the two theories of reading aloud and their relevance to clinical practice.
Detecting modules in dense weighted networks with the Potts method
We address the problem of multiresolution module detection in dense weighted
networks, where the modular structure is encoded in the weights rather than
topology. We discuss a weighted version of the q-state Potts method, which was
originally introduced by Reichardt and Bornholdt. This weighted method can be
directly applied to dense networks. We discuss the dependence of the resolution
of the method on its tuning parameter and network properties, using sparse and
dense weighted networks with built-in modules as example cases. Finally, we
apply the method to data on stock price correlations, and show that the
resulting modules correspond well to known structural properties of this
correlation network.
Comment: 14 pages, 6 figures. v2: 1 figure added, 1 reference added, minor
changes. v3: 3 references added, minor changes
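To make the role of the tuning parameter concrete, here is a minimal Python sketch of the energy of a partition under a weighted Potts-style Hamiltonian of the form H = -Σ_{i<j} (w_ij - γ p_ij) δ(σ_i, σ_j). The null-model term p_ij = s_i s_j / (2W), built from node strengths s_i and total weight W, is an assumption made for this example and may differ from the paper's choice. The function name and the toy network are likewise illustrative.

```python
from itertools import combinations


def potts_energy(weights, labels, gamma=1.0):
    """Energy of a partition under a weighted Potts-style Hamiltonian.

    weights: symmetric matrix (list of lists) of link weights, zero diagonal
    labels:  labels[i] is the module assigned to node i
    gamma:   resolution (tuning) parameter
    """
    n = len(weights)
    strength = [sum(row) for row in weights]  # node strengths s_i
    total = sum(strength) / 2.0               # total link weight W
    energy = 0.0
    for i, j in combinations(range(n), 2):
        if labels[i] == labels[j]:
            # Assumed null model: expected weight between i and j.
            expected = strength[i] * strength[j] / (2.0 * total)
            energy -= weights[i][j] - gamma * expected
    return energy


# Dense toy network: two strongly linked pairs, weak links everywhere else.
w = [
    [0.0, 1.0, 0.1, 0.1],
    [1.0, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 1.0],
    [0.1, 0.1, 1.0, 0.0],
]
two_modules = [0, 0, 1, 1]
one_module = [0, 0, 0, 0]

# At gamma = 1 the two-module split has the lower energy; at gamma = 0 the
# null-model term vanishes and merging everything is favoured instead.
print(potts_energy(w, two_modules), potts_energy(w, one_module))
print(potts_energy(w, two_modules, gamma=0.0), potts_energy(w, one_module, gamma=0.0))
```

The topology of this network is complete, so the modules are visible only in the weights, which is the situation the abstract describes.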
Modular invariants and subfactors
In this lecture we explain the intimate relationship between modular
invariants in conformal field theory and braided subfactors in operator
algebras. Our analysis is based on an approach to modular invariants using
braided sector induction ("α-induction") arising from the treatment of
conformal field theory in the Doplicher-Haag-Roberts framework. Many properties
of modular invariants which have so far been noticed empirically and considered
mysterious can be rigorously derived in a very general setting in the subfactor
context. For example, the connection between modular invariants and graphs (cf.
the A-D-E classification for SU(2)_k) finds a natural explanation and
interpretation. We try to give an overview on the current state of affairs
concerning the expected equivalence between the classifications of braided
subfactors and modular invariant two-dimensional conformal field theories.
Comment: 25 pages, AMS LaTeX, epic, eepic, doc-class fic-1.cl
Modular generalized Springer correspondence III: exceptional groups
We complete the construction of the modular generalized Springer
correspondence for an arbitrary connected reductive group, with a uniform proof
of the disjointness of induction series that avoids the case-by-case arguments
for classical groups used in previous papers in the series. We show that the
induction series containing the trivial local system on the regular nilpotent
orbit is determined by the Sylow subgroups of the Weyl group. Under some
assumptions, we give an algorithm for determining the induction series
associated to the minimal cuspidal datum with a given central character. We
also provide tables and other information on the modular generalized Springer
correspondence for quasi-simple groups of exceptional type, including a
complete classification of cuspidal pairs in the case of good characteristic,
and a full determination of the correspondence in type G_2.
Comment: 40 pages. Version 2: added section 7.5, modified Table 5.2 to match
current conventions of GAP3. Version 3 has minor edits suggested by the
referee, including a slight strengthening of Proposition 3.2; final version,
to appear in Math. Annalen
3d Modularity
We find and propose an explanation for a large variety of modularity-related
symmetries in problems of 3-manifold topology and physics of 3d N=2
theories where such structures a priori are not manifest. These modular
structures include: mock modular forms, SL(2,Z) Weil
representations, quantum modular forms, non-semisimple modular tensor
categories, and chiral algebras of logarithmic CFTs.
Comment: 119 pages, 10 figures and 20 tables
Computationally efficient induction of classification rules with the PMCRI and J-PMCRI frameworks
In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale and thus databases are increasing at a fast pace. This leads to the utilisation of parallel computing technologies in order to cope with large amounts of data. In the area of classification rule induction, parallelisation of classification rules has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer, which has only recently been in the focus of parallelisation. This work introduces and evaluates empirically a framework for the parallel induction of classification rules, generated by members of the Prism family of algorithms. All members of the Prism family of algorithms follow the separate and conquer approach.
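The separate and conquer strategy named in this abstract can be sketched in a few lines of Python. This is a simplified, sequential illustration for categorical attributes and a single target class, not the PMCRI or J-PMCRI framework and not a faithful implementation of any particular Prism variant. The function name, the scoring rule (precision first, then coverage), and the toy data are assumptions made for this example.

```python
def induce_rules(rows, target, cls):
    """Induce rules for one class with a separate-and-conquer loop.

    rows:   list of dicts mapping attribute name -> categorical value
    target: name of the class attribute
    cls:    class value the rules should predict
    Returns a list of rules, each a dict of attribute -> required value.
    """
    rules = []
    remaining = list(rows)
    while any(r[target] == cls for r in remaining):
        rule, covered = {}, remaining
        # Conquer: add attribute-value tests until only `cls` is covered.
        while any(r[target] != cls for r in covered):
            best = None
            for attr in sorted(covered[0]):
                if attr == target or attr in rule:
                    continue
                for value in sorted({r[attr] for r in covered}):
                    subset = [r for r in covered if r[attr] == value]
                    hits = sum(r[target] == cls for r in subset)
                    score = (hits / len(subset), len(subset))
                    if best is None or score > best[0]:
                        best = (score, attr, value, subset)
            if best is None:  # attributes exhausted; accept an impure rule
                break
            _, attr, value, covered = best
            rule[attr] = value
        rules.append(rule)
        # Separate: drop the covered instances before inducing the next rule.
        remaining = [
            r for r in remaining
            if not all(r[a] == v for a, v in rule.items())
        ]
    return rules


# Toy data, purely illustrative.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
print(induce_rules(rows, "play", "yes"))
```

Unlike divide and conquer, which recursively splits the whole dataset into a tree, each pass here builds one rule and then removes the instances it covers, so the result is a set of independent rules per class.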