6,278 research outputs found
NILMTK: An Open Source Toolkit for Non-intrusive Load Monitoring
Non-intrusive load monitoring, or energy disaggregation, aims to separate
household energy consumption data collected from a single point of measurement
into appliance-level consumption data. In recent years, the field has rapidly
expanded due to increased interest as national deployments of smart meters have
begun in many countries. However, empirically comparing disaggregation
algorithms is currently virtually impossible. This is due to the different data
sets used, the lack of reference implementations of these algorithms and the
variety of accuracy metrics employed. To address this challenge, we present the
Non-intrusive Load Monitoring Toolkit (NILMTK); an open source toolkit designed
specifically to enable the comparison of energy disaggregation algorithms in a
reproducible manner. This work is the first research to compare multiple
disaggregation approaches across multiple publicly available data sets. Our
toolkit includes parsers for a range of existing data sets, a collection of
preprocessing algorithms, a set of statistics for describing data sets, two
reference benchmark disaggregation algorithms and a suite of accuracy metrics.
We demonstrate the range of reproducible analyses which are made possible by
our toolkit, including the analysis of six publicly available data sets and the
evaluation of both benchmark disaggregation algorithms across such data sets.Comment: To appear in the fifth International Conference on Future Energy
Systems (ACM e-Energy), Cambridge, UK. 201
Visualizing clickstream data with multidimensional scaling
We visualize a a web server log by means of multidimensionalscaling. To that end, a so-called dissimilarity metric is introduced inthe sets of sessions and pages respectively. We interpret the resultingvisualizations and find some interesting patterns.
NEXT LEVEL: A COURSE RECOMMENDER SYSTEM BASED ON CAREER INTERESTS
Skills-based hiring is a talent management approach that empowers employers to align recruitment around business results, rather than around credentials and title. It starts with employers identifying the particular skills required for a role, and then screening and evaluating candidates’ competencies against those requirements. With the recent rise in employers adopting skills-based hiring practices, it has become integral for students to take courses that improve their marketability and support their long-term career success. A 2017 survey of over 32,000 students at 43 randomly selected institutions found that only 34% of students believe they will graduate with the skills and knowledge required to be successful in the job market. Furthermore, the study found that while 96% of chief academic officers believe that their institutions are very or somewhat effective at preparing students for the workforce, only 11% of business leaders strongly agree [11]. An implication of the misalignment is that college graduates lack the skills that companies need and value. Fortunately, the rise of skills-based hiring provides an opportunity for universities and students to establish and follow clearer classroom-to-career pathways. To this end, this paper presents a course recommender system that aims to improve students’ career readiness by suggesting relevant skills and courses based on their unique career interests
La détection d'anomalies comme outil de renforcement d'analyse des données et de prédiction dans l'éducation
Les établissements d'enseignement cherchent à concevoir des mécanismes efficaces pour améliorer les résultats scolaires, renforcer le processus d'apprentissage et éviter l'abandon scolaire. L'analyse et la prédiction des performances des étudiants au cours de leurs études peuvent mettre en évidence certaines lacunes d'une formation et détecter les étudiants ayant des problèmes d'apprentissage. Il s'agit donc de développer des techniques et des modèles basés sur des données qui visent à améliorer l'enseignement et l'apprentissage. Les modèles classiques ignorent généralement les étudiants présentant des comportements et incohérences inhabituels, bien qu'ils puissent fournir des informations importantes aux experts du domaine et améliorer les modèles de prédiction. Les profils atypiques dans l'éducation sont à peine explorés et leur impact sur les modèles de prédiction n'a pas encore été étudié dans la littérature. Cette thèse vise donc à étudier les valeurs anormales dans les données éducatives et à étendre les connaissances existantes à leur sujet. La thèse présente trois études de cas de détection de données anormales pour différents contextes éducatifs et modes de représentation des données (jeu de données numériques pour une université allemande, jeu de données numériques pour une université russe, jeu de données séquentiel pour les écoles d'infirmières françaises). Pour chaque cas, l'approche de prétraitement des données est proposée en tenant compte des particularités du jeu de données. Les données préparées ont été utilisées pour détecter les valeurs anormales dans des conditions de vérité terrain inconnue. Les caractéristiques des valeurs anormales détectées ont été explorées et analysées, ce qui a permis d'étendre les connaissances sur le comportement des étudiants dans un processus d'apprentissage. L'une des principales tâches dans le domaine de l'éducation est de développer des mécanismes essentiels qui permettront d'améliorer les résultats scolaires et de réduire l'abandon scolaire. Ainsi, il est nécessaire de construire des modèles de prédiction de performance qui sont capables de détecter les étudiants ayant des problèmes d'apprentissage, qui ont besoin d'une aide spéciale. Le deuxième objectif de la thèse est d'étudier l'impact des valeurs anormales sur les modèles de prédiction. Nous avons considéré deux des tâches de prédiction les plus courantes dans le domaine de l'éducation: (i) la prédiction de l'abandon scolaire, (ii) la prédiction du score final. Les modèles de prédiction ont été comparés en fonction de différents algorithmes de prédiction et de la présence de valeurs anormales dans les données d'entraînement. Cette thèse ouvre de nouvelles voies pour étudier les performances des élèves dans les environnements éducatifs. La compréhension des valeurs anormales et des raisons de leur apparition peut aider les experts du domaine à extraire des informations précieuses des données. La détection des valeurs aberrantes pourrait faire partie du pipeline des systèmes d'alerte précoce pour détecter les élèves à haut risque d'abandon. De plus, les tendances comportementales des valeurs aberrantes peuvent servir de base pour fournir des recommandations aux étudiants dans leurs études ou prendre des décisions concernant l'amélioration du processus éducatif.Educational institutions seek to design effective mechanisms that improve academic results, enhance the learning process, and avoid dropout. The performance analysis and performance prediction of students in their studies may show drawbacks in the educational formations and detect students with learning problems. This induces the task of developing techniques and data-based models which aim to enhance teaching and learning. Classical models usually ignore the students-outliers with uncommon and inconsistent characteristics although they may show significant information to domain experts and affect the prediction models. The outliers in education are barely explored and their impact on the prediction models has not been studied yet in the literature. Thus, the thesis aims to investigate the outliers in educational data and extend the existing knowledge about them. The thesis presents three case studies of outlier detection for different educational contexts and ways of data representation (numerical dataset for the German University, numerical dataset for the Russian University, sequential dataset for French nurse schools). For each case, the data preprocessing approach is proposed regarding the dataset peculiarities. The prepared data has been used to detect outliers in conditions of unknown ground truth. The characteristics of detected outliers have been explored and analysed, which allowed extending the comprehension of students' behaviour in a learning process. One of the main tasks in the educational domain is to develop essential tools which will help to improve academic results and reduce attrition. Thus, plenty of studies aim to build models of performance prediction which can detect students with learning problems that need special help. The second goal of the thesis is to study the impact of outliers on prediction models. The two most common prediction tasks in the educational field have been considered: (i) dropout prediction, (ii) the final score prediction. The prediction models have been compared in terms of different prediction algorithms and the presence of outliers in the training data. This thesis opens new avenues to investigate the students' performance in educational environments. The understanding of outliers and the reasons for their appearance can help domain experts to extract valuable information from the data. Outlier detection might be a part of the pipeline in the early warning systems of detecting students with a high risk of dropouts. Furthermore, the behavioral tendencies of outliers can serve as a basis for providing recommendations for students in their studies or making decisions about improving the educational process
Machine Learning Techniques for Stellar Light Curve Classification
We apply machine learning techniques in an attempt to predict and classify
stellar properties from noisy and sparse time series data. We preprocessed over
94 GB of Kepler light curves from MAST to classify according to ten distinct
physical properties using both representation learning and feature engineering
approaches. Studies using machine learning in the field have been primarily
done on simulated data, making our study one of the first to use real light
curve data for machine learning approaches. We tuned our data using previous
work with simulated data as a template and achieved mixed results between the
two approaches. Representation learning using a Long Short-Term Memory (LSTM)
Recurrent Neural Network (RNN) produced no successful predictions, but our work
with feature engineering was successful for both classification and regression.
In particular, we were able to achieve values for stellar density, stellar
radius, and effective temperature with low error (~ 2 - 4%) and good accuracy
(~ 75%) for classifying the number of transits for a given star. The results
show promise for improvement for both approaches upon using larger datasets
with a larger minority class. This work has the potential to provide a
foundation for future tools and techniques to aid in the analysis of
astrophysical data.Comment: Accepted to The Astronomical Journa
Automated Model Selection with AMSFin a production process of the automotive industry
Machine learning, statistics and knowledge engineering provide a broad variety of supervised learning algorithms for classification. In this paper we introduce the Automated Model Selection Framework (AMSF) which presents automatic and semi-automatic methods to select classifiers. To achieve this we split up the selection process into three distinct phases. Two of those select algorithms by static rules which are derived from a manually created knowledgebase. At this stage of AMSF the user can choose between different rankers in the third phase. Currently, we use instance based learning and a scoring scheme for ranking the classifiers. After evaluation of different rankers we will recommend the most successful to the user by default. Besides describing the architecture and design issues, we additionally point out the versatile ways AMSF is applied in a production process of the automotive industr
- …