9,621 research outputs found
Recommended from our members
State-of-the-art on research and applications of machine learning in the building life cycle
Fueled by big data, powerful and affordable computing resources, and advanced algorithms, machine learning has been explored and applied to buildings research for the past decades and has demonstrated its potential to enhance building performance. This study systematically surveyed how machine learning has been applied at different stages of building life cycle. By conducting a literature search on the Web of Knowledge platform, we found 9579 papers in this field and selected 153 papers for an in-depth review. The number of published papers is increasing year by year, with a focus on building design, operation, and control. However, no study was found using machine learning in building commissioning. There are successful pilot studies on fault detection and diagnosis of HVAC equipment and systems, load prediction, energy baseline estimate, load shape clustering, occupancy prediction, and learning occupant behaviors and energy use patterns. None of the existing studies were adopted broadly by the building industry, due to common challenges including (1) lack of large scale labeled data to train and validate the model, (2) lack of model transferability, which limits a model trained with one data-rich building to be used in another building with limited data, (3) lack of strong justification of costs and benefits of deploying machine learning, and (4) the performance might not be reliable and robust for the stated goals, as the method might work for some buildings but could not be generalized to others. Findings from the study can inform future machine learning research to improve occupant comfort, energy efficiency, demand flexibility, and resilience of buildings, as well as to inspire young researchers in the field to explore multidisciplinary approaches that integrate building science, computing science, data science, and social science
Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)
We report and fix an important systematic error in prior studies that ranked
classifiers for software analytics. Those studies did not (a) assess
classifiers on multiple criteria and they did not (b) study how variations in
the data affect the results. Hence, this paper applies (a) multi-criteria tests
while (b) fixing the weaker regions of the training data (using SMOTUNED, which
is a self-tuning version of SMOTE). This approach leads to dramatically large
increases in software defect predictions. When applied in a 5*5
cross-validation study for 3,681 JAVA classes (containing over a million lines
of code) from open source systems, SMOTUNED increased AUC and recall by 60% and
20% respectively. These improvements are independent of the classifier used to
predict for quality. Same kind of pattern (improvement) was observed when a
comparative analysis of SMOTE and SMOTUNED was done against the most recent
class imbalance technique. In conclusion, for software analytic tasks like
defect prediction, (1) data pre-processing can be more important than
classifier choice, (2) ranking studies are incomplete without such
pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.Comment: 10 pages + 2 references. Accepted to International Conference of
Software Engineering (ICSE), 201
Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)
We report and fix an important systematic error in prior studies that ranked
classifiers for software analytics. Those studies did not (a) assess
classifiers on multiple criteria and they did not (b) study how variations in
the data affect the results. Hence, this paper applies (a) multi-criteria tests
while (b) fixing the weaker regions of the training data (using SMOTUNED, which
is a self-tuning version of SMOTE). This approach leads to dramatically large
increases in software defect predictions. When applied in a 5*5
cross-validation study for 3,681 JAVA classes (containing over a million lines
of code) from open source systems, SMOTUNED increased AUC and recall by 60% and
20% respectively. These improvements are independent of the classifier used to
predict for quality. Same kind of pattern (improvement) was observed when a
comparative analysis of SMOTE and SMOTUNED was done against the most recent
class imbalance technique. In conclusion, for software analytic tasks like
defect prediction, (1) data pre-processing can be more important than
classifier choice, (2) ranking studies are incomplete without such
pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.Comment: 10 pages + 2 references. Accepted to International Conference of
Software Engineering (ICSE), 201
CPS Data Streams Analytics based on Machine Learning for Cloud and Fog Computing: A Survey
Cloud and Fog computing has emerged as a promising paradigm for the Internet of things (IoT) and cyber-physical systems (CPS). One characteristic of CPS is the reciprocal feedback loops between physical processes and cyber elements (computation, software and networking), which implies that data stream analytics is one of the core components of CPS. The reasons for this are: (i) it extracts the insights and the knowledge from the data streams generated by various sensors and other monitoring components embedded in the physical systems; (ii) it supports informed decision making; (iii) it enables feedback from the physical processes to the cyber counterparts; (iv) it eventually facilitates the integration of cyber and physical systems. There have been many successful applications of data streams analytics, powered by machine learning techniques, to CPS systems. Thus, it is necessary to have a survey on the particularities of the application of machine learning techniques to the CPS domain. In particular, we explore how machine learning methods should be deployed and integrated in cloud and fog architectures for better fulfilment of the requirements, e.g. mission criticality and time criticality, arising in CPS domains. To the best of our knowledge, this paper is the first to systematically study machine learning techniques for CPS data stream analytics from various perspectives, especially from a perspective that leads to the discussion and guidance of how the CPS machine learning methods should be deployed in a cloud and fog architecture
- …