
    Temporal knowledge discovery in big BAS data for building energy management

    With advances in information technologies, today's building automation systems (BASs) are capable of managing building operational performance efficiently and conveniently. Meanwhile, the amount of real-time monitoring and control data in BASs grows continually over the building lifecycle, which stimulates an intense demand for powerful big data analysis tools in BASs. Existing big data analytics adopted in the building automation industry focus on mining cross-sectional relationships, whereas temporal relationships, i.e., relationships over time, are usually overlooked. However, building operations are typically dynamic, and BAS data are essentially multivariate time series data. This paper presents a time series data mining methodology for temporal knowledge discovery in big BAS data. A number of time series data mining techniques are explored and carefully assembled, including the Symbolic Aggregate approXimation (SAX), motif discovery, and temporal association rule mining. This study also develops two methods for efficiently post-processing the discovered knowledge. The methodology has been applied to analyze BAS data retrieved from a real building. The temporal knowledge discovered is valuable for identifying dynamics, patterns, and anomalies in building operations, deriving temporal association rules within and between subsystems, assessing building system performance, and spotting opportunities for energy conservation.
    Department of Building Services Engineering; Department of Computin
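SAX, the first technique listed above, can be illustrated with a minimal Python sketch: z-normalize a series, reduce it with Piecewise Aggregate Approximation (PAA), then map each segment mean to a letter using breakpoints that cut the standard normal distribution into equiprobable regions. The function names are hypothetical, the sketch assumes the series length is a multiple of the word length `w`, and it is not the paper's implementation.

```python
import math
from statistics import NormalDist


def z_normalize(xs):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    if sd < 1e-12:  # flat series: avoid division by zero
        return [0.0] * len(xs)
    return [(x - mu) / sd for x in xs]


def paa(xs, w):
    """Piecewise Aggregate Approximation: mean of each of w equal-length segments."""
    if len(xs) % w != 0:
        raise ValueError("this sketch assumes len(xs) is a multiple of w")
    seg = len(xs) // w
    return [sum(xs[i * seg:(i + 1) * seg]) / seg for i in range(w)]


def gaussian_breakpoints(a):
    """a - 1 cut points that split N(0, 1) into a equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / a) for k in range(1, a)]


def sax_word(xs, w, a):
    """Map a series to a SAX word of length w over the letters 'a', 'b', ..."""
    cuts = gaussian_breakpoints(a)
    letters = []
    for v in paa(z_normalize(xs), w):
        idx = sum(1 for c in cuts if v > c)  # number of breakpoints below v
        letters.append(chr(ord("a") + idx))
    return "".join(letters)
```

For example, `sax_word([1, 2, 3, 4, 5, 6, 7, 8], w=4, a=4)` returns `"abcd"`. Sliding a window over a BAS signal and counting repeated SAX words gives a crude way to surface motif candidates; the paper's motif discovery and temporal association rule mining steps are not reproduced here.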

    Diagnosis of diseases using data mining

    Introduction: In the information age, data are the most important asset for health organizations. When used in a useful and optimal manner, data can become a financial resource for the organization. Data mining is an appropriate method for transforming this potential value into strategic information. Data mining means extracting hidden information, recognizing hidden relationships and patterns, and, in general, discovering useful knowledge from large volumes of data. The objective of this review paper was to evaluate the use of data mining in the diagnosis of diseases. Methods: This review is based on a structured search, using related keywords, of papers published between 2005 and 2015 in ScienceDirect, PubMed, Google Scholar, SID, and Magiran, and of books on the use of data mining in medical science and in the diagnosis of diseases. Results: Nowadays, data mining is used in many medical science studies, including the diagnosis of diseases and the discovery of hidden patterns in data. New approaches such as Knowledge Discovery in Databases (KDD), which includes data mining techniques, have gained popularity and have become a desired research tool. Researchers can use them to identify patterns and relationships among a great number of variables, and with them researchers have been able to predict the outcomes of a disease using information stored in databases.
Several studies have indicated that data mining, applied to various types of information (medical images, patient characteristics, and so on), is widely used to diagnose diseases such as tuberculosis, various cancers, and infectious diseases; to detect anomalies that humans rarely notice (such as spots and particular points within the eye that signal the onset of diabetic blindness); to determine the type of behavior toward patients; to predict the success rate of surgical operations; and to determine the success rate of therapeutic methods in coping with incurable diseases. Conclusion: One of the most important challenges in healthcare is transforming raw clinical data, which are generated continuously and in great volume, into meaningful information. In the current competitive environment, health organizations that use technologies such as data mining to improve healthcare quality will achieve success faster. Many research centers in Iran face large volumes of information that are either not analyzed at all or, when they are analyzed and converted into knowledge, take a long time to process because traditional methods are used. By adopting and implementing data mining, health organizations can transform their data into a powerful, competitive tool and take new steps in preventing, diagnosing, and treating diseases and in providing high-quality services to clients.

    Energy Analytics for Infrastructure: An Application to Institutional Buildings

    abstract: Commercial buildings in the United States account for 19% of total energy consumption annually. The Commercial Buildings Energy Consumption Survey (CBECS), which serves as the benchmark for all commercial buildings, provides critical input for EnergyStar models. Smart energy management technologies, sensors, innovative demand response programs, and updated versions of certification programs increase the opportunity to mitigate energy-related problems (blackouts and overproduction) and guide energy managers in optimizing consumption characteristics. With increasing advances in technologies that rely on 'Big Data,' codes and certification programs such as those of the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) and the Leadership in Energy and Environmental Design (LEED) carry out their evaluations during the pre-construction phase. These evaluations are mostly based on assumed quantitative and qualitative values calculated from energy models such as EnergyPlus and eQUEST. However, energy managers rarely analyze energy consumption through Knowledge Discovery in Databases (KDD) as part of a complete implementation, which creates the need for a better energy analytics framework. The dissertation uses Interval Data (ID) and establishes three frameworks to identify electricity losses, predict electricity consumption, and detect anomalies using data mining, deep learning, and mathematical models. The energy analytics process integrates with computational science and contributes to three objectives: 1. Develop a framework to identify both technical and non-technical losses using clustering and semi-supervised learning techniques. 2. Develop an integrated framework to predict electricity consumption using a wavelet-based data transformation model and deep learning algorithms. 3. Develop a framework to detect anomalies using ensemble empirical mode decomposition and isolation forest algorithms.
Building on a thorough research background, the first phase performs data analytics on the demand-supply database to determine the potential for reducing energy losses. In the second phase, the data preprocessing and electricity prediction framework integrates mathematical models and deep learning algorithms to predict consumption accurately. The third phase employs a data decomposition model and data mining techniques to detect anomalies in institutional buildings.
Dissertation/Thesis: Doctoral Dissertation, Civil, Environmental and Sustainable Engineering 201
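The isolation forest named in the third objective scores points by how quickly random axis-parallel splits isolate them: anomalies end up in shallow leaves. The minimal, dependency-free sketch below covers only that step (no ensemble empirical mode decomposition); the function names and parameter defaults (`n_trees=100`, `sample_size=256`) are illustrative assumptions, not the dissertation's settings.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def avg_path(n):
    """c(n): average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # identical values cannot be split any further
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for points left unresolved there."""
    if tree[0] == "leaf":
        return depth + avg_path(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)


def isolation_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1] per point; values near 1 mean the point is easily isolated."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = avg_path(psi)
    scores = []
    for x in data:
        mean_depth = sum(path_length(t, x) for t in trees) / n_trees
        scores.append(2 ** (-mean_depth / norm))
    return scores
```

For example, `isolation_scores([(10.0 + 0.1 * i,) for i in range(30)] + [(100.0,)])` gives the reading `100.0` the highest score. In practice a maintained library implementation, such as scikit-learn's `IsolationForest`, would normally be preferred over this sketch.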

    Fast and Accurate Dual-Way Streaming PARAFAC2 for Irregular Tensors -- Algorithm and Application

    How can we efficiently and accurately analyze an irregular tensor in a dual-way streaming setting where the sizes of two dimensions of the tensor increase over time? What types of anomalies are there in the dual-way streaming setting? An irregular tensor is a collection of matrices that share the same number of columns but have different numbers of rows. In a dual-way streaming setting, both new rows of existing matrices and new matrices arrive over time. PARAFAC2 decomposition is a crucial tool for analyzing irregular tensors. Although real-time analysis is necessary in the dual-way streaming setting, static PARAFAC2 decomposition methods do not work efficiently there, since they decompose the entire accumulated tensor again whenever new data arrive. Existing streaming PARAFAC2 decomposition methods work only in a limited setting and fail to handle new rows of matrices efficiently. In this paper, we propose Dash, an efficient and accurate PARAFAC2 decomposition method working in the dual-way streaming setting. When new data are given, Dash efficiently performs PARAFAC2 decomposition by carefully dividing the terms related to old and new data and avoiding naive computations involved with old data. Furthermore, applying a forgetting factor makes Dash follow recent movements. Extensive experiments show that Dash runs up to 14.0x faster than existing PARAFAC2 decomposition methods on newly arrived data. We also report anomaly-detection discoveries in real-world datasets, including those related to the Subprime Mortgage Crisis and COVID-19.
    Comment: 12 pages, accepted to The 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 202
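The abstract does not give Dash's update rules, but the general idea of separating old-data terms from new-data terms and applying a forgetting factor can be illustrated with a simpler quantity. A Gram matrix over stacked rows is a sum of per-row terms, so the contribution of old rows can be cached and only the new rows processed, while a hypothetical `forgetting` parameter below 1 down-weights history. This is a sketch of that principle only; it is neither PARAFAC2 nor Dash, and the class and function names are illustrative.

```python
def gram_of_rows(rows):
    """Return R^T R, where R stacks the given equal-length rows."""
    d = len(rows[0])
    g = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            for j in range(d):
                g[i][j] += r[i] * r[j]
    return g


class StreamingGram:
    """Maintain a (possibly down-weighted) Gram matrix as new rows stream in."""

    def __init__(self, dim, forgetting=1.0):
        self.gram = [[0.0] * dim for _ in range(dim)]
        self.forgetting = forgetting  # 1.0 keeps all history; < 1.0 favors recent rows

    def update(self, new_rows):
        # Only the new rows are processed; old rows enter solely via the cached sum.
        inc = gram_of_rows(new_rows)
        d = len(self.gram)
        for i in range(d):
            for j in range(d):
                self.gram[i][j] = self.forgetting * self.gram[i][j] + inc[i][j]
        return self.gram
```

With `forgetting=1.0`, updating batch by batch yields exactly the Gram matrix of all rows stacked together; smaller values make the cached result track recent rows more closely, at the cost of no longer matching the full-history batch result.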

    A generalized matrix profile framework with support for contextual series analysis

    The Matrix Profile is a state-of-the-art time series analysis technique that can be used for motif discovery, anomaly detection, segmentation, and other tasks in various domains such as healthcare, robotics, and audio. Whereas recent techniques use the Matrix Profile as a preprocessing or modeling step, we believe there is unexplored potential in generalizing the approach. We derived a framework that focuses on the implicit distance matrix calculation. We present this framework as the Series Distance Matrix (SDM). In this framework, distance measures (SDM-generators) and distance processors (SDM-consumers) can be freely combined, allowing for more flexibility and easier experimentation. In SDM, the Matrix Profile is but one specific configuration. We also introduce the Contextual Matrix Profile (CMP) as a new SDM-consumer capable of discovering repeating patterns. The CMP provides intuitive visualizations for data analysis and can find anomalies that are not discords. We demonstrate this using two real-world cases. The CMP is the first of a wide variety of new series analysis techniques that fit within SDM and can complement the Matrix Profile.
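Since the abstract frames the Matrix Profile as one configuration of a distance matrix plus a processor, a naive version can be sketched directly: compute z-normalized Euclidean distances between all pairs of length-`m` subsequences and keep, for each subsequence, its nearest non-trivial neighbor (a row-wise minimum). This quadratic sketch is for illustration only; the exclusion-zone width `ceil(m / 2)` is an assumption, and it implements neither SDM nor the Contextual Matrix Profile.

```python
import math


def z_normalize(xs):
    """Rescale a subsequence to zero mean and unit (population) standard deviation."""
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    if sd < 1e-12:  # flat subsequence: avoid division by zero
        return [0.0] * len(xs)
    return [(x - mu) / sd for x in xs]


def matrix_profile(series, m):
    """Naive O(n^2 * m) matrix profile with z-normalized Euclidean distance.

    Returns (profile, index): profile[i] is the distance from subsequence i to its
    nearest non-trivial neighbor, and index[i] is that neighbor's start position.
    """
    n = len(series) - m + 1
    if m < 2 or n < 2:
        raise ValueError("series too short for window m")
    subs = [z_normalize(series[i:i + m]) for i in range(n)]
    excl = math.ceil(m / 2)  # exclusion zone: skip overlapping "trivial" matches
    profile, index = [], []
    for i in range(n):
        best, best_j = math.inf, -1
        for j in range(n):
            if abs(i - j) < excl:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index
```

Low profile values mark motif candidates, and the highest value marks the top discord: in a repeating `[0, 1, 2, 3]` pattern with one corrupted sample, the maximum falls on a window covering that sample. Optimized libraries compute the same quantity far faster than this nested loop.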

    Adapted K-Nearest Neighbors for Detecting Anomalies on Spatio–Temporal Traffic Flow

    Outlier detection is an extensive research area that has been intensively studied in several domains, such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper advances the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach that considers the distribution of the flows in a given time interval. Flow distribution probability (FDP) databases are first constructed from the traffic flows using both spatial and temporal information. The outlier detection mechanism is then applied to incoming flow distribution probabilities: inliers are stored to enrich the FDP databases, while outliers are excluded from them. Moreover, a k-nearest neighbors (k-NN) approach to distance-based outlier detection is investigated and adapted for FDP outlier detection. To validate the proposed framework, real traffic flow data from Odense are evaluated at ten locations. The results reveal that the proposed framework is able to detect the real distribution of flow outliers. In another experiment on Beijing data, the results show that our approach outperforms the baseline algorithms for high-urban traffic flow.
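The mechanism described in this abstract (score each incoming flow distribution against the FDP database with a k-nearest-neighbor distance, store inliers, exclude outliers) can be sketched as follows. The equal-width histogram, the Euclidean distance, the function names, and the `k` and `threshold` values are all assumptions for illustration; the paper's exact FDP construction and distance may differ.

```python
import math


def flow_distribution(flows, bins, lo, hi):
    """Equal-width histogram of flow values on [lo, hi), normalized to probabilities."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for f in flows:
        k = int((f - lo) / width)
        counts[min(max(k, 0), bins - 1)] += 1  # clamp out-of-range values into edge bins
    total = sum(counts)
    return [c / total for c in counts]


def knn_distance(p, database, k):
    """Euclidean distance from p to its k-th nearest neighbor in the database."""
    return sorted(math.dist(p, q) for q in database)[k - 1]


def check_and_store(p, database, k, threshold):
    """Return False (outlier) if p's k-NN distance exceeds threshold; else store p."""
    if len(database) < k:
        raise ValueError("database needs at least k distributions")
    if knn_distance(p, database, k) > threshold:
        return False  # outlier: excluded from the database
    database.append(p)  # inlier: enriches the database
    return True
```

For example, with a database of three mostly low-flow distributions over four bins on `[0, 40)`, the flows `[12, 15, 18, 25, 14]` map to `[0.0, 0.8, 0.2, 0.0]` and are accepted with `k=2, threshold=0.3`, whereas `[35, 38, 36, 5, 39]` is rejected. Because accepted distributions are appended, the database follows gradual changes in traffic, while rejected ones never influence later decisions.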