
5 research outputs found

Content-Based Access Control

Author: Zeng Wenrong
Publication venue: 'Paleontological Institute at The University of Kansas'
Publication date: 01/01/2015

In conventional databases, the most popular access control model requires policies to be specified manually and explicitly for each role of every user against each data object. In today's large-scale, content-centric data sharing, such approaches can be impractical because of explosive data growth and the sensitivity of the data objects. Moreover, conventional database access control policies do not work when the semantic content of the data is expected to play a role in access decisions. Users are often over-privileged, and ex post facto auditing is used to detect misuse of privileges. Unfortunately, the damage is usually difficult to reverse, because a (large amount of) data has already been disclosed. In this dissertation, we first introduce Content-Based Access Control (CBAC), an innovative access control model for content-centric information sharing. As a complement to conventional access control models, the CBAC model automatically makes access control decisions based on the content similarity between user credentials and data content. In CBAC, a metarule allows each user to access "a subset" of the designated data objects of a content-centric database, and the boundary of the subset is determined dynamically by the textual content of the data objects. We then present an enforcement mechanism for CBAC that exploits Oracle's Virtual Private Database (VPD) to implement row-wise access control and to prevent data objects from being abused through unnecessary access admission. To improve the efficiency of CBAC enforcement, we introduce a content-based blocking mechanism, which also reveals a more relevant part of the data objects than using only the user credentials and data content. We also use several tagging mechanisms for more accurate matching of short text snippets (e.g., short VarChar attributes); they extract topics, rather than pure word occurrences, to represent the content of data. With tagging, content similarity is calculated not purely from word occurrences but from the semantic topics underlying the text. Experimental results show that CBAC makes accurate access control decisions with a small overhead.
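The core idea in this abstract, admitting a row only when its text is similar enough to the user's credentials, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the bag-of-words vectors, the cosine measure, the `threshold` value, and the function names are all assumptions chosen for illustration, and the Oracle VPD enforcement layer is not modeled.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case bag-of-words vector (an assumed, simplistic content model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def accessible_rows(credential_text, rows, threshold=0.2):
    """Return ids of rows whose content is similar enough to the credentials.

    `rows` is a list of (row_id, content) pairs; `threshold` is a
    hypothetical cut-off, not a value taken from the source.
    """
    cred = term_vector(credential_text)
    return [
        row_id
        for row_id, content in rows
        if cosine_similarity(cred, term_vector(content)) >= threshold
    ]


rows = [
    (1, "cardiology patient heart surgery notes"),
    (2, "quarterly sales revenue forecast"),
]
print(accessible_rows("cardiology heart specialist", rows))
```

In a row-level security setting, a predicate of this kind would be evaluated per row at query time; the topic-based tagging described in the abstract would replace the plain word counts used here.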

CiteSeerX

KU ScholarWorks

A Human-Centric Approach to Data Fusion in Post-Disaster Managment: The Development of a Fuzzy Set Theory Based Model

Author: Banisakher Mubarak
Publication venue: University of Central Florida
Publication date: 01/01/2014

It is critical to provide an efficient and accurate information system in the post-disaster phase so that individuals can access and obtain the necessary resources in a timely manner. However, current map-based post-disaster management systems provide all emergency resource lists without filtering them, which usually leads to high levels of energy consumed in calculation. An effective post-disaster management system (PDMS) also distributes emergency resources such as hospitals, storage, and transportation much more reasonably and is more beneficial to individuals in the post-disaster period. In this dissertation, a semi-supervised learning (SSL) based graph system was first constructed for PDMS. The resource map of a graph-based PDMS was converted to a directed graph represented by an adjacency matrix, and decision information was then derived from the PDMS in two ways: a clustering operation and a graph-based semi-supervised optimization process. In this study, PDMS was applied to emergency resource distribution in the post-disaster (response) phase, and a path optimization algorithm based on ant colony optimization (ACO) was used to minimize post-disaster cost. Simulation results show the effectiveness of the proposed methodology. The analysis compared it with clustering-based algorithms under ACO improved with the tour improvement algorithm (TIA) and the Min-Max Ant System (MMAS); the results also show that the SSL-based graph is more effective for calculating the optimal path in PDMS. This research improved the map by combining the disaster map with the initial GIS-based map, which located the target area while considering the influence of the disaster. First, all initial maps and disaster maps underwent a Gaussian transformation while the histograms of all map pictures were acquired. All pictures then underwent a discrete wavelet transform (DWT), and a Gaussian fusion algorithm was applied to the DWT pictures. Second, the inverse DWT (iDWT) was applied to generate a new map for a post-disaster management system. Finally, simulations were carried out, and the results showed the effectiveness of the proposed method in comparison with other fusion algorithms, such as mean-mean fusion and max-UD fusion, using evaluation indices including entropy, spatial frequency (SF), and image quality index (IQI). A fuzzy set model was proposed to improve the presentation capacity of nodes in this GIS-based PDMS.
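Two of the evaluation indices named in this abstract, entropy and spatial frequency, are easy to compute for a small grayscale image. The sketch below uses the commonly cited definitions (Shannon entropy of the gray-level histogram, and the root of the summed mean squared row and column differences); the abstract does not give the dissertation's exact formulas, so treat these as assumptions. Images are plain nested lists of integer gray levels.

```python
import math
from collections import Counter


def entropy(image):
    """Shannon entropy (bits) of the gray-level histogram of `image`."""
    pixels = [p for row in image for p in row]
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def spatial_frequency(image):
    """Spatial frequency: sqrt(RF^2 + CF^2), using horizontal and vertical
    first differences, each normalized by the total number of pixels."""
    rows, cols = len(image), len(image[0])
    n = rows * cols
    row_sq = sum(
        (image[i][j] - image[i][j - 1]) ** 2
        for i in range(rows)
        for j in range(1, cols)
    )
    col_sq = sum(
        (image[i][j] - image[i - 1][j]) ** 2
        for i in range(1, rows)
        for j in range(cols)
    )
    return math.sqrt(row_sq / n + col_sq / n)


flat = [[5, 5], [5, 5]]            # no detail at all
checker = [[0, 255], [255, 0]]     # maximal local contrast
print(entropy(flat), spatial_frequency(flat))
print(entropy(checker), spatial_frequency(checker))
```

Higher values of both indices are usually read as a fused image carrying more information and more fine detail, which is how such metrics are typically used to compare fusion algorithms.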

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

A Study on Deep-Learning-Based Fault Diagnosis of Main Transformers Using Unlabeled Fault Data and Dissolved Gas Analysis Data

Author: 김선의
Publication venue: Seoul National University Graduate School
Publication date: 01/08/2021

Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Mechanical and Aerospace Engineering, August 2021. 소재웅.

Due to the rapid development and advancement of today's industry, the demand for safe and reliable power distribution and transmission lines is becoming more critical; thus, prognostics and health management (hereafter, PHM) is becoming more important in the power transformer industry. Among the various methods developed for power transformer diagnosis, the artificial intelligence (AI) based approach has received considerable interest from academics. Specifically, deep learning technology, which offers excellent performance when used with vast amounts of data, is rapidly gaining the spotlight in the academic field of transformer fault diagnosis. Interest in deep learning has been especially noticeable in fault diagnosis because deep learning algorithms can be applied to complex systems that have large amounts of data, without the need for a deep understanding of the domain knowledge of the system. However, the outstanding performance of these diagnosis methods has not yet gained much attention in the power transformer PHM industry. The reason is that industrial data consist of a large amount of unlabeled data and only a small amount of fault data, which restricts deep-learning-based diagnosis methods in this industry. Therefore, in this dissertation research, deep-learning-based fault diagnosis methods are developed to overcome three issues that currently prevent this type of diagnosis in industrial power transformers: 1) the visualization of health feature space issue, 2) the insufficient data issue, and 3) the severity issue. To cope with these challenges, this thesis is composed of three research thrusts. The first research thrust develops a health feature space via a semi-supervised autoencoder with an auxiliary detection task. The proposed method can visualize a monotonic health trendability of the transformer's degradation properties. Further, thanks to the use of a semi-supervised approach, the method is applicable to situations with a large amount of unlabeled data and a small amount of labeled data (a situation common in industrial datasets). Next, the second research thrust proposes a new framework that bridges the rule-based Duval method with an AI-based deep neural network (BDD). In this method, the rule-based Duval method is used to pseudo-label a large amount of unlabeled data. The AI-based DNN then learns from the noisy pseudo-labeled data by applying regularization techniques and parameter transfer learning. Finally, the third thrust not only identifies fault types but also indicates a severity level. However, labeled fault types and severity levels are imbalanced in real-world data: severity is always labeled, whereas fault-type data are very difficult to obtain from real main transformers. Therefore, in the proposed method, diagnosis of fault types together with severity levels under imbalanced conditions is addressed by using a generative adversarial network with an auxiliary classifier. The validity of the proposed methods is demonstrated by studying massive unlabeled dissolved gas analysis (DGA) data, provided by the Korea Electric Power Company (KEPCO), and sparse labeled data, provided by the IEC TC 10 database. Each developed method could be used in industrial fields that use power transformers to monitor the health feature space, consider severity level, and diagnose transformer faults when labeled fault data are extremely scarce.

Table of contents:
Chapter 1 Introduction
  1.1 Motivation
  1.2 Research Scope and Overview
  1.3 Dissertation Layout
Chapter 2 Literature Review
  2.1 A Brief Overview of Rule-Based Fault Diagnosis
  2.2 A Brief Overview of Conventional AI-Based Fault Diagnosis
Chapter 3 Extracting Health Feature Space via Semi-Supervised Autoencoder with an Auxiliary Task (SAAT)
  3.1 Backgrounds of Semi-supervised autoencoder (SSAE)
    3.1.1 Autoencoder: Unsupervised Feature Extraction
    3.1.2 Softmax Classifier: Supervised Classification
    3.1.3 Semi-supervised Autoencoder
  3.2 Input DGA Data Preprocessing
  3.3 SAAT-Based Fault Diagnosis Method
    3.3.1 Roles of the Auxiliary Detection Task
    3.3.2 Architecture of the Proposed SAAT
    3.3.3 Health Feature Space Visualization
    3.3.4 Overall Procedure of the Proposed SAAT-based Fault Diagnosis
  3.4 Performance Evaluation of SAAT
    3.4.1 Data Description and Implementation
    3.4.2 An Outline of Four Comparative Studies and Quantitative Evaluation Metrics
    3.4.3 Experimental Results and Discussion
  3.5 Summary and Discussion
Chapter 4 Learning from Even a Weak Teacher: Bridging Rule-based Duval Weak Supervision and a Deep Neural Network (BDD) for Diagnosing Transformer
  4.1 Backgrounds of BDD
    4.1.1 Rule-based method: Duval Method
    4.1.2 Deep learning Based Method: Deep Neural Network
    4.1.3 Parameter Transfer
  4.2 BDD Based Fault Diagnosis
    4.2.1 Problem Statement
    4.2.2 Framework of the Proposed BDD
    4.2.3 Overall Procedure of BDD-based Fault Diagnosis
  4.3 Performance Evaluation of the BDD
    4.3.1 Description of Data and the DNN Architecture
    4.3.2 Experimental Results and Discussion
  4.4 Summary and Discussion
Chapter 5 Generative Adversarial Network with Embedding Severity DGA Level
  5.1 Backgrounds of Generative Adversarial Network
  5.2 GANES based Fault Diagnosis
    5.2.1 Training Strategy of GANES
    5.2.2 Overall procedure of GANES
  5.3 Performance Evaluation of GANES
    5.3.1 Description of Data
    5.3.2 Outlines of Experiments
    5.3.3 Preliminary Experimental Results of Various GANs
    5.3.4 Experiments for the Effectiveness of Embedding Severity DGA Level
  5.4 Summary and Discussion
Chapter 6 Conclusion
  6.1 Contributions and Significance
  6.2 Suggestions for Future Research
References
Abstract in Korean
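The pseudo-labeling step of the second research thrust can be sketched as follows. The Duval triangle works on the relative percentages of three dissolved gases (CH4, C2H4, C2H2), which is what `duval_percentages` computes. The actual Duval zone boundaries are deliberately not reproduced here: `toy_rule` is a hypothetical placeholder that only names the dominant gas, and the dictionary keys and function names are likewise assumptions. The resulting (sample, label) pairs stand in for the noisy training set that a DNN would then learn from.

```python
def duval_percentages(ch4, c2h4, c2h2):
    """Relative percentages of the three Duval-triangle gases."""
    total = ch4 + c2h4 + c2h2
    if total <= 0:
        raise ValueError("gas concentrations must sum to a positive value")
    return (100 * ch4 / total, 100 * c2h4 / total, 100 * c2h2 / total)


def toy_rule(percentages):
    """HYPOTHETICAL stand-in for a rule-based diagnosis.

    This is NOT the Duval zone map; it only reports the dominant gas so
    that the pseudo-labeling flow can be demonstrated end to end.
    """
    names = ("CH4", "C2H4", "C2H2")
    dominant = max(range(3), key=lambda i: percentages[i])
    return names[dominant] + "-dominant"


def pseudo_label(samples, rule):
    """Attach a rule-derived (noisy) label to every unlabeled DGA sample."""
    labeled = []
    for sample in samples:
        pct = duval_percentages(sample["CH4"], sample["C2H4"], sample["C2H2"])
        labeled.append((sample, rule(pct)))
    return labeled


unlabeled = [
    {"CH4": 50, "C2H4": 30, "C2H2": 20},
    {"CH4": 5, "C2H4": 15, "C2H2": 80},
]
for sample, label in pseudo_label(unlabeled, toy_rule):
    print(sample, label)
```

Passing the rule in as a function keeps the labeling flow independent of any particular rule set, so a faithful Duval implementation could be substituted without changing `pseudo_label`.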

SNU Open Repository and Archive

Smart Environments for Collaborative Design, Implementation, and Interpretation of Scientific Experiments

Author: Breit Timo
Fikkert F.W.
Kulyk Olga Anatoliyivna
Rauwerda Han
van der Veer Gerrit C.
van der Vet P.E.
van Dijk Elisabeth M.A.G.
Wassink I.
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 06/01/2007

University of Twente Research Information

The Effect of Data Curation on the Accuracy of Quantitative Structure-Activity Relationship Models

Author: Fant Andrew
Publication venue: University of North Carolina at Chapel Hill Graduate School
Publication date: 01/01/2015

In the 33 years since the first public release of GenBank, and the 15 years since the publication of the first pilot assembly of the human genome, drug discovery has been awash in a tsunami of data. But it has only been within the past decade that medicinal chemists and chemical biologists have had access to the same sorts of large-scale, public-access databases as bioinformaticians and molecular biologists have had for so long. The release of this data has sparked a renewed interest in computational methods for rational drug design, but questions have arisen recently about the accuracy and quality of this data. The same question has arisen in other scientific disciplines, but it has a particular urgency to practitioners of Quantitative Structure-Activity Relationship (QSAR) modeling. By its nature QSAR modeling depends on both activity data and chemical structures. While activities are usually expressed as numerical scalar values, a form ubiquitous throughout the sciences, chemical structures (especially those that must be interpretable as such by computer software) are stored in a variety of specialized formats which are much less common and mostly ignored outside of cheminformatics and related fields. While previous research has determined that a 5% error rate in data being used for modeling can cause a QSAR model to be non-predictive and useless for its intended purpose, and workflows have been proposed which reduce the effect of inconsistent chemical structure representations on model accuracy, a fundamental question remains: “how accurate are the structure and activity data freely available to researchers?” To this end, we have undertaken two surveys of data quality, one focusing on chemical structure information in Internet resources and a second examining the uncertainty associated with compounds reported in the medicinal chemistry literature as abstracted in ChEMBL.
The results of these studies have informed the creation of an improved workflow for the curation of structure-activity data, which is intended to identify problematic data points in raw data extracted from databases so that an expert human curator can examine the underlying literature and resolve discrepancies between reported values. This workflow was in turn applied to the creation of two QSAR models that were used to implement a virtual screen seeking molecules capable of binding to both the serotonergic reuptake transporter and the alpha2a adrenergic receptor. While no suitable compounds were identified in the initial screening process, regions of chemical space that may yield truly novel alpha2a receptor ligands have been identified. These regions can be targeted in future efforts. Basing data curation workflows on manual processes by human curators is not particularly viable, as humans have a tendency to introduce errors by inattention even as they identify and repair other problems. Computers cannot effectively curate data either. While they are highly accurate when programmed properly, they lack the human creativity and insight that would allow them to determine which data points represent truly inaccurate information. In order to effectively curate data, humans and computers must both be incorporated into a workflow that harnesses their strengths and limits their liabilities.
Doctor of Philosophy
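The division of labor described in this abstract, where software flags suspicious data points and a human curator resolves them, can be illustrated with a small sketch. It is not the dissertation's workflow: the record layout, the compound identifiers, the activity scale, and the `max_spread` threshold are all hypothetical. The function only surfaces compounds whose reported activity values disagree and leaves the judgment to a person.

```python
from collections import defaultdict


def flag_discrepancies(records, max_spread=1.0):
    """Group activity values by compound and flag inconsistent groups.

    `records` is an iterable of (compound_id, activity_value) pairs.
    A compound is flagged when it has more than one reported value and
    the gap between its largest and smallest value exceeds `max_spread`
    (an assumed threshold). Flagged entries are meant for human review,
    not for automatic correction.
    """
    by_compound = defaultdict(list)
    for compound_id, value in records:
        by_compound[compound_id].append(value)

    flagged = {}
    for compound_id, values in by_compound.items():
        if len(values) > 1 and max(values) - min(values) > max_spread:
            flagged[compound_id] = sorted(values)
    return flagged


records = [
    ("CPD-1", 6.1), ("CPD-1", 6.3),   # consistent reports
    ("CPD-2", 5.0), ("CPD-2", 8.2),   # conflicting reports
    ("CPD-3", 7.0),                   # single report, nothing to compare
]
print(flag_discrepancies(records))
```

Returning the conflicting values rather than a reconciled number reflects the abstract's point that the program should narrow down what the expert has to read, while the expert decides which value is right.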

Carolina Digital Repository