
    Content-Based Access Control

    In conventional databases, the most popular access control models specify policies explicitly and manually, for each role of every user against each data object. In today's large-scale, content-centric data sharing, such approaches can be impractical: data volumes grow explosively and data objects are sensitive. Moreover, conventional database access control policies cannot function when the semantic content of the data is expected to play a role in access decisions. Users are often over-privileged, and ex post facto auditing is relied upon to detect misuse of privileges. Unfortunately, the damage is usually difficult to reverse, as (large amounts of) data have already been disclosed. In this dissertation, we first introduce Content-Based Access Control (CBAC), an innovative access control model for content-centric information sharing. As a complement to conventional access control models, CBAC makes access control decisions automatically, based on the content similarity between user credentials and data content. In CBAC, a metarule allows each user to access "a subset" of the designated data objects of a content-centric database, where the boundary of the subset is dynamically determined by the textual content of the data objects. We then present an enforcement mechanism for CBAC that exploits Oracle's Virtual Private Database (VPD) to implement row-wise access control and to prevent data objects from being abused through unnecessary access admission. To improve the performance of the proposed approach, we introduce a content-based blocking mechanism that makes CBAC enforcement more efficient and reveals a more relevant portion of the data objects than using the user credentials and data content alone. We also employ several tagging mechanisms for more accurate textual content matching on short text snippets (e.g., short VarChar attributes); the tags represent the content of the data by topics rather than pure word occurrences, so content similarity is computed not from word occurrences alone but from the semantic topics underlying the text. Experimental results show that CBAC makes accurate access control decisions with a small overhead.
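The core decision rule lends itself to a compact illustration. Below is a minimal, hypothetical Python sketch of a CBAC-style metarule: a user may read a row when the similarity between the user's credential text and the row's textual content clears a threshold. The TF-IDF representation and the threshold value are illustrative assumptions, not the dissertation's exact similarity model (which also uses topic tags and a VPD-based enforcement layer).

    # Sketch of a CBAC-style decision: grant access to a row when the similarity
    # between the user's credential text and the row's content exceeds a threshold.
    # TF-IDF and the threshold 0.2 are assumptions made for this illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def cbac_allowed_rows(credential: str, rows: list[str], threshold: float = 0.2) -> list[int]:
        """Return indices of rows whose content is similar enough to the credential."""
        vectorizer = TfidfVectorizer(stop_words="english")
        matrix = vectorizer.fit_transform([credential] + rows)   # row 0 = credential
        sims = cosine_similarity(matrix[0], matrix[1:]).ravel()  # credential vs. each row
        return [i for i, s in enumerate(sims) if s >= threshold]

    rows = ["clinical oncology trials: phase II results", "cafeteria menu for next week"]
    print(cbac_allowed_rows("oncology researcher, clinical trials", rows))  # -> [0]

In the actual enforcement mechanism this comparison would be pushed into the database (e.g., as a VPD predicate) rather than computed application-side as above.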

    A Human-Centric Approach to Data Fusion in Post-Disaster Management: The Development of a Fuzzy Set Theory Based Model

    In the post-disaster phase, it is critical to provide an efficient and accurate information system so that individuals can access and obtain the necessary resources in a timely manner. Current map-based post-disaster management systems, however, present all emergency resource lists without filtering, which usually leads to high computational cost. An effective post-disaster management system (PDMS) would distribute emergency resources such as hospitals, storage, and transportation far more reasonably, to the greater benefit of individuals in the post-disaster period. In this dissertation, a semi-supervised learning (SSL) based graph system was first constructed for the PDMS: the graph-based PDMS's resource map was converted to a directed graph represented by an adjacency matrix, and decision information was then derived from the PDMS in two ways, one a clustering operation and the other a graph-based semi-supervised optimization process. The PDMS was applied to emergency resource distribution in the post-disaster (response) phase, and a path optimization algorithm based on ant colony optimization (ACO) was used to minimize cost; simulation results show the effectiveness of the proposed methodology. The analysis compared it with clustering-based algorithms under two ACO improvements, the tour improvement algorithm (TIA) and the Min-Max Ant System (MMAS), and the results also show that the SSL-based graph is more effective for computing the optimal path in the PDMS. This research further improved the map by combining the disaster map with the initial GIS-based map, which locates the target area while accounting for the influence of the disaster. First, the initial map and the disaster map were put under a Gaussian transformation and the histograms of all map images were acquired; all images were then put under a discrete wavelet transform (DWT), and a Gaussian fusion algorithm was applied to the DWT images. Second, the inverse DWT (iDWT) was applied to generate a new map for the post-disaster management system. Finally, simulation work was carried out, and the results showed the effectiveness of the proposed method in comparison with other fusion algorithms, such as mean-mean fusion and max-UD fusion, through evaluation indices including entropy, spatial frequency (SF), and the image quality index (IQI). A fuzzy set model was also proposed to improve the representation capacity of nodes in this GIS-based PDMS.
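The DWT-based fusion step can be sketched compactly. The snippet below, a minimal illustration using the PyWavelets package, decomposes both maps with a single-level 2-D DWT, fuses the coefficients, and reconstructs with the inverse DWT. The fusion rule shown (mean of approximation coefficients, max-magnitude detail coefficients) is a common stand-in; the dissertation's Gaussian fusion rule and the preceding Gaussian/histogram step are not reproduced here.

    # Sketch of DWT-based map fusion: decompose both maps, fuse coefficients,
    # reconstruct. The mean/max fusion rule is an illustrative stand-in.
    import numpy as np
    import pywt

    def fuse_maps(map_a: np.ndarray, map_b: np.ndarray, wavelet: str = "db1") -> np.ndarray:
        """Fuse two equally sized grayscale map images via single-level 2-D DWT."""
        cA_a, (cH_a, cV_a, cD_a) = pywt.dwt2(map_a, wavelet)
        cA_b, (cH_b, cV_b, cD_b) = pywt.dwt2(map_b, wavelet)
        fused_cA = (cA_a + cA_b) / 2.0                      # mean-fuse approximations
        details = tuple(
            np.where(np.abs(d_a) >= np.abs(d_b), d_a, d_b)  # keep the stronger detail
            for d_a, d_b in ((cH_a, cH_b), (cV_a, cV_b), (cD_a, cD_b))
        )
        return pywt.idwt2((fused_cA, details), wavelet)     # iDWT: the new fused map

    base_map = np.random.rand(64, 64)      # stand-ins for the GIS and disaster maps
    disaster_map = np.random.rand(64, 64)
    print(fuse_maps(base_map, disaster_map).shape)  # (64, 64)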

    A Study on Deep-Learning-Based Power Transformer Fault Diagnosis Using Unlabeled Fault Data and Dissolved Gas Analysis Data

    Thesis (Ph.D.) -- Graduate School, Seoul National University: College of Engineering, Department of Mechanical and Aerospace Engineering, 2021.8. μ†Œμž¬μ›…. Due to the rapid development and advancement of today's industry, the demand for safe and reliable power distribution and transmission lines is becoming more critical; thus, prognostics and health management (hereafter, PHM) is becoming more important in the power transformer industry.
Among the various methods developed for power transformer diagnosis, artificial intelligence (AI) based approaches have received considerable interest from academics. Specifically, deep learning technology, which offers excellent performance when used with vast amounts of data, is rapidly gaining the spotlight in the academic field of transformer fault diagnosis. The interest in deep learning has been especially noticeable in fault diagnosis, because deep learning algorithms can be applied to complex systems that have large amounts of data without the need for a deep understanding of the domain knowledge of the system. However, the outstanding performance of these diagnosis methods has not yet gained much attention in the power transformer PHM industry. The reason is that the large amount of unlabeled data and the small amount of fault data found in industry restrict deep-learning-based diagnosis methods for power transformers. Therefore, in this dissertation research, deep-learning-based fault diagnosis methods are developed to overcome three issues that currently prevent this type of diagnosis in industrial power transformers: 1) visualization of the health feature space, 2) insufficient data, and 3) fault severity. To cope with these challenges, this thesis is composed of three research thrusts. The first research thrust develops a health feature space via a semi-supervised autoencoder with an auxiliary detection task (SAAT). The proposed method can visualize a monotonic health trend of the transformer's degradation properties. Further, thanks to the semi-supervised approach, the method is applicable to situations with a large amount of unlabeled data and a small amount of labeled data (a situation common in industrial datasets). Next, the second research thrust proposes a new framework that bridges the rule-based Duval method with an AI-based deep neural network (BDD). In this method, the rule-based Duval method is used to pseudo-label a large amount of unlabeled data, and the AI-based DNN applies regularization techniques and parameter transfer learning to learn from the noisy pseudo-labeled data. Finally, the third thrust not only identifies fault types but also indicates severity levels. However, the distribution between labeled fault types and severity levels is imbalanced in real-world data. Therefore, in the proposed method, diagnosis of fault types, with severity levels, under imbalanced conditions is addressed by utilizing a generative adversarial network with an auxiliary classifier. The validity of the proposed methods is demonstrated by studying massive unlabeled dissolved gas analysis (DGA) data, provided by the Korea Electric Power Corporation (KEPCO), and sparse labeled data, provided by the IEC TC 10 database.
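To make the second thrust's bridging idea concrete, here is a minimal Python sketch of rule-based pseudo-labeling of unlabeled DGA samples. The three-zone rule below is a deliberately simplified stand-in for the full Duval Triangle 1 regions (PD, D1, D2, T1, T2, T3, DT); the actual zone boundaries, gas set, and downstream DNN training follow the dissertation, not this sketch.

    # Sketch of the BDD pseudo-labeling step: a rule assigns provisional fault
    # labels from Duval-triangle gas percentages; the (noisy) labels then serve
    # as training targets for a regularized, transfer-initialized DNN.
    def duval_pseudo_label(ch4_ppm: float, c2h4_ppm: float, c2h2_ppm: float) -> str:
        """Map a DGA sample to a coarse fault family (simplified zone boundaries)."""
        total = ch4_ppm + c2h4_ppm + c2h2_ppm
        pct_ch4 = 100.0 * ch4_ppm / total
        pct_c2h2 = 100.0 * c2h2_ppm / total
        if pct_ch4 >= 98.0:       # almost pure methane: partial-discharge zone
            return "PD"
        if pct_c2h2 >= 13.0:      # significant acetylene: electrical-discharge zones
            return "discharge"
        return "thermal"          # remaining area: thermal fault zones

    # Pseudo-label a batch of unlabeled samples (CH4, C2H4, C2H2 in ppm).
    unlabeled = [(20.0, 65.0, 1.0), (99.0, 0.5, 0.5), (30.0, 20.0, 50.0)]
    print([(s, duval_pseudo_label(*s)) for s in unlabeled])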
Each developed method could be used in industrial fields that use power transformers to monitor the health feature space, consider severity levels, and diagnose transformer faults under extremely insufficient labeled fault data.

Table of Contents:
Chapter 1 Introduction
  1.1 Motivation
  1.2 Research Scope and Overview
  1.3 Dissertation Layout
Chapter 2 Literature Review
  2.1 A Brief Overview of Rule-Based Fault Diagnosis
  2.2 A Brief Overview of Conventional AI-Based Fault Diagnosis
Chapter 3 Extracting Health Feature Space via Semi-Supervised Autoencoder with an Auxiliary Task (SAAT)
  3.1 Backgrounds of Semi-supervised Autoencoder (SSAE)
    3.1.1 Autoencoder: Unsupervised Feature Extraction
    3.1.2 Softmax Classifier: Supervised Classification
    3.1.3 Semi-supervised Autoencoder
  3.2 Input DGA Data Preprocessing
  3.3 SAAT-Based Fault Diagnosis Method
    3.3.1 Roles of the Auxiliary Detection Task
    3.3.2 Architecture of the Proposed SAAT
    3.3.3 Health Feature Space Visualization
    3.3.4 Overall Procedure of the Proposed SAAT-Based Fault Diagnosis
  3.4 Performance Evaluation of SAAT
    3.4.1 Data Description and Implementation
    3.4.2 An Outline of Four Comparative Studies and Quantitative Evaluation Metrics
    3.4.3 Experimental Results and Discussion
  3.5 Summary and Discussion
Chapter 4 Learning from Even a Weak Teacher: Bridging Rule-Based Duval Weak Supervision and a Deep Neural Network (BDD) for Diagnosing Transformers
  4.1 Backgrounds of BDD
    4.1.1 Rule-Based Method: Duval Method
    4.1.2 Deep-Learning-Based Method: Deep Neural Network
    4.1.3 Parameter Transfer
  4.2 BDD-Based Fault Diagnosis
    4.2.1 Problem Statement
    4.2.2 Framework of the Proposed BDD
    4.2.3 Overall Procedure of BDD-Based Fault Diagnosis
  4.3 Performance Evaluation of the BDD
    4.3.1 Description of Data and the DNN Architecture
    4.3.2 Experimental Results and Discussion
  4.4 Summary and Discussion
Chapter 5 Generative Adversarial Network with Embedding Severity DGA Level
  5.1 Backgrounds of Generative Adversarial Network
  5.2 GANES-Based Fault Diagnosis
    5.2.1 Training Strategy of GANES
    5.2.2 Overall Procedure of GANES
  5.3 Performance Evaluation of GANES
    5.3.1 Description of Data
    5.3.2 Outlines of Experiments
    5.3.3 Preliminary Experimental Results of Various GANs
    5.3.4 Experiments for the Effectiveness of Embedding Severity DGA Level
  5.4 Summary and Discussion
Chapter 6 Conclusion
  6.1 Contributions and Significance
  6.2 Suggestions for Future Research
References
Abstract (Korean)
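The first thrust's semi-supervised autoencoder (Chapter 3's SAAT) also admits a compact sketch. The PyTorch snippet below is a hypothetical illustration, not the dissertation's architecture: it combines a reconstruction loss over all (mostly unlabeled) DGA samples with an auxiliary detection loss on the few labeled ones, and the 2-D bottleneck is what would be plotted as the health feature space. The layer sizes, the seven-gas input, and the loss weight are assumptions.

    # Sketch of a semi-supervised autoencoder with an auxiliary detection task:
    # reconstruction loss on all samples + classification loss on labeled ones.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SemiSupervisedAE(nn.Module):
        def __init__(self, n_gases: int = 7, n_classes: int = 2):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_gases, 16), nn.ReLU(), nn.Linear(16, 2))
            self.decoder = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, n_gases))
            self.detector = nn.Linear(2, n_classes)  # auxiliary normal/fault head

        def forward(self, x):
            z = self.encoder(x)                      # 2-D health feature space coords
            return z, self.decoder(z), self.detector(z)

    model = SemiSupervisedAE()
    x_all = torch.rand(128, 7)                                   # mostly unlabeled DGA samples
    x_lab, y_lab = torch.rand(8, 7), torch.randint(0, 2, (8,))   # scarce labeled samples

    _, recon, _ = model(x_all)
    _, _, logits = model(x_lab)
    loss = F.mse_loss(recon, x_all) + 0.5 * F.cross_entropy(logits, y_lab)
    loss.backward()                                  # then step an optimizer as usual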

    The Effect of Data Curation on the Accuracy of Quantitative Structure-Activity Relationship Models

    In the 33 years since the first public release of GenBank, and the 15 years since the publication of the first pilot assembly of the human genome, drug discovery has been awash in a tsunami of data. But it has only been within the past decade that medicinal chemists and chemical biologists have had access to the same sorts of large-scale, public-access databases that bioinformaticians and molecular biologists have had for so long. The release of this data has sparked a renewed interest in computational methods for rational drug design, but questions have arisen recently about its accuracy and quality. The same question has arisen in other scientific disciplines, but it has a particular urgency for practitioners of Quantitative Structure-Activity Relationship (QSAR) modeling. By its nature, QSAR modeling depends on both activity data and chemical structures. While activities are usually expressed as numerical scalar values, a form ubiquitous throughout the sciences, chemical structures (especially those that must be interpretable by computer software) are stored in a variety of specialized formats that are much less common and mostly ignored outside of cheminformatics and related fields. Previous research has determined that a 5% error rate in data used for modeling can render a QSAR model non-predictive and useless for its intended purpose, and workflows have been proposed to reduce the effect of inconsistent chemical structure representations on model accuracy; still, a fundamental question remains: "how accurate are the structure and activity data freely available to researchers?" To this end, we have undertaken two surveys of data quality: one focusing on chemical structure information in Internet resources, and a second examining the uncertainty associated with compounds reported in the medicinal chemistry literature as abstracted in ChEMBL. The results of these studies have informed the creation of an improved workflow for the curation of structure-activity data, which is intended to identify problematic data points in raw data extracted from databases so that an expert human curator can examine the underlying literature and resolve discrepancies between reported values. This workflow was in turn applied to the creation of two QSAR models that were used to implement a virtual screen seeking molecules capable of binding to both the serotonergic reuptake transporter and the alpha-2a adrenergic receptor. While no suitable compounds were identified in the initial screening process, regions of chemical space that may yield truly novel alpha-2a receptor ligands have been identified; these regions can be targeted in future efforts. Basing data curation workflows on manual processes by human curators is not particularly viable, as humans tend to introduce errors through inattention even as they identify and repair other problems. Computers cannot effectively curate data on their own either: while they are highly accurate when programmed properly, they lack the human creativity and insight needed to determine which data points represent truly inaccurate information. To curate data effectively, humans and computers must both be incorporated into a workflow that harnesses their strengths and limits their liabilities.

    Doctor of Philosophy
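One step of such a curation workflow, flagging compounds whose structure representations disagree across sources so that a human expert can consult the original literature, can be sketched briefly. The RDKit-based Python snippet below is an illustrative sketch of this idea, not the dissertation's actual workflow; the compound records are invented examples.

    # Sketch of a structure-consistency check: canonicalize each reported
    # structure to an InChIKey and flag compounds with more than one key.
    from rdkit import Chem

    def flag_discrepant_structures(records: dict[str, list[str]]) -> dict[str, set[str]]:
        """Map compound ID -> set of InChIKeys; multiple keys signal a discrepancy."""
        flagged = {}
        for compound_id, smiles_list in records.items():
            keys = set()
            for smi in smiles_list:
                mol = Chem.MolFromSmiles(smi)
                if mol is None:          # unparsable structure: also needs expert review
                    keys.add("UNPARSABLE")
                else:
                    keys.add(Chem.MolToInchiKey(mol))
            if len(keys) > 1:
                flagged[compound_id] = keys
        return flagged

    records = {
        "CMPD-1": ["CCO", "OCC"],               # same ethanol, different SMILES: OK
        "CMPD-2": ["c1ccccc1O", "c1ccccc1N"],   # phenol vs. aniline: flag for curation
    }
    print(flag_discrepant_structures(records))  # -> {'CMPD-2': {...two keys...}}

Records that pass this automated check can flow straight into modeling, while the flagged minority is routed to the human curator, matching the division of labor between computers and experts argued for above.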