196,923 research outputs found

    Evaluating machine translation in a low-resource language combination : Spanish-Galician

    This paper reports the results of a study designed to assess the perceived adequacy of three different types of machine translation systems within the context of a minoritized language combination (Spanish-Galician). To perform this evaluation, a mixed design with three different metrics (BLEU, survey and error analysis) is used to extract quantitative and qualitative data about two marketing letters from the energy industry translated with a rule-based system (RBMT), a phrase-based system (PBMT) and a neural system (NMT). Results show that, in the case of low-resource languages, rule-based and phrase-based machine translation systems still play an important role.
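The BLEU metric used in the study above scores a candidate translation by clipped n-gram precision against a reference. A minimal sketch of that computation (the reference sentence and system outputs below are invented for illustration; real evaluations would use an established implementation such as sacrebleu):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: clipped n-gram precision with a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_counts = Counter(ngrams(cand, n))
        r_counts = Counter(ngrams(ref, n))
        clipped = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        # crude smoothing so a missing higher-order match does not zero the score
        log_prec += math.log(max(clipped, 1e-9) / total) / max_n
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return brevity * math.exp(log_prec)

# invented reference and system outputs for one source sentence
reference = "the invoice includes the monthly energy consumption"
outputs = {
    "RBMT": "the bill includes monthly consumption of energy",
    "NMT":  "the invoice includes the monthly energy consumption",
}
scores = {name: bleu(out, reference) for name, out in outputs.items()}
```

An exact match scores 1.0, while paraphrases are penalized even when adequate, which is one reason the study pairs BLEU with a survey and error analysis.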

    A comparison of CK and Martin's package metric suites in predicting reusability in open source object-oriented software

    Packages are units that organize source code in large object-oriented systems. Metrics used at the package granularity level mostly characterize attributes such as complexity, size, cohesion and coupling. Many of these attributes have a direct relationship with the quality of the software system being produced. Empirical evidence is required to support the collection of measures for such metrics; hence these metrics are used as early indicators of important external quality attributes. This research compared the CK and Martin's package metric suites in order to characterize the package reusability level in object-oriented software. Comparing metric suites at the package level as they measure an external software quality attribute should help a developer know which suite can effectively predict that quality attribute at the package level. In this research two open-source Java applications, namely jEdit and BlueJ, were used in the evaluation of the two package metric suites, which were compared empirically to predict the package reusability level. The metric measures were also used to compare the effectiveness of the metrics in these suites in evaluating reusability at the package granularity level. Thereafter, the metric measures of each package were normalized to allow comparison of the package reusability level among packages within an application. The Bansiya reusability model equation was adapted as the reusability reference quality model in this research. Correlation analysis was performed to compare the metrics within the package metric suites. Through the ranking of package reusability levels, results show that the jEdit application has 30% of its packages ranked with a very high reusability level, thus conforming to the Pareto rule (80:20). This means that the jEdit application has packages that are more reusable than the packages in the BlueJ application.
Empirically, Martin's package coupling metric Ce, with an r value of 0.68, is ranked as having a strong positive correlation with the reusability level (RL), which distinguishes Martin's package metric suite from the CK package metric suite as an effective predictor of the package reusability level.
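The correlation analysis above reduces to computing Pearson's r between a metric (such as Ce) and the reusability level across packages. A small sketch from first principles (the per-package values below are invented, not the study's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# hypothetical per-package efferent coupling (Ce) and reusability level (RL)
ce = [2, 5, 8, 11, 14]
rl = [0.30, 0.42, 0.55, 0.61, 0.78]
r = pearson_r(ce, rl)
```

An r near 1 indicates a strong positive linear relationship; the study reports r = 0.68 for Ce against RL on its real data.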

    Characterizing realistic signature-based intrusion detection benchmarks

    © 2018 Association for Computing Machinery. Speeding up pattern matching for intrusion detection systems has been a growing field of research. There has been an influx of new algorithms, modifications to existing algorithms, and even hardware architectures aimed at improving pattern-matching performance. Establishing an accurate comparison with related work is a real challenge because researchers use different datasets and metrics to evaluate their work. The purpose of this paper is to characterize and identify realistic workloads, propose standard benchmarks, and establish common metrics to better compare work in the area of pattern matching for intrusion detection. We collect traffic traces and attack signatures from popular open-source platforms. The datasets are processed, cleansed and studied to give researchers a better understanding of their characteristics. The final datasets, along with detailed information about their origins, contents, features, statistical analysis and performance evaluation using well-known pattern-matching algorithms, are available to the public. In addition, we provide a generic parser capable of parsing the rule formats of different intrusion detection systems and extracting attack signatures. Finally, we provide a pattern-matching engine that enables researchers to plug in their new pattern-matching algorithms and compare them to existing algorithms using the predefined metrics.
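A classic baseline among the well-known pattern-matching algorithms such benchmarks exercise is Aho-Corasick, which matches many signatures against a traffic stream in a single pass. A compact sketch (the signature set and input are toy examples, not drawn from the paper's datasets):

```python
from collections import deque

def build_automaton(patterns):
    """Aho-Corasick automaton: a trie plus BFS-computed failure links."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]          # inherit matches ending at the fallback
    return goto, fail, out

def search(text, automaton):
    """Yield (end_index, pattern) for every signature occurrence in text."""
    goto, fail, out = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            yield i, pat

sigs = ["cmd.exe", "/etc/passwd", "select"]   # toy signature set
auto = build_automaton(sigs)
hits = sorted(pat for _, pat in search("GET /etc/passwd?cmd.exe", auto))
```

Because the scan cost is independent of the number of signatures, this family of automata is the usual yardstick that new hardware and software matchers are compared against.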

    Customer Sentiments in Product Reviews: A Comparative Study with GooglePaLM

    In this work, we evaluated the efficacy of Google's Pathways Language Model (GooglePaLM) in analyzing sentiments expressed in product reviews. Although conventional Natural Language Processing (NLP) techniques such as the rule-based Valence Aware Dictionary for Sentiment Reasoning (VADER) and the long-sequence Bidirectional Encoder Representations from Transformers (BERT) model are effective, they frequently encounter difficulties when dealing with intricate linguistic features like sarcasm and contextual nuances commonly found in customer feedback. We performed a sentiment analysis on Amazon's fashion review datasets using the VADER, BERT, and GooglePaLM models, respectively, and compared the results based on evaluation metrics such as precision, recall, accuracy, correct positive prediction, and correct negative prediction. We used the default values of the VADER and BERT models and slightly fine-tuned GooglePaLM with a temperature of 0.0 and an N-value of 1. We observed that GooglePaLM performed better, with correct positive and negative prediction values of 0.91 and 0.93, respectively, followed by BERT and VADER. We concluded that large language models surpass traditional rule-based systems for natural language processing tasks.
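The rule-based approach that VADER exemplifies can be illustrated with a tiny lexicon scorer; the valence values, negation handling, and reviews below are invented stand-ins, far simpler than VADER's actual lexicon and heuristics:

```python
# toy valence lexicon (hypothetical values standing in for VADER's)
LEXICON = {"love": 2.0, "great": 1.5, "nice": 1.0,
           "bad": -1.5, "hate": -2.0, "poor": -1.0}
NEGATIONS = {"not", "never", "no"}

def score(review):
    """Sum word valences, flipping the sign of the word after a negation."""
    total, flip = 0.0, 1
    for w in review.lower().split():
        if w in NEGATIONS:
            flip = -1
            continue
        total += flip * LEXICON.get(w, 0.0)
        flip = 1
    return total

def label(review):
    return "positive" if score(review) >= 0 else "negative"

reviews = [("love this great dress", "positive"),
           ("not great at all", "negative"),
           ("poor fit hate it", "negative")]
# fraction-style tally behind a "correct negative prediction" metric
correct_neg = sum(label(r) == g for r, g in reviews if g == "negative")
```

The hard cases the abstract mentions, sarcasm and context, are exactly where such word-level rules break down and where large models tend to do better.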

    Evaluating the Factual Consistency of Conditional Text Generation Systems

    Doctoral dissertation -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, College of Engineering, August 2022. Advisor: Kyomin Jung. Despite the recent advances of conditional text generation systems built on pre-trained language models, the factual consistency of these systems is still insufficient. Moreover, the widely used n-gram similarity metrics are poorly suited to evaluating factual consistency. Hence, in order to develop a factually consistent system, an automatic factuality metric is needed first. In this dissertation, we propose four metrics that show much higher correlations with human judgments than previous metrics in evaluating the factual consistency of diverse conditional text generation systems. To build these metrics, we utilize (1) auxiliary tasks and (2) data augmentation methods. First, we focus on the keywords or keyphrases that are critical for evaluating factual consistency and propose two factual consistency metrics using two different auxiliary tasks. We integrate a keyphrase weight prediction task into previous metrics to propose KPQA (Keyphrase Prediction for Question Answering), a metric for generative QA. We also apply question generation and answering to develop a captioning metric, QACE (Question Answering for Captioning Evaluation). QACE generates questions on the keywords of the candidate caption and checks factual consistency by comparing the answers to these questions for the source image and the caption. Secondly, rather than using auxiliary tasks, we directly train metrics with a data-driven approach to propose two further metrics. Specifically, we train a metric to distinguish augmented inconsistent texts from consistent text. We first modify the original reference captions to generate inconsistent captions using several rule-based methods, such as substituting keywords, to propose UMIC (Unreferenced Metric for Image Captioning). As a next step, we introduce MFMA (Mask-and-Fill with Masked Article), a metric built by generating inconsistent summaries from the masked source article and the masked summary. Finally, as an extension of developing data-driven factual consistency metrics, we also propose a faster post-editing system that can fix the factual errors in system outputs.
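The rule-based negative-sample generation behind metrics like UMIC can be sketched as a keyword-substitution pass over a consistent caption; the swap tables and caption below are invented placeholders for the dissertation's actual transformation rules:

```python
import random

# hypothetical swap tables standing in for the rule-based transformations
NOUN_SWAPS = {"dog": "cat", "man": "woman", "beach": "kitchen"}
NUM_SWAPS = {"two": "five", "three": "seven"}

def make_inconsistent(caption, rng):
    """Corrupt a consistent caption by substituting one keyword,
    producing a negative sample for contrastive training."""
    words = caption.split()
    swappable = [i for i, w in enumerate(words)
                 if w in NOUN_SWAPS or w in NUM_SWAPS]
    if not swappable:
        return caption
    i = rng.choice(swappable)
    words[i] = NOUN_SWAPS.get(words[i]) or NUM_SWAPS.get(words[i])
    return " ".join(words)

rng = random.Random(0)
pos = "a dog runs on the beach"
neg = make_inconsistent(pos, rng)
```

A metric trained to separate `pos` from `neg` pairs learns to penalize exactly the entity-level errors that n-gram overlap scores barely notice, since the corrupted caption still shares almost every word with the original.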

    Perceived Information Revisited

    In this study, we present new analytical metrics for evaluating the performance of side-channel attacks (SCAs) by revisiting the perceived information (PI), which is defined using cross-entropy (CE). PI represents the amount of information utilized by the probability distribution that determines a distinguishing rule in SCA. Our analysis partially solves an important open problem in the performance evaluation of deep-learning-based SCAs (DL-SCAs): the relationship between neural network (NN) model evaluation metrics (such as accuracy, loss, and recall) and guessing entropy (GE)/success rate (SR) is unclear. We first theoretically show that the conventional CE/PI is non-calibrated and insufficient for evaluating SCA performance, as it contains uncertainty in terms of SR. More precisely, we show that an infinite number of probability distributions with different CE/PI can achieve an identical SR. Given this analysis, we present a modification of CE/PI, named effective CE/PI (ECE/EPI), to eliminate the above uncertainty. The ECE/EPI can be easily calculated for a given probability distribution and dataset, which makes it suitable for DL-SCA. Using the ECE/EPI, we can accurately evaluate the SR through the validation loss in the training phase, and can measure the generalization of the NN model in terms of SR in the attack phase. We then analyze and discuss the proposed metrics regarding their relationship to SR, the conditions of successful attacks for a distinguishing rule with a probability distribution, statistical/asymptotic aspects, and the order of key ranks in SCA. Finally, we validate the proposed metrics through experimental attacks on masked AES implementations using DL-SCA.
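The core observation that distributions with different CE can yield an identical SR is easy to see numerically: success depends only on how the model ranks the key candidates, while cross-entropy depends on the probability mass itself. A toy sketch with invented four-candidate distributions:

```python
import math

TRUE_KEY = 2
# two hypothetical model distributions over four key candidates
model_a = [0.10, 0.15, 0.60, 0.15]
model_b = [0.05, 0.05, 0.85, 0.05]

def cross_entropy(q, key):
    """CE of model q against the one-hot distribution on the correct key."""
    return -math.log(q[key])

def first_order_success(q, key):
    """Does the model rank the correct key first (first-order success)?"""
    return max(range(len(q)), key=q.__getitem__) == key

ce_a = cross_entropy(model_a, TRUE_KEY)   # about 0.51
ce_b = cross_entropy(model_b, TRUE_KEY)   # about 0.16
same_outcome = (first_order_success(model_a, TRUE_KEY)
                and first_order_success(model_b, TRUE_KEY))
```

Both models recover the key on this trace despite very different CE values, which is the kind of uncertainty the proposed ECE/EPI is designed to eliminate.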

    Open source software maturity model based on linear regression and Bayesian analysis

    Open Source Software (OSS) is widely used and is becoming a significant and irreplaceable part of the software engineering community. Today a huge number of OSS projects exist, which becomes a problem if one needs to choose from such a large pool of candidates in the same category. An OSS maturity model that facilitates software assessment and helps users make a decision is needed. A few maturity models have been proposed in the past. However, the parameters in those models are assigned not based on experimental data but on human experiences, feelings and judgments. These models are subjective and can provide only limited guidance for users at best. This dissertation proposes a quantitative and objective model built from a statistical perspective. In this model, seven metrics are chosen as criteria for OSS evaluation. A linear multiple-regression model is created to assign a final score based on these seven metrics. This final score provides a convenient and objective way for users to make a decision. The coefficients in the linear multiple-regression model are calculated from 43 OSS projects. From the statistical perspective, these coefficients are considered random variables. The joint distribution of the coefficients is discussed based on Bayesian statistics. More importantly, an updating rule is established through Bayesian analysis to improve the joint distribution, and thus the objectivity of the coefficients in the linear multiple-regression model, as new data arrive. The updating rule gives the model the ability to learn and improve itself continually.
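The two ingredients above, a regression fit and a Bayesian updating rule, can be sketched for a single coefficient; the data points and variances below are invented, and the real model is a seven-metric multiple regression rather than this one-variable toy:

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def bayes_update(prior_mean, prior_var, obs, obs_var):
    """Normal-normal conjugate update for a scalar coefficient."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

# slope fitted on the initial projects, then refined by a new noisy estimate
xs, ys = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
prior = ols_slope(xs, ys)                      # 1.94 on these numbers
mean, var = bayes_update(prior, 0.25, 2.2, 0.25)
```

Each new batch of OSS data tightens the posterior (the variance shrinks), which is how the model "learns and improves itself continually" instead of being frozen at the original 43-project fit.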

    Building an Expert System for Evaluation of Commercial Cloud Services

    Commercial Cloud services have been increasingly supplied to customers in industry. To facilitate customers' decision making, such as cost-benefit analysis or Cloud provider selection, evaluation of those Cloud services is becoming more and more crucial. However, compared with the evaluation of traditional computing systems, more challenges inevitably appear when evaluating rapidly-changing and user-uncontrollable commercial Cloud services. This paper proposes an expert system for Cloud evaluation that addresses emerging evaluation challenges in the context of Cloud Computing. Based on the knowledge and data accumulated by exploring the existing evaluation work, this expert system has been conceptually validated to be able to give suggestions and guidelines for implementing new evaluation experiments. As such, users can conveniently draw on existing evaluation experience by using this expert system, which essentially makes existing efforts in Cloud services evaluation reusable and sustainable. Comment: 8 pages, Proceedings of the 2012 International Conference on Cloud and Service Computing (CSC 2012), pp. 168-175, Shanghai, China, November 22-24, 2012.
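An expert system that maps an evaluation scenario to suggested guidelines is, at its core, a rule base plus a matching step. A minimal sketch, where every rule condition and piece of advice is invented for illustration rather than taken from the paper's knowledge base:

```python
# minimal rule base: (conditions that must all hold, suggested guideline)
RULES = [
    ({"workload": "bursty"}, "benchmark under variable load and report tail latency"),
    ({"goal": "cost"}, "collect per-hour billing alongside throughput metrics"),
    ({"service": "storage"}, "measure read/write latency at several object sizes"),
]

def suggest(facts):
    """Return every guideline whose conditions are all satisfied by the facts."""
    return [advice for cond, advice in RULES
            if all(facts.get(k) == v for k, v in cond.items())]

advice = suggest({"service": "storage", "goal": "cost"})
```

A user describes their evaluation scenario as facts and receives only the guidelines relevant to it, which is the sense in which accumulated evaluation experience becomes reusable.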

    Automating Cyber Analytics

    Model-based security metrics are a growing area of cyber security research concerned with measuring the risk exposure of an information system. These metrics are typically studied in isolation, with the formulation of the test itself being the primary finding in publications. As a result, there is a flood of metric specifications available in the literature but a corresponding dearth of analyses verifying results for a given metric calculation under different conditions or comparing the efficacy of one measurement technique over another. The motivation of this thesis is to create a systematic methodology for model-based security metric development, analysis, integration, and validation. In doing so we hope to fill a critical gap in the way we view and improve a system's security. In order to understand the security posture of a system before it is rolled out and as it evolves, we present in this dissertation an end-to-end solution for the automated measurement of the security metrics needed to identify risk early and accurately. To our knowledge this is a novel capability in design-time security analysis which provides the foundation for ongoing research into predictive cyber security analytics. Modern development environments contain a wealth of information in infrastructure-as-code repositories, continuous build systems, and container descriptions that could inform security models, but risk evaluation based on these sources is ad hoc at best, and often simply left until deployment. Our goal in this work is to lay the groundwork for security measurement to be a practical part of the system design, development, and integration lifecycle. In this thesis we provide a framework for the systematic validation of the existing security metrics body of knowledge. In doing so we endeavour not only to survey the current state of the art, but to create a common platform for future research in the area to be conducted.
We then demonstrate the utility of our framework through the evaluation of leading security metrics against a reference set of system models we have created. We investigate how to calibrate security metrics for different use cases and establish a new methodology for security metric benchmarking. We further explore the research avenues unlocked by automation through our concept of an API-driven S-MaaS (Security Metrics-as-a-Service) offering. We review our design considerations in packaging security metrics for programmatic access, and discuss how various client access patterns are anticipated in our implementation strategy. Using existing metric processing pipelines as a reference, we show how the simple, modular interfaces in S-MaaS support dynamic composition and orchestration. Next we review aspects of our framework which can benefit from optimization and further automation through machine learning. First we create a dataset of network models labeled with the corresponding security metrics. By training classifiers to predict security values based only on network inputs, we can avoid the computationally expensive attack-graph generation steps. We use our findings from this simple experiment to motivate our current lines of research into supervised and unsupervised techniques such as network embeddings, interaction rule synthesis, and reinforcement learning environments. Finally, we examine the results of our case studies. We summarize our security analysis of a large-scale network migration, and list the friction points along the way which are remediated by this work. We relate how our research for a large-scale performance benchmarking project has influenced our vision for the future of security metrics collection and analysis through dev-ops automation. We then describe how we applied our framework to measure the incremental security impact of running a distributed stream processing system inside a hardware trusted execution environment.
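One of the simplest model-based security metrics of the kind discussed above is the minimum number of exploit steps an attacker needs to reach a target asset in an attack graph. A small sketch over an invented graph (the hosts and edges are illustrative, not from the thesis's reference models):

```python
from collections import deque

def shortest_attack_path(graph, start, target):
    """BFS over an attack graph: minimum number of exploit steps to the target."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # target unreachable from this entry point

# hypothetical attack graph: edges are exploitable transitions between hosts
graph = {"internet": ["web"], "web": ["app"], "app": ["db"], "db": []}
steps = shortest_attack_path(graph, "internet", "db")
```

Even this toy illustrates why attack-graph generation is the expensive step at scale, and hence why the thesis explores classifiers that predict such metric values directly from network inputs.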