196,923 research outputs found

    Evaluating machine translation in a low-resource language combination : Spanish-Galician

    This paper reports the results of a study designed to assess the perceived adequacy of three different types of machine translation systems within the context of a minoritized language combination (Spanish-Galician). To perform this evaluation, a mixed design with three different metrics (BLEU, survey and error analysis) is used to extract quantitative and qualitative data about two marketing letters from the energy industry translated with a rule-based system (RBMT), a phrase-based system (PBMT) and a neural system (NMT). Results show that, in the case of low-resource languages, rule-based and phrase-based machine translation systems still play an important role.
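The BLEU metric used in the study above scores a candidate translation by clipped n-gram precision against a reference. A minimal sketch of that computation (the reference sentence and system outputs below are invented for illustration; real evaluations would use an established implementation such as sacrebleu):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: clipped n-gram precision with a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_counts = Counter(ngrams(cand, n))
        r_counts = Counter(ngrams(ref, n))
        clipped = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        # crude smoothing so a missing higher-order match does not zero the score
        log_prec += math.log(max(clipped, 1e-9) / total) / max_n
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return brevity * math.exp(log_prec)

# invented reference and system outputs for one source sentence
reference = "the invoice includes the monthly energy consumption"
outputs = {
    "RBMT": "the bill includes monthly consumption of energy",
    "NMT":  "the invoice includes the monthly energy consumption",
}
scores = {name: bleu(out, reference) for name, out in outputs.items()}
```

An exact match scores 1.0, while paraphrases are penalized even when adequate, which is one reason the study pairs BLEU with a survey and error analysis.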

    A comparison of CK and Martin's package metric suites in predicting reusability in open source object-oriented software

    Packages are units that organize source code in large object-oriented systems. Metrics used at the package granularity level mostly characterize attributes such as complexity, size, cohesion and coupling. Many of these attributes have a direct relationship with the quality of the software system being produced. Empirical evidence is required to support the collection of measures for such metrics; hence these metrics are used as early indicators of important external quality attributes. This research compared the CK and Martin's package metric suites in order to characterize the package reusability level in object-oriented software. Comparing metric suites at the package level as they measure an external software quality attribute should help a developer know which suite can effectively predict that quality attribute at the package level. In this research two open-source Java applications, namely jEdit and BlueJ, were used in the evaluation of the two package metric suites, which were compared empirically to predict the package reusability level. The metric measures were also used to compare the effectiveness of the metrics in these suites in evaluating reusability at the package granularity level. Thereafter, the metric measures of each package were normalized to allow comparison of the package reusability level among packages within an application. The Bansiya reusability model equation was adapted as the reusability reference quality model in this research. Correlation analysis was performed to compare the metrics within the package metric suites. Through the ranking of package reusability levels, results show that the jEdit application has 30% of its packages ranked with a very high reusability level, thus conforming to the Pareto rule (80:20). This means that the jEdit application has packages that are more reusable than the packages in the BlueJ application.
Empirically, Martin's package coupling metric Ce, with an r value of 0.68, is ranked as having a strong positive correlation with the reusability level (RL), which distinguishes Martin's package metric suite from the CK package metric suite as an effective predictor of the package reusability level.
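The correlation analysis above reduces to computing Pearson's r between a metric (such as Ce) and the reusability level across packages. A small sketch from first principles (the per-package values below are invented, not the study's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# hypothetical per-package efferent coupling (Ce) and reusability level (RL)
ce = [2, 5, 8, 11, 14]
rl = [0.30, 0.42, 0.55, 0.61, 0.78]
r = pearson_r(ce, rl)
```

An r near 1 indicates a strong positive linear relationship; the study reports r = 0.68 for Ce against RL on its real data.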

    Characterizing realistic signature-based intrusion detection benchmarks

    © 2018 Association for Computing Machinery. Speeding up pattern matching for intrusion detection systems has been a growing field of research. There has been an influx of new algorithms, modifications to existing algorithms, and even hardware architectures aimed at improving pattern-matching performance. Establishing an accurate comparison with related work is a real challenge because researchers use different datasets and metrics to evaluate their work. The purpose of this paper is to characterize and identify realistic workloads, propose standard benchmarks, and establish common metrics to better compare work in the area of pattern matching for intrusion detection. We collect traffic traces and attack signatures from popular open-source platforms. The datasets are processed, cleansed and studied to give researchers a better understanding of their characteristics. The final datasets, along with detailed information about their origins, contents, features, statistical analysis and performance evaluation using well-known pattern-matching algorithms, are available to the public. In addition, we provide a generic parser capable of parsing the rule formats of different intrusion detection systems and extracting attack signatures. Finally, we provide a pattern-matching engine that enables researchers to plug in their new pattern-matching algorithms and compare them to existing algorithms using the predefined metrics.
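A classic baseline among the well-known pattern-matching algorithms such benchmarks exercise is Aho-Corasick, which matches many signatures against a traffic stream in a single pass. A compact sketch (the signature set and input are toy examples, not drawn from the paper's datasets):

```python
from collections import deque

def build_automaton(patterns):
    """Aho-Corasick automaton: a trie plus BFS-computed failure links."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]          # inherit matches ending at the fallback
    return goto, fail, out

def search(text, automaton):
    """Yield (end_index, pattern) for every signature occurrence in text."""
    goto, fail, out = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            yield i, pat

sigs = ["cmd.exe", "/etc/passwd", "select"]   # toy signature set
auto = build_automaton(sigs)
hits = sorted(pat for _, pat in search("GET /etc/passwd?cmd.exe", auto))
```

Because the scan cost is independent of the number of signatures, this family of automata is the usual yardstick that new hardware and software matchers are compared against.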

    Customer Sentiments in Product Reviews: A Comparative Study with GooglePaLM

    In this work, we evaluated the efficacy of Google's Pathways Language Model (GooglePaLM) in analyzing sentiments expressed in product reviews. Although conventional Natural Language Processing (NLP) techniques such as the rule-based Valence Aware Dictionary for Sentiment Reasoning (VADER) and the long-sequence Bidirectional Encoder Representations from Transformers (BERT) model are effective, they frequently encounter difficulties when dealing with intricate linguistic features like sarcasm and contextual nuances commonly found in customer feedback. We performed a sentiment analysis on Amazon's fashion review datasets using the VADER, BERT, and GooglePaLM models, respectively, and compared the results based on evaluation metrics such as precision, recall, accuracy, correct positive prediction, and correct negative prediction. We used the default values of the VADER and BERT models and slightly fine-tuned GooglePaLM with a temperature of 0.0 and an N-value of 1. We observed that GooglePaLM performed better, with correct positive and negative prediction values of 0.91 and 0.93, respectively, followed by BERT and VADER. We concluded that large language models surpass traditional rule-based systems for natural language processing tasks.
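The rule-based approach that VADER exemplifies can be illustrated with a tiny lexicon scorer; the valence values, negation handling, and reviews below are invented stand-ins, far simpler than VADER's actual lexicon and heuristics:

```python
# toy valence lexicon (hypothetical values standing in for VADER's)
LEXICON = {"love": 2.0, "great": 1.5, "nice": 1.0,
           "bad": -1.5, "hate": -2.0, "poor": -1.0}
NEGATIONS = {"not", "never", "no"}

def score(review):
    """Sum word valences, flipping the sign of the word after a negation."""
    total, flip = 0.0, 1
    for w in review.lower().split():
        if w in NEGATIONS:
            flip = -1
            continue
        total += flip * LEXICON.get(w, 0.0)
        flip = 1
    return total

def label(review):
    return "positive" if score(review) >= 0 else "negative"

reviews = [("love this great dress", "positive"),
           ("not great at all", "negative"),
           ("poor fit hate it", "negative")]
# fraction-style tally behind a "correct negative prediction" metric
correct_neg = sum(label(r) == g for r, g in reviews if g == "negative")
```

The hard cases the abstract mentions, sarcasm and context, are exactly where such word-level rules break down and where large models tend to do better.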

    Evaluating the Factual Consistency of Conditional Text Generation Systems

    Doctoral dissertation -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, College of Engineering, August 2022. Advisor: Kyomin Jung. Despite the recent advances of conditional text generation systems built on pre-trained language models, the factual consistency of these systems is still insufficient. Moreover, the widely used n-gram similarity metrics are poorly suited to evaluating factual consistency. Hence, in order to develop a factually consistent system, an automatic factuality metric is needed first. In this dissertation, we propose four metrics that show much higher correlations with human judgments than previous metrics in evaluating the factual consistency of diverse conditional text generation systems. To build these metrics, we utilize (1) auxiliary tasks and (2) data augmentation methods. First, we focus on the keywords or keyphrases that are critical for evaluating factual consistency and propose two factual consistency metrics using two different auxiliary tasks. We integrate a keyphrase weight prediction task into previous metrics to propose KPQA (Keyphrase Prediction for Question Answering), a metric for generative QA. We also apply question generation and answering to develop a captioning metric, QACE (Question Answering for Captioning Evaluation). QACE generates questions on the keywords of the candidate caption and checks factual consistency by comparing the answers to these questions for the source image and the caption. Secondly, rather than using auxiliary tasks, we directly train metrics with a data-driven approach to propose two further metrics. Specifically, we train a metric to distinguish augmented inconsistent texts from consistent text. We first modify the original reference captions to generate inconsistent captions using several rule-based methods, such as substituting keywords, to propose UMIC (Unreferenced Metric for Image Captioning). As a next step, we introduce MFMA (Mask-and-Fill with Masked Article), a metric built by generating inconsistent summaries from the masked source article and the masked summary. Finally, as an extension of developing data-driven factual consistency metrics, we also propose a faster post-editing system that can fix the factual errors in system outputs.
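The rule-based negative-sample generation behind metrics like UMIC can be sketched as a keyword-substitution pass over a consistent caption; the swap tables and caption below are invented placeholders for the dissertation's actual transformation rules:

```python
import random

# hypothetical swap tables standing in for the rule-based transformations
NOUN_SWAPS = {"dog": "cat", "man": "woman", "beach": "kitchen"}
NUM_SWAPS = {"two": "five", "three": "seven"}

def make_inconsistent(caption, rng):
    """Corrupt a consistent caption by substituting one keyword,
    producing a negative sample for contrastive training."""
    words = caption.split()
    swappable = [i for i, w in enumerate(words)
                 if w in NOUN_SWAPS or w in NUM_SWAPS]
    if not swappable:
        return caption
    i = rng.choice(swappable)
    words[i] = NOUN_SWAPS.get(words[i]) or NUM_SWAPS.get(words[i])
    return " ".join(words)

rng = random.Random(0)
pos = "a dog runs on the beach"
neg = make_inconsistent(pos, rng)
```

A metric trained to separate `pos` from `neg` pairs learns to penalize exactly the entity-level errors that n-gram overlap scores barely notice, since the corrupted caption still shares almost every word with the original.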

    Perceived Information Revisited

    In this study, we present new analytical metrics for evaluating the performance of side-channel attacks (SCAs) by revisiting the perceived information (PI), which is defined using cross-entropy (CE). PI represents the amount of information utilized by the probability distribution that determines a distinguishing rule in SCA. Our analysis partially solves an important open problem in the performance evaluation of deep-learning-based SCAs (DL-SCAs): the relationship between neural network (NN) model evaluation metrics (such as accuracy, loss, and recall) and guessing entropy (GE)/success rate (SR) is unclear. We first theoretically show that the conventional CE/PI is non-calibrated and insufficient for evaluating SCA performance, as it contains uncertainty in terms of SR. More precisely, we show that an infinite number of probability distributions with different CE/PI can achieve an identical SR. Given this analysis, we present a modification of CE/PI, named effective CE/PI (ECE/EPI), to eliminate the above uncertainty. The ECE/EPI can be easily calculated for a given probability distribution and dataset, which makes it suitable for DL-SCA. Using the ECE/EPI, we can accurately evaluate the SR through the validation loss in the training phase, and can measure the generalization of the NN model in terms of SR in the attack phase. We then analyze and discuss the proposed metrics regarding their relationship to SR, the conditions of successful attacks for a distinguishing rule with a probability distribution, statistical/asymptotic aspects, and the order of key ranks in SCA. Finally, we validate the proposed metrics through experimental attacks on masked AES implementations using DL-SCA.
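The core observation that distributions with different CE can yield an identical SR is easy to see numerically: success depends only on how the model ranks the key candidates, while cross-entropy depends on the probability mass itself. A toy sketch with invented four-candidate distributions:

```python
import math

TRUE_KEY = 2
# two hypothetical model distributions over four key candidates
model_a = [0.10, 0.15, 0.60, 0.15]
model_b = [0.05, 0.05, 0.85, 0.05]

def cross_entropy(q, key):
    """CE of model q against the one-hot distribution on the correct key."""
    return -math.log(q[key])

def first_order_success(q, key):
    """Does the model rank the correct key first (first-order success)?"""
    return max(range(len(q)), key=q.__getitem__) == key

ce_a = cross_entropy(model_a, TRUE_KEY)   # about 0.51
ce_b = cross_entropy(model_b, TRUE_KEY)   # about 0.16
same_outcome = (first_order_success(model_a, TRUE_KEY)
                and first_order_success(model_b, TRUE_KEY))
```

Both models recover the key on this trace despite very different CE values, which is the kind of uncertainty the proposed ECE/EPI is designed to eliminate.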

    Open source software maturity model based on linear regression and Bayesian analysis

    Open Source Software (OSS) is widely used and is becoming a significant and irreplaceable part of the software engineering community. Today a huge number of OSS projects exist, which becomes a problem if one needs to choose from such a large pool of candidates in the same category. An OSS maturity model that facilitates software assessment and helps users make a decision is needed. A few maturity models have been proposed in the past. However, the parameters in those models are assigned not based on experimental data but on human experiences, feelings and judgments. These models are subjective and can provide only limited guidance for users at best. This dissertation proposes a quantitative and objective model built from a statistical perspective. In this model, seven metrics are chosen as criteria for OSS evaluation. A linear multiple-regression model is created to assign a final score based on these seven metrics. This final score provides a convenient and objective way for users to make a decision. The coefficients in the linear multiple-regression model are calculated from 43 OSS projects. From the statistical perspective, these coefficients are considered random variables. The joint distribution of the coefficients is discussed based on Bayesian statistics. More importantly, an updating rule is established through Bayesian analysis to improve the joint distribution, and thus the objectivity of the coefficients in the linear multiple-regression model, as new data arrive. The updating rule gives the model the ability to learn and improve itself continually.
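The two ingredients above, a regression fit and a Bayesian updating rule, can be sketched for a single coefficient; the data points and variances below are invented, and the real model is a seven-metric multiple regression rather than this one-variable toy:

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def bayes_update(prior_mean, prior_var, obs, obs_var):
    """Normal-normal conjugate update for a scalar coefficient."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

# slope fitted on the initial projects, then refined by a new noisy estimate
xs, ys = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
prior = ols_slope(xs, ys)                      # 1.94 on these numbers
mean, var = bayes_update(prior, 0.25, 2.2, 0.25)
```

Each new batch of OSS data tightens the posterior (the variance shrinks), which is how the model "learns and improves itself continually" instead of being frozen at the original 43-project fit.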

    Building an Expert System for Evaluation of Commercial Cloud Services

    Commercial Cloud services have been increasingly supplied to customers in industry. To facilitate customers' decision making, such as cost-benefit analysis or Cloud provider selection, evaluation of those Cloud services is becoming more and more crucial. However, compared with the evaluation of traditional computing systems, more challenges inevitably appear when evaluating rapidly-changing and user-uncontrollable commercial Cloud services. This paper proposes an expert system for Cloud evaluation that addresses emerging evaluation challenges in the context of Cloud Computing. Based on the knowledge and data accumulated by exploring the existing evaluation work, this expert system has been conceptually validated to be able to give suggestions and guidelines for implementing new evaluation experiments. As such, users can conveniently draw on existing evaluation experience by using this expert system, which essentially makes existing efforts in Cloud services evaluation reusable and sustainable. Comment: 8 pages, Proceedings of the 2012 International Conference on Cloud and Service Computing (CSC 2012), pp. 168-175, Shanghai, China, November 22-24, 2012.
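An expert system that maps an evaluation scenario to suggested guidelines is, at its core, a rule base plus a matching step. A minimal sketch, where every rule condition and piece of advice is invented for illustration rather than taken from the paper's knowledge base:

```python
# minimal rule base: (conditions that must all hold, suggested guideline)
RULES = [
    ({"workload": "bursty"}, "benchmark under variable load and report tail latency"),
    ({"goal": "cost"}, "collect per-hour billing alongside throughput metrics"),
    ({"service": "storage"}, "measure read/write latency at several object sizes"),
]

def suggest(facts):
    """Return every guideline whose conditions are all satisfied by the facts."""
    return [advice for cond, advice in RULES
            if all(facts.get(k) == v for k, v in cond.items())]

advice = suggest({"service": "storage", "goal": "cost"})
```

A user describes their evaluation scenario as facts and receives only the guidelines relevant to it, which is the sense in which accumulated evaluation experience becomes reusable.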

    Automating Cyber Analytics

    Model-based security metrics are a growing area of cyber security research concerned with measuring the risk exposure of an information system. These metrics are typically studied in isolation, with the formulation of the test itself being the primary finding in publications. As a result, there is a flood of metric specifications available in the literature but a corresponding dearth of analyses verifying results for a given metric calculation under different conditions or comparing the efficacy of one measurement technique over another. The motivation of this thesis is to create a systematic methodology for model-based security metric development, analysis, integration, and validation. In doing so we hope to fill a critical gap in the way we view and improve a system's security. In order to understand the security posture of a system before it is rolled out and as it evolves, we present in this dissertation an end-to-end solution for the automated measurement of the security metrics needed to identify risk early and accurately. To our knowledge this is a novel capability in design-time security analysis which provides the foundation for ongoing research into predictive cyber security analytics. Modern development environments contain a wealth of information in infrastructure-as-code repositories, continuous build systems, and container descriptions that could inform security models, but risk evaluation based on these sources is ad hoc at best, and often simply left until deployment. Our goal in this work is to lay the groundwork for security measurement to be a practical part of the system design, development, and integration lifecycle. In this thesis we provide a framework for the systematic validation of the existing security metrics body of knowledge. In doing so we endeavour not only to survey the current state of the art, but to create a common platform for future research in the area to be conducted.
We then demonstrate the utility of our framework through the evaluation of leading security metrics against a reference set of system models we have created. We investigate how to calibrate security metrics for different use cases and establish a new methodology for security metric benchmarking. We further explore the research avenues unlocked by automation through our concept of an API-driven S-MaaS (Security Metrics-as-a-Service) offering. We review our design considerations in packaging security metrics for programmatic access, and discuss how various client access patterns are anticipated in our implementation strategy. Using existing metric processing pipelines as a reference, we show how the simple, modular interfaces in S-MaaS support dynamic composition and orchestration. Next we review aspects of our framework which can benefit from optimization and further automation through machine learning. First we create a dataset of network models labeled with the corresponding security metrics. By training classifiers to predict security values based only on network inputs, we can avoid the computationally expensive attack-graph generation steps. We use our findings from this simple experiment to motivate our current lines of research into supervised and unsupervised techniques such as network embeddings, interaction rule synthesis, and reinforcement learning environments. Finally, we examine the results of our case studies. We summarize our security analysis of a large-scale network migration, and list the friction points along the way which are remediated by this work. We relate how our research for a large-scale performance benchmarking project has influenced our vision for the future of security metrics collection and analysis through dev-ops automation. We then describe how we applied our framework to measure the incremental security impact of running a distributed stream processing system inside a hardware trusted execution environment.
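One of the simplest model-based security metrics of the kind discussed above is the minimum number of exploit steps an attacker needs to reach a target asset in an attack graph. A small sketch over an invented graph (the hosts and edges are illustrative, not from the thesis's reference models):

```python
from collections import deque

def shortest_attack_path(graph, start, target):
    """BFS over an attack graph: minimum number of exploit steps to the target."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # target unreachable from this entry point

# hypothetical attack graph: edges are exploitable transitions between hosts
graph = {"internet": ["web"], "web": ["app"], "app": ["db"], "db": []}
steps = shortest_attack_path(graph, "internet", "db")
```

Even this toy illustrates why attack-graph generation is the expensive step at scale, and hence why the thesis explores classifiers that predict such metric values directly from network inputs.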