Embodied Artificial Intelligence through Distributed Adaptive Control: An Integrated Framework
In this paper, we argue that the future of Artificial Intelligence research
resides in two keywords: integration and embodiment. We support this claim by
analyzing the recent advances of the field. Regarding integration, we note that
the most impactful recent contributions have been made possible through the
integration of recent Machine Learning methods (based in particular on Deep
Learning and Recurrent Neural Networks) with more traditional ones (e.g.
Monte-Carlo tree search, goal babbling exploration or addressable memory
systems). Regarding embodiment, we note that the traditional benchmark tasks
(e.g. visual classification or board games) are becoming obsolete as
state-of-the-art learning algorithms approach or even surpass human performance
in most of them, a trend that has recently encouraged the development of
first-person 3D game platforms embedding realistic physics. Building upon this
analysis, we
first propose an embodied cognitive architecture integrating heterogeneous
sub-fields of Artificial Intelligence into a unified framework. We demonstrate
the utility of our approach by showing how major contributions of the field can
be expressed within the proposed framework. We then claim that benchmarking
environments need to reproduce ecologically-valid conditions for bootstrapping
the acquisition of increasingly complex cognitive skills through the concept of
a cognitive arms race between embodied agents.

Comment: Updated version of the paper accepted to the ICDL-Epirob 2017 conference (Lisbon, Portugal).
A bio-inspired bistable recurrent cell allows for long-lasting memory
Recurrent neural networks (RNNs) provide state-of-the-art performances in a
wide variety of tasks that require memory. These performances can often be
achieved thanks to gated recurrent cells such as gated recurrent units (GRU)
and long short-term memory (LSTM). Standard gated cells share a layer internal
state to store information at the network level, and long term memory is shaped
by network-wide recurrent connection weights. Biological neurons on the other
hand are capable of holding information at the cellular level for arbitrarily
long times through a process called bistability. Through bistability,
cells can stabilize to different stable states depending on their own past
state and inputs, which permits the durable storing of past information in
neuron state. In this work, we take inspiration from biological neuron
bistability to embed RNNs with long-lasting memory at the cellular level. This
leads to the introduction of a new bistable biologically-inspired recurrent
cell that is shown to strongly improve RNN performance on time series that
require very long memory, despite using only cellular connections (all
recurrent connections are from neurons to themselves, i.e. a neuron state is
not influenced by the state of other neurons). Furthermore, equipping this cell
with recurrent neuromodulation makes it possible to link it to standard GRU
cells, taking a step towards the biological plausibility of GRUs.
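The cellular-only recurrence described above (each neuron's state fed back only onto itself, with a feedback gain that can exceed 1 and thus make a unit bistable) can be sketched as a toy cell. This is a hedged illustration, not the paper's exact equations: the class name `BistableCell`, the gate parameterization, and the weight scales are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BistableCell:
    """Toy recurrent cell with cellular-only recurrence: the recurrent
    weights wa and wc are diagonal (element-wise), so each hidden unit
    sees only its own past state, never its neighbours'."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.Ua = rng.normal(0.0, 0.1, (n_hidden, n_in))  # input weights, feedback gain
        self.Uc = rng.normal(0.0, 0.1, (n_hidden, n_in))  # input weights, update gate
        self.U = rng.normal(0.0, 0.1, (n_hidden, n_in))   # input weights, candidate
        self.wa = np.ones(n_hidden)  # self-recurrence only (diagonal)
        self.wc = np.ones(n_hidden)

    def step(self, x, h):
        a = 1.0 + np.tanh(self.Ua @ x + self.wa * h)  # feedback gain in (0, 2)
        c = sigmoid(self.Uc @ x + self.wc * h)        # update gate
        # When a > 1, the map h -> tanh(a * h) has two stable fixed points
        # of opposite sign, letting a unit latch one bit indefinitely.
        return c * h + (1.0 - c) * np.tanh(self.U @ x + a * h)

cell = BistableCell(n_in=3, n_hidden=4)
h = np.zeros(4)
for _ in range(20):                 # run the cell on a constant input stream
    h = cell.step(np.ones(3), h)
```

Since each new state is a convex combination of the old state and a tanh candidate, the hidden state stays in (-1, 1) regardless of sequence length.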
Human explainability through an auxiliary Neural Network
Final projects of the Master in Foundations of Data Science, Faculty of Mathematics, Universitat de Barcelona, Year: 2020, Tutor: Santi Seguí Mesquida.

Explainability in Deep Learning has become a hot topic in recent years due to the need for insights into, and justifications of, model predictions. Although this field covers an extensive range of approaches, this thesis explores the feasibility of a new methodology that seeks to provide human-interpretable explanations for each sample processed by a Neural Network. The term black box is often used in the explainability field to mean that the model lacks transparency when processing data. The explored approach addresses the black box by using the outputs of the hidden layers of a Neural Network as inputs to the model responsible for the explanations. This model is another Neural Network, which can be seen as auxiliary to the main task. The predicted explanations are formed as a subset of a list of human-designed justifications for the possible outcomes of the main task. Using the predictions from both networks, a cross-comparison process is also performed in order to build confidence in the main predictions. Results show how a significant proportion of incorrect outputs are successfully questioned thanks to the predicted explanations.
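The pipeline described (hidden activations of the main network feeding an auxiliary network that predicts a subset of human-designed justifications, plus a cross-comparison step) might be wired as below. All sizes, the class-to-justification mapping `expected`, and the untrained random weights are hypothetical stand-ins for the thesis's trained models.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_main(x, W1, W2):
    """Main network: returns class logits and the hidden activations
    that will also serve as input to the auxiliary network."""
    h = np.maximum(0.0, W1 @ x)  # hidden layer (ReLU)
    return W2 @ h, h

def forward_aux(h, Wa):
    """Auxiliary network: maps hidden activations to independent
    probabilities over a fixed list of human-designed justifications
    (multi-label output, hence element-wise sigmoid)."""
    return 1.0 / (1.0 + np.exp(-(Wa @ h)))

n_in, n_hidden, n_classes, n_justif = 8, 16, 3, 5
W1 = rng.normal(0.0, 0.3, (n_hidden, n_in))
W2 = rng.normal(0.0, 0.3, (n_classes, n_hidden))
Wa = rng.normal(0.0, 0.3, (n_justif, n_hidden))

x = rng.normal(size=n_in)
logits, h = forward_main(x, W1, W2)
pred = int(np.argmax(logits))
justif_probs = forward_aux(h, Wa)
asserted = set(np.flatnonzero(justif_probs > 0.5))  # justifications the aux net asserts

# Cross-comparison: if no asserted justification matches those expected
# for the predicted class, the main prediction is flagged as doubtful.
expected = {0: {0, 1}, 1: {2, 3}, 2: {4}}  # hypothetical class-to-justification map
doubtful = not (asserted & expected[pred])
```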
A Study on Context-Aware Document-Level Neural Machine Translation
Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2022. Advisor: Kyomin Jung.

Neural machine translation (NMT) has attracted great attention in recent years, as it has yielded state-of-the-art translation quality. Despite their promising results, many current NMT systems are sentence-level, translating each sentence independently. This ignores the surrounding text and thus produces inadequate and inconsistent translations at the document level. To overcome these shortcomings, context-aware NMT (CNMT) has been proposed, which takes contextual sentences as additional input. This dissertation proposes novel methods for improving CNMT systems and an application of CNMT. We first tackle the efficient modeling of multiple contextual sentences in the CNMT encoder. For this purpose, we propose a hierarchical context encoder that encodes contextual sentences from the token level up to the sentence level. This novel architecture enables the model to achieve state-of-the-art translation quality while requiring less computation time for training and translation than existing methods.
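As a sketch of the two-stage idea (not the dissertation's implementation: the single linear layers stand in for full token- and sentence-level encoders, and every name here is an assumption), the contextual sentences could be condensed like this:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # embedding / hidden size

def token_level_encode(sent_emb, Wt):
    """Token-level stage: contextualize tokens within one sentence
    (a single linear + tanh layer stands in for a full encoder) and
    pool them into one sentence vector."""
    h = np.tanh(sent_emb @ Wt)      # (n_tokens, d)
    return h.mean(axis=0)           # (d,)

def sentence_level_encode(sent_vecs, q, Ws):
    """Sentence-level stage: attend over the sentence vectors with a
    query derived from the current source sentence, yielding a single
    context vector for the translation model to consume."""
    keys = sent_vecs @ Ws                       # (n_sents, d)
    scores = keys @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()                                # attention weights
    return w @ sent_vecs                        # (d,)

Wt = rng.normal(0.0, 0.2, (d, d))
Ws = rng.normal(0.0, 0.2, (d, d))
context_sents = [rng.normal(size=(n, d)) for n in (5, 7, 4)]  # 3 context sentences
sent_vecs = np.stack([token_level_encode(s, Wt) for s in context_sents])
q = rng.normal(size=d)              # query from the current sentence
ctx = sentence_level_encode(sent_vecs, q, Ws)
```

Pooling each sentence first means the sentence-level stage attends over only a handful of vectors instead of every context token, which is where the computational saving would come from.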
Secondly, we investigate the training of CNMT models, most of which rely on a negative log-likelihood (NLL) objective that does not fully exploit contextual dependencies. To address this insufficiency, we introduce coreference-based contrastive learning for CNMT, which generates contrastive examples from coreference chains between the source and target sentences. The proposed method improves the pronoun resolution accuracy of CNMT models, as well as overall translation quality.
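One hedged sketch of how coreference-based contrastive learning could work, assuming an InfoNCE-style objective and mask-based corruption of coreferent mentions; the dissertation's exact augmentation and loss may differ:

```python
import numpy as np

def corrupt_coref(context_tokens, coref_positions, mask="<mask>"):
    """Hypothetical augmentation: build a contrastive (negative) context
    by masking the context tokens that corefer with a source mention."""
    return [mask if i in coref_positions else t
            for i, t in enumerate(context_tokens)]

def contrastive_loss(src_enc, pos_enc, neg_encs, tau=0.1):
    """InfoNCE-style loss: pull the source encoding toward the original
    context encoding and away from coref-corrupted encodings."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(src_enc, pos_enc)]
                      + [cos(src_enc, n) for n in neg_encs]) / tau
    logits -= logits.max()                      # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

# Toy usage with hand-made encodings: the positive (original) context is
# nearly aligned with the source, the corrupted ones are orthogonal.
src = np.array([1.0, 0.0, 0.0])
pos = np.array([0.9, 0.1, 0.0])
negs = [np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])]
loss = contrastive_loss(src, pos, negs)
masked = corrupt_coref(["He", "said", "hi"], {0})
```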
Finally, we investigate an application of CNMT to Korean honorifics, which depend on contextual information for adequate translation. For the English-Korean translation task, we propose to use CNMT models that capture crucial contextual information in the English source document, and we adopt a context-aware post-editing system to exploit context in the Korean target sentences, resulting in more consistent Korean honorific translations.

Abstract
Contents
List of Tables
List of Figures
1 Introduction
2 Background: Neural Machine Translation
2.1 A Brief History
2.2 Problem Setup
2.3 Encoder-Decoder architectures
2.3.1 RNN-based Architecture
2.3.2 SAN-based Architecture
2.4 Training
2.5 Decoding
2.6 Evaluation
3 Efficient Hierarchical Architecture for Modeling Contextual Sentences
3.1 Related works
3.1.1 Modeling Context in NMT
3.1.2 Hierarchical Context Modeling
3.1.3 Evaluation of Context-aware NMT
3.2 Model description
3.2.1 Context-aware NMT encoders
3.2.2 Hierarchical context encoder
3.3 Data
3.3.1 English-German IWSLT 2017 corpus
3.3.2 OpenSubtitles corpus
3.3.3 English-Korean subtitle corpus
3.4 Experiments
3.4.1 Hyperparameters and Training details
3.4.2 Overall BLEU evaluation
3.4.3 Model complexity analysis
3.4.4 BLEU evaluation on helpful/unhelpful context
3.4.5 EnKo pronoun resolution test suite
3.4.6 Qualitative Analysis
3.5 Summary of Efficient Hierarchical Architecture for Modeling Contextual Sentences
4 Contrastive Learning for Context-aware Neural Machine Translation
4.1 Related Works
4.1.1 Context-aware NMT Architectures
4.1.2 Coreference and NMT
4.1.3 Data augmentation for NMT
4.1.4 Contrastive Learning
4.2 Context-aware NMT models
4.3 Our Method: CorefCL
4.3.1 Data Augmentation Using Coreference
4.3.2 Contrastive Learning for Context-aware NMT
4.4 Experiments
4.4.1 Datasets
4.4.2 Settings
4.4.3 Overall BLEU Evaluation
4.4.4 Results on English-German Contrastive Evaluation Set
4.4.5 Analysis
4.5 Summary of Contrastive Learning for Context-aware Neural Machine Translation
5 Improving English-Korean Honorific Translation Using Contextual Information
5.1 Related Works
5.1.1 Neural Machine Translation dealing with Korean
5.1.2 Controlling the Styles in NMT
5.1.3 Context-Aware NMT Framework and Application
5.2 Addressing Korean Honorifics in Context
5.2.1 Overview of Korean Honorifics System
5.2.2 The Role of Context on Choosing Honorifics
5.3 Context-Aware NMT Frameworks
5.3.1 NMT Model with Contextual Encoders
5.3.2 Context-Aware Post Editing (CAPE)
5.4 Our Proposed Method - Context-Aware NMT for Korean Honorifics
5.4.1 Using CNMT methods for Honorific-Aware Translation
5.4.2 Scope of Honorific Expressions
5.4.3 Automatic Honorific Labeling
5.5 Experiments
5.5.1 Dataset and Preprocessing
5.5.2 Model Implementation and Training Details
5.5.3 Metrics
5.5.4 Results
5.5.5 Translation Examples and Analysis
5.6 Summary of Improving English-Korean Honorific Translation Using Contextual Information
6 Future Directions
6.1 Document-level Datasets
6.2 Document-level Evaluation
6.3 Bias and Fairness of Document-level NMT
6.4 Towards Practical Applications
7 Conclusions
Abstract (In Korean)
Acknowledgment
Big data analytics: a predictive analysis applied to cybersecurity in a financial organization
Project work presented as a partial requirement for the Master's degree in Information Management, with a specialization in Knowledge Management and Business Intelligence.

With the generalization of internet access, cyber attacks have registered alarming growth in frequency and severity of damage, along with growing awareness among organizations that invest heavily in cybersecurity, such as those in the financial sector. This work focuses on an organization's financial service that operates in the international payment systems industry. The objective was to develop a predictive framework for threat detection that supports the security team in opening investigations on intrusive server requests, over the exponentially growing log events collected by the SIEM from the Apache Web Servers of the financial service.
A Big Data framework, using Hadoop and Spark, was developed to perform classification tasks over the financial service requests, using Neural Networks, Logistic Regression, SVM, and Random Forests, while handling the training on the imbalanced dataset through BEV. The analysis registered the best classification performance for the Random Forests classifier using all the preprocessed features available. Using all the available worker nodes with a balanced configuration of the Spark executors, the fastest loading and preprocessing of the data was achieved with the column-oriented native ORC format, while the row-oriented CSV format performed best for training the classifiers.
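Assuming BEV denotes a bagging-style rebalancing scheme for imbalanced data (partition the majority class into minority-sized chunks, train one classifier per balanced subset, combine by majority vote), the core idea can be sketched as below. The nearest-centroid base learner and the synthetic "benign vs. intrusive request" features are illustrative stand-ins for the actual Spark pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def bev_splits(X_maj, X_min):
    """Partition the majority class into chunks of roughly minority size
    and pair each chunk with the full minority set, giving k balanced
    training sets (the bagging-style rebalancing step)."""
    k = max(1, len(X_maj) // len(X_min))
    return [(np.vstack([chunk, X_min]),
             np.concatenate([np.zeros(len(chunk)), np.ones(len(X_min))]))
            for chunk in np.array_split(X_maj, k)]

def centroid_fit(X, y):
    """Trivial base learner: one centroid per class."""
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def centroid_predict(model, X):
    c0, c1 = model
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)   # 1 = intrusive

def bev_ensemble_predict(models, X):
    """Majority vote across the classifiers trained on balanced sets."""
    votes = np.stack([centroid_predict(m, X) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)

# Synthetic imbalanced data: 900 benign requests (class 0) vs 60
# intrusive ones (class 1), as 4-dimensional feature vectors.
X_maj = rng.normal(0.0, 1.0, (900, 4))
X_min = rng.normal(3.0, 1.0, (60, 4))
models = [centroid_fit(X, y) for X, y in bev_splits(X_maj, X_min)]
preds = bev_ensemble_predict(models, np.vstack([X_maj[:10], X_min[:10]]))
```

Every majority-class sample is still seen by exactly one ensemble member, so no data is discarded, unlike plain undersampling.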
Many-core and heterogeneous architectures: programming models and compilation toolchains
The abstract is in the attachment. Author: Barchi, Francesc
Feature learning and clustering analysis for images classification
This thesis addresses the problem of improving an existing 10-category classification of images captured by SEM microscopes. In particular, the challenge is to classify those images according to a hierarchical tree structure of sub-categories without requiring any further human labelling effort. In order to uncover intrinsic structures among the images, a procedure involving supervised and unsupervised feature learning, as well as cluster analysis, is defined. Moreover, to reduce the bias introduced in the supervised phase, various strategies focusing on features of different nature and levels of abstraction are analyzed.
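The unsupervised part of such a procedure might look like the sketch below: plain k-means (with farthest-point initialisation) over feature vectors, where the synthetic Gaussian blobs stand in for features actually learned from the SEM images; the thesis's real feature extractors and clustering choices are not specified here.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain k-means with farthest-point initialisation, used to probe
    for sub-category structure in learned image features."""
    centers = [X[0]]
    for _ in range(k - 1):  # pick each new center far from existing ones
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                   # assign to nearest center
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])     # recompute cluster means
    return labels, centers

# Hypothetical "features": two well-separated Gaussian blobs standing in
# for feature vectors extracted from two sub-categories of SEM images.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0.0, 0.3, (50, 8)),
                   rng.normal(2.0, 0.3, (50, 8))])
labels, centers = kmeans(feats, k=2)
```

The discovered clusters could then be inspected against the existing 10 categories to propose sub-categories without extra labelling.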
- …