1,248 research outputs found

FasTrCaps: An Integrated Framework for Fast yet Accurate Training of Capsule Networks

Author: Bussolino Beatrice
Colucci Alessio
Hanif Muhammad Abdullah
Marchisio Alberto
Martina Maurizio
Masera Guido
Shafique Muhammad
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

Recently, Capsule Networks (CapsNets) have shown improved performance compared to traditional Convolutional Neural Networks (CNNs) by better encoding and preserving the spatial relationships between detected features. This is achieved through so-called Capsules (i.e., groups of neurons) that encode both the instantiation probability and the spatial information. However, one of the major hurdles to the wide adoption of CapsNets is their very long training time, which is primarily due to the higher complexity of their new constituent elements compared with CNNs. In this paper, we implement different optimizations in the training loop of CapsNets and investigate how these optimizations affect training speed and accuracy. To this end, we propose a novel framework, FasTrCaps, that integrates multiple lightweight optimizations and a novel learning rate policy called WarmAdaBatch (which jointly performs warm restarts and adaptive batch size), and steers them appropriately to provide a high training-loop speedup at minimal accuracy loss. We also propose weight sharing for capsule layers. The goal is to reduce the hardware requirements of CapsNets by removing unused or redundant connections and capsules, while keeping high accuracy through tests of different learning rate policies and batch sizes. We demonstrate that one of the solutions generated by the FasTrCaps framework can achieve a 58.6% reduction in training time while preserving accuracy (even a 0.12% accuracy improvement for the MNIST dataset), compared to the CapsNet by Google Brain [25]. Moreover, the Pareto-optimal solutions generated by FasTrCaps can be leveraged to realize trade-offs between training time and achieved accuracy. We have open-sourced our framework on GitHub.
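
The abstract describes WarmAdaBatch only as a policy that jointly performs warm restarts and adaptive batch size. The sketch below is not the authors' implementation; it shows one common way to express each idea separately, assuming a cosine-annealed learning rate that resets at the start of every cycle and a batch size that changes at fixed epochs. The function names, the cycle length, and the schedule values are all hypothetical.

```python
import math


def cosine_warm_restart_lr(step, cycle_len, lr_max, lr_min=0.0):
    """Cosine-annealed learning rate that jumps back to lr_max every cycle_len steps."""
    pos = step % cycle_len  # position inside the current cycle
    cosine = math.cos(math.pi * pos / cycle_len)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


def batch_size_for_epoch(epoch, schedule):
    """Return the batch size of the latest schedule entry whose start epoch has been reached.

    schedule is a list of (start_epoch, batch_size) pairs sorted by start_epoch.
    """
    size = schedule[0][1]
    for start_epoch, batch_size in schedule:
        if epoch >= start_epoch:
            size = batch_size
    return size


# Hypothetical settings, chosen only for illustration.
schedule = [(0, 16), (3, 128)]
for epoch in range(5):
    lr = cosine_warm_restart_lr(step=epoch, cycle_len=3, lr_max=0.1)
    print(epoch, batch_size_for_epoch(epoch, schedule), round(lr, 4))
```

In a real training loop the learning rate would typically be updated per iteration rather than per epoch, and the restart points would be aligned with the batch-size changes; how FasTrCaps couples the two is specified in the paper, not here.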

arXiv.org e-Print Archive

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Rethinking Learning Rate Tuning in the Era of Large Language Models

Author: Jin Hongpeng
Wang Xuyu
Wei Wenqi
Wu Yanzhao
Zhang Wenbin
Publication date: 15/09/2023

Large Language Models (LLMs) represent the recent success of deep learning in achieving remarkable human-like predictive performance. Because LLM training is prohibitively expensive, fine-tuning has become the mainstream strategy for adapting LLMs to real-world applications. The learning rate is one of the most important hyperparameters in LLM fine-tuning, with direct impact on both fine-tuning efficiency and the quality of the fine-tuned LLM. Existing learning rate policies are primarily designed for training traditional deep neural networks (DNNs) and may not work well for LLM fine-tuning. We reassess the research challenges and opportunities of learning rate tuning in the coming era of Large Language Models. This paper makes three original contributions. First, we revisit existing learning rate policies to analyze the critical challenges of learning rate tuning in the era of LLMs. Second, we present LRBench++ to benchmark learning rate policies and facilitate learning rate tuning for both traditional DNNs and LLMs. Third, our experimental analysis with LRBench++ demonstrates the key differences between LLM fine-tuning and traditional DNN training and validates our analysis.

arXiv.org e-Print Archive

Hybrid of DiffStride and Spectral Pooling in Convolutional Neural Networks

Author: Azhar Mohammad Faris
Ibad Ahmad Mustafidul
Muflikhah Lailil
Pratama Mochamad Arfan Ravy Wahyu
Rafif Sulthan
Yudistira Novanto
Publication date: 17/01/2024

Stride determines the distance between adjacent filter positions as the filter moves across the input. With a fixed stride, important information in the image may not be captured and is therefore unavailable for classification. Previous research addressed this with DiffStride, a strided convolution method that learns its own stride value. Max pooling downsampling introduces severe quantization and a constraining lower bound on preserved information. Spectral pooling relaxes this lower bound by truncating the representation in the frequency domain. In this research, we propose a CNN model that combines learnable-stride downsampling, trained by backpropagation, with spectral pooling. Together, DiffStride and spectral pooling are expected to retain most of the information contained in the image. We compare the hybrid method, a combined implementation of spectral pooling and DiffStride, against the baseline method, the DiffStride implementation on ResNet 18. The accuracy of the hybrid method improves over the DiffStride baseline by 0.0094. This shows that the hybrid method can retain most of the information by truncating the representation in the frequency domain while determining the stride through backpropagation.
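
To make the two ideas in this abstract concrete, the sketch below shows (1) how a fixed stride determines the output length of a convolution and (2) a one-dimensional spectral pooling step that keeps only the lowest frequencies of a signal. It is an illustration under simplifying assumptions, not the method evaluated in the paper: the crop size is fixed rather than learned, the signal is 1D, the target length is assumed to be odd, and a naive DFT is used so the example needs no third-party library. All function names are hypothetical.

```python
import cmath


def conv_output_size(n, kernel, stride, padding=0):
    """Output length of a convolution over n inputs with the given kernel, stride and padding."""
    return (n + 2 * padding - kernel) // stride + 1


def dft(x):
    """Naive discrete Fourier transform, O(n^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform, O(n^2)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_pool_1d(x, m):
    """Downsample x to m samples by keeping only its m lowest frequencies (m must be odd)."""
    n = len(x)
    if m % 2 == 0 or m > n:
        raise ValueError("this sketch assumes an odd m no larger than len(x)")
    X = dft(x)
    half = m // 2
    # Keep frequencies 0..half followed by the negative frequencies -half..-1.
    kept = X[:half + 1] + X[n - half:]
    # Rescale by m / n so that signal amplitudes are preserved after the shorter inverse DFT.
    return [(value * m / n).real for value in idft(kept)]


print(conv_output_size(32, kernel=3, stride=1, padding=1))
print(conv_output_size(32, kernel=3, stride=2, padding=1))
print([round(v, 6) for v in spectral_pool_1d([2.0] * 8, 3)])
```

Unlike max pooling, which can only shrink the input by an integer factor, the frequency-domain crop can target any output length, which is the property that makes a learnable, non-integer stride possible in the first place.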

arXiv.org e-Print Archive

PassGAN: A Deep Learning Approach for Password Guessing

Author: A Ciaramella
B Duc
DE Rumelhart
KP Murphy
M Dürmuth
M Frank
R Morris
Y LeCun
Z Sitová
Publication date: 14/02/2019

State-of-the-art password guessing tools, such as HashCat and John the Ripper, enable users to check billions of passwords per second against password hashes. In addition to performing straightforward dictionary attacks, these tools can expand password dictionaries using password generation rules, such as concatenation of words (e.g., "password123456") and leet speak (e.g., "password" becomes "p4s5w0rd"). Although these rules work well in practice, expanding them to model further passwords is a laborious task that requires specialized expertise. To address this issue, in this paper we introduce PassGAN, a novel approach that replaces human-generated password rules with theory-grounded machine learning algorithms. Instead of relying on manual password analysis, PassGAN uses a Generative Adversarial Network (GAN) to autonomously learn the distribution of real passwords from actual password leaks, and to generate high-quality password guesses. Our experiments show that this approach is very promising. When we evaluated PassGAN on two large password datasets, we were able to surpass rule-based and state-of-the-art machine learning password guessing tools. However, in contrast with the other tools, PassGAN achieved this result without any a priori knowledge of passwords or common password structures. Additionally, when we combined the output of PassGAN with the output of HashCat, we were able to match 51%-73% more passwords than with HashCat alone. This is remarkable, because it shows that PassGAN can autonomously extract a considerable number of password properties that current state-of-the-art rules do not encode. Comment: This is an extended version of the paper which appeared in NeurIPS 2018 Workshop on Security in Machine Learning (SecML'18), see https://github.com/secml2018/secml2018.github.io/raw/master/PASSGAN_SECML2018.pd
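
The hand-written generation rules that the abstract contrasts with PassGAN can be pictured with a small sketch. The code below is not HashCat's or John the Ripper's rule engine; it only illustrates the two rule types named in the abstract, leet-speak substitution and suffix concatenation, using a hypothetical substitution table and hypothetical function names.

```python
from itertools import product

# Hypothetical substitution table, chosen only for illustration.
LEET = {"a": "4", "s": "5", "o": "0", "e": "3"}


def leet_variants(word):
    """Yield every string formed by optionally substituting each mappable character."""
    choices = [(ch, LEET[ch]) if ch in LEET else (ch,) for ch in word]
    for combo in product(*choices):
        yield "".join(combo)


def expand(words, suffixes):
    """Expand a word list with leet-speak variants and appended suffixes."""
    candidates = set()
    for word in words:
        for variant in leet_variants(word):
            candidates.add(variant)
            for suffix in suffixes:
                candidates.add(variant + suffix)
    return candidates


candidates = expand(["password"], ["123456"])
print("p4s5w0rd" in candidates, "password123456" in candidates)
```

Every new pattern requires another hand-written rule of this kind, and the candidate set grows exponentially with the number of substitutable characters, which is the maintenance burden the abstract says a learned generator is meant to avoid.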

arXiv.org e-Print Archive

Crossref

Algorithms in future capital markets: A survey on AI, ML and associated algorithms in capital markets

Author: Firoozye N
Koshiyama A
Treleaven P
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 15/10/2020

This paper reviews Artificial Intelligence (AI), Machine Learning (ML) and associated algorithms in future Capital Markets. New AI algorithms are constantly emerging, with each 'strain' mimicking a new form of human learning, reasoning, knowledge, and decision-making. The current main disrupting forms of learning include Deep Learning, Adversarial Learning, Transfer and Meta Learning. Although these modes of learning have been in the AI/ML field for more than a decade, they are now more applicable due to the availability of data, computing power and infrastructure. These forms of learning have produced new models (e.g., Long Short-Term Memory, Generative Adversarial Networks) and enabled important applications (e.g., Natural Language Processing, Adversarial Examples, Deep Fakes, etc.). These new models and applications will drive changes in future Capital Markets, so it is important to understand their computational strengths and weaknesses. Since ML algorithms effectively self-program and evolve dynamically, financial institutions and regulators are becoming increasingly concerned with ensuring there remains a modicum of human control, focusing on Algorithmic Interpretability/Explainability, Robustness and Legality. For example, the concern is that, in the future, an ecology of trading algorithms across different institutions may 'conspire' and become unintentionally fraudulent (cf. LIBOR) or subject to subversion through compromised datasets (e.g. Microsoft Tay). New and unique forms of systemic risk can emerge, potentially coming from excessive algorithmic complexity. The contribution of this paper is to review AI, ML and associated algorithms, their computational strengths and weaknesses, and discuss their future impact on the Capital Markets.

UCL Discovery