FasTrCaps: An Integrated Framework for Fast yet Accurate Training of Capsule Networks
Recently, Capsule Networks (CapsNets) have shown improved performance compared to traditional Convolutional Neural Networks (CNNs) by better encoding and preserving the spatial relationships between detected features. This is achieved through so-called Capsules (i.e., groups of neurons) that encode both the instantiation probability and the spatial information. However, one of the major hurdles to the wide adoption of CapsNets is their very long training time, which is primarily due to the relatively higher complexity of their constituent elements, which differ from those of CNNs. In this paper, we implement different optimizations in the training loop of CapsNets and investigate how these optimizations affect their training speed and accuracy. Towards this, we propose a novel framework, FasTrCaps, that integrates multiple lightweight optimizations and a novel learning rate policy called WarmAdaBatch (which jointly performs warm restarts and adaptive batch sizing), and steers them appropriately to provide a high training-loop speedup at minimal accuracy loss. We also propose weight sharing for capsule layers. The goal is to reduce the hardware requirements of CapsNets by removing unused/redundant connections and capsules, while maintaining high accuracy, as assessed through tests of different learning rate policies and batch sizes. We demonstrate that one of the solutions generated by the FasTrCaps framework can achieve a 58.6% reduction in training time while preserving accuracy (even a 0.12% accuracy improvement on the MNIST dataset), compared to the CapsNet by Google Brain [25]. Moreover, the Pareto-optimal solutions generated by FasTrCaps can be leveraged to realize trade-offs between training time and achieved accuracy. We have open-sourced our framework on GitHub.
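The abstract does not spell out the WarmAdaBatch rule, so the following is only a minimal sketch of how a policy that "jointly performs warm restarts and adaptive batch size" could look. It assumes the warm-restart part follows SGDR-style cosine annealing, and it assumes a hypothetical batch-size rule that doubles the batch at each restart up to a cap. The function names and all default values are illustrative, not taken from FasTrCaps.

```python
import math

def cosine_warm_restart_lr(epoch, lr_max, lr_min, period):
    """SGDR-style cosine annealing that restarts at lr_max every `period` epochs."""
    t = epoch % period  # epochs elapsed since the last restart
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / period))

def adaptive_batch_size(epoch, period, base_batch, max_batch):
    """Hypothetical rule: double the batch size at every restart, capped at max_batch."""
    restarts = epoch // period
    return min(base_batch * 2 ** restarts, max_batch)

def warm_ada_batch_schedule(num_epochs, lr_max=1e-3, lr_min=1e-5, period=5,
                            base_batch=16, max_batch=128):
    """Return (epoch, learning_rate, batch_size) tuples for each epoch."""
    return [
        (e,
         cosine_warm_restart_lr(e, lr_max, lr_min, period),
         adaptive_batch_size(e, period, base_batch, max_batch))
        for e in range(num_epochs)
    ]

if __name__ == "__main__":
    for epoch, lr, bs in warm_ada_batch_schedule(12):
        print(f"epoch {epoch:2d}  lr={lr:.2e}  batch={bs}")
```

Coupling the batch-size change to the restart points keeps both knobs on one clock. A real policy could instead adapt the batch size to observed loss or throughput.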
Rethinking Learning Rate Tuning in the Era of Large Language Models
Large Language Models (LLMs) represent the recent success of deep learning in
achieving remarkable human-like predictive performance. It has become a
mainstream strategy to leverage fine-tuning to adapt LLMs for various
real-world applications due to the prohibitive expenses associated with LLM
training. The learning rate is one of the most important hyperparameters in LLM
fine-tuning with direct impacts on both fine-tuning efficiency and fine-tuned
LLM quality. Existing learning rate policies are primarily designed for
training traditional deep neural networks (DNNs), which may not work well for
LLM fine-tuning. We reassess the research challenges and opportunities of
learning rate tuning in the coming era of Large Language Models. This paper
makes three original contributions. First, we revisit existing learning rate
policies to analyze the critical challenges of learning rate tuning in the era
of LLMs. Second, we present LRBench++ to benchmark learning rate policies and
facilitate learning rate tuning for both traditional DNNs and LLMs. Third, our
experimental analysis with LRBench++ demonstrates the key differences between
LLM fine-tuning and traditional DNN training and validates our analysis.
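To make "learning rate policy" concrete, here is a sketch of three widely used policies written as plain functions of the training step. These are standard textbook formulas, not the LRBench++ API. Step decay is typical of classic CNN training, while linear warmup followed by cosine decay is a common choice for fine-tuning transformer-based LLMs. All parameter names and defaults are illustrative assumptions.

```python
import math

def fixed_lr(step, base_lr):
    """Constant learning rate for the whole run."""
    return base_lr

def step_decay_lr(step, base_lr, drop=0.1, step_size=30):
    """Multiply the learning rate by `drop` every `step_size` steps."""
    return base_lr * drop ** (step // step_size)

def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr after total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

if __name__ == "__main__":
    for s in (0, 9, 10, 55, 100):
        print(s, step_decay_lr(s, 0.1), warmup_cosine_lr(s, 1e-4, 10, 100))
```

A benchmark of policies can treat each function as an interchangeable schedule and compare the resulting training curves under the same model and data.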
Hybrid of DiffStride and Spectral Pooling in Convolutional Neural Networks
Stride determines the distance between adjacent filter positions as the
filter moves across the input. With a fixed stride, important information
contained in the image may not be captured and therefore cannot be used for
classification. Previous research therefore applied the DiffStride method, a
strided convolution method that learns its own stride value. Max pooling
downsampling suffers from severe quantization and a constraining lower bound on
preserved information. Spectral Pooling relaxes this lower bound on preserved
information by truncating the representation in the frequency domain. In this
research, we propose a CNN model that combines downsampling with a stride
learned through backpropagation and the Spectral Pooling technique. DiffStride
and Spectral Pooling are expected to preserve most of the information contained
in the image. In this study, we compare the Hybrid Method, a combined
implementation of Spectral Pooling and DiffStride, against the Baseline Method,
which is the DiffStride implementation on ResNet-18. The combination of
DiffStride with Spectral Pooling improves accuracy over the DiffStride baseline
by 0.0094. This shows that the Hybrid Method can preserve most of the
information by truncating the representation in the frequency domain while
learning the stride through backpropagation.
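The frequency-domain truncation behind Spectral Pooling can be sketched in a few lines of NumPy for a single-channel 2-D input. This is a minimal illustration of the general idea, not the authors' Hybrid Method. DiffStride's learnable, smoothly masked crop is not modeled here, and the function name `spectral_pool` is hypothetical. Taking the real part after the inverse DFT is a simplification, because cropping can break conjugate symmetry at the Nyquist frequencies of even-sized outputs.

```python
import numpy as np

def spectral_pool(x, out_h, out_w):
    """Downsample a 2-D array by keeping only its lowest out_h x out_w DFT frequencies."""
    in_h, in_w = x.shape
    # Centre the zero-frequency component so that the low frequencies are contiguous.
    spec = np.fft.fftshift(np.fft.fft2(x))
    # Choose the crop so that the DC term lands where ifftshift expects it.
    top = in_h // 2 - out_h // 2
    left = in_w // 2 - out_w // 2
    cropped = spec[top:top + out_h, left:left + out_w]
    # Undo the shift and invert the DFT, then rescale so the mean intensity is preserved.
    y = np.fft.ifft2(np.fft.ifftshift(cropped))
    return y.real * (out_h * out_w) / (in_h * in_w)

if __name__ == "__main__":
    img = np.random.default_rng(0).random((8, 8))
    pooled = spectral_pool(img, 4, 4)
    print(pooled.shape, img.mean(), pooled.mean())
```

Unlike max pooling, the output size is not restricted to integer fractions of the input, because any `out_h` and `out_w` no larger than the input can be chosen. The discarded information is the high-frequency content rather than all but one value per window.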
PassGAN: A Deep Learning Approach for Password Guessing
State-of-the-art password guessing tools, such as HashCat and John the
Ripper, enable users to check billions of passwords per second against password
hashes. In addition to performing straightforward dictionary attacks, these
tools can expand password dictionaries using password generation rules, such as
concatenation of words (e.g., "password123456") and leet speak (e.g.,
"password" becomes "p4s5w0rd"). Although these rules work well in practice,
expanding them to model further passwords is a laborious task that requires
specialized expertise. To address this issue, in this paper we introduce
PassGAN, a novel approach that replaces human-generated password rules with
theory-grounded machine learning algorithms. Instead of relying on manual
password analysis, PassGAN uses a Generative Adversarial Network (GAN) to
autonomously learn the distribution of real passwords from actual password
leaks, and to generate high-quality password guesses. Our experiments show that
this approach is very promising. When we evaluated PassGAN on two large
password datasets, we were able to surpass rule-based and state-of-the-art
machine learning password guessing tools. Unlike the other tools, however,
PassGAN achieved this result without any a priori knowledge of passwords
or common password structures. Additionally, when we combined the output of
PassGAN with the output of HashCat, we were able to match 51%-73% more
passwords than with HashCat alone. This is remarkable, because it shows that
PassGAN can autonomously extract a considerable number of password properties
that current state-of-the-art rules do not encode.
Comment: This is an extended version of the paper which appeared in NeurIPS
2018 Workshop on Security in Machine Learning (SecML'18), see
https://github.com/secml2018/secml2018.github.io/raw/master/PASSGAN_SECML2018.pd
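The two rule families mentioned above, concatenation and leet speak, can be illustrated with a small Python sketch. This is not HashCat's or John the Ripper's rule syntax. The substitution table and function names are hypothetical, chosen only to show how such rules expand a dictionary word. Each letter in the table may be kept or substituted independently, so a word with k substitutable letters yields 2^k variants. This growth is one reason that extending rule sets by hand is laborious.

```python
from itertools import product

# Hypothetical leet-speak table; real rule engines ship their own rule files.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}

def leet_variants(word):
    """Return every variant of `word` in which each substitutable letter is optionally replaced."""
    options = [(c, LEET[c]) if c in LEET else (c,) for c in word]
    return ["".join(chars) for chars in product(*options)]

def concat_variants(words, suffixes):
    """Concatenate each dictionary word with each suffix, e.g. a digit string."""
    return [w + s for w in words for s in suffixes]

if __name__ == "__main__":
    variants = leet_variants("password")
    print(len(variants), "p4s5w0rd" in variants)
    print(concat_variants(["password"], ["123456"]))
```

The candidate list for "password" contains "p4s5w0rd", the leet-speak example given above, among its 16 variants.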
Algorithms in future capital markets: A survey on AI, ML and associated algorithms in capital markets
This paper reviews Artificial Intelligence (AI), Machine Learning (ML) and associated algorithms in future Capital Markets. New AI algorithms are constantly emerging, with each 'strain' mimicking a new form of human learning, reasoning, knowledge, and decision-making. The current main disruptive forms of learning include Deep Learning, Adversarial Learning, and Transfer and Meta Learning. Although these modes of learning have been in the AI/ML field for more than a decade, they are now more applicable due to the availability of data, computing power and infrastructure. These forms of learning have produced new models (e.g., Long Short-Term Memory, Generative Adversarial Networks) and underpin important applications (e.g., Natural Language Processing, Adversarial Examples, Deep Fakes). These new models and applications will drive changes in future Capital Markets, so it is important to understand their computational strengths and weaknesses. Since ML algorithms effectively self-program and evolve dynamically, financial institutions and regulators are becoming increasingly concerned with ensuring that a modicum of human control remains, focusing on Algorithmic Interpretability/Explainability, Robustness and Legality. For example, the concern is that, in the future, an ecology of trading algorithms across different institutions may 'conspire' and become unintentionally fraudulent (cf. LIBOR) or be subverted through compromised datasets (e.g., Microsoft Tay). New and unique forms of systemic risk can emerge, potentially arising from excessive algorithmic complexity. The contribution of this paper is to review AI, ML and associated algorithms, their computational strengths and weaknesses, and discuss their future impact on the Capital Markets.