65 research outputs found
On the Analysis of Trajectories of Gradient Descent in the Optimization of Deep Neural Networks
Theoretical analysis of the error landscape of deep neural networks has
garnered significant interest in recent years. In this work, we theoretically
study the importance of noise in the trajectories of gradient descent towards
optimal solutions in multi-layer neural networks. We show that adding noise (in
different ways) to a neural network while training increases the rank of the
product of weight matrices of a multi-layer linear neural network. We thus
study how adding noise can assist in reaching a global optimum when the product
matrix is full-rank (under certain conditions). We establish theoretical
connections between the noise introduced into the neural network - either to the
gradient, to the architecture, or to the input/output of the neural network - and
the rank of the product of weight matrices. We corroborate our theoretical findings
with empirical results. Comment: 4 pages + 1 figure (main, excluding references), 5 pages + 4 figures
(appendix
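As a toy illustration of the rank claim in the abstract above, the sketch below perturbs a rank-deficient weight matrix with small Gaussian noise and compares the rank of the product before and after. It is only a hedged example: the noise is added directly to the weights, which is an assumption and not necessarily any of the noise mechanisms analysed in the paper, and the names `matmul`, `rank` and `add_noise` are hypothetical helpers.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rank(m, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]          # work on a copy
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue                   # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= f * m[r][j]
        r += 1
    return r

def add_noise(m, sigma, rng):
    """Add independent Gaussian noise to every entry (an assumed noise model)."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in m]

rng = random.Random(0)
w1 = [[1.0, 2.0], [2.0, 4.0]]          # rank 1: second row is twice the first
w2 = [[1.0, 0.0], [0.0, 1.0]]          # rank 2

rank_clean = rank(matmul(w2, w1))
rank_noisy = rank(matmul(add_noise(w2, 0.01, rng), add_noise(w1, 0.01, rng)))
print(rank_clean, rank_noisy)
```

In this toy setting the clean product inherits the rank deficiency of `w1`, while a generic perturbation makes the product full-rank.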
ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent
Two major momentum-based techniques that have achieved tremendous success in
optimization are Polyak's heavy ball method and Nesterov's accelerated
gradient. A crucial step in all momentum-based methods is the choice of the
momentum parameter, which is always suggested to be set to less than 1.
Although this choice is justified only under very strong theoretical
assumptions, it works well in practice even when the assumptions do not
necessarily hold. In this paper, we propose a new momentum-based method,
ADINE, which relaxes the constraint of a momentum below 1 and allows the
learning algorithm to use adaptive higher momentum. We motivate our hypothesis
by experimentally verifying that a higher momentum (1 or above) can help
escape saddles much faster. Using this motivation, we propose our method,
ADINE, that helps weigh the previous updates more (by setting the
momentum parameter above 1), evaluate our proposed algorithm on deep neural
networks and show that ADINE helps the learning algorithm to
converge much faster without compromising on the generalization error. Comment: 8 + 1 pages, 12 figures, accepted at CoDS-COMAD 201
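For readers unfamiliar with the role of the momentum parameter, here is a minimal sketch of the classical Polyak heavy-ball update on a one-dimensional quadratic. It is not ADINE's adaptive rule, which the abstract does not specify; the function name `heavy_ball`, the learning rate and the toy objective are assumptions chosen for illustration.

```python
def heavy_ball(grad, x0, lr=0.1, momentum=0.9, steps=300):
    """Polyak heavy-ball: v <- momentum * v - lr * grad(x); x <- x + v.

    The momentum parameter controls how strongly previous updates
    are weighted in the current step.
    """
    x, v = x0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(x)
        x = x + v
    return x

# Toy objective f(x) = x^2 / 2, whose gradient is x and whose minimizer is 0.
x_final = heavy_ball(lambda x: x, x0=5.0)
print(x_final)

# With momentum = 0 the update reduces to plain gradient descent.
x_one_step_gd = heavy_ball(lambda x: x, x0=5.0, momentum=0.0, steps=1)
print(x_one_step_gd)
```

Setting `momentum` closer to (or, as the abstract proposes, above) 1 gives the accumulated velocity more weight relative to the fresh gradient.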
DANTE: Deep AlterNations for Training nEural networks
We present DANTE, a novel method for training neural networks using the
alternating minimization principle. DANTE provides an alternate perspective to
traditional gradient-based backpropagation techniques commonly used to train
deep networks. It utilizes an adaptation of quasi-convexity to cast training a
neural network as a bi-quasi-convex optimization problem. We show that for
neural network configurations with both differentiable (e.g. sigmoid) and
non-differentiable (e.g. ReLU) activation functions, we can perform the
alternations effectively in this formulation. DANTE can also be extended to
networks with multiple hidden layers. In experiments on standard datasets,
neural networks trained using the proposed method were found to be promising
and competitive with traditional backpropagation techniques, both in terms of
quality of the solution and training speed. Comment: 19 page
A database management system for selection of steel
Selection of an ideal material for a given application would be a relatively simple matter if perfect or near-perfect materials were available. Such a material would have high strength, high toughness, good ductility and good fabricability. These properties are not necessarily
compatible in an existing material. Compromises and trade-offs among various properties become inevitable. On the other hand, the infinite number of possible materials in various forms and their usage are so intertwined in all industries that a person can have no real comprehension
of the characteristics of all the materials. Computer assistance, in the form of either software or a database, is therefore unavoidable. A database is developed to assist in the selection of steels for scientific and engineering applications. A program is written in FoxPro
to identify the ideal steel based on its tensile strength, elongation, toughness and hardness. The program can select suitable steels and can generate the forging temperature, heat treatment procedure, etc. The input data are obtained from various handbooks and textbooks. A Pentium-586 with FoxPro is used to build the prototype database
management system in a DOS environment. Human interaction with the system is enhanced by user-friendly menus. The database can easily be made compatible with a wide variety of micro, mini and mainframe computers.
A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization
Loss landscape analysis is extremely useful for a deeper understanding of the
generalization ability of deep neural network models. In this work, we propose
a layerwise loss landscape analysis where the loss surface at every layer is
studied independently, along with how each correlates to the overall loss
surface. We study the layerwise loss landscape by studying the eigenspectra of
the Hessian at each layer. In particular, our results show that the layerwise
Hessian geometry is largely similar to the entire Hessian. We also report an
interesting phenomenon where the Hessian eigenspectrum of the middle layers of the
deep neural network is observed to be most similar to the overall Hessian
eigenspectrum. We also show that the maximum eigenvalue and the trace of the
Hessian (both full network and layerwise) reduce as training of the network
progresses. We leverage these observations to propose a new regularizer
based on the trace of the layerwise Hessian. Penalizing the trace of the
Hessian at every layer indirectly forces Stochastic Gradient Descent to
converge to flatter minima, which are shown to have better generalization
performance. In particular, we show that such a layerwise regularizer can be
leveraged to penalize the middlemost layers alone, which yields promising
results. Our empirical studies on well-known deep nets across datasets support
the claims of this work. Comment: Accepted at AAAI 202
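The abstract above does not say how the Hessian trace is computed. One common way to estimate a trace without forming the Hessian is Hutchinson's estimator, sketched below under stated assumptions: the Hessian-vector product uses central finite differences of a gradient function (a stand-in for automatic differentiation), and `hvp`, `hutchinson_trace` and `toy_grad` are hypothetical names, not the authors' code.

```python
import random

def hvp(grad, w, v, eps=1e-5):
    """Finite-difference Hessian-vector product:
    (grad(w + eps*v) - grad(w - eps*v)) / (2*eps)."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad(w_plus), grad(w_minus)
    return [(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)]

def hutchinson_trace(grad, w, num_samples=200, seed=0):
    """Estimate trace(H) as the average of v^T H v over random sign vectors v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in w]   # Rademacher probe
        hv = hvp(grad, w, v)
        total += sum(vi * hi for vi, hi in zip(v, hv))
    return total / num_samples

# Toy quadratic loss with Hessian [[2, 1], [1, 3]], so the exact trace is 5.
def toy_grad(w):
    return [2.0 * w[0] + 1.0 * w[1], 1.0 * w[0] + 3.0 * w[1]]

trace_estimate = hutchinson_trace(toy_grad, [0.2, -0.1])
print(trace_estimate)
```

In a regularizer of the kind the abstract describes, an estimate like this would be computed per layer and added to the loss with a weighting coefficient; that coefficient and the choice of layers are not given in the abstract.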
Similarity-based Contrastive Divergence Methods for Energy-based Deep Learning Models
Energy-based deep learning models like Restricted Boltzmann Machines are increasingly used for real-world applications. However, all these models inherently depend on the Contrastive Divergence (CD) method for training and maximization of log likelihood of generating the given data distribution. CD, which internally uses Gibbs sampling, often does not perform well due to issues such as biased samples, poor mixing of Markov chains and high-mass probability modes. Variants of CD such as PCD, Fast PCD and Tempered MCMC have been proposed to address this issue. In this work, we propose a new approach to CD-based methods, called Diss-CD, which uses dissimilar data to allow the Markov chain to explore new modes in the probability space. This method can be used with all variants of CD (or PCD), and across all energy-based deep learning models. Our experiments on using this approach on standard datasets including MNIST, Caltech-101 Silhouette and Synthetic Transformations, demonstrate the promise of this approach, showing fast convergence of error in learning and also a better approximation of log likelihood of the data.
Deep Learning Frameworks for Cardiovascular Arrhythmia Classification
Arrhythmia classification is a prominent research problem due to the computational complexities of learning the morphology of various ECG patterns and its wide prevalence in the medical field, particularly during the COVID-19 pandemic. In this article, we used Empirical Mode Decomposition and Discrete Wavelet Transform for preprocessing, and the modified signal is then classified using various classifiers such as Decision Tree, Logistic Regression, Gaussian Naïve Bayes, Random Forest, Linear SVM, Polynomial SVM, RBF SVM, Sigmoid SVM and Convolutional Neural Networks. The proposed method classifies the data into five classes, N (normal beat), S (supraventricular premature beat), V (premature ventricular contraction), F (fusion of ventricular and normal beat), and Q (unclassifiable beat), using a softmax regressor at the end of the network. The proposed approach performs well in terms of classification accuracy when tested using ECG signals acquired from the MIT-BIH database. In comparison to existing classifiers, the Accuracy, Precision, Recall, and F1 score values of the proposed technique are 98.5%, 96.9%, 94.3%, and 91.32%, respectively.
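The abstract above reports accuracy, precision, recall and F1 over five beat classes but does not state the averaging scheme. The sketch below shows how such metrics could be computed with macro averaging, which is an assumption; the function name `macro_metrics` and the six toy labels are hypothetical and are not data from the paper.

```python
def macro_metrics(y_true, y_pred, classes):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Hypothetical toy labels using the five beat classes named in the abstract.
classes = ["N", "S", "V", "F", "Q"]
y_true = ["N", "N", "S", "V", "F", "Q"]
y_pred = ["N", "S", "S", "V", "F", "Q"]

accuracy, precision, recall, f1 = macro_metrics(y_true, y_pred, classes)
print(accuracy, precision, recall, f1)
```

Macro averaging weights every class equally, which matters for beat datasets where normal beats usually dominate; a weighted or micro average would give different numbers on the same predictions.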
Phytochemical Screening and Antimicrobial Activity of the Leaf Extract of Mirabilis jalapa Against Pathogenic Microorganisms
Investigation of the phytochemical constituents and antimicrobial activity of the leaf extracts of Mirabilis jalapa was carried out using acetone, chloroform, ethanol and methanol. These extracts were subjected to preliminary phytochemical screening tests. Phytochemical analysis showed the presence of alkaloids, flavonoids, phenols, glycosides, tannins, saponins and lignins. The methanol extract exhibited the largest zone of inhibition (21 mm in diameter with 500 μg/disc extract) against Staphylococcus aureus and the highest inhibition of fungal radial mycelial growth (97.5% with 500 μg/ml medium) against Aspergillus flavus. The methanol extract exhibited the lowest MIC against Staphylococcus aureus (39 μg/ml) and Aspergillus flavus (45 μg/ml). It appeared that M. jalapa could be a potential natural source of new antimicrobial agents. Keywords: Mirabilis jalapa, leaf extract, phytochemicals, antimicrobial activity
A framework for human microbiome research
A variety of microbial communities and their genes (the microbiome) exist throughout the human body, with fundamental roles in human health and disease. The National Institutes of Health (NIH)-funded Human Microbiome Project Consortium has established a population-scale framework to develop metagenomic protocols, resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far. In parallel, approximately 800 reference strains isolated from the human body have been sequenced. Collectively, these data represent the largest resource describing the abundance and variety of the human microbiome, while providing a framework for current and future studies
Structure, function and diversity of the healthy human microbiome
Author Posting. © The Authors, 2012. This article is posted here by permission of Nature Publishing Group. The definitive version was published in Nature 486 (2012): 207-214, doi:10.1038/nature11234. Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far. We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81–99% of the genera, enzyme families and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology and translational applications of the human microbiome. This research was supported in
part by National Institutes of Health grants U54HG004969 to B.W.B.; U54HG003273
to R.A.G.; U54HG004973 to R.A.G., S.K.H. and J.F.P.; U54HG003067 to E.S.Lander;
U54AI084844 to K.E.N.; N01AI30071 to R.L.Strausberg; U54HG004968 to G.M.W.;
U01HG004866 to O.R.W.; U54HG003079 to R.K.W.; R01HG005969 to C.H.;
R01HG004872 to R.K.; R01HG004885 to M.P.; R01HG005975 to P.D.S.;
R01HG004908 to Y.Y.; R01HG004900 to M.K.Cho and P. Sankar; R01HG005171 to
D.E.H.; R01HG004853 to A.L.M.; R01HG004856 to R.R.; R01HG004877 to R.R.S. and
R.F.; R01HG005172 to P. Spicer; R01HG004857 to M.P.; R01HG004906 to T.M.S.;
R21HG005811 to E.A.V.; M.J.B. was supported by UH2AR057506; G.A.B. was
supported by UH2AI083263 and UH3AI083263 (G.A.B., C. N. Cornelissen, L. K. Eaves
and J. F. Strauss); S.M.H. was supported by UH3DK083993 (V. B. Young, E. B. Chang,
F. Meyer, T. M. S., M. L. Sogin, J. M. Tiedje); K.P.R. was supported by UH2DK083990 (J.
V.); J.A.S. and H.H.K. were supported by UH2AR057504 and UH3AR057504 (J.A.S.);
DP2OD001500 to K.M.A.; N01HG62088 to the Coriell Institute for Medical Research;
U01DE016937 to F.E.D.; S.K.H. was supported by RC1DE0202098 and
R01DE021574 (S.K.H. and H. Li); J.I. was supported by R21CA139193 (J.I. and
D. S. Michaud); K.P.L. was supported by P30DE020751 (D. J. Smith); Army Research
Office grant W911NF-11-1-0473 to C.H.; National Science Foundation grants NSF
DBI-1053486 to C.H. and NSF IIS-0812111 to M.P.; The Office of Science of the US
Department of Energy under Contract No. DE-AC02-05CH11231 for P.S.C.; LANL
Laboratory-Directed Research and Development grant 20100034DR and the US
Defense Threat Reduction Agency grants B104153I and B084531I to P.S.C.; Research
Foundation - Flanders (FWO) grant to K.F. and J.Raes; R.K. is an HHMI Early Career
Scientist; Gordon & Betty Moore Foundation funding and institutional funding from the
J. David Gladstone Institutes to K.S.P.; A.M.S. was supported by fellowships provided by
the Rackham Graduate School and the NIH Molecular Mechanisms in Microbial
Pathogenesis Training Grant T32AI007528; a Crohn’s and Colitis Foundation of
Canada Grant in Aid of Research to E.A.V.; 2010 IBM Faculty Award to K.C.W.; analysis
of the HMP data was performed using National Energy Research Scientific Computing
resources, the BluBioU Computational Resource at Rice University
- …