BayesNAS: A Bayesian Approach for Neural Architecture Search
One-Shot Neural Architecture Search (NAS) is a promising method to
significantly reduce search time without any separate training. It can be
treated as a Network Compression problem on the architecture parameters from an
over-parameterized network. However, there are two issues associated with most
one-shot NAS methods. First, dependencies between a node and its predecessors and successors are often disregarded, which results in improper treatment of zero operations. Second, pruning architecture parameters based on their magnitude is questionable. In this paper, we employ the classic Bayesian
learning approach to alleviate these two issues by modeling architecture
parameters using hierarchical automatic relevance determination (HARD) priors.
Unlike other NAS methods, we train the over-parameterized network for only one epoch and then update the architecture. Impressively, this enabled us to find the architecture on CIFAR-10 within only 0.2 GPU days using a single GPU. Competitive performance can also be achieved by transferring the architecture to ImageNet. As a byproduct, our approach can be applied directly to compress convolutional neural networks by enforcing structural sparsity, which achieves extremely sparse networks without accuracy deterioration.
Comment: International Conference on Machine Learning 201
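The gap between the two pruning criteria discussed above can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: the parameter values, the precisions, and the threshold are invented, and the Bayesian posterior updates implied by the hierarchical ARD priors are omitted.

```python
def magnitude_prune(alpha, keep=2):
    """The criterion the paper questions: keep the `keep` operations with the
    largest |alpha| on an edge of the over-parameterized network."""
    order = sorted(range(len(alpha)), key=lambda i: -abs(alpha[i]))
    kept = set(order[:keep])
    return [i in kept for i in range(len(alpha))]

def ard_prune(precisions, threshold=1e2):
    """ARD-flavored criterion: a large learned prior precision pins a
    parameter toward zero, so its operation is deemed irrelevant regardless
    of the parameter's current magnitude."""
    return [p < threshold for p in precisions]

alpha = [0.9, -0.7, 0.05, 0.3]     # architecture parameters on one edge (toy)
precisions = [1.0, 5e3, 2.0, 8e2]  # hypothetical learned ARD precisions

print(magnitude_prune(alpha))  # [True, True, False, False]
print(ard_prune(precisions))   # [True, False, True, False]
```

The point of the contrast: the second operation has the second-largest magnitude but a huge learned precision, so the ARD view prunes it while magnitude ranking keeps it.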
Microstructure Evolution and Surface Cracking Behavior of Superheavy Forgings during Hot Forging
In recent years, superheavy forgings manufactured from 600 t grade ingots have been applied in the latest generation of nuclear power plants to ensure safety. However, producing such components is pushing the limits of the current free-forging industry. Large initial grain sizes and a low strain rate are the main factors affecting the deformation of superheavy forgings during forging. In this study, 18Mn18Cr0.6N steel with a coarse grain structure was selected as a model material. Hot compression and hot tension tests were conducted at a strain rate of 10⁻⁴ s⁻¹. The essential nucleation mechanism of dynamic recrystallization involved low-angle grain boundary formation and subgrain rotation, which was independent of the original high-angle grain boundary bulging and the presence of twins. Twins were formed during the growth of dynamically recrystallized grains. Grain refinement was not obvious at 1150°C. Lowering the deformation temperature to 1050°C resulted in a fine grain structure; however, the stress increased significantly. Crack-propagation paths included high-angle grain boundaries, twin boundaries, and the insides of grains, in that order. For superheavy forgings, the ingot should have a larger height and a smaller diameter.
Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation
Background: DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities, ranging from DNA replication to gene expression control. Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation. Several computational methods have been proposed in the literature for DNA-binding protein identification; however, most of them do not provide a valuable knowledge base for understanding DNA-protein interactions. Results: We first present a new protein sequence encoding method called PSSM Distance Transformation, and then construct a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, the PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are transformed into uniform numeric representations by the distance transformation scheme. Lastly, the resulting representations are fed into an SVM classifier for prediction, so that whether a sequence binds DNA can be determined. In a benchmark test on 525 DNA-binding and 550 non-DNA-binding proteins using jackknife validation, the present model achieved an ACC of 79.96%, an MCC of 0.622 and an AUC of 86.50%. This performance is considerably better than that of most existing state-of-the-art predictive methods. When tested on a recently constructed independent dataset, PDB186, SVM-PSSM-DT also achieved the best performance, with an ACC of 80.00%, an MCC of 0.647 and an AUC of 87.40%, outperforming some existing state-of-the-art methods. Conclusions: The experimental results demonstrate that PSSM Distance Transformation is an effective protein sequence encoding method and SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins.
A user-friendly web server for SVM-PSSM-DT was constructed and is freely accessible at http://bioinformatics.hitsz.edu.cn/PSSM-DT/
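The three-step pipeline (PSSM generation, distance transformation, SVM classification) invites a short sketch of the middle step. The abstract does not give the exact transformation formula, so the pairwise, distance-weighted accumulation below is a hypothetical form; it only shows how a variable-length L×20 PSSM profile can be mapped to a fixed-length feature vector (a 4-letter alphabet is used in the demo to keep it small).

```python
def pssm_distance_transform(pssm):
    """Map an L x A PSSM profile to a fixed-length A*A vector by summing
    PSSM-score products over all position pairs, weighted by 1/distance
    (the weighting is an assumption, not the paper's formula)."""
    L, A = len(pssm), len(pssm[0])
    feat = [[0.0] * A for _ in range(A)]
    for i in range(L):
        for j in range(i + 1, L):
            w = 1.0 / (j - i)  # nearby positions contribute more
            for a in range(A):
                for b in range(A):
                    feat[a][b] += w * pssm[i][a] * pssm[j][b]
    return [v for row in feat for v in row]  # flatten: length A*A, not L

toy_pssm = [  # 3 positions x 4-letter alphabet, invented scores
    [1.0, 0.0, 0.5, 0.2],
    [0.3, 0.8, 0.1, 0.0],
    [0.0, 0.4, 0.9, 0.6],
]
print(len(pssm_distance_transform(toy_pssm)))  # 16, regardless of L
```

Because the output length is independent of the sequence length, such vectors can be fed directly to any SVM implementation (e.g. scikit-learn's `SVC`).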
Rethinking Learning Rate Tuning in the Era of Large Language Models
Large Language Models (LLMs) represent the recent success of deep learning in
achieving remarkable human-like predictive performance. It has become a
mainstream strategy to leverage fine-tuning to adapt LLMs for various
real-world applications due to the prohibitive expenses associated with LLM
training. The learning rate is one of the most important hyperparameters in LLM
fine-tuning with direct impacts on both fine-tuning efficiency and fine-tuned
LLM quality. Existing learning rate policies are primarily designed for
training traditional deep neural networks (DNNs), which may not work well for
LLM fine-tuning. We reassess the research challenges and opportunities of
learning rate tuning in the coming era of Large Language Models. This paper
makes three original contributions. First, we revisit existing learning rate
policies to analyze the critical challenges of learning rate tuning in the era
of LLMs. Second, we present LRBench++ to benchmark learning rate policies and
facilitate learning rate tuning for both traditional DNNs and LLMs. Third, our
experimental analysis with LRBench++ demonstrates the key differences between
LLM fine-tuning and traditional DNN training, and validates our analysis.
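The kind of comparison such a benchmark automates can be sketched on a toy objective. This is not LRBench++; the three policies and the one-dimensional quadratic below are illustrative stand-ins.

```python
import math

def run(policy, steps=100, w=0.0):
    """Gradient descent on f(w) = (w - 3)^2 under a learning-rate policy."""
    for t in range(steps):
        grad = 2 * (w - 3)
        w -= policy(t) * grad
    return (w - 3) ** 2  # final loss

policies = {
    "constant":   lambda t: 0.01,
    "step_decay": lambda t: 0.1 * (0.5 ** (t // 25)),  # halve every 25 steps
    "cosine":     lambda t: 0.05 * (1 + math.cos(math.pi * t / 100)) / 2,
}

for name, policy in policies.items():
    print(f"{name:10s} final loss = {run(policy):.2e}")
```

In a real benchmark the objective is a full training run and the metrics include accuracy and cost, but the structure is the same: sweep the policies, compare the outcomes.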
Prediction of TF-binding site by inclusion of higher order position dependencies
Most proposed methods for TF-binding site (TFBS) prediction use only low-order dependencies, owing to the lack of efficient methods for extracting higher-order dependencies. In this work, we first propose a novel method to extract higher-order dependencies by applying a CNN to histone modification features. We then propose a novel TFBS prediction method, referred to as CNN_TF, that incorporates both low-order and higher-order dependencies. CNN_TF is first evaluated on 13 TFs in mES cells. Results show that using higher-order dependencies significantly outperforms using low-order dependencies on 11 TFs, indicating that higher-order dependencies are indeed more effective for TFBS prediction. Further experiments show that using both low-order and higher-order dependencies improves performance significantly on 12 TFs, indicating that the two dependency types are complementary. To evaluate the influence of cell type on prediction performance, CNN_TF was applied to five TFs in five human cell types. Even though low-order and higher-order dependencies contribute differently in different cell types, they are always complementary in prediction. Compared with several state-of-the-art methods, CNN_TF outperforms them by at least 5.3% in AUPR.
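The low-order versus higher-order distinction can be made concrete with simple sequence features. In the paper the higher-order dependencies come from a CNN over histone modification features; the k-mer frequencies below are only a minimal stand-in for the idea that features spanning several adjacent positions carry information that single-position features miss.

```python
from collections import Counter

def low_order(seq):
    """Single-position nucleotide frequencies: no dependencies captured."""
    c = Counter(seq)
    return {b: c[b] / len(seq) for b in "ACGT"}

def higher_order(seq, k=3):
    """k-mer frequencies: dependencies among k adjacent positions."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    c = Counter(kmers)
    return {m: c[m] / len(kmers) for m in c}

seq = "ACGTACGTTTAC"
features = {**low_order(seq), **higher_order(seq)}  # complementary feature sets
print(sorted(features))
```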
MTTFsite: cross-cell-type TF binding site prediction by using multi-task learning
Motivation
The prediction of transcription factor binding sites (TFBSs) is crucial for gene expression analysis. Supervised learning approaches for TFBS prediction require large amounts of labeled data. However, for many TFs in certain cell types, labeled data are insufficient or entirely unavailable.
Results
In this paper, a multi-task learning framework (called MTTFsite) is proposed to address the lack of labeled data by leveraging labeled data available across cell types. The proposed MTTFsite contains a shared CNN that learns common features for all cell types and a private CNN for each cell type that learns cell-type-specific features. The common features help predict TFBSs for all cell types, especially those that lack labeled data. MTTFsite is evaluated on 241 cell-type-TF pairs and compared with a baseline method that uses no multi-task learning and with a fully shared multi-task model that uses only a shared CNN and no private CNNs. For cell types with insufficient labeled data, results show that MTTFsite performs better than the baseline method and the fully shared model on more than 89% of the pairs. For cell types without any labeled data, MTTFsite outperforms the baseline method and the fully shared model on more than 80% and 93% of the pairs, respectively. A novel gene expression prediction method (called TFChrome) using both MTTFsite and histone modification features is also presented. Results show that TFBSs predicted by MTTFsite alone achieve good performance; when MTTFsite is combined with histone modification features, a significant 5.7% performance improvement is obtained.
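The shared/private architecture can be sketched as a forward pass. Random linear maps stand in for the paper's shared and private CNNs, and the cell-type names are just examples; the point is that the shared features are identical across cell types while the private features differ.

```python
import random
random.seed(0)

def linear(dim_in, dim_out):
    """Random linear map standing in for a trained CNN feature extractor."""
    return [[random.gauss(0, 0.1) for _ in range(dim_in)] for _ in range(dim_out)]

def apply_map(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

DIM, HID = 8, 4
shared = linear(DIM, HID)                                       # one extractor for ALL cell types
private = {ct: linear(DIM, HID) for ct in ("GM12878", "K562")}  # one per cell type

def forward(cell_type, x):
    # concatenate common features with cell-type-specific features
    return apply_map(shared, x) + apply_map(private[cell_type], x)

x = [1.0] * DIM
print(len(forward("K562", x)))  # 8 = HID shared + HID private
```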