Accelerating Deep Learning with Shrinkage and Recall
Deep Learning is a powerful machine learning model, but training its large number of parameters across multiple layers is very slow when the data set is large-scale and the architecture is big. Inspired by the shrinking technique used to accelerate the training of Support Vector Machines (SVM) and the screening technique used in LASSO, we propose a shrinking Deep Learning with recall (sDLr) approach to speed up deep learning computation. We evaluate sDLr with Deep Neural Networks (DNN), Deep Belief Networks (DBN) and Convolutional Neural Networks (CNN) on 4 data sets. Results show that the speedup achieved by sDLr can exceed 2.0 while still giving competitive classification performance.
Comment: The 22nd IEEE International Conference on Parallel and Distributed Systems (ICPADS 2016)
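The shrink-and-recall idea described above might be sketched as follows. This is an illustrative toy on logistic regression, not the paper's sDLr implementation: examples whose loss has become negligible are temporarily dropped from the active set ("shrinking"), and the full training set is periodically restored ("recall"). The function name, tolerance, and schedule are all made-up assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_with_shrinkage(X, y, epochs=50, lr=0.1, shrink_tol=0.05, recall_every=10):
    """Toy gradient-descent trainer illustrating shrink-and-recall:
    low-loss examples are dropped from the active set, and the full
    set is periodically recalled so no example is lost for good."""
    n, d = X.shape
    w = np.zeros(d)
    active = np.arange(n)              # start with every example active
    for epoch in range(epochs):
        if epoch % recall_every == 0:
            active = np.arange(n)      # recall: restore the full training set
        Xa, ya = X[active], y[active]
        p = sigmoid(Xa @ w)
        w -= lr * Xa.T @ (p - ya) / len(active)
        # shrink: keep only examples still contributing noticeable loss
        loss = -(ya * np.log(p + 1e-12) + (1 - ya) * np.log(1 - p + 1e-12))
        active = active[loss > shrink_tol]
        if len(active) == 0:           # everything converged: recall early
            active = np.arange(n)
    return w
```

Because later epochs touch only the shrunken active set, each pass gets cheaper as training converges, which is where the speedup comes from.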
Naïve Bayes: Machine Learning and Text Classification Application of Bayes’ Theorem
This paper introduces the basics of classification and machine learning and builds an application of one classification model. The chosen model is based on Bayes’ Theorem and adapts it to handle large datasets. The paper also introduces a more meaningful precision-recall measure suited to machine learning algorithms.
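A Bayes’-Theorem-based text classifier of the kind described above can be sketched as a minimal multinomial Naive Bayes with add-one (Laplace) smoothing. This is a generic illustration, not the paper's application; the class name and smoothing choice are assumptions.

```python
import math
from collections import Counter

class NaiveBayesText:
    """Minimal multinomial Naive Bayes text classifier with Laplace smoothing."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc.split())
        self.vocab = {w for cnt in self.word_counts.values() for w in cnt}
        return self

    def predict(self, doc):
        n = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for c in self.classes:
            # log P(c) + sum over words of log P(w | c), add-one smoothed
            lp = math.log(self.class_counts[c] / n)
            total = sum(self.word_counts[c].values())
            for w in doc.split():
                lp += math.log((self.word_counts[c][w] + 1) /
                               (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

Working in log-space, as above, is what lets Naive Bayes scale to large datasets without underflow from multiplying many small probabilities.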
HMM word-to-phrase alignment with dependency constraints
In this paper, we extend the HMM word-to-phrase alignment model with syntactic dependency constraints. The syntactic dependencies between multiple words in one language are introduced into the model in a bid to produce coherent alignments. Our experimental results on a variety of Chinese–English data show that our syntactically constrained model can yield up to a 3.24% relative improvement in BLEU score over current HMM word-to-phrase alignment models on a phrase-based statistical machine translation system when the training data is small, and performance comparable to IBM Model 4 on a Hiero-style system with larger training data. An intrinsic alignment quality evaluation shows that our alignment model with dependency constraints improves both precision (by 1.74% relative) and recall (by 1.75% relative) over the model without dependency information.
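The HMM alignment search underlying models like the one above can be sketched as a Viterbi pass over source positions: each target word is emitted from some source word, and a jump distribution scores how alignment positions move. The paper's dependency constraints would be folded into the jump score; the toy below omits them, and all probabilities and names are made-up illustrative values.

```python
import math

def viterbi_align(trans_prob, jump_prob, src, tgt):
    """Toy HMM word-alignment Viterbi.
    trans_prob[(s, t)] = P(t | s); jump_prob[d] = P(jump of size d).
    Returns, for each target word, the index of its aligned source word."""
    n = len(src)
    # best[j][i]: best log-prob of aligning tgt[:j+1] with tgt[j] at src pos i
    best = [[-math.inf] * n for _ in tgt]
    back = [[0] * n for _ in tgt]
    for i in range(n):
        best[0][i] = math.log(trans_prob.get((src[i], tgt[0]), 1e-9))
    for j in range(1, len(tgt)):
        for i in range(n):
            emit = math.log(trans_prob.get((src[i], tgt[j]), 1e-9))
            for k in range(n):
                score = best[j - 1][k] + math.log(jump_prob.get(i - k, 1e-9)) + emit
                if score > best[j][i]:
                    best[j][i], back[j][i] = score, k
    # backtrace the best path
    i = max(range(n), key=lambda x: best[-1][x])
    align = [i]
    for j in range(len(tgt) - 1, 0, -1):
        i = back[j][i]
        align.append(i)
    return align[::-1]
```

A syntactically constrained variant would penalize jumps that split dependency-linked source words across distant target spans, which is what pushes the model toward coherent alignments.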
Exploiting citation networks for large-scale author name disambiguation
We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only upon the author and citation graphs available for the whole period covered by the WoS. A pair-wise publication similarity metric, based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects individual papers and then merges similar clusters. This parameterized model is optimized using an h-index-based recall measure, favoring the correct assignment of well-cited publications, and a name-initials-based precision measure using WoS metadata and cross-referenced Google Scholar profiles. Despite the use of limited metadata, we reach a recall of 87% and a precision of 88%, with a preference for researchers with high h-index values. The 47 million articles of the WoS can be disambiguated on a single machine in less than a day. We develop an h-index distribution model, confirm that its prediction is in excellent agreement with the empirical data, and gain insight into the utility of the h-index in real academic ranking scenarios.
Comment: 14 pages, 5 figures
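The first clustering step described above, connecting individual papers via a pairwise similarity over shared metadata, might be sketched as single-link clustering with a union-find structure. The Jaccard overlap, the equal feature weights, and the threshold below are illustrative assumptions, not the paper's tuned parameters.

```python
def jaccard(a, b):
    """Overlap between two sets of metadata items (co-authors, references)."""
    return len(a & b) / len(a | b) if a or b else 0.0

def cluster_papers(papers, threshold=0.3):
    """Toy single-link agglomerative clustering: papers whose combined
    co-author/reference similarity exceeds `threshold` are assumed to
    share an author. `papers` maps paper id -> (coauthors, references)."""
    parent = {p: p for p in papers}

    def find(x):                       # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ids = list(papers)
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            sim = (0.5 * jaccard(papers[p][0], papers[q][0])
                   + 0.5 * jaccard(papers[p][1], papers[q][1]))
            if sim >= threshold:
                parent[find(p)] = find(q)   # link the two papers' clusters

    clusters = {}
    for p in ids:
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())
```

In the full pipeline, a second pass would then merge similar clusters, with the threshold and feature weights optimized against the h-index-based recall and name-initials-based precision measures.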