Distributed Representations of Words and Phrases and their Compositionality
The recently introduced continuous Skip-gram model is an efficient method for
learning high-quality distributed vector representations that capture a large
number of precise syntactic and semantic word relationships. In this paper we
present several extensions that improve both the quality of the vectors and the
training speed. By subsampling the frequent words we obtain a significant
speedup and also learn more regular word representations. We also describe a
simple alternative to the hierarchical softmax called negative sampling. An
inherent limitation of word representations is their indifference to word order
and their inability to represent idiomatic phrases. For example, the meanings
of "Canada" and "Air" cannot be easily combined to obtain "Air Canada".
Motivated by this example, we present a simple method for finding phrases in
text, and show that learning good vector representations for millions of
phrases is possible.
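A rough sketch of the two ingredients this abstract highlights, following the formulas given in the paper: discarding frequent words with probability 1 − sqrt(t/f(w)), and scoring bigrams as (count(ab) − δ)/(count(a)·count(b)) to surface phrases such as "Air Canada". The corpus handling, default thresholds, and function names below are illustrative assumptions, not the authors' reference implementation.

```python
# Sketch of subsampling and phrase scoring from the Skip-gram extensions paper.
# Thresholds (t, delta) and the token-list input format are assumptions.
import math
import random
from collections import Counter

def subsample(tokens, t=1e-5):
    """Randomly discard frequent words; keep w with probability min(1, sqrt(t / f(w)))."""
    counts = Counter(tokens)
    total = len(tokens)
    kept = []
    for w in tokens:
        f = counts[w] / total
        if random.random() < min(1.0, math.sqrt(t / f)):
            kept.append(w)
    return kept

def score_phrases(tokens, delta=5.0):
    """Score candidate bigrams; high scores indicate likely phrases such as 'Air Canada'."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        (a, b): (c - delta) / (unigrams[a] * unigrams[b])
        for (a, b), c in bigrams.items()
    }
```

The paper runs this phrase scoring over the corpus in several passes with a decreasing threshold, merging the detected bigrams into single tokens after each pass.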
Recurrent Models of Visual Attention
Applying convolutional neural networks to large images is computationally
expensive because the amount of computation scales linearly with the number of
image pixels. We present a novel recurrent neural network model that is capable
of extracting information from an image or video by adaptively selecting a
sequence of regions or locations and only processing the selected regions at
high resolution. Like convolutional neural networks, the proposed model has a
degree of translation invariance built-in, but the amount of computation it
performs can be controlled independently of the input image size. While the
model is non-differentiable, it can be trained using reinforcement learning
methods to learn task-specific policies. We evaluate our model on several image
classification tasks, where it significantly outperforms a convolutional neural
network baseline on cluttered images, and on a dynamic visual control problem,
where it learns to track a simple object without an explicit training signal
for doing so.
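A rough sketch of the glimpse operation the abstract describes: cropping a few concentric patches around a chosen location and resizing them to a fixed resolution, so the per-step computation is independent of the input image size. The patch sizes, the nearest-neighbour resize, and the NumPy-only setup are illustrative assumptions; the paper's full model wraps this in a recurrent network and a location policy trained with reinforcement learning.

```python
# Sketch: multi-resolution glimpse extraction at a fixed cost per step.
# Grayscale images and nearest-neighbour resizing are simplifying assumptions.
import numpy as np

def extract_glimpse(image, center, size=8, scales=3):
    """Crop `scales` concentric patches around `center`, each twice as large as
    the previous, and resize each to `size` x `size` by nearest-neighbour sampling."""
    h, w = image.shape
    patches = []
    for s in range(scales):
        half = (size * 2 ** s) // 2
        y, x = center
        y0, y1 = max(0, y - half), min(h, y + half)
        x0, x1 = max(0, x - half), min(w, x + half)
        patch = image[y0:y1, x0:x1]
        ys = np.linspace(0, patch.shape[0] - 1, size).astype(int)
        xs = np.linspace(0, patch.shape[1] - 1, size).astype(int)
        patches.append(patch[np.ix_(ys, xs)])
    return np.stack(patches)  # (scales, size, size), regardless of image size

image = np.random.rand(256, 256)
glimpse = extract_glimpse(image, center=(120, 80))
print(glimpse.shape)  # (3, 8, 8) -- computation independent of the 256x256 input
```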
Deep AutoRegressive Networks
We introduce a deep, generative autoencoder capable of learning hierarchies
of distributed representations from data. Successive deep stochastic hidden
layers are equipped with autoregressive connections, which enable the model to
be sampled from quickly and exactly via ancestral sampling. We derive an
efficient approximate parameter estimation method based on the minimum
description length (MDL) principle, which can be seen as maximising a
variational lower bound on the log-likelihood, with a feedforward neural
network implementing approximate inference. We demonstrate state-of-the-art
generative performance on a number of classic data sets: several UCI data sets,
MNIST and Atari 2600 games.
Comment: Appears in Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China, 2014.
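For reference, the variational lower bound the abstract alludes to, in generic notation (x the data, h the stochastic hidden units, q the feedforward inference network); the MDL reading is the negated bound measured in bits. This is the standard form of the bound, not the paper's exact derivation.

```latex
\log p(x) \;\ge\; \mathbb{E}_{q(h \mid x)}\big[\log p(x \mid h) + \log p(h) - \log q(h \mid x)\big]
```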
Webometric analysis of departments of librarianship and information science: a follow-up study
This paper reports an analysis of the websites of UK departments of library and information science. Inlink counts of these websites revealed no statistically significant correlation with the quality of the research carried out by these departments, as quantified using departmental grades in the 2001 Research Assessment Exercise and citations in Google Scholar to publications submitted for that Exercise. Reasons for this lack of correlation include: difficulties in disambiguating departmental websites from larger institutional structures; the relatively small amount of research-related material in departmental websites; and limitations in the ways that current Web search engines process linkages to URLs. It is concluded that departmental-level webometric analyses do not at present provide an appropriate technique for evaluating academic research quality, and, more generally, that standards are needed for the formatting of URLs if inlinks are to become firmly established as a tool for website analysis.
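A minimal sketch of the kind of test the abstract reports, assuming a rank correlation between departmental inlink counts and a research-quality measure; the scipy call and the toy numbers are assumptions for illustration, not the study's data.

```python
# Sketch: rank correlation between inlink counts and RAE-style quality grades.
from scipy.stats import spearmanr

inlinks = [120, 45, 300, 80, 60]   # hypothetical inlink counts per department
rae_grade = [5, 4, 5, 3, 4]        # hypothetical 2001 RAE grades

rho, p_value = spearmanr(inlinks, rae_grade)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# A large p-value, as reported in the study, indicates no significant correlation.
```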
Learning nuanced cross-disciplinary citation metric normalization using the hierarchical Dirichlet process on big scholarly data
Citation counts have long been used in academia as a way of measuring, inter alia, the importance of journals, quantifying the significance and impact of a researcher's body of work, and allocating funding for individuals and departments. For example, the h-index proposed by Hirsch is one of the most popular metrics that uses citation analysis to determine an individual's research impact. Among many issues, one pitfall of citation metrics is the unfairness that emerges when comparisons are made between researchers in different fields. The algorithm described in the present paper learns evidence-based, nuanced, and probabilistic representations of academic fields, and uses data collected by crawling Google Scholar to perform field-of-study-based normalization of citation-based impact metrics such as the h-index.
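For context, a minimal sketch of the h-index the abstract builds on: the largest h such that an author has h papers with at least h citations each. The citation list is an illustrative placeholder.

```python
# Sketch: compute the h-index from a list of per-paper citation counts.
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with at least 4 citations each
```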
Recovering Residual Forensic Data from Smartphone Interactions with Cloud Storage Providers
There is a growing demand for cloud storage services such as Dropbox, Box,
Syncplicity and SugarSync. These public cloud storage services can store
gigabytes of corporate and personal data in remote data centres around the
world, which can then be synchronized to multiple devices. This creates an
environment which is potentially conducive to security incidents, data breaches
and other malicious activities. The forensic investigation of public cloud
environments presents a number of new challenges for the digital forensics
community. However, it is anticipated that end-devices such as smartphones,
will retain data from these cloud storage services. This research investigates
how forensic tools that are currently available to practitioners can be used to
provide a practical solution for the problems related to investigating cloud
storage environments. The research contribution is threefold. First, the
findings from this research support the idea that end-devices which have been
used to access cloud storage services can be used to provide a partial view of
the evidence stored in the cloud service. Second, the research provides a
comparison of the number of files which can be recovered from different
versions of cloud storage applications. In doing so, it also supports the idea
that amalgamating the files recovered from more than one device can result in
the recovery of a more complete dataset. Third, the chapter contributes to the
documentation and evidentiary discussion of the artefacts created from specific
cloud storage applications and different versions of these applications on iOS
and Android smartphones.
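A minimal sketch of the amalgamation idea raised in the abstract: merging the files recovered from several devices, deduplicated by content hash, to see how much more complete the combined dataset is. Device labels, file names, and hashes are illustrative placeholders, not artefacts from the study.

```python
# Sketch: amalgamate per-device recoveries into one dataset keyed by content hash.
def amalgamate(recovered_per_device):
    """Union the per-device recoveries, keeping one entry per unique content hash."""
    combined = {}
    for device, files in recovered_per_device.items():
        for sha256, name in files:
            combined.setdefault(sha256, (name, device))
    return combined

recovered = {
    "ios_dropbox_v2":     [("aaa111", "report.docx"), ("bbb222", "holiday.jpg")],
    "android_dropbox_v2": [("bbb222", "holiday.jpg"), ("ccc333", "invoice.pdf")],
}
merged = amalgamate(recovered)
print(len(merged), "unique files recovered across devices")  # 3, vs. 2 per device
```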
Vol. IX, Tab 47 - Ex. 12 - Email from AdWords Support - Your Google AdWords Approval Status
Exhibits from the un-sealed joint appendix for Rosetta Stone Ltd., v. Google Inc., No. 10-2007, on appeal to the 4th Circuit. Issue presented: Under the Lanham Act, does the use of trademarked terms in keyword advertising result in infringement when there is evidence of actual confusion?
Vol. IX, Tab 41 - Ex. 6 - Google Three Ad Policy Changes
Exhibits from the un-sealed joint appendix for Rosetta Stone Ltd., v. Google Inc., No. 10-2007, on appeal to the 4th Circuit. Issue presented: Under the Lanham Act, does the use of trademarked terms in keyword advertising result in infringement when there is evidence of actual confusion?
Vol. VIII, Tab 39 - Ex. 3 - Google's Trademark Complaint Policy
Exhibits from the un-sealed joint appendix for Rosetta Stone Ltd., v. Google Inc., No. 10-2007, on appeal to the 4th Circuit. Issue presented: Under the Lanham Act, does the use of trademarked terms in keyword advertising result in infringement when there is evidence of actual confusion?
