2,766 research outputs found

    Distributed Representations of Words and Phrases and their Compositionality

    The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling the frequent words we obtain a significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
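    The two tricks named in this abstract are easy to state concretely. Below is a minimal Python sketch of the paper's subsampling rule, which keeps a word with probability sqrt(t / f(w)) for relative corpus frequency f(w), and of its phrase-scoring heuristic score(a, b) = (count(a b) - delta) / (count(a) * count(b)); the default values of t and delta follow the paper's suggestions, while the function names and merged-token format are our own illustrative choices.

```python
from collections import Counter
import math

def keep_prob(word_freq: float, t: float = 1e-5) -> float:
    """Probability of keeping a word under the paper's subsampling rule:
    word w is discarded with probability 1 - sqrt(t / f(w)), where f(w)
    is its relative corpus frequency and t ~ 1e-5 is the suggested value."""
    return min(1.0, math.sqrt(t / word_freq))

def phrase_scores(tokens: list[str], delta: int = 5) -> dict[str, float]:
    """Score bigrams with the paper's phrase heuristic:
    score(a, b) = (count(a b) - delta) / (count(a) * count(b)).
    Bigrams scoring above a chosen threshold are merged into single
    tokens such as "Air_Canada" before training."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        f"{a}_{b}": (n - delta) / (unigrams[a] * unigrams[b])
        for (a, b), n in bigrams.items()
    }
```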

    Recurrent Models of Visual Attention

    Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
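    The fixed per-step cost comes from the glimpse sensor: only a small retina-like patch around each attended location is processed at full resolution, with progressively coarser context around it. The numpy sketch below illustrates that sensor under our own assumptions (patch size, number of scales, zero padding are illustrative, not the paper's exact hyperparameters); the recurrent core and the reinforcement-learning training the abstract mentions are omitted.

```python
import numpy as np

def glimpse(image: np.ndarray, center: tuple[int, int],
            size: int = 8, scales: int = 3) -> np.ndarray:
    """Extract a multi-resolution glimpse around `center` from a 2-D image.

    Crops `scales` concentric patches of growing spatial extent, then
    downsamples each to `size` x `size`, so per-glimpse computation is
    independent of the full image resolution.
    """
    y, x = center
    patches = []
    for s in range(scales):
        half = size * (2 ** s) // 2
        # Pad so crops near the border stay well-defined.
        padded = np.pad(image, half, mode="constant")
        patch = padded[y:y + 2 * half, x:x + 2 * half]
        stride = 2 ** s
        patches.append(patch[::stride, ::stride][:size, :size])
    return np.stack(patches)  # shape: (scales, size, size)
```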

    Deep AutoRegressive Networks

    We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data. Successive deep stochastic hidden layers are equipped with autoregressive connections, which enable the model to be sampled from quickly and exactly via ancestral sampling. We derive an efficient approximate parameter estimation method based on the minimum description length (MDL) principle, which can be seen as maximising a variational lower bound on the log-likelihood, with a feedforward neural network implementing approximate inference. We demonstrate state-of-the-art generative performance on a number of classic data sets: several UCI data sets, MNIST and Atari 2600 games.
    Comment: Appears in Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China, 2014.
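    The claim that the model "can be sampled from quickly and exactly via ancestral sampling" has a simple mechanical reading: within a stochastic binary layer, each unit depends only on the units sampled before it, so one left-to-right pass yields an exact sample. Below is a hedged numpy sketch of that mechanism; the logistic parameterization is our assumption, not necessarily the paper's exact one.

```python
import numpy as np

def sample_autoregressive_layer(W: np.ndarray, b: np.ndarray,
                                rng=None) -> np.ndarray:
    """Ancestral sampling of one stochastic binary layer with
    autoregressive (strictly lower-triangular) connections.

    Unit j is drawn from Bernoulli(sigmoid(b[j] + sum_{k<j} W[j,k]*h[k])),
    so a full, exact sample takes a single left-to-right pass.
    """
    rng = rng or np.random.default_rng()
    n = b.shape[0]
    h = np.zeros(n)
    for j in range(n):
        logit = b[j] + W[j, :j] @ h[:j]
        p = 1.0 / (1.0 + np.exp(-logit))
        h[j] = float(rng.random() < p)
    return h
```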

    Webometric analysis of departments of librarianship and information science: a follow-up study

    This paper reports an analysis of the websites of UK departments of library and information science. Inlink counts of these websites revealed no statistically significant correlation with the quality of the research carried out by these departments, as quantified using departmental grades in the 2001 Research Assessment Exercise and citations in Google Scholar to publications submitted for that Exercise. Reasons for this lack of correlation include: difficulties in disambiguating departmental websites from larger institutional structures; the relatively small amount of research-related material in departmental websites; and limitations in the ways that current Web search engines process linkages to URLs. It is concluded that departmental-level webometric analyses do not at present provide an appropriate technique for evaluating academic research quality, and, more generally, that standards are needed for the formatting of URLs if inlinks are to become firmly established as a tool for website analysis.
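    For readers wanting to reproduce this kind of test, the analysis reduces to a rank correlation between per-department inlink counts and research-quality scores. The abstract does not name the statistic used, so the Spearman correlation below, and all of the numbers, are illustrative assumptions only.

```python
from scipy.stats import spearmanr

# Hypothetical per-department data: inlink counts to each departmental
# website and 2001 RAE grades mapped to a numeric scale.
inlinks = [120, 45, 310, 87, 150, 60]
rae_grades = [5, 4, 5, 3, 4, 4]

rho, p_value = spearmanr(inlinks, rae_grades)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# The study's finding corresponds to a p-value too large to reject
# the null hypothesis of no association.
```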

    Learning nuanced cross-disciplinary citation metric normalization using the hierarchical Dirichlet process on big scholarly data

    Citation counts have long been used in academia as a way of measuring, inter alia, the importance of journals, quantifying the significance and the impact of a researcher's body of work, and allocating funding for individuals and departments. For example, the h-index proposed by Hirsch is one of the most popular metrics that utilizes citation analysis to determine an individual's research impact. Among many issues, one of the pitfalls of citation metrics is the unfairness which emerges when comparisons are made between researchers in different fields. The algorithm we describe in the present paper learns evidence-based, nuanced, and probabilistic representations of academic fields, and uses data collected by crawling Google Scholar to perform field-of-study-based normalization of citation-based impact metrics such as the h-index.
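    The h-index itself is straightforward to compute, which is what makes the cross-field unfairness easy to demonstrate. The sketch below implements Hirsch's definition plus a deliberately crude stand-in for field normalization (a constant per-field rescaling factor); the paper's actual normalization is probabilistic and learned with a hierarchical Dirichlet process, which this toy does not attempt to reproduce.

```python
def h_index(citations: list[int]) -> int:
    """Hirsch's h-index: the largest h such that h of the papers
    have at least h citations each."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def normalized_h(citations: list[int], field_factor: float) -> int:
    """Illustrative normalization: rescale raw counts by a per-field
    factor before computing the index. The constant factor is a
    stand-in assumption, not the paper's HDP-based method."""
    return h_index([round(c / field_factor) for c in citations])

assert h_index([10, 8, 5, 4, 3]) == 4
```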

    Recovering Residual Forensic Data from Smartphone Interactions with Cloud Storage Providers

    There is a growing demand for cloud storage services such as Dropbox, Box, Syncplicity and SugarSync. These public cloud storage services can store gigabytes of corporate and personal data in remote data centres around the world, which can then be synchronized to multiple devices. This creates an environment which is potentially conducive to security incidents, data breaches and other malicious activities. The forensic investigation of public cloud environments presents a number of new challenges for the digital forensics community. However, it is anticipated that end-devices, such as smartphones, will retain data from these cloud storage services. This research investigates how forensic tools that are currently available to practitioners can be used to provide a practical solution for the problems related to investigating cloud storage environments. The research contribution is threefold. First, the findings from this research support the idea that end-devices which have been used to access cloud storage services can be used to provide a partial view of the evidence stored in the cloud service. Second, the research provides a comparison of the number of files which can be recovered from different versions of cloud storage applications. In doing so, it also supports the idea that amalgamating the files recovered from more than one device can result in the recovery of a more complete dataset. Third, the chapter contributes to the documentation and evidentiary discussion of the artefacts created from specific cloud storage applications and different versions of these applications on iOS and Android smartphones.
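    The second contribution, amalgamating recoveries from several devices, can be pictured as a set union over the artefact listings recovered from each device. The sketch below uses invented device labels and file names purely for illustration; the study's actual artefact formats are not reproduced here.

```python
# Merge artefact listings recovered from several end-devices to
# approximate a more complete view of the cloud-side dataset.
recovered: dict[str, set[str]] = {
    "iphone_dropbox_v1": {"report.docx", "budget.xlsx"},
    "android_dropbox_v2": {"report.docx", "photo.jpg"},
}

combined = set().union(*recovered.values())
for device, files in recovered.items():
    coverage = len(files) / len(combined)
    print(f"{device}: {len(files)} files ({coverage:.0%} of combined set)")
print(f"combined: {len(combined)} unique files")
```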

    Vol. IX, Tab 47 - Ex. 12 - Email from AdWords Support - Your Google AdWords Approval Status

    Exhibits from the unsealed joint appendix for Rosetta Stone Ltd. v. Google Inc., No. 10-2007, on appeal to the 4th Circuit. Issue presented: Under the Lanham Act, does the use of trademarked terms in keyword advertising result in infringement when there is evidence of actual confusion?

    Vol. IX, Tab 41 - Ex. 6 - Google Three Ad Policy Changes

    Exhibits from the unsealed joint appendix for Rosetta Stone Ltd. v. Google Inc., No. 10-2007, on appeal to the 4th Circuit. Issue presented: Under the Lanham Act, does the use of trademarked terms in keyword advertising result in infringement when there is evidence of actual confusion?

    Vol. VIII, Tab 39 - Ex. 3 - Google's Trademark Complaint Policy

    Exhibits from the unsealed joint appendix for Rosetta Stone Ltd. v. Google Inc., No. 10-2007, on appeal to the 4th Circuit. Issue presented: Under the Lanham Act, does the use of trademarked terms in keyword advertising result in infringement when there is evidence of actual confusion?