2,766 research outputs found

    Distributed Representations of Words and Phrases and their Compositionality

    The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling the frequent words we obtain a significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
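    The two tricks named in this abstract are easy to state concretely. Below is a minimal Python sketch of the paper's subsampling rule, which keeps a word with probability sqrt(t / f(w)) for relative corpus frequency f(w), and of its phrase-scoring heuristic score(a, b) = (count(a b) - delta) / (count(a) * count(b)); the default values of t and delta follow the paper's suggestions, while the function names and merged-token format are our own illustrative choices.

```python
from collections import Counter
import math

def keep_prob(word_freq: float, t: float = 1e-5) -> float:
    """Probability of keeping a word under the paper's subsampling rule:
    word w is discarded with probability 1 - sqrt(t / f(w)), where f(w)
    is its relative corpus frequency and t ~ 1e-5 is the suggested value."""
    return min(1.0, math.sqrt(t / word_freq))

def phrase_scores(tokens: list[str], delta: int = 5) -> dict[str, float]:
    """Score bigrams with the paper's phrase heuristic:
    score(a, b) = (count(a b) - delta) / (count(a) * count(b)).
    Bigrams scoring above a chosen threshold are merged into single
    tokens such as "Air_Canada" before training."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        f"{a}_{b}": (n - delta) / (unigrams[a] * unigrams[b])
        for (a, b), n in bigrams.items()
    }
```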

    Recurrent Models of Visual Attention

    Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
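    The fixed per-step cost comes from the glimpse sensor: only a small retina-like patch around each attended location is processed at full resolution, with progressively coarser context around it. The numpy sketch below illustrates that sensor under our own assumptions (patch size, number of scales, zero padding are illustrative, not the paper's exact hyperparameters); the recurrent core and the reinforcement-learning training the abstract mentions are omitted.

```python
import numpy as np

def glimpse(image: np.ndarray, center: tuple[int, int],
            size: int = 8, scales: int = 3) -> np.ndarray:
    """Extract a multi-resolution glimpse around `center` from a 2-D image.

    Crops `scales` concentric patches of growing spatial extent, then
    downsamples each to `size` x `size`, so per-glimpse computation is
    independent of the full image resolution.
    """
    y, x = center
    patches = []
    for s in range(scales):
        half = size * (2 ** s) // 2
        # Pad so crops near the border stay well-defined.
        padded = np.pad(image, half, mode="constant")
        patch = padded[y:y + 2 * half, x:x + 2 * half]
        stride = 2 ** s
        patches.append(patch[::stride, ::stride][:size, :size])
    return np.stack(patches)  # shape: (scales, size, size)
```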

    Deep AutoRegressive Networks

    We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data. Successive deep stochastic hidden layers are equipped with autoregressive connections, which enable the model to be sampled from quickly and exactly via ancestral sampling. We derive an efficient approximate parameter estimation method based on the minimum description length (MDL) principle, which can be seen as maximising a variational lower bound on the log-likelihood, with a feedforward neural network implementing approximate inference. We demonstrate state-of-the-art generative performance on a number of classic data sets: several UCI data sets, MNIST and Atari 2600 games.
    Comment: Appears in Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China, 2014.
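    The claim that the model "can be sampled from quickly and exactly via ancestral sampling" has a simple mechanical reading: within a stochastic binary layer, each unit depends only on the units sampled before it, so one left-to-right pass yields an exact sample. Below is a hedged numpy sketch of that mechanism; the logistic parameterization is our assumption, not necessarily the paper's exact one.

```python
import numpy as np

def sample_autoregressive_layer(W: np.ndarray, b: np.ndarray,
                                rng=None) -> np.ndarray:
    """Ancestral sampling of one stochastic binary layer with
    autoregressive (strictly lower-triangular) connections.

    Unit j is drawn from Bernoulli(sigmoid(b[j] + sum_{k<j} W[j,k]*h[k])),
    so a full, exact sample takes a single left-to-right pass.
    """
    rng = rng or np.random.default_rng()
    n = b.shape[0]
    h = np.zeros(n)
    for j in range(n):
        logit = b[j] + W[j, :j] @ h[:j]
        p = 1.0 / (1.0 + np.exp(-logit))
        h[j] = float(rng.random() < p)
    return h
```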

    Webometric analysis of departments of librarianship and information science: a follow-up study

    This paper reports an analysis of the websites of UK departments of library and information science. Inlink counts of these websites revealed no statistically significant correlation with the quality of the research carried out by these departments, as quantified using departmental grades in the 2001 Research Assessment Exercise and citations in Google Scholar to publications submitted for that Exercise. Reasons for this lack of correlation include: difficulties in disambiguating departmental websites from larger institutional structures; the relatively small amount of research-related material in departmental websites; and limitations in the ways that current Web search engines process linkages to URLs. It is concluded that departmental-level webometric analyses do not at present provide an appropriate technique for evaluating academic research quality, and, more generally, that standards are needed for the formatting of URLs if inlinks are to become firmly established as a tool for website analysis.
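    For readers wanting to reproduce this kind of test, the analysis reduces to a rank correlation between per-department inlink counts and research-quality scores. The abstract does not name the statistic used, so the Spearman correlation below, and all of the numbers, are illustrative assumptions only.

```python
from scipy.stats import spearmanr

# Hypothetical per-department data: inlink counts to each departmental
# website and 2001 RAE grades mapped to a numeric scale.
inlinks = [120, 45, 310, 87, 150, 60]
rae_grades = [5, 4, 5, 3, 4, 4]

rho, p_value = spearmanr(inlinks, rae_grades)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# The study's finding corresponds to a p-value too large to reject
# the null hypothesis of no association.
```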

    Learning nuanced cross-disciplinary citation metric normalization using the hierarchical Dirichlet process on big scholarly data

    Citation counts have long been used in academia as a way of measuring, inter alia, the importance of journals, quantifying the significance and the impact of a researcher's body of work, and allocating funding for individuals and departments. For example, the h-index proposed by Hirsch is one of the most popular metrics that utilizes citation analysis to determine an individual's research impact. Among many issues, one of the pitfalls of citation metrics is the unfairness which emerges when comparisons are made between researchers in different fields. The algorithm we describe in the present paper learns evidence-based, nuanced, and probabilistic representations of academic fields, and uses data collected by crawling Google Scholar to perform field-of-study-based normalization of citation-based impact metrics such as the h-index.
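    The h-index itself is straightforward to compute, which is what makes the cross-field unfairness easy to demonstrate. The sketch below implements Hirsch's definition plus a deliberately crude stand-in for field normalization (a constant per-field rescaling factor); the paper's actual normalization is probabilistic and learned with a hierarchical Dirichlet process, which this toy does not attempt to reproduce.

```python
def h_index(citations: list[int]) -> int:
    """Hirsch's h-index: the largest h such that h of the papers
    have at least h citations each."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def normalized_h(citations: list[int], field_factor: float) -> int:
    """Illustrative normalization: rescale raw counts by a per-field
    factor before computing the index. The constant factor is a
    stand-in assumption, not the paper's HDP-based method."""
    return h_index([round(c / field_factor) for c in citations])

assert h_index([10, 8, 5, 4, 3]) == 4
```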

    Recovering Residual Forensic Data from Smartphone Interactions with Cloud Storage Providers

    There is a growing demand for cloud storage services such as Dropbox, Box, Syncplicity and SugarSync. These public cloud storage services can store gigabytes of corporate and personal data in remote data centres around the world, which can then be synchronized to multiple devices. This creates an environment which is potentially conducive to security incidents, data breaches and other malicious activities. The forensic investigation of public cloud environments presents a number of new challenges for the digital forensics community. However, it is anticipated that end-devices, such as smartphones, will retain data from these cloud storage services. This research investigates how forensic tools that are currently available to practitioners can be used to provide a practical solution for the problems related to investigating cloud storage environments. The research contribution is threefold. First, the findings from this research support the idea that end-devices which have been used to access cloud storage services can be used to provide a partial view of the evidence stored in the cloud service. Second, the research provides a comparison of the number of files which can be recovered from different versions of cloud storage applications. In doing so, it also supports the idea that amalgamating the files recovered from more than one device can result in the recovery of a more complete dataset. Third, the chapter contributes to the documentation and evidentiary discussion of the artefacts created from specific cloud storage applications and different versions of these applications on iOS and Android smartphones.
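    The second contribution, amalgamating recoveries from several devices, can be pictured as a set union over the artefact listings recovered from each device. The sketch below uses invented device labels and file names purely for illustration; the study's actual artefact formats are not reproduced here.

```python
# Merge artefact listings recovered from several end-devices to
# approximate a more complete view of the cloud-side dataset.
recovered: dict[str, set[str]] = {
    "iphone_dropbox_v1": {"report.docx", "budget.xlsx"},
    "android_dropbox_v2": {"report.docx", "photo.jpg"},
}

combined = set().union(*recovered.values())
for device, files in recovered.items():
    coverage = len(files) / len(combined)
    print(f"{device}: {len(files)} files ({coverage:.0%} of combined set)")
print(f"combined: {len(combined)} unique files")
```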

    Vol. IX, Tab 47 - Ex. 12 - Email from AdWords Support - Your Google AdWords Approval Status

    Exhibits from the unsealed joint appendix for Rosetta Stone Ltd. v. Google Inc., No. 10-2007, on appeal to the 4th Circuit. Issue presented: Under the Lanham Act, does the use of trademarked terms in keyword advertising result in infringement when there is evidence of actual confusion?

    Vol. IX, Tab 41 - Ex. 6 - Google Three Ad Policy Changes

    Exhibits from the unsealed joint appendix for Rosetta Stone Ltd. v. Google Inc., No. 10-2007, on appeal to the 4th Circuit. Issue presented: Under the Lanham Act, does the use of trademarked terms in keyword advertising result in infringement when there is evidence of actual confusion?

    Vol. VIII, Tab 39 - Ex. 3 - Google's Trademark Complaint Policy

    Exhibits from the unsealed joint appendix for Rosetta Stone Ltd. v. Google Inc., No. 10-2007, on appeal to the 4th Circuit. Issue presented: Under the Lanham Act, does the use of trademarked terms in keyword advertising result in infringement when there is evidence of actual confusion?