
    A Statistical Toolbox For Mining And Modeling Spatial Data

    Most data mining projects in spatial economics start by evaluating a set of attribute variables on a sample of spatial entities, looking for the existence and strength of spatial autocorrelation using Moran’s and Geary’s coefficients. The adequacy of these coefficients is rarely challenged, even though many users seem likely to make mistakes and foster confusion when reporting on their properties. My paper begins with a critical appraisal of the classical definition and rationale of these indices. I argue that, while intuitively founded, they are plagued by an inconsistency in their conception. I then propose a principled small change leading to corrected spatial autocorrelation coefficients, which strongly simplifies their relationship and opens the way to an augmented toolbox of statistical methods for dimension reduction and data visualization, also useful for modeling purposes. A second section presents a formal framework, adapted from recent work in statistical learning, which gives theoretical support to our definition of corrected spatial autocorrelation coefficients. More specifically, the multivariate data mining methods presented here are easily implementable in existing (free) software and yield methods useful for exploiting the proposed corrections in spatial data analysis practice. From a mathematical point of view, their asymptotic behavior, already studied in a series of papers by Belkin & Niyogi, suggests that they possess qualities of robustness and a limited sensitivity to the Modifiable Areal Unit Problem (MAUP), valuable in exploratory spatial data analysis.
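
    For reference, the classical Moran's I and Geary's C that the abstract critiques can be sketched as follows. This is a minimal illustration of the standard textbook definitions, not the corrected coefficients the paper proposes; the function names and the toy chain-shaped weight matrix `w` are assumptions made for the example.

```python
def morans_i(x, w):
    """Classical Moran's I for values x and spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    var = sum(d * d for d in dev)
    return (n / s0) * cross / var


def gearys_c(x, w):
    """Classical Geary's C for values x and spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(sum(row) for row in w)
    sq_diff = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    var = sum((v - mean) ** 2 for v in x)
    return ((n - 1) / (2 * s0)) * sq_diff / var


# Toy example (assumed data): four regions on a line, neighbours share a border.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = [1.0, 2.0, 3.0, 4.0]        # neighbours have similar values
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbours have opposite values

print(morans_i(smooth, w), gearys_c(smooth, w))
print(morans_i(alternating, w), gearys_c(alternating, w))
```

    On the smooth series Moran's I is positive and Geary's C is below 1; on the alternating series the signs of both departures flip, which is the usual reading of the two indices.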

    KBGAN: Adversarial Learning for Knowledge Graph Embeddings

    We introduce KBGAN, an adversarial learning framework to improve the performance of a wide range of existing knowledge graph embedding models. Because knowledge graphs typically contain only positive facts, sampling useful negative training examples is a non-trivial task. Replacing the head or tail entity of a fact with a uniformly randomly selected entity is a conventional method for generating negative facts, but the majority of the generated negative facts can be easily discriminated from positive facts and contribute little to training. Inspired by generative adversarial networks (GANs), we use one knowledge graph embedding model as a negative sample generator to assist the training of our desired model, which acts as the discriminator in GANs. This framework is independent of the concrete form of the generator and discriminator, and can therefore use a wide variety of knowledge graph embedding models as its building blocks. In experiments, we adversarially train two translation-based models, TransE and TransD, each with assistance from one of the two probability-based models, DistMult and ComplEx. We evaluate the performance of KBGAN on the link prediction task, using three knowledge base completion datasets: FB15k-237, WN18 and WN18RR. Experimental results show that adversarial training substantially improves the performance of target embedding models under various settings. Comment: To appear at NAACL HLT 201
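
    The contrast between uniform negative sampling and generator-guided sampling can be sketched as below. This is a rough illustration only: it assumes the generator picks one negative from a small uniformly drawn candidate set with probability given by a softmax over DistMult scores, a detail the abstract does not spell out. The entity names, the random embeddings, and the helper names are hypothetical, and the adversarial training loop itself is omitted.

```python
import math
import random


def distmult_score(h, r, t):
    """DistMult plausibility score: sum of element-wise products."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def uniform_negatives(triple, entities, k, rng):
    """Conventional method: corrupt the head or the tail with a uniform draw."""
    h, r, t = triple
    negatives = []
    while len(negatives) < k:
        e = rng.choice(entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate != triple:  # never return the positive fact itself
            negatives.append(candidate)
    return negatives


def generator_sample(candidates, ent, rel, rng):
    """Assumed generator step: softmax over candidate scores, then one draw."""
    scores = [distmult_score(ent[h], rel[r], ent[t]) for h, r, t in candidates]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical toy knowledge graph with random 4-dimensional embeddings.
rng = random.Random(0)
entities = ["paris", "france", "berlin", "tokyo"]
ent = {e: [rng.uniform(-1, 1) for _ in range(4)] for e in entities}
rel = {"capital_of": [rng.uniform(-1, 1) for _ in range(4)]}

positive = ("paris", "capital_of", "france")
candidates = uniform_negatives(positive, entities, 5, rng)
hard_negative = generator_sample(candidates, ent, rel, rng)
print(hard_negative)
```

    Under this assumption, candidates the generator scores as plausible are drawn more often, so the discriminator sees fewer trivially false negatives than it would under purely uniform corruption.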

    Exploring Student Check-In Behavior for Improved Point-of-Interest Prediction

    With the availability of vast amounts of user visitation history on location-based social networks (LBSN), the problem of Point-of-Interest (POI) prediction has been extensively studied. However, much of the research has been conducted solely on voluntary check-in datasets collected from social apps such as Foursquare or Yelp. While these data contain rich information about recreational activities (e.g., restaurants, nightlife, and entertainment), information about more prosaic aspects of people's lives is sparse. This not only limits our understanding of users' daily routines but, more importantly, means that the modeling assumptions developed from recreation-based data may not be suitable for richer check-in data. In this work, we present an analysis of education "check-in" data using WiFi access logs collected at Purdue University. We propose a heterogeneous graph-based method to encode the correlations between users, POIs, and activities, and then jointly learn embeddings for the vertices. We evaluate our method against previous state-of-the-art POI prediction methods, and show that the assumptions made by previous methods significantly degrade performance on our data with dense(r) activity signals. We also show how our learned embeddings could be used to identify similar students (e.g., for friend suggestions). Comment: published in KDD'1

    Conditional t-SNE: Complementary t-SNE embeddings through factoring out prior information

    Dimensionality reduction and manifold learning methods such as t-Distributed Stochastic Neighbor Embedding (t-SNE) are routinely used to map high-dimensional data into a 2-dimensional space to visualize and explore the data. However, two dimensions are typically insufficient to capture all structure in the data, the salient structure is often already known, and it is not obvious how to extract the remaining information in a similarly effective manner. To fill this gap, we introduce conditional t-SNE (ct-SNE), a generalization of t-SNE that discounts prior information from the embedding in the form of labels. To achieve this, we propose a conditioned version of the t-SNE objective, obtaining a single, integrated, and elegant method. ct-SNE has one extra parameter over t-SNE; we investigate its effects and show how to efficiently optimize the objective. Factoring out prior knowledge allows complementary structure to be captured in the embedding, providing new insights. Qualitative and quantitative empirical results on synthetic and (large) real data show ct-SNE is effective and achieves its goal.

    PlaNet - Photo Geolocation with Convolutional Neural Networks

    Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. While previous approaches only recognize landmarks or perform approximate matching using global image descriptors, our model is able to use and integrate multiple visible cues. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman levels of accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, we demonstrate that this model achieves a 50% performance improvement over the single-image model.
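
    Treating geolocation as classification means turning each geotag into a class label and each predicted class back into a coordinate. The sketch below illustrates that mapping with a fixed-size latitude/longitude grid, which is a deliberate simplification and not the multi-scale cells the abstract describes; the function names and the 10-degree cell size are assumptions.

```python
def latlon_to_cell(lat, lon, cell_deg=10.0):
    """Map a geotag to the index of its grid cell (the class label)."""
    rows = int(180 / cell_deg)
    cols = int(360 / cell_deg)
    # Clamp so that lat = 90 and lon = 180 fall into the last row/column.
    row = min(int((lat + 90.0) / cell_deg), rows - 1)
    col = min(int((lon + 180.0) / cell_deg), cols - 1)
    return row * cols + col


def cell_center(index, cell_deg=10.0):
    """Map a predicted class label back to the centre of its cell."""
    cols = int(360 / cell_deg)
    row, col = divmod(index, cols)
    lat = -90.0 + (row + 0.5) * cell_deg
    lon = -180.0 + (col + 0.5) * cell_deg
    return lat, lon


# A geotag near Paris becomes a class label, and the label maps back
# to the centre of the corresponding 10-degree cell.
label = latlon_to_cell(48.85, 2.35)
print(label, cell_center(label))
```

    A classifier trained on such labels outputs a probability distribution over cells; a uniform grid wastes classes on oceans and under-resolves dense cities, which is one motivation for cells of varying size.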

    Strategies for embedding eLearning in traditional universities: drivers and barriers

    This paper addresses the question: how can elearning be embedded in traditional universities so that it contributes to the transformation of the university? The paper examines elearning strategies in higher education, locating the institutional context within the broader framework of national and international policy drivers which link elearning with the achievement of strategic goals such as widening access to lifelong learning, and upskilling for the knowledge and information society. The focus will be on traditional universities, i.e. universities whose main form of teaching is on-campus and face-to-face, rather than on open and distance teaching universities, which face different strategic issues in implementing elearning. Reports on the adoption of elearning in traditional universities indicate extensive use of elearning to improve the quality of learning for on-campus students, but this has not yet translated into a significant increase in opportunities for lifelong learners in the workforce and those unable to attend on-campus. One vision of the future of universities is that ‘Virtualisation and remote working technologies will enable us to study at any university in the world, from home’. However, this paper will point out that realising this vision of ubiquitous and lifelong access to higher education requires a fully articulated elearning strategy that aims to have a ‘transformative’ rather than just a ‘sustaining’ effect on teaching functions carried out in traditional universities. In other words, rather than just helping universities to improve their teaching, elearning should transform how universities currently teach.
    However, to achieve this transformation, universities will have to introduce strategies and policies which implement flexible academic frameworks, innovative pedagogical approaches, new forms of assessment, cross-institutional accreditation and credit transfer agreements, institutional collaboration in development and delivery, and, most crucially, commitment to equivalence of access for students on and off campus. The insights in this paper are drawn from an action research case study involving both qualitative and quantitative approaches, utilising interviews, surveys and focus groups with stakeholders, in addition to comparative research on international best practice. The paper will review the drivers and rationales at international, national and institutional level which are leading to the development of elearning strategies, before outlining the outcomes of a case study of elearning strategy development in a traditional Irish university. This study examined the drivers and barriers which increase or decrease motivation to engage in elearning, and provides some insights into the challenges of embedding elearning in higher education. While recognising the desirability of reaching out to new students and engaging in innovative pedagogical approaches, many academic staff continue to prefer traditional lectures, and are sceptical about the potential for student learning in online settings. Extrinsic factors such as lack of time and support serve to decrease motivation, and there are also fears of loss of academic control to central administration. The paper concludes with some observations on how university elearning strategies must address staff concerns through capacity building, awareness raising and the establishment of effective support structures for embedding elearning.