145,072 research outputs found

    Search Efficient Binary Network Embedding

    Full text link
    Traditional network embedding primarily focuses on learning a dense vector representation for each node, which encodes network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily applied to the vector-format node representations for network analysis. However, the learned dense vector representations are inefficient for large-scale similarity search, which requires to find the nearest neighbor measured by Euclidean distance in a continuous vector space. In this paper, we propose a search efficient binary network embedding algorithm called BinaryNE to learn a sparse binary code for each node, by simultaneously modeling node context relations and node attribute relations through a three-layer neural network. BinaryNE learns binary node representations efficiently through a stochastic gradient descent based online learning algorithm. The learned binary encoding not only reduces memory usage to represent each node, but also allows fast bit-wise comparisons to support much quicker network node search compared to Euclidean distance or other distance measures. Our experiments and comparisons show that BinaryNE not only delivers more than 23 times faster search speed, but also provides comparable or better search quality than traditional continuous vector based network embedding methods

    Academic Performance and Behavioral Patterns

    Get PDF
    Identifying the factors that influence academic performance is an essential part of educational research. Previous studies have documented the importance of personality traits, class attendance, and social network structure. Because most of these analyses were based on a single behavioral aspect and/or small sample sizes, there is currently no quantification of the interplay of these factors. Here, we study the academic performance among a cohort of 538 undergraduate students forming a single, densely connected social network. Our work is based on data collected using smartphones, which the students used as their primary phones for two years. The availability of multi-channel data from a single population allows us to directly compare the explanatory power of individual and social characteristics. We find that the most informative indicators of performance are based on social ties and that network indicators result in better model performance than individual characteristics (including both personality and class attendance). We confirm earlier findings that class attendance is the most important predictor among individual characteristics. Finally, our results suggest the presence of strong homophily and/or peer effects among university students

    The Synthetic-Oversampling Method: Using Photometric Colors to Discover Extremely Metal-Poor Stars

    Get PDF
    Extremely metal-poor (EMP) stars ([Fe/H] < -3.0 dex) provide a unique window into understanding the first generation of stars and early chemical enrichment of the Universe. EMP stars are exceptionally rare, however, and the relatively small number of confirmed discoveries limits our ability to exploit these near-field probes of the first ~500 Myr after the Big Bang. Here, a new method to photometrically estimate [Fe/H] from only broadband photometric colors is presented. I show that the method, which utilizes machine-learning algorithms and a training set of ~170,000 stars with spectroscopically measured [Fe/H], produces a typical scatter of ~0.29 dex. This performance is similar to what is achievable via low-resolution spectroscopy, and outperforms other photometric techniques, while also being more general. I further show that a slight alteration to the model, wherein synthetic EMP stars are added to the training set, yields the robust identification of EMP candidates. In particular, this synthetic-oversampling method recovers ~20% of the EMP stars in the training set, at a precision of ~0.05. Furthermore, ~65% of the false positives from the model are very metal-poor stars ([Fe/H] < -2.0 dex). The synthetic-oversampling method is biased towards the discovery of warm (~F-type) stars, a consequence of the targeting bias from the SDSS/SEGUE survey. This EMP selection method represents a significant improvement over alternative broadband optical selection techniques. The models are applied to >12 million stars, with an expected yield of ~600 new EMP stars, which promises to open new avenues for exploring the early universe.Comment: 15 pages, 7 figures, to be submitted to Ap

    Designing Semantic Kernels as Implicit Superconcept Expansions

    Get PDF
    Recently, there has been an increased interest in the exploitation of background knowledge in the context of text mining tasks, especially text classification. At the same time, kernel-based learning algorithms like Support Vector Machines have become a dominant paradigm in the text mining community. Amongst other reasons, this is also due to their capability to achieve more accurate learning results by replacing standard linear kernel (bag-of-words) with customized kernel functions which incorporate additional apriori knowledge. In this paper we propose a new approach to the design of ‘semantic smoothing kernels’ by means of an implicit superconcept expansion using well-known measures of term similarity. The experimental evaluation on two different datasets indicates that our approach consistently improves performance in situations where (i) training data is scarce or (ii) the bag-ofwords representation is too sparse to build stable models when using the linear kernel
    • …
    corecore