Search Efficient Binary Network Embedding
Traditional network embedding primarily focuses on learning a dense vector
representation for each node, which encodes network structure and/or node
content information, such that off-the-shelf machine learning algorithms can be
easily applied to the vector-format node representations for network analysis.
However, the learned dense vector representations are inefficient for
large-scale similarity search, which requires finding the nearest neighbor
measured by Euclidean distance in a continuous vector space. In this paper, we
propose a search efficient binary network embedding algorithm called BinaryNE
to learn a sparse binary code for each node, by simultaneously modeling node
context relations and node attribute relations through a three-layer neural
network. BinaryNE learns binary node representations efficiently through a
stochastic gradient descent based online learning algorithm. The learned binary
encoding not only reduces memory usage to represent each node, but also allows
fast bit-wise comparisons to support much quicker network node search compared
to Euclidean distance or other distance measures. Our experiments and
comparisons show that BinaryNE not only delivers more than 23 times faster
search speed, but also provides comparable or better search quality than
traditional continuous vector based network embedding methods.
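
The speed argument above rests on comparing binary codes with bit-wise operations instead of floating-point distances. The following is a minimal sketch of that idea, not the BinaryNE implementation: the codes are random rather than learned, and the names `pack_codes`, `hamming_search`, and `POPCOUNT` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

# Lookup table: number of set bits in each possible byte value (0..255).
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def pack_codes(bits: np.ndarray) -> np.ndarray:
    """Pack an (n, d) array of 0/1 values into (n, ceil(d / 8)) bytes."""
    return np.packbits(bits.astype(np.uint8), axis=1)


def hamming_search(query: np.ndarray, database: np.ndarray, k: int = 3) -> np.ndarray:
    """Return the indices of the k codes closest to `query` in Hamming distance."""
    xor = np.bitwise_xor(database, query)             # bits that differ, byte by byte
    dist = POPCOUNT[xor].sum(axis=1, dtype=np.int64)  # differing bits per database row
    return np.argsort(dist, kind="stable")[:k]


# Assumption: random 128-bit codes stand in for learned node codes.
rng = np.random.default_rng(0)
codes = pack_codes(rng.integers(0, 2, size=(1000, 128)))
nearest = hamming_search(codes[42], codes, k=3)
```

Each 128-bit code occupies 16 bytes here, against 512 bytes for a 128-dimensional float32 vector, which is the memory saving the abstract refers to.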
Academic Performance and Behavioral Patterns
Identifying the factors that influence academic performance is an essential
part of educational research. Previous studies have documented the importance
of personality traits, class attendance, and social network structure. Because
most of these analyses were based on a single behavioral aspect and/or small
sample sizes, there is currently no quantification of the interplay of these
factors. Here, we study the academic performance among a cohort of 538
undergraduate students forming a single, densely connected social network. Our
work is based on data collected using smartphones, which the students used as
their primary phones for two years. The availability of multi-channel data from
a single population allows us to directly compare the explanatory power of
individual and social characteristics. We find that the most informative
indicators of performance are based on social ties and that network indicators
result in better model performance than individual characteristics (including
both personality and class attendance). We confirm earlier findings that class
attendance is the most important predictor among individual characteristics.
Finally, our results suggest the presence of strong homophily and/or peer
effects among university students.
The Synthetic-Oversampling Method: Using Photometric Colors to Discover Extremely Metal-Poor Stars
Extremely metal-poor (EMP) stars ([Fe/H] < -3.0 dex) provide a unique window
into understanding the first generation of stars and early chemical enrichment
of the Universe. EMP stars are exceptionally rare, however, and the relatively
small number of confirmed discoveries limits our ability to exploit these
near-field probes of the first ~500 Myr after the Big Bang. Here, a new method
to photometrically estimate [Fe/H] from only broadband photometric colors is
presented. I show that the method, which utilizes machine-learning algorithms
and a training set of ~170,000 stars with spectroscopically measured [Fe/H],
produces a typical scatter of ~0.29 dex. This performance is similar to what is
achievable via low-resolution spectroscopy, and outperforms other photometric
techniques, while also being more general. I further show that a slight
alteration to the model, wherein synthetic EMP stars are added to the training
set, yields the robust identification of EMP candidates. In particular, this
synthetic-oversampling method recovers ~20% of the EMP stars in the training
set, at a precision of ~0.05. Furthermore, ~65% of the false positives from the
model are very metal-poor stars ([Fe/H] < -2.0 dex). The synthetic-oversampling
method is biased towards the discovery of warm (~F-type) stars, a consequence
of the targeting bias from the SDSS/SEGUE survey. This EMP selection method
represents a significant improvement over alternative broadband optical
selection techniques. The models are applied to >12 million stars, with an
expected yield of ~600 new EMP stars, which promises to open new avenues for
exploring the early universe.
Comment: 15 pages, 7 figures, to be submitted to Ap
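
The abstract does not say how the synthetic EMP stars are generated, so the following is only a generic sketch of oversampling a rare regime in a training set, under the assumption that synthetic rows are noisy copies of existing rare rows. The function `oversample_rare`, the noise scale, and the toy colors are all hypothetical and are not taken from the paper.

```python
import numpy as np


def oversample_rare(X, y, threshold=-3.0, n_synthetic=100, noise_scale=0.02, seed=0):
    """Append jittered copies of rare rows (y < threshold) to the training set.

    X holds one row of photometric colors per star and y the measured [Fe/H].
    """
    rng = np.random.default_rng(seed)
    rare = np.flatnonzero(y < threshold)
    if rare.size == 0:
        return X, y  # nothing to oversample
    # Draw rare rows with replacement and perturb their colors slightly.
    picks = rng.choice(rare, size=n_synthetic, replace=True)
    jitter = rng.normal(0.0, noise_scale, size=(n_synthetic, X.shape[1]))
    return np.vstack([X, X[picks] + jitter]), np.concatenate([y, y[picks]])


# Toy data (made up): three stars, two colors each, one of them below -3.0 dex.
X = np.array([[0.10, 0.20], [0.30, 0.40], [0.50, 0.60]])
y = np.array([-1.0, -3.5, -0.5])
X_aug, y_aug = oversample_rare(X, y, n_synthetic=10)
```

A regressor or classifier trained on `X_aug` and `y_aug` sees the rare regime more often, which is the general motivation for oversampling; the method in the paper may differ in how the synthetic objects are built.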
Designing Semantic Kernels as Implicit Superconcept Expansions
Recently, there has been an increased interest in the exploitation of background knowledge in the context of text mining tasks, especially text classification. At the same time, kernel-based learning algorithms like Support Vector Machines have become a dominant paradigm in the text mining community. Amongst other reasons, this is also due to their capability to achieve more accurate learning results by replacing the standard linear kernel (bag-of-words) with customized kernel functions which incorporate additional a priori knowledge. In this paper we propose a new approach to the design of ‘semantic smoothing kernels’ by means of an implicit superconcept expansion using well-known measures of term similarity. The experimental evaluation on two different datasets indicates that our approach consistently improves performance in situations where (i) training data is scarce or (ii) the bag-of-words representation is too sparse to build stable models when using the linear kernel.
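
A common general form of a semantic smoothing kernel is k(x, y) = (xS)(yS)^T, where S is a term-by-term similarity matrix applied to bag-of-words vectors. The sketch below illustrates that general form only; it does not reproduce the superconcept expansion proposed in the paper, and the vocabulary, the similarity values in `S`, and the function name `semantic_kernel` are invented for illustration.

```python
import numpy as np


def semantic_kernel(X: np.ndarray, Y: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Linear kernel computed after smoothing term vectors with S."""
    return (X @ S) @ (Y @ S).T


# Hypothetical vocabulary: ["car", "automobile", "banana"].
# Assumed similarity: "car" and "automobile" are related, "banana" is not.
S = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# One single-term document per vocabulary entry.
docs = np.eye(3)

linear = docs @ docs.T                     # plain bag-of-words kernel
smoothed = semantic_kernel(docs, docs, S)  # kernel with term smoothing
```

Under the linear kernel the "car" and "automobile" documents share no terms and have zero similarity, while the smoothed kernel gives them a positive value. Because the smoothed kernel is a Gram matrix of the transformed vectors, it stays positive semidefinite and can be passed to an SVM as a precomputed kernel.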