The geography of city liveliness and consumption: evidence from location-based big data
Understanding the complex connection between city liveliness and the spatial configuration of consumptive amenities is an important but understudied research field in fast-urbanising countries such as China. This paper presents the first step towards filling this gap through location-based big data perspectives. City liveliness is measured by aggregated space-time human activity intensities derived from mobile phone positioning data. Consumptive amenities are identified from point-of-interest data from Dianping, a Chinese Yelp-like website. The results provide insights into the geographic contextual uncertainties of consumptive amenities in shaping the rise and fall of city liveliness.
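As an illustration of what "aggregated space-time human activity intensities" can mean in practice, the sketch below counts distinct users per grid cell and hour of day. The record schema `(user_id, timestamp, lon, lat)`, the function name `activity_intensity`, and the 0.01-degree grid are hypothetical choices, not details taken from the paper.

```python
from collections import defaultdict
from datetime import datetime


def activity_intensity(records, cell_size=0.01):
    """Count distinct users per (grid cell, hour of day).

    records: iterable of (user_id, timestamp, lon, lat) tuples (a
    hypothetical schema for mobile phone positioning records).
    cell_size: grid resolution in degrees (an assumed value).
    """
    users = defaultdict(set)
    for user_id, ts, lon, lat in records:
        # Snap each position to a regular lon/lat grid cell.
        cell = (int(lon // cell_size), int(lat // cell_size))
        users[(cell, ts.hour)].add(user_id)
    # Intensity = number of distinct users seen in the cell during that hour.
    return {key: len(ids) for key, ids in users.items()}


records = [
    ("u1", datetime(2020, 1, 1, 9, 5), 116.401, 39.901),
    ("u2", datetime(2020, 1, 1, 9, 30), 116.403, 39.902),
    ("u1", datetime(2020, 1, 1, 9, 45), 116.402, 39.903),
    ("u3", datetime(2020, 1, 1, 10, 0), 116.401, 39.901),
]
intensity = activity_intensity(records)
```

Counting distinct users rather than raw records keeps a single phone that reports many positions from inflating a cell's liveliness; summing records instead would be an equally simple alternative.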
Estimating Target Heights Based on the Earth Curvature Model and Micromultipath Effect in Skywave OTH Radar
Skywave over-the-horizon (OTH) radar systems have important value for long-range strategic early warning. They exploit the reflection of high-frequency (HF) signals from the ionosphere, which provides ultra-long-range surveillance capabilities to detect and track maneuvering targets. Current OTH radar systems can localize targets in range and azimuth but cannot achieve reliable instantaneous altitude estimation. Most existing height measurement methods for skywave OTH radar exploit the micromultipath effect and assume a flat earth model. However, the flat earth model is inappropriate because it inevitably introduces large errors when the detection range exceeds one thousand kilometers. To avoid the error caused by the flat earth model, this paper introduces an earth curvature model into OTH radar altimetry methods. Simulation results show that applying the earth curvature model can effectively reduce the estimation error.
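The abstract does not give the paper's equations. The minimal sketch below is only a geometric illustration, not the paper's altimetry model: it shows how far a spherical Earth (assumed mean radius 6371 km) falls below a local horizontal plane at a given ground range, which a flat-earth model implicitly treats as zero. The function name `curvature_drop_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def curvature_drop_km(ground_range_km, radius_km=EARTH_RADIUS_KM):
    """Height of the radar's local horizontal plane above a spherical Earth.

    At arc distance d along the surface, the central angle is d / R and the
    tangent plane at the radar lies R * (sec(d / R) - 1) above the surface.
    A flat-earth model implicitly sets this quantity to zero.
    """
    theta = ground_range_km / radius_km
    return radius_km * (1.0 / math.cos(theta) - 1.0)


for d in (100, 500, 1000, 2000):
    exact = curvature_drop_km(d)
    approx = d ** 2 / (2 * EARTH_RADIUS_KM)  # small-angle approximation
    print(f"{d:5d} km: drop = {exact:7.2f} km (approx. {approx:7.2f} km)")
```

At 1000 km the departure is roughly 80 km, far larger than typical aircraft cruising altitudes of about 10 km. This is consistent with the abstract's point that a flat-earth assumption is inappropriate at these ranges.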
Debiasing Made State-of-the-art: Revisiting the Simple Seed-based Weak Supervision for Text Classification
Recent advances in weakly supervised text classification mostly focus on
designing sophisticated methods to turn high-level human heuristics into
quality pseudo-labels. In this paper, we revisit the seed matching-based
method, which is arguably the simplest way to generate pseudo-labels, and show
that its power was greatly underestimated. We show that the limited performance
of seed matching is largely due to the label bias injected by the simple
seed-match rule, which prevents the classifier from learning reliable
confidence for selecting high-quality pseudo-labels. Interestingly, simply
deleting the seed words present in the matched input texts can mitigate the
label bias and help learn better confidence. Consequently, the performance
achieved by seed matching can be improved significantly, making it on par with
or even better than the state of the art. Furthermore, to handle the case where
the seed words are not known, we propose to simply delete word tokens
in the input text at random with a high deletion ratio. Remarkably, seed
matching equipped with this random deletion method can often achieve even
better performance than with seed deletion.
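The abstract does not spell out its exact matching rule. The sketch below assumes one simple variant (a text is labeled only when it contains seed words of exactly one class) and then applies the two debiasing steps described above: deleting the matched seed words, or deleting tokens at random. The function names and the 0.9 default ratio are illustrative assumptions, not values from the paper.

```python
import random


def seed_match(texts, seeds):
    """Pseudo-label texts that contain seed words of exactly one class.

    seeds: dict mapping label -> set of lower-case seed words (hypothetical).
    Texts matching no class or several classes are left unlabeled.
    """
    labeled = []
    for text in texts:
        tokens = text.lower().split()
        hits = {label for label, words in seeds.items()
                if any(tok in words for tok in tokens)}
        if len(hits) == 1:
            labeled.append((tokens, hits.pop()))
    return labeled


def delete_seeds(tokens, seeds):
    """Remove every seed word so the classifier cannot rely on it."""
    all_seeds = set().union(*seeds.values())
    return [tok for tok in tokens if tok not in all_seeds]


def delete_random(tokens, ratio=0.9, rng=None):
    """Drop each token independently with probability `ratio`."""
    if rng is None:
        rng = random.Random(0)
    return [tok for tok in tokens if rng.random() >= ratio]


seeds = {"sports": {"game", "team"}, "politics": {"vote", "senate"}}
texts = ["The team won the game", "Senate vote delayed",
         "team vote today", "nothing here"]
pseudo = seed_match(texts, seeds)
debiased = [(delete_seeds(toks, seeds), label) for toks, label in pseudo]
```

The `(tokens, label)` pairs would then train the classifier whose confidence is used to select high-quality pseudo-labels; that training step is omitted here. `delete_random` needs no seed list, which matches the abstract's setting where the seed words are not known.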
Variational Estimation for Multidimensional Generalized Partial Credit Model
Multidimensional item response theory (MIRT) models have generated increasing
interest in the psychometrics literature. Efficient approaches for estimating
MIRT models with dichotomous responses have been developed, but constructing an
equally efficient and robust algorithm for polytomous models has received
limited attention. To address this gap, this paper presents a novel Gaussian
variational estimation algorithm for the multidimensional generalized partial
credit model (MGPCM). The proposed algorithm is both fast and accurate, as
illustrated through a series of simulation studies and two real-data
analyses.
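The abstract does not describe the variational algorithm itself. For context, the sketch below evaluates the item response function it would be fitting: category probabilities of a single item under one common MGPCM parameterization, where the logit of category k is k·(aᵀθ) minus the sum of the first k step parameters. Sign conventions vary across the literature (some authors write aᵀθ + b), and the function names are hypothetical. A Gaussian variational method would approximate the posterior of θ under this likelihood.

```python
import numpy as np


def mgpcm_probs(theta, a, b):
    """Category probabilities for one item under a common MGPCM parameterization.

    theta: (D,) latent traits; a: (D,) item slopes;
    b: (K,) step parameters for categories 1..K.
    P(Y = k | theta) is proportional to exp(k * a.theta - sum_{v<=k} b_v),
    with category 0 having logit 0.
    """
    theta, a, b = (np.asarray(x, dtype=float) for x in (theta, a, b))
    k = np.arange(len(b) + 1)
    logits = k * (a @ theta) - np.concatenate(([0.0], np.cumsum(b)))
    logits -= logits.max()  # for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def expected_score(theta, a, b):
    """Expected item score sum_k k * P(Y = k | theta)."""
    p = mgpcm_probs(theta, a, b)
    return float(np.arange(len(p)) @ p)
```

With a single step parameter (K = 1) this reduces to a multidimensional two-parameter logistic model, which is a quick sanity check on the parameterization.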
ConaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval
Large-scale pre-trained text-image models with dual-encoder architectures
(such as CLIP) are typically adopted for various vision-language applications,
including text-image retrieval. However, these models are still less practical
on edge devices or in real-time settings, due to their substantial indexing
and inference time and their heavy consumption of computational resources.
Although knowledge distillation techniques have been widely used for
uni-modal model compression, how to extend them to settings in which the
numbers of modalities and of teachers/students are doubled has rarely been
studied. In this paper, we conduct comprehensive experiments on this topic and
propose the fully-Connected knowledge interaction graph (Cona) technique for
cross-modal pre-training distillation. Based on our findings, the resulting
ConaCLIP achieves state-of-the-art performance on the widely used Flickr30K and MSCOCO
benchmarks under the lightweight setting. An industrial application of our method
on an e-commerce platform further demonstrates the significant effectiveness
of ConaCLIP.
Comment: ACL 2023 Industry Track
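The abstract names the technique but not its per-edge losses. The sketch below assumes four nodes (teacher and student image and text encoders), treats every pair of nodes as an edge of a fully connected graph, and uses a symmetric InfoNCE loss on each edge. The node names, the 0.07 temperature, the choice of InfoNCE, and skipping the teacher-teacher edge are all assumptions, not the paper's specification. NumPy only evaluates the loss; training would need an autodiff framework.

```python
import itertools

import numpy as np


def info_nce(x, y, temperature=0.07):
    """Symmetric InfoNCE loss between row-aligned embedding batches x and y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    logits = x @ y.T / temperature

    def cross_entropy(z):
        # Row-wise softmax cross-entropy with the matching pair on the diagonal.
        z = z - z.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


def interaction_graph_loss(nodes, frozen=("teacher_img", "teacher_txt")):
    """Sum a pairwise loss over every edge of the graph on the given nodes.

    The teacher-teacher edge is skipped because frozen teachers give it no gradient.
    """
    total = 0.0
    for (name_a, emb_a), (name_b, emb_b) in itertools.combinations(nodes.items(), 2):
        if name_a in frozen and name_b in frozen:
            continue
        total += info_nce(emb_a, emb_b)
    return total


rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
aligned = {"teacher_img": emb, "teacher_txt": emb,
           "student_img": emb, "student_txt": emb}
misaligned = dict(aligned, student_img=emb[::-1])
```

With four nodes there are six edges, five of which involve at least one student, so every student embedding receives signal from both teachers and from the other student.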
Distances to the Supernova Remnants in the Inner Disk
Distance measurements of supernova remnants (SNRs) are essential.
Accurate estimates of the physical size, dust mass, and other
properties of SNRs depend critically on accurate distance measurements.
However, determining SNR distances is still a difficult task. Red clump
stars (RCs) have long been used as standard candles. In this work, we
take RCs as tracers to determine the distances to a large group of SNRs in the
inner disk. We first select RC stars based on the near-infrared (IR)
color-magnitude diagram (CMD). Then, the distance to and extinction of RC stars
are calculated. To extend the measurable range of distance, we combine near-IR
photometric data from the 2MASS survey with the deeper UKIDSS and VVV surveys.
With the help of the Gaia parallaxes, we also remove contaminants including
dwarfs and giants. Because an SN explosion compresses the surrounding
interstellar medium, the SNR region becomes denser and exhibits higher
extinction than its surroundings. The distance to an SNR is then identified as
the position where the extinction and its gradient are higher than those of the
ambient medium. The distances to a total of 63 SNRs in the Galactic inner disk are
determined and divided into three levels, A, B, and C, in order of decreasing
reliability. The distances to 43 SNRs are well determined, with reliability A or
B. The diameters and dust masses of the SNRs are estimated from the obtained
distances and extinctions.
Comment: 31 pages, 25 figures, 2 tables, accepted for publication in A&
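The two steps above (RC distance and extinction, then locating the extinction jump) can be sketched as follows. This is a simplified illustration under stated assumptions: the RC absolute magnitude, intrinsic colour, and extinction ratio are illustrative literature-style values rather than the paper's adopted ones, contaminant removal with Gaia parallaxes is omitted, and the jump is found only along the line of sight instead of by comparison with the ambient medium. The function names are hypothetical.

```python
import numpy as np

# Illustrative red clump (RC) parameters; published values vary with the
# calibration and the adopted extinction law.
M_KS_RC = -1.61      # assumed absolute Ks magnitude of RC stars
JKS0_RC = 0.68       # assumed intrinsic (J - Ks) colour of RC stars
AKS_PER_EJKS = 0.5   # assumed ratio A_Ks / E(J - Ks)


def rc_distance_extinction(j_mag, ks_mag):
    """Distance (pc) and Ks-band extinction of RC stars from J and Ks photometry."""
    j_mag, ks_mag = np.asarray(j_mag, float), np.asarray(ks_mag, float)
    a_ks = AKS_PER_EJKS * ((j_mag - ks_mag) - JKS0_RC)  # colour excess -> extinction
    # Distance modulus corrected for extinction: m - M - A = 5 log10(d) - 5.
    dist_pc = 10 ** ((ks_mag - M_KS_RC - a_ks + 5) / 5)
    return dist_pc, a_ks


def extinction_jump(dist_pc, a_ks, bin_pc=500.0):
    """Distance at which the binned median extinction rises most steeply."""
    dist_pc, a_ks = np.asarray(dist_pc, float), np.asarray(a_ks, float)
    edges = np.arange(dist_pc.min(), dist_pc.max() + bin_pc, bin_pc)
    bins = np.digitize(dist_pc, edges)
    centres, medians = [], []
    for b in np.unique(bins):
        sel = bins == b
        centres.append(dist_pc[sel].mean())
        medians.append(np.median(a_ks[sel]))
    k = int(np.argmax(np.diff(medians)))
    return 0.5 * (centres[k] + centres[k + 1])
```

Using the median extinction per distance bin makes the jump estimate less sensitive to individual misidentified stars than a per-star gradient would be.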
Boosting In-Context Learning with Factual Knowledge
In-Context Learning (ICL) over large language models (LLMs) aims to solve
previously unseen tasks by conditioning on a few training examples, eliminating
the need for parameter updates and achieving competitive performance. In this
paper, we demonstrate that factual knowledge is imperative for the performance
of ICL in three core facets, i.e., the inherent knowledge learned in LLMs, the
factual knowledge derived from the selected in-context examples, and the
knowledge biases in LLMs for output generation. To unleash the power of LLMs in
few-shot learning scenarios, we introduce a novel Knowledgeable In-Context
Tuning (KICT) framework to further improve the performance of ICL: 1) injecting
factual knowledge into LLMs during continual self-supervised pre-training, 2)
judiciously selecting the examples with high knowledge relevance, and 3)
calibrating the prediction results based on prior knowledge. We evaluate the
proposed approaches on auto-regressive LLMs (e.g., GPT-style models) over
multiple text classification and question answering tasks. Experimental results
demonstrate that KICT substantially outperforms strong baselines, improving
accuracy by more than 13% and 7% on text classification and question
answering tasks, respectively.
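The abstract says only that predictions are calibrated "based on prior knowledge". As a stand-in, the sketch below shows a simple, widely used prior-correction scheme: divide the model's label probabilities by an estimate of its label bias (for example, its output on a content-free prompt) and renormalize. This is a swapped-in technique for illustration; KICT's actual calibration may differ, and the function and variable names are hypothetical.

```python
import numpy as np


def calibrate(probs, prior):
    """Reweight class probabilities by an estimated label prior and renormalize.

    probs: (..., C) model probabilities for C labels.
    prior: (C,) estimate of the model's label bias, e.g. its output on a
    content-free prompt (an assumption, not necessarily the KICT prior).
    """
    probs = np.asarray(probs, dtype=float)
    prior = np.asarray(prior, dtype=float)
    adjusted = probs / prior
    return adjusted / adjusted.sum(axis=-1, keepdims=True)


raw = np.array([0.6, 0.4])
bias = np.array([0.75, 0.25])  # the model over-predicts label 0
calibrated = calibrate(raw, bias)
```

Here the raw prediction favours label 0, but once the model's strong prior toward label 0 is divided out, label 1 becomes the more likely answer. A uniform prior leaves the probabilities unchanged.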