Ranking Large Temporal Data
Ranking temporal data has not been studied until recently, even though
ranking is an important operator (being promoted as a first-class citizen) in
database systems. Even so, only instant top-k queries on temporal data have been
studied, where objects with the k highest scores at a query time instance t
are to be retrieved. The instant top-k definition clearly comes with
limitations (sensitive to outliers, difficult to choose a meaningful query time
t). A more flexible and general ranking operation is to rank objects based on
the aggregation of their scores in a query interval, which we dub the aggregate
top-k query on temporal data. For example, return the top-10 weather stations
having the highest average temperature from 10/01/2010 to 10/07/2010; find the
top-20 stocks having the largest total transaction volumes from 02/05/2011 to
02/07/2011. This work presents a comprehensive study of this problem by
designing both exact and approximate methods (with approximation quality
guarantees). We also provide theoretical analysis on the construction cost, the
index size, the update and the query costs of each approach. Extensive
experiments on large real datasets clearly demonstrate the efficiency, the
effectiveness, and the scalability of our methods compared to the baseline
methods.
Comment: VLDB201
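To make the aggregate top-k query concrete, here is a minimal brute-force sketch. It assumes each object is a time-sorted list of discrete `(time, score)` readings, which the abstract does not specify, and it is not one of the indexed methods the paper proposes. The function name `aggregate_top_k` and its parameters are hypothetical.

```python
from bisect import bisect_left, bisect_right
import heapq


def aggregate_top_k(series, t1, t2, k, agg="sum"):
    """Rank objects by the aggregate of their scores inside [t1, t2].

    series: dict mapping object name -> list of (time, score), sorted by time.
    agg:    "sum" or "avg" (assumed aggregate functions).
    """
    results = []
    for name, readings in series.items():
        times = [t for t, _ in readings]
        # Locate the readings that fall inside the query interval.
        lo = bisect_left(times, t1)
        hi = bisect_right(times, t2)
        window = [s for _, s in readings[lo:hi]]
        if not window:
            continue  # object has no reading in the interval
        total = sum(window)
        value = total / len(window) if agg == "avg" else total
        results.append((value, name))
    # Keep only the k largest aggregates, highest first.
    return [(name, value) for value, name in heapq.nlargest(k, results)]


stations = {
    "A": [(1, 10.0), (2, 20.0), (3, 30.0)],
    "B": [(1, 25.0), (2, 25.0), (3, 1.0)],
    "C": [(2, 5.0)],
}
top_avg = aggregate_top_k(stations, 1, 2, 2, agg="avg")
top_sum = aggregate_top_k(stations, 1, 3, 1, agg="sum")
```

This scan touches every object on every query, which is the cost an index would be designed to avoid.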
Hashing for Multimedia Similarity Modeling and Large-Scale Retrieval
In recent years, the amount of multimedia data such as images, texts, and videos has been growing rapidly on the Internet. Motivated by such trends, this thesis is dedicated to exploiting hashing-based solutions to reveal multimedia data correlations and support intra-media and inter-media similarity search among huge volumes of multimedia data. We start by investigating a hashing-based solution for audio-visual similarity modeling and apply it to the audio-visual sound source localization problem. We show that synchronized signals in audio and visual modalities demonstrate similar temporal changing patterns in certain feature spaces. We propose to use a permutation-based random hashing technique to capture the temporal order dynamics of audio and visual features by hashing them along the temporal axis into a common Hamming space. In this way, the audio-visual correlation problem is transformed into a similarity search problem in the Hamming space. Our hashing-based audio-visual similarity modeling has shown superior performance in the localization and segmentation of sounding objects in videos. The success of the permutation-based hashing method motivates us to generalize and formally define the supervised ranking-based hashing problem, and study its application to large-scale image retrieval. Specifically, we propose an effective supervised learning procedure to learn optimized ranking-based hash functions that can be used for large-scale similarity search. Compared with the randomized version, the optimized ranking-based hash codes are much more compact and discriminative. Moreover, the method can be easily extended to kernel space to discover more complex ranking structures that cannot be revealed in linear subspaces. Experiments on large image datasets demonstrate the effectiveness of the proposed method for image retrieval. We further study the ranking-based hashing method for the cross-media similarity search problem.
Specifically, we propose two optimization methods to jointly learn two groups of linear subspaces, one for each media type, so that features' ranking orders in different linear subspaces maximally preserve the cross-media similarities. Additionally, we develop this ranking-based hashing method in the cross-media context into a flexible hashing framework with a more general solution. We have demonstrated through extensive experiments on several real-world datasets that the proposed cross-media hashing method can achieve superior cross-media retrieval performance against several state-of-the-art algorithms. Lastly, to make better use of the supervisory label information, as well as to further improve the efficiency and accuracy of supervised hashing, we propose a novel multimedia discrete hashing framework that optimizes an instance-wise loss objective, as compared to the pairwise losses, using an efficient discrete optimization method. In addition, the proposed method decouples the binary code learning and hash function learning into two separate stages, thus making the proposed method equally applicable to both single-media and cross-media search. Extensive experiments on both single-media and cross-media retrieval tasks demonstrate the effectiveness of the proposed method.
Inequality of Educational Opportunity in India: Changes over Time and across States
This paper documents the extent of inequality of educational opportunity in India spanning the period 1983-2004 using National Sample Survey (NSS) data. We build on recent developments in the literature that have operationalized concepts in the inequality of opportunity theory (including Roemer's) and construct three indices of inequality of educational opportunity using data on an adult sample. Irrespective of the index used, the state of Kerala stands out as the least unequal in terms of educational opportunities. However, even after excluding Kerala, significant inter-state divergence remains amongst the remaining states. Transition matrix analysis confirms substantial inter-temporal mobility in inequality of opportunity across Indian states. Rajasthan and Gujarat in the West and Uttar Pradesh and Bihar in the Centre experienced a large fall in the ranking of inequality of opportunities. However, despite being poor, the Eastern states of West Bengal and Orissa made significant progress in reducing inequality of opportunity. At the regional level, the Southern, North-eastern and Eastern regions on average experienced upward mobility (i.e. a decline in inequality of opportunity) whilst the Central region experienced downward mobility. We conclude by examining the link between progress towards equality of opportunity and poverty reduction, growth and a selection of pro-poor policies.
Keywords: schooling mobility, dissimilarity index, Gini of opportunity index, overlap index
Spatial variation in prices and expenditure inequalities in Australia
This study proposes a method of calculating preference-based spatial price indices that measure price variation between regions. It shows how the traditional concept of the 'true cost of living index', used in temporal price comparisons, can also be used in spatial price comparisons. The usefulness of the proposed procedures is illustrated by applying them to Australian household expenditure data. The results show that during the past two decades spatial price variation has increased steadily, with the most recent period (2005-2009) witnessing a large increase. The results also show that the ranking of the states, on both cost of living and inequality, has altered significantly over the past two decades.
Ranking Archived Documents for Structured Queries on Semantic Layers
Archived collections of documents (like newspaper and web archives) serve as
important information sources in a variety of disciplines, including Digital
Humanities, Historical Science, and Journalism. However, the absence of
efficient and meaningful exploration methods still remains a major hurdle in
the way of turning them into usable sources of information. A semantic layer is
an RDF graph that describes metadata and semantic information about a
collection of archived documents, which in turn can be queried through a
semantic query language (SPARQL). This allows running advanced queries by
combining metadata of the documents (like publication date) and content-based
semantic information (like entities mentioned in the documents). However, the
results returned by such structured queries can be numerous and moreover they
all equally match the query. In this paper, we deal with this problem and
formalize the task of "ranking archived documents for structured queries on
semantic layers". Then, we propose two ranking models for the problem at hand
which jointly consider: i) the relativeness of documents to entities, ii) the
timeliness of documents, and iii) the temporal relations among the entities.
The experimental results on a new evaluation dataset show the effectiveness of
the proposed models and allow us to understand their limitations.
Temporal effects in trend prediction: identifying the most popular nodes in the future
Prediction is an important problem in different science domains. In this
paper, we focus on trend prediction in complex networks, i.e. to identify the
most popular nodes in the future. Due to the preferential attachment mechanism
in real systems, nodes' recent degree and cumulative degree have been
successfully applied to design trend prediction methods. Here we took into
account more detailed information about the network evolution and proposed a
temporal-based predictor (TBP). The TBP predicts the future trend by the node
strength in the weighted network with the link weight equal to its exponential
aging. Three data sets with time information are used to test the performance
of the new method. We find that TBP has high general accuracy in predicting
the future most popular nodes. More importantly, it can identify many potential
objects with low popularity in the past but high popularity in the future. The
effect of the decay speed in the exponential aging on the results is discussed
in detail.
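The scoring idea described above can be sketched as follows. This is a minimal illustration, assuming each link is recorded as a `(target, time)` pair and that aging takes the form exp(-decay * age); the abstract does not give the exact formula, and the names `tbp_scores`, `predict_top`, and `decay` are hypothetical.

```python
import math
from collections import defaultdict


def tbp_scores(links, now, decay):
    """Node strength where each received link is weighted by its age.

    links: iterable of (target_node, link_time) pairs.
    decay: assumed decay speed; 0 reduces the score to cumulative degree.
    """
    scores = defaultdict(float)
    for target, t in links:
        if t <= now:  # ignore links that have not happened yet
            scores[target] += math.exp(-decay * (now - t))
    return dict(scores)


def predict_top(links, now, decay, n):
    """Return the n nodes with the highest aged strength (ties by name)."""
    scores = tbp_scores(links, now, decay)
    return sorted(scores, key=lambda node: (-scores[node], node))[:n]


links = [("old", 0), ("old", 1), ("old", 2), ("new", 9), ("new", 10)]
recent_favourite = predict_top(links, now=10, decay=0.5, n=1)
cumulative_favourite = predict_top(links, now=10, decay=0.0, n=1)
```

With a positive decay the recently linked node ranks first even though it has fewer links in total, which is the kind of low-past-popularity node the abstract says the predictor can surface.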
Early identification of important patents through network centrality
One of the most challenging problems in technological forecasting is to
identify as early as possible those technologies that have the potential to
lead to radical changes in our society. In this paper, we use the US patent
citation network (1926-2010) to test how early we can identify a list of
historically significant patents through citation network analysis. We show
that in order to effectively uncover these patents shortly after they are
issued, we need to go beyond raw citation counts and take into account both the
citation network topology and temporal information. In particular, an
age-normalized measure of patent centrality, called rescaled PageRank, allows
us to identify the significant patents earlier than citation count and PageRank
score. In addition, we find that while high-impact patents tend to rely on
other high-impact patents in a similar way as scientific papers, the patents'
citation dynamics are significantly slower than those of papers, which makes the
early identification of significant patents more challenging than that of
significant papers.
Comment: 14 pages
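One way to read "age-normalized" is as a z-score of each node's raw score against nodes of similar age. The sketch below assumes that reading, takes precomputed scores ordered from oldest to youngest node, and uses a fixed-size window of age neighbours; the abstract does not state the formula, and `rescale_by_age` and `window` are hypothetical names.

```python
from statistics import mean, pstdev


def rescale_by_age(scores, window):
    """Z-score each raw score against a window of similar-age nodes.

    scores: raw centrality scores, ordered by node age (oldest first).
    window: number of age neighbours used as the comparison group.
    """
    n = len(scores)
    half = window // 2
    rescaled = []
    for i in range(n):
        # Centre the window on node i, shifting it inward at both ends.
        lo = max(0, min(i - half, n - window))
        peers = scores[lo:lo + window]
        mu = mean(peers)
        sigma = pstdev(peers)
        rescaled.append((scores[i] - mu) / sigma if sigma > 0 else 0.0)
    return rescaled


# Older nodes (left) have had more time to accumulate score.
raw = [10.0, 8.0, 12.0, 1.0, 0.5, 3.0]
adjusted = rescale_by_age(raw, window=3)
best = max(range(len(raw)), key=lambda i: adjusted[i])
```

In this toy input the youngest node has a low raw score but stands out among its age peers, so it ranks first after rescaling.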