176,117 research outputs found

    Ranking Large Temporal Data

    Ranking temporal data has not been studied until recently, even though ranking is an important operator (promoted as a first-class citizen) in database systems. However, prior work has studied only instant top-k queries on temporal data, in which the objects with the k highest scores at a query time instance t are retrieved. The instant top-k definition has clear limitations: it is sensitive to outliers, and a meaningful query time t is difficult to choose. A more flexible and general operation ranks objects by the aggregation of their scores over a query interval, which we dub the aggregate top-k query on temporal data. For example: return the top-10 weather stations with the highest average temperature from 10/01/2010 to 10/07/2010, or find the top-20 stocks with the largest total transaction volumes from 02/05/2011 to 02/07/2011. This work presents a comprehensive study of this problem, designing both exact and approximate methods (with approximation quality guarantees). We also provide theoretical analysis of the construction cost, index size, update cost, and query cost of each approach. Extensive experiments on large real datasets clearly demonstrate the efficiency, effectiveness, and scalability of our methods compared to the baseline methods. Comment: VLDB201
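
To make the aggregate top-k definition concrete, here is a minimal brute-force sketch in Python. It is not one of the indexed exact or approximate methods the abstract refers to. It assumes each object's scores are stored as discrete `(timestamp, score)` samples, and the names `aggregate_top_k` and `mean` are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed data layout: each object maps to discrete (timestamp, score) samples.
Series = Sequence[Tuple[float, float]]


def mean(values: List[float]) -> float:
    """Average of a non-empty list of scores."""
    return sum(values) / len(values)


def aggregate_top_k(
    data: Dict[str, Series],
    t1: float,
    t2: float,
    k: int,
    agg: Callable[[List[float]], float] = sum,
) -> List[Tuple[str, float]]:
    """Rank objects by an aggregate of their scores inside [t1, t2].

    Naive scan over every sample of every object; objects with no
    samples in the interval are skipped.
    """
    results = []
    for obj, series in data.items():
        window = [score for t, score in series if t1 <= t <= t2]
        if window:
            results.append((obj, agg(window)))
    # Keep the k objects with the largest aggregate value.
    return heapq.nlargest(k, results, key=lambda pair: pair[1])


stations = {
    "A": [(1, 10.0), (2, 20.0), (3, 30.0)],
    "B": [(1, 25.0), (2, 25.0), (3, 5.0)],
    "C": [(5, 100.0)],
}
top_by_mean = aggregate_top_k(stations, 1, 2, 1, mean)
top_by_sum = aggregate_top_k(stations, 1, 3, 2)
```

Passing `mean` corresponds to the average-temperature example and the default `sum` to the total-volume example. The single outlier sample of station `C` at time 5 does not affect either query, which illustrates why an interval aggregate is less sensitive to the choice of query time than an instant top-k.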

    Hashing for Multimedia Similarity Modeling and Large-Scale Retrieval

    In recent years, the amount of multimedia data such as images, texts, and videos has been growing rapidly on the Internet. Motivated by this trend, this thesis is dedicated to exploiting hashing-based solutions to reveal multimedia data correlations and to support intra-media and inter-media similarity search among huge volumes of multimedia data. We start by investigating a hashing-based solution for audio-visual similarity modeling and apply it to the audio-visual sound source localization problem. We show that synchronized signals in the audio and visual modalities demonstrate similar temporal changing patterns in certain feature spaces. We propose a permutation-based random hashing technique that captures the temporal order dynamics of audio and visual features by hashing them along the temporal axis into a common Hamming space. In this way, the audio-visual correlation problem is transformed into a similarity search problem in the Hamming space. Our hashing-based audio-visual similarity modeling has shown superior performance in the localization and segmentation of sounding objects in videos. The success of the permutation-based hashing method motivates us to generalize and formally define the supervised ranking-based hashing problem and to study its application to large-scale image retrieval. Specifically, we propose an effective supervised learning procedure to learn optimized ranking-based hash functions that can be used for large-scale similarity search. Compared with the randomized version, the optimized ranking-based hash codes are much more compact and discriminative. Moreover, the method can be easily extended to kernel space to discover more complex ranking structures that cannot be revealed in linear subspaces. Experiments on large image datasets demonstrate the effectiveness of the proposed method for image retrieval. We further study the ranking-based hashing method for the cross-media similarity search problem.
Specifically, we propose two optimization methods to jointly learn two groups of linear subspaces, one for each media type, so that the features' ranking orders in the different linear subspaces maximally preserve the cross-media similarities. Additionally, we develop this ranking-based hashing method in the cross-media context into a flexible hashing framework with a more general solution. We demonstrate through extensive experiments on several real-world datasets that the proposed cross-media hashing method achieves superior cross-media retrieval performance against several state-of-the-art algorithms. Lastly, to make better use of the supervisory label information and to further improve the efficiency and accuracy of supervised hashing, we propose a novel multimedia discrete hashing framework that optimizes an instance-wise loss objective, rather than pairwise losses, using an efficient discrete optimization method. In addition, the proposed method decouples binary code learning and hash function learning into two separate stages, making it equally applicable to single-media and cross-media search. Extensive experiments on both single-media and cross-media retrieval tasks demonstrate the effectiveness of the proposed method
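
The abstract does not spell out its permutation-based hashing scheme, so the following is only a rough Python sketch of one well-known rank-based hash of this kind: a winner-take-all style code, where each hash value is the position of the largest entry among the first `k` entries of a randomly permuted feature vector. That this matches the thesis's technique is an assumption, and the function names and parameters are hypothetical.

```python
import random
from typing import List, Sequence


def make_permutations(dim: int, n_hashes: int, seed: int = 0) -> List[List[int]]:
    """Draw n_hashes random permutations of the feature indices 0..dim-1."""
    rng = random.Random(seed)
    return [rng.sample(range(dim), dim) for _ in range(n_hashes)]


def rank_hash(x: Sequence[float], perms: List[List[int]], k: int) -> List[int]:
    """Winner-take-all style code (assumed scheme, not taken from the thesis).

    For each permutation, look at the first k permuted entries of x and
    record the position of the largest one. Only the ordering of the
    entries matters, not their magnitudes.
    """
    code = []
    for perm in perms:
        window = [x[j] for j in perm[:k]]
        code.append(max(range(k), key=window.__getitem__))
    return code


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two codes differ."""
    return sum(u != v for u, v in zip(a, b))


perms = make_permutations(dim=8, n_hashes=16, seed=42)
signal = [3, 9, 1, 7, 4, 8, 2, 6]
rescaled = [2 * v + 1 for v in signal]  # same ordering, different scale
code_signal = rank_hash(signal, perms, k=4)
code_rescaled = rank_hash(rescaled, perms, k=4)
```

Because the code depends only on rank order, two signals with the same ordering of values map to identical codes, so comparing signals reduces to a Hamming-distance comparison between codes.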

    Inequality of Educational Opportunity in India: Changes over Time and across States

    This paper documents the extent of inequality of educational opportunity in India over the period 1983-2004 using National Sample Survey (NSS) data. We build on recent developments in the literature that have operationalized concepts in the inequality of opportunity theory (including Roemer's) and construct three indices of inequality of educational opportunity using data on an adult sample. Irrespective of the index used, the state of Kerala stands out as the least unequal in terms of educational opportunities. However, even after excluding Kerala, significant inter-state divergence remains amongst the remaining states. Transition matrix analysis confirms substantial inter-temporal mobility in inequality of opportunity across Indian states. Rajasthan and Gujarat in the West and Uttar Pradesh and Bihar in the Centre experienced a large-scale fall in the ranking of inequality of opportunities. However, despite being poor, the Eastern states of West Bengal and Orissa made significant progress in reducing inequality of opportunity. At the regional level, the Southern, North-eastern and Eastern regions on average experienced upward mobility (i.e. a decline in inequality of opportunity), whilst the Central region experienced downward mobility. We conclude by examining the link between progress towards equality of opportunity and poverty reduction, growth and a selection of pro-poor policies. Keywords: schooling mobility, dissimilarity index, Gini of opportunity index, overlap index

    Spatial variation in prices and expenditure inequalities in Australia

    This study proposes a method of calculating preference-based spatial price indices that measure price variation between regions. It shows how the traditional concept of the 'true cost of living index', used in temporal price comparisons, can also be used in spatial price comparisons. The usefulness of the proposed procedures is illustrated by applying them to Australian household expenditure data. The results show that during the past two decades spatial price variation has increased steadily, with the most recent period (2005-2009) witnessing a large increase. The results also show that the ranking of the states, on both cost of living and inequality, has altered significantly over the past two decades

    Ranking Archived Documents for Structured Queries on Semantic Layers

    Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods remains a major hurdle to turning them into usable sources of information. A semantic layer is an RDF graph that describes metadata and semantic information about a collection of archived documents, which in turn can be queried through a semantic query language (SPARQL). This allows running advanced queries that combine metadata of the documents (like publication date) with content-based semantic information (like entities mentioned in the documents). However, the results returned by such structured queries can be numerous, and they all match the query equally. In this paper, we address this problem and formalize the task of "ranking archived documents for structured queries on semantic layers". We then propose two ranking models for the problem, which jointly consider: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the temporal relations among the entities. The experimental results on a new evaluation dataset show the effectiveness of the proposed models and allow us to understand their limitations.

    Temporal effects in trend prediction: identifying the most popular nodes in the future

    Prediction is an important problem in different science domains. In this paper, we focus on trend prediction in complex networks, i.e. identifying the nodes that will be most popular in the future. Due to the preferential attachment mechanism in real systems, nodes' recent degree and cumulative degree have been successfully applied to design trend prediction methods. Here we take into account more detailed information about the network evolution and propose a temporal-based predictor (TBP). The TBP predicts the future trend by the node strength in a weighted network in which each link's weight is given by its exponential aging. Three data sets with time information are used to test the performance of the new method. We find that the TBP has high overall accuracy in predicting the future most popular nodes. More importantly, it can identify many potential objects with low popularity in the past but high popularity in the future. The effect of the decay speed in the exponential aging on the results is discussed in detail.
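
The predictor described above can be sketched in a few lines of Python. This is an illustration only: the abstract does not give the exact weight formula, so the form `exp(-decay * (now - t))` is an assumption, as are the function names and the `(target, timestamp)` link layout.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Link = Tuple[Hashable, float]  # assumed layout: (target node, link creation time)


def tbp_scores(links: Iterable[Link], now: float, decay: float) -> Dict[Hashable, float]:
    """Node strength when each incoming link is weighted by exponential aging.

    Assumed weight of a link created at time t: exp(-decay * (now - t)).
    Links created after `now` are ignored.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for target, t in links:
        if t <= now:
            scores[target] += math.exp(-decay * (now - t))
    return dict(scores)


def predict_top(links: Iterable[Link], now: float, decay: float, n: int) -> List[Hashable]:
    """Return the n nodes with the highest aged strength."""
    scores = tbp_scores(links, now, decay)
    return sorted(scores, key=scores.get, reverse=True)[:n]


links = [("old", 0), ("old", 0), ("old", 0), ("new", 9), ("new", 10)]
top_no_decay = predict_top(links, now=10, decay=0.0, n=1)
top_with_decay = predict_top(links, now=10, decay=1.0, n=1)
```

Under this assumed form, `decay = 0` weights every link equally and the score reduces to cumulative degree, while a larger `decay` favors nodes that gained links recently. That is the trade-off governed by the decay speed.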

    Early identification of important patents through network centrality

    One of the most challenging problems in technological forecasting is to identify as early as possible those technologies that have the potential to lead to radical changes in our society. In this paper, we use the US patent citation network (1926-2010) to test our ability to identify a list of historically significant patents early through citation network analysis. We show that in order to effectively uncover these patents shortly after they are issued, we need to go beyond raw citation counts and take into account both the citation network topology and temporal information. In particular, an age-normalized measure of patent centrality, called rescaled PageRank, allows us to identify the significant patents earlier than citation count and PageRank score do. In addition, we find that while high-impact patents tend to rely on other high-impact patents in a similar way to scientific papers, the patents' citation dynamics are significantly slower than those of papers, which makes the early identification of significant patents more challenging than that of significant papers. Comment: 14 pages
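
The abstract does not define the age normalization, so the Python sketch below shows only one plausible form: each node's score is converted to a z-score relative to a window of nodes of similar age. The window construction, the function name `rescale_by_age`, and the example scores are assumptions for illustration; the input scores stand in for PageRank values computed elsewhere.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def rescale_by_age(scores: Sequence[float], window: int) -> List[float]:
    """Age-normalize centrality scores (assumed scheme).

    `scores` must be ordered from oldest to youngest node. Each score is
    compared with the mean and standard deviation of a window of `window`
    nodes of similar age, clipped at the ends of the list.
    """
    n = len(scores)
    half = window // 2
    rescaled = []
    for i in range(n):
        # Centre the window on node i, shifting it inward near the ends.
        lo = max(0, min(i - half, n - window))
        hi = min(n, lo + window)
        peers = scores[lo:hi]
        mu, sigma = mean(peers), pstdev(peers)
        rescaled.append((scores[i] - mu) / sigma if sigma > 0 else 0.0)
    return rescaled


# Hypothetical raw scores: older nodes (listed first) have had more time
# to accumulate centrality than younger ones.
raw = [10.0, 8.0, 12.0, 1.0, 0.5, 3.0]
adjusted = rescale_by_age(raw, window=3)
```

In this toy input the youngest node has a much lower raw score than the third-oldest one, yet it ranks higher after rescaling because it stands out among nodes of its own age. This is the kind of effect that lets an age-normalized score surface significant items sooner than raw counts.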