    Lifecycle-Aware Online Video Caching

    The current explosion of video traffic compels service providers to deploy caches at edge networks. Nowadays, most caching systems store data with a high programming voltage corresponding to the largest possible ‘expiry date’, typically on the order of years, which maximizes the cache damage. However, popular videos rarely exhibit lifecycles longer than a couple of months. Consequently, the programming voltage can instead be adapted to fit the lifecycle and mitigate the cache damage accordingly. In this paper, we propose LiA-cache, a Lifecycle-Aware caching policy for online videos. LiA-cache finds both near-optimal caching retention times and cache eviction policies by jointly optimizing traffic delivery cost and cache damage cost. We first investigate temporal patterns of video access from a real-world dataset covering 10 million online videos collected by one of the largest mobile network operators in the world. We next cluster the videos based on their access lifecycles and integrate the clustering into a general model of the caching system. Specifically, LiA-cache analyzes videos and caches them depending on their cluster label. Compared to other popular policies in real-world scenarios, LiA-cache can reduce cache damage by up to 90% while keeping a cache hit ratio close to that of a policy relying purely on video popularity.
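
    To make the idea concrete, the following is a minimal Python sketch of a cache whose retention times depend on a video's lifecycle cluster. The cluster names, the cluster-to-retention mapping, and the LRU fallback eviction are illustrative assumptions, not the actual LiA-cache policy, which additionally models programming voltage and delivery/damage cost trade-offs.

        import time
        from collections import OrderedDict

        # Hypothetical cluster labels and retention times (in seconds); in LiA-cache
        # the clusters come from mining real access-lifecycle patterns.
        RETENTION_BY_CLUSTER = {
            "flash": 3 * 24 * 3600,        # short-lived viral videos
            "seasonal": 30 * 24 * 3600,    # videos popular for about a month
            "evergreen": 180 * 24 * 3600,  # long-tail content
        }

        class LifecycleAwareCache:
            def __init__(self, capacity):
                self.capacity = capacity
                self.store = OrderedDict()   # video_id -> (expiry_timestamp, payload)

            def put(self, video_id, payload, cluster):
                # Retention is adapted to the video's lifecycle cluster instead of a
                # fixed multi-year expiry date.
                expiry = time.time() + RETENTION_BY_CLUSTER.get(cluster, 7 * 24 * 3600)
                self.store[video_id] = (expiry, payload)
                self.store.move_to_end(video_id)
                self._evict()

            def get(self, video_id):
                entry = self.store.get(video_id)
                if entry is None or entry[0] < time.time():
                    self.store.pop(video_id, None)   # expired: treat as a miss
                    return None
                self.store.move_to_end(video_id)     # refresh recency on a hit
                return entry[1]

            def _evict(self):
                # Drop expired entries first, then fall back to LRU if still over capacity.
                now = time.time()
                for vid in [v for v, (exp, _) in self.store.items() if exp < now]:
                    del self.store[vid]
                while len(self.store) > self.capacity:
                    self.store.popitem(last=False)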

    Effective retrieval and new indexing method for case based reasoning: Application in chemical process design

    In this paper we improve the retrieval step of case-based reasoning (CBR) for preliminary design. This improvement concerns three major parts of our CBR system. First, in the preliminary design step, some uncertainties, such as imprecise or unknown values, remain in the description of the problem because they require a deeper analysis to be resolved. To deal with this issue, the problem description is softened using fuzzy set theory. Features are described by a central value, a percentage of imprecision, and a relation with respect to the central value. These additional data allow us to build a domain of possible values for each attribute. With this representation, the calculation of the similarity function is affected, so the characteristic function is used to compute the local similarity between two features. Second, we focus on the main goal of the retrieval step in CBR: finding cases relevant for adaptation. In this second part, we discuss the assumption that similarity alone identifies the most appropriate case. We highlight that, in some situations, this classical similarity must be complemented with further knowledge to facilitate case adaptation. To avoid failure during the adaptation step, we implement a method that couples the similarity measure with an adaptability measure in order to approximate case utility more accurately. The latter gives deeper information for reusing cases. In the last part, we present a generic indexing technique for the case base and a new algorithm for searching relevant cases in memory. The sphere indexing algorithm is a domain-independent index with performance equivalent to that of decision trees, but its main strength is that it places the current problem at the center of the search area, avoiding boundary issues. All these points are discussed and exemplified through the preliminary design of a chemical engineering unit operation.
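
    The fuzzy representation and characteristic-function similarity can be illustrated with a short sketch, assuming each query feature is given as a (central value, imprecision percentage, relation) triple. The interval construction, the crisp 0/1 characteristic function, and the weighted aggregation below are simplifying assumptions for illustration, not the paper's exact formulation.

        # Build the domain of possible values for a query feature from its central
        # value, percentage of imprecision, and relation to the central value.
        def possible_interval(central, imprecision_pct, relation="about"):
            delta = abs(central) * imprecision_pct / 100.0
            if relation == "about":
                return (central - delta, central + delta)
            if relation == "at most":
                return (float("-inf"), central + delta)
            if relation == "at least":
                return (central - delta, float("inf"))
            raise ValueError(f"unknown relation: {relation}")

        def local_similarity(case_value, central, imprecision_pct, relation="about"):
            # Characteristic function: 1 if the stored case's value lies in the
            # domain of possible values for the query feature, 0 otherwise.
            low, high = possible_interval(central, imprecision_pct, relation)
            return 1.0 if low <= case_value <= high else 0.0

        def global_similarity(case, query, weights):
            # Weighted aggregation of local similarities over all features (illustrative).
            total = sum(weights.values())
            return sum(
                w * local_similarity(case[f], *query[f]) for f, w in weights.items()
            ) / total

        # Example: one stored case compared against a fuzzy query description.
        case = {"temperature": 350.0, "pressure": 1.2}
        query = {"temperature": (340.0, 5.0, "about"), "pressure": (1.0, 30.0, "at most")}
        print(global_similarity(case, query, {"temperature": 0.7, "pressure": 0.3}))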

    Improving k-nn search and subspace clustering based on local intrinsic dimensionality

    In several novel applications such as multimedia and recommender systems, data is often represented as object feature vectors in high-dimensional spaces. High-dimensional data is always a challenge for state-of-the-art algorithms because of the so-called curse of dimensionality. As the dimensionality increases, the discriminative ability of similarity measures diminishes to the point where many data analysis algorithms that depend on them, such as similarity search and clustering, lose their effectiveness. One way to handle this challenge is to select the most important features, which is essential for providing compact object representations as well as improving the overall search and clustering performance. Having compact feature vectors can further reduce the storage space and the computational complexity of search and learning tasks. Support-Weighted Intrinsic Dimensionality (support-weighted ID) is a promising new feature selection criterion that estimates the contribution of each feature to the overall intrinsic dimensionality. Support-weighted ID identifies relevant features locally for each object and penalizes features that have locally lower discriminative power as well as higher density. In fact, support-weighted ID measures the ability of each feature to locally discriminate between objects in the dataset. Based on support-weighted ID, this dissertation introduces three main research contributions. First, it proposes NNWID-Descent, a similarity graph construction method that utilizes the support-weighted ID criterion to identify and retain relevant features locally for each object and enhance the overall graph quality. Second, with the aim of improving the accuracy and performance of cluster analysis, it introduces k-LIDoids, a subspace clustering algorithm that extends the utility of support-weighted ID within a clustering framework in order to gradually select the subset of informative and important features per cluster. k-LIDoids constructs clusters while also finding a low-dimensional subspace for each cluster. Finally, using the compact object and cluster representations from NNWID-Descent and k-LIDoids, the dissertation defines LID-Fingerprint, a new binary fingerprinting and multi-level indexing framework for high-dimensional data. LID-Fingerprint can be used for hiding information as a way of thwarting passive adversaries, as well as for providing efficient and secure similarity search and retrieval of data stored in the cloud. When compared to other state-of-the-art algorithms, the good practical performance provides evidence for the effectiveness of the proposed algorithms on data in high-dimensional spaces.
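
    For context, the sketch below shows the standard maximum-likelihood estimator of local intrinsic dimensionality (LID) computed from an object's k nearest-neighbor distances. Support-weighted ID builds per-feature, support-weighted variants of such local estimates; that weighting step is the dissertation's contribution and is not reproduced here, and the brute-force neighbor search is purely illustrative.

        import numpy as np

        def lid_mle(knn_distances):
            """MLE of local ID from the k smallest nonzero distances to a query point:
            LID = -(mean of log(r_i / r_k))^(-1), where r_k is the largest of the k."""
            r = np.sort(np.asarray(knn_distances, dtype=float))
            return -1.0 / np.mean(np.log(r / r[-1]))

        def local_lid(data, query_idx, k=20):
            # Brute-force k-NN for illustration; real pipelines use a k-NN graph or index.
            diffs = data - data[query_idx]
            dists = np.sqrt((diffs ** 2).sum(axis=1))
            dists = np.sort(dists[dists > 0])[:k]
            return lid_mle(dists)

        # Example on synthetic 34-dimensional data.
        X = np.random.default_rng(0).normal(size=(1000, 34))
        print(local_lid(X, query_idx=0, k=20))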

    Global typologies of coastal wetland status to inform conservation and management

    Global-scale conservation initiatives and policy instruments rely on ecosystem indicators to track progress towards targets and objectives. A deeper understanding of indicator interrelationships would benefit these efforts and help characterize ecosystem status. We study interrelationships among 34 indicators for mangrove, saltmarsh, and seagrass ecosystems, and develop data-driven, spatially explicit typologies of coastal wetland status at a global scale. After accounting for environmental covariates and gap-filling missing data, we obtained two levels of clustering, at 5 and 18 typologies, providing outputs at different scales for different end users. We generated 2,845 cells (1° (lat) × 1° (long)) globally, of which 29.7% were characterized by high land- and marine-based impacts and a high proportion of threatened species, 13.5% by high climate-based impacts, and 9.6% were refuges with lower impacts, high fish density and a low proportion of threatened species. We identify instances where specific actions could have positive outcomes for coastal wetlands across regions facing similar issues. For example, land- and marine-based threats to coastal wetlands were associated with ecological structure and function indicators, suggesting that reducing these threats may reduce habitat degradation and threats to species persistence. However, several interdimensional relationships might be affected by temporal or spatial mismatches in data. Weak relationships mean that global biodiversity maps that categorize areas by single indicators (such as threats or trends in habitat size) may not be representative of changes in other indicators (e.g., ecosystem function). By simplifying the complex global mosaic of coastal wetland status and identifying regions with similar issues that could benefit from knowledge exchange across national boundaries, we help set the scene for globally and regionally coordinated conservation.
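
    A minimal sketch of deriving two nested levels of typologies from per-cell indicator data, assuming the indicators have already been gap-filled and adjusted for environmental covariates. The random data, the Ward linkage, and the use of SciPy are illustrative assumptions; only the cell count, indicator count, and the cut levels of 5 and 18 typologies come from the abstract.

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        rng = np.random.default_rng(0)
        cells = rng.normal(size=(2845, 34))    # 2,845 one-degree cells x 34 indicators

        Z = linkage(cells, method="ward")      # hierarchical clustering of the cells
        coarse = fcluster(Z, t=5, criterion="maxclust")   # 5 broad typologies
        fine = fcluster(Z, t=18, criterion="maxclust")    # 18 finer typologies

        print(np.bincount(coarse)[1:])   # cells per coarse typology
        print(np.bincount(fine)[1:])     # cells per fine typology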

    Tree-based Partition Querying: A Methodology for Computing Medoids in Large Spatial Datasets

    Besides traditional domains (e.g., resource allocation, data mining applications), algorithms for medoid computation and related problems will play an important role in numerous emerging fields, such as location-based services and sensor networks. Since the k-medoid problem is NP-hard, all existing work deals with approximate solutions on relatively small datasets. This paper aims at efficient methods for very large spatial databases, motivated by: (1) the high and ever-increasing availability of spatial data, and (2) the need for novel query types and improved services. The proposed solutions exploit the intrinsic grouping properties of a data partition index in order to read only a small part of the dataset. Compared to previous approaches, we achieve results of comparable or better quality at a small fraction of the CPU and I/O costs (seconds as opposed to hours, and tens of node accesses instead of thousands). In addition, we study medoid-aggregate queries, where k is not known in advance, but we are asked to compute a medoid set that leads to an average distance close to a user-specified value. Similarly, medoid-optimization queries aim at minimizing both the number of medoids k and the average distance. We also consider the max version of the aforementioned problems, where the goal is to minimize the maximum (instead of the average) distance between any object and its closest medoid. Finally, we investigate bichromatic and weighted medoid versions for all query types, as well as maximum capacity and dynamic medoids.
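
    The core intuition of exploiting a data partition index can be sketched as follows, using a uniform grid as a stand-in for the spatial index and taking the point nearest each occupied cell's centroid as an approximate medoid. Both the grid partition and the one-medoid-per-cell rule are illustrative simplifications, not the paper's algorithm.

        import numpy as np

        def approximate_medoids(points, cell_size):
            # Group points by the grid cell they fall in (the partition stands in for
            # index nodes), so each group is read and summarized independently.
            points = np.asarray(points, dtype=float)
            cells = {}
            for i, p in enumerate(points):
                key = tuple((p // cell_size).astype(int))
                cells.setdefault(key, []).append(i)

            medoids = []
            for idx in cells.values():
                group = points[idx]
                centroid = group.mean(axis=0)
                # Medoid of the cell: the actual data point closest to the cell centroid.
                best = idx[int(np.argmin(((group - centroid) ** 2).sum(axis=1)))]
                medoids.append(best)
            return medoids

        pts = np.random.default_rng(1).uniform(0, 100, size=(10_000, 2))
        print(len(approximate_medoids(pts, cell_size=20.0)))   # 25 cells, one medoid each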