
    DRSP: Dimension Reduction For Similarity Matching And Pruning Of Time Series Data Streams

    Similarity matching and joining of time series data streams have gained considerable relevance in today's world of large-scale streaming data. The process finds wide application in areas such as location tracking, sensor networks, and object positioning and monitoring. However, as the size of a data stream grows, so does the cost of retaining all of its data to support similarity matching. We develop a novel framework that addresses the following objectives. First, dimension reduction is performed in a preprocessing stage, where the large data stream is segmented and reduced into a compact representation that retains the crucial information, using a technique called Multi-level Segment Means (MSM); this reduces the space complexity associated with storing large time-series data streams. Second, the framework incorporates an effective similarity-matching technique to analyze whether new data objects are symmetric to the existing data stream. Finally, a pruning technique filters out pseudo data-object pairs and joins only the relevant pairs. The computational cost of MSM is O(l*n_i) and the cost of pruning is O(DRF*wsize*d), where DRF is the Dimension Reduction Factor. Exhaustive experimental trials show that the proposed framework is both efficient and competitive with earlier work.
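
    The abstract does not spell out the exact MSM procedure, so the following is only a minimal sketch of a multi-level segment-mean reduction under assumed parameters (the segment length seg_len and the number of levels are illustrative, not the paper's settings):

        from typing import List

        def segment_means(series: List[float], seg_len: int) -> List[float]:
            """Reduce a series to the mean of each fixed-length segment."""
            return [sum(series[i:i + seg_len]) / len(series[i:i + seg_len])
                    for i in range(0, len(series), seg_len)]

        def multi_level_segment_means(series: List[float], seg_len: int = 4,
                                      levels: int = 3) -> List[List[float]]:
            """Apply segment-mean reduction repeatedly, keeping one summary per level."""
            reduced, summaries = list(series), []
            for _ in range(levels):
                reduced = segment_means(reduced, seg_len)
                summaries.append(reduced)
                if len(reduced) <= 1:
                    break
            return summaries

    Each level shrinks the stored representation by roughly a factor of seg_len, which is where the reduced space complexity claimed above comes from.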

    Data-Mining Synthesised Schedulers for Hard Real-Time Systems

    The analysis of hard real-time systems, traditionally performed using RMA/PCP or simulation, is nowadays also studied as a scheduler-synthesis problem, where one automatically constructs a scheduler that guarantees avoidance of deadlock and deadline-miss system states. Even though this approach offers finer control of a hard real-time system, using fewer resources and adapting easily to further quality aspects (memory/energy consumption, jitter minimisation, etc.), synthesised schedulers are usually extremely large and difficult to understand. Their large size is a consequence of their inherent precision, since they attempt to describe exactly the frontier between the safe and unsafe system states. It nevertheless hinders their application in practice, since it is extremely difficult to validate them or to use them to better understand the behaviour of the system. In this paper, we show how one can adapt data-mining techniques to decrease the size of a synthesised scheduler and make its inherent structure apparent, thus giving the system designer a wealth of additional information for understanding and optimising the scheduler and the underlying system. We present, in particular, how this can be used to obtain hints for a good distribution of tasks to different processing units, to optimise the scheduler itself (sometimes even removing it altogether in a safe manner), and to obtain both per-task and per-system views of the schedulability of the system.
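
    The abstract does not name a specific data-mining technique, so the sketch below uses a decision tree (one common choice) to compress a synthesised scheduler's state-to-action table into readable rules; the state features and the tiny data set are hypothetical, not the paper's actual method:

        from sklearn.tree import DecisionTreeClassifier, export_text

        # Each row encodes a system state (remaining execution time and time to
        # deadline for two tasks); the label is the task the synthesised
        # scheduler dispatches in that state.
        states = [
            [2, 5, 1, 3],
            [2, 5, 0, 3],
            [1, 4, 1, 2],
            [0, 4, 1, 2],
        ]
        actions = ["B", "A", "B", "B"]

        tree = DecisionTreeClassifier(max_depth=3).fit(states, actions)
        # The printed rules give a compact, human-readable view of the scheduler,
        # which is the kind of structure the paper aims to expose.
        print(export_text(tree, feature_names=["rem_A", "dl_A", "rem_B", "dl_B"]))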

    Semantic Image Collection Summarization with Frequent Subgraph Mining

    Applications such as providing a preview of personal albums (e.g., Google Photos) or suggesting thematic collections based on user interests (e.g., Pinterest) require a semantically enriched image representation, which should be more informative than simple low-level visual features and image tags. To this end, we propose an image collection summarization technique based on frequent subgraph mining. We represent images with a novel type of scene graph that includes fine-grained relationship types between objects; these scene graphs are automatically derived by our method. The resulting summary consists of a set of frequent subgraphs describing the underlying patterns of the image dataset. Our results are interpretable and provide more powerful semantic information than previous techniques, in which the summary is a subset of the collection in terms of images or image patches. The experimental evaluation shows that the proposed technique yields non-redundant summaries with a high diversity of discovered patterns.
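
    As a simplified illustration of the summarization idea, the sketch below treats scene graphs as sets of labelled edges and counts frequent single-edge patterns; a full frequent-subgraph miner such as gSpan would be needed for multi-edge patterns, and the relationship labels here are invented examples rather than the paper's data:

        from collections import Counter
        from itertools import chain

        scene_graphs = [
            {("person", "riding", "bike"), ("person", "wearing", "helmet")},
            {("person", "riding", "bike"), ("bike", "on", "road")},
            {("person", "wearing", "helmet"), ("person", "holding", "phone")},
        ]

        min_support = 2  # a pattern must occur in at least this many images
        counts = Counter(chain.from_iterable(scene_graphs))
        summary = {pattern for pattern, n in counts.items() if n >= min_support}
        print(summary)  # frequent (subject, relation, object) patterns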

    Iteratively Learning Embeddings and Rules for Knowledge Graph Reasoning

    Reasoning is essential for the development of large knowledge graphs, especially for completion, which aims to infer new triples from existing ones. Both rules and embeddings can be used for knowledge graph reasoning, and each has its own advantages and difficulties. Rule-based reasoning is accurate and explainable, but rule learning by searching over the graph suffers from inefficiency due to the huge search space. Embedding-based reasoning is more scalable and efficient because reasoning is carried out via computation between embeddings, but it has difficulty learning good representations for sparse entities, since a good embedding relies heavily on data richness. Based on this observation, we explore how embedding learning and rule learning can be combined so that the advantages of each compensate for the difficulties of the other. We propose IterE, a novel framework that iteratively learns embeddings and rules, in which rules are learned from embeddings with a proper pruning strategy and embeddings are learned from existing triples together with new triples inferred by rules. Evaluations of the embedding quality of IterE show that rules help improve the quality of sparse-entity embeddings and their link-prediction results. We also evaluate the efficiency of rule learning and the quality of rules from IterE compared with AMIE+, showing that IterE generates high-quality rules more efficiently. Experiments show that iteratively learning embeddings and rules benefits both during learning and prediction.
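
    The alternation described above can be sketched as follows; this is only an illustrative toy (TransE-style updates and a single rule type, relation equivalence, over invented triples), not the actual IterE implementation:

        import numpy as np

        rng = np.random.default_rng(0)
        entities = ["a", "b", "c", "d"]
        relations = ["parent_of", "has_child_inv", "knows"]
        triples = {("a", "parent_of", "b"), ("a", "has_child_inv", "b"),
                   ("c", "parent_of", "d"), ("b", "knows", "c")}

        dim = 8
        E = {e: rng.normal(size=dim) for e in entities}
        R = {r: rng.normal(size=dim) for r in relations}

        def train_embeddings(triples, steps=200, lr=0.05):
            """TransE-style updates: push h + r towards t for every known triple."""
            for _ in range(steps):
                for h, r, t in triples:
                    grad = (E[h] + R[r]) - E[t]
                    E[h] -= lr * grad
                    R[r] -= lr * grad
                    E[t] += lr * grad

        def mine_rules(threshold=0.5):
            """Propose r1(x,y) -> r2(x,y) rules for relations with similar embeddings."""
            return [(r1, r2) for r1 in relations for r2 in relations
                    if r1 != r2 and np.linalg.norm(R[r1] - R[r2]) < threshold]

        def apply_rules(triples, rules):
            """Infer new triples from the mined rules and add them to the training set."""
            new = {(h, r2, t) for (h, r1, t) in triples for (p, r2) in rules if p == r1}
            return triples | new

        for _ in range(3):  # alternate: train embeddings, mine rules, infer new triples
            train_embeddings(triples)
            rules = mine_rules()
            triples = apply_rules(triples, rules)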

    Software for supporting large scale data processing for High Throughput Screening

    High Throughput Screening is a valuable data-generation technique for data-driven knowledge discovery. Because the rate of data generation is so great, coping with the demands of post-experiment data analysis is a challenge. This thesis presents three software solutions that I implemented to alleviate this problem. The first is K-Screen, a Laboratory Information Management System designed to handle and visualize large High Throughput Screening datasets; K-Screen is being used successfully by the University of Kansas High Throughput Screening Laboratory to better organize and visualize its data. The next two contributions are algorithms designed to accelerate search times for chemical similarity searches using 1-dimensional fingerprints. The first algorithm balances the information content in bit strings to find better ordering and segmentation patterns for chemical fingerprints. The second algorithm eliminates redundant pruning calculations for large batch chemical similarity searches and shows a 250% improvement over the fastest current fingerprint search algorithm for large batch queries.
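
    The thesis's exact pruning scheme is not described in the abstract; the sketch below shows the standard bit-count (min/max popcount) bound for Tanimoto similarity search over 1-dimensional fingerprints, which is the kind of pruning such batch-search algorithms build on:

        def tanimoto(a: int, b: int) -> float:
            """Tanimoto similarity of two fingerprints stored as Python ints."""
            common = bin(a & b).count("1")
            return common / (bin(a).count("1") + bin(b).count("1") - common)

        def search(query: int, database: list[int], threshold: float) -> list[int]:
            """Return fingerprints whose similarity to the query meets the threshold."""
            q_bits = bin(query).count("1")
            hits = []
            for fp in database:
                f_bits = bin(fp).count("1")
                # Upper bound on Tanimoto from bit counts alone: min/max popcount.
                if min(q_bits, f_bits) / max(q_bits, f_bits) < threshold:
                    continue  # pruned without computing the full similarity
                if tanimoto(query, fp) >= threshold:
                    hits.append(fp)
            return hits

        print(search(0b101101, [0b101101, 0b100001, 0b111111], threshold=0.6))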