
    Adaptive Energy-aware Scheduling of Dynamic Event Analytics across Edge and Cloud Resources

    The growing deployment of sensors as part of the Internet of Things (IoT) is generating thousands of event streams. Complex Event Processing (CEP) queries offer a useful paradigm for rapid decision-making over such data sources. While often centralized in the Cloud, the deployment of capable edge devices in the field motivates the need for cooperative event analytics that span Edge and Cloud computing. Here, we identify a novel problem of query placement on Edge and Cloud resources for dynamically arriving and departing analytic dataflows. We define this as an optimization problem that minimizes the total makespan of all event analytics while meeting the energy and compute constraints of the resources. We propose four adaptive heuristics and three rebalancing strategies for such dynamic dataflows, and validate them using detailed simulations for 100-1000 edge devices and Cloud VMs. The results show that our heuristics offer O(seconds) planning time, return a valid, high-quality solution in all cases, and reduce the number of query migrations. Furthermore, applying the rebalancing strategies within these heuristics reduces the makespan by around 20-25%. (11 pages, 7 figures)
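    As a rough illustration of the kind of placement heuristic described, the sketch below greedily places an arriving dataflow on the feasible resource with the lowest resulting load, subject to energy and compute constraints. The `Resource`/`Query` model and the load-based cost proxy are assumptions for illustration, not the paper's four heuristics or its rebalancing strategies.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    capacity: float      # available compute (normalized units)
    energy: float        # remaining energy budget; cloud VMs effectively unbounded
    load: float = 0.0    # compute already committed

@dataclass
class Query:
    name: str
    compute: float       # compute demand of the analytic dataflow
    energy_cost: float   # energy drawn if placed on an edge device

def place(query, resources):
    """Greedy heuristic: among resources with enough spare compute and
    energy, choose the one whose resulting relative load is lowest, as a
    proxy for minimizing the query's contribution to total makespan."""
    feasible = [r for r in resources
                if r.capacity - r.load >= query.compute
                and r.energy >= query.energy_cost]
    if not feasible:
        return None  # would trigger a rebalancing pass in the full system
    best = min(feasible, key=lambda r: (r.load + query.compute) / r.capacity)
    best.load += query.compute
    best.energy -= query.energy_cost
    return best

edge = Resource("edge-1", capacity=2.0, energy=50.0)
vm = Resource("vm-1", capacity=8.0, energy=float("inf"))
print(place(Query("q1", compute=1.5, energy_cost=10.0), [edge, vm]).name)  # vm-1
```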

    Instant-Fuzzy search using Phrase Indexing and Segmentation with Proximity Ranking

    Instant search is an information-retrieval paradigm in which a system finds answers to a query as the user types it character by character, giving effective responses while the query is still being entered. Fuzzy search improves the user's search experience by finding relevant answers whose keywords are similar to the query keywords. We use a phrase threshold value to limit the answer set generated by instant-fuzzy search. The main challenge is to improve query speed while narrowing the answer set to the documents the user actually wants. At the same time, we also need better ranking functions that consider the proximity of keywords when computing relevance scores. In this paper, we study how to incorporate proximity information into ranking in instant-fuzzy search while achieving efficient time and space complexities. To overcome the space and time limitations of existing solutions, we propose a phrase-based indexing approach that focuses on common phrases in the database. We study how to index these phrases, compare their threshold values against the user threshold to produce an effective answer set, and develop an algorithm for segmenting a query into phrases word by word and using these phrases to find answers relevant to the user query.
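    For intuition, here is a minimal sketch of the two core ingredients the abstract describes: edit-distance-bounded matching of a typed prefix against index terms, and a proximity score based on the smallest window covering all query keywords. The function names and the banded dynamic program are illustrative assumptions, not the paper's actual index structures.

```python
def within_edits(prefix, term, k=1):
    """True if `prefix` matches some prefix of `term` within k edits
    (row-by-row dynamic programming over the edit-distance table,
    abandoning early once every cell in a row exceeds k)."""
    prev = list(range(len(prefix) + 1))
    for j, tc in enumerate(term, 1):
        cur = [j]
        for i, pc in enumerate(prefix, 1):
            cur.append(min(prev[i] + 1, cur[i - 1] + 1, prev[i - 1] + (pc != tc)))
        if min(cur) > k:
            return False
        if cur[-1] <= k:          # full prefix matches term[:j] within k edits
            return True
        prev = cur
    return prev[-1] <= k

def min_span(position_lists):
    """Proximity signal: the smallest window containing at least one
    position from every keyword's position list (sliding window over
    the merged, sorted position stream); tighter windows rank higher."""
    events = sorted((p, k) for k, plist in enumerate(position_lists) for p in plist)
    need, counts, have, lo = len(position_lists), {}, 0, 0
    best = float("inf")
    for hi in range(len(events)):
        k = events[hi][1]
        counts[k] = counts.get(k, 0) + 1
        if counts[k] == 1:
            have += 1
        while have == need:
            best = min(best, events[hi][0] - events[lo][0])
            k2 = events[lo][1]
            counts[k2] -= 1
            if counts[k2] == 0:
                have -= 1
            lo += 1
    return best

print(within_edits("serch", "search"))   # True: one edit away
print(min_span([[4, 40], [5, 19]]))      # 1: the two keywords appear adjacently
```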

    A comparison of automatic search query enhancement algorithms that utilise Wikipedia as a source of a priori knowledge

    This paper describes the benchmarking and analysis of five Automatic Search Query Enhancement (ASQE) algorithms that utilise Wikipedia as the sole source of a priori knowledge. The contributions of this paper are: 1) a comprehensive review of current ASQE algorithms that utilise Wikipedia as the sole source of a priori knowledge; 2) benchmarking of five existing ASQE algorithms using the TREC-9 Web Topics on the ClueWeb12 data set; and 3) analysis of the benchmarking results to identify the strengths and weaknesses of each algorithm. During the benchmarking process, 2,500 relevance assessments were performed. Results are analysed using Average Precision @10 per query and Mean Average Precision @10 per algorithm. From this analysis we show that the scope of a priori knowledge used during enhancement and the term weighting methods available from Wikipedia can further aid the ASQE process. Although the approaches taken by the algorithms remain relevant, an overdependence on particular weighting schemes and data sources can easily impact the results of an ASQE algorithm.
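    For reference, the two evaluation measures named above can be computed as follows; the normalization by min(|relevant|, k) is one common convention, and the function names are ours rather than from the paper.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=10):
    """AP@k: mean of the precision values at each rank (within the
    top k) at which a relevant document appears."""
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / rank)
    denom = min(len(relevant_ids), k)
    return sum(precisions) / denom if denom else 0.0

def mean_average_precision(results_per_query, k=10):
    """MAP@k over a set of queries: the mean of the per-query AP@k values."""
    aps = [average_precision_at_k(r, rel, k) for r, rel in results_per_query]
    return sum(aps) / len(aps) if aps else 0.0

# One query: relevant documents d3 and d1 retrieved at ranks 1 and 3.
print(average_precision_at_k(["d3", "d7", "d1"], {"d3", "d1"}))  # (1/1 + 2/3) / 2
```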

    Learning by example: training users with high-quality query suggestions

    The queries submitted by users to search engines often describe their information needs poorly and represent a potential bottleneck in the system. In this paper we investigate to what extent it is possible to help users learn to formulate better queries by providing examples of high-quality queries interactively during a number of search sessions. By means of several controlled user studies we collect quantitative and qualitative evidence showing that: (1) study participants are able to identify and abstract the qualities of queries that make them highly effective; (2) after seeing high-quality example queries, participants are able to create queries that are themselves highly effective; and (3) those queries resemble expert queries as defined in the literature. We conclude by discussing what these findings mean for the design of interactive search systems.

    Semantics-based approach for generating partial views from linked life-cycle highway project data

    The purpose of this dissertation is to develop methods that assist data integration and extraction from the heterogeneous sources generated throughout the life-cycle of a highway project. In the era of computerized technologies, project data is largely available in digital format. Due to the fragmented nature of the civil infrastructure sector, digital data are created and managed separately by different project actors in proprietary data warehouses, and differences in data structure and semantics greatly hinder the exchange and full reuse of digital project data. To address these issues, this dissertation carries out three studies. The first study develops a framework for interconnecting heterogeneous life-cycle project data into a unified, linked data space. This ontology-based framework consists of two phases: (1) translating proprietary datasets into homogeneous RDF data graphs; and (2) connecting the separate data networks to each other. Three domain ontologies for the design, construction, and asset condition survey phases are developed to support data transformation, and a merged ontology that integrates them is constructed to provide guidance on how to connect data nodes from the domain graphs. The second study deals with terminology inconsistency between data sources. An automated method is developed that employs Natural Language Processing (NLP) and machine learning techniques to construct a domain-specific lexicon from design manuals. The method uses pattern rules to extract technical terms from texts and learns their representation vectors using a neural-network-based word embedding approach. The study also develops an integrated method combining minimally supervised machine learning, clustering analysis, and word vectors to compute term semantics and classify the relations between terms in the target lexicon. In the last study, a data retrieval technique for extracting subsets of an XML civil data schema is designed and tested: the algorithm takes a keyword input from the end user and returns a ranked list of the most relevant XML branches. This study utilizes the highway-domain lexicon generated in the second study to analyze the semantics of the end user's keywords, and introduces a context-based similarity measure to evaluate the relevance between a branch in the source schema and the user query. The resulting methods and algorithms were tested using case studies and empirical experiments. The results indicate that the work successfully addresses heterogeneity in the structure and terminology of project data and enables fast extraction of data sub-models. The study is expected to enhance the efficiency of reusing digital data generated throughout the project life-cycle, and to contribute to the transition from paper-based to digital project delivery for civil infrastructure projects.
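    As a hedged illustration of the third study's retrieval step, the sketch below ranks XML schema branches against user keywords by cosine similarity of averaged word vectors, using each branch's full path (element plus ancestors) as context. The toy vectors, tokenization, and function names are assumptions for illustration; the dissertation's learned lexicon and context-based similarity measure are richer than this.

```python
import math

# Toy stand-in for the domain lexicon's word vectors; in the dissertation
# these would come from a word-embedding model trained on design manuals.
VEC = {
    "bridge":   [0.9, 0.1, 0.0],
    "deck":     [0.8, 0.2, 0.1],
    "pavement": [0.1, 0.9, 0.2],
    "surface":  [0.2, 0.8, 0.3],
}

def embed(tokens):
    """Average the vectors of known tokens (unknown tokens are skipped)."""
    vecs = [VEC[t] for t in tokens if t in VEC]
    if not vecs:
        return None
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_branches(query_tokens, branches):
    """Score each schema branch by the similarity between the query and
    the branch's full path context, so a node is judged in the context
    of its ancestors rather than by its local name alone."""
    q = embed(query_tokens)
    scored = []
    for path in branches:
        b = embed(tok for el in path.split("/") for tok in el.lower().split("_"))
        if q and b:
            scored.append((cosine(q, b), path))
    return sorted(scored, reverse=True)

print(rank_branches(["pavement", "surface"],
                    ["Bridge/Deck", "Roadway/Pavement_Surface"]))
```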

    High-Performance Computing Algorithms for Constructing Inverted Files on Emerging Multicore Processors

    Current trends in processor architectures increasingly include more cores on a single chip and more complex memory hierarchies, and this trend is likely to continue for the foreseeable future. These processors offer unprecedented opportunities for speeding up demanding computations if the available resources can be effectively utilized. Meanwhile, parallel programming frameworks such as OpenMP and MPI have been commonly used on clusters of multicore CPUs, while newer models such as OpenCL and CUDA have been widely adopted on recent heterogeneous systems and GPUs, respectively. The main goal of this dissertation is to develop techniques and methodologies for exploiting these emerging parallel architectures and programming models to solve large-scale irregular applications such as the construction of inverted files. The extraction of inverted files from large collections of documents forms a critical component of all information retrieval systems, including web search engines. In this problem, disk I/O throughput is the major performance bottleneck, especially when intermediate results are written to disk. In addition to the I/O bottleneck, a number of synchronization and consistency issues must be resolved in order to build the dictionary and postings lists efficiently. To address these issues, we introduce a dictionary data structure using a hybrid of tries and B-trees, and a high-throughput pipelined strategy that completely avoids the use of disks as temporary storage for intermediate results while ensuring that the input data is consumed at a high rate. The pipelined strategy produces parallel parsed streams that are consumed at the same rate by parallel indexers, and is implemented on a single multicore CPU as well as on a cluster of such nodes. We achieve a throughput of more than 262 MB/s on the ClueWeb09 dataset on a single node; on a cluster of 32 nodes, our experimental results show scalable performance across different metrics, significantly improving on prior published results. We also develop a new approach for handling time-evolving documents using additional small temporal indexing structures. The lifetime of the collection is partitioned into multiple time windows, which guarantees a very fast temporal query response time at a small space overhead relative to the non-temporal case. Extensive experimental results indicate that the overhead in both indexing and querying is small in this more complicated case, and that query performance can indeed be improved using finer temporal partitioning of the collection. Finally, we employ GPUs to accelerate the indexing process for building inverted files and to develop a very fast algorithm for the highly irregular list ranking problem. For the indexing problem, the workload is split between CPUs and GPUs in such a way that the strengths of both architectures are exploited. For the list ranking problem involved in the decompression of inverted files, an optimized GPU algorithm is introduced by reducing the problem to a large number of fine-grained computations in such a way that the processing cost per element is shown to be close to the best possible.
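    A minimal sketch of the pipelined idea: a parsing stage feeds an indexing stage through a bounded in-memory queue, so intermediate results never touch disk and back-pressure keeps the stages consuming at matched rates. This is a single-producer, single-consumer toy under assumed names, not the dissertation's hybrid trie/B-tree dictionary or its cluster and GPU implementations.

```python
import queue
import threading
from collections import defaultdict

def parser(doc_stream, out_q):
    """Parsing stage: tokenize documents and emit (doc_id, terms) records
    into an in-memory queue, so no intermediate results are written to disk."""
    for doc_id, text in doc_stream:
        out_q.put((doc_id, text.lower().split()))
    out_q.put(None)  # end-of-stream sentinel

def indexer(in_q, postings):
    """Indexing stage: consume parsed records at the rate they arrive and
    append (doc_id, position) entries to each term's postings list."""
    while (item := in_q.get()) is not None:
        doc_id, terms = item
        for pos, term in enumerate(terms):
            postings[term].append((doc_id, pos))

docs = [(1, "inverted files for search"),
        (2, "parallel inverted index construction")]
q = queue.Queue(maxsize=1024)   # bounded queue applies back-pressure to the parser
postings = defaultdict(list)
t = threading.Thread(target=parser, args=(iter(docs), q))
t.start()
indexer(q, postings)
t.join()
print(postings["inverted"])     # [(1, 0), (2, 1)]
```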