
    Twenty-One at TREC-8: using Language Technology for Information Retrieval

    This paper describes the official runs of the Twenty-One group for TREC-8. The Twenty-One group participated in the Ad-hoc, CLIR, Adaptive Filtering and SDR tracks. The main focus of our experiments is the development and evaluation of retrieval methods that are motivated by natural language processing techniques. The following new techniques are introduced in this paper. In the Ad-hoc and CLIR tasks we experimented with automatic sense disambiguation followed by query expansion or translation. We used a combination of thesaurus and corpus information for the disambiguation process. We continued research on CLIR techniques that exploit the target corpus for implicit disambiguation, by importing the translation probabilities into the probabilistic term-weighting framework. In filtering we extended the use of language models for document ranking with a relevance feedback algorithm for query term reweighting

    A Memory Contention Responsive Hash Join Algorithm Design and Implementation on Apache AsterixDB

    Efficient data management is crucial in complex computer systems, and Database Management Systems (DBMS) are indispensable for handling and processing large datasets. In DBMSs that concurrently execute multiple queries, adapting to varying workloads is desirable. Yet predicting the fluctuating quantity and size of queries in such environments proves challenging. Over-allocating resources to a single query can impede the execution of future queries, while under-allocating resources to a query whose workload is expected to increase can lead to significant processing delays. Moreover, join operations place substantial demands on memory, a resource whose availability fluctuates as queries enter and exit the DBMS. The development of join operators capable of dynamically adapting to memory fluctuations is a complex undertaking, with few recent authors proposing memory-adaptive algorithms. This scarcity of proposals suggests the inherent difficulty in designing, implementing, and analyzing such algorithms. This thesis proposes a new memory-adaptive hash-based join algorithm extended from designs presented by prior authors. This algorithm is implemented and tested in a real DBMS environment to evaluate its responsiveness to memory fluctuation. A mathematical model for the increase in I/O caused by it is proposed and compared with actual results. The impacts of memory variation and of the frequency of memory updates reveal the importance of this thesis for further development of memory-adaptive algorithms

    An economic energy approach for queries on data centers

    Energy consumption is an issue that involves all of us, both as individuals and as members of a society, and covers all our areas of activity. It is so broad that its impact is reflected in our social, cultural and financial structures. The domain of software, and in particular database systems, is no exception. Although it may seem strange to study the energy consumption of just one query, when we consider the execution of a few thousand queries per second, we quickly see the importance of query consumption in the monthly bill of any company that has a conventional data center. To demonstrate the energy consumption of queries in data centers, we designed a small dashboard for monitoring and analyzing the sales of a company, and implemented all the queries needed for populating it and ensuring its operation. The queries were organized into two groups, each targeting one of two distinct database management systems: one relational (MySQL) and one non-relational (Neo4J). The goal is to evaluate the energy consumption of different types of queries, and at the same time compare it across relational and non-relational database approaches. This paper describes the process we implemented to set up the energy consumption application scenario and measure the energy consumption of each query, and presents our preliminary results

    Use-cases on evolution

    This report presents a set of use cases for evolution and reactivity for data in the Web and Semantic Web. This set is organized around three different case study scenarios, each related to one of the three areas of application within Rewerse. The scenarios are: “The Rewerse Information System and Portal”, closely related to the work of A3 – Personalised Information Systems; “Organizing Travels”, which may be related to the work of A1 – Events, Time, and Locations; and “Updates and evolution in bioinformatics data sources”, related to the work of A2 – Towards a Bioinformatics Web

    A fuzzy kernel c-means clustering model for handling concept drift in regression

    © 2017 IEEE. Concept drift, given the huge volume of high-speed data streams, requires traditional machine learning models to be self-adaptive. Techniques to handle drift are especially needed in regression cases for a wide range of applications in the real world. There is, however, a shortage of research on drift adaptation for regression cases in the literature. One of the main obstacles to further research is the resulting model complexity when regression methods and drift handling techniques are combined. This paper proposes a self-adaptive algorithm, based on a fuzzy kernel c-means clustering approach and a lazy learning algorithm, called FKLL, to handle drift in regression learning. Using FKLL, drift adaptation first updates the learning set using lazy learning, then fuzzy kernel c-means clustering is used to determine the most relevant learning set. Experiments show that the FKLL algorithm is better able to respond to drift as soon as the learning sets are updated, and is also suitable for dealing with reoccurring drift, when compared to the original lazy learning algorithm and other state-of-the-art regression methods