170 research outputs found
Recommended from our members
Heuristics and multi-dimensional physical database design
An expert system approach has recently been used in parameter selection for VSAM (Virtual Storage Access Method) file organisation [AL87a]. This system has been developed to aid in-house users to apply relevant facts and heuristics to optimise VSAM file design. Multi-dimensional physical
database design is more sophisticated and complicated than VSAM file design. The expert system approach can be applied to select and tune physical database design for various applications.
A great deal of work has been done in developing diverse algorithms or access methods to organise automated information on secondary storage devices [FA86b] [FR86] [FR88] [GU84] [HU88a] [KS88a] [KS86] [L087] [NI84] [OR88b] [OR86] [OT85] [R081], etc. However, little work has been done to enable designers to select an access method which matches a projected application profile (features and requirements) and perceived strengths and weaknesses of candidate algorithms. This thesis considers a number of grid based algorithms and makes expert assessments of each according to its strengths and weaknesses. It analyses features of various access methods and using expert knowledge matches features for a range of m-d (multi dimensional) algorithms with corresponding characteristics of an application. The knowledge-based system presented in this thesis can be applied either manually or computerised to give a systematic approach to m-d algorithm selection. A system is proposed to (1) heuristically select an initial algorithm; (2) describe how the selection process is evaluated against actual m-d algorithm performance and (3) show how the results of the evaluation can be used to refine expert knowledge embodied in the selection system. Heuristic assessments are given for several m-d access algorithms. Examples are
presented to show how these heuristics are used to select a m-d access algorithm for a specific application. It is reasonable to suppose that the initial heuristic assessments are not entirely accurate. A tuning mechanism for the system heuristics is given in section 4.9. The system selection process is thereby, able to adjust to real world results. Finally, we present a simple example to illustrate how the proposed system works
FPGA-based architectures for next generation communications networks
This engineering doctorate concerns the application of Field Programmable Gate Array (FPGA) technology to some of the challenges faced in the design of next generation communications networks. The growth and convergence of such networks has fuelled demand for higher bandwidth systems, and a requirement to support a diverse range of payloads across the network span.
The research which follows focuses on the development of FPGA-based architectures for two important paradigms in contemporary networking - Forward Error Correction and Packet Classification. The work seeks to combine analysis of the underlying algorithms and mathematical techniques which drive these applications, with an informed approach to the design of efficient FPGA-based circuits
Random hypergraphs for hashing-based data structures
This thesis concerns dictionaries and related data structures that rely on providing several random possibilities for storing each key. Imagine information on a set S of m = |S| keys should be stored in n memory locations, indexed by [n] = {1,…,n}. Each object x [ELEMENT OF] S is assigned a small set e(x) [SUBSET OF OR EQUAL TO] [n] of locations by a random hash function, independent of other objects. Information on x must then be stored in the locations from e(x) only. It is possible that too many objects compete for the same locations, in particular if the load c = m/n is high. Successfully storing all information may then be impossible. For most distributions of e(x), however, success or failure can be predicted very reliably, since the success probability is close to 1 for loads c less than a certain load threshold c^* and close to 0 for loads greater than this load threshold. We mainly consider two types of data structures: • A cuckoo hash table is a dictionary data structure where each key x [ELEMENT OF] S is stored together with an associated value f(x) in one of the memory locations with an index from e(x). The distribution of e(x) is controlled by the hashing scheme. We analyse three known hashing schemes, and determine their exact load thresholds. The schemes are unaligned blocks, double hashing and a scheme for dynamically growing key sets. • A retrieval data structure also stores a value f(x) for each x [ELEMENT OF] S. This time, the values stored in the memory locations from e(x) must satisfy a linear equation that characterises the value f(x). The resulting data structure is extremely compact, but unusual. It cannot answer questions of the form “is y [ELEMENT OF] S?”. Given a key y it returns a value z. If y [ELEMENT OF] S, then z = f(y) is guaranteed, otherwise z may be an arbitrary value. We consider two new hashing schemes, where the elements of e(x) are contained in one or two contiguous blocks. This yields good access times on a word RAM and high cache efficiency. An important question is whether these types of data structures can be constructed in linear time. The success probability of a natural linear time greedy algorithm exhibits, once again, threshold behaviour with respect to the load c. We identify a hashing scheme that leads to a particularly high threshold value in this regard. In the mathematical model, the memory locations [n] correspond to vertices, and the sets e(x) for x [ELEMENT OF] S correspond to hyperedges. Three properties of the resulting hypergraphs turn out to be important: peelability, solvability and orientability. Therefore, large parts of this thesis examine how hyperedge distribution and load affects the probabilities with which these properties hold and derive corresponding thresholds. Translated back into the world of data structures, we achieve low access times, high memory efficiency and low construction times. We complement and support the theoretical results by experiments.Diese Arbeit behandelt Wörterbücher und verwandte Datenstrukturen, die darauf aufbauen, mehrere zufällige Möglichkeiten zur Speicherung jedes Schlüssels vorzusehen. Man stelle sich vor, Information über eine Menge S von m = |S| Schlüsseln soll in n Speicherplätzen abgelegt werden, die durch [n] = {1,…,n} indiziert sind. Jeder Schlüssel x [ELEMENT OF] S bekommt eine kleine Menge e(x) [SUBSET OF OR EQUAL TO] [n] von Speicherplätzen durch eine zufällige Hashfunktion unabhängig von anderen Schlüsseln zugewiesen. Die Information über x darf nun ausschließlich in den Plätzen aus e(x) untergebracht werden. Es kann hierbei passieren, dass zu viele Schlüssel um dieselben Speicherplätze konkurrieren, insbesondere bei hoher Auslastung c = m/n. Eine erfolgreiche Speicherung der Gesamtinformation ist dann eventuell unmöglich. Für die meisten Verteilungen von e(x) lässt sich Erfolg oder Misserfolg allerdings sehr zuverlässig vorhersagen, da für Auslastung c unterhalb eines gewissen Auslastungsschwellwertes c* die Erfolgswahrscheinlichkeit nahezu 1 ist und für c jenseits dieses Auslastungsschwellwertes nahezu 0 ist. Hauptsächlich werden wir zwei Arten von Datenstrukturen betrachten: • Eine Kuckucks-Hashtabelle ist eine Wörterbuchdatenstruktur, bei der jeder Schlüssel x [ELEMENT OF] S zusammen mit einem assoziierten Wert f(x) in einem der Speicherplätze mit Index aus e(x) gespeichert wird. Die Verteilung von e(x) wird hierbei vom Hashing-Schema festgelegt. Wir analysieren drei bekannte Hashing-Schemata und bestimmen erstmals deren exakte Auslastungsschwellwerte im obigen Sinne. Die Schemata sind unausgerichtete Blöcke, Doppel-Hashing sowie ein Schema für dynamisch wachsenden Schlüsselmengen. • Auch eine Retrieval-Datenstruktur speichert einen Wert f(x) für alle x [ELEMENT OF] S. Diesmal sollen die Werte in den Speicherplätzen aus e(x) eine lineare Gleichung erfüllen, die den Wert f(x) charakterisiert. Die entstehende Datenstruktur ist extrem platzsparend, aber ungewöhnlich: Sie ist ungeeignet um Fragen der Form „ist y [ELEMENT OF] S?“ zu beantworten. Bei Anfrage eines Schlüssels y wird ein Ergebnis z zurückgegeben. Falls y [ELEMENT OF] S ist, so ist z = f(y) garantiert, andernfalls darf z ein beliebiger Wert sein. Wir betrachten zwei neue Hashing-Schemata, bei denen die Elemente von e(x) in einem oder in zwei zusammenhängenden Blöcken liegen. So werden gute Zugriffszeiten auf Word-RAMs und eine hohe Cache-Effizienz erzielt. Eine wichtige Frage ist, ob Datenstrukturen obiger Art in Linearzeit konstruiert werden können. Die Erfolgswahrscheinlichkeit eines naheliegenden Greedy-Algorithmus weist abermals ein Schwellwertverhalten in Bezug auf die Auslastung c auf. Wir identifizieren ein Hashing-Schema, das diesbezüglich einen besonders hohen Schwellwert mit sich bringt. In der mathematischen Modellierung werden die Speicherpositionen [n] als Knoten und die Mengen e(x) für x [ELEMENT OF] S als Hyperkanten aufgefasst. Drei Eigenschaften der entstehenden Hypergraphen stellen sich dann als zentral heraus: Schälbarkeit, Lösbarkeit und Orientierbarkeit. Weite Teile dieser Arbeit beschäftigen sich daher mit den Wahrscheinlichkeiten für das Vorliegen dieser Eigenschaften abhängig von Hashing Schema und Auslastung, sowie mit entsprechenden Schwellwerten. Eine Rückübersetzung der Ergebnisse liefert dann Datenstrukturen mit geringen Anfragezeiten, hoher Speichereffizienz und geringen Konstruktionszeiten. Die theoretischen Überlegungen werden dabei durch experimentelle Ergebnisse ergänzt und gestützt
Recommended from our members
An investigation to study the feasibility of on-line bibliographic information retrieval system using an APP
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University.This thesis reports an investigation on the feasibility study of a
searching mechanism using an APP suitable for an on-line bibliographic
retrieval, operation, especially for retrospective searches.
From the study of the searching methods used in the conventional
systems it is seen that elaborate file- and data- structures are
introduced to improve the response time of the system. These
consequently lead to software and hardware redundancies. To mask
these complexities of the system an expensive computer with higher
capabilities and more powerful instruction set is commonly used.
Thus the service of the systen becomes cost-ineffective.
On the other hand the primitive operations of a searching mechanism,
such as, association, domain selection, intersection and unions, are
the intrinsic features of an associative parallel processor. Therefore
it is important to establish the feasibility of an APP as a cost-effective
searching mechanise.
In this thesis a searching mechanism using an 'ON-THE-FLY' searching
technique has been proposed. The parallel search unit uses a Byte-oriented
VRL-APP for efficient character string processing.
At the time of undertaking this work the specification for neither the
retrieval systems nor the BO-VRL APP's were well established; hence a
two-phase investigation was originated. In the Phase I of the work a
bottom up approach was adopted to derive a formal and precise
specification for the BO-VRL-APP. During the Phase II of the work
a top-down approach was opted for the implementation of the searching
mechanism.
An experimental research vehicle has been developed to establish
the feasibility of an APP as a cost-effective searching mechanism.
Although rigorous proof of the feasibility has not been obtained,
the thesis establishes that the APP is well suited for on-line
bibligraphic information retrieval operations where substring searches
including boolean selection and threshold weights are efficiently
supported
A Peer-to-Peer Network Framework Utilising the Public Mobile Telephone Network
P2P (Peer-to-Peer) technologies are well established and have now become accepted as a mainstream networking approach. However, the explosion of participating users has not been replicated within the mobile networking domain. Until recently the lack of suitable hardware and wireless network infrastructure to support P2P activities was perceived as contributing to the problem. This has changed with ready availability of handsets having ample processing resources utilising an almost ubiquitous mobile telephone network. Coupled with this has been a proliferation of software applications written for the more capable `smartphone' handsets. P2P systems have not naturally integrated and evolved into the mobile telephone ecosystem in a way that `client-server' operating techniques have. However as the number of clients for a particular mobile application increase, providing the `server side' data storage infrastructure becomes more onerous. P2P systems offer mobile telephone applications a way to circumvent this data storage issue by dispersing it across a network of the participating users handsets.
The main goal of this work was to produce a P2P Application Framework that supports developers in creating mobile telephone applications that use distributed storage. Effort was assigned to determining appropriate design requirements for a mobile handset based P2P system. Some of these requirements are related to the limitations of the host hardware, such as power consumption. Others relate to the network upon which the handsets operate, such as connectivity. The thesis reviews current P2P technologies to assess which was viable to form the technology foundations for the framework. The aim was not to re-invent a P2P system design, rather to adopt an existing one for mobile operation. Built upon the foundations of a prototype application, the P2P framework resulting from modifications and enhancements grants access via a simple API (Applications Programmer Interface) to a subset of Nokia `smartphone' devices. Unhindered operation across all mobile telephone networks is possible through a proprietary application implementing NAT (Network Address Translation) traversal techniques.
Recognising that handsets operate with limited resources, further optimisation of the P2P framework was also investigated. Energy consumption was a parameter chosen for further examination because of its impact on handset participation time.
This work has proven that operating applications in conjunction with a P2P data storage framework, connected via the mobile telephone network, is technically feasible. It also shows that opportunity remains for further research to realise the full potential of this data storage technique
A Digital Forensic Readiness Approach for e-Supply Chain Systems
The internet has had a major impact on how information is shared within supply chains, and in
commerce in general. This has resulted in the establishment of information systems such as esupply
chains (eSCs) amongst others which integrate the internet and other information and
communications technology (ICT) with traditional business processes for the swift
transmission of information between trading partners. Many organisations have reaped the
benefits that come from adopting the eSC model, but have also faced the challenges with which
it comes. One such major challenge is information security. With the current state of
cybercrime, system developers are challenged with the task of developing cutting-edge digital
forensic readiness (DFR) systems that can keep up with current technological advancements,
such as eSCs. Hence, the research highlights the lack of a well-formulated eSC-DFR approach
that can assist system developers in the development of e-supply chain digital forensic
readiness systems. The main objective of such a system is that it must be able to provide law
enforcement/digital forensic investigators that operate on eSC platforms with forensically
sound and readily available potential digital evidence that can expedite and support digital
forensics incident-response processes. This approach, if implemented can also prepare trading
partners for security incidents that might take place, if not prevent them from occurring.
Therefore, the work presented in this research is aimed at providing a procedural approach that
is based on digital forensic principles for eSC system architects and eSC network service
providers to follow in the design of eSC-DFR tools. The author proposes an eSC-DFR process
model and eSC-DFR system architectural design that was implemented as part of this research
illustrating the concepts of evidence collection, evidence pre-analysis, evidence preservation,
system usability alongside other digital forensic principles and techniques. It is the view of the
authors that the conclusions drawn from this research can spearhead the development of
cutting-edge eSC-DFR systems that are intelligent, effective, user friendly and compliant with
international standards.Dissertation (MEng)--University of Pretoria, 2019.Computer ScienceMScUnrestricte
Recommended from our members
Understanding the characteristics of Internet traffic and designing an efficient RaptorQ-based data transport protocol for modern data centres
This thesis is the amalgamation of research on efficient data transport protocols for data centres and a comprehensive and systematic study of Internet traffic, which came as a result of the need to understand traffic patterns and workloads in modern computer networks.
The first part of the thesis is on the development of efficient data transport pro- tocols for data centres. We study modern data transport protocols for data centres through large scale simulations using the OMNeT++ simulator. We developed and experimented with an OMNeT++ model of NDP. This has led to the identification of limitations of the state of the art and the formulation of research questions with respect to data transport protocols for modern data centres. The developed model includes an implementation of a Fat-tree topology and per-packet ECMP load bal- ancing. We discuss how we integrated the model with the INET Framework and validated it by running various experiments that test different model parameters and components. This work revealed limitations of NDP with respect to efficient one-to-many and many-to-one communication in data centres, which led to the de- velopment of SCDP, a novel and general-purpose data transport protocol for data centres that, in contrast to all other protocols proposed to date, natively supports one-to-many and many-to-one data communication, which is extremely common in modern data centres. SCDP does so without compromising on efficiency for short and long unicast flows. SCDP achieves this by integrating RaptorQ codes with receiver-driven data transport, in-network packet trimming and Multi-Level Feed- back Queuing (MLFQ); (1) RaptorQ codes enable efficient one-to-many and many- to-one data transport; (2) on top of RaptorQ codes, receiver- driven flow control, in combination with in-network packet trimming, enable efficient usage of network re- sources as well as multi-path transport and packet spraying for all transport modes. Incast and Outcast are eliminated; (3) the systematic nature of RaptorQ codes, in combination with MLFQ, enable fast, decoding-free completion of short flows. We extensively evaluated SCDP in a wide range of simulated scenarios with realistic data centre workloads. For one-to-many and many-to-one transport sessions, SCDP performs significantly better than NDP. For short and long unicast flows, SCDP performs equally well or better compared to NDP.
In the second part of the thesis, we extensively study Internet traffic. Getting good statistical models of traffic on network links is a well-known, often-studied problem. A lot of attention has been given to correlation patterns and flow duration. The distribution of the amount of traffic per unit time is an equally important but less studied problem. We study a large number of traffic traces from many different networks including academic, commercial and residential networks using state-of-the-art statistical techniques. We show that the log-normal distribution is a better fit than the Gaussian distribution. We also investigate a second, heavy- tailed distribution and show that its performance is better than Gaussian but worse than log-normal. We examine anomalous traces which are a poor fit for all tested distributions and show that this is often due to traffic outages or links that hit maximum capacity. Stationarity tests showed that the traffic is stationary at some range of aggregation times. We demonstrate the utility of the log-normal distribution in two contexts: predicting the proportion of time traffic will exceed a given level (for link capacity estimation) and predicting 95th percentile pricing. We also show the log-normal distribution is a better predictor than Gaussian orWeibull distributions
Fusion OLAP : Fusing the Pros of MOLAP and ROLAP Together for In-memory OLAP
OLAP models can be categorized with two types: MOLAP (multidimensional OLAP) and ROLAP (relational OLAP). In particular, MOLAP is efficient in multidimensional computing at the cost of cube maintenance, while ROLAP reduces the data storage size at the cost of expensive multidimensional join operations. In this paper, we propose a novel Fusion OLAP model to fuse the multidimensional computing model and relational storage model together to make the best aspects of both MOLAP and ROLAP worlds. This is achieved by mapping the relation tables into virtual multidimensional model and binding the multidimensional operations into a set of vector indexes to enable multidimensional computing on relation tables. The Fusion OLAP model can be integrated into the state-of-the-art in-memory databases with additional surrogate key indexes and vector indexes. We compared the Fusion OLAP implementations with three leading analytical in-memory databases. Our comprehensive experimental results show that Fusion OLAP implementation can achieve up to 35, 365, and 169 percent performance improvements based on the Hyper, Vectorwise, and MonetDB databases, respectively, for the Star Schema Benchmark (SSB) with scale factor 100.Peer reviewe
- …