
    An Expert Systems Approach to Realtime, Active Management of a Target Resource

    The application of expert systems techniques to process control domains offers a promising approach to managing the increasing complexity and dynamics that characterize many process control environments. This thesis reports on one such application in a complex, multi-agent environment, with an eye toward generalization to other process control domains. The application concerns the automation of large computing system operation. The requirement for high-availability, high-performance computing systems has created a demand for fast, consistent, expert-quality response to operational problems; effective, flexible automation of computer operations would satisfy this demand while improving the productivity of operations staff. However, like many process control environments, the computer operations environment is characterized by high complexity and frequent change, making it difficult to automate operations with traditional procedural software. These characteristics motivate an expert systems approach to automation. JESQ, the focus of this thesis, is a realtime expert system that continuously monitors the level of operating system queue space in a large computing system and takes corrective action as queue space diminishes. JESQ is one of several expert systems that comprise the Yorktown Expert System/MVS Manager (YES/MVS). YES/MVS automates many tasks in the domain of computer operations and is among the first expert systems designed for continuous execution in realtime. The expert system is currently running at the IBM Thomas J. Watson Research Center and has received a favorable response from operations staff. The thesis concentrates on several related issues: it identifies the requirements that distinguish continuous realtime expert systems exerting active control over their environments from more conventional session-oriented expert systems, describes strategies for meeting these requirements, presents an alternative methodology for managing large computing installations, and describes the problems of developing and testing a realtime expert system in an industrial environment.
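    The abstract does not reproduce JESQ's actual rule base, but the continuous monitor-and-act loop it describes can be sketched as below. This is a minimal illustration; the thresholds, action names, and queue-space sensor are all invented placeholders, not the real YES/MVS rules.

```python
import time

# Illustrative thresholds and actions; the actual JESQ/YES-MVS rule base,
# queue-space sensor, and corrective commands are not published here.
THRESHOLDS = [
    (0.95, "purge_old_output"),    # most severe shortage handled first
    (0.85, "offload_spool_files"),
    (0.75, "warn_operator"),
]

def read_queue_space_used() -> float:
    """Hypothetical sensor: fraction of OS queue space currently in use."""
    raise NotImplementedError("replace with a real system probe")

def take_action(action: str) -> None:
    """Hypothetical effector: issue a corrective operator command."""
    print(f"corrective action: {action}")

def monitor(poll_seconds: float = 5.0) -> None:
    # Continuous realtime execution: sense, match rules, act, repeat.
    while True:
        used = read_queue_space_used()
        for threshold, action in THRESHOLDS:
            if used >= threshold:
                take_action(action)
                break  # fire only the most severe matching rule
        time.sleep(poll_seconds)
```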

    Examining Unstable Approach Predictors Using Flight Data Monitoring Information

    The approach and landing phase is statistically the most dangerous part of flight: it accounts for only 4% of flight time but 49% of commercial jet mishaps. One key to mitigating the risks in this flight segment is the stabilized approach, which requires meeting rigorous standards for many flight parameters as the aircraft nears landing. Exceeding any of these parameters results in an unstable approach (UA). The energy management (EM) accomplished by the flight crew, represented in this study by the EM variables, influences the execution of a stabilized approach. Although EM is a critical element of executing a stabilized approach, few studies identify the specific EM variables that contribute to UA probability. Additionally, several possible moderating variables (MVs) may affect the probability of a UA. Fortunately, modern jet transport aircraft carry Flight Data Monitoring (FDM) systems that capture a wealth of information enabling the analysis of these EM variables. This study used FDM data to determine what influence a set of EM variables has on the probability of a UA event, and what impact a set of possible MVs, not directly related to EM, has on the influence of those EM variables. The analysis used logistic regression (LR) on the FDM information; the LR estimated odds ratios for each variable and for the interaction factors of the MVs. These statistics defined a model for evaluating the influences of the EM variables and MVs, answering the research questions posed. The results showed the model was a good fit to the data but had poor discrimination. The model supported three of the original seven EM hypotheses and none of the 28 MV hypotheses. The study thus identified three specific EM variables that significantly influenced the probability of a UA event; of the MVs, only one significant influence was revealed, and it was opposite that hypothesized. Identifying the EM variables, and examining their impacts, shows their importance in preventing UAs, and the results can inform the design of training programs. The study fills a gap in the body of knowledge by investigating an area of limited research, and the results provide practical application in the analysis of EM-related events: aviation safety practitioners now have additional information for identifying trends that may increase the probability of a UA event. Finally, this study was one of very few granted access to actual operational FDM information by an air carrier. These data were crucial in evaluating the proposed model against real-world flight operations, comparing theory to reality; without access to such closely held information, this research would not have been possible.
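    The analysis described, logistic regression with moderation (interaction) terms and odds ratios, can be sketched as below. The column names ('ua', 'gs_dev', 'vref_dev', 'night'), the synthetic data, and the use of statsmodels are invented stand-ins; the study's actual FDM parameters and tooling are not listed in the abstract.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical variables: 'ua' is a 0/1 unstable-approach flag, 'gs_dev'
# and 'vref_dev' stand in for EM variables (glideslope and approach-speed
# deviation), and 'night' for a moderating variable (MV).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ua":       rng.binomial(1, 0.2, 500),
    "gs_dev":   rng.normal(0, 1, 500),
    "vref_dev": rng.normal(0, 1, 500),
    "night":    rng.binomial(1, 0.3, 500),
})

# 'gs_dev:night' is the EM-by-MV interaction term, mirroring the
# moderation analysis described above.
model = smf.logit("ua ~ gs_dev + vref_dev + night + gs_dev:night", data=df).fit()

# Exponentiated coefficients are the odds ratios the study reports.
print(np.exp(model.params))
print(np.exp(model.conf_int()))  # 95% confidence intervals for the ORs
```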

    Enhancing Query Processing on Stock Market Cloud-based Database

    Cloud computing is expanding rapidly because it saves users development and implementation time and reduces the maintenance and operational costs of their systems. It also enables elastic use of resources rather than relying on workload estimates, which may be inaccurate; database systems can benefit from this trend. In this paper, we propose an algorithm that allocates materialized views over cloud-based replica sets to enhance the performance of a stock market database system using a peer-to-peer architecture. The results show that the proposed model improves query processing time and network transfer cost by distributing the materialized views over the cloud-based replica sets, with a significant effect on decision-making and on achieving economic returns.
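    As a rough illustration of cost-driven view placement of the kind the paper describes, the sketch below greedily assigns each materialized view to the replica set that minimizes total access and transfer cost. The site names, unit cost model, and greedy strategy are assumptions for illustration, not the paper's actual algorithm.

```python
# Greedy materialized-view placement over replica sets (illustrative).
def place_views(views, sites, access_freq, transfer_cost):
    """
    views: list of view ids
    sites: list of replica-set ids
    access_freq[site][view]: how often `site` queries `view`
    transfer_cost[src][dst]: per-access network cost between sites
    Returns a mapping view -> hosting site chosen greedily.
    """
    placement = {}
    for v in views:
        def total_cost(host):
            # Cost of serving every site's queries for v from `host`.
            return sum(access_freq[s][v] * transfer_cost[host][s] for s in sites)
        placement[v] = min(sites, key=total_cost)
    return placement

# Invented example: three replica sets and two stock-market views.
sites = ["ny", "ldn", "hk"]
views = ["top_gainers", "sector_volume"]
access_freq = {
    "ny":  {"top_gainers": 90, "sector_volume": 10},
    "ldn": {"top_gainers": 30, "sector_volume": 60},
    "hk":  {"top_gainers": 20, "sector_volume": 70},
}
transfer_cost = {src: {dst: 0 if src == dst else 1 for dst in sites} for src in sites}

print(place_views(views, sites, access_freq, transfer_cost))
# {'top_gainers': 'ny', 'sector_volume': 'hk'} -- each view lands nearest
# to the bulk of its demand, cutting network transfer cost.
```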

    Scaling kNN queries using statistical learning

    The k-Nearest Neighbour (kNN) method is a fundamental building block for many sophisticated statistical learning models and has wide application in different fields, for instance kNN regression, kNN classification, multi-dimensional item search, location-based services, and spatial analytics. However, the unprecedented spread of data generated by computing and communicating devices has produced a plethora of low-dimensional large-scale datasets and a large community of their users, making the need for efficient and scalable kNN processing pressing. To this end, several parallel and distributed approaches for processing exact kNN over low-dimensional large-scale datasets have been proposed, for example Hadoop-MapReduce-based approaches such as Spatial-Hadoop (SHadoop) and Spark-based approaches like Simba. This thesis contributes a variety of methodologies for kNN query processing over large-scale datasets based on statistical and machine learning techniques.

    The study first investigates the exact kNN query performance of the well-known Big Data systems SHadoop and Simba, which build multi-dimensional Global and Local Indexes over low-dimensional large-scale datasets. The rationale behind such methods is that, when executing an exact kNN query, the Global and Local Indexes access only a small subset of the large-scale dataset stored in a distributed file system: the Global Index prunes out irrelevant subsets of the dataset, while the multiple distributed Local Indexes prune out unnecessary data elements within a partition (subset). The kNN execution algorithms of SHadoop and Simba load the data elements residing in the relevant partitions from disk or across the network into memory, leading to significantly high kNN query response times; such methods are therefore unsuitable for low-latency applications and services. An extensive literature review showed that not enough attention has been given to accessing only a relatively small amount of relevant data per kNN query. Addressing this limitation, and departing from traditional kNN query processing methods, this thesis contributes two novel solutions: the Coordinator With Index (COWI) and Coordinator with No Index (CONI) approaches. The essence of both is a coordinator-based distributed processing algorithm and a way of structuring computation and indexing the stored datasets that ensures only a very small number of data items are retrieved from the underlying data centres, communicated over the network, and processed by the coordinator for every kNN query; scalability is thus ensured and kNN queries can be processed in just tens of milliseconds. Both approaches are implemented over a NoSQL database (HBase), achieving up to three orders of magnitude performance gain compared with the state-of-the-art methods, SHadoop and Simba. It is common practice for current state-of-the-art approaches to exact kNN query processing in low-dimensional space to use Tree-based multi-dimensional indexing to prune out irrelevant data during query processing. However, as data sizes continue to increase (nowadays it is not uncommon to reach several petabytes), the storage cost of Tree-based Index methods becomes exceptionally high, especially when a dataset is partitioned into smaller chunks.
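    As a single-process caricature of the coordinator idea behind COWI and CONI, the sketch below uses a tiny grid "index" so that only the few cells that can contain the k nearest neighbours are ever fetched, rather than scanning whole partitions. The grid scheme and cell width are assumptions; the real systems shard cells across HBase regions.

```python
import heapq
import math
from collections import defaultdict

CELL = 1.0  # grid cell width (an assumed tuning parameter)

def cell_of(p):
    return (math.floor(p[0] / CELL), math.floor(p[1] / CELL))

def build_grid(points):
    # Stand-in for the distributed store: points bucketed by grid cell.
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p)].append(p)
    return grid

def knn(grid, q, k):
    # Expand outward ring by ring from the query's cell; assumes the
    # dataset holds at least k points.
    qc = cell_of(q)
    found, ring = [], 0
    while True:
        # Fetch only the cells at Chebyshev distance `ring` from q's cell.
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                if max(abs(dx), abs(dy)) != ring:
                    continue
                for p in grid.get((qc[0] + dx, qc[1] + dy), []):
                    heapq.heappush(found, (math.dist(p, q), p))
        # Stop once k results exist and no unvisited cell can hold a
        # closer point (unvisited cells are at least ring * CELL away).
        if len(found) >= k and heapq.nsmallest(k, found)[-1][0] <= ring * CELL:
            return [p for _, p in heapq.nsmallest(k, found)]
        ring += 1

pts = [(0.2, 0.3), (1.5, 1.1), (3.0, 2.5), (0.9, 0.7)]
print(knn(build_grid(pts), (1.0, 1.0), k=2))  # [(0.9, 0.7), (1.5, 1.1)]
```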
    In this context, this thesis contributes a novel perspective on how to organise low-dimensional large-scale datasets based on data space transformations, deriving a Space Transformation Organisation Structure (STOS). STOS facilitates kNN query processing as if the underlying datasets were uniformly distributed in the space. Such an approach bears significant advantages. First, STOS has a minute memory footprint, many orders of magnitude smaller than the Index-based approaches in the literature. Second, unlike related work, the memory required for this meta-data grows very slowly with dataset size, so STOS enjoys significantly higher scalability. Third, STOS is relatively efficient to compute, outperforming traditional multivariate index building times while achieving comparable, if not better, query response times. In the literature, exact kNN querying over large-scale datasets has been limited to low-dimensional spaces, because the query response time and memory requirements of Tree-based index methods grow with dimension. Unable to overcome this exponential dependency on dimension, researchers have assumed that no efficient exact solution exists and have proposed approximate kNN for high-dimensional spaces. Unlike the approximate kNN query, which tries to retrieve approximate nearest neighbours from large-scale datasets, this thesis proposes a new type of kNN query, the 'estimated kNN query'. The estimated kNN query processing methodology estimates the nearest neighbours from the marginal cumulative distributions of the underlying data using statistical copulas, and the thesis showcases the performance trade-offs between exact and estimated kNN queries in terms of estimation error and scalability. Separately, kNN regression predicts the value of a target variable based on the k nearest neighbours; but, particularly over high-dimensional large-scale datasets, the query response time of kNN regression can be significantly high due to the curse of dimensionality. To tackle this issue, a new probabilistic kNN regression method is proposed, which statistically predicts the values of the target variable without computing distances. In a different context, kNN as a missing-value algorithm in high-dimensional space is investigated within Pythia, a distributed/parallel missing-value imputation framework. In Pythia, a different way of indexing a high-dimensional large-scale dataset was proposed by the group (not by the author of this thesis); using such indexing, scaling out kNN in high-dimensional space is ensured. Pythia uses Adaptive Resonance Theory (ART), a machine learning clustering algorithm, to build data digests (signatures) of large-scale datasets distributed across several data machines. The key idea is that, given an input vector, Pythia predicts the most relevant data centres to involve in processing, for example, kNN; Pythia does not retrieve exact kNN. To this end, instead of accessing the entire dataset residing in a data node, this thesis proposes accessing only the relevant clusters residing in the appropriate data nodes. As shown later, this method has accuracy comparable to the original design of Pythia but lower imputation time; moreover, the imputation time does not grow significantly with the size of the dataset on a data node or with the number of data nodes.
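    The space-transformation idea behind STOS and the estimated kNN query can be caricatured with a probability-integral transform: push each coordinate through its empirical marginal CDF so the data become roughly uniform, then answer kNN in the transformed space. The sketch below is a toy stand-in, assuming plain empirical marginals rather than the thesis's copula machinery; note how little meta-data (one sorted column per dimension) it needs.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(size=(10_000, 2))    # skewed, non-uniform data

# The entire "meta-data": one sorted marginal per dimension.
marginals = np.sort(data, axis=0)

def to_uniform(points):
    """Probability-integral transform via the empirical marginal CDFs."""
    points = np.atleast_2d(points)
    ranks = np.empty_like(points, dtype=float)
    for d in range(points.shape[1]):
        ranks[:, d] = np.searchsorted(marginals[:, d], points[:, d]) / len(marginals)
    return ranks

u_data = to_uniform(data)

def estimated_knn(q, k):
    # Distances are computed in the transformed (roughly uniform) space,
    # so the neighbours are *estimated*, not exact, as in the thesis.
    d = np.linalg.norm(u_data - to_uniform(q), axis=1)
    return np.argsort(d)[:k]              # indices of estimated neighbours

print(estimated_knn(np.array([[1.0, 1.0]]), k=5))
```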
    Furthermore, since Pythia depends entirely on the data digests built by ART to predict the relevant data centres, this thesis also investigates Pythia's performance when the signatures are constructed by a different clustering algorithm, the Self-Organising Map. Throughout, the performance advantages of the proposed approaches are substantiated and quantified via extensive experimentation with multi-dimensional real and synthetic datasets of different sizes and contexts.
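    The digest-and-route design attributed to Pythia can be sketched with any clustering algorithm playing the role of ART or the Self-Organising Map, which is precisely what makes the signatures swappable. In the stand-in below, scikit-learn's k-means is used purely for brevity, and the node names and shard contents are invented.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Invented shards: each "data node" holds a different region of the space.
shards = {
    "node-a": rng.normal(loc=0.0, size=(1_000, 8)),
    "node-b": rng.normal(loc=5.0, size=(1_000, 8)),
}

# One compact signature (a set of centroids) per data node; k-means here
# stands in for ART or a Self-Organising Map as the digest builder.
signatures = {
    node: KMeans(n_clusters=16, n_init=10, random_state=0).fit(x).cluster_centers_
    for node, x in shards.items()
}

def route(query, top=1):
    """Pick the node(s) most likely to hold the query's neighbours."""
    score = {n: np.linalg.norm(c - query, axis=1).min() for n, c in signatures.items()}
    return sorted(score, key=score.get)[:top]

print(route(np.full(8, 4.8)))   # expected to prefer "node-b"
```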