10 research outputs found

    A NOVEL REFACTORING APPROACH TO REMOVE THE SMELLS PRESENT IN PROGRAMMING LANGUAGES

    A mashup is a new resource created by combining existing sources. Mashups are used to develop innovative ideas from existing knowledge. However, mashups created by users may contain deficiencies that must be addressed to avoid complexity. In particular, smells may be introduced during mashup creation, and they need to be handled to avoid software failure. A refactoring approach addresses this by finding and eliminating the smells that may occur during mashup creation. However, existing refactoring does not support programming languages in which an object behaves differently in different places, so it cannot be applied to smells identified in such newer languages. Our work overcomes this by analyzing object behavior when an object is used in different places. Based on this knowledge, refactoring is applied while accounting for the movement of objects across classes and modules and predicting the resulting behavior changes. This allows efficient refactoring that can be used with any type of programming language to avoid smells during mashup creation. The experiments conducted show that the proposed methodology performs better than the existing approach in terms of mashup creation.

    Implications of storage subsystem interactions on processing efficiency in data intensive computing

    Includes bibliographical references. 2015 Fall. Processing frameworks such as MapReduce allow development of programs that operate on voluminous on-disk data. These frameworks typically include support for multiple file/storage subsystems. This decoupling of processing frameworks from the underlying storage subsystem provides a great deal of flexibility in application development. However, as we demonstrate, this flexibility often exacts a price: performance. Given the data volumes, storage subsystems (such as HDFS, MongoDB, and HBase) disperse datasets over a collection of machines. Storage subsystems manage complexity relating to preservation of consistency, redundancy, failure recovery, throughput, and load balancing. Preserving these properties involves message exchanges between distributed subsystem components, updates to in-memory data structures, data movements, and coordination as datasets are staged and system conditions change. Storage subsystems prioritize these properties differently, leading to vastly different network, disk, memory, and CPU footprints for staging and accessing the same dataset. This thesis proposes a methodology for comparing and identifying the storage subsystem suited for the processing that is being performed on a dataset. We profile the network I/O, disk I/O, memory, and CPU costs introduced by a storage subsystem during data staging, data processing, and generation of results. We perform this analysis with different storage subsystems and applications with different disk-I/O to CPU processing ratios.

    Spatial Data Mining Analytical Environment for Large Scale Geospatial Data

    Nowadays, many applications are continuously generating large-scale geospatial data. Vehicle GPS tracking data, aerial surveillance drones, LiDAR (Light Detection and Ranging), world-wide spatial networks, and high resolution optical or Synthetic Aperture Radar imagery data all generate a huge amount of geospatial data. However, as data collection increases, our ability to process this large-scale geospatial data in a flexible fashion is still limited. We propose a framework for processing and analyzing large-scale geospatial and environmental data using a "Big Data" infrastructure. Existing Big Data solutions do not include a specific mechanism to analyze large-scale geospatial data. In this work, we extend HBase with a spatial index (R-Tree) and HDFS to support geospatial data and demonstrate its analytical use with some common geospatial data types and data mining technology provided by the R language. The resulting framework has a robust capability to analyze large-scale geospatial data using spatial data mining and to make its outputs available to end users.

    Towards a big data reference architecture

    On Fork-Join Queues and Maximum Ratio Cliques

    This dissertation consists of two parts. The first part delves into the problem of response time estimation in fork-join queueing networks. These systems have been seen in literature for more than thirty years. The estimation of the mean response time in these systems has been found to be notoriously hard for most forms of these queueing systems. In this work, simple expressions for the mean response time are proposed as conjectures. Extensive experiments demonstrate the remarkable accuracy of these conjectures. Algorithms for the estimation of response time using these conjectures are proposed. For many of the networks studied in this dissertation, no approximations are known in literature for estimation of their response time. Therefore, the contribution of this dissertation in this direction marks significant progress in the analysis of fork-join queues. The second part of this dissertation introduces a fractional version of the classical maximum weight clique problem, the maximum ratio clique problem, which is to find a maximal clique that has the largest ratio of benefit and cost weights associated with the clique's vertices. This problem is formulated to model networks in which the vertices have a benefit as well as a cost associated with them. The maximum ratio clique problem finds applications in a wide range of areas including social networks, stock market graphs and wind farm location. NP-completeness of the decision version of the problem is established, and three solution methods are proposed. The results of numerical experiments with standard graph instances, as well as with real-life instances arising in finance and energy systems, are reported.
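
To make the maximum ratio clique definition concrete, here is a minimal brute-force sketch. It is not one of the three solution methods from the dissertation, which the abstract does not describe. It enumerates maximal cliques with a basic Bron-Kerbosch recursion and keeps the one with the largest benefit-to-cost ratio. The function names, the toy graph, and the integer weights are all assumptions made for illustration, and the approach is only practical for very small graphs.

```python
from fractions import Fraction


def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph (basic Bron-Kerbosch)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def max_ratio_clique(adj, benefit, cost):
    """Return (clique, ratio) maximizing sum(benefit) / sum(cost) over maximal cliques.

    Assumes integer benefits and strictly positive integer costs.
    """
    best, best_ratio = None, None
    for clique in maximal_cliques(adj):
        ratio = Fraction(sum(benefit[v] for v in clique),
                         sum(cost[v] for v in clique))
        if best_ratio is None or ratio > best_ratio:
            best, best_ratio = clique, ratio
    return best, best_ratio


# Hypothetical toy instance: a triangle a-b-c plus a pendant edge c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
benefit = {"a": 3, "b": 1, "c": 2, "d": 5}
cost = {"a": 2, "b": 2, "c": 1, "d": 1}

clique, ratio = max_ratio_clique(adj, benefit, cost)
print(sorted(clique), ratio)
```

In this toy instance the two maximal cliques are {a, b, c} with ratio 6/5 and {c, d} with ratio 7/2, so the smaller clique wins. This shows how the ratio objective differs from the maximum weight clique objective, which would pick the clique with the largest total benefit regardless of cost.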

    Compressing Labels of Dynamic XML Data using Base-9 Scheme and Fibonacci Encoding

    The flexibility and self-describing nature of XML have made it the most common mark-up language used for data representation over the Web. XML data is naturally modelled as a tree, where the structural tree information can be encoded into labels via an XML labelling scheme in order to permit answers to queries without the need to access original XML files. As the transmission of XML data over the Internet has become vibrant, it has also become necessary to have an XML labelling scheme that supports dynamic XML data. For a large-scale and frequently updated XML document, existing dynamic XML labelling schemes still suffer from high growth rates in terms of their label size, which can result in overflow problems and/or ambiguous data/query retrievals. This thesis considers the compression of XML labels. A novel XML labelling scheme, named "Base-9", has been developed to generate labels that are as compact as possible and yet provide efficient support for queries to both static and dynamic XML data. A Fibonacci prefix-encoding method has been used for the first time to store Base-9's XML labels in a compressed format, with the intention of minimising the storage space without degrading XML querying performance. The thesis also investigates the compression of XML labels using various existing prefix-encoding methods. This investigation has resulted in the proposal of a novel prefix-encoding method named "Elias-Fibonacci of order 3", which has achieved the fastest encoding time of all prefix-encoding methods studied in this thesis, whereas Fibonacci encoding was found to require the minimum storage. Unlike current XML labelling schemes, the new Base-9 labelling scheme ensures the generation of short labels even after large, frequent, skewed insertions. The advantages of such short labels as those generated by the combination of applying the Base-9 scheme and the use of Fibonacci encoding in terms of storing, updating, retrieving and querying XML data are supported by the experimental results reported herein.
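
For readers unfamiliar with Fibonacci prefix encoding, the sketch below shows the classic Fibonacci code for positive integers. Each value is written as a sum of non-consecutive Fibonacci numbers, and a final extra 1 bit marks the end of the codeword. How the thesis maps Base-9 label components onto integers is not stated in the abstract, so this illustrates only the generic coding step, and the function names are hypothetical.

```python
def fib_encode(n: int) -> str:
    """Return the Fibonacci codeword of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for positive integers only")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # drop the first Fibonacci number that exceeds n
    bits = ["0"] * len(fibs)
    remaining = n
    # Greedy choice from the largest weight down yields the Zeckendorf representation.
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= remaining:
            bits[i] = "1"
            remaining -= fibs[i]
    return "".join(bits) + "1"  # the appended 1 creates the unique "11" terminator


def fib_decode_stream(bits: str) -> list[int]:
    """Decode a concatenation of Fibonacci codewords back into integers."""
    values = []
    total, prev_bit = 0, "0"
    weight, next_weight = 1, 2
    for bit in bits:
        if bit == "1" and prev_bit == "1":
            # Two consecutive 1 bits end the current codeword.
            values.append(total)
            total, prev_bit = 0, "0"
            weight, next_weight = 1, 2
            continue
        if bit == "1":
            total += weight
        weight, next_weight = next_weight, weight + next_weight
        prev_bit = bit
    return values


stream = "".join(fib_encode(n) for n in [1, 2, 4, 11])
print(stream)
print(fib_decode_stream(stream))
```

Because no codeword contains "11" except at its end, codewords can be concatenated without separators and still be split unambiguously, which is the prefix property the abstract relies on when storing labels in compressed form.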