198 research outputs found

    Content-aware partial compression for textual big data analysis in Hadoop

    A substantial amount of information in companies and on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. Compression, as an effective means of reducing data size, has been employed by many emerging data analytic platforms, whose main purpose in compressing data is to save storage space and reduce the cost of transmitting data over the network. General-purpose compression methods achieve higher compression ratios by leveraging data transformation techniques and contextual data, and this context dependency forces access to the compressed data to be sequential. Processing such compressed data in parallel, as is desirable in a distributed environment, is therefore extremely challenging. This work proposes techniques for more efficient textual big data analysis, with an emphasis on content-aware compression schemes suitable for the Hadoop analytic platform. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of public and private real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.

    Content-aware compression for big textual data analysis

    A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC distinguishes between informational and functional content and compresses only the informational content. The compressed data is thus made transparent to existing software libraries, which often rely on functional content to work. Secondly, a context-free, bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. It uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and that also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
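The idea of compressing only informational content can be illustrated with a minimal sketch. It assumes, purely for illustration, that "informational" content means runs of word characters and "functional" content means everything else (spaces, tabs, newlines, punctuation), and it swaps in a simple frequency-ranked word-dictionary substitution. This is not the thesis's actual CaPC encoding; the names `build_codebook`, `compress` and `decompress` are hypothetical. Because every code is drawn from an alphanumeric alphabet, the delimiters that line-oriented record readers depend on survive compression unchanged.

```python
import re
import string
from collections import Counter

# Assumption: maximal runs of word characters are "informational";
# all other characters are "functional" and are left untouched.
TOKEN_RE = re.compile(r"(\w+)")

# Code alphabet: 62 alphanumeric symbols, none of which is a delimiter.
ALPHABET = string.digits + string.ascii_letters


def rank_to_code(rank: int) -> str:
    """Bijective base-62: rank 0 -> '0', 61 -> 'Z', 62 -> '00', ..."""
    base = len(ALPHABET)
    digits = []
    n = rank + 1  # bijective numeration is 1-based
    while n > 0:
        n, r = divmod(n - 1, base)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def build_codebook(text: str) -> dict:
    """Give the most frequent words the shortest codes."""
    counts = Counter(TOKEN_RE.findall(text))
    ranked = [word for word, _ in counts.most_common()]
    return {word: rank_to_code(i) for i, word in enumerate(ranked)}


def compress(text: str, codebook: dict) -> str:
    """Replace each word with its code; delimiters are copied verbatim."""
    return TOKEN_RE.sub(lambda m: codebook[m.group(1)], text)


def decompress(data: str, codebook: dict) -> str:
    """Invert the substitution. Codes are runs of word characters, so the
    same regex recovers them as whole tokens."""
    inverse = {code: word for word, code in codebook.items()}
    return TOKEN_RE.sub(lambda m: inverse[m.group(1)], data)


if __name__ == "__main__":
    text = "the cat\tsat on the mat\nthe dog sat\n"
    book = build_codebook(text)
    packed = compress(text, book)
    print(repr(packed))  # '0 2\t1 3 0 4\n0 5 1\n'
    assert decompress(packed, book) == text
```

Newlines and tabs stay in place, so a record reader that splits on `\n` sees the same number of records as it would in the original text. A real deployment would also need a way to distribute the codebook to every task, which this sketch leaves out.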

    Impacts of COVID-19 pandemic on renewable energy production in China: transmission mechanism and policy implications

    The renewable energy industry, in particular, has experienced an immense amount of pressure stemming from the novel COVID-19 pandemic. This study, however, investigates the renewable energy production initiatives that have come into place as a reaction to the COVID-19 pandemic, using time-series data for China in particular. The study uses the robust ARDL bounds testing approach in order to obtain sound parameter estimates. The findings reveal that the COVID-19 pandemic has significantly reduced renewable energy production in China, in both the short and the long run. In addition, GDP and trade tend to positively impact renewable energy production in the wake of the COVID-19 pandemic. In the same context, the energy price has been observed to have a significant negative impact on renewable energy production, particularly in the long run, during the pandemic period. In view of these observations, the government should ideally adopt a short-term policy, while mid-term and long-term action plans should be formulated, so as to achieve the renewable energy targets in the future. The research implications and future directions are discussed thoroughly in the paper.

    Growth of Large Domain Epitaxial Graphene on the C-Face of SiC

    Growth of epitaxial graphene on the C-face of SiC has been investigated. Using a confinement controlled sublimation (CCS) method, we have achieved well-controlled growth and have been able to observe the propagation of uniform monolayer graphene. Surface patterns uncover two important aspects of the growth, i.e. carbon diffusion and the stoichiometric requirement. Moreover, a new "stepdown" growth mode has been discovered. Via this mode, monolayer graphene domains can have an area of hundreds of square micrometers while, most importantly, step bunching is avoided and the initial uniformly stepped SiC surface is preserved. The stepdown growth provides a possible route towards uniform wafer-size epitaxial graphene without compromising the initial flat surface morphology of SiC. Comment: 18 pages, 8 figures

    Self-organising, self-managing frameworks and strategies

    A novel, general framework that can be used for constructing a self-organising and self-managing system is introduced. This framework is independent of the application domain. It embodies directed evolution, can be parameterised with different strategies, and supports both local and global goals. This framework is then used to apply the principles of self-organisation and self-management to resource management within the CloudLightning architecture.

    On Optimal Power Control for Delay-Constrained Communication Over Fading Channels
