Content-aware partial compression for textual big data analysis in Hadoop
A substantial amount of information in companies and on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. Compression, as an effective means of reducing data size, has been employed by many emerging data analytic platforms, for which the main purpose of data compression is to save storage space and reduce data transmission cost over the network. General-purpose compression methods endeavour to achieve higher compression ratios by leveraging data transformation techniques and contextual data, and this context dependency forces access to the compressed data to be sequential. Processing such compressed data in parallel, as is desirable in a distributed environment, is therefore extremely challenging. This work proposes techniques for more efficient textual big data analysis, with an emphasis on content-aware compression schemes suitable for the Hadoop analytic platform. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of public and private real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
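The distinction between content that is compressed and content that must stay readable can be illustrated with a small sketch. Assuming, for illustration only, that "functional" content means whitespace, newlines and punctuation, and "informational" content means alphanumeric words, the Python below replaces each word with a short dictionary code drawn from [0-9A-Za-z] and leaves every other character untouched. Because newlines survive, each line (record) can still be located and decoded independently, given the shared dictionary. The names `build_dictionary`, `compress`, `decompress` and `rank_to_code` and the base-62 code assignment are hypothetical; this is not the coding used by the authors.

```python
import re
import string
from collections import Counter
from typing import Dict

# Assumption for this sketch: "informational" content = maximal runs of ASCII
# letters/digits; everything else (spaces, newlines, punctuation) is "functional"
# and is copied through unchanged.
WORD_RE = re.compile(r"[A-Za-z0-9]+")

# The code alphabet contains no functional characters, so codes never introduce
# spurious delimiters (e.g. newlines) into the compressed text.
ALPHABET = string.digits + string.ascii_letters  # 62 symbols


def rank_to_code(rank: int) -> str:
    """Bijective base-62: 0 -> '0', 61 -> 'Z', 62 -> '00', ... (shorter codes first)."""
    chars = []
    n = rank + 1
    while n > 0:
        n, rem = divmod(n - 1, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def build_dictionary(text: str) -> Dict[str, str]:
    """Hypothetical static dictionary: more frequent words get shorter codes."""
    counts = Counter(WORD_RE.findall(text))
    return {word: rank_to_code(i) for i, (word, _) in enumerate(counts.most_common())}


def compress(text: str, table: Dict[str, str]) -> str:
    """Replace each word with its code; functional characters stay in place."""
    return WORD_RE.sub(lambda m: table[m.group(0)], text)


def decompress(data: str, table: Dict[str, str]) -> str:
    """Every alphanumeric run in the compressed text is exactly one code."""
    inverse = {code: word for word, code in table.items()}
    return WORD_RE.sub(lambda m: inverse[m.group(0)], data)


if __name__ == "__main__":
    text = "the cat sat\nthe dog sat\n"
    table = build_dictionary(text)
    packed = compress(text, table)
    print(repr(packed))
    # Each compressed line decodes on its own, so records can be split at "\n".
    for line in packed.splitlines():
        print(decompress(line, table))
```

For the two-line input above, the compressed text is `"0 2 1\n0 3 1\n"`: the line structure that a line-oriented record reader relies on is unchanged. The dictionary itself would have to be stored alongside the data, a cost this sketch ignores.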
Content-aware compression for big textual data analysis
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content, and only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries, which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. It uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
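The property that makes a static, context-free code searchable can be sketched as follows. Because every symbol maps to the same codeword wherever it occurs, a search pattern can be encoded once and matched directly against the compressed bits; a bit-level match counts only if it starts at a codeword boundary. This sketch assumes a plain static Huffman code over characters, keeps the bits as a '0'/'1' string for readability, and uses a list of codeword start offsets as a hypothetical stand-in for the thesis's hybrid data structure. It relies on `str.find` and does not reproduce either the reported linear-time search or the approximation in Approximated Huffman Compression; `huffman_code`, `encode` and `search` are illustrative names.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, List, Tuple


def huffman_code(text: str) -> Dict[str, str]:
    """Static (context-free) Huffman code: one fixed codeword per character."""
    freq = Counter(text)
    if len(freq) <= 1:  # empty or single-symbol text: a 1-bit codeword suffices
        return {sym: "0" for sym in freq}
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(text: str, code: Dict[str, str]) -> Tuple[str, List[int]]:
    """Return the bit string and the bit offset where each symbol's codeword starts."""
    boundaries, pieces, pos = [], [], 0
    for ch in text:
        boundaries.append(pos)
        pieces.append(code[ch])
        pos += len(code[ch])
    return "".join(pieces), boundaries


def search(bits: str, boundaries: List[int], pattern: str, code: Dict[str, str]) -> List[int]:
    """Find pattern occurrences (as symbol indices) without decompressing."""
    if not pattern or any(ch not in code for ch in pattern):
        return []
    target = "".join(code[ch] for ch in pattern)  # encode the pattern once
    start_of = {off: i for i, off in enumerate(boundaries)}
    hits = []
    pos = bits.find(target)
    while pos != -1:
        if pos in start_of:  # reject matches that begin mid-codeword
            hits.append(start_of[pos])
        pos = bits.find(target, pos + 1)
    return hits


if __name__ == "__main__":
    text = "abracadabra"
    code = huffman_code(text)
    bits, boundaries = encode(text, code)
    print(len(bits), "bits instead of", 8 * len(text))
    print(search(bits, boundaries, "abra", code))
```

Requiring only that a match start at a boundary is enough because Huffman codes are prefix-free: decoding from a boundary is deterministic, so a matching bit run there decodes to exactly the pattern's symbols.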
Impacts of COVID-19 pandemic on renewable energy production in China: transmission mechanism and policy implications
The renewable energy industry, in particular, has experienced an immense amount of pressure stemming from the novel COVID-19 pandemic. This study investigates the renewable energy production initiatives that have been put in place as a reaction to the COVID-19 pandemic, using time series data for China. The study uses the robust ARDL (autoregressive distributed lag) bounds testing approach in order to obtain sound parameter estimates. The findings reveal that the COVID-19 pandemic has significantly reduced renewable energy production in China, in both the short and the long run. In addition, GDP and trade tend to have a positive impact on renewable energy production in the wake of the COVID-19 pandemic. In the same context, energy price has been observed to have a significant negative impact on renewable energy production during the pandemic period, particularly in the long run. In view of these observations, the government should ideally adopt a short-term policy while formulating mid-term and long-term action plans, so as to achieve its renewable energy targets in the future. The research implications and future directions are discussed thoroughly in the paper.
Growth of Large Domain Epitaxial Graphene on the C-Face of SiC
Growth of epitaxial graphene on the C-face of SiC has been investigated. Using a confinement controlled sublimation (CCS) method, we have achieved well-controlled growth and have been able to observe the propagation of uniform monolayer graphene. Surface patterns uncover two important aspects of the growth, namely carbon diffusion and the stoichiometric requirement. Moreover, a new "stepdown" growth mode has been discovered. Via this mode, monolayer graphene domains can have an area of hundreds of square micrometers while, most importantly, step bunching is avoided and the initial uniformly stepped SiC surface is preserved. The stepdown growth provides a possible route towards uniform wafer-size epitaxial graphene without compromising the initial flat surface morphology of SiC.
Comment: 18 pages, 8 figures
Self-organising, self-managing frameworks and strategies
A novel, general framework that can be used for constructing a self-organising and self-managing system is introduced. This framework is independent of the application domain. It embodies directed evolution, can be parameterised with different strategies, and supports both local and global goals. The framework is then used to apply the principles of self-organisation and self-management to resource management within the CloudLightning architecture.
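The idea of a domain-independent framework that is parameterised with a strategy and checked against local and global goals can be sketched in a few lines. Everything here is a hypothetical illustration, not the CloudLightning design: the class `SelfOrganisingFramework`, the toy resource-management state (servers mapped to task sizes), the `offload_largest` strategy, and the reading of directed evolution as "accept a change only if it improves the acting component's local goal without worsening the global goal".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Toy state for this sketch: server name -> sizes of the tasks placed on it.
State = Dict[str, List[int]]


def load(state: State, node: str) -> int:
    """Local goal (lower is better): total work on one server."""
    return sum(state[node])


def max_load(state: State) -> int:
    """Global goal (lower is better): the peak load across all servers."""
    return max(load(state, n) for n in state)


def offload_largest(state: State, node: str) -> Optional[State]:
    """Hypothetical strategy: move the node's largest task to the least-loaded server."""
    target = min(state, key=lambda n: load(state, n))
    if not state[node] or target == node:
        return None
    proposal = {n: list(tasks) for n, tasks in state.items()}  # never mutate the input
    task = max(proposal[node])
    proposal[node].remove(task)
    proposal[target].append(task)
    return proposal


@dataclass
class SelfOrganisingFramework:
    """Domain-independent loop; behaviour comes entirely from the plugged-in parts."""
    strategy: Callable[[State, str], Optional[State]]
    local_cost: Callable[[State, str], float]
    global_cost: Callable[[State], float]

    def step(self, state: State) -> Tuple[State, bool]:
        """Let each component act once; keep only goal-directed changes."""
        changed = False
        for node in list(state):
            proposal = self.strategy(state, node)
            if proposal is None:
                continue
            improves_local = self.local_cost(proposal, node) < self.local_cost(state, node)
            keeps_global = self.global_cost(proposal) <= self.global_cost(state)
            if improves_local and keeps_global:
                state, changed = proposal, True
        return state, changed

    def run(self, state: State, max_steps: int = 100) -> State:
        """Iterate until no component can improve (or the step budget runs out)."""
        for _ in range(max_steps):
            state, changed = self.step(state)
            if not changed:
                break
        return state


if __name__ == "__main__":
    framework = SelfOrganisingFramework(offload_largest, load, max_load)
    start = {"s1": [5, 4, 3], "s2": [1], "s3": []}
    end = framework.run(start)
    print(max_load(start), "->", max_load(end), end)
```

Swapping in a different strategy or different goal functions changes the behaviour without touching the framework class, which is the kind of domain independence described above. On the example input the peak load falls from 12 to 6; the optimum for these tasks is 5, which shows that greedy local acceptance of this kind can stall early and motivates plugging in richer strategies.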