A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning-based
approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order in which to apply optimizations). The compiler
optimization space continues to grow due to the advancement of applications, the
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for compiler optimization, focusing on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the results obtained, a fine-grained classification of the different
approaches, and, finally, the influential papers of the field.
Comment: Version 5.0 (updated September 2018). Preprint of the version accepted
at ACM CSUR 2018 (42 pages). This survey will be updated quarterly here (send the
author newly published papers to be added in subsequent versions). History:
Received November 2016; Revised August 2017; Revised February 2018; Accepted
March 2018
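To make the difference between the two problems concrete, the sketch below counts the size of each search space for n candidate passes. It is an illustration under simplifying assumptions, not a method from the survey: optimization selection is modelled as turning each pass on or off, and phase-ordering as choosing an ordered sequence of distinct passes with no repeats. Real compilers allow passes to repeat, which makes the ordering space larger still. The function names are hypothetical.

```python
from itertools import permutations
from math import comb, factorial


def selection_space(n: int) -> int:
    """Optimization selection: each of n passes is either enabled or disabled."""
    return 2 ** n


def ordering_space(n: int) -> int:
    """Phase-ordering: ordered sequences of distinct passes of any length 0..n.

    For each length k, choose k of the n passes (C(n, k)) and order them (k!).
    """
    return sum(comb(n, k) * factorial(k) for k in range(n + 1))


def ordering_space_brute_force(n: int) -> int:
    """Enumerate the same sequences explicitly; only practical for small n."""
    passes = range(n)
    return sum(1 for k in range(n + 1) for _ in permutations(passes, k))


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(f"n={n}: selection={selection_space(n)}, ordering={ordering_space(n)}")
```

Under these assumptions, 10 passes give 1,024 possible subsets but 9,864,101 possible orderings, which illustrates why phase-ordering is usually treated as the harder of the two problems.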
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication, and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and methodology, which helps in evaluating their
applicability to similar problems. The taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. We hope that the proposed taxonomy and
mapping also provide an easy way for new practitioners to understand
this complex area of research.
Comment: 46 pages, 16 figures, Technical Report
VCluster: A Portable Virtual Computing Library for Cluster Computing
Message passing has been the dominant parallel programming model in cluster computing, and libraries such as the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) have proven their value and efficiency through numerous applications in diverse areas. However, as clusters of Symmetric Multi-Processor (SMP) and heterogeneous machines become popular, conventional message-passing models must be adapted to support this new kind of cluster efficiently. In addition, the Java programming language, with features such as its object-oriented design, platform-independent bytecode, and native support for multithreading, is an attractive alternative language for cluster computing. This research presents a new parallel programming model and a library called VCluster that implements this model on top of a Java Virtual Machine (JVM). The programming model is based on virtual migrating threads to support clusters of heterogeneous SMP machines efficiently. VCluster is implemented in 100% Java, exploiting Java's portability to address the problems of heterogeneous machines. VCluster virtualizes computational and communication resources such as threads, computation states, and communication channels across multiple separate JVMs, which makes mobile threads possible. With virtual migrating threads, it is feasible to balance the load across computing resources dynamically. Several large-scale parallel applications have been developed using VCluster to compare its performance and usage with those of other libraries. The experimental results show that VCluster makes it easier to develop multithreaded parallel applications than conventional libraries such as MPI. At the same time, the performance of VCluster is comparable to that of MPICH, a widely used MPI library, combined with popular threading libraries such as POSIX Threads and OpenMP.
In the next phase of our work, we implemented thread groups and thread migration to demonstrate the feasibility of dynamic load balancing in VCluster. Our experiments show that load can be balanced dynamically in VCluster, resulting in better performance. Thread groups also make it possible to implement collective communication functions between threads, which have proved useful in process-based libraries.
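The abstract does not describe VCluster's balancing policy. As a rough illustration of how thread migration can even out load, the sketch below uses a simple greedy heuristic (an assumption, not VCluster's documented policy): it repeatedly moves one thread from the most loaded node to the least loaded node for as long as that move narrows the gap between them. It assumes each thread's load is a known, fixed integer and ignores the cost of migrating thread state; `rebalance` and the node names are hypothetical and are not part of VCluster's API.

```python
from typing import Dict, List, Tuple


def rebalance(loads: Dict[str, List[int]]) -> List[Tuple[str, str, int]]:
    """Greedily migrate threads between nodes to even out total load.

    `loads` maps a hypothetical node name to the loads of the threads it hosts
    and is modified in place. Returns the migrations as
    (source, destination, thread_load) tuples.
    """
    moves: List[Tuple[str, str, int]] = []
    while True:
        totals = {node: sum(threads) for node, threads in loads.items()}
        src = max(totals, key=totals.get)  # most loaded node
        dst = min(totals, key=totals.get)  # least loaded node
        gap = totals[src] - totals[dst]
        # Moving a thread with 0 < load < gap strictly narrows the src/dst gap,
        # so the sum of squared node loads decreases and the loop terminates.
        candidates = [t for t in loads[src] if 0 < t < gap]
        if not candidates:
            break
        # Prefer the thread that brings the two nodes closest to equal.
        thread = min(candidates, key=lambda t: abs(gap - 2 * t))
        loads[src].remove(thread)
        loads[dst].append(thread)
        moves.append((src, dst, thread))
    return moves


if __name__ == "__main__":
    cluster = {"node0": [4, 3, 3], "node1": [1], "node2": []}
    print(rebalance(cluster))
    print({node: sum(threads) for node, threads in cluster.items()})
```

In this example the node totals go from 10, 1 and 0 to 3, 4 and 4 after two migrations. A real scheduler would also weigh the cost of moving a thread's computation state between JVMs, which this sketch leaves out.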
A Survey of Large Language Models
Language is essentially a complex, intricate system of human expressions
governed by grammatical rules. It poses a significant challenge to develop
capable AI algorithms for comprehending and grasping a language. As a major
approach, language modeling has been widely studied for language understanding
and generation in the past two decades, evolving from statistical language
models to neural language models. Recently, pre-trained language models (PLMs)
have been proposed by pre-training Transformer models over large-scale corpora,
showing strong capabilities in solving various NLP tasks. Since researchers
have found that model scaling can lead to performance improvement, they have
further studied the scaling effect by increasing model size even further.
Interestingly, when the parameter scale exceeds a certain level, these enlarged
language models not only achieve a significant performance improvement but also
show some special abilities that are not present in small-scale language
models. To distinguish models by parameter scale, the research
community has coined the term large language models (LLMs) for PLMs of
significant size. Recently, research on LLMs has been greatly advanced by
both academia and industry, and a notable milestone is the launch of ChatGPT,
which has attracted widespread attention from society. The technical evolution
of LLMs has been having an important impact on the entire AI community and
may revolutionize how we develop and use AI algorithms. In this
survey, we review the recent advances of LLMs by introducing the background,
key findings, and mainstream techniques. In particular, we focus on four major
aspects of LLMs, namely pre-training, adaptation tuning, utilization, and
capacity evaluation. In addition, we summarize the available resources for
developing LLMs and discuss remaining issues and future directions.
Comment: ongoing work; 51 pages
Mycosphaerella leaf disease on eucalypts in Western Australia - The diversity and impact
Eucalyptus plantation forestry in Western Australia (WA) is a relatively young industry, and by the end of 2008 the total plantation estate (softwood and hardwood) was over 950 000 ha. The predominant plantation species is Eucalyptus globulus, native to south-eastern Australia. In WA, the most serious foliar disease of eucalypt plantations is Mycosphaerella Leaf Disease (MLD). However, little systematic sampling for MLD has been carried out in WA to determine its impact on plantations and yields, the species involved, or whether those species are introduced. The overall aim of this thesis was to investigate MLD in south-western Australia, with a particular focus on species diversity, taxonomy and the impact on the early growth of E. globulus.
The increase in the number of Mycosphaerella and Teratosphaeria species associated with MLD in E. globulus plantations in WA over the past decade has raised concern about the possible movement of pathogens between native forests and plantations in either direction. A survey of necrotic leaf spots collected from plantation and endemic eucalypts in WA and Queensland was conducted. Overall, ten new Eucalyptus host records for Mycosphaerella/Teratosphaeria species were obtained from WA and five from Queensland. Significantly, M. nubilosa was isolated from E. grandis x resinifera and E. urophylla x globulus in WA. This is the first time M. nubilosa has been isolated from Eucalyptus hosts within the series Resinifera (see Chapter 2).
Chapter 3 assessed the number of fungi that may be contributing to MLD in E. globulus plantations in WA, and how the number of species and their incidence have changed since the first surveys were conducted. Four new records of Mycosphaerella were identified in this study: M. ellipsoidea, P. fori, M. suttoniae and M. tasmaniensis. Mycosphaerella ellipsoidea and P. fori are first records for Australia, and M. suttoniae and M. tasmaniensis are first records for WA. The current work shows an increase in the number of Mycosphaerella species associated with plantation eucalypts in WA and Australia. With the exception of M. cryptica, none of these species was known in WA prior to the commencement of large-scale E. globulus plantations, and none has a known impact on the major native eucalypts in the region.
The ITS region of the type material of T. parva, M. grandis and M. gregaria was sequenced from culture and herbarium specimens and compared with existing sequences from GenBank (Chapter 4). This was the first study to examine and sequence the type material of M. grandis, T. parva and M. gregaria. As the ITS sequences of M. grandis and T. parva were identical, it was concluded that M. grandis should be reduced to synonymy with T. parva. Mycosphaerella aurantia, M. buckinghamiae and M. africana also match the type sequence of M. gregaria and should therefore all be synonymised with M. gregaria. This study was also the first to describe ITS sequence variation within a single Mycosphaerella isolate.
The aim of Chapter 5 was to identify the infection pathway at the leaf surface using scanning electron microscopy and to determine the pathogenicity of M. marksii on E. globulus. The use of glycerol as a surfactant and its effect on ascospore viability were also assessed. This study was unable to confirm the pathogenicity of M. marksii on E. globulus seedlings under laboratory conditions. However, M. marksii ascospores were able to germinate and enter E. globulus stomata 3–6 days after initial infection.
Species-specific primers were successfully designed and tested for three Mycosphaerella species that occur on E. globulus in WA (Chapter 6). Meteorological conditions, rather than MLD, appeared to determine the defoliation of juvenile foliage, as MLD levels remained relatively low throughout the trial period. MLD levels increased throughout spring, as warm, wet conditions favoured the development of disease, especially on the flush of new juvenile foliage. New foliage also emerged after late-summer rainfall. As disease pressure mounted, the trees responded through defoliation. As temperatures increased and the juvenile foliage aged, defoliation is likely to have increased, and by mid-summer defoliation had reached a level similar to that of disease and insect damage. Following defoliation and the emergence of new juvenile and adult leaves, the relative amount of disease on the trees decreased, because most of the disease was on the older juvenile foliage that was shed. Field observations can be a reliable indication of disease progression. Although branch-level field observations exaggerated MLD levels when there was more foliage, they still showed a trend in the amount of disease similar to that obtained with the ASSESS program. Experience in disease monitoring is likely to give a more accurate assessment of MLD. Notably, assessors, including the author, tended to overestimate disease when MLD levels were higher.
Infection studies of Uwebraunia dekkeri were conducted to confirm how this species enters E. globulus leaves and to determine its pathogenicity (Chapter 7). This study demonstrated that conidia of U. dekkeri could infect E. globulus leaves and that it is not a hyperparasite of M. cryptica or M. nubilosa. Conidiogenesis was both percurrent and sympodial, and the phenomenon of anastomosis was observed for the first time on the leaf surface.
The impact of MLD on wood volume had not previously been investigated in WA (Chapter 8). Applying pesticides and fungicides in the early stages of establishment at two plantations near Albany significantly increased tree volumes. However, the increase in wood volume would be offset by the cost of the chemicals and their application. This study demonstrated that monitoring for pests and disease would be more effective than spraying chemical treatments for the first three years. Regular chemical treatment is expensive to maintain and is increasingly considered environmentally unacceptable by some communities. This study also showed that spraying for low levels of MLD had little effect on disease incidence and/or volume increase in E. globulus plantations in WA. The most important factors for a healthy plantation appear to be site selection, site preparation and tree genetics.
This study was the first to investigate the impact of MLD on the growth of Eucalyptus globulus plantations in WA. As part of this study, the biology, taxonomy and pathogenicity of the main species present in WA were investigated. The key findings were: i) the number, abundance and distribution of Mycosphaerella/Teratosphaeria species in WA are not static, and plantations should be continually monitored for new, potentially threatening species; ii) spraying for MLD, although effective in reducing its prevalence and impact on growth, was not economically viable; and iii) intragenomic variation of the ribosomal genome may explain the sequence variation observed in single-spore isolates of Mycosphaerella/Teratosphaeria, which has taxonomic implications. Further work should determine the impact the new records are having on the plantation estate and whether these species have the potential to spread into neighbouring endemic forests. This study has provided a broader understanding of MLD in WA and developed tools that could be used for further study.