Profiling and Improving I/O Performance of a Large-Scale Climate Scientific Application
Exascale computing systems will soon emerge, posing great challenges due to the widening gap between computing and I/O performance. Many large-scale scientific applications play an important role in our daily life, and the huge amounts of data they generate require highly parallel and efficient I/O management policies. In this paper, we take a mission-critical scientific application, GEOS-5, as a case study to profile and analyze the communication and I/O issues that prevent applications from fully utilizing the underlying parallel storage systems. Through detailed architectural and experimental characterization, we observe that current legacy I/O schemes incur significant network communication overhead and are unable to fully parallelize data access, degrading applications' I/O performance and scalability. To address these inefficiencies, we redesign the application's I/O framework and apply a set of parallel I/O techniques to achieve high scalability and performance. Evaluation results on the NASA Discover cluster show that our optimization of GEOS-5 with ADIOS leads to significant performance improvements over the original GEOS-5 implementation.
Leveraging Reinforcement Learning for Task Resource Allocation in Scientific Workflows
Scientific workflows are designed as directed acyclic graphs (DAGs) and
consist of multiple dependent task definitions. They are executed over a large
amount of data, often resulting in thousands of tasks with heterogeneous
compute requirements and long runtimes, even on cluster infrastructures. In
order to optimize the workflow performance, enough resources, e.g., CPU and
memory, need to be provisioned for the respective tasks. Typically, workflow
systems rely on user resource estimates which are known to be highly
error-prone and can result in over- or underprovisioning. While resource
overprovisioning leads to high resource wastage, underprovisioning can result
in long runtimes or even failed tasks.
In this paper, we propose two different reinforcement learning approaches
based on gradient bandits and Q-learning, respectively, in order to minimize
resource wastage by selecting suitable CPU and memory allocations. We provide a
prototypical implementation in the well-known scientific workflow management
system Nextflow, evaluate our approaches with five workflows, and compare them
against the default resource configurations and a state-of-the-art feedback
loop baseline. The evaluation shows that our reinforcement learning approaches
significantly reduce resource wastage compared to the default configuration.
Further, our approaches also reduce the allocated CPU hours by 6.79% and 24.53%
compared to the state-of-the-art feedback loop.

Comment: Paper accepted at the 2022 IEEE International Conference on Big Data,
Workshop BPOD 202
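One of the two approaches named above, the gradient bandit, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the discrete action set of memory sizes, the simulated task usage distribution, the step size, and the wastage-based reward are all illustrative assumptions.

```python
import math
import random

random.seed(0)

# Hypothetical discrete memory allocations (GiB) the agent can choose from.
ACTIONS = [2, 4, 8, 16]
ALPHA = 0.1  # step size for preference updates (assumed)

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]

def reward(allocated, used):
    # Underprovisioning (task failure) gets a flat penalty;
    # otherwise the reward is the negative wastage.
    if allocated < used:
        return -10.0
    return -(allocated - used)

prefs = [0.0] * len(ACTIONS)
baseline, n = 0.0, 0

for _ in range(2000):
    used = random.uniform(3.0, 6.0)  # simulated task memory usage (GiB)
    probs = softmax(prefs)
    a = random.choices(range(len(ACTIONS)), weights=probs)[0]
    r = reward(ACTIONS[a], used)
    n += 1
    baseline += (r - baseline) / n  # running average reward as baseline
    for i in range(len(ACTIONS)):
        indicator = 1.0 if i == a else 0.0
        prefs[i] += ALPHA * (r - baseline) * (indicator - probs[i])

best = ACTIONS[max(range(len(ACTIONS)), key=lambda i: prefs[i])]
print(best)  # with usage drawn from [3, 6) GiB, 8 GiB should dominate
```

Since every simulated task fits in 8 GiB but 4 GiB frequently fails, the preference for the 8 GiB action grows while the others shrink, which is exactly the over- versus underprovisioning trade-off the abstract describes.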
Soft peer review: social software and distributed scientific evaluation
The debate on the prospects of peer-review in the Internet age and the
increasing criticism leveled against the dominant role of impact factor
indicators are calling for new measurable criteria to assess scientific quality.
Usage-based metrics offer a new avenue to scientific quality assessment but
face the same risks as first generation search engines that used unreliable
metrics (such as raw traffic data) to estimate content quality. In this article I
analyze the contribution that social bookmarking systems can provide to the
problem of usage-based metrics for scientific evaluation. I suggest that
collaboratively aggregated metadata may help fill the gap between traditional
citation-based criteria and raw usage factors. I submit that bottom-up,
distributed evaluation models such as those afforded by social bookmarking
will challenge more traditional quality assessment models in terms of coverage,
efficiency and scalability. Services aggregating user-related quality indicators
for online scientific content will come to occupy a key function in the scholarly
communication system.
Development of a framework for the evaluation of the environmental benefits of controlled traffic farming
Although controlled traffic farming (CTF) is an environmentally friendly soil management system, no quantitative evaluation of its environmental benefits is available. This paper aims to establish a framework for quantitative evaluation of the environmental benefits of CTF, considering a list of environmental benefits, namely reducing soil compaction, runoff/erosion, energy requirement and greenhouse gas (GHG) emissions, conserving organic matter, and enhancing soil biodiversity and fertiliser use efficiency. Based on a comprehensive literature review and the European Commission Soil Framework Directive, the environmental benefits were selected and the impact of each was weighted. The framework was validated using data from three selected farms. For Colworth farm (Unilever, UK), the framework predicted the largest overall environmental benefit, 59.3% of the theoretical maximum achievable benefit (100%), compared with the other two farms in Scotland (52%) and Australia (47.3%). This overall benefit breaks down into: reducing soil compaction (24%), tillage energy requirement (10%) and GHG emissions (3%), enhancing soil biodiversity (7%) and erosion control (6%), conserving organic matter (6%), and improving fertiliser use efficiency (3%). A similar evaluation can be performed for any farm worldwide, provided that data on soil properties, topography, machinery, and weather are available.
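The component scores reported for the Colworth farm can be checked with a simple weighted-sum aggregation. The component names and percentage scores come from the abstract; the additive aggregation rule is an assumption about how the framework combines them.

```python
# Component scores (% of the theoretical maximum) reported for Colworth farm.
benefits = {
    "reduced soil compaction": 24,
    "reduced tillage energy requirement": 10,
    "reduced GHG emissions": 3,
    "enhanced soil biodiversity": 7,
    "erosion control": 6,
    "organic matter conservation": 6,
    "improved fertiliser use efficiency": 3,
}

# Assumed aggregation: the overall benefit is the sum of the component scores.
overall = sum(benefits.values())
print(overall)  # 59, consistent with the reported 59.3% (before rounding)
```

The components sum to 59%, matching the rounded 59.3% overall figure, which suggests the framework's overall score is indeed an additive combination of the weighted components.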
Energy rating of a water pumping station using multivariate analysis
Among water management policies, the preservation and saving of energy in water supply and treatment systems play key roles. When focusing on energy, the customary way to determine the performance of water supply systems is to define component-based energy indicators. This approach cannot account for interactions occurring among system elements or between the system and its environment. On the other hand, the development of information technology has made increasingly large amounts of data available, typically gathered from distributed sensor networks in so-called smart grids. In this context, data-intensive methodologies open up complex network modelling approaches and address the issues related to interpreting and analysing the large amounts of data produced by smart sensor networks.
From this perspective, the present work applies data-intensive techniques to the energy analysis of a water management network.
The purpose is to provide new metrics for the energy rating of the system and to offer insights into the dynamics of its operations. The study applies a neural network as a tool to predict energy demand, using flowrate and vibration data as predictor variables.
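The prediction step described above can be sketched with a single linear neuron trained by stochastic gradient descent, as a minimal stand-in for the study's neural network. The synthetic flowrate/vibration data, the generating relation, and the learning rate are all illustrative assumptions.

```python
import random

random.seed(1)

# Hypothetical training data: (flowrate, vibration) -> energy demand.
# The generating relation is an assumption for the sketch:
# energy = 2.0 * flowrate + 0.5 * vibration + 1.0
data = []
for _ in range(200):
    q = random.uniform(0, 10)   # flowrate
    v = random.uniform(0, 5)    # vibration level
    e = 2.0 * q + 0.5 * v + 1.0
    data.append((q, v, e))

# One linear neuron (weights + bias) trained by SGD on squared error.
w1, w2, b = 0.0, 0.0, 0.0
lr = 0.01
for _ in range(3000):
    q, v, e = random.choice(data)
    pred = w1 * q + w2 * v + b
    err = pred - e
    w1 -= lr * err * q
    w2 -= lr * err * v
    b -= lr * err

# Predict energy demand for an unseen operating point.
print(w1 * 5.0 + w2 * 2.0 + b)
```

A real application would use a multi-layer network and measured sensor data, but the fit-then-predict loop is the same: the learned mapping from flowrate and vibration to energy demand becomes the system-level energy metric.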
Large-Scale Analysis of the Accuracy of the Journal Classification Systems of Web of Science and Scopus
Journal classification systems play an important role in bibliometric
analyses. The two most important bibliographic databases, Web of Science and
Scopus, each provide a journal classification system. However, no study has
systematically investigated the accuracy of these classification systems. To
examine and compare the accuracy of journal classification systems, we define
two criteria on the basis of direct citation relations between journals and
categories. We use Criterion I to select journals that have weak connections
with their assigned categories, and we use Criterion II to identify journals
that are not assigned to categories with which they have strong connections. If
a journal satisfies either of the two criteria, we conclude that its assignment
to categories may be questionable. Accordingly, we identify all journals with
questionable classifications in Web of Science and Scopus. Furthermore, we
perform a more in-depth analysis for the field of Library and Information
Science to assess whether our proposed criteria are appropriate and whether
they yield meaningful results. It turns out that, according to our
citation-based criteria, Web of Science performs significantly better than
Scopus in terms of the accuracy of its journal classification system.
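Criterion I above can be illustrated as a simple share computation over direct citation counts. The journal name, the citation counts, and the 10% threshold are illustrative assumptions, not the data or thresholds used in the study.

```python
# Hypothetical direct citation counts between one journal and three categories.
citations = {
    "J. Examplology": {
        "Library & Information Science": 30,   # assigned category
        "Computer Science": 450,
        "Mathematics": 20,
    },
}
assigned = {"J. Examplology": "Library & Information Science"}
THRESHOLD = 0.10  # assumed cut-off for a "weak" connection

def weakly_connected(journal):
    """Criterion I sketch: flag the journal if the share of its direct
    citation links going to its assigned category is below the threshold."""
    counts = citations[journal]
    share = counts[assigned[journal]] / sum(counts.values())
    return share < THRESHOLD

print(weakly_connected("J. Examplology"))  # True: 30/500 = 6% < 10%
```

Criterion II would run the complementary check, flagging categories that receive a large citation share from the journal without the journal being assigned to them.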
Holistic management – a critical review of Allan Savory’s grazing method
Allan Savory is the man behind holistic grazing and the founder of the Savory Institute. Savory claims that holistic grazing can stop desertification and reduce atmospheric carbon dioxide levels to pre-industrial levels within a few decades. In this report, we review the literature on holistic grazing in order to evaluate the scientific support for these claims.