JISC Final Report: IncReASe (Increasing Repository Content through Automation and Services)
IncReASe (Increasing Repository Content through Automation and Services) was an eighteen-month project (subsequently extended to twenty months) to enhance White Rose Research Online (WRRO). WRRO is a shared repository of research outputs (primarily publications) from the Universities of Leeds, Sheffield and York; it runs on the EPrints open source repository platform. The repository was created in 2004 and grew steadily but, in common with many similar repositories, had difficulty in achieving a “critical mass” of content and in becoming truly embedded within researchers’ workflows. The main aim of the IncReASe project was to assess ingestion routes into WRRO with a view to lowering barriers to deposit. We reviewed the feasibility of bulk import of pre-existing metadata and/or full-text research outputs, hoping this activity would have a positive knock-on effect on repository growth and embedding. Prior to the project, we had identified researchers’ reluctance to duplicate effort in metadata creation as a significant barrier to WRRO uptake; we investigated how WRRO might share data with internal and external IT systems. This work included a review of how WRRO, as an institutionally based repository, might interact with the subject repository of the Economic and Social Research Council (ESRC). The project addressed four main areas: (i) researcher behaviour: we investigated researcher awareness, motivation and workflow through a survey of archiving activity on the university web sites, a questionnaire and discussions with researchers; (ii) bulk import: we imported data from local systems, including York’s submission data for the 2008 Research Assessment Exercise (RAE), and developed an import plug-in for use with the arXiv repository; (iii) interoperability: we looked at how WRRO might interact with university and departmental publication databases and with ESRC’s repository; (iv) metadata: we assessed metadata issues raised by importing publication data from a variety of sources. A number of outputs from the project have been made available from the IncReASe project web site, http://eprints.whiterose.ac.uk/increase/. The project highlighted low levels of researcher awareness of WRRO - and of broader open access issues, including research funders’ deposit requirements. We designed new publicity materials to begin to address this. Departmental publication databases provided a useful jumping-off point for advocacy and liaison; this activity was helpful in promoting awareness of WRRO. Bulk import proved time-consuming, both in terms of adjusting EPrints plug-ins to incorporate different datasets and in the staff time required to improve publication metadata. A number of deposit scenarios were developed in the context of our work with ESRC; we concentrated on investigating how a local deposit of a research paper and attendant metadata in WRRO might be used to populate ESRC’s repository. This work improved our understanding of researcher workflows and of the SWORD protocol as a potential (if partial) solution to the single-deposit, multiple-destination model we wish to develop; we think the prospect of institutional repository / ESRC data sharing is now a step closer. IncReASe experienced some staff recruitment difficulties. It was also necessary to adapt the project to the changing IT landscape at the three partner institutions, in particular the introduction of a centralised publication management system at the University of Leeds. Although these factors had some impact on deliverables, the aims and objectives of the project were largely achieved.
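The single-deposit, multiple-destination model mentioned above rests on SWORD, an AtomPub-based deposit protocol in which a deposit is essentially an HTTP POST of a package plus a few dedicated headers. As a rough illustration only (this is not the project's code; the endpoint URL and depositor address below are invented), a mediated SWORD 1.3-style deposit request might be assembled like this:

```python
import hashlib
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def build_atom_entry(title, authors, abstract):
    """Build a minimal Atom entry describing one research output."""
    ET.register_namespace("", ATOM_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    for name in authors:
        author = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
        ET.SubElement(author, f"{{{ATOM_NS}}}name").text = name
    ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = abstract
    return ET.tostring(entry, encoding="unicode")

def build_deposit_request(collection_url, payload, depositor=None):
    """Assemble a single SWORD-style deposit of a zipped package.

    On-Behalf-Of enables mediated deposit (e.g. a repository depositing
    into ESRC's repository on a researcher's behalf); Content-MD5 lets
    the server verify package integrity.
    """
    headers = {
        "Content-Type": "application/zip",
        "Content-MD5": hashlib.md5(payload).hexdigest(),
        "X-Packaging": "http://purl.org/net/sword-types/METSDSpaceSIP",
    }
    if depositor:
        headers["On-Behalf-Of"] = depositor
    return {"method": "POST", "url": collection_url,
            "headers": headers, "body": payload}

# Hypothetical endpoint and depositor, for illustration only.
req = build_deposit_request(
    "https://repository.example.ac.uk/sword/collection/wrro",
    b"...zipped SIP bytes...",
    depositor="researcher@example.ac.uk",
)
```

In a single-deposit, multiple-destination workflow the same package could then be POSTed to each target repository's collection URL in turn.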
The value of research data to the nation
Executive Director’s report
Ross Wilkinson, ANDS
How can Australia address the challenge of living in bushfire-prone city fringes? How can Australia most effectively farm and preserve our precious soil? How can Australia understand the Great Barrier Reef? No single discipline can answer these questions; to address these challenges, data is needed from a range of sources and disciplines.
Research data that is well organised and available allows research to make substantial contributions vital to Australia’s future. For example, by drawing upon data that can be used by soil scientists, geneticists, plant scientists, climate analysts, and others, it is possible to conduct the multidisciplinary investigations necessary to tackle truly difficult and important challenges. The data might be provided by a Terrestrial Ecosystem Research Network OzFlux tower, insect observations recorded by a citizen scientist through the Atlas of Living Australia, genetic sequencing of insects through a Bioplatforms Australia facility, weather observations by the Bureau of Meteorology, or historical data generated by CSIRO over many decades. Each provides a piece of the jigsaw, but the pieces must be able to be put together. This requires careful collection and organisation, which together deliver enormous value to the country.
However, nationally significant problems are often tackled by international cooperation, so Australia’s data assets enable Australian researchers to work with the best in the world, solving problems of both national and international significance. Australia’s data assets and research data infrastructure provide Australian researchers with an excellent platform for international collaboration.
Australia has world-leading research data infrastructure: our ability to store, compute, discover, explore, analyse and publish is the best in the world. The ability to capture data through a wide range of facilities, from the Australian Synchrotron to Integrated Marine Observing System [IMOS: imos.org.au] ocean gliders; the combination of national storage and computation through the RDSI, NCI and Pawsey initiatives; the ability to publish and discover data through ANDS; the ability to analyse and explore data through Nectar; and state and local eResearch capabilities highlight just some of the capabilities that Australian researchers are able to access.
Importantly, their international partners are able to work with them using many of these resources. Australian research organisations are also assembling many resources to support their research, including policies, procedures, practical infrastructure and, very importantly, people! The eResearch teams and data librarians are always keen to help. This issue of Share highlights how the data resources of Australia are providing a very substantial national benefit, and how that benefit is being realised.
A Practitioner Survey Exploring the Value of Forensic Tools, AI, Filtering, & Safer Presentation for Investigating Child Sexual Abuse Material
Those investigating cases of Child Sexual Abuse Material (CSAM) risk being traumatised by prolonged exposure to illicit content; research has shown that those working on such cases can experience psychological distress. As a result, there has been a greater effort to create and implement technologies that reduce exposure to CSAM. However, little work has gathered insight regarding the functionality, effectiveness, accuracy, and importance of digital forensic tools and data science technologies from the practitioners who use them. This study focused specifically on examining the value practitioners place on the tools and technologies they utilize to investigate CSAM cases. General findings indicated that implementing filtering technologies is more important than safe-viewing technologies; false positives are a greater concern than false negatives; resources such as time, personnel, and money continue to be a concern; and an improved workflow is highly desirable. Results also showed that practitioners are not well versed in data science and Artificial Intelligence (AI), which is alarming given that tools already implement these techniques and that practitioners face large amounts of data during investigations. Finally, the data showed that practitioners are generally not taking advantage of tools that implement data science techniques, and that their biggest needs are automated child nudity detection, age estimation, and skin tone detection.
Automation for digital forensics: towards a definition for the community
With the increasing amount of digital evidence per case, the automation of investigative tasks is of utmost importance to the digital forensics community. Consequently, tools are published, frameworks are released, and artificial intelligence is explored. However, as the foundation, i.e., a definition, a classification, and a common terminology, is missing, the current situation resembles the wild west: some consider keyword searches or file carving to be automation while others do not. We therefore reviewed automation literature (in the domain of digital forensics as well as other domains), performed three practitioner interviews, and discussed the topic with domain experts from academia. On this basis, we propose a definition and then showcase several considerations with respect to automation for digital forensics, e.g., what we classify as no/basic automation as well as full automation (autonomous). We conclude that these foundational discussions are required to promote and progress the discipline through a common understanding.
Methodology for the Automated Metadata-Based Classification of Incriminating Digital Forensic Artefacts
The ever-increasing volume of data in digital forensic investigations is one of the most discussed challenges in the field. Usually, most of the file artefacts on seized devices are not pertinent to the investigation, and manually retrieving suspicious files relevant to the investigation is akin to finding a needle in a haystack. In this paper, a methodology for the automatic prioritisation of suspicious file artefacts (i.e., file artefacts that are pertinent to the investigation) is proposed to reduce the manual analysis effort required. This methodology is designed to work in a human-in-the-loop fashion: in other words, it predicts/recommends that an artefact is likely to be suspicious rather than giving the final analysis result. A supervised machine learning approach is employed, which leverages the recorded results of previously processed cases. The processes of feature extraction, dataset generation, training and evaluation are presented in this paper. In addition, a toolkit for data extraction from disk images is outlined, which enables this method to be integrated with the conventional investigation process and to work in an automated fashion.
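The prioritisation idea above can be sketched in a few lines. This is not the paper's method: the metadata features (size, path depth, modification hour) are invented for illustration, and a simple nearest-centroid classifier stands in for whatever supervised learner the authors use. The point is the human-in-the-loop shape: the model ranks artefacts by a suspiciousness score learned from previously processed cases rather than issuing a final verdict.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors (Python 3.8+)."""
    return math.dist(a, b)

def train_centroids(labelled):
    """Average the feature vectors of each class from earlier cases."""
    sums, counts = {}, {}
    for features, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def suspiciousness(features, centroids):
    """Score in [0, 1]: closer to the 'suspicious' centroid => higher."""
    d_susp = distance(features, centroids["suspicious"])
    d_benign = distance(features, centroids["benign"])
    return d_benign / (d_susp + d_benign)

# Toy training data standing in for recorded results of earlier
# investigations; each vector is (size in KB, path depth, mod. hour).
history = [
    ([4200.0, 6, 2], "suspicious"),
    ([3900.0, 7, 3], "suspicious"),
    ([12.0, 2, 14], "benign"),
    ([30.0, 3, 10], "benign"),
]
centroids = train_centroids(history)

# Rank new artefacts for the investigator; highest score first.
ranked = sorted(
    [[3800.0, 6, 1], [25.0, 2, 11]],
    key=lambda f: suspiciousness(f, centroids),
    reverse=True,
)
```

The investigator then reviews the top of the ranking first; the score never replaces the final manual analysis.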
Integrating Clinical Trial Imaging Data Resources Using Service-Oriented Architecture and Grid Computing
Clinical trials that use imaging typically require data management and workflow integration across several parties. We identify opportunities for all parties involved to realize benefits with a modular interoperability model based on service-oriented architecture and grid computing principles. We discuss middleware products for implementation of this model, and propose caGrid as an ideal candidate due to its healthcare focus; free, open source license; and mature developer tools and support.
Local Motion Planner for Autonomous Navigation in Vineyards with a RGB-D Camera-Based Algorithm and Deep Learning Synergy
With the advent of agriculture 3.0 and 4.0, researchers are increasingly focusing on the development of innovative smart farming and precision agriculture technologies by introducing automation and robotics into agricultural processes. Autonomous agricultural field machines have been gaining significant attention from farmers and industry as a way to reduce costs, human workload, and required resources. Nevertheless, achieving sufficient autonomous navigation capabilities requires the simultaneous cooperation of different processes; localization, mapping, and path planning are just some of the steps that aim at providing the machine with the right set of skills to operate in semi-structured and unstructured environments. In this context, this study presents a low-cost local motion planner for autonomous navigation in vineyards based only on an RGB-D camera, low-range hardware, and a dual-layer control algorithm. The first algorithm exploits the disparity map and its depth representation to generate a proportional control for the robotic platform. Concurrently, a second back-up algorithm, based on representation learning and resilient to illumination variations, can take control of the machine in case of a momentary failure of the first block. Moreover, due to the dual nature of the system, after initial training of the deep learning model on an initial dataset, the strict synergy between the two algorithms opens up the possibility of exploiting new automatically labelled data, coming from the field, to extend the existing model's knowledge. The machine learning algorithm has been trained and tested, using transfer learning, on images acquired during field surveys in northern Italy and then optimized for on-device inference with model pruning and quantization. Finally, the overall system has been validated with a customized robot platform in the relevant environment.
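The first control layer, proportional control derived from a disparity map, can be illustrated with a minimal sketch. This is not the paper's implementation: the half-image split, the gain value, and the toy frame below are assumptions. The underlying idea is that higher disparity means a closer obstacle, so the command steers towards the side with lower mean disparity, proportionally to the imbalance.

```python
def mean(vals):
    return sum(vals) / len(vals)

def proportional_steering(disparity_map, gain=0.01):
    """Steering command in [-1, 1] from a disparity map (list of rows).

    The map is split into left and right halves; higher mean disparity
    means closer obstacles, so positive output (obstacle on the left)
    means turn right, negative means turn left.
    """
    half = len(disparity_map[0]) // 2
    left = mean([px for row in disparity_map for px in row[:half]])
    right = mean([px for row in disparity_map for px in row[half:]])
    command = gain * (left - right)
    return max(-1.0, min(1.0, command))  # saturate to actuator range

# Toy 4x6 disparity frame: a close obstacle (high disparity) on the left,
# open vine row (low disparity) on the right.
frame = [
    [90, 85, 80, 10, 12, 11],
    [92, 88, 79, 9, 10, 12],
    [91, 86, 81, 11, 13, 10],
    [89, 87, 82, 10, 11, 12],
]
cmd = proportional_steering(frame, gain=0.01)
```

In the paper's architecture a learned backup controller would take over whenever this disparity-based layer fails momentarily, e.g. under difficult illumination.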