Search CORE


The Parallel Distributed Image Search Engine (ParaDISE)

Author: Garcia Seco De Herrera Alba
Markonis Dimitrios
Müller Henning
Schaer Roger
Publication venue: 'Center for Open Science'
Publication date: 19/01/2017

Image retrieval is a complex task that differs according to the context and the user requirements in any specific field, for example in a medical environment. Search by text is often not possible or optimal, and retrieval by visual content does not always succeed in modelling the high-level concepts that a user is looking for. Modern image retrieval techniques consist of multiple steps and aim to retrieve information from large-scale datasets based not only on global image appearance but also on local features and, where possible, on a connection between visual features and text or semantics. This paper presents the Parallel Distributed Image Search Engine (ParaDISE), an image retrieval system that combines visual search with text-based retrieval and that is available as open source and free of charge. The main design concepts of ParaDISE are flexibility, expandability, scalability and interoperability. Together, these concepts allow the system to be used both in real-world applications and as an image retrieval research platform. Apart from the architecture and the implementation of the system, two use cases are described: an application of ParaDISE to the retrieval of images from the medical literature, and a visual feature evaluation for medical image retrieval. Future steps include the creation of an open source community that will contribute to and expand this platform based on the existing parts.
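
ParaDISE combines visual search with text-based retrieval, but the abstract does not say how the two result lists are merged. As an illustration only, a common approach is weighted linear late fusion of min-max-normalized scores (CombSUM-style). The sketch below assumes that technique; the function names `normalize` and `fuse`, the weight `alpha`, and the example scores are hypothetical and are not part of ParaDISE's API.

```python
from collections import defaultdict


def normalize(scores):
    """Min-max normalize a {doc_id: score} mapping into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(visual, text, alpha=0.5):
    """Weighted linear late fusion (CombSUM-style) of two result lists.

    alpha weights the visual scores; (1 - alpha) weights the text scores.
    A document missing from one list contributes 0 for that modality.
    """
    v, t = normalize(visual), normalize(text)
    fused = defaultdict(float)
    for doc, s in v.items():
        fused[doc] += alpha * s
    for doc, s in t.items():
        fused[doc] += (1 - alpha) * s
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores from a visual index and a text index.
visual_scores = {"img1": 0.9, "img2": 0.4, "img3": 0.1}
text_scores = {"img2": 12.0, "img3": 8.0, "img4": 2.0}
print(fuse(visual_scores, text_scores, alpha=0.6))
```

With `alpha=0.6`, `img2` ranks first because it scores well in both modalities, even though `img1` has the best visual score. Tuning `alpha` per collection is one way to trade off visual against textual evidence.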

University of Essex Research Repository

arXiv.org e-Print Archive

Big Data in multiscale modelling:from medical image processing to personalized models

Author: Filipović Nenad
Geroski Tijana
Jakovljević Djordje
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/05/2023

Coventry University Pure Portal

Post Event Investigation of Multi-stream Video Data Utilizing Hadoop Cluster

Author: Gangodkar Durgaprasad
Mittal Ankush
Parsola Jyoti
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/12/2018

Rapid advances in technology and inexpensive cameras have increased the need for monitoring systems in surveillance applications. As a result, the volume of data acquired from the numerous cameras deployed for surveillance is tremendous. When an event is triggered, manually investigating such massive data is a complex task. It is therefore essential to explore an approach that can store massive multi-stream video data and process it to find useful information. To address the challenge of storing and processing multi-stream video data, we use Hadoop, which has grown into a leading computing model for data-intensive applications. In this paper, we propose a novel technique for performing post-event investigation on stored surveillance video data. Our algorithm stores video data in HDFS in such a way that it can efficiently locate the relevant data in HDFS based on the time at which an event occurred, and then perform further processing. To demonstrate the efficiency of the proposed work, we perform event detection in the video for a time period provided by the user. To estimate the performance of our approach, we evaluated the storage and processing of video data while varying (i) the pixel resolution of the video frames, (ii) the size of the video data, (iii) the number of reducers (workers) executing the task, and (iv) the number of nodes in the cluster. The proposed framework achieves a speedup of 5.9 for large files of 1024×1024-pixel video frames, which makes it suitable for practical deployment in a wide range of applications.
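
The abstract describes storing video in HDFS so that the data for an event can be located from its time of occurrence. The paper's actual storage layout is not given here; the sketch below assumes a hypothetical scheme in which each camera stream is split into fixed-duration segment files keyed by start time, and uses binary search to find the segments that overlap a user-supplied time window. `Segment`, `build_index`, `segments_for_event` and the HDFS-style paths are illustrative names only.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    camera: str
    start: int     # segment start time in seconds (assumed epoch-based)
    duration: int  # segment length in seconds (assumed fixed per camera)
    path: str      # hypothetical HDFS path of the segment file

    @property
    def end(self) -> int:
        return self.start + self.duration


def build_index(segments):
    """Group segments by camera and sort each group by start time."""
    index = {}
    for seg in segments:
        index.setdefault(seg.camera, []).append(seg)
    for segs in index.values():
        segs.sort(key=lambda s: s.start)
    return index


def segments_for_event(index, camera, t_from, t_to):
    """Return the segments of `camera` that overlap the interval [t_from, t_to)."""
    segs = index.get(camera, [])
    starts = [s.start for s in segs]
    # The last segment starting at or before t_from may still overlap it.
    i = max(bisect.bisect_right(starts, t_from) - 1, 0)
    hits = []
    for seg in segs[i:]:
        if seg.start >= t_to:
            break  # all later segments start after the window
        if seg.end > t_from:
            hits.append(seg)
    return hits


segs = [Segment("cam1", t, 60, f"/surveillance/cam1/{t}.seg") for t in (0, 60, 120, 180)]
idx = build_index(segs)
print([s.path for s in segments_for_event(idx, "cam1", 90, 150)])
```

Under this assumption, only the returned segment paths would need to be passed as input to the Hadoop job, so the processing cost would scale with the length of the event window rather than with the total size of the archive.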

Crossref

Institute of Advanced Engineering and Science

Framework for Map Reducing Technique Using Correlation for Duplicate Image Identification Process

Author: Deshmukh Amol Sahebrao
Lambhate P. D.
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/07/2016

Duplicate image identification is an image deduplication system that prevents duplicate copies of images from being stored on the storage server and thus reduces storage space. The technique improves storage utilization by not storing duplicate images on the storage server, and reduces time complexity by using the MapReduce technique. With the explosive growth of digitization, large volumes of digital data are uploaded to servers every day, and deduplication schemes are widely used in backup and recovery systems to minimize network and storage overhead by detecting and avoiding redundancy among data. Traditional deduplication schemes work only if the second image has exactly the same content as the first. This restricts the performance of many applications, since an exact copy must exist for detection to succeed, and these schemes also suffer from high time complexity when dealing with large amounts of data. In this paper, we propose a duplicate image identification system using the MapReduce technique, which improves the scalability and efficiency of the system. Our approach reduces the time required to identify duplicate images on the storage server by using MapReduce combined with a correlation technique.
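
The abstract pairs MapReduce with correlation to find duplicates that exact matching would miss. The sketch below is a single-process simulation of that idea, not the authors' implementation: a map phase emits each image under a blocking key (here, hypothetically, the image shape), a shuffle step groups images by key, and a reduce phase compares every pair in a group by the Pearson correlation of their pixel values. Images are assumed to be flat lists of grayscale intensities, and the 0.95 threshold is an arbitrary illustrative value.

```python
import math
from collections import defaultdict
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length pixel sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        # Constant images: treat as duplicates only if identical.
        return 1.0 if a == b else 0.0
    return cov / math.sqrt(va * vb)


def map_phase(images):
    """Emit (key, (image_id, pixels)); key = image shape (hypothetical blocking)."""
    for image_id, (shape, pixels) in images.items():
        yield shape, (image_id, pixels)


def reduce_phase(key, values, threshold=0.95):
    """Compare images sharing a key; emit pairs whose correlation >= threshold."""
    for (id_a, pa), (id_b, pb) in combinations(values, 2):
        if pearson(pa, pb) >= threshold:
            yield id_a, id_b


def find_duplicates(images, threshold=0.95):
    groups = defaultdict(list)  # shuffle step: group mapper output by key
    for key, value in map_phase(images):
        groups[key].append(value)
    pairs = []
    for key, values in groups.items():
        pairs.extend(reduce_phase(key, values, threshold))
    return pairs


images = {
    "a.png": ((2, 2), [10, 20, 30, 40]),
    "b.png": ((2, 2), [12, 22, 32, 42]),  # brightened copy of a.png
    "c.png": ((2, 2), [40, 10, 30, 20]),
    "d.png": ((1, 4), [10, 20, 30, 40]),  # different shape, so a different key
}
print(find_duplicates(images))
```

Because Pearson correlation is invariant to a uniform brightness offset, the brightened copy `b.png` is reported as a duplicate of `a.png`, whereas a byte-level hash comparison would treat them as distinct.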

International Journal on Recent and Innovation Trends in Computing and Communication

Towards a Reference Architecture with Modular Design for Large-scale Genotyping and Phenotyping Data Analysis: A Case Study with Image Data

Author: Mondal Amit Kumar 1987-
Publication venue: 'University of Saskatchewan Library'
Publication date: 24/04/2018

With the rapid advancement of computing technologies, various scientific research communities have been using cloud-based software tools and applications extensively. Cloud-based applications allow users to access software from web browsers, relieving them of installing any software in their desktop environment. For example, Galaxy, GenAP, and iPlant Collaborative are popular cloud-based systems for scientific workflow analysis in the domain of plant Genotyping and Phenotyping. These systems are used for conducting research, devising new techniques, and sharing computer-assisted analysis results among collaborators. Over time, researchers need to integrate their new workflows/pipelines, tools, or techniques with the base system. Moreover, large-scale data must be processed within the available time frame for analysis to be effective. Big Data technologies have recently emerged to facilitate large-scale data processing on commodity hardware. Among the above-mentioned systems, GenAP uses Big Data technologies, but only for specific cases. The structure of such a cloud-based system is highly variable and complex. Software architects and developers need to consider very different properties and challenges during the development and maintenance phases than in traditional business- or service-oriented systems. Recent studies report that software engineers and data engineers face challenges in developing analytic tools that support large-scale, heterogeneous data analysis. Unfortunately, software researchers have paid little attention to devising well-defined methodologies and frameworks for the flexible design of cloud systems in the Genotyping and Phenotyping domain. More effective design methodologies and frameworks are therefore urgently needed for developing cloud-based Genotyping and Phenotyping analysis systems that also support large-scale data processing.
In this thesis, we conduct several studies to devise a stable reference architecture and modularity model for software developers and data engineers in the domain of Genotyping and Phenotyping. In the first study, we analyze the architectural changes of existing candidate systems to identify stability issues. Then, we extract architectural patterns from the candidate systems and propose a conceptual reference architecture model. Finally, we present a case study on the modularity of computation-intensive tasks as an extension of data-centric development. We show that the data-centric modularity model is at the core of the flexible development of a Genotyping and Phenotyping analysis system. Our proposed model and the case study with thousands of images provide a useful knowledge base for software researchers, developers, and data engineers developing cloud-based Genotyping and Phenotyping analysis systems.
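
The thesis argues for a data-centric modularity model, in which modules are coupled through the data they exchange rather than through direct calls. As a rough sketch of that idea under our own assumptions (the `Module` class, the `plan` and `execute` functions and the toy plant-image pipeline are all hypothetical), each module declares the datasets it consumes and produces, and the execution order is derived from those declarations with Python's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


@dataclass
class Module:
    name: str
    inputs: List[str]   # names of datasets the module consumes
    outputs: List[str]  # names of datasets the module produces
    run: Callable[[Dict[str, object]], Dict[str, object]]


def plan(modules):
    """Order modules so that every dataset is produced before it is consumed."""
    producer = {out: m.name for m in modules for out in m.outputs}
    # Map each module to the modules producing its inputs (its predecessors).
    graph = {m.name: {producer[i] for i in m.inputs if i in producer} for m in modules}
    by_name = {m.name: m for m in modules}
    return [by_name[n] for n in TopologicalSorter(graph).static_order()]


def execute(modules, store):
    """Run modules in dependency order against a shared data store."""
    for m in plan(modules):
        results = m.run({k: store[k] for k in m.inputs})
        store.update(results)
    return store


# Hypothetical pipeline, deliberately listed out of order.
modules = [
    Module("measure", ["masks"], ["leaf_area"],
           lambda d: {"leaf_area": [sum(mask) for mask in d["masks"]]}),
    Module("segment", ["raw_images"], ["masks"],
           lambda d: {"masks": [[1 if p > 100 else 0 for p in img]
                                for img in d["raw_images"]]}),
]
store = execute(modules, {"raw_images": [[50, 150, 200], [120, 90, 30]]})
print(store["leaf_area"])
```

Because the modules are listed out of order, the planner relies solely on the declared datasets to run `segment` before `measure`; under this design a new module can be added by declaring its inputs and outputs, without editing existing modules.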

eCommons@USASK

University of Saskatchewan Research Archive

Proceedings of the Eighth Informatics Days of the University of Évora (Atas das Oitavas Jornadas de Informática da Universidade de Évora)

Author: Caldeira Carlos Pampulim
Coelho Francisco
Publication venue: 'Universidade de Evora'
Publication date: 01/03/2018

Proceedings of the Eighth Informatics Days (Jornadas de Informática) of the University of Évora, held in March 2018.

Repositório Científico da Universidade de Évora

Methods and Applications of Synthetic Data Generation

Author: Anderson Jason
Publication venue: 'Clemson University Libraries'
Publication date: 01/12/2021

The advent of data mining and machine learning has highlighted the value of large and varied sources of data, while increasing the demand for synthetic data that captures the structural and statistical characteristics of the original data without revealing personal or proprietary information contained in the original dataset. In this dissertation, we use examples from original research to show that, with appropriate models and input parameters, synthetic data that mimics the characteristics of real data can be generated at sufficient rate and quality to meet the volume, structural complexity, and statistical variation requirements of research and development of digital information processing systems. First, we present a progression of research studies that use a variety of tools to generate synthetic network traffic patterns, enabling us to observe relationships between network latency and communication pattern benchmarks at all levels of the network stack. We then present a scalable extraction and synthesis framework for synthesizing large-scale IoT data with complex structural characteristics, and demonstrate the use of the generated data in benchmarking IoT middleware. Finally, we detail research on synthetic image generation for deep learning models using 3D modeling. We find that synthetic images can be an effective way to augment limited sets of real training data, and that, in use cases that benefit from incremental training or model specialization, pretraining on synthetic images provides a usable base model for transfer learning.
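
The dissertation's models are not detailed in the abstract. As a minimal illustration of synthesizing data that keeps a statistical characteristic of the original without exposing the records themselves, the sketch below assumes a Poisson model of network packet arrivals: it estimates the arrival rate from observed timestamps and then generates synthetic arrivals with exponentially distributed gaps. The timestamps and the function names `fit_rate` and `synthesize_arrivals` are hypothetical.

```python
import random
import statistics


def fit_rate(arrival_times):
    """Estimate a Poisson arrival rate (events per second) from sorted timestamps."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return 1.0 / statistics.mean(gaps)


def synthesize_arrivals(rate, n, seed=None):
    """Generate n synthetic arrival times with exponential inter-arrival gaps."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)  # exponential gap with mean 1/rate
        times.append(t)
    return times


observed = [0.0, 0.4, 0.9, 1.1, 1.8, 2.0]  # hypothetical packet timestamps (s)
rate = fit_rate(observed)
synthetic = synthesize_arrivals(rate, 10_000, seed=42)
print(f"fitted rate {rate:.2f}/s, synthetic rate {len(synthetic) / synthetic[-1]:.2f}/s")
```

Only the fitted rate is carried over from the observed data, so the synthetic trace can be made as long as a benchmark requires. Real traffic is often burstier than a Poisson process, so a model like this serves as a baseline rather than a faithful replica.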

Clemson University: TigerPrints