181 research outputs found

    USER INTERFACE CONCEPTION FOR ASSET MANAGEMENT SYSTEM

    Get PDF
    Data as a critical resource nowadays, it is growing expotentially. As a consequence, the issue of data storing, filtering, processing and analysis have attracted a lot of attention in the database industry, since they can not be handled by the traditional database systems. The new situation has been discussed as a concept of Big Data. To be able to handle and process the Big Data, one muct be capable of deal with large volume, high velocity and various types data. With this trend, Relational Database also succeeded in integrating parts of the functions which are required to handle Big Data. So it becomes that those two techniques advance side by side. Here in this work both Big Data and Relational Database are discussed based on the current database industry development situation. Moreover, data mining techniques and communication standards are also studied and discussed to give directions for further implementation work. An interfacing work, which includes both software interfacing for third party user and user interface design and implementation, is done to provide Wärtsilä internal users access to their remote monitoring data. The desing work is done based on the needs of the different user groups of the personnel of Wärtsilä.fi=Opinnäytetyö kokotekstinä PDF-muodossa.|en=Thesis fulltext in PDF format.|sv=Lärdomsprov tillgängligt som fulltext i PDF-format

    Enhanced image encryption scheme with new mapreduce approach for big size images

    Get PDF
    Achieving a secured image encryption (IES) scheme for sensitive and confidential data communications, especially in a Hadoop environment is challenging. An accurate and secure cryptosystem for colour images requires the generation of intricate secret keys that protect the images from diverse attacks. To attain such a goal, this work proposed an improved shuffled confusion-diffusion based colour IES using a hyper-chaotic plain image. First, five different sequences of random numbers were generated. Then, two of the sequences were used to shuffle the image pixels and bits, while the remaining three were used to XOR the values of the image pixels. Performance of the developed IES was evaluated in terms of various measures such as key space size, correlation coefficient, entropy, mean squared error (MSE), peak signal to noise ratio (PSNR) and differential analysis. Values of correlation coefficient (0.000732), entropy (7.9997), PSNR (7.61), and MSE (11258) were determined to be better (against various attacks) compared to current existing techniques. The IES developed in this study was found to have outperformed other comparable cryptosystems. It is thus asserted that the developed IES can be advantageous for encrypting big data sets on parallel machines. Additionally, the developed IES was also implemented on a Hadoop environment using MapReduce to evaluate its performance against known attacks. In this process, the given image was first divided and characterized in a key-value format. Next, the Map function was invoked for every key-value pair by implementing a mapper. The Map function was used to process data splits, represented in the form of key-value pairs in parallel modes without any communication between other map processes. The Map function processed a series of key/value pairs and subsequently generated zero or more key/value pairs. Furthermore, the Map function also divided the input image into partitions before generating the secret key and XOR matrix. The secret key and XOR matrix were exploited to encrypt the image. The Reduce function merged the resultant images from the Map tasks in producing the final image. Furthermore, the value of PSNR did not exceed 7.61 when the developed IES was evaluated against known attacks for both the standard dataset and big data size images. As can be seen, the correlation coefficient value of the developed IES did not exceed 0.000732. As the handling of big data size images is different from that of standard data size images, findings of this study suggest that the developed IES could be most beneficial for big data and big size images

    Spatial frequency based video stream analysis for object classification and recognition in clouds

    Get PDF
    The recent rise in multimedia technology has made it easier to perform a number of tasks. One of these tasks is monitoring where cheap cameras are producing large amount of video data. This video data is then processed for object classification to extract useful information. However, the video data obtained by these cheap cameras is often of low quality and results in blur video content. Moreover, various illumination effects caused by lightning conditions also degrade the video quality. These effects present severe challenges for object classification. We present a cloud-based blur and illumination invariant approach for object classification from images and video data. The bi-dimensional empirical mode decomposition (BEMD) has been adopted to decompose a video frame into intrinsic mode functions (IMFs). These IMFs further undergo to first order Reisz transform to generate monogenic video frames. The analysis of each IMF has been carried out by observing its local properties (amplitude, phase and orientation) generated from each monogenic video frame. We propose a stack based hierarchy of local pattern features generated from the amplitudes of each IMF which results in blur and illumination invariant object classification. The extensive experimentation on video streams as well as publically available image datasets reveals that our system achieves high accuracy from 0.97 to 0.91 for increasing Gaussian blur ranging from 0.5 to 5 and outperforms state of the art techniques under uncontrolled conditions. The system also proved to be scalable with high throughput when tested on a number of video streams using cloud infrastructure

    Air Quality Prediction in Smart Cities Using Machine Learning Technologies Based on Sensor Data: A Review

    Get PDF
    The influence of machine learning technologies is rapidly increasing and penetrating almost in every field, and air pollution prediction is not being excluded from those fields. This paper covers the revision of the studies related to air pollution prediction using machine learning algorithms based on sensor data in the context of smart cities. Using the most popular databases and executing the corresponding filtration, the most relevant papers were selected. After thorough reviewing those papers, the main features were extracted, which served as a base to link and compare them to each other. As a result, we can conclude that: (1) instead of using simple machine learning techniques, currently, the authors apply advanced and sophisticated techniques, (2) China was the leading country in terms of a case study, (3) Particulate matter with diameter equal to 2.5 micrometers was the main prediction target, (4) in 41% of the publications the authors carried out the prediction for the next day, (5) 66% of the studies used data had an hourly rate, (6) 49% of the papers used open data and since 2016 it had a tendency to increase, and (7) for efficient air quality prediction it is important to consider the external factors such as weather conditions, spatial characteristics, and temporal features

    Query-driven learning for predictive analytics of data subspace cardinality

    Get PDF
    Fundamental to many predictive analytics tasks is the ability to estimate the cardinality (number of data items) of multi-dimensional data subspaces, defined by query selections over datasets. This is crucial for data analysts dealing with, e.g., interactive data subspace explorations, data subspace visualizations, and in query processing optimization. However, in many modern data systems, predictive analytics may be (i) too costly money-wise, e.g., in clouds, (ii) unreliable, e.g., in modern Big Data query engines, where accurate statistics are difficult to obtain/maintain, or (iii) infeasible, e.g., for privacy issues. We contribute a novel, query-driven, function estimation model of analyst-defined data subspace cardinality. The proposed estimation model is highly accurate in terms of prediction and accommodating the well-known selection queries: multi-dimensional range and distance-nearest neighbors (radius) queries. Our function estimation model: (i) quantizes the vectorial query space, by learning the analysts’ access patterns over a data space, (ii) associates query vectors with their corresponding cardinalities of the analyst-defined data subspaces, (iii) abstracts and employs query vectorial similarity to predict the cardinality of an unseen/unexplored data subspace, and (iv) identifies and adapts to possible changes of the query subspaces based on the theory of optimal stopping. The proposed model is decentralized, facilitating the scaling-out of such predictive analytics queries. The research significance of the model lies in that (i) it is an attractive solution when data-driven statistical techniques are undesirable or infeasible, (ii) it offers a scale-out, decentralized training solution, (iii) it is applicable to different selection query types, and (iv) it offers a performance that is superior to that of data-driven approaches

    Scalable aggregation predictive analytics: a query-driven machine learning approach

    Get PDF
    We introduce a predictive modeling solution that provides high quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict ad-hoc queries’ answers. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression & associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high quality prediction results. The significance of contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers a performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark showing its superior performance compared to the Spark’s COUNT method

    Improving Content Based Video Retrieval Performance by Using Hadoop-MapReduce Model

    Get PDF
    In this paper, we present a distributed Content- Based Video Retrieval (CBVR) system based on MapReduce pro- gramming model. A CBVR system called Bounded Coordinate of Motion Histogram (BCMH) has been implemented as case study by using Hadoop framework. Our work consists of proposing a distributed model to extract videos signatures and compute similarity with the BCMH system based on a set of Mapreduce jobs assigned to multiple nodes of the Hadoop cluster in order to reduce computation time of training process. The proposed approach is tested on HOLLYWOOD2 dataset and the obtained results demonstrate efficiency of the proposed approach
    corecore