Search CORE

7,325 research outputs found

Recommended from our members

A MapReduce architecture for web site user behaviour monitoring in real time

Author: Karakostas B.
Theodoulidis B.
Publication venue
Publication date
Field of study

Monitoring the behaviour of large numbers of web site users in real time poses significant performance challenges, due to the decentralised location and volume of generated data. This paper proposes a MapReduce-style architecture where the processing of event series from the Web users is performed by a number of cascading mappers, reducers and rereducers, local to the event origin. With the use of static analysis and a prototype implementation, we show how this architecture is capable to carry out time series analysis in real time for very large web data sets, based on the actual events, instead of resorting to sampling or other extrapolation techniques

City Research Online

Automated Dynamic Firmware Analysis at Scale: A Case Study on Embedded Web Interfaces

Author: Costin Andrei
Francillon Aurélien
Zarras Apostolis
Publication venue
Publication date: 11/11/2015
Field of study

Embedded devices are becoming more widespread, interconnected, and web-enabled than ever. However, recent studies showed that these devices are far from being secure. Moreover, many embedded systems rely on web interfaces for user interaction or administration. Unfortunately, web security is known to be difficult, and therefore the web interfaces of embedded systems represent a considerable attack surface. In this paper, we present the first fully automated framework that applies dynamic firmware analysis techniques to achieve, in a scalable manner, automated vulnerability discovery within embedded firmware images. We apply our framework to study the security of embedded web interfaces running in Commercial Off-The-Shelf (COTS) embedded devices, such as routers, DSL/cable modems, VoIP phones, IP/CCTV cameras. We introduce a methodology and implement a scalable framework for discovery of vulnerabilities in embedded web interfaces regardless of the vendor, device, or architecture. To achieve this goal, our framework performs full system emulation to achieve the execution of firmware images in a software-only environment, i.e., without involving any physical embedded devices. Then, we analyze the web interfaces within the firmware using both static and dynamic tools. We also present some interesting case-studies, and discuss the main challenges associated with the dynamic analysis of firmware images and their web interfaces and network services. The observations we make in this paper shed light on an important aspect of embedded devices which was not previously studied at a large scale. We validate our framework by testing it on 1925 firmware images from 54 different vendors. We discover important vulnerabilities in 185 firmware images, affecting nearly a quarter of vendors in our dataset. These experimental results demonstrate the effectiveness of our approach

arXiv.org e-Print Archive

Maastricht University Research Portal

The Family of MapReduce and Large Scale Data Processing Systems

Author: Anna Liu
Ayman G. Fayoumi
King Abdulaziz
See Profile
Sherif Sakr
Sherif Sakr
South Wales
South Wales
Publication venue
Publication date: 12/02/2013
Field of study

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program such as issues on data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several followup works after its introduction. This article provides a comprehensive survey for a family of approaches and mechanisms of large scale data processing mechanisms that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of introduced systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author

arXiv.org e-Print Archive

CiteSeerX

HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing

Author: Zhu Yifeng
Publication venue: DigitalCommons@UMaine
Publication date: 04/06/2010
Field of study

The increasing demand for Exa-byte-scale storage capacity by high end computing applications requires a higher level of scalability and dependability than that provided by current file and storage systems. The proposal deals with file systems research for metadata management of scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit to extend features of and fully leverage the peak performance promised by state-of-the-art cluster-based parallel and distributed file storage systems used by the high performance computing community. There is a large body of research on data movement and management scaling, however, the need to scale up the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. An understanding of the characteristics of metadata traffic, and an application of proper load-balancing, caching, prefetching and grouping mechanisms to perform metadata management correspondingly, will lead to a high scalability. It is anticipated that by appropriately plugging the scalable and adaptive metadata management components into the state-of-the-art cluster-based parallel and distributed file storage systems one could potentially increase the performance of applications and file systems, and help translate the promise and potential of high peak performance of such systems to real application performance improvements. The project involves the following components: 1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns. 2. Develop scalable and adaptive file name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability 3. Develop decentralized, locality-aware metadata grouping schemes to facilitate the bulk metadata operations such as prefetching. 4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching. 5. Prototype the SAM2 components into the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework for a DOE CMS Tier 2 site at University of Nebraska-Lincoln and conduct benchmark, evaluation and validation studies

University of Maine

SIEM Network Behaviour Monitoring Framework using Deep Learning Approach for Campus Network Infrastructure

Author: Abu Bakar Sajak Aznida
Adib Khairuddin Mohammad
Afizi Mohd Shukran Mohd
Azmi Bin Mustafa Sulaiman Mohd
Nazri Ismail Mohd
Rizal Mohd Isa Mohd
Publication venue: 'Faculty of Electrical Engineering, Computer Science and Information Technology Osijek'
Publication date: 01/01/2021
Field of study

One major problem faced by network users is an attack on the security of the network especially if the network is vulnerable due to poor security policies. Network security is largely an exercise to protect not only the network itself but most importantly, the data. This exercise involves hardware and software technology. Secure and effective access management falls under the purview of network security. It focuses on threats both internally and externally, intending to protect and stop the threats from entering or spreading into the network. A specialized collection of physical devices, such as routers, firewalls, and anti-malware tools, is required to address and ensure a secure network. Almost all agencies and businesses employ highly qualified information security analysts to execute security policies and validate the policies’ effectiveness on regular basis. This research paper presents a significant and flexible way of providing centralized log analysis between network devices. Moreover, this paper proposes a novel method for compiling and displaying all potential threats and alert information in a single dashboard using a deep learning approach for campus network infrastructure

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

USER INTERFACE CONCEPTION FOR ASSET MANAGEMENT SYSTEM

Author: Xue Xiaoguo
Publication venue
Publication date: 01/01/2015
Field of study

Data as a critical resource nowadays, it is growing expotentially. As a consequence, the issue of data storing, filtering, processing and analysis have attracted a lot of attention in the database industry, since they can not be handled by the traditional database systems. The new situation has been discussed as a concept of Big Data. To be able to handle and process the Big Data, one muct be capable of deal with large volume, high velocity and various types data. With this trend, Relational Database also succeeded in integrating parts of the functions which are required to handle Big Data. So it becomes that those two techniques advance side by side. Here in this work both Big Data and Relational Database are discussed based on the current database industry development situation. Moreover, data mining techniques and communication standards are also studied and discussed to give directions for further implementation work. An interfacing work, which includes both software interfacing for third party user and user interface design and implementation, is done to provide Wärtsilä internal users access to their remote monitoring data. The desing work is done based on the needs of the different user groups of the personnel of Wärtsilä.fi=Opinnäytetyö kokotekstinä PDF-muodossa.|en=Thesis fulltext in PDF format.|sv=Lärdomsprov tillgängligt som fulltext i PDF-format

Osuva

Analytical study and computational modeling of statistical methods for data mining

Author: Atkotiya Kishorchandra H.
Publication venue
Publication date: 01/06/2008
Field of study

Today, there is tremendous increase of the information available on electronic form. Day by day it is increasing massively. There are enough opportunities for research to retrieve knowledge from the data available in this information. Data mining and app

Etheses - A Saurashtra University Library Service

Big Data Reference Architecture for e-Learning Analytical Systems

Author: K. Palanivel, T. Chithralekha
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/01/2018
Field of study

The recent advancements in technology have produced big data and become the necessity for researcher to analyze the data in order to make it meaningful. Massive amounts of data are collected across social media sites, mobile communications, business environments and institutions. In order to efficiently analyze this large quantity of raw data, the concept of big data was introduced. In this regard, big data analytic is needed in order to provide techniques to analyze the data. This new concept is expected to help education in the near future, by changing the way we approach the e-Learning process, by encouraging the interaction between learners and teachers, by allowing the fulfilment of the individual requirements and goals of learners. The learning environment generates massive knowledge by means of the various services provided in massive open online courses. Such knowledge is produced via learning actor interactions. Also, data analytics can be a valuable tool to help e-Learning organizations deliver better services to the public. It can provide important insights into consumer behavior and better predict demand for goods and services, thereby allowing for better resource management. This result motivates to put forward solutions for big data usage to the educational field. This research article unfolds a big data reference architecture for e-Learning analytical systems to make a unified analysis of the massive data generated by learning actors. This reference architecture makes the process of the massive data produced in big data e-learning system. Finally, the BiDRA for e-Learning analytical systems was evaluated based on the quality of maintainability, modularity, reusability, performance, and scalability

International Journal on Recent and Innovation Trends in Computing and Communication

Contributions to Lifelogging Protection In Streaming Environments

Author: Pàmies Estrems David
Publication venue: 'Universitat Rovira I Virgili'
Publication date: 10/09/2020
Field of study

Tots els dies, més de cinc mil milions de persones generen algun tipus de dada a través d'Internet. Per accedir a aquesta informació, necessitem utilitzar serveis de recerca, ja siguin motors de cerca web o assistents personals. A cada interacció amb ells, el nostre registre d'accions, logs, s'utilitza per oferir una millor experiència. Per a les empreses, també són molt valuosos, ja que ofereixen una forma de monetitzar el servei. La monetització s'aconsegueix venent dades a tercers, però, els logs de consultes podrien exposar informació confidencial de l'usuari (identificadors, malalties, tendències sexuals, creences religioses) o usar-se per al que es diu "life-logging ": Un registre continu de les activitats diàries. La normativa obliga a protegir aquesta informació. S'han proposat prèviament sistemes de protecció per a conjunts de dades tancats, la majoria d'ells treballant amb arxius atòmics o dades estructurades. Desafortunadament, aquests sistemes no s'adapten quan es fan servir en el creixent entorn de dades no estructurades en temps real que representen els serveis d'Internet. Aquesta tesi té com objectiu dissenyar tècniques per protegir la informació confidencial de l'usuari en un entorn no estructurat d’streaming en temps real, garantint un equilibri entre la utilitat i la protecció de dades. S'han fet tres propostes per a una protecció eficaç dels logs. La primera és un nou mètode per anonimitzar logs de consultes, basat en k-anonimat probabilística i algunes eines de desanonimització per determinar fuites de dades. El segon mètode, s'ha millorat afegint un equilibri configurable entre privacitat i usabilitat, aconseguint una gran millora en termes d'utilitat de dades. La contribució final es refereix als assistents personals basats en Internet. La informació generada per aquests dispositius es pot considerar "life-logging" i pot augmentar els riscos de privacitat de l'usuari. Es proposa un esquema de protecció que combina anonimat de logs i signatures sanitizables.Todos los días, más de cinco mil millones de personas generan algún tipo de dato a través de Internet. Para acceder a esa información, necesitamos servicios de búsqueda, ya sean motores de búsqueda web o asistentes personales. En cada interacción con ellos, nuestro registro de acciones, logs, se utiliza para ofrecer una experiencia más útil. Para las empresas, también son muy valiosos, ya que ofrecen una forma de monetizar el servicio, vendiendo datos a terceros. Sin embargo, los logs podrían exponer información confidencial del usuario (identificadores, enfermedades, tendencias sexuales, creencias religiosas) o usarse para lo que se llama "life-logging": Un registro continuo de las actividades diarias. La normativa obliga a proteger esta información. Se han propuesto previamente sistemas de protección para conjuntos de datos cerrados, la mayoría de ellos trabajando con archivos atómicos o datos estructurados. Desafortunadamente, esos sistemas no se adaptan cuando se usan en el entorno de datos no estructurados en tiempo real que representan los servicios de Internet. Esta tesis tiene como objetivo diseñar técnicas para proteger la información confidencial del usuario en un entorno no estructurado de streaming en tiempo real, garantizando un equilibrio entre utilidad y protección de datos. Se han hecho tres propuestas para una protección eficaz de los logs. La primera es un nuevo método para anonimizar logs de consultas, basado en k-anonimato probabilístico y algunas herramientas de desanonimización para determinar fugas de datos. El segundo método, se ha mejorado añadiendo un equilibrio configurable entre privacidad y usabilidad, logrando una gran mejora en términos de utilidad de datos. La contribución final se refiere a los asistentes personales basados en Internet. La información generada por estos dispositivos se puede considerar “life-logging” y puede aumentar los riesgos de privacidad del usuario. Se propone un esquema de protección que combina anonimato de logs y firmas sanitizables.Every day, more than five billion people generate some kind of data over the Internet. As a tool for accessing that information, we need to use search services, either in the form of Web Search Engines or through Personal Assistants. On each interaction with them, our record of actions via logs, is used to offer a more useful experience. For companies, logs are also very valuable since they offer a way to monetize the service. Monetization is achieved by selling data to third parties, however query logs could potentially expose sensitive user information: identifiers, sensitive data from users (such as diseases, sexual tendencies, religious beliefs) or be used for what is called ”life-logging”: a continuous record of one’s daily activities. Current regulations oblige companies to protect this personal information. Protection systems for closed data sets have previously been proposed, most of them working with atomic files or structured data. Unfortunately, those systems do not fit when used in the growing real-time unstructured data environment posed by Internet services. This thesis aims to design techniques to protect the user’s sensitive information in a non-structured real-time streaming environment, guaranteeing a trade-off between data utility and protection. In this regard, three proposals have been made in efficient log protection. The first is a new method to anonymize query logs, based on probabilistic k-anonymity and some de-anonymization tools to determine possible data leaks. A second method has been improved in terms of a configurable trade-off between privacy and usability, achieving a great improvement in terms of data utility. Our final contribution concerns Internet-based Personal Assistants. The information generated by these devices is likely to be considered life-logging, and it can increase the user’s privacy risks. The proposal is a protection scheme that combines log anonymization and sanitizable signatures

Tesis Doctorals en Xarxa