Some Contribution of Statistical Techniques in Big Data: A Review
Big Data is a popular topic in research. It is widely believed that science, business, industry, government, and society will undergo a thorough change under its impact. Big data refers to very large, complex data sets, structured and unstructured, with hidden patterns, that are difficult to collect, store, and analyse for processing or results. Proper advanced techniques are therefore needed to gain knowledge from big data, and major challenges arise in storage, processing, search, sharing, transfer, analysis, and visualization. This paper discusses the fundamentals of big data, its issues and management, and the techniques used for it, and presents a review of various advanced statistical techniques for handling key big data applications with large data sets. These advanced techniques handle both structured and unstructured big data in different areas.
From Social Data Mining to Forecasting Socio-Economic Crisis
Socio-economic data mining has a great potential in terms of gaining a better
understanding of problems that our economy and society are facing, such as
financial instability, shortages of resources, or conflicts. Without
large-scale data mining, progress in these areas seems hard or impossible.
Therefore, a suitable, distributed data mining infrastructure and research
centers should be built in Europe. It also appears appropriate to build a
network of Crisis Observatories. They can be imagined as laboratories devoted
to the gathering and processing of enormous volumes of data on both natural
systems such as the Earth and its ecosystem, as well as on human
techno-socio-economic systems, so as to gain early warnings of impending
events. Reality mining provides the chance to adapt more quickly and more
accurately to changing situations. Further opportunities arise by individually
customized services, which however should be provided in a privacy-respecting
way. This requires the development of novel ICT (such as a self-organizing
Web), but most likely new legal regulations and suitable institutions as well.
As long as such regulations are lacking on a world-wide scale, it is in the
public interest that scientists explore what can be done with the huge data
available. Big data do have the potential to change or even threaten democratic
societies. The same applies to sudden and large-scale failures of ICT systems.
Therefore, dealing with data must be done with a large degree of responsibility
and care. Self-interests of individuals, companies or institutions have limits,
where the public interest is affected, and public interest is not a sufficient
justification to violate human rights of individuals. Privacy is a high good,
as confidentiality is, and damaging it would have serious side effects for
society.
Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams
Streaming APIs are becoming more pervasive in mainstream Object-Oriented programming languages and platforms. For example, the Stream API introduced in Java 8 allows for functional-like, MapReduce-style operations in processing both finite, e.g., collections, and infinite data structures. However, using this API efficiently involves subtle considerations, such as determining when it is best for stream operations to run in parallel, when running operations in parallel can be less efficient, and when it is safe to run in parallel given possible lambda expression side effects. Also, streams may not run all operations in parallel depending on the particular collectors used in reductions. In this paper, we present an automated refactoring approach that assists developers in writing efficient stream code in a semantics-preserving fashion. The approach, based on a novel data ordering and typestate analysis, consists of preconditions and transformations for automatically determining when it is safe and possibly advantageous to convert sequential streams to parallel, unorder or de-parallelize already-parallel streams, and optimize streams involving complex reductions. The approach was implemented as a plug-in to the popular Eclipse IDE, uses the WALA and SAFE analysis frameworks, and was evaluated on 11 Java projects consisting of ∼642K lines of code. We found that 57 of 157 candidate streams (36.31%) were refactorable, and an average speedup of 3.49 on performance tests was observed. The results indicate that the approach is useful in optimizing stream code to its full potential.
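To make the parallelization trade-off concrete, the following is a minimal, hypothetical Java sketch (not drawn from the paper or its evaluation subjects): a side-effect-free pipeline can be switched from stream() to parallelStream() safely, while the final, stateful lambda is exactly the kind of pattern a safe refactoring must reject.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamParallelizationSketch {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6);

        // Sequential pipeline: always safe, but single-threaded.
        List<Integer> doubledSequential = numbers.stream()
                .map(n -> n * 2)
                .collect(Collectors.toList());

        // Parallel pipeline: potentially faster, and safe here only because
        // the lambda is stateless and the collector handles merging.
        List<Integer> doubledParallel = numbers.parallelStream()
                .map(n -> n * 2)
                .collect(Collectors.toList());

        // Unsafe pattern: a stateful lambda mutating shared, non-thread-safe
        // state from a parallel stream; this is a data race.
        List<Integer> shared = new ArrayList<>();
        numbers.parallelStream().forEach(n -> shared.add(n * 2));

        System.out.println(doubledSequential);
        System.out.println(doubledParallel);
    }
}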
PRINS: Scalable Model Inference for Component-based System Logs
Behavioral software models play a key role in many software engineering tasks; unfortunately, these models either are not available during software development or, if available, quickly become outdated as implementations evolve. Model inference techniques have been proposed as a viable solution to extract finite state models from execution logs. However, existing techniques do not scale well when processing very large logs that can be commonly found in practice. In this paper, we address the scalability problem of inferring the model of a component-based system from large system logs, without requiring any extra information. Our model inference technique, called PRINS, follows a divide-and-conquer approach. The idea is to first infer a model of each system component from the corresponding logs; then, the individual component models are merged together taking into account the flow of events across components, as reflected in the logs. We evaluated PRINS in terms of scalability and accuracy, using nine datasets composed of logs extracted from publicly available benchmarks and a personal computer running desktop business applications. The results show that PRINS can process large logs much faster than a publicly available and well-known state-of-the-art tool, without significantly compromising the accuracy of inferred models
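The divide-and-conquer idea can be sketched in Java as follows. This is a hypothetical illustration, not the PRINS implementation: the LogEntry type and the inferComponentModel and stitch placeholders are assumptions standing in for the real per-component inference and merging steps.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PrinsStyleSketch {

    /** Placeholder for a component-level finite state model. */
    interface Model { }

    /** Assumed log entry: which component emitted which event, and when. */
    record LogEntry(String component, String event, long timestamp) { }

    /** Hypothetical per-component inference step (any off-the-shelf FSM miner). */
    static Model inferComponentModel(List<LogEntry> componentLog) {
        throw new UnsupportedOperationException("plug in a model inference tool");
    }

    /** Hypothetical merge step guided by the cross-component event flow. */
    static Model stitch(Map<String, Model> componentModels, List<LogEntry> systemLog) {
        throw new UnsupportedOperationException("merge models following the log order");
    }

    static Model inferSystemModel(List<LogEntry> systemLog) {
        // 1. Split the system log per component (the "divide" step).
        Map<String, List<LogEntry>> perComponent = systemLog.stream()
                .collect(Collectors.groupingBy(LogEntry::component));

        // 2. Infer one model per component, each from a much smaller log.
        Map<String, Model> componentModels = new HashMap<>();
        perComponent.forEach((name, log) ->
                componentModels.put(name, inferComponentModel(log)));

        // 3. Merge the component models according to the flow of events in the log.
        return stitch(componentModels, systemLog);
    }
}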
Quality of Information in Mobile Crowdsensing: Survey and Research Challenges
Smartphones have become the most pervasive devices in people's lives, and are
clearly transforming the way we live and perceive technology. Today's
smartphones benefit from almost ubiquitous Internet connectivity and come
equipped with a plethora of inexpensive yet powerful embedded sensors, such as
accelerometer, gyroscope, microphone, and camera. This unique combination has
enabled revolutionary applications based on the mobile crowdsensing paradigm,
such as real-time road traffic monitoring, air and noise pollution measurement,
crime control, and wildlife monitoring, to name a few. Unlike prior sensing
paradigms, humans are now the primary actors of the sensing process, as they
are essential to retrieving reliable and up-to-date information about the
monitored event. As humans may behave unreliably or
maliciously, assessing and guaranteeing Quality of Information (QoI) becomes
more important than ever. In this paper, we provide a new framework for
defining and enforcing the QoI in mobile crowdsensing, and analyze in depth the
current state-of-the-art on the topic. We also outline novel research
challenges, along with possible directions of future work.
Data-driven modelling and probabilistic analysis of interactive software usage
This paper answers the research question: how can we model and understand the ways in which users actually interact with software, given that usage styles vary from user to user, and even from use to use for an individual user? Our first contribution is to introduce two new probabilistic admixture models, inferred from sets of logged user traces, which include observed and latent states. The models encapsulate the temporal and stochastic aspects of usage, the heterogeneous and dynamic nature of users, and the temporal aspects of the time interval over which the data was collected (e.g. one day, one month, etc.). A key concept is activity patterns, which encapsulate common observed temporal behaviours shared across a set of logged user traces. Each activity pattern is a discrete-time Markov chain in which observed variables label the states; latent states specify the activity patterns. The second contribution is how we use parametrised, probabilistic, temporal logic properties to reason about hypothesised behaviours within an activity pattern, and between activity patterns. Different combinations of inferred model and hypothesised property afford a rich set of techniques for understanding software usage. The third contribution is a demonstration of the models and temporal logic properties by application to user traces from a software application that has been used by tens of thousands of users worldwide.
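As a concrete, purely illustrative reading of an activity pattern, the Java sketch below represents one pattern as a discrete-time Markov chain over observed event labels and scores a logged trace by its probability under that pattern. The event names, transition values, and the ActivityPattern class itself are hypothetical and not taken from the paper.

import java.util.Arrays;
import java.util.List;

public class ActivityPattern {
    private final String[] states;        // observed event labels
    private final double[] initial;       // initial state distribution
    private final double[][] transition;  // row-stochastic transition matrix

    ActivityPattern(String[] states, double[] initial, double[][] transition) {
        this.states = states;
        this.initial = initial;
        this.transition = transition;
    }

    private int index(String state) {
        return Arrays.asList(states).indexOf(state);
    }

    /** Probability that this pattern generates the given logged trace. */
    double traceProbability(List<String> trace) {
        if (trace.isEmpty()) return 1.0;
        int prev = index(trace.get(0));
        double p = initial[prev];
        for (int t = 1; t < trace.size(); t++) {
            int cur = index(trace.get(t));
            p *= transition[prev][cur];
            prev = cur;
        }
        return p;
    }

    public static void main(String[] args) {
        // Toy pattern over three UI events; the numbers are illustrative only.
        ActivityPattern browse = new ActivityPattern(
                new String[] {"open", "search", "export"},
                new double[] {1.0, 0.0, 0.0},
                new double[][] {
                        {0.1, 0.8, 0.1},
                        {0.3, 0.5, 0.2},
                        {0.0, 0.0, 1.0}});
        // Prints 0.16 = 1.0 * P(open->search) * P(search->export).
        System.out.println(browse.traceProbability(
                Arrays.asList("open", "search", "export")));
    }
}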
MapReduce based RDF assisted distributed SVM for high throughput spam filtering
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University.
Electronic mail has become deeply embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability, and its ease of use have all acted as catalysts to such pervasive proliferation. Unfortunately, the same can be alleged about unsolicited bulk email, or rather spam. Various methods, as well as enabling architectures, are available to try to mitigate spam permeation. In this respect, this dissertation complements existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing their respective strengths and weaknesses.
Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet scale, data intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have been proven effective. SVM training is however a computationally intensive process. In this dissertation, a M/R based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart.
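The general partition-and-train structure behind such a distributed SVM can be sketched with the standard Hadoop MapReduce Java API. The sketch below is an assumption-laden illustration rather than the actual MRSMO algorithm: the partitioning scheme is a simple hash, and trainLocalSvm is a placeholder for whatever sequential SVM solver (e.g. SMO) is plugged in on each training subset.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class DistributedSvmSketch {

    static final int NUM_PARTITIONS = 8; // assumed number of training subsets

    /** Route each labelled training example (one per input line) to a partition. */
    public static class PartitionMapper
            extends Mapper<LongWritable, Text, IntWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text example, Context context)
                throws IOException, InterruptedException {
            int partition = Math.floorMod(example.hashCode(), NUM_PARTITIONS);
            context.write(new IntWritable(partition), example);
        }
    }

    /** Train one local SVM per partition; the trainer itself is a placeholder. */
    public static class TrainReducer
            extends Reducer<IntWritable, Text, IntWritable, Text> {
        @Override
        protected void reduce(IntWritable partition, Iterable<Text> examples, Context context)
                throws IOException, InterruptedException {
            List<String> subset = new ArrayList<>();
            for (Text example : examples) {
                subset.add(example.toString());
            }
            // Placeholder: any sequential SVM trainer run on this subset only.
            String serializedModel = trainLocalSvm(subset);
            context.write(partition, new Text(serializedModel));
        }

        private String trainLocalSvm(List<String> subset) {
            throw new UnsupportedOperationException("plug in a sequential SVM solver");
        }
    }
}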
Effectively exploiting large-scale, ‘Cloud’-based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneity-aware task-to-node matching and allocation scheme, is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the-box Hadoop counterpart in a typical Cloud-based infrastructure.
The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF-based, end-user feedback.