A MapReduce architecture for web site user behaviour monitoring in real time

Authors: Karakostas B.; Theodoulidis B.

Monitoring the behaviour of large numbers of web site users in real time poses significant performance challenges, due to the decentralised location and volume of the generated data. This paper proposes a MapReduce-style architecture in which the event series from Web users are processed by a number of cascading mappers, reducers and rereducers that are local to the event origin. Using static analysis and a prototype implementation, we show that this architecture is capable of carrying out time series analysis in real time for very large web data sets, based on the actual events, instead of resorting to sampling or other extrapolation techniques.
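The cascade of mappers, reducers and rereducers described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the event format `(timestamp, page)`, the 60-second bucket size and the function names `map_events`, `reduce_counts` and `rereduce` are all assumptions made for illustration.

```python
from collections import Counter

def map_events(events, bucket_seconds=60):
    """Map each (timestamp, page) event to a ((time_bucket, page), 1) pair."""
    for timestamp, page in events:
        yield (timestamp // bucket_seconds, page), 1

def reduce_counts(pairs):
    """Reduce the mapped pairs of one event origin to per-key counts."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts

def rereduce(partials):
    """Merge the partial counts of several reducers into one summary."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return total

# Two hypothetical event origins, each reduced locally before merging.
site_a = [(5, "/home"), (30, "/home"), (65, "/cart")]
site_b = [(10, "/home"), (70, "/cart"), (80, "/cart")]
partials = [reduce_counts(map_events(site)) for site in (site_a, site_b)]
summary = rereduce(partials)
```

Because each origin forwards only its reduced counts, the rereduce step handles one small summary per origin rather than every raw event, which is the property the abstract relies on for real-time operation.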

City Research Online

PS-Sim: A Framework for Scalable Simulation of Participatory Sensing Data

Authors: Barnwal Rajesh P; Das Sajal K; Ghosh Nirnay; Ghosh Soumya K
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2018

The emergence of smartphones and the participatory sensing (PS) paradigm has paved the way for a new variant of pervasive computing. In PS, human users perform sensing tasks and generate notifications, typically in return for incentives. These notifications are real-time, large-volume, and multi-modal, and are eventually fused by the PS platform to generate a summary. One major limitation of PS is the sparsity of notifications owing to a lack of active participation, which inhibits large-scale real-life experiments by the research community. On the flip side, the research community always needs ground truth to validate the efficacy of proposed models and algorithms. Most PS applications involve human mobility and report generation following the sensing of an event of interest in the adjacent environment. This work is an attempt to study and empirically model human participation behavior and event occurrence distributions through the development of a location-sensitive data simulation framework called PS-Sim. Extensive experiments show that the synthetic data generated by PS-Sim replicates real participation and event occurrence behaviors in PS applications, and may therefore be considered for validation purposes in the absence of ground truth. As a proof of concept, we used a real-life dataset from a vehicular traffic management application to train the models in PS-Sim and cross-validated the simulated data against other parts of the same dataset.

Comment: Published in the Proceedings of the IEEE International Conference on Smart Computing (SMARTCOMP-2018)

arXiv.org e-Print Archive

Crossref

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Population Density-based Hospital Recommendation with Mobile LBS Big Data

Authors: Cao Yuan; Chao Hanqing; Shan Hongming; Xia Fen; Zhang Junping; Zhou Ye
Publication date: 02/08/2017

The difficulty of getting medical treatment is one of the major livelihood issues in China. Since patients lack prior knowledge about the spatial distribution and capacity of hospitals, some hospitals have abnormally high or sporadic population densities. This paper presents a new model for estimating the spatiotemporal population density in each hospital based on location-based service (LBS) big data, which would be beneficial for guiding and dispersing outpatients. To improve the estimation accuracy, several approaches are proposed to denoise the LBS data and to classify people by detecting their various behaviors. In addition, a long short-term memory (LSTM) based deep learning model is presented to predict the trend of population density. Using Baidu's large-scale LBS logs database, we apply the proposed model to 113 hospitals in Beijing, P. R. China, and construct an online hospital recommendation system that provides users with a ranked list of hospitals based on real-time population density information and basic hospital information such as hospital level and distance. We also mine several interesting patterns from these LBS logs using our proposed system.

arXiv.org e-Print Archive

Crossref

A MapReduce solution for associative classification of big data

Authors: Bechini Alessio; Marcelloni Francesco; Segatori Armando
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

Associative classifiers have proven to be very effective in classification problems. Unfortunately, the algorithms used for learning these classifiers cannot adequately manage big data because of time complexity and memory constraints. To overcome these drawbacks, we propose a distributed association rule-based classification scheme shaped according to the MapReduce programming model. The scheme mines classification association rules (CARs) using a properly enhanced, distributed version of the well-known FP-Growth algorithm. Once the CARs have been mined, the proposed scheme performs distributed rule pruning. The set of surviving CARs is used to classify unlabeled patterns. The memory usage and time complexity of each phase of the learning process are discussed, and the scheme is evaluated on seven real-world big datasets on the Hadoop framework, characterizing its scalability and achievable speedup on small computer clusters. The proposed solution for associative classifiers turns out to be suitable for practically addressing big datasets even with modest hardware support. Comparisons with two state-of-the-art distributed learning algorithms are also discussed in terms of accuracy, model complexity, and computation time.

Crossref

Archivio della Ricerca - Università di Pisa

Distributed context discovering for predictive modeling

Author: Rong Z.
Publication date: 01/01/2013

Click prediction has applications in various areas such as advertising, search and online sales. Usually, user-intent information such as query terms and previous click history is used in click prediction. However, this information is not always available. For example, there are no queries from users on the webpages of content publishers, such as personal blogs. The information available for click prediction in this scenario is implicitly derived from users, such as visiting time and IP address. Thus, the existing approaches that utilize user-intent information may be inapplicable in this scenario, and the click prediction problem in this scenario remains unexplored to our knowledge. In addition, the challenges of handling skewed data streams also arise in prediction, since there is often heavy traffic on webpages while few visitors click on them. In this thesis, we propose to use a pattern-based classification approach to tackle the click prediction problem. Attributes of webpage visits are combined by a pattern mining algorithm to enhance their predictive power. To make the pattern-based classification handle skewed data streams, we adopt a sliding window to capture recent data and an undersampling technique to handle the skewness. As a side problem raised by the pattern-based approach, mining patterns from large datasets is addressed by a distributed pattern sampling algorithm that we propose. This algorithm shows its scalability in experiments. We validate our pattern-based approach to click prediction on a real-world dataset from a Dutch portal website. The experiments show that our pattern-based approach achieves an average AUC of 0.675 over a period of 36 days with a 5-day sliding window, which surpasses the baseline, a statically trained classification model without patterns, by 0.002. In addition, the average weighted F-measure of our approach is 0.009 higher than the baseline. Therefore, our proposed approach can slightly improve classification performance; yet whether this improvement is worth deployment in real scenarios remains a question.
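The combination of a sliding window with undersampling mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's method: the record layout (`day`, `clicked`), the 1:1 class ratio after undersampling, and the function names `sliding_window` and `undersample` are hypothetical; only the 5-day window size comes from the abstract.

```python
import random

def sliding_window(records, current_day, window_days=5):
    """Keep only the visits from the window_days days before current_day."""
    return [r for r in records
            if current_day - window_days <= r["day"] < current_day]

def undersample(records, seed=0):
    """Randomly drop non-click visits until both classes are the same size.

    The 1:1 target ratio is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    clicks = [r for r in records if r["clicked"]]
    non_clicks = [r for r in records if not r["clicked"]]
    if len(non_clicks) > len(clicks):
        non_clicks = rng.sample(non_clicks, len(clicks))
    return clicks + non_clicks

# Hypothetical skewed stream: 10 days, 1 click and 9 non-clicks per day.
stream = [{"day": day, "clicked": i == 0}
          for day in range(10) for i in range(10)]
window = sliding_window(stream, current_day=10)
balanced = undersample(window)
```

A classifier would then be retrained on `balanced` each day, so that it follows recent behaviour without being dominated by the majority class of non-clicks.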

Repository TU/e

Pure OAI Repository