Distributed-based massive processing of activity logs for efficient user modeling in a Virtual Campus
This paper reports on a multi-fold approach to building user models based on the identification of navigation patterns in a virtual campus, allowing the campus's usability to be adapted to learners' actual needs and thereby stimulating the learning experience. However, user modeling in this context requires constant processing and analysis of user interaction data during long-term learning activities, which produce huge amounts of valuable data, typically stored in server log files. Because the log files generated daily are large or very large, massive processing is an essential first step in extracting useful information. To this end, this work first studies the viability of processing large log data files of a real Virtual Campus using different distributed infrastructures. More precisely, we study the time performance of massive processing of daily log files implemented following the master-slave paradigm and evaluated on Cluster Computing and PlanetLab platforms. The study reveals the complexity and challenges of massive processing in the big data era, such as the need to carefully tune the size of the log data chunks processed at slave nodes, as well as the bottleneck that arises in truly geographically distributed infrastructures from the communication overhead between the master and slave nodes. Then, an application of the massive processing approach, which yields log data processed and stored in a well-structured format, is presented. We show how to extract knowledge from the log data using the WEKA framework for data mining, demonstrating its usefulness for building user models by identifying interesting navigation patterns of online learners. The study is motivated by and conducted on the actual data logs of the Virtual Campus of the Open University of Catalonia. Peer Reviewed. Postprint (author's final draft)
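The chunked master-slave processing described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `user_id page` line format, the function names, and the use of local threads in place of cluster or PlanetLab nodes are all assumptions made for illustration. The `chunk_size` parameter stands in for the chunk-size tuning the abstract mentions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Master side: cut the log into chunks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def process_chunk(chunk):
    """Worker side: count requests per user in one chunk.

    Assumes a hypothetical 'user_id<space>page' line format;
    lines that do not match are skipped.
    """
    counts = Counter()
    for line in chunk:
        parts = line.split()
        if len(parts) >= 2:
            counts[parts[0]] += 1
    return counts


def process_log(lines, chunk_size, workers=4):
    """Master: dispatch chunks to workers and merge the partial results."""
    chunks = split_into_chunks(lines, chunk_size)
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(process_chunk, chunks):
            total.update(partial)  # adds the per-chunk counts to the total
    return total
```

Smaller chunks give the master more units to balance across workers but increase the number of dispatch and merge steps. In a geographically distributed setting each of those steps carries communication cost, which is the trade-off the abstract points to.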
A Generalized Framework on Beamformer Design and CSI Acquisition for Single-Carrier Massive MIMO Systems in Millimeter Wave Channels
In this paper, we establish a general framework for reduced-dimensional
channel state information (CSI) estimation and pre-beamformer design for
frequency-selective massive multiple-input multiple-output (MIMO) systems
employing single-carrier (SC) modulation in time division duplex (TDD) mode by
exploiting the joint angle-delay domain channel sparsity in millimeter (mm)
wave frequencies. First, based on a generic subspace projection taking the
joint angle-delay power profile and user-grouping into account, the reduced
rank minimum mean square error (RR-MMSE) instantaneous CSI estimator is derived
for spatially correlated wideband MIMO channels. Second, the statistical
pre-beamformer design is considered for frequency-selective SC massive MIMO
channels. We examine the dimension reduction problem and subspace (beamspace)
construction on which the RR-MMSE estimation can be realized as accurately as
possible. Finally, a spatio-temporal domain correlator type reduced rank
channel estimator, as an approximation of the RR-MMSE estimate, is obtained by
carrying out least squares (LS) estimation in a suitable reduced-dimensional
beamspace. It is observed that the proposed techniques show remarkable
robustness to pilot interference (or contamination) with a significant
reduction in pilot overhead.
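As a generic illustration of LS estimation in a reduced-dimensional beamspace, consider the following sketch. It is not the paper's exact signal model: the linear pilot model and the symbols $\mathbf{y}$, $\mathbf{S}$, $\mathbf{h}$, $\mathbf{U}$, $\mathbf{c}$, $N$, $D$, and $\alpha$ are assumptions introduced here.

```latex
% Assumed pilot observation model: known pilot matrix S, stacked channel h, noise n
\mathbf{y} = \mathbf{S}\,\mathbf{h} + \mathbf{n},
\qquad
\mathbf{h} \approx \mathbf{U}\,\mathbf{c},
\quad \mathbf{U} \in \mathbb{C}^{N \times D},\ D \ll N

% LS estimate of the D beamspace coefficients (requires SU to have full column rank)
\hat{\mathbf{c}}_{\mathrm{LS}}
  = \left(\mathbf{U}^{H}\mathbf{S}^{H}\mathbf{S}\,\mathbf{U}\right)^{-1}
    \mathbf{U}^{H}\mathbf{S}^{H}\mathbf{y},
\qquad
\hat{\mathbf{h}} = \mathbf{U}\,\hat{\mathbf{c}}_{\mathrm{LS}}

% Special case: if (SU)^H (SU) = \alpha I_D, the estimate reduces to
% correlation with the pilots followed by projection onto the beamspace
\hat{\mathbf{h}} = \tfrac{1}{\alpha}\,\mathbf{U}\,\mathbf{U}^{H}\mathbf{S}^{H}\mathbf{y}
```

Only $D$ coefficients are estimated instead of $N$, which is why a well-chosen beamspace can reduce the pilot overhead.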
Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling
We evaluate the impact of probabilistically-constructed digital identity data
collected from Sep. to Dec. 2017 (approx.), in the context of
Lookalike-targeted campaigns. The backbone of this study is a large set of
probabilistically-constructed "identities", represented as small bags of
cookies and mobile ad identifiers with associated metadata, that are likely all
owned by the same underlying user. The identity data allows us to generate
"identity-based", rather than "identifier-based", user models, giving a fuller
picture of the interests of the users underlying the identifiers. We employ
off-policy techniques to evaluate the potential of identity-powered lookalike
models without incurring the risk of allowing untested models to direct large
amounts of ad spend or the large cost of performing A/B tests. We add to
historical work on off-policy evaluation by noting a significant type of
"finite-sample bias" that occurs for studies combining modestly-sized datasets
and evaluation metrics involving rare events (e.g., conversions). We illustrate
this bias using a simulation study that later informs the handling of inverse
propensity weights in our analyses on real data. We demonstrate significant
lift in identity-powered lookalikes versus an identity-ignorant baseline: on
average ~70% lift in conversion rate. This rises to factors of ~(4-32)x for
identifiers having little data themselves, but that can be inferred to belong
to users with substantial data to aggregate across identifiers. This implies
that identity-powered user modeling is especially important in the context of
identifiers having very short lifespans (i.e., frequently churned cookies). Our
work motivates and informs the use of probabilistically-constructed identities
in marketing. It also deepens the canon of examples in which off-policy
learning has been employed to evaluate the complex systems of the internet
economy.
Comment: Accepted by WSDM 201
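The inverse propensity weighting mentioned in this abstract can be sketched as follows. This is a generic illustration and not the authors' estimator. The record layout `(reward, target_prob, logging_prob)` and the function names are hypothetical. Weight clipping and self-normalization are common variance-control techniques; the abstract does not confirm that the authors used either.

```python
def ips_estimate(records, clip=None):
    """Inverse propensity scoring estimate of a target policy's mean reward.

    records: list of (reward, target_prob, logging_prob) tuples, where the
    probabilities are those of the logged action under each policy
    (hypothetical layout, assumed for this sketch).
    clip: optional upper bound on each importance weight.
    """
    total = 0.0
    for reward, p_target, p_logging in records:
        weight = p_target / p_logging
        if clip is not None:
            weight = min(weight, clip)
        total += weight * reward
    return total / len(records)


def snips_estimate(records, clip=None):
    """Self-normalized variant: divide by the sum of weights, not the count."""
    numerator = 0.0
    denominator = 0.0
    for reward, p_target, p_logging in records:
        weight = p_target / p_logging
        if clip is not None:
            weight = min(weight, clip)
        numerator += weight * reward
        denominator += weight
    return numerator / denominator if denominator > 0 else 0.0
```

When rewards are rare events such as conversions, a few records with large weights can dominate either estimate on a modestly sized dataset. This is the kind of setting in which the abstract reports finite-sample bias.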