
    Distributed-based massive processing of activity logs for efficient user modeling in a Virtual Campus

    This paper reports on a multi-fold approach for building user models based on the identification of navigation patterns in a virtual campus, allowing the campus' usability to be adapted to the actual learners' needs and thus stimulating the learning experience. However, user modeling in this context implies constant processing and analysis of user interaction data during long-term learning activities, which produces huge amounts of valuable data, typically stored in server log files. Because the log files generated daily are large or very large, massive processing is an essential first step in extracting useful information. To this end, this work first studies the viability of processing large log data files of a real Virtual Campus using different distributed infrastructures. More precisely, we study the time performance of massive processing of daily log files implemented following the master-slave paradigm and evaluated using Cluster Computing and PlanetLab platforms. The study reveals the complexity and challenges of massive processing in the big data era, such as the need to carefully tune the size of the log data chunks processed at slave nodes, as well as the processing bottleneck in truly geographically distributed infrastructures caused by the communication time between the master and slave nodes. Then, an application of the massive processing approach, resulting in log data processed and stored in a well-structured format, is presented. We show how to extract knowledge from the log data analysis by using the WEKA framework for data mining purposes, showing its usefulness for building user models by identifying interesting navigation patterns of on-line learners. The study is motivated and conducted in the context of the actual data logs of the Virtual Campus of the Open University of Catalonia. Peer Reviewed. Postprint (author's final draft)
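The chunked master-slave processing described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the log format (`user action`, whitespace-separated), the function names, and the use of local threads in place of cluster or PlanetLab nodes are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Master side: cut the log into chunks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def process_chunk(chunk):
    """Worker side: count log entries per user in one chunk.

    Assumes a hypothetical format where the first field is the user id.
    """
    counts = Counter()
    for line in chunk:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts


def master(lines, chunk_size, workers=4):
    """Distribute chunks to workers and merge their partial results."""
    chunks = split_into_chunks(lines, chunk_size)
    total = Counter()
    # Threads stand in for remote worker nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(process_chunk, chunks):
            total.update(partial)
    return total


# Hypothetical sample log, for illustration only.
sample_log = [
    "alice login",
    "bob view_forum",
    "alice open_mailbox",
    "carol login",
    "bob logout",
]

if __name__ == "__main__":
    print(master(sample_log, chunk_size=2))
```

The `chunk_size` parameter is the tuning knob the abstract refers to: smaller chunks spread work more evenly but increase the number of master-worker exchanges, which matters most when workers are geographically distributed.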

    The stellar mass function of galaxies in Planck-selected clusters at 0.5 < z < 0.7: new constraints on the timescale and location of satellite quenching

    We study the abundance of star-forming and quiescent galaxies in a sample of 21 massive clusters at 0.5<z<0.7, detected with the Planck satellite. We measure the cluster galaxy stellar mass function (SMF), which is a fundamental observable to study and constrain the formation and evolution of galaxies. Our measurements are based on homogeneous and deep multi-band photometry spanning u- to the Ks-band for each cluster and are supported by spectroscopic data from different programs. The galaxy population is separated between quiescent and star-forming galaxies based on their rest-frame U-V and V-J colours. The SMF is compared to that of field galaxies at the same redshifts, using data from the COSMOS/UltraVISTA survey. We find that the shape of the SMF of star-forming galaxies does not depend on environment, while the SMF of quiescent galaxies has a significantly steeper low-mass slope in the clusters compared to the field. We estimate the environmental quenching efficiency (f_EQ), i.e. the probability for a galaxy that would normally be star forming in the field, to be quenched due to its environment. The f_EQ shows no stellar-mass dependence in any environment, but it increases from 40% in the cluster outskirts to ~90% in the cluster centres. The radial signature of f_EQ provides constraints on where the dominant quenching mechanism operates in these clusters and on what timescale. Exploring these using a simple model based on galaxy orbits obtained from an N-body simulation, we find a clear degeneracy between both parameters. For example, the quenching process may either be triggered on a long (~3 Gyr) time scale at large radii (r~8R_500), or happen well within 1 Gyr at r<R_500. The radius where quenching is triggered is at least r_quench> 0.67R_500 (95%CL). The ICM density at this location suggests that ram-pressure stripping of the cold gas is a likely cause of quenching. [Abridged] Comment: 16 pages, 12 figures, accepted for publication in A&
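The abstract defines the environmental quenching efficiency only in words. One common way to write that definition down is given here as an assumption; the symbols $f_q^{\mathrm{cluster}}$ and $f_q^{\mathrm{field}}$ (the quiescent fractions in the cluster and in the field) are introduced for illustration and do not appear in the abstract.

```latex
f_{\mathrm{EQ}} \;=\; \frac{f_q^{\mathrm{cluster}} - f_q^{\mathrm{field}}}{1 - f_q^{\mathrm{field}}}
```

Under this form the denominator is the fraction of galaxies that would be star forming in the field, and the numerator is the excess quiescent fraction attributable to the environment, so $f_{\mathrm{EQ}} = 0$ when the cluster and field fractions agree and $f_{\mathrm{EQ}} = 1$ when every such galaxy is quenched.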

    Finding Top-k Dominance on Incomplete Big Data Using Map-Reduce Framework

    Incomplete data is one major kind of multi-dimensional dataset that has randomly distributed missing nodes in its dimensions. It is very difficult to retrieve information from this type of dataset when it becomes huge, and finding top-k dominant values in it is a challenging procedure. Some algorithms exist to enhance this process, but they are mostly efficient only when dealing with small incomplete datasets. One of the algorithms that makes the application of the top-k dominance (TKD) query possible is the Bitmap Index Guided (BIG) algorithm. This algorithm strongly improves the performance for incomplete data, but it is not originally capable of finding top-k dominant values in incomplete big data, nor is it designed to do so. Several other algorithms have been proposed to answer the TKD query, such as the Skyband Based and Upper Bound Based algorithms, but their performance is also questionable. Algorithms developed previously were among the first attempts to apply the TKD query on incomplete data; however, all of them performed weakly or were not compatible with incomplete data. This thesis proposes the MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) for dealing with the aforementioned issues. MRBIG uses the MapReduce framework to enhance the performance of applying top-k dominance queries on huge incomplete datasets. The proposed approach applies MapReduce parallel computing across multiple computing nodes: the framework separates the tasks between several computing nodes that work independently and simultaneously to find the result. This method achieves processing up to two times faster in finding the TKD query result in comparison to previously presented algorithms.
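To make the TKD query concrete, here is a brute-force sketch of top-k dominance on incomplete data, with the scoring split into a map-style step over partitions and a reduce-style sum. It is not the BIG or MRBIG algorithm from the thesis. The dominance rule used (compare only dimensions observed in both objects, larger is better, `None` marks a missing value), the function names, and the sample data are all assumptions for illustration.

```python
from collections import Counter


def dominates(a, b):
    """True if a dominates b on the dimensions observed in both.

    Assumed rule: a is no worse than b on every commonly observed
    dimension and strictly better on at least one (larger is better).
    """
    strictly_better = False
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip dimensions missing in either object
        if x < y:
            return False
        if x > y:
            strictly_better = True
    return strictly_better


def map_partial_scores(objects, partition):
    """Map step: count, for every object, how many objects in one partition it dominates."""
    partial = Counter()
    for name, values in objects.items():
        partial[name] += sum(
            dominates(values, objects[other])
            for other in partition
            if other != name
        )
    return partial


def top_k_dominating(objects, k, n_partitions=2):
    """Reduce step: sum partial scores and return the k highest-scoring objects."""
    names = list(objects)
    partitions = [names[i::n_partitions] for i in range(n_partitions)]
    scores = Counter()
    for partition in partitions:  # each call could run on a separate node
        scores.update(map_partial_scores(objects, partition))
    # Sort by descending score, breaking ties by name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical incomplete dataset: None marks a missing value.
sample_objects = {
    "a": (5, None, 3),
    "b": (4, 2, None),
    "c": (None, 1, 2),
    "d": (3, 1, 1),
}

if __name__ == "__main__":
    print(top_k_dominating(sample_objects, k=2))
```

Because each partition's partial scores depend only on that partition and the object list, the map step can run independently on separate nodes, which is the property the abstract attributes to the MapReduce approach. The pairwise comparison itself remains quadratic in this sketch.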