4,065 research outputs found

    Using bi-clustering algorithm for analyzing online users activity in a virtual campus

    Data mining algorithms have proved useful for processing large data sets in order to extract relevant information and knowledge. Such algorithms are also important for analyzing data collected from users' activity. One family of such data analysis is the mining of log files of online applications that register the actions of online users over long periods of time. A relevant objective in this case is to study the behavior of online users and feed the results back into the design processes of online applications to provide better usability and adaptation to users' preferences. The context of this work is a virtual campus in which thousands of students and tutors carry out learning and teaching activities using online applications. The information stored in log files of virtual campuses tends to be large, complex and heterogeneous in nature. Hence, its mining requires both efficient and intelligent processing and analysis of user interaction data during long-term learning activities. In this paper, we present a bi-clustering algorithm for processing large log data sets from the online daily activity of students in a real virtual campus. Our approach is useful for extracting relevant knowledge about user activity, such as navigation patterns and activities performed, as well as for studying time parameters related to such activities. The extracted information can be useful not only to students and tutors, to stimulate and improve their experience when interacting with the system, but also to the designers and developers of the virtual campus, in order to better support online teaching and learning. Peer Reviewed. Postprint (published version).

    Distributed-based massive processing of activity logs for efficient user modeling in a Virtual Campus

    This paper reports on a multi-fold approach for building user models based on the identification of navigation patterns in a virtual campus, allowing the campus' usability to be adapted to the actual learners' needs and thus greatly stimulating the learning experience. However, user modeling in this context implies constant processing and analysis of user interaction data during long-term learning activities, which produces huge amounts of valuable data typically stored in server log files. Due to the large or very large size of the log files generated daily, massive processing is a foremost step in extracting useful information. To this end, this work first studies the viability of processing large log data files of a real Virtual Campus using different distributed infrastructures. More precisely, we study the time performance of massive processing of daily log files implemented following the master-slave paradigm and evaluated using Cluster Computing and PlanetLab platforms. The study reveals the complexity and challenges of massive processing in the big data era, such as the need to carefully tune the log file processing in terms of the size of the log data chunks processed at slave nodes, as well as the processing bottleneck in truly geographically distributed infrastructures caused by the communication overhead between the master and slave nodes. Then, an application of the massive processing approach, resulting in log data processed and stored in a well-structured format, is presented. We show how to extract knowledge from the log data analysis by using the WEKA framework for data mining, showing its usefulness for effectively building user models by identifying interesting navigation patterns of online learners. The study is motivated by and conducted on the actual data logs of the Virtual Campus of the Open University of Catalonia. Peer Reviewed. Postprint (author's final draft).

    Scalability, memory issues and challenges in mining large data sets

    (c) 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Data mining is an active field of research and development aiming to automatically extract "knowledge" from the analysis of data sets. Knowledge can be defined in different ways, such as discovering (structured, frequent, approximate, etc.) patterns in data, grouping/clustering/bi-clustering data according to one or more criteria, finding association rules, etc. Such knowledge is then fed back to decision support systems, enabling end-users (actors) to make more informed decisions, which in economic terms could lead to advantages compared to traditional decision support systems. It should be noted, however, that data mining algorithms and frameworks were proposed prior to the "Big Data" explosion. While data mining algorithms have considered efficiency and computational complexity as important requirements, they did not take into account features of Big Data such as very large size, the velocity with which data is generated, variety, etc. These features pose issues and challenges to data mining algorithms and frameworks. In this paper we analyse some of the issues in mining large data sets, such as scalability and in-memory needs. We also show some computational results pointing to such issues. Peer Reviewed. Postprint (author's final draft).

    A Unified And Green Platform For Smartphone Sensing

    Smartphones have become key communication and entertainment devices in people's daily life. Sensors on (or attached to) smartphones can enable attractive sensing applications in different domains, including environmental monitoring, social networking, healthcare, transportation, etc. Most existing smartphone sensing systems are application-specific. How to leverage smartphones' sensing capability to make them unified information providers for various applications has not yet been fully explored. This dissertation presents a unified and green platform for smartphone sensing, which has the following desirable features: 1) It can support various smartphone sensing applications; 2) It is personalizable; 3) It is energy-efficient; and 4) It can be easily extended to support new sensors. Two novel sensing applications are built and integrated into this unified platform: SOR and LIPS. SOR is a smartphone Sensing based Objective Ranking (SOR) system. Unlike subjective online review and recommendation systems (such as Yelp and TripAdvisor), SOR ranks a target place based on data collected via smartphone sensing. LIPS is a system that learns the LIfestyles of mobile users via smartPhone Sensing (LIPS). Combining both unsupervised and supervised learning, a hybrid scheme is proposed to characterize lifestyle and predict future activities of mobile users. This dissertation also studies how to use the cloud as a coordinator to help smartphones sense collaboratively, with the objective of reducing sensing energy consumption. A novel probabilistic model is built to address the GPS-less energy-efficient crowd sensing problem. Provably-good approximation algorithms are presented to enable smartphones to sense collaboratively without accurate locations such that sensing coverage requirements can be met with limited energy consumption.

    The Dark Side(-Channel) of Mobile Devices: A Survey on Network Traffic Analysis

    In recent years, mobile devices (e.g., smartphones and tablets) have met with increasing commercial success and have become a fundamental element of everyday life for billions of people all around the world. Mobile devices are used not only for traditional communication activities (e.g., voice calls and messages) but also for more advanced tasks made possible by an enormous number of multi-purpose applications (e.g., finance, gaming, and shopping). As a result, those devices generate significant network traffic (a consistent part of the overall Internet traffic). For this reason, the research community has been investigating security and privacy issues related to the network traffic generated by mobile devices, which could be analyzed to obtain information useful for a variety of goals (ranging from device security and network optimization to fine-grained user profiling). In this paper, we review the works that contributed to the state of the art of network traffic analysis targeting mobile devices. In particular, we present a systematic classification of the works in the literature according to three criteria: (i) the goal of the analysis; (ii) the point where the network traffic is captured; and (iii) the targeted mobile platforms. In this survey, we consider points of capture such as Wi-Fi Access Points, software simulation, and inside real mobile devices or emulators. For the surveyed works, we review and compare analysis techniques, validation methods, and achieved results. We also discuss possible countermeasures, challenges and possible directions for future research on mobile traffic analysis and other emerging domains (e.g., Internet of Things). We believe our survey will be a reference work for researchers and practitioners in this research field. Comment: 55 pages

    Visual Analytics Methods for Exploring Geographically Networked Phenomena

    The connections between different entities define different kinds of networks, and many such networked phenomena are influenced by their underlying geographical relationships. By integrating network and geospatial analysis, the goal is to extract information about interaction topologies and the relationships to related geographical constructs. In recent decades, much work has been done analyzing the dynamics of spatial networks; however, many challenges remain in this field. First, the development of social media and transportation technologies has greatly reshaped the typologies of communications between different geographical regions. Second, the distance metrics used in spatial analysis should also be enriched with the underlying network information to develop accurate models. Visual analytics provides methods for data exploration, pattern recognition, and knowledge discovery. However, despite the long history of geovisualizations and network visual analytics, little work has been done to develop visual analytics tools that focus specifically on geographically networked phenomena. This thesis develops a variety of visualization methods to present data values and geospatial network relationships, which enable users to interactively explore the data. Users can investigate the connections in both virtual networks and geospatial networks, and the underlying geographical context can be used to improve knowledge discovery. The focus of this thesis is on social media analysis and geographical hotspots optimization. A framework is proposed for social network analysis to unveil the links between social media interactions and their underlying networked geospatial phenomena. This will be combined with a novel hotspot approach to improve hotspot identification and boundary detection with the networks extracted from urban infrastructure. Several real world problems have been analyzed using the proposed visual analytics frameworks. The primary studies and experiments show that visual analytics methods can help analysts explore such data from multiple perspectives and help the knowledge discovery process. Dissertation/Thesis. Doctoral Dissertation Computer Science 201

    Detection of Crypto-Ransomware Attack Using Deep Learning

    The number one threat to the digital world is the exponential increase in ransomware attacks. Ransomware is malware that prevents victims from accessing their resources by locking or encrypting the data until a ransom is paid. With individuals' and businesses' growing dependence on technology and the Internet, researchers in the cyber security field are looking for different measures to prevent malicious attackers from having a successful campaign. New ransomware variants are introduced daily, so behavior-based analysis for detecting ransomware attacks is more effective than traditional static analysis. This paper proposes a multi-variant classification to distinguish ransomware I/O operations from those of benign applications. The deep learning models implemented in the proposed approach are Bi-directional Long Short-Term Memory (Bi-LSTM) and Convolutional Neural Networks (CNN). The deep learning models are compared against classic machine learning models such as Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF). The ransomware samples contain 70 binaries from 30 different ransomware extracted during the encryption of an extensive network shared directory. The benign samples came from network traffic traces recorded in a campus LAN where staff users access files from shared servers. A sample contains I/O operations (short Control Commands, bytes being read, and written) per second over a period of T seconds. The proposed deep learning models are tested with Zero-day ransomware samples as well. Both Bi-LSTM and CNN achieved above 98% accuracy in classifying ransomware and benign samples.

    Advances on Smart Cities and Smart Buildings

    Modern cities face the challenge of combining competitiveness at the global city scale with sustainable urban development to become smart cities. A smart city is a high-tech, intensive and advanced city that connects people, information, and city elements using new technologies in order to create a sustainable, greener city; competitive and innovative commerce; and an increased quality of life. This Special Issue collects recent advancements in smart cities and covers different topics and aspects.