iWAP: A Single Pass Approach for Web Access Sequential Pattern Mining
With the explosive growth of data availability on the World Wide Web, web usage mining has become essential for improving website design, analyzing system performance and network communications, understanding user reactions and motivations, and building adaptive websites. Web Access Pattern mining (WAP-mine) is a sequential pattern mining technique for discovering frequent web log access sequences. It first stores the frequent part of the original web access sequence database in a prefix tree called the WAP-tree, and then mines frequent sequences from that tree according to a user-given minimum support threshold. Because the tree must be rebuilt when the database or the threshold changes, this method is not applicable to incremental or interactive mining. In this paper, we propose an algorithm, improved Web Access Pattern (iWAP) mining, to find web access patterns from web logs more efficiently than the WAP-mine algorithm. Our proposed approach can discover all web access sequential patterns in a single pass over the web log database. Moreover, it supports interactive and incremental mining, capabilities the earlier algorithm lacks. Our experimental and performance studies show that the proposed algorithm is in general an order of magnitude faster than the existing WAP-mine algorithm.
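The prefix-tree idea behind WAP-tree-style mining can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: it builds a counted trie over access sequences and reports the prefixes that meet a minimum support threshold (a full WAP-mine also finds non-prefix subsequences). The event names and `min_support` value are invented for the example.

```python
# Minimal sketch of a WAP-tree-style counted prefix tree over web
# access sequences. Illustrative only; a real WAP-mine also extracts
# frequent subsequences that are not prefixes.
from collections import defaultdict


class Node:
    def __init__(self):
        self.count = 0
        self.children = defaultdict(Node)


def build_tree(sequences):
    """Insert each access sequence into the prefix tree, counting visits."""
    root = Node()
    for seq in sequences:
        node = root
        for event in seq:
            node = node.children[event]
            node.count += 1
    return root


def frequent_prefixes(node, min_support, prefix=()):
    """Yield every prefix path whose count meets the minimum support."""
    for event, child in node.children.items():
        path = prefix + (event,)
        if child.count >= min_support:
            yield path, child.count
            yield from frequent_prefixes(child, min_support, path)


# Toy web log: three sessions over pages a, b, c.
logs = [("a", "b", "c"), ("a", "b"), ("a", "c")]
tree = build_tree(logs)
patterns = dict(frequent_prefixes(tree, min_support=2))
# ("a",) occurs 3 times and ("a", "b") twice; ("a", "c") occurs
# only once as a prefix, so it is pruned.
```

Because counts live on the tree nodes, re-mining with a different support threshold only requires re-traversing the tree, which is the property that makes this family of structures attractive for interactive mining.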
A Deep Dive into the Google Cluster Workload Traces: Analyzing the Application Failure Characteristics and User Behaviors
Large-scale cloud data centers have gained popularity due to their high availability, rapid elasticity, scalability, and low cost. However, current data centers continue to have high failure rates due to the lack of proper resource utilization and early failure detection. To maximize resource efficiency and reduce failure rates in large-scale cloud data centers, it is crucial to understand the workload and failure characteristics. In this paper, we perform a deep analysis of the 2019 Google Cluster Trace Dataset, which contains 2.4TiB of workload traces from eight different clusters around the world. We explore the characteristics of failed and killed jobs in Google's production cloud and attempt to correlate them with key attributes such as resource usage, job priority, scheduling class, job duration, and the number of task resubmissions. Our analysis reveals several important characteristics of failed jobs that contribute to job failure and hence could be used for developing an early failure prediction system. Also, we present a novel usage analysis to identify heterogeneity in jobs and tasks submitted by users. We are able to identify specific users who control more than half of all collection events on a single cluster. We contend that these characteristics could be useful in developing an early job failure prediction system that could be utilized for dynamic rescheduling by the job scheduler, thus improving resource utilization in large-scale cloud data centers while reducing failure rates.
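The kind of per-attribute correlation the study describes can be sketched with a simple grouped failure-rate computation. The records below are synthetic, and the field names (`priority`, `final_event`) are assumptions for illustration, not the actual Google trace schema.

```python
# Illustrative sketch: group job records by priority and compute the
# share that ended in a FAIL or KILL event. Synthetic data; field
# names are assumptions, not the real trace schema.
from collections import Counter

jobs = [
    {"priority": 0, "final_event": "FINISH"},
    {"priority": 0, "final_event": "FAIL"},
    {"priority": 0, "final_event": "KILL"},
    {"priority": 9, "final_event": "FINISH"},
    {"priority": 9, "final_event": "FINISH"},
]

totals, failed = Counter(), Counter()
for job in jobs:
    totals[job["priority"]] += 1
    if job["final_event"] in ("FAIL", "KILL"):
        failed[job["priority"]] += 1

failure_rate = {p: failed[p] / totals[p] for p in totals}
# In this toy sample, priority 0 jobs fail or are killed 2/3 of the
# time, while priority 9 jobs never do.
```

Tables of such rates, computed per priority, scheduling class, duration bucket, and resubmission count, are the raw material from which failure-prone job profiles (and hence early-prediction features) can be derived.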
Stress and productivity patterns of interrupted, synergistic, and antagonistic office activities.
We describe a controlled experiment aiming to study the productivity and stress effects of email interruptions and activity interactions in the modern office. The measurement set includes multimodal data for n = 63 knowledge workers who volunteered for this experiment and were randomly assigned into four groups: (G1/G2) batch email interruptions with/without exogenous stress; (G3/G4) continual email interruptions with/without exogenous stress. To provide context, the experiment's email treatments were surrounded by typical office tasks. The captured variables include physiological indicators of stress, measures of report-writing quality and keystroke dynamics, as well as psychometric scores and biographic information detailing participants' profiles. Investigations powered by this dataset are expected to lead to personalized recommendations for handling email interruptions and a deeper understanding of synergistic and antagonistic office activities. Given the centrality of email in the modern office, and the importance of office work to people's lives and the economy, the present data have a valuable role to play.
The impact of the COVID-19 pandemic on the education of medical, dental and non-medical healthcare professionals in Bangladesh: findings and connotation
Lockdown measures in response to the COVID-19 pandemic had an appreciable impact on the education of all medical, dental, and non-medical healthcare professional (HCP) students. These included the closure of universities, necessitating a rapid move to e-learning and new approaches to practicals. Initially, however, there was a lack of knowledge and expertise regarding e-learning approaches, and concerns over the affordability of internet bundles and equipment. We initially conducted two pilot studies to assess these challenges, followed by a two-stage approach: a full investigation involving 32 private and public universities during the early stages of the pandemic, and a later study assessing the current environment brought about by the forced changes. Top challenges at the start of the pandemic included a lack of familiarity with e-learning approaches, the cost of internet access, a lack of IT equipment, and the quality of the classes. Universities offered support to staff and students to varying degrees to address the identified challenges. Since then, e-learning approaches have widened the possibilities for teaching and learning at convenient times. However, challenges remain. In conclusion, there were considerable challenges at the start of the pandemic. Several key issues have been addressed, with hybrid learning here to stay. Remaining challenges include a lack of ICT equipment. However, new innovations will continue.
An Automated Framework to Debug System-Level Concurrency Failures
The ever-increasing parallelism in computer systems has made software more prone to concurrency failures, causing problems during both pre- and post-development. Debugging concurrent programs is difficult because of their non-deterministic behavior and the specific interleavings in the execution flow. Debugging is a process in which programmers reproduce the bug, identify the root cause, and then take the necessary steps to remove the bug from the system. The failure information may come from bug reports from the testing phase or from production runs. In both cases, steps must be taken to reproduce and localize the failure. However, reproducing and localizing such failures is hard because concurrency failures at the system level often involve multiple processes or event handlers (e.g., software signals) and may show different behavior in every execution with the same input and environment. This dissertation presents three research works to reproduce and localize system-level concurrency bugs. The first framework is designed to automatically reproduce system-level concurrency failures using only default logs collected from the field. This technique uses a combination of static and dynamic analysis, together with symbolic execution, to synthesize both the failure-inducing data input and the interleaving schedule, and leverages them to deterministically replay the failed execution using existing virtual platforms. The second framework is designed to localize system-level concurrency bugs by using a large system-call trace. This technique uses data mining, statistical anomaly detection, and dynamic program analysis to identify and localize the failure-inducing system calls. The third research work automatically localizes system-level concurrency bugs by using bug reports.
It combines natural language processing, data mining, and structured information retrieval to automatically extract relevant concurrency entities (i.e., system call names) from bug reports and localize them in the source code by performing static analysis on the code. The goal of this dissertation is to make the debugging process of concurrent applications fully automated, so that subtle and intermittent concurrency faults can be easily diagnosed and fixed.
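The entity-extraction step of the third framework can be sketched as matching bug-report text against a known system-call vocabulary. This is a hedged illustration only: the dissertation's actual pipeline combines NLP and information retrieval, and the report text and syscall list below are invented for the example.

```python
# Hedged sketch of extracting candidate system-call names from a bug
# report by matching tokens against a syscall vocabulary. The report
# text and the vocabulary are illustrative, not from the dissertation.
import re

SYSCALLS = {"read", "write", "open", "close", "fork", "kill", "waitpid"}


def extract_syscalls(report):
    """Return syscall names mentioned in a bug report, in order of first use."""
    tokens = re.findall(r"[a-z_]+", report.lower())
    seen = []
    for tok in tokens:
        if tok in SYSCALLS and tok not in seen:
            seen.append(tok)
    return seen


report = ("Crash after fork(): child calls close() before "
          "the parent's write() completes")
extract_syscalls(report)  # ['fork', 'close', 'write']
```

The extracted names would then serve as anchors for static analysis, narrowing the search for the failure-inducing interleaving to code regions that actually issue those calls.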