Mining Sequences of Developer Interactions in Visual Studio for Usage Smells
In this paper, we present a semi-automatic approach for mining a large-scale dataset of IDE interactions to extract usage smells, i.e., inefficient IDE usage patterns exhibited by developers in the field. The approach outlined in this paper first mines frequent IDE usage patterns, filtered via a set of thresholds and by the authors, which are subsequently supported (or disputed) through a developer survey, in order to form usage smells. In contrast with conventional mining of IDE usage data, our approach identifies time-ordered sequences of developer actions that are exhibited by many developers in the field. This pattern mining workflow is resilient to the ample noise present in IDE datasets due to the mix of actions and events that these datasets typically contain. We identify usage patterns and smells that contribute to the understanding of the usability of Visual Studio for debugging, code search, and active file navigation, and, more broadly, to the understanding of developer behavior during these software development activities. Among our findings is the discovery that developers are reluctant to use conditional breakpoints when debugging, due to perceived IDE performance problems as well as the lack of error checking when specifying the conditional expressions.
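As a rough illustration of this kind of sequence mining, the sketch below counts contiguous n-grams of IDE actions across developer sessions and keeps those exceeding simple support thresholds. The action names, session data, and thresholds are hypothetical assumptions; the paper's actual workflow (gap-tolerant pattern mining plus manual and survey-based filtering) is considerably richer.

```python
# Illustrative sketch only: frequency-based mining of contiguous action
# n-grams from developer sessions, not the paper's actual pipeline.
from collections import Counter

def frequent_patterns(sessions, n=3, min_sessions=2, min_count=3):
    """Return contiguous n-grams of IDE actions seen often enough,
    both overall and across distinct developer sessions."""
    total = Counter()
    session_support = Counter()
    for actions in sessions:
        grams = set()
        for i in range(len(actions) - n + 1):
            gram = tuple(actions[i:i + n])
            total[gram] += 1
            grams.add(gram)
        for gram in grams:
            session_support[gram] += 1
    return {g: c for g, c in total.items()
            if c >= min_count and session_support[g] >= min_sessions}

# Hypothetical event streams, one list per developer session.
sessions = [
    ["Debug.Start", "Debug.StepOver", "Debug.StepOver", "Debug.Stop"],
    ["Edit.Find", "Debug.Start", "Debug.StepOver", "Debug.StepOver", "Debug.Stop"],
    ["Debug.Start", "Debug.StepOver", "Debug.StepOver", "Debug.Stop"],
]
print(frequent_patterns(sessions))
```

A pattern that survives these thresholds would then, in the approach described above, still need human filtering and survey validation before being called a usage smell.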
Workload Prediction for Efficient Performance Isolation and System Reliability
In large-scale distributed systems, such as multi-tier storage systems and cloud data centers, resource sharing among workloads brings multiple benefits while introducing many performance challenges. The key to effective workload multiplexing is accurate workload prediction. This thesis focuses on capturing the salient characteristics of real-world workloads to develop workload prediction methods and to drive scheduling and resource allocation policies, in order to achieve efficient and timely resource isolation among applications. In a multi-tier storage system, high-priority user work is often multiplexed with low-priority background work, raising the challenge of striking a balance between maintaining user performance and maximizing the amount of completed background work. In this thesis, we propose two resource isolation policies based on different workload prediction methods: one Markovian model-based and the other neural network-based. Via workload prediction, these policies aim to discover the opportune time to schedule background work with minimal impact on user performance. Trace-driven simulations verify the efficiency of the two proposed resource isolation policies. The Markovian model-based policy successfully schedules background work during the appropriate periods with small impact on user performance. The neural network-based policy adaptively schedules user and background work, consistently meeting both performance requirements. This thesis also proposes an accurate yet efficient neural network-based prediction method for data center usage series, called PRACTISE. Unlike traditional neural networks for time series prediction, PRACTISE selects the most informative features from past observations of the time series itself. Testing on a large set of usage series from production data centers illustrates the accuracy (e.g., prediction error) and efficiency (e.g., time cost) of PRACTISE. The superior usage prediction also allows proactive resource management in highly virtualized cloud data centers. In this thesis, we analyze the performance tickets in cloud data centers and propose an active sizing algorithm, named ATM, that predicts the usage workloads and re-allocates capacity to workloads to avoid VM performance tickets. Moreover, driven by cheap prediction of usage tails, we also present TailGuard, which dynamically clones VMs among co-located boxes in order to efficiently reduce the performance violations of physical boxes in cloud data centers.
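As a rough illustration of the Markovian idea, the sketch below models user activity as a two-state (busy/idle) Markov chain learned from a trace and dispatches background work only when the next interval is likely to be idle. The state names, threshold, and trace are assumptions for illustration, not the thesis's actual policy.

```python
# Minimal sketch, assuming a busy/idle activity trace: learn transition
# probabilities, then gate background work on predicted idleness.
from collections import defaultdict

def fit_transitions(trace):
    """Estimate P(next_state | state) from a sequence like ['busy', 'idle', ...]."""
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(trace, trace[1:]):
        counts[cur][nxt] += 1
    return {s: {t: c / sum(nxts.values()) for t, c in nxts.items()}
            for s, nxts in counts.items()}

def schedule_background(state, probs, threshold=0.7):
    """Dispatch background work only if the next interval is likely idle."""
    return probs.get(state, {}).get("idle", 0.0) >= threshold

trace = ["busy", "busy", "idle", "idle", "idle", "busy", "idle", "idle"]
probs = fit_transitions(trace)
print(schedule_background("idle", probs))  # True here: idle tends to persist
```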
Privacy in trajectory micro-data publishing: a survey
We survey the literature on the privacy of trajectory micro-data, i.e., spatiotemporal information about the mobility of individuals, whose collection is becoming increasingly simple and frequent thanks to emerging information and communication technologies. The focus of our review is on privacy-preserving data publishing (PPDP), i.e., the publication of databases of trajectory micro-data that preserve the privacy of the monitored individuals. We classify and present the literature on attacks against trajectory micro-data, as well as solutions proposed to date for protecting databases from such attacks. This paper serves as an introductory reading on a critical subject in an era of growing awareness about privacy risks connected to digital services, and provides insights into open problems and future directions for research.
Comment: Accepted for publication in Transactions on Data Privacy.
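As one concrete example of the PPDP criteria covered in this line of work, the sketch below tests k-anonymity over generalized trajectories: after coarsening each (lat, lon, time) point to a grid cell and time slot, every trajectory must be indistinguishable from at least k-1 others. The cell size, time slot, and sample data are illustrative assumptions.

```python
# Minimal sketch of a k-anonymity check for trajectory micro-data,
# under assumed generalization parameters.
from collections import Counter

def generalize(traj, cell=0.01, slot=3600):
    """Coarsen (lat, lon, timestamp) points to grid cells and time slots."""
    return tuple((round(lat / cell), round(lon / cell), ts // slot)
                 for lat, lon, ts in traj)

def is_k_anonymous(trajectories, k=2):
    counts = Counter(generalize(t) for t in trajectories)
    return all(c >= k for c in counts.values())

trajs = [
    [(48.8566, 2.3522, 1000), (48.8600, 2.3400, 5000)],
    [(48.8567, 2.3521, 1200), (48.8601, 2.3399, 5100)],
]
print(is_k_anonymous(trajs, k=2))  # True: both coarsen to the same cells
```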
Probabilistic Modeling for Whole Metagenome Profiling
To address the shortcomings of existing Markov model implementations in handling large amounts of metagenomic data with comparable or better classification accuracy, we developed a new algorithm based on a pseudo-count supplemented standard Markov model (SMM), which leverages the power of higher-order models to more robustly classify reads at different taxonomic levels. Assessment on simulated metagenomic datasets demonstrated that, overall, SMM was more accurate than the interpolated methods in classifying reads to their respective taxa at all ranks. Higher-order SMMs (9th order or greater) also outperformed BLAST alignments in assigning taxonomic labels to metagenomic reads at different taxonomic ranks (genus and higher) on tests that masked the read-originating species (genome models) in the database. Similar results were obtained by masking at other taxonomic ranks, simulating the plausible scenario that the source of a read is not represented at different taxonomic levels in the genome database. The performance gap became more pronounced at higher taxonomic levels. To eliminate contamination in datasets and to further improve our alignment-free approach, we developed a new framework based on a genome segmentation and clustering algorithm. This framework allowed removal of adapter sequences and contaminant DNA, as well as generation of clusters of similar segments, which were then used to sample representative read fragments to constitute training datasets. The parameters of a logistic regression model were learned from these training datasets using a Bayesian optimization procedure, allowing us to establish thresholds for classifying metagenomic reads by SMM. This led to the development of a Python-based frontend, named POSMM (Python Optimized Standard Markov Model), that combines our SMM algorithm with the logistic regression optimization. POSMM provides a much-needed alternative to metagenome profiling programs. Our algorithm builds genome models on the fly, obviating the need to build a database, and thus complements alignment-based classification; it can be used in concert with alignment-based classifiers to raise the bar in metagenome profiling.
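To make the SMM idea concrete, the sketch below trains order-k transition counts from a reference sequence, supplements them with a pseudo-count so that unseen transitions do not zero out the likelihood, and assigns a read to the highest-scoring model. The order, pseudo-count value, and toy sequences are assumptions; POSMM's actual implementation differs in scale and detail.

```python
# Minimal sketch of a pseudo-count supplemented Markov model (SMM)
# for read classification, with assumed toy genomes and parameters.
import math
from collections import defaultdict

def train_smm(genome, order=3, pseudo=0.5):
    """Count order-k transitions: context k-mer -> next nucleotide."""
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(len(genome) - order):
        counts[genome[i:i + order]][genome[i + order]] += 1
    return counts, order, pseudo

def log_likelihood(read, model):
    """Score a read; the pseudo-count keeps unseen transitions nonzero."""
    counts, order, pseudo = model
    score = 0.0
    for i in range(len(read) - order):
        ctx, nxt = read[i:i + order], read[i + order]
        obs = counts.get(ctx, {})
        total = sum(obs.values()) + 4 * pseudo  # 4 possible nucleotides
        score += math.log((obs.get(nxt, 0.0) + pseudo) / total)
    return score

models = {"taxonA": train_smm("ACGTACGTACGGTACGT"),
          "taxonB": train_smm("TTGGCCAATTGGCCAAT")}
read = "ACGTACG"
print(max(models, key=lambda m: log_likelihood(read, models[m])))  # taxonA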
A Survey of Artificial Intelligence Techniques Employed for Adaptive Educational Systems within E-Learning Platforms
The adaptive educational systems within e-learning platforms are built in response to the fact that the learning process is different for each and every learner. In order to provide adaptive e-learning services and study materials that are tailor-made for adaptive learning, this type of educational approach seeks to combine the ability to comprehend and detect a person's specific needs in the context of learning with the expertise required to apply appropriate learning pedagogy and enhance the learning process. Thus, it is critical to create accurate student profiles and models based upon analysis of their affective states, knowledge level, and individual personality traits and skills. The acquired data can then be efficiently used and exploited to develop an adaptive learning environment. Once acquired, these learner models can be used in two ways. The first is to inform the pedagogy proposed by the experts and designers of the adaptive educational system. The second is to give the system dynamic self-learning capabilities, drawing on the behaviors exhibited by teachers and students to create the appropriate pedagogy and automatically adjust the e-learning environment to suit it. In this respect, artificial intelligence techniques may be useful for several reasons, including their ability to develop and imitate human reasoning and decision-making processes (the learning-teaching model) and to minimize the sources of uncertainty in order to achieve an effective learning-teaching context. These learning capabilities ensure both learner and system improvement over the lifelong learning mechanism. In this paper, we present a survey of topics related to the field of artificial intelligence techniques employed for adaptive educational systems within e-learning, their advantages and disadvantages, and a discussion of the importance of using those techniques to achieve more intelligent and adaptive e-learning environments.
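As one small example of the student-modeling techniques such systems employ, the sketch below implements Bayesian knowledge tracing, a standard way to maintain and update an estimate of a learner's mastery of a skill after each answer. The parameter values are illustrative assumptions, not drawn from the survey.

```python
# Minimal sketch of Bayesian knowledge tracing with assumed parameters:
# p_learn (learning rate), p_slip (mistake despite mastery), p_guess.
def bkt_update(p_mastery, correct, p_learn=0.1, p_slip=0.1, p_guess=0.2):
    """Return the posterior mastery probability after one observed answer."""
    if correct:
        evidence = p_mastery * (1 - p_slip) + (1 - p_mastery) * p_guess
        posterior = p_mastery * (1 - p_slip) / evidence
    else:
        evidence = p_mastery * p_slip + (1 - p_mastery) * (1 - p_guess)
        posterior = p_mastery * p_slip / evidence
    # Account for learning that happens during the step itself.
    return posterior + (1 - posterior) * p_learn

p = 0.3  # prior belief that the skill is mastered
for answer in [True, True, False, True]:
    p = bkt_update(p, answer)
    print(f"mastery estimate: {p:.2f}")
```

An adaptive system would feed estimates like this back into its pedagogy, for instance by withholding harder exercises until the mastery estimate crosses a threshold.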