Computing at massive scale: Scalability and dependability challenges
Large-scale Cloud systems and big data analytics frameworks are now widely used for practical services and applications. However, with the increase of data volume, the heterogeneity of workloads and resources, and the dynamic nature of massive user requests, the uncertainty and complexity of resource management and service provisioning increase dramatically, often resulting in poor resource utilization, fragile system dependability, and user-perceived performance degradation. In this paper we report our latest understanding of the current and future challenges in this area and discuss both existing and potential solutions, especially those concerning system efficiency, scalability, and dependability. We first introduce a data-driven analysis methodology for characterizing resource and workload patterns and tracing performance bottlenecks in a massive-scale distributed computing environment. We then examine several fundamental challenges and the solutions we are developing to tackle them, including, for example, incremental but decentralized resource scheduling, incremental messaging communication, rapid system failover, and request handling parallelism. We integrate these solutions with our data analysis methodology to establish an engineering approach that facilitates the optimization, tuning, and verification of massive-scale distributed systems. We aim to develop and offer innovative methods and mechanisms for future computing platforms that will provide strong support for new big data and IoE (Internet of Everything) applications.
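The abstract names incremental but decentralized resource scheduling without detailing the mechanism. One well-known decentralized strategy, not necessarily the authors', is power-of-two-choices sampling (popularized by the Sparrow scheduler): each scheduler probes a small random subset of workers and places the task on the least loaded one, avoiding any central queue. A minimal sketch, with all names hypothetical:

```python
import random

class Worker:
    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.queue_len = 0  # number of tasks currently queued

def schedule(task_id, workers, probes=2):
    """Place a task by probing `probes` random workers and picking the least loaded."""
    candidates = random.sample(workers, probes)
    target = min(candidates, key=lambda w: w.queue_len)
    target.queue_len += 1
    return target.worker_id

# Toy run: 1,000 tasks over 100 workers, no central coordinator involved.
workers = [Worker(i) for i in range(100)]
placements = [schedule(t, workers) for t in range(1000)]
print("max queue length:", max(w.queue_len for w in workers))
```

Sampling just two workers per task keeps scheduling latency and coordination cost low while producing queue lengths close to a centrally balanced assignment, which is why this family of techniques scales to massive clusters.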
Intelligent data leak detection through behavioural analysis
In this paper we discuss a solution to detect data leaks in an intelligent and unobtrusive way through real-time analysis of the user's behaviour while handling classified information. The work is based on experience with real-world use cases, and a variety of data preparation and data analysis techniques have been tried. Results show the feasibility of the approach, but also the necessity of correlating with other security events to improve precision.
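The abstract does not detail the behavioural model, so the following is only a generic sketch of the underlying idea: learn a per-user baseline from historical session features and flag sessions that deviate from it, here using scikit-learn's IsolationForest. The features are hypothetical (files opened, bytes copied to removable media, off-hours actions per session):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Historical sessions for one user: [files_opened, mb_copied, offhours_actions].
rng = np.random.default_rng(42)
baseline = rng.normal(loc=[20, 5, 1], scale=[5, 2, 1], size=(500, 3))

# Fit the user's "normal behaviour" profile; contamination bounds the alert rate.
model = IsolationForest(contamination=0.01, random_state=0).fit(baseline)

# A new session with an unusually large volume of copied data.
session = np.array([[22, 400, 2]])
if model.predict(session)[0] == -1:  # -1 marks an outlier
    print("potential data leak: session deviates from user's baseline")
```

As the abstract notes, such an alert would need to be correlated with other security events before being treated as a confirmed leak.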
A Multi-Gene Genetic Programming Application for Predicting Students Failure at School
Accurately predicting student failure rate (SFR) at school remains a core problem for many in the educational sector. Existing procedures for forecasting SFR are rigid and often require data scaling or conversion into binary form, as in the logistic model, which can lead to loss of information and effect-size attenuation. In addition, the high number of factors, incomplete and unbalanced datasets, and the black-box nature of Artificial Neural Networks and fuzzy logic systems expose the need for more efficient tools. The application of Genetic Programming (GP) currently holds great promise and has produced impressive results across different sectors. In this regard, this study developed GPSFARPS, a software application that provides a robust solution to the prediction of SFR using an evolutionary algorithm known as multi-gene genetic programming. The approach is validated by feeding a testing data set to the evolved GP models. Results obtained from GPSFARPS simulations show its ability to evolve a suitable failure-rate expression, converging within 30 generations of a specified maximum of 500. The multi-gene system was also able to simplify the evolved model expression and accurately predict student failure rate using a subset of the original expression.
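GPSFARPS itself uses multi-gene GP, where the model is a combination of several evolved subtrees. As a stand-in, single-tree symbolic regression from the gplearn library illustrates the same idea of evolving a readable failure-rate expression; the data below is synthetic and the feature semantics (attendance, prior grades, study hours) are hypothetical:

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor

# Synthetic stand-in for student data: 3 features, linear ground-truth rate.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 0.8 - 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.1 * X[:, 2]

est = SymbolicRegressor(population_size=500,
                        generations=30,          # the paper reports convergence near 30
                        function_set=('add', 'sub', 'mul'),
                        parsimony_coefficient=0.001,  # pressure toward compact expressions
                        random_state=0)
est.fit(X, y)
print(est._program)  # the evolved symbolic failure-rate expression
```

The parsimony pressure mirrors the abstract's point about minimizing the evolved expression: unlike a black-box model, the output is a formula a practitioner can read and audit.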
Bringing ultra-large-scale software repository mining to the masses with Boa
Mining software repositories gives developers and researchers a chance to learn from previous development activities and apply that knowledge to the future. Ultra-large-scale open source repositories (e.g., SourceForge with 350,000+ projects, GitHub with 250,000+ projects, and Google Code with 250,000+ projects) provide an extremely large corpus on which to perform such mining tasks. This corpus allows researchers to test new mining techniques and empirically validate new approaches on real-world data. However, the barrier to entry is often extremely high: researchers interested in mining must know a large number of techniques, languages, and tools, each of which is often complex, and performing mining at the scale proposed above adds further complexity and is often difficult to achieve.

The Boa language and infrastructure were developed to solve these problems. We provide users a domain-specific language tailored for software repository mining and allow them to submit queries via our web-based interface. These queries are then automatically parallelized and executed on a cluster, analyzing a dataset containing almost 700,000 projects, history information from millions of revisions, millions of Java source files, and billions of AST nodes. The language also provides an easy-to-comprehend visitor syntax to ease writing source code mining queries. The underlying infrastructure contains several optimizations, including query optimizations that make single queries faster and a fusion optimization that groups queries from multiple users into a single query. The latter optimization is important because Boa is intended to be a shared, community resource. Finally, we show the potential benefit of Boa to the community by reproducing a previously published case study and performing a new case study on the adoption of Java language features.
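Boa's own query DSL is not reproduced here; as a rough analogue, its visitor-style traversal resembles Python's ast.NodeVisitor, shown below counting lambda expressions in a single file. In Boa, a query of this shape would be parallelized automatically across the whole corpus rather than run on one source string:

```python
import ast

class LambdaCounter(ast.NodeVisitor):
    """Visitor that counts lambda expressions, a toy stand-in for a
    language-feature-adoption query."""
    def __init__(self):
        self.count = 0

    def visit_Lambda(self, node):
        self.count += 1
        self.generic_visit(node)  # keep traversing into nested nodes

source = "f = lambda x: (lambda y: x + y)"
counter = LambdaCounter()
counter.visit(ast.parse(source))
print("lambdas found:", counter.count)  # 2, including the nested lambda
```

The appeal of the visitor syntax is that the query author only writes the per-node logic; traversal, parallelization, and corpus management stay in the infrastructure.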
Fire now, fire later: alarm-based systems for prescriptive process monitoring
Predictive process monitoring is a family of techniques to analyze events produced during the execution of a business process in order to predict the future state or the final outcome of running process instances. Existing techniques in this field are able to predict, at each step of a process instance, the likelihood that it will lead to an undesired outcome. These techniques, however, focus on generating predictions and do not prescribe when and how process workers should intervene to decrease the cost of undesired outcomes. This paper proposes a framework for prescriptive process monitoring, which extends predictive monitoring with the ability to generate alarms that trigger interventions to prevent an undesired outcome or mitigate its effect. The framework incorporates a parameterized cost model to assess the cost–benefit trade-off of generating alarms. We show how to optimize the generation of alarms given an event log of past process executions and a set of cost model parameters. The proposed approaches are empirically evaluated using a range of real-life event logs. The experimental results show that the net cost of undesired outcomes can be minimized by changing the threshold for generating alarms as the process instance progresses. Moreover, introducing delays for triggering alarms, instead of triggering them as soon as the probability of an undesired outcome exceeds a threshold, leads to lower net costs.
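The paper's parameterized cost model is richer than this, but the core trade-off behind alarm generation can be sketched as follows: fire an alarm only when the expected cost of intervening (intervention cost plus the residual outcome risk) undercuts the expected cost of doing nothing. All parameter values below are hypothetical:

```python
def should_alarm(p_undesired, c_outcome, c_intervention, effectiveness):
    """Fire an alarm iff the expected cost with intervention is lower.

    p_undesired:    predicted probability that the case ends undesirably
    c_outcome:      cost incurred if the undesired outcome happens
    c_intervention: cost of the intervention triggered by the alarm
    effectiveness:  fraction of the outcome cost the intervention prevents
    """
    cost_no_alarm = p_undesired * c_outcome
    cost_alarm = c_intervention + p_undesired * (1 - effectiveness) * c_outcome
    return cost_alarm < cost_no_alarm

# As a running case's predicted risk rises, the alarm eventually becomes worthwhile.
for p in (0.1, 0.2, 0.3, 0.5):
    print(p, should_alarm(p, c_outcome=100, c_intervention=10, effectiveness=0.4))
```

Under these parameters the alarm fires once the predicted probability exceeds c_intervention / (effectiveness × c_outcome) = 0.25; the paper's experiments explore how shifting such thresholds as the case progresses, and deliberately delaying alarms, affects the net cost.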