1,532 research outputs found
MACHS: Mitigating the Achilles Heel of the Cloud through High Availability and Performance-aware Solutions
Cloud computing is continuously growing as a business model for hosting information and communication technology applications. However, many concerns arise regarding the quality of service (QoS) offered by the cloud. One major challenge is the high availability (HA) of cloud-based applications. The key to achieving availability requirements is to develop an approach that is immune to cloud failures while minimizing the service level agreement (SLA) violations. To this end, this thesis addresses the HA of cloud-based applications from different perspectives. First, the thesis proposes a component’s HA-ware scheduler (CHASE) to manage the deployments of carrier-grade cloud applications while maximizing their HA and satisfying the QoS requirements. Second, a Stochastic Petri Net (SPN) model is proposed to capture the stochastic characteristics of cloud services and quantify the expected availability offered by an application deployment. The SPN model is then associated with an extensible policy-driven cloud scoring system that integrates other cloud challenges (i.e. green and cost concerns) with HA objectives. The proposed HA-aware solutions are extended to include a live virtual machine migration model that provides a trade-off between the migration time and the downtime while maintaining HA objective. Furthermore, the thesis proposes a generic input template for cloud simulators, GITS, to facilitate the creation of cloud scenarios while ensuring reusability, simplicity, and portability. Finally, an availability-aware CloudSim extension, ACE, is proposed. ACE extends CloudSim simulator with failure injection, computational paths, repair, failover, load balancing, and other availability-based modules
Developing a distributed electronic health-record store for India
The DIGHT project is addressing the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of the over one billion citizens of India
Evaluating Resilience of Cyber-Physical-Social Systems
Nowadays, protecting the network is not the only security concern. Still, in cyber security,
websites and servers are becoming more popular as targets due to the ease with which
they can be accessed when compared to communication networks. Another threat in
cyber physical social systems with human interactions is that they can be attacked and
manipulated not only by technical hacking through networks, but also by manipulating
people and stealing users’ credentials. Therefore, systems should be evaluated beyond cy-
ber security, which means measuring their resilience as a piece of evidence that a system
works properly under cyber-attacks or incidents. In that way, cyber resilience is increas-
ingly discussed and described as the capacity of a system to maintain state awareness for
detecting cyber-attacks. All the tasks for making a system resilient should proactively
maintain a safe level of operational normalcy through rapid system reconfiguration to
detect attacks that would impact system performance. In this work, we broadly studied
a new paradigm of cyber physical social systems and defined a uniform definition of it.
To overcome the complexity of evaluating cyber resilience, especially in these inhomo-
geneous systems, we proposed a framework including applying Attack Tree refinements
and Hierarchical Timed Coloured Petri Nets to model intruder and defender behaviors
and evaluate the impact of each action on the behavior and performance of the system.Hoje em dia, proteger a rede não é a única preocupação de segurança. Ainda assim, na
segurança cibernética, sites e servidores estão se tornando mais populares como alvos
devido à facilidade com que podem ser acessados quando comparados às redes de comu-
nicação. Outra ameaça em sistemas sociais ciberfisicos com interações humanas é que eles
podem ser atacados e manipulados não apenas por hackers técnicos através de redes, mas
também pela manipulação de pessoas e roubo de credenciais de utilizadores. Portanto, os
sistemas devem ser avaliados para além da segurança cibernética, o que significa medir
sua resiliência como uma evidência de que um sistema funciona adequadamente sob
ataques ou incidentes cibernéticos. Dessa forma, a resiliência cibernética é cada vez mais
discutida e descrita como a capacidade de um sistema manter a consciência do estado para
detectar ataques cibernéticos. Todas as tarefas para tornar um sistema resiliente devem
manter proativamente um nível seguro de normalidade operacional por meio da reconfi-
guração rápida do sistema para detectar ataques que afetariam o desempenho do sistema.
Neste trabalho, um novo paradigma de sistemas sociais ciberfisicos é amplamente estu-
dado e uma definição uniforme é proposta. Para superar a complexidade de avaliar a
resiliência cibernética, especialmente nesses sistemas não homogéneos, é proposta uma
estrutura que inclui a aplicação de refinamentos de Árvores de Ataque e Redes de Petri
Coloridas Temporizadas Hierárquicas para modelar comportamentos de invasores e de-
fensores e avaliar o impacto de cada ação no comportamento e desempenho do sistema
Perfomance Analysis and Resource Optimisation of Critical Systems Modelled by Petri Nets
Un sistema crítico debe cumplir con su misión a pesar de la presencia de problemas de seguridad. Este tipo de sistemas se suele desplegar en entornos heterogéneos, donde pueden ser objeto de intentos de intrusión, robo de información confidencial u otro tipo de ataques. Los sistemas, en general, tienen que ser rediseñados después de que ocurra un incidente de seguridad, lo que puede conducir a consecuencias graves, como el enorme costo de reimplementar o reprogramar todo el sistema, así como las posibles pérdidas económicas. Así, la seguridad ha de ser concebida como una parte integral del desarrollo de sistemas y como una necesidad singular de lo que el sistema debe realizar (es decir, un requisito no funcional del sistema). Así pues, al diseñar sistemas críticos es fundamental estudiar los ataques que se pueden producir y planificar cómo reaccionar frente a ellos, con el fin de mantener el cumplimiento de requerimientos funcionales y no funcionales del sistema. A pesar de que los problemas de seguridad se consideren, también es necesario tener en cuenta los costes incurridos para garantizar un determinado nivel de seguridad en sistemas críticos. De hecho, los costes de seguridad puede ser un factor muy relevante ya que puede abarcar diferentes dimensiones, como el presupuesto, el rendimiento y la fiabilidad. Muchos de estos sistemas críticos que incorporan técnicas de tolerancia a fallos (sistemas FT) para hacer frente a las cuestiones de seguridad son sistemas complejos, que utilizan recursos que pueden estar comprometidos (es decir, pueden fallar) por la activación de los fallos y/o errores provocados por posibles ataques. Estos sistemas pueden ser modelados como sistemas de eventos discretos donde los recursos son compartidos, también llamados sistemas de asignación de recursos. Esta tesis se centra en los sistemas FT con recursos compartidos modelados mediante redes de Petri (Petri nets, PN). Estos sistemas son generalmente tan grandes que el cálculo exacto de su rendimiento se convierte en una tarea de cálculo muy compleja, debido al problema de la explosión del espacio de estados. Como resultado de ello, una tarea que requiere una exploración exhaustiva en el espacio de estados es incomputable (en un plazo prudencial) para sistemas grandes. Las principales aportaciones de esta tesis son tres. Primero, se ofrecen diferentes modelos, usando el Lenguaje Unificado de Modelado (Unified Modelling Language, UML) y las redes de Petri, que ayudan a incorporar las cuestiones de seguridad y tolerancia a fallos en primer plano durante la fase de diseño de los sistemas, permitiendo así, por ejemplo, el análisis del compromiso entre seguridad y rendimiento. En segundo lugar, se proporcionan varios algoritmos para calcular el rendimiento (también bajo condiciones de fallo) mediante el cálculo de cotas de rendimiento superiores, evitando así el problema de la explosión del espacio de estados. Por último, se proporcionan algoritmos para calcular cómo compensar la degradación de rendimiento que se produce ante una situación inesperada en un sistema con tolerancia a fallos
Basic Research Needs for Geosciences: Facilitating 21st Century Energy Systems
Executive Summary
Serious challenges must be faced in this century as the world seeks to meet global energy needs and at the same time reduce emissions of greenhouse gases to the atmosphere. Even with a growing energy supply from alternative sources, fossil carbon resources will remain in heavy use and will generate large volumes of carbon dioxide (CO2). To reduce the atmospheric impact of this fossil energy use, it is necessary to capture and sequester a substantial fraction of the produced CO2. Subsurface geologic formations offer a potential location for long-term storage of the requisite large volumes of CO2. Nuclear energy resources could also reduce use of carbon-based fuels and CO2 generation, especially if nuclear energy capacity is greatly increased. Nuclear power generation results in spent nuclear fuel and other radioactive materials that also must be sequestered underground. Hence, regardless of technology choices, there will be major increases in the demand to store materials underground in large quantities, for long times, and with increasing efficiency and safety margins.
Rock formations are composed of complex natural materials and were not designed by nature as storage vaults. If new energy technologies are to be developed in a timely fashion while ensuring public safety, fundamental improvements are needed in our understanding of how these rock formations will perform as storage systems.
This report describes the scientific challenges associated with geologic sequestration of large volumes of carbon dioxide for hundreds of years, and also addresses the geoscientific aspects of safely storing nuclear waste materials for thousands to hundreds of thousands of years. The fundamental crosscutting challenge is to understand the properties and processes associated with complex and heterogeneous subsurface mineral assemblages comprising porous rock formations, and the equally complex fluids that may reside within and flow through those formations. The relevant physical and chemical interactions occur on spatial scales that range from those of atoms, molecules, and mineral surfaces, up to tens of kilometers, and time scales that range from picoseconds to millennia and longer. To predict with confidence the transport and fate of either CO2 or the various components of stored nuclear materials, we need to learn to better describe fundamental atomic, molecular, and biological processes, and to translate those microscale descriptions into macroscopic properties of materials and fluids. We also need fundamental advances in the ability to simulate multiscale systems as they are perturbed during sequestration activities and for very long times afterward, and to monitor those systems in real time with increasing spatial and temporal resolution. The ultimate objective is to predict accurately the performance of the subsurface fluid-rock storage systems, and to verify enough of the predicted performance with direct observations to build confidence that the systems will meet their design targets as well as environmental protection goals.
The report summarizes the results and conclusions of a Workshop on Basic Research Needs for Geosciences held in February 2007. Five panels met, resulting in four Panel Reports, three Grand Challenges, six Priority Research Directions, and three Crosscutting Research Issues. The Grand Challenges differ from the Priority Research Directions in that the former describe broader, long-term objectives while the latter are more focused
Recommended from our members
Enabling Resilience in Cyber-Physical-Human Water Infrastructures
Rapid urbanization and growth in urban populations have forced community-scale infrastructures (e.g., water, power and natural gas distribution systems, and transportation networks) to operate at their limits. Aging (and failing) infrastructures around the world are becoming increasingly vulnerable to operational degradation, extreme weather, natural disasters and cyber attacks/failures. These trends have wide-ranging socioeconomic consequences and raise public safety concerns. In this thesis, we introduce the notion of cyber-physical-human infrastructures (CPHIs) - smart community-scale infrastructures that bridge technologies with physical infrastructures and people. CPHIs are highly dynamic stochastic systems characterized by complex physical models that exhibit regionwide variability and uncertainty under disruptions. Failures in these distributed settings tend to be difficult to predict and estimate, and expensive to repair. Real-time fault identification is crucial to ensure continuity of lifeline services to customers at adequate levels of quality. Emerging smart community technologies have the potential to transform our failing infrastructures into robust and resilient future CPHIs.In this thesis, we explore one such CPHI - community water infrastructures. Current urban water infrastructures, that are decades (sometimes over a 100 years) old, encompass diverse geophysical regimes. Water stress concerns include the scarcity of supply and an increase in demand due to urbanization. Deterioration and damage to the infrastructure can disrupt water service; contamination events can result in economic and public health consequences. Unfortunately, little investment has gone into modernizing this key lifeline.To enhance the resilience of water systems, we propose an integrated middleware framework for quick and accurate identification of failures in complex water networks that exhibit uncertain behavior. Our proposed approach integrates IoT-based sensing, domain-specific models and simulations with machine learning methods to identify failures (pipe breaks, contamination events). The composition of techniques results in cost-accuracy-latency tradeoffs in fault identification, inherent in CPHIs due to the constraints imposed by cyber components, physical mechanics and human operators. Three key resilience problems are addressed in this thesis; isolation of multiple faults under a small number of failures, state estimation of the water systems under extreme events such as earthquakes, and contaminant source identification in water networks using human-in-the-loop based sensing. By working with real world water agencies (WSSC, DC and LADWP, LA), we first develop an understanding of operations of water CPHI systems. We design and implement a sensor-simulation-data integration framework AquaSCALE, and apply it to localize multiple concurrent pipe failures. We use a mixture of infrastructure measurements (i.e., historical and live water pressure/flow), environmental data (i.e., weather) and human inputs (i.e., twitter feeds), combined and enhanced with the domain model and supervised learning techniques to locate multiple failures at fine levels of granularity (individual pipeline level) with detection time reduced by orders of magnitude (from hours/days to minutes). We next consider the resilience of water infrastructures under extreme events (i.e., earthquakes) - the challenge here is the lack of apriori knowledge and the increased number and severity of damages to infrastructures. We present a graphical model based approach for efficient online state estimation, where the offline graph factorization partitions a given network into disjoint subgraphs, and the belief propagation based inference is executed on-the-fly in a distributed manner on those subgraphs. Our proposed approach can isolate 80% broken pipes and 99% loss-of-service to end-users during an earthquake.Finally, we address issues of water quality - today this is a human-in-the-loop process where operators need to gather water samples for lab tests. We incorporate the necessary abstractions with event processing methods into a workflow, which iteratively selects and refines the set of potential failure points via human-driven grab sampling. Our approach utilizes Hidden Markov Model based representations for event inference, along with reinforcement learning methods for further refining event locations and reducing the cost of human efforts.The proposed techniques are integrated into a middleware architecture, which enables components to communicate/collaborate with one another. We validate our approaches through a prototype implementation with multiple real-world water networks, supply-demand patterns from water utilities and policies set by the U.S. EPA. While our focus here is on water infrastructures in a community, the developed end-to-end solution is applicable to other infrastructures and community services which operate in disruptive and resource-constrained environments
Data Mining
The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising and leading-edge technology for mining large volumes of data, looking for hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, trend or evolution prediction, and much more in fields such as science, medicine, economics, engineering, computers, and even business analytics. This book presents basic concepts, ideas, and research in data mining
Bayesian Prognostic Framework for High-Availability Clusters
Critical services from domains as diverse as finance, manufacturing and healthcare are often delivered by complex enterprise applications (EAs). High-availability clusters (HACs) are software-managed IT infrastructures that enable these EAs to operate with minimum downtime. To that end, HACs monitor the health of EA layers (e.g., application servers and databases) and resources (i.e., components), and attempt to reinitialise or restart failed resources swiftly. When this is unsuccessful, HACs try to failover (i.e., relocate) the resource group to which the failed resource belongs to another server. If the resource group failover is also unsuccessful, or when a system-wide critical failure occurs, HACs initiate a complete system failover.
Despite the availability of multiple commercial and open-source HAC solutions, these HACs (i) disregard important sources of historical and runtime information, and (ii) have limited reasoning capabilities. Therefore, they may conservatively perform unnecessary resource group or system failovers or delay justified failovers for longer than necessary.
This thesis introduces the first HAC taxonomy, uses it to carry out an extensive survey of current HAC solutions, and develops a novel Bayesian prognostic (BP) framework that addresses the significant HAC limitations that are mentioned above and are identified by the survey. The BP framework comprises four \emph{modules}. The first module is a technique for modelling high availability using a combination of established and new HAC characteristics. The second is a suite of methods for obtaining and maintaining the information required by the other modules. The third is a HAC-independent Bayesian decision network (BDN) that predicts whether resource failures can be managed locally (i.e., without failovers). The fourth is a method for constructing a HAC-specific Bayesian network for the fast prediction of resource group and system failures. Used together, these modules reduce the downtime of HAC-protected EAs significantly. The experiments presented in this thesis show that the BP framework can deliver downtimes between 5.5 and 7.9 times smaller than those obtained with an established open-source HAC
Resilience of critical structures, infrastructure, and communities
In recent years, the concept of resilience has been introduced to the field of engineering as it relates to disaster mitigation and management. However, the built environment is only one element that supports community functionality. Maintaining community functionality during and after a disaster, defined as resilience, is influenced by multiple components. This report summarizes the research activities of the first two years of an ongoing collaboration between the Politecnico di Torino and the University of California, Berkeley, in the field of disaster resilience. Chapter 1 focuses on the economic dimension of disaster resilience with an application to the San Francisco Bay Area; Chapter 2 analyzes the option of using base-isolation systems to improve the resilience of hospitals and school buildings; Chapter 3 investigates the possibility to adopt discrete event simulation models and a meta-model to measure the resilience of the emergency department of a hospital; Chapter 4 applies the meta-model developed in Chapter 3 to the hospital network in the San Francisco Bay Area, showing the potential of the model for design purposes Chapter 5 uses a questionnaire combined with factorial analysis to evaluate the resilience of a hospital; Chapter 6 applies the concept of agent-based models to analyze the performance of socio-technical networks during an emergency. Two applications are shown: a museum and a train station; Chapter 7 defines restoration fragility functions as tools to measure uncertainties in the restoration process; and Chapter 8 focuses on modeling infrastructure interdependencies using temporal networks at different spatial scales
- …