
    On-the-fly tracing for data-centric computing: parallelization, workflow and applications

    As data-centric computing becomes the trend in science and engineering, more and more hardware systems, as well as middleware frameworks, are emerging to handle the intensive computations associated with big data. At the programming level, it is crucial to have corresponding programming paradigms for dealing with big data. Although MapReduce is now a well-known programming model for data-centric computing, in which parallelization is replaced entirely by partitioning the computing task through data, not all programs, particularly those using statistical computing and data mining algorithms with interdependence, can be refactored in such a fashion. On the other hand, many traditional automatic parallelization methods emphasize formalism and may not achieve optimal performance with the given limited computing resources. In this work we propose a cross-platform programming paradigm, called on-the-fly data tracing, to provide source-to-source transformation, where the same framework also provides workflow optimization for larger applications. Using a big-data approximation, computations related to large-scale data input are identified in the code and workflow, and a simplified core dependence graph is built based on the computational load, taking big data into account. The code can then be partitioned into sections for efficient parallelization; at the workflow level, optimization can be performed by adjusting the scheduling for big-data considerations, including the I/O performance of the machine. Regarding each unit in both source code and workflow as a model, this framework enables model-based parallel programming that matches the available computing resources. The dissertation presents the techniques used in model-based parallel programming, the design of the software framework for both parallelization and workflow optimization, and its implementations in multiple programming languages. Two sets of experiments validate the framework: i) benchmarking of parallelization speed-up using typical examples in data analysis and machine learning (e.g. naive Bayes, k-means), and ii) three real-world applications in data-centric computing: pattern detection from hurricane and storm surge simulations, road traffic flow prediction, and text mining from social media data. The applications illustrate how to build scalable workflows with the framework, along with the resulting performance enhancements.
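
    For illustration, the partition-through-data idea that the abstract contrasts with classical parallelization can be sketched as follows. This is a minimal, hypothetical example (the function names and the choice of a k-means assignment step are illustrative, not the dissertation's actual framework): the only cross-partition dependence is the small shared centroid array, so the big-data input can be split and the sections processed in parallel.

```python
from multiprocessing import Pool

import numpy as np


def assign_chunk(args):
    """Assign each point in one data chunk to its nearest centroid."""
    chunk, centroids = args
    # (n_points, k) distances; each chunk depends only on its local data
    # plus the small shared centroid array, so chunks are independent.
    dists = np.linalg.norm(chunk[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def parallel_assign(data, centroids, n_workers=4):
    """Partition the big-data input, then parallelize the independent parts."""
    chunks = np.array_split(data, n_workers)
    with Pool(n_workers) as pool:
        parts = pool.map(assign_chunk, [(c, centroids) for c in chunks])
    return np.concatenate(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(10_000, 2))
    centroids = rng.normal(size=(3, 2))
    print(parallel_assign(data, centroids)[:10])
```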

    Assisting Tourism Supply Chain Performance in Thailand through Big Data Analytics: Moderating Role of IT Capability

    Tourism is the oxygen of Thailand’s economy and has also geared up all related sectors. The country is making substantial efforts to achieve sustainable tourism supply chain performance through Big Data Analytics (BDA); without BDA, moving such a large industry towards sustainability would not have been possible. However, IT capability appears to be a missing piece in some cases, which can decrease the effectiveness of the BDA dimensions. This study analyzes the impact of BDA planning, investment, coordination, and control on sustainable tourism development, with IT capability as a moderator. The sample consisted of hotels and hospitality industry units that already have a BDA system in the operational phase; their employees were surveyed through a questionnaire, and the responses were analyzed in SPSS and AMOS for hypothesis testing. The results show that the BDA dimensions have a significant positive impact on sustainable tourism supply chain performance, and that IT capability is a significant positive moderator. No previous study has taken IT capability as a moderator; this original contribution has long-lasting implications for the tourism and manufacturing industries, encouraging organizations to build IT capability in order to achieve sustainability.
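
    As a generic illustration of the moderation analysis described above, the sketch below tests a moderator by adding an interaction term to a regression. All variable names and data here are synthetic assumptions; the study itself analyzed survey responses in SPSS and AMOS.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
bda_planning = rng.normal(size=n)    # predictor (one BDA dimension)
it_capability = rng.normal(size=n)   # hypothesized moderator
# Synthetic outcome with a positive interaction effect baked in:
performance = (0.5 * bda_planning + 0.3 * it_capability
               + 0.4 * bda_planning * it_capability + rng.normal(size=n))

X = sm.add_constant(np.column_stack(
    [bda_planning, it_capability, bda_planning * it_capability]))
model = sm.OLS(performance, X).fit()
# A significant coefficient on the interaction term indicates moderation:
# the effect of the BDA dimension depends on the level of IT capability.
print(model.summary())
```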

    Decentralized Convex Optimization for Wireless Sensor Networks

    Many real-world applications arising in domains such as large-scale machine learning and wired and wireless networks can be formulated as distributed linear least-squares problems over a large network. These problems often have their data naturally distributed. For instance, in applications such as seismic imaging and the smart grid, the sensors are geographically distributed, and the current algorithms for analyzing these data rely on a centralized approach: the data is either gathered manually or relayed by expensive broadband stations, and then processed at a base station. This approach is time-consuming (weeks to months) and hazardous, as the task involves manual data gathering in extreme conditions. To obtain the solution in real time, we require decentralized algorithms that do not rely on a fusion center, cluster heads, or multi-hop communication. In this thesis, we propose several decentralized least-squares optimization algorithms that are suitable for performing real-time seismic imaging in a sensor network. The algorithms are evaluated and tested using both synthetic and real data traces. The results validate that our distributed algorithms are able to obtain a satisfactory image, similar to centralized computation, under network resource constraints, while distributing the computational burden to the sensor nodes.
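
    A minimal sketch of one fusion-center-free method in this family is decentralized gradient descent (DGD) with neighbor averaging, shown below for distributed linear least squares over a ring network. This is a generic illustration, not necessarily the thesis's specific algorithms; the topology, mixing weights, and step size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_nodes, m, d = 4, 25, 3

# Each node holds only a local slice (A_i, b_i) of the global problem
# min_x sum_i ||A_i x - b_i||^2; no node ever sees the full data.
A = [rng.normal(size=(m, d)) for _ in range(n_nodes)]
x_true = rng.normal(size=d)
b = [Ai @ x_true + 0.01 * rng.normal(size=m) for Ai in A]

# Doubly stochastic mixing weights for a ring topology: each node talks
# only to its two neighbors (no fusion center, no multi-hop routing).
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

x = np.zeros((n_nodes, d))  # one local estimate per node
alpha = 0.01                # step size (assumed, not tuned)
for _ in range(2000):
    grads = np.stack([A[i].T @ (A[i] @ x[i] - b[i]) for i in range(n_nodes)])
    x = W @ x - alpha * grads  # neighbor averaging + local gradient step

print("max deviation from x_true:", np.abs(x - x_true).max())
```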

    Probabilistic Inference Using Partitioned Bayesian Networks: Introducing a Compositional Framework

    Probability theory offers an intuitive and formally sound way to reason in situations that involve uncertainty. The automation of probabilistic reasoning has many applications, such as predicting future events or prognostics, providing decision support, action planning under uncertainty, dealing with multiple uncertain measurements, making a diagnosis, and so forth. Bayesian networks in particular have been used to represent probability distributions that model the various applications of uncertainty reasoning. However, present-day automated reasoning approaches involving uncertainty struggle when models increase in size and complexity to fit real-world applications. In this thesis, we explore and extend a state-of-the-art automated reasoning method, called inference by Weighted Model Counting (WMC), when applied to increasingly complex Bayesian network models. WMC consists of two distinct phases: compilation and inference. The computational cost of compilation has limited the applicability of WMC. To overcome this limitation we have proposed theoretical and practical solutions that have been tested extensively in empirical studies using real-world Bayesian network models. We have proposed a weighted variant of OBDDs, called Weighted Positive Binary Decision Diagrams (WPBDD), which in turn is based on the new notion of positive Shannon decomposition. WPBDDs are particularly well suited to represent discrete probabilistic models, and their conciseness leads to a reduction in the cost of probabilistic inference. We have introduced Compositional Weighted Model Counting (CWMC), a language-agnostic framework for probabilistic inference that partitions a Bayesian network into subproblems. These subproblems are then compiled and subsequently composed in order to perform inference. This approach significantly reduces the cost of compilation, yet increases the cost of inference. The best results are obtained by seeking a partitioning that allows compilation to (barely) become feasible, but no more, as compilation cost can be amortized over multiple inference queries. The theoretical concepts have been implemented in a readily available open-source tool called ParaGnosis. Further implementation improvements have been found through parallelism, by exploiting independencies that are introduced by CWMC. Combined, the proposed methods push the boundaries of WMC, allowing this state-of-the-art method to be used on much larger models than before.
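
    To make the WMC semantics concrete, the toy sketch below answers a query on a two-node Bayesian network by brute-force weighted model counting. Real WMC compiles the network into a decision diagram (such as the WPBDDs introduced above) rather than enumerating; the network and its numbers are purely illustrative.

```python
from itertools import product

# CPTs of a tiny network A -> B: P(A) and P(B | A).
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}


def weighted_model_count(evidence):
    """Sum the weights of all assignments consistent with the evidence."""
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        assignment = {"a": a, "b": b}
        if any(assignment[var] != val for var, val in evidence.items()):
            continue  # model contradicts the evidence; weight not counted
        total += p_a[a] * p_b_given_a[(a, b)]
    return total


# P(b=1) = WMC(b=1) / WMC({}); the denominator is 1 for a proper network.
print(weighted_model_count({"b": 1}) / weighted_model_count({}))
# -> 0.7*0.1 + 0.3*0.8 = 0.31
```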

    Information sharing in supply chains: a review of risks and opportunities using the Systematic Literature Network Analysis (SLNA)

    Purpose – The purpose of this paper is to identify and discuss the most important research areas on information sharing in supply chains and related risks, taking into account their evolution over time. This paper sheds light on what is happening today and what the trajectories for the future are, with particular respect to the implications for supply chain management. Design/Methodology/Approach – The dynamic literature review method called Systematic Literature Network Analysis (SLNA) was adopted. It combines the Systematic Literature Review approach and bibliographic network analyses, and it relies on objective measures and algorithms to perform quantitative literature-based detection of emerging topics. Findings – The focus of the literature seems to be on threats internal to the extended supply chain rather than external attacks, such as viruses, traditionally related to information technology (IT). The main arising risk appears to be the intentional or non-intentional leakage of information. Papers also analyse the implications for information sharing coming from "soft" factors such as trust and collaboration among supply chain partners. Opportunities are also highlighted, including how information sharing can be leveraged to confront disruptions and increase resilience. Research limitations/implications – The adopted methodology provides an original perspective on the investigated topic, i.e. how information sharing in supply chains and related risks are evolving over time due to the turbulent advances in technology. Practical implications – Emergent and highly critical risks related to information sharing are highlighted to support the design of supply chain risk strategies. Critical areas for the development of "beyond-the-dyad" initiatives to manage information sharing risks also emerge. Opportunities coming from information sharing that are less known and exploited by companies are provided. Originality/value – This study focuses on the supply chain perspective rather than the traditional IT-based view of information sharing. According to this perspective, this study provides a dynamic representation of the literature on the investigated topic. This is an important contribution to the topic of information sharing in supply chains, which is continuously evolving and shaping new supply chain models.
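
    As a hedged illustration of the bibliographic-network half of an SLNA-style analysis, the sketch below builds a small citation graph and extracts central papers and clusters. The paper identifiers are invented, and this is a generic example of the approach, not the authors' actual pipeline.

```python
import networkx as nx

# Directed citation network: an edge u -> v means paper u cites paper v.
citations = [("P3", "P1"), ("P3", "P2"), ("P4", "P3"),
             ("P5", "P3"), ("P5", "P4"), ("P6", "P5")]
G = nx.DiGraph(citations)

# In-degree centrality as a crude proxy for a paper's influence in the set;
# objective measures like this drive the quantitative topic detection.
influence = sorted(nx.in_degree_centrality(G).items(),
                   key=lambda kv: -kv[1])
print("most cited within the set:", influence[:3])

# Connected components of the undirected projection give candidate
# research areas; topic analysis would then inspect each cluster.
for component in nx.connected_components(G.to_undirected()):
    print("cluster:", sorted(component))
```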

    Logging practices in software engineering: A systematic mapping study

    Background: Logging practices provide the ability to record valuable runtime information of software systems to support operations tasks such as service monitoring and troubleshooting. However, current logging practices face common challenges. On the one hand, although the importance of logging practices has been broadly recognized, most of them are still conducted in an arbitrary or ad-hoc manner, ending up with questionable or inadequate support for performing these tasks. On the other hand, considerable research effort has been devoted to logging practices; however, few of the proposed techniques or methods have been widely adopted in industry. Objective: This study aims to establish a comprehensive understanding of the state of research on logging practices, with a focus on unveiling possible problems and gaps that shed light on potential future research directions. Method: We carried out a systematic mapping study on logging practices with 56 primary studies. Results: This study provides a holistic report of the existing research on logging practices by systematically synthesizing and analyzing the focus and inter-relationships of the existing research in terms of issues, research topics, and solution approaches. Using 3W1H (Why to log, Where to log, What to log, and How well is the logging) as the categorization standard, we find that: (1) the best-known issues in logging practices have been repeatedly investigated; (2) the issues are often studied separately, without considering their intricate relationships; (3) the Where and What questions have attracted the majority of research attention, while little research effort has been made on the Why and How well questions; and (4) the relationships between issues, research topics, and approaches regarding logging practices appear many-to-many, which indicates a lack of profound understanding of the issues in practice and how they should be appropriately tackled. Conclusions: This study indicates a need to advance the state of research on logging practices. For example, more research effort should be invested in why to log, to set the anchor of logging practices, as well as in how well is the logging, to close the loop. In addition, a holistic process perspective should be taken into account in both research on and adoption of logging practices.
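
    As an illustration of the 3W1H framing (not an example from the study), the sketch below annotates one logging decision with the why, where, what, and how-well questions. The payment-gateway scenario and all names are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("payment")


def submit_to_gateway(order_id, amount):
    """Stand-in for a real payment-gateway client (hypothetical)."""
    raise TimeoutError("gateway did not respond")


def charge(order_id, amount, retries=3):
    for attempt in range(1, retries + 1):
        try:
            return submit_to_gateway(order_id, amount)
        except TimeoutError:
            # Where: the failure-handling branch. What: the identifiers and
            # state that make the event actionable, never raw payment data.
            logger.warning("gateway timeout order_id=%s attempt=%d/%d",
                           order_id, attempt, retries)
    # Why: operators must know retries were exhausted, not merely that one
    # call failed. How well: leveled, structured, grep-friendly fields.
    logger.error("charge failed after %d retries order_id=%s",
                 retries, order_id)
    return None


charge("A42", 9.99)
```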

    TOWARDS GENERIC SYSTEM OBSERVATION MANAGEMENT

    One of the biggest challenges in computer science is to produce correct computer systems. One way of ensuring system correctness is to use formal modeling and validation techniques during design. This approach is compulsory for critical systems but difficult and expensive for most computer systems. The alternative consists in observing and analyzing a system's behavior during execution. In this thesis, I present my research on system observation, covering three main aspects of the problem: how computer systems should be observed, how observations are used to debug nondeterministic systems, and how observations can be managed in an open, flexible, and reproducible way.

    Location estimation and collective inference in indoor spaces using smartphones

    In the last decade, smart, innovative services based on indoor localization have become very popular in public spaces (retail spaces, malls, museums, and warehouses). Techniques for inferring indoor location range from state-of-the-art RSSI methods to more accurate CSI methods. The pandemic has raised the important challenge of determining whether a pair of individuals are "social distancing," i.e., separated by more than 6 ft. Most solutions have used "presence" (whether one device can hear another), which is a poor proxy for distance, since devices can be heard well beyond the 6 ft social-distancing radius and across aisles and walls. Here we ask the key question: what must be added to current indoor localization solutions so that they can easily be deployed in scenarios such as reliable contact tracing? We identify three main limitations: deployability, accuracy, and privacy. Location solutions need to deploy on ubiquitous devices like smartphones, they should be accurate under different environmental conditions, and they need to respect a person's privacy settings. Our main contributions are twofold. First, we propose a new statistical feature for localization, Packet Reception Probability (PRP), which correlates with distance and differs from other physical measures of distance such as CSI or RSSI; PRP can easily be deployed on smartphones (unlike CSI) and is more accurate than RSSI. Second, we develop a crowd tool to audit the level of location surveillance in a space, which is the first step towards achieving privacy. Specifically, we first solve a location estimation problem with the help of infrastructure devices (mainly Bluetooth Low Energy, or BLE, devices); BLE has turned out to be a key contact tracing technology during the pandemic. We identified three fundamental limitations of BLE RSSI: biased RSSI estimates due to packet loss, mean RSSI de-correlated with distance due to high packet loss in BLE, and the well-known multipath effects. We built the new localization feature, PRP, to solve the packet-loss problem in RSSI. PRP measures the probability that a receiver successfully receives packets from the transmitter, and we have shown through empirical experiments that PRP encodes distance. We also incorporated a new stack-based model of multipath into our framework. We evaluated B-PRP in two real-world public places, an academic library and a retail store. PRP gives significantly lower errors than RSSI, and fusing PRP with RSSI further improves the overall localization accuracy. Next, we solved a peer-to-peer distance estimation problem that uses minimal infrastructure. Most apps, such as aarogya setu and bluetrace, have solved peer-to-peer distances through the presence of BLE signals. Apps that rely on pairwise measurements like RSSI suffer from latent factors such as the device's position on the human body, the orientation of the people carrying the devices, and environmental multipath effects. We propose two solutions: using known distances, and collaboration to solve distances more robustly. First, if a few infrastructure devices are installed at known locations in an environment, we can make additional measurements with them and use the known inter-device distances to constrain the unknown distances in a triangle-inequality framework. Second, in an outdoor environment where infrastructure devices cannot be installed, people can collaborate to jointly constrain many unknown distances. Finally, we solve a collaborative tracking estimation problem in which people audit the properties of the localization infrastructure. While people want services, they do not want to be surveilled, and people using an indoor location system do not know the current level of surveillance; the granularity of the location information that the system collects about people depends on the nature of the infrastructure. Our system, CrowdEstimator, provides a tool for people to harness their collective power and collect traces for inferring the level of surveillance. We further propose the insight that surveillance is not a single number but a spatial map, and we introduce active learning algorithms to infer all parts of the spatial map with uniform accuracy. Auditing the location infrastructure is the first step towards the bigger goal of declarative privacy, where a person can specify their comfortable level of surveillance.
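
    The PRP feature lends itself to a compact sketch: as described above, it is the fraction of transmitted packets that the receiver actually hears in an observation window. The function below is an illustrative reconstruction; the parameter names and example values (advertising interval, window length) are assumptions, not the dissertation's implementation.

```python
def packet_reception_probability(rx_timestamps, window_s, adv_interval_s):
    """Estimate PRP = packets received / packets the transmitter sent.

    rx_timestamps  -- times (s) at which packets were heard in the window
    window_s       -- observation window length in seconds
    adv_interval_s -- transmitter's advertising interval in seconds
    """
    expected = max(1, round(window_s / adv_interval_s))
    return min(1.0, len(rx_timestamps) / expected)


# Example: a BLE beacon advertising every 100 ms over a 2 s window sends
# about 20 packets; hearing only 6 of them gives PRP = 0.3. Unlike mean
# RSSI, this count-based feature is not biased by the packets that were
# lost, which is exactly the failure mode described above.
print(packet_reception_probability([0.1 * i for i in range(6)], 2.0, 0.1))
```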