1,741 research outputs found

    A hybrid algorithm for Bayesian network structure learning with application to multi-label learning

    Get PDF
    We present a novel hybrid algorithm for Bayesian network structure learning, called H2PC. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm is based on divide-and-conquer constraint-based subroutines to learn the local structure around a target variable. We conduct two series of experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning. First, we use eight well-known Bayesian network benchmarks with various data sizes to assess the quality of the learned structure returned by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in terms of goodness of fit to new data and quality of the network structure with respect to the true dependence structure of the data. Second, we investigate H2PC's ability to solve the multi-label learning problem. We provide theoretical results to characterize and identify graphically the so-called minimal label powersets that appear as irreducible factors in the joint distribution under the faithfulness condition. The multi-label learning problem is then decomposed into a series of multi-class classification problems, where each multi-class variable encodes a label powerset. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy over ten multi-label data sets covering different application domains. Overall, our experiments support the conclusions that local structural learning with H2PC in the form of local neighborhood induction is a theoretically well-motivated and empirically effective learning framework that is well suited to multi-label learning. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available.Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other author

    A Hierarchical Filtering-Based Monitoring Architecture for Large-scale Distributed Systems

    Get PDF
    On-line monitoring is essential for observing and improving the reliability and performance of large-scale distributed (LSD) systems. In an LSD environment, large numbers of events are generated by system components during their execution and interaction with external objects (e.g. users or processes). These events must be monitored to accurately determine the run-time behavior of an LSD system and to obtain status information that is required for debugging and steering applications. However, the manner in which events are generated in an LSD system is complex and represents a number of challenges for an on-line monitoring system. Correlated events axe generated concurrently and can occur at multiple locations distributed throughout the environment. This makes monitoring an intricate task and complicates the management decision process. Furthermore, the large number of entities and the geographical distribution inherent with LSD systems increases the difficulty of addressing traditional issues, such as performance bottlenecks, scalability, and application perturbation. This dissertation proposes a scalable, high-performance, dynamic, flexible and non-intrusive monitoring architecture for LSD systems. The resulting architecture detects and classifies interesting primitive and composite events and performs either a corrective or steering action. When appropriate, information is disseminated to management applications, such as reactive control and debugging tools. The monitoring architecture employs a novel hierarchical event filtering approach that distributes the monitoring load and limits event propagation. This significantly improves scalability and performance while minimizing the monitoring intrusiveness. The architecture provides dynamic monitoring capabilities through: subscription policies that enable applications developers to add, delete and modify monitoring demands on-the-fly, an adaptable configuration that accommodates environmental changes, and a programmable environment that facilitates development of self-directed monitoring tasks. Increased flexibility is achieved through a declarative and comprehensive monitoring language, a simple code instrumentation process, and automated monitoring administration. These elements substantially relieve the burden imposed by using on-line distributed monitoring systems. In addition, the monitoring system provides techniques to manage the trade-offs between various monitoring objectives. The proposed solution offers improvements over related works by presenting a comprehensive architecture that considers the requirements and implied objectives for monitoring large-scale distributed systems. This architecture is referred to as the HiFi monitoring system. To demonstrate effectiveness at debugging and steering LSD systems, the HiFi monitoring system has been implemented at the Old Dominion University for monitoring the Interactive Remote Instruction (IRI) system. The results from this case study validate that the HiFi system achieves the objectives outlined in this thesis

    Runtime Adaptation of Scientific Service Workflows

    Get PDF
    Software landscapes are rather subject to change than being complete after having been built. Changes may be caused by a modified customer behavior, the shift to new hardware resources, or otherwise changed requirements. In such situations, several challenges arise. New architectural models have to be designed and implemented, existing software has to be integrated, and, finally, the new software has to be deployed, monitored, and, where appropriate, optimized during runtime under realistic usage scenarios. All of these situations often demand manual intervention, which causes them to be error-prone. This thesis addresses these types of runtime adaptation. Based on service-oriented architectures, an environment is developed that enables the integration of existing software (i.e., the wrapping of legacy software as web services). A workflow modeling tool that aims at an easy-to-use approach by separating the role of the workflow expert and the role of the domain expert. After the development of workflows, tools that observe the executing infrastructure and perform automatic scale-in and scale-out operations are presented. Infrastructure-as-a-Service providers are used to scale the infrastructure in a transparent and cost-efficient way. The deployment of necessary middleware tools is automatically done. The use of a distributed infrastructure can lead to communication problems. In order to keep workflows robust, these exceptional cases need to treated. But, in this way, the process logic of a workflow gets mixed up and bloated with infrastructural details, which yields an increase in its complexity. In this work, a module is presented that can deal automatically with infrastructural faults and that thereby allows to keep the separation of these two layers. When services or their components are hosted in a distributed environment, some requirements need to be addressed at each service separately. Although techniques as object-oriented programming or the usage of design patterns like the interceptor pattern ease the adaptation of service behavior or structures. Still, these methods require to modify the configuration or the implementation of each individual service. On the other side, aspect-oriented programming allows to weave functionality into existing code even without having its source. Since the functionality needs to be woven into the code, it depends on the specific implementation. In a service-oriented architecture, where the implementation of a service is unknown, this approach clearly has its limitations. The request/response aspects presented in this thesis overcome this obstacle and provide a SOA-compliant and new methods to weave functionality into the communication layer of web services. The main contributions of this thesis are the following: Shifting towards a service-oriented architecture: The generic and extensible Legacy Code Description Language and the corresponding framework allow to wrap existing software, e.g., as web services, which afterwards can be composed into a workflow by SimpleBPEL without overburdening the domain expert with technical details that are indeed handled by a workflow expert. Runtime adaption: Based on the standardized Business Process Execution Language an automatic scheduling approach is presented that monitors all used resources and is able to automatically provision new machines in case a scale-out becomes necessary. If the resource's load drops, e.g., because of less workflow executions, a scale-in is also automatically performed. The scheduling algorithm takes the data transfer between the services into account in order to prevent scheduling allocations that eventually increase the workflow's makespan due to unnecessary or disadvantageous data transfers. Furthermore, a multi-objective scheduling algorithm that is based on a genetic algorithm is able to additionally consider cost, in a way that a user can define her own preferences rising from optimized execution times of a workflow and minimized costs. Possible communication errors are automatically detected and, according to certain constraints, corrected. Adaptation of communication: The presented request/response aspects allow to weave functionality into the communication of web services. By defining a pointcut language that only relies on the exchanged documents, the implementation of services must neither be known nor be available. The weaving process itself is modeled using web services. In this way, the concept of request/response aspects is naturally embedded into a service-oriented architecture

    Segmentation of Moving Object with Uncovered Background, Temporary Poses and GMOB

    Get PDF
    AbstractVideo has to be segmented into objects for content-based processing. A number of video object segmentation algorithms have been proposed such as semiautomatic and automatic. Semiautomatic methods adds burden to users and also not suitable for some applications. Automatic segmentation systems are still a challenge, although they are required by many applications. The proposed work aims at contributing to identify the gaps that are present in the current segmentation system and also to give the possible solutions to overcome those gaps so that the accurate and efficient video segmentation system can be developed. The proposed system aims to resolve the issue of uncovered background, Temporary poses and Global motion of background

    Multi modal multi-semantic image retrieval

    Get PDF
    PhDThe rapid growth in the volume of visual information, e.g. image, and video can overwhelm users’ ability to find and access the specific visual information of interest to them. In recent years, ontology knowledge-based (KB) image information retrieval techniques have been adopted into in order to attempt to extract knowledge from these images, enhancing the retrieval performance. A KB framework is presented to promote semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared that supports multi-semantics (polysemy) for concepts. The framework builds up an effective knowledge base pertaining to a domain specific image collection, e.g. sports, and is able to disambiguate and assign high level semantics to ‘unannotated’ images. Local feature analysis of visual content, namely using Scale Invariant Feature Transform (SIFT) descriptors, have been deployed in the ‘Bag of Visual Words’ model (BVW) as an effective method to represent visual content information and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, as they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based upon the use of an unstructured visual word and upon a (structured) hierarchical ontology KB model. The structural model facilitates the disambiguation of unstructured visual words and a more effective classification of visual content, compared to a vector space model, through exploiting local conceptual structures and their relationships. The key contributions of this framework in using local features for image representation include: first, a method to generate visual words using the semantic local adaptive clustering (SLAC) algorithm which takes term weight and spatial locations of keypoints into account. Consequently, the semantic information is preserved. Second a technique is used to detect the domain specific ‘non-informative visual words’ which are ineffective at representing the content of visual data and degrade its categorisation ability. Third, a method to combine an ontology model with xi a visual word model to resolve synonym (visual heterogeneity) and polysemy problems, is proposed. The experimental results show that this approach can discover semantically meaningful visual content descriptions and recognise specific events, e.g., sports events, depicted in images efficiently. Since discovering the semantics of an image is an extremely challenging problem, one promising approach to enhance visual content interpretation is to use any associated textual information that accompanies an image, as a cue to predict the meaning of an image, by transforming this textual information into a structured annotation for an image e.g. using XML, RDF, OWL or MPEG-7. Although, text and image are distinct types of information representation and modality, there are some strong, invariant, implicit, connections between images and any accompanying text information. Semantic analysis of image captions can be used by image retrieval systems to retrieve selected images more precisely. To do this, a Natural Language Processing (NLP) is exploited firstly in order to extract concepts from image captions. Next, an ontology-based knowledge model is deployed in order to resolve natural language ambiguities. To deal with the accompanying text information, two methods to extract knowledge from textual information have been proposed. First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of LSI in relation to a domain-specific ontology-based knowledge model enables the combined framework to tolerate ambiguities and variations (incompleteness) of metadata. The use of the ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and thus leverage these to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and leads to narrowing of the semantic gap between lower level machinederived and higher level human-understandable conceptualisation

    Toxicity in Evolving Twitter Topics - Employing a novel Dynamic Topic volution Model (DyTEM) onTwitter data

    Get PDF
    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data ScienceThis thesis presents an extensive investigation into the evolution of topics and their association with speech toxicity on Twitter, based on a large corpus of tweets, providing crucial insights for monitoring online discourse and potentially informing interventions to combat toxic behavior in digital communities. A Dynamic Topic Evolution Model (DyTEM) is introduced, constructed by combining static Topic Modelling techniques and sentence embeddings through the state-of-the-art sentence transformer, sBERT. The DyTEM, tested and validated on a substantial sample of tweets, is represented as a directed graph, encapsulating the inherent dynamism of Twitter discussions. For validating the consistency of DyTEM and providing guidance for hyperparameter selection, a novel, hashtag-based validation method is proposed. The analysis identifies and scrutinizes five distinct Topic Transition Types: Topic Stagnation, Topic Merge, Topic Split, Topic Disappearance, and Topic Emergence. A speech toxicity classification model is employed to delve into the toxicity dynamics within topic evolution. A standout finding of this study is the positive correlation between topic popularity and its toxicity, implying that trending or viral topics tend to contain more inflammatory speech. This insight, along with the methodologies introduced in this study, contributes significantly to the broader understanding of digital discourse dynamics and could guide future strategies aimed at fostering healthier and more constructive online spaces

    Multiprocessor System-on-Chips based Wireless Sensor Network Energy Optimization

    Get PDF
    Wireless Sensor Network (WSN) is an integrated part of the Internet-of-Things (IoT) used to monitor the physical or environmental conditions without human intervention. In WSN one of the major challenges is energy consumption reduction both at the sensor nodes and network levels. High energy consumption not only causes an increased carbon footprint but also limits the lifetime (LT) of the network. Network-on-Chip (NoC) based Multiprocessor System-on-Chips (MPSoCs) are becoming the de-facto computing platform for computationally extensive real-time applications in IoT due to their high performance and exceptional quality-of-service. In this thesis a task scheduling problem is investigated using MPSoCs architecture for tasks with precedence and deadline constraints in order to minimize the processing energy consumption while guaranteeing the timing constraints. Moreover, energy-aware nodes clustering is also performed to reduce the transmission energy consumption of the sensor nodes. Three distinct problems for energy optimization are investigated given as follows: First, a contention-aware energy-efficient static scheduling using NoC based heterogeneous MPSoC is performed for real-time tasks with an individual deadline and precedence constraints. An offline meta-heuristic based contention-aware energy-efficient task scheduling is developed that performs task ordering, mapping, and voltage assignment in an integrated manner. Compared to state-of-the-art scheduling our proposed algorithm significantly improves the energy-efficiency. Second, an energy-aware scheduling is investigated for a set of tasks with precedence constraints deploying Voltage Frequency Island (VFI) based heterogeneous NoC-MPSoCs. A novel population based algorithm called ARSH-FATI is developed that can dynamically switch between explorative and exploitative search modes at run-time. ARSH-FATI performance is superior to the existing task schedulers developed for homogeneous VFI-NoC-MPSoCs. Third, the transmission energy consumption of the sensor nodes in WSN is reduced by developing ARSH-FATI based Cluster Head Selection (ARSH-FATI-CHS) algorithm integrated with a heuristic called Novel Ranked Based Clustering (NRC). In cluster formation parameters such as residual energy, distance parameters, and workload on CHs are considered to improve LT of the network. The results prove that ARSH-FATI-CHS outperforms other state-of-the-art clustering algorithms in terms of LT.University of Derby, Derby, U
    • 

    corecore