1,362 research outputs found

    A hybrid algorithm for Bayesian network structure learning with application to multi-label learning

    We present a novel hybrid algorithm for Bayesian network structure learning, called H2PC. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm is based on divide-and-conquer constraint-based subroutines to learn the local structure around a target variable. We conduct two series of experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning. First, we use eight well-known Bayesian network benchmarks with various data sizes to assess the quality of the learned structure returned by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in terms of goodness of fit to new data and quality of the network structure with respect to the true dependence structure of the data. Second, we investigate H2PC's ability to solve the multi-label learning problem. We provide theoretical results to characterize and identify graphically the so-called minimal label powersets that appear as irreducible factors in the joint distribution under the faithfulness condition. The multi-label learning problem is then decomposed into a series of multi-class classification problems, where each multi-class variable encodes a label powerset. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy over ten multi-label data sets covering different application domains. Overall, our experiments support the conclusions that local structural learning with H2PC in the form of local neighborhood induction is a theoretically well-motivated and empirically effective learning framework that is well suited to multi-label learning. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available.
    Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other author
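The abstract above describes recasting multi-label learning as multi-class classification, where each multi-class variable encodes a label powerset. The following is a minimal sketch of that encoding step only, assuming the simplest case in which the full label set is treated as a single powerset. It does not implement H2PC or the identification of minimal label powersets, and the function names `label_powerset_encode` and `label_powerset_decode` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple


def label_powerset_encode(
    label_sets: Sequence[Sequence[str]],
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Map each distinct combination of labels to one multi-class id.

    Assumption: all labels form a single powerset. The paper instead
    splits the labels into minimal label powersets, which is not shown here.
    """
    class_of: Dict[FrozenSet[str], int] = {}
    encoded: List[int] = []
    for labels in label_sets:
        key = frozenset(labels)  # label order does not matter
        if key not in class_of:
            class_of[key] = len(class_of)  # next unused class id
        encoded.append(class_of[key])
    return encoded, class_of


def label_powerset_decode(
    class_id: int, class_of: Dict[FrozenSet[str], int]
) -> FrozenSet[str]:
    """Recover the label combination behind a multi-class id."""
    for labels, cid in class_of.items():
        if cid == class_id:
            return labels
    raise KeyError(class_id)


if __name__ == "__main__":
    y = [["sports"], ["sports", "politics"], [], ["politics", "sports"]]
    classes, mapping = label_powerset_encode(y)
    print(classes)
    print(sorted(label_powerset_decode(1, mapping)))
```

Any multi-class classifier can then be trained on the encoded ids, and its predictions decoded back to label sets. The number of classes grows with the number of distinct label combinations, which is one reason to decompose the label set into smaller powersets.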

    Research-Based on Telecommunication in Mobile Service Provider's Performance using Enhanced Naive Bayes Classifier

    In recent years, mobile service providers have expanded rapidly across all countries. Given unpredictable development trends, mobile service providers are essential to knowledge-based service businesses. Performance may be improved by creating and disseminating new information through innovation activities based on the use of business intelligence. This research examines the performance of mobile service providers across all countries using an enhanced Naive Bayes classifier based on telecommunication. In comparison to quantitative variables, Naive Bayes performs quite well. First, data is collected and normalized as a preprocessing step. Features are extracted using Term Frequency and Inverse Document Frequency (TF-IDF). A Decision Tree algorithm is used for data analysis. Features are then selected using a two-stage Markov blanket algorithm. The Enhanced Naive Bayes Classifier is the proposed algorithm for telecommunication analysis, and finally the performance of the system is analyzed. The proposed algorithm is compared with existing algorithms on mobile service provider performance. The following metrics are measured: throughput, packet loss, packet duplication, and user quality of experience. In this comparison, the proposed algorithm is more effective and produces better results.
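The abstract above names TF-IDF as its feature extraction step without stating which variant is used. As a rough illustration, the sketch below assumes one common textbook form: term frequency normalized by document length, multiplied by an unsmoothed inverse document frequency `log(N / df)`. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute TF-IDF weights for tokenized documents.

    Assumed variant (not confirmed by the abstract):
      tf(t, d)  = count of t in d / number of tokens in d
      idf(t)    = log(N / number of documents containing t)
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    doc_freq: Counter = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    weights: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


if __name__ == "__main__":
    corpus = [["slow", "network"], ["fast", "network"]]
    for doc_weights in tf_idf(corpus):
        print(doc_weights)
```

With this unsmoothed form, a term that occurs in every document receives weight zero, so only terms that distinguish documents carry signal into the classifier.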

    Mining of textual databases within the product development process


    Machine learning approaches for early DRG classification and resource allocation

    Recent research has highlighted the need for upstream planning in healthcare service delivery systems, patient scheduling, and resource allocation in the hospital inpatient setting. This study examines the value of upstream planning within hospital-wide resource allocation decisions based on machine learning (ML) and mixed-integer programming (MIP), focusing on prediction of diagnosis-related groups (DRGs) and the use of these predictions for allocating scarce hospital resources. DRGs are a payment scheme employed at patients’ discharge, where the DRG and length of stay determine the revenue that the hospital obtains. We show that early and accurate DRG classification using ML methods, incorporated into an MIP-based resource allocation model, can increase the hospital’s contribution margin, the number of admitted patients, and the utilization of resources such as operating rooms and beds. We test these methods on hospital data containing more than 16,000 inpatient records and demonstrate improved DRG classification accuracy as compared to the hospital’s current approach. The largest improvements were observed at and before admission, when information such as procedures and diagnoses is typically incomplete, but performance was improved even after a substantial portion of the patient’s length of stay, and under multiple scenarios making different assumptions about the available information. Using the improved DRG predictions within our resource allocation model improves contribution margin by 2.9% and the utilization of scarce resources such as operating rooms and beds from 66.3% to 67.3% and from 70.7% to 71.7%, respectively. This enables 9.0% more nonurgent elective patients to be admitted as compared to the baseline.

    Streaming Feature Grouping and Selection (Sfgs) For Big Data Classification

    Real-time data has always been an essential element for organizations when the quickness of data delivery is critical to their businesses. Today, organizations understand the importance of real-time data analysis to maintain benefits from their generated data. Real-time data analysis is also known as real-time analytics, streaming analytics, real-time streaming analytics, and event processing. Stream processing is the key to getting results in real-time. It allows us to process the data stream in real-time as it arrives. The concept of streaming data means the data are generated dynamically, and the full stream is unknown or even infinite. This data becomes massive and diverse and forms what is known as a big data challenge. In machine learning, streaming feature selection has always been a preferred method in the preprocessing of streaming data. Recently, feature grouping, which can measure the hidden information between selected features, has begun gaining attention. This dissertation’s main contribution is in solving the issue of the extremely high dimensionality of streaming big data by delivering a streaming feature grouping and selection algorithm. Also, the literature review presents a comprehensive review of the current streaming feature selection approaches and highlights the state-of-the-art algorithms trending in this area. The proposed algorithm is designed with the idea of grouping together similar features to reduce redundancy and handle the stream of features in an online fashion. This algorithm has been implemented and evaluated using benchmark datasets against state-of-the-art streaming feature selection algorithms and feature grouping techniques. The results showed better performance regarding prediction accuracy than with state-of-the-art algorithms.
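The abstract above describes grouping similar features to reduce redundancy while features arrive one at a time. The sketch below illustrates that general idea only and is not the dissertation's SFGS algorithm. It assumes absolute Pearson correlation as the similarity measure and a fixed threshold, and it omits the selection step that scores features against the target. The names `pearson`, `group_streaming_features`, and the threshold value are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def group_streaming_features(
    stream: Iterable[Tuple[str, Sequence[float]]],
    threshold: float = 0.9,
) -> Dict[str, List[str]]:
    """Assign each arriving feature to the first group whose representative
    it correlates with strongly, or start a new group otherwise.

    Assumption: the first feature of a group acts as its representative.
    """
    representatives: Dict[str, Sequence[float]] = {}
    groups: Dict[str, List[str]] = {}
    for name, values in stream:
        for rep_name, rep_values in representatives.items():
            if abs(pearson(values, rep_values)) >= threshold:
                groups[rep_name].append(name)  # redundant with this group
                break
        else:
            # No similar group found: the feature starts its own group.
            representatives[name] = values
            groups[name] = [name]
    return groups


if __name__ == "__main__":
    features = [
        ("f1", [1.0, 2.0, 3.0, 4.0]),
        ("f2", [2.0, 4.0, 6.0, 8.0]),
        ("f3", [1.0, -1.0, 1.0, -1.0]),
    ]
    print(group_streaming_features(features))
```

Each arriving feature is compared only against the group representatives, so the cost per feature depends on the number of groups and not on the number of features seen so far.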

    Challenges in the Analysis of Mass-Throughput Data: A Technical Commentary from the Statistical Machine Learning Perspective

    Sound data analysis is critical to the success of modern molecular medicine research that involves collection and interpretation of mass-throughput data. The novel nature and high-dimensionality in such datasets pose a series of nontrivial data analysis problems. This technical commentary discusses the problems of over-fitting, error estimation, curse of dimensionality, causal versus predictive modeling, integration of heterogeneous types of data, and lack of standard protocols for data analysis. We attempt to shed light on the nature and causes of these problems and to outline viable methodological approaches to overcome them.