1,362 research outputs found
A hybrid algorithm for Bayesian network structure learning with application to multi-label learning
We present a novel hybrid algorithm for Bayesian network structure learning,
called H2PC. It first reconstructs the skeleton of a Bayesian network and then
performs a Bayesian-scoring greedy hill-climbing search to orient the edges.
The algorithm is based on divide-and-conquer constraint-based subroutines to
learn the local structure around a target variable. We conduct two series of
experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is
currently the most powerful state-of-the-art algorithm for Bayesian network
structure learning. First, we use eight well-known Bayesian network benchmarks
with various data sizes to assess the quality of the learned structure returned
by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in
terms of goodness of fit to new data and quality of the network structure with
respect to the true dependence structure of the data. Second, we investigate
H2PC's ability to solve the multi-label learning problem. We provide
theoretical results to characterize and identify graphically the so-called
minimal label powersets that appear as irreducible factors in the joint
distribution under the faithfulness condition. The multi-label learning problem
is then decomposed into a series of multi-class classification problems, where
each multi-class variable encodes a label powerset. H2PC is shown to compare
favorably to MMHC in terms of global classification accuracy over ten
multi-label data sets covering different application domains. Overall, our
experiments support the conclusions that local structural learning with H2PC in
the form of local neighborhood induction is a theoretically well-motivated and
empirically effective learning framework that is well suited to multi-label
learning. The source code (in R) of H2PC as well as all data sets used for the
empirical tests are publicly available.Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other author
Research-Based on Telecommunication in Mobile Service Provider's Performance using Enhanced Naive Bayes Classifier
In recent years, mobile service providers have rapidly expanded across all countries. Considering unpredictable development trends, mobile service providers are essential to knowledge-based service businesses. Performance may be improved by creating and disseminating new information through innovation activities based on the usage of business intelligence. This research examined the performance of mobile service providers across all countries utilizing an enhanced Naive Bayes classifier based on telecommunication. In comparison to quantitative variables, the naive Bayes performs quite well. In the beginning, data is collected and the normalization technique is used for data preprocessing. Feature extraction is carried out using “Term Frequency and Inverse Document Frequency (TF-IDF)”. “Decision Tree algorithm” is used for data analysis. Then the feature is selected using a two-stage Markov blanket algorithm. Enhanced Naïve Bayes Classifier is the proposed algorithm for telecommunication analysis and at last, the performance of the system is analyzed. This proposed algorithm is used to compare the mobile service provider's performances with existing algorithms. The proposed method measures the following metrics as Throughput, Packet loss, Packet duplication, and User quality of experience. The proposed algorithm is more effective and produces better results. 
Machine learning approaches for early DRG classification and resource allocation
Recent research has highlighted the need for upstream planning in healthcare service delivery systems, patient scheduling, and resource allocation in the hospital inpatient setting. This study examines the value of upstream planning within hospital-wide resource allocation decisions based on machine learning (ML) and mixed-integer programming (MIP), focusing on prediction of diagnosis-related groups (DRGs) and the use of these predictions for allocating scarce hospital resources. DRGs are a payment scheme employed at patients’ discharge, where the DRG and length of stay determine the revenue that the hospital obtains. We show that early and accurate DRG classification using ML methods, incorporated into an MIP-based resource allocation model, can increase the hospital’s contribution margin, the number of admitted patients, and the utilization of resources such as operating rooms and beds. We test these methods on hospital data containing more than 16,000 inpatient records and demonstrate improved DRG classification accuracy as compared to the hospital’s current approach. The largest improvements were observed at and before admission, when information such as procedures and diagnoses is typically incomplete, but performance was improved even after a substantial portion of the patient’s length of stay, and under multiple scenarios making different assumptions about the available information. Using the improved DRG predictions within our resource allocation model improves contribution margin by 2.9% and the utilization of scarce resources such as operating rooms and beds from 66.3% to 67.3% and from 70.7% to 71.7%, respectively. This enables 9.0% more nonurgent elective patients to be admitted as compared to the baseline
Streaming Feature Grouping and Selection (Sfgs) For Big Data Classification
Real-time data has always been an essential element for organizations when the quickness of data delivery is critical to their businesses. Today, organizations understand the importance of real-time data analysis to maintain benefits from their generated data. Real-time data analysis is also known as real-time analytics, streaming analytics, real-time streaming analytics, and event processing. Stream processing is the key to getting results in real-time. It allows us to process the data stream in real-time as it arrives. The concept of streaming data means the data are generated dynamically, and the full stream is unknown or even infinite. This data becomes massive and diverse and forms what is known as a big data challenge. In machine learning, streaming feature selection has always been a preferred method in the preprocessing of streaming data. Recently, feature grouping, which can measure the hidden information between selected features, has begun gaining attention. This dissertation’s main contribution is in solving the issue of the extremely high dimensionality of streaming big data by delivering a streaming feature grouping and selection algorithm. Also, the literature review presents a comprehensive review of the current streaming feature selection approaches and highlights the state-of-the-art algorithms trending in this area. The proposed algorithm is designed with the idea of grouping together similar features to reduce redundancy and handle the stream of features in an online fashion. This algorithm has been implemented and evaluated using benchmark datasets against state-of-the-art streaming feature selection algorithms and feature grouping techniques. The results showed better performance regarding prediction accuracy than with state-of-the-art algorithms
Challenges in the Analysis of Mass-Throughput Data: A Technical Commentary from the Statistical Machine Learning Perspective
Sound data analysis is critical to the success of modern molecular medicine research that involves collection and interpretation of mass-throughput data. The novel nature and high-dimensionality in such datasets pose a series of nontrivial data analysis problems. This technical commentary discusses the problems of over-fitting, error estimation, curse of dimensionality, causal versus predictive modeling, integration of heterogeneous types of data, and lack of standard protocols for data analysis. We attempt to shed light on the nature and causes of these problems and to outline viable methodological approaches to overcome them
- …