Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany
In health services and outcome research, count outcomes are frequently encountered and often have a large proportion of zeros. The zero-inflated negative binomial (ZINB) regression model has important applications for this type of data. With many possible candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We consider the maximum likelihood function plus a penalty, including the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). An EM (expectation-maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinate descent algorithm. Furthermore, statistical properties including the standard error formula are provided. A simulation study shows that the new algorithm not only has more accurate or at least comparable estimation, but is also more robust than traditional stepwise variable selection. The application is illustrated with a data set on health care demand in Germany. The proposed techniques have been implemented in an open-source R package, mpath.
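As a rough illustration of the three penalties named in this abstract, the sketch below evaluates the standard textbook forms of LASSO, SCAD, and MCP for a single coefficient. The function names and the default tuning values (a = 3.7, gamma = 3.0) are assumptions chosen for illustration; this is not code from the mpath package, and it does not implement the EM algorithm.

```python
def lasso_penalty(beta, lam):
    """LASSO penalty: grows linearly with |beta|."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (requires a > 2): linear near zero, then flattens out."""
    t = abs(beta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp_penalty(beta, lam, gamma=3.0):
    """MCP penalty (requires gamma > 1): concave, constant beyond gamma * lam."""
    t = abs(beta)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2


if __name__ == "__main__":
    # Compare the three penalties for a small and a large coefficient.
    for beta in (0.5, 10.0):
        print(beta, lasso_penalty(beta, 1.0), scad_penalty(beta, 1.0), mcp_penalty(beta, 1.0))
```

For small coefficients all three penalties behave alike; for large coefficients SCAD and MCP level off, whereas LASSO keeps growing in proportion to the coefficient.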
Effect of variable shipping frequency on production-distribution policy in a vendor-buyer integrated system
This paper investigates the effect of variable shipping frequency on production-distribution policy in a vendor-buyer integrated system. In a recent article, Chiu et al. [1] derived the optimal replenishment lot size for an economic production quantity problem with multi-delivery and quality assurance, based on the assumption that the number of shipments is a given constant. However, in a vendor-buyer integrated system in a supply chain environment, joint determination of the replenishment lot size and the number of shipments may help such a system gain a significant competitive advantage by becoming a low-cost producer and maintaining a tight linkage to the customer. For this reason, the present study extends the work of Chiu et al. [1] by considering shipping frequency as one of the decision variables and incorporating the customer’s stock holding cost into the system cost analysis. Hessian matrix equations are employed to certify the convexity of the cost function, which contains two decision variables, and the effect of variable shipping frequency on production-distribution policy is investigated. A numerical example is provided to demonstrate practical usage of the research result.
Solving finite production rate model with scrap and multiple shipments using algebraic approach
This paper solves a finite production rate (FPR) model with scrap and multiple shipments using an algebraic method. The classic FPR model assumes a continuous inventory issuing policy to satisfy demand and perfect-quality production for all items produced. However, in a real-life vendor-buyer integrated production-inventory system, a multiple shipment policy is used in practice in lieu of a continuous issuing policy, and generation of defective items during a production run is inevitable. In this study, it is assumed that all defective items are scrap and that the perfect-quality items can only be delivered to customers if the whole lot is quality assured at the end of the production run. A conventional approach for solving the FPR model is the use of differential calculus on the long-run average cost function with the need to prove optimality first. This paper demonstrates that the optimal lot size and its overall costs for the aforementioned FPR model can be derived without derivatives. As a result, it enables students or practitioners who have little knowledge of calculus to understand and handle the real-life FPR model with ease.
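To illustrate what deriving an optimum without derivatives can look like, the worked equation below completes the square on a generic cost function. The form C(Q) and the symbols \alpha, \beta, \gamma are assumptions made for illustration only; the abstract does not give the paper's actual cost function, which would also include scrap and shipment terms.

```latex
\begin{align*}
C(Q) &= \frac{\alpha}{Q} + \beta Q + \gamma, \qquad \alpha, \beta > 0,\; Q > 0 \\
     &= \left( \sqrt{\frac{\alpha}{Q}} - \sqrt{\beta Q} \right)^{2} + 2\sqrt{\alpha\beta} + \gamma .
\end{align*}
% The squared term is non-negative and vanishes exactly when
% \sqrt{\alpha / Q} = \sqrt{\beta Q}, so
\begin{align*}
Q^{*} = \sqrt{\frac{\alpha}{\beta}}, \qquad C(Q^{*}) = 2\sqrt{\alpha\beta} + \gamma .
\end{align*}
```

Because the squared term can never be negative, the minimum follows directly from the rearranged expression, with no differentiation and no separate proof of convexity.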
Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations
Large language models (LLMs) can generate fluent natural language texts when
given relevant documents as background context. This ability has attracted
considerable interest in developing industry applications of LLMs. However,
LLMs are prone to generate hallucinations that are not supported by the
provided sources. In this paper, we propose a hierarchical framework to detect
and mitigate such ungrounded hallucinations. Our framework uses Chain of Natural
Language Inference (CoNLI) for hallucination detection and hallucination
reduction via post-editing. Our approach achieves state-of-the-art performance
on hallucination detection and enhances text quality through rewrite, using
LLMs without any fine-tuning or domain-specific prompt engineering. We show
that this simple plug-and-play framework can serve as an effective choice for
hallucination detection and reduction, achieving competitive performance across
various contexts. Comment: The source code is available at
https://github.com/microsoft/CoNLI_hallucinatio
An Investigation of Telecom Mobile Data Billing Plans
In recent years, mobile operators have provided many billing alternatives, such as limited and unlimited billing plans and shared and non-shared data plans, for users with different needs. A non-shared data plan is designed for a single user with a limited monthly data allowance. By contrast, the monthly data allowance of a shared data plan is shared by a group of users with multiple devices. Mobile operators often conduct a primary price study to compare their billing plans, which shows the relationship between the prices of the billing plans and fixed amounts of data usage. Although the primary price study can draw conclusions quickly and easily, it provides only rough billing plan suggestions. In reality, the amounts of data usage are not fixed and should therefore be measured from commercial mobile networks to reflect user behavior. This paper proposes an analytical approach that uses the measured data of Chunghwa Telecom Co., Ltd. (CHT), the largest telecommunications company in Taiwan, to derive the expected payments of various billing plans. The results of the analytical model are more accurate than those of the primary price study and therefore provide better suggestions for billing plan selection. Other mobile operators can easily use our model to analyze billing alternatives with their own measured data.
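The difference between pricing a plan at a fixed usage level and computing its expected payment over a usage distribution can be sketched as follows. Everything here is hypothetical: the plan structure (monthly fee, allowance, per-GB overage), the prices, and the three-point usage distribution are invented for illustration and are not CHT data or the paper's analytical model.

```python
def plan_payment(usage_gb, monthly_fee, allowance_gb, overage_per_gb):
    """Payment for one month: flat fee plus a charge for usage above the allowance."""
    extra_gb = max(0.0, usage_gb - allowance_gb)
    return monthly_fee + extra_gb * overage_per_gb


def expected_payment(usage_distribution, monthly_fee, allowance_gb, overage_per_gb):
    """Expected payment, given (usage_gb, probability) pairs whose probabilities sum to 1."""
    return sum(
        prob * plan_payment(usage, monthly_fee, allowance_gb, overage_per_gb)
        for usage, prob in usage_distribution
    )


# Hypothetical usage distribution: half of the months are light, a fifth are heavy.
usage = [(1.0, 0.5), (3.0, 0.3), (8.0, 0.2)]
mean_usage = sum(u * p for u, p in usage)

# Hypothetical limited plan: 10 per month, 2 GB allowance, 5 per extra GB.
limited = dict(monthly_fee=10.0, allowance_gb=2.0, overage_per_gb=5.0)

fixed_usage_price = plan_payment(mean_usage, **limited)  # price at the mean usage only
expected_price = expected_payment(usage, **limited)      # average over the distribution

if __name__ == "__main__":
    print(fixed_usage_price, expected_price)
```

With these made-up numbers, the price at the mean usage (about 15.0) understates the expected payment (about 17.5), because the overage charge applies only in the heavy months. This is the kind of gap a fixed-usage comparison cannot capture.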
Floating Point Arithmetic Protocols for Constructing Secure Data Analysis Application
A large variety of data mining and machine learning techniques are applied to a wide range of applications today. Therefore, there is a real need to develop technologies that allow data analysis while preserving the confidentiality of the data. Secure multi-party computation (SMC) protocols allow participants to cooperate on various computations while retaining the privacy of their own input data, which is an ideal solution to this issue. Although a number of frameworks have been developed in SMC to meet this challenge, they are either tailored to specific tasks or provide very limited precision. In this paper, we have developed protocols for floating point arithmetic based on secure scalar product protocols, which is required in many real-world applications. Our protocols follow most of the IEEE-754 standard, supporting the four fundamental arithmetic operations, namely addition, subtraction, multiplication, and division. We demonstrate the practicality of these protocols by performing various statistical calculations that are widely used in most data analysis tasks. Our experiments show that the performance of our framework is both practical and promising.
A Model of Technological Imagination and Creativity: Cognitive Task Analysis
An integrated model of the cognitive tasks involved in the process of technological innovation was proposed based on these theories: 1. the CDIO theory of technological innovation, 2. Wallas’s creative thinking processes, 3. Klahr & Simon’s theory of scientific discovery, and 4. the conceptual combination theory of imagination. The central theme of this model is the proposition that three cognitive conditions are necessary for technological imagination and innovation: 1. cross-domain knowledge, 2. simple heuristics, and 3. pattern recognition ability. Although the required domain knowledge and implementation methods differ across domains, the heuristics that lead to a breakthrough at each phase of CDIO in a technological innovation are similar, with conceptual combination as the cognitive engine for generating original and imaginative ideas.
Sample-Specific Debiasing for Better Image-Text Models
Self-supervised representation learning on image-text data facilitates
crucial medical applications, such as image classification, visual grounding,
and cross-modal retrieval. One common approach involves contrasting
semantically similar (positive) and dissimilar (negative) pairs of data points.
Drawing negative samples uniformly from the training data set introduces false
negatives, i.e., samples that are treated as dissimilar but belong to the same
class. In healthcare data, the underlying class distribution is nonuniform,
implying that false negatives occur at a highly variable rate. To improve the
quality of learned representations, we develop a novel approach that corrects
for false negatives. Our method can be viewed as a variant of debiased
contrastive learning that uses estimated sample-specific class probabilities.
We provide theoretical analysis of the objective function and demonstrate the
proposed approach on both image and paired image-text data sets. Our
experiments demonstrate empirical advantages of sample-specific debiasing
- …