355 research outputs found
Faster Algorithms for Weighted Recursive State Machines
Pushdown systems (PDSs) and recursive state machines (RSMs), which are
linearly equivalent, are standard models for interprocedural analysis. Yet RSMs
are more convenient as they (a) explicitly model function calls and returns,
and (b) specify many natural parameters for algorithmic analysis, e.g., the
number of entries and exits. We consider a general framework where RSM
transitions are labeled from a semiring and path properties are algebraic with
semiring operations, which can model, e.g., interprocedural reachability and
dataflow analysis problems.
Our main contributions are new algorithms for several fundamental problems.
Compared to a direct translation of RSMs to PDSs combined with the best-known
existing bounds for PDSs, our analysis algorithm improves the complexity for
finite-height semirings (which subsume reachability and standard dataflow
properties). We further consider the problem of extracting distance values from
the representation structures computed by our algorithm, and give efficient
algorithms that distinguish the complexity of a one-time preprocessing from the
complexity of each individual query. Another advantage of our algorithm is that
our improvements carry over to the concurrent setting, where we improve the
best-known complexity for the context-bounded analysis of concurrent RSMs.
Finally, we provide a prototype implementation that gives a significant
speed-up on several benchmarks from the SLAM/SDV project.
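To make the algebraic framework concrete, here is a minimal sketch, not the paper's algorithm: a semiring interface, the Boolean instance that yields plain reachability, and a worklist fixpoint over a flat weighted graph. The names (Semiring, BOOLEAN, solve) are ours, and the sketch deliberately ignores the call/return structure and entry/exit parameters that the paper's RSM algorithms exploit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class Semiring:
    zero: object              # value of "no path"
    one: object               # value of the empty path
    plus: Callable            # combine values across alternative paths
    times: Callable           # extend a value along an edge

# Boolean semiring: plus = or, times = and; path value True means reachable.
BOOLEAN = Semiring(zero=False, one=True,
                   plus=lambda a, b: a or b,
                   times=lambda a, b: a and b)

def solve(edges: List[Tuple[str, str, object]], source: str,
          sr: Semiring) -> Dict[str, object]:
    """Worklist (Kleene) iteration; terminates for finite-height semirings."""
    succ: Dict[str, List[Tuple[str, object]]] = {}
    nodes = set()
    for u, v, w in edges:
        succ.setdefault(u, []).append((v, w))
        nodes.update((u, v))
    val = {n: sr.zero for n in nodes}
    val[source] = sr.one
    work = [source]
    while work:
        u = work.pop()
        for v, w in succ.get(u, []):
            new = sr.plus(val[v], sr.times(val[u], w))
            if new != val[v]:
                val[v] = new
                work.append(v)
    return val

edges = [("entry", "a", True), ("a", "exit", True), ("b", "exit", True)]
print(solve(edges, "entry", BOOLEAN))  # "b" stays False: unreachable
```

Swapping BOOLEAN for a finite-height dataflow lattice gives the kind of interprocedural dataflow property the abstract mentions.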
Comparing penalization methods for linear models on large observational health data
Objective: This study evaluates regularization variants in logistic regression (L1, L2, ElasticNet, Adaptive L1, Adaptive ElasticNet, Broken adaptive ridge [BAR], and Iterative hard thresholding [IHT]) for discrimination and calibration performance, focusing on both internal and external validation. Materials and Methods: We use data from 5 US claims and electronic health record databases and develop models for various outcomes in a major depressive disorder patient population. We externally validate all models in the other databases. We use a train-test split of 75%/25% and evaluate performance with discrimination and calibration. Statistical analysis of differences in performance uses Friedman's test and critical difference diagrams. Results: Of the 840 models we develop, L1 and ElasticNet emerge as superior in both internal and external discrimination, with a notable AUC difference. BAR and IHT show the best internal calibration, without a clear external calibration leader. ElasticNet typically has larger model sizes than L1. Methods like IHT and BAR, while slightly less discriminative, significantly reduce model complexity. Conclusion: L1 and ElasticNet offer the best discriminative performance in logistic regression for healthcare predictions, maintaining robustness across validations. For simpler, more interpretable models, L0-based methods (IHT and BAR) are advantageous, providing greater parsimony and calibration with fewer features. This study aids in selecting suitable regularization techniques for healthcare prediction models, balancing performance, complexity, and interpretability.
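As an illustration of the penalization variants being compared, the sketch below fits the three classical ones (L1, L2, ElasticNet) on synthetic data with scikit-learn, using the study's 75%/25% split and reporting AUC and model size; the adaptive and L0-based penalties (BAR, IHT) and the actual observational-data pipeline are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an imbalanced clinical prediction problem.
X, y = make_classification(n_samples=5000, n_features=100,
                           n_informative=10, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "L1":         LogisticRegression(penalty="l1", solver="saga",
                                     C=0.1, max_iter=5000),
    "L2":         LogisticRegression(penalty="l2", solver="saga",
                                     C=0.1, max_iter=5000),
    "ElasticNet": LogisticRegression(penalty="elasticnet", solver="saga",
                                     l1_ratio=0.5, C=0.1, max_iter=5000),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    size = np.count_nonzero(m.coef_)   # model size = nonzero coefficients
    print(f"{name:10s} AUC={auc:.3f} nonzero={size}")
```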
Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data
Background: There is currently no consensus on the impact of class imbalance methods on the performance of clinical prediction models. We aimed to empirically investigate the impact of random oversampling and random undersampling, two commonly used class imbalance methods, on the internal and external validation performance of prediction models developed using observational health data. Methods: We developed and externally validated prediction models for various outcomes of interest within a target population of people with pharmaceutically treated depression across four large observational health databases. We used three different classifiers (lasso logistic regression, random forest, XGBoost) and varied the target imbalance ratio. We evaluated the impact on model performance in terms of discrimination and calibration. Discrimination was assessed using the area under the receiver operating characteristic curve (AUROC) and calibration was assessed using calibration plots. Results: We developed and externally validated a total of 1,566 prediction models. On internal and external validation, random oversampling and random undersampling generally did not result in higher AUROCs. Moreover, we found overestimated risks, although this miscalibration could largely be corrected by recalibrating the models towards the imbalance ratios in the original dataset. Conclusions: Overall, we found that random oversampling or random undersampling generally does not improve the internal and external validation performance of prediction models developed in large observational health databases. Based on our findings, we do not recommend applying random oversampling or random undersampling when developing prediction models in large observational health databases.
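The sketch below illustrates the study's two knobs on synthetic data: a class-imbalance sampler applied to the training set (imbalanced-learn is assumed) and an intercept-shift recalibration toward the original outcome rate, which is one standard way to implement the correction the abstract describes.

```python
import numpy as np
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, weights=[0.95], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Oversample the minority class to a 1:2 target imbalance ratio.
# (RandomUnderSampler from imblearn.under_sampling works the same way.)
sampler = RandomOverSampler(sampling_strategy=0.5, random_state=1)
X_bal, y_bal = sampler.fit_resample(X_tr, y_tr)

model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
p = np.clip(model.predict_proba(X_te)[:, 1], 1e-6, 1 - 1e-6)

# Recalibrate toward the original imbalance ratio: shift each prediction's
# log-odds by the difference between original and resampled outcome log-odds.
r_orig, r_bal = y_tr.mean(), y_bal.mean()
shift = np.log(r_orig / (1 - r_orig)) - np.log(r_bal / (1 - r_bal))
p_recal = 1 / (1 + np.exp(-(np.log(p / (1 - p)) + shift)))
print(f"mean risk: raw={p.mean():.3f}  recalibrated={p_recal.mean():.3f}  "
      f"observed={y_te.mean():.3f}")
```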
Automatic Abstraction for Congruences
One approach to verifying bit-twiddling algorithms is to derive invariants between the bits that constitute the variables of a program. Such invariants can often be described with systems of congruences where, in each equation $\vec{c} \cdot \vec{x} \equiv d \pmod{m}$, $\vec{c}$ is a vector of integer coefficients and $\vec{x}$ is a vector of propositional variables (bits). Because of the low-level nature of these invariants and the large number of bits that are involved, it is important that the transfer functions can be derived automatically. We address this problem, showing how an analysis for bit-level congruence relationships can be decoupled into two parts: (1) a SAT-based abstraction (compilation) step which can be automated, and (2) an interpretation step that requires no SAT-solving. We exploit triangular matrix forms to derive transfer functions efficiently, even in the presence of large numbers of bits. Finally, we propose program transformations that improve the analysis results.
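As a concrete illustration of the triangular representation, the sketch below maintains a system of bit-level congruences mod 2 (the analysis itself handles larger power-of-two moduli) in triangular form: each new equation is reduced against the existing pivot rows and either extends the system or exposes infeasibility. The encoding and names are ours, not the paper's.

```python
from typing import Dict, Tuple

def leading_bit(mask: int) -> int:
    return mask.bit_length() - 1

def insert(rows: Dict[int, Tuple[int, int]], mask: int, rhs: int) -> bool:
    """Reduce the congruence (mask, rhs) against a triangular system.
    Returns False if the combined system becomes infeasible."""
    while mask:
        lead = leading_bit(mask)
        if lead not in rows:
            rows[lead] = (mask, rhs)   # new pivot; system stays triangular
            return True
        rmask, rrhs = rows[lead]
        mask ^= rmask                  # eliminate the leading variable
        rhs ^= rrhs
    return rhs == 0                    # 0 = 1 would mean no solution

# x0 ^ x1 = 1 together with x1 = 1 forces x0 = 0:
rows: Dict[int, Tuple[int, int]] = {}
insert(rows, 0b11, 1)                  # x1 ^ x0 = 1
insert(rows, 0b10, 1)                  # x1 = 1
print(insert(rows, 0b01, 1))           # x0 = 1 contradicts: prints False
```

Keeping the rows triangular makes each insertion linear in the number of pivots, which is what lets the representation scale to large numbers of bits.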
Graph-Based Shape Analysis Beyond Context-Freeness
We develop a shape analysis for reasoning about relational properties of data
structures. Both the concrete and the abstract domain are represented by
hypergraphs. The analysis is parameterized by user-supplied indexed graph
grammars to guide concretization and abstraction. This novel extension of
context-free graph grammars is powerful enough to model complex data structures
such as balanced binary trees with parent pointers, while preserving most
desirable properties of context-free graph grammars. One strength of our
analysis is that no artifacts apart from grammars are required from the user;
it thus offers a high degree of automation. We implemented our analysis and
successfully applied it to various programs manipulating AVL trees,
(doubly-linked) lists, and combinations of both.
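The core grammar mechanism can be illustrated with plain (non-indexed) hyperedge replacement: a nonterminal hyperedge is replaced by a graph fragment whose external nodes are glued onto the edge's attachment points. The sketch below is ours and omits the index words that the paper threads through nonterminals (e.g., to track balance in AVL trees).

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Tuple

Edge = Tuple[str, Tuple[int, ...]]     # (label, attached nodes)

@dataclass
class Rule:
    lhs: str                           # nonterminal label being replaced
    external: Tuple[int, ...]          # fragment nodes glued to the edge's nodes
    internal: Tuple[int, ...]          # fragment nodes that become fresh nodes
    edges: List[Edge]                  # fragment edges (may contain nonterminals)

fresh = count(1000)

def derive(graph: List[Edge], i: int, rule: Rule) -> List[Edge]:
    """Replace the i-th hyperedge (a nonterminal) by the rule's fragment."""
    label, attached = graph[i]
    assert label == rule.lhs and len(attached) == len(rule.external)
    m: Dict[int, int] = dict(zip(rule.external, attached))
    m.update((n, next(fresh)) for n in rule.internal)
    out = graph[:i] + graph[i + 1:]
    return out + [(lbl, tuple(m[n] for n in ns)) for lbl, ns in rule.edges]

# L(a, b): a singly linked list segment from node a to node b.
step = Rule("L", external=(0, 1), internal=(2,),
            edges=[("next", (0, 2)), ("L", (2, 1))])
stop = Rule("L", external=(0, 1), internal=(),
            edges=[("next", (0, 1))])

g = [("L", (1, 2))]
g = derive(g, 0, step)                 # unfold one list cell (concretization)
g = derive(g, 1, stop)                 # terminate the segment
print(g)                               # two concrete "next" edges
```

Abstraction runs the same rules in reverse, folding concrete subgraphs back into nonterminal hyperedges.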
CFA2: a Context-Free Approach to Control-Flow Analysis
In a functional language, the dominant control-flow mechanism is function
call and return. Most higher-order flow analyses, including k-CFA, do not
handle call and return well: they remember only a bounded number of pending
calls because they approximate programs with control-flow graphs. Call/return
mismatch introduces precision-degrading spurious control-flow paths and
increases the analysis time. We describe CFA2, the first flow analysis with
precise call/return matching in the presence of higher-order functions and tail
calls. We formulate CFA2 as an abstract interpretation of programs in
continuation-passing style and describe a sound and complete summarization
algorithm for our abstract semantics. A preliminary evaluation shows that CFA2
gives more accurate data-flow information than 0CFA and 1CFA. Comment: LMCS 7 (2:3) 2011.
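The summarization idea behind precise call/return matching can be sketched in a first-order setting: rather than following return edges blindly, which is what creates the spurious paths, the analysis computes per-procedure summaries and short-circuits each call to its own return site. The program and node names below are hypothetical, and CFA2 itself formulates this over a CPS-based abstract semantics.

```python
from collections import defaultdict

# Hypothetical program: procedure "id" is called from two sites in "main".
intra = {"main": [("m0", "call1"), ("r1", "call2"), ("r2", "m_end")],
         "id":   [("id_entry", "id_exit")]}
calls = {"call1": ("id", "r1"), "call2": ("id", "r2")}
entry = {"main": "m0", "id": "id_entry"}
exit_ = {"main": "m_end", "id": "id_exit"}

def summarize():
    succ = defaultdict(set)
    for es in intra.values():
        for u, v in es:
            succ[u].add(v)
    # Same-level path edges (entry, node); every entry is seeded for brevity.
    reached = {(entry[p], entry[p]) for p in entry}
    changed = True
    while changed:
        changed = False
        for e, n in list(reached):
            targets = set(succ[n])
            if n in calls:
                callee, ret = calls[n]
                if (entry[callee], exit_[callee]) in reached:
                    targets.add(ret)   # summary: jump to this call's own return
            for t in targets:
                if (e, t) not in reached:
                    reached.add((e, t))
                    changed = True
    return reached

r = summarize()
print(("m0", "m_end") in r)            # True: each call returns to its own site
```

Because returns are reached only through summaries, call1 can never flow to r2: exactly the call/return mismatch that bounded, graph-based analyses suffer from.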
Software Model Checking with Explicit Scheduler and Symbolic Threads
In many practical application domains, the software is organized into a set
of threads, whose activation is exclusive and controlled by a cooperative
scheduling policy: threads execute, without any interruption, until they either
terminate or yield the control explicitly to the scheduler. The formal
verification of such software poses significant challenges. On the one hand,
each thread may have an infinite state space, which calls for abstraction. On
the other hand, the scheduling policy is often important for correctness, and
an approach based on abstracting the scheduler may result in loss of precision
and false positives. Unfortunately, the translation of the problem into a
purely sequential software model checking problem turns out to be highly
inefficient for the available technologies. We propose a software model
checking technique that exploits the intrinsic structure of these programs.
Each thread is translated into a separate sequential program and explored
symbolically with lazy abstraction, while the overall verification is
orchestrated by the direct execution of the scheduler. The approach is
optimized by filtering the exploration of the scheduler with the integration of
partial-order reduction. The technique, called ESST (Explicit Scheduler,
Symbolic Threads) has been implemented and experimentally evaluated on a
significant set of benchmarks. The results demonstrate that the ESST technique
is significantly more effective than software model checking applied to the
sequentialized programs, and that partial-order reduction can lead to further
performance improvements. Comment: 40 pages, 10 figures, accepted for
publication in Logical Methods in Computer Science.
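The exploration scheme can be sketched as follows: threads are sequences of atomic blocks that run without interruption until they yield, and the scheduler's choices are enumerated explicitly. In ESST each block would be explored symbolically with lazy abstraction; in this sketch (ours, not the tool's code) blocks are concrete functions on a shared state.

```python
# Hypothetical cooperative program: blocks run atomically up to their yield.
def inc(s):   s["x"] += 1
def dbl(s):   s["x"] *= 2
def add10(s): s["x"] += 10

threads = [[inc, dbl], [add10]]        # thread 0: inc; yield; dbl.  thread 1: add10.

def explore(pcs, state, trace):
    runnable = [i for i, pc in enumerate(pcs) if pc < len(threads[i])]
    if not runnable:
        print(trace, "->", state)      # terminal state; check properties here
        return
    for i in runnable:                 # explicit scheduler choice
        s = dict(state)
        threads[i][pcs[i]](s)          # the block executes without interruption
        explore(tuple(pc + (j == i) for j, pc in enumerate(pcs)), s, trace + [i])

explore((0, 0), {"x": 0}, [])
```

Two of the three interleavings reach the same final state (x = 22), which is exactly the redundancy that filtering the scheduler with partial-order reduction prunes.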
Illness Beliefs Predict Mortality in Patients with Diabetic Foot Ulcers
Background - Patients’ illness beliefs have been associated with glycaemic control in diabetes and survival in other conditions.
Objective - We examined whether illness beliefs independently predicted survival in patients with diabetes and foot ulceration.
Methods - Patients (n = 169) were recruited between 2002 and 2007. Data on illness beliefs were collected at baseline. Data on survival were extracted on 1st November 2011. Number of days survived reflected the number of days from date of recruitment to 1st November 2011.
Results - Cox regressions examined predictors of time to death and identified ischemia and identity beliefs (beliefs regarding symptoms associated with foot ulceration) as significant predictors.
Conclusions - Our data indicate that illness beliefs have a significant independent effect on survival in patients with diabetes and foot ulceration. These findings suggest that illness beliefs could improve our understanding of mortality risk in this patient group and could also be the basis for future therapeutic interventions to improve survival.
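For readers who want to see the shape of the analysis, a minimal sketch of a Cox proportional hazards regression of survival time on baseline illness beliefs follows; the column names and data are illustrative, and lifelines is assumed as the survival library.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Illustrative data: days from recruitment to 1st November 2011, an event
# indicator, and baseline predictors (identity beliefs score, ischemia).
df = pd.DataFrame({
    "days_survived":    [3200, 1500, 2800, 900, 2100, 1200, 3000, 800],
    "died":             [0, 1, 0, 1, 1, 0, 0, 1],
    "identity_beliefs": [2.0, 4.5, 1.5, 5.0, 2.5, 4.0, 3.0, 3.5],
    "ischemia":         [0, 1, 1, 1, 0, 0, 1, 0],
})
cph = CoxPHFitter()
cph.fit(df, duration_col="days_survived", event_col="died")
cph.print_summary()                    # hazard ratios for each predictor
```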