Quantifying the Impact and Extent of Undocumented Biomedical Synonymy
Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships. In this study, we examine the impact and extent of undocumented synonymy within a very large compendium of biomedical thesauri. First, we demonstrate that missing synonymy has a significant negative impact on named entity normalization, an important problem within the field of biomedical text mining. To estimate the amount of synonymy currently missing from thesauri, we develop a probabilistic model for the construction of synonym terminologies that is capable of handling a wide range of potential biases, and we evaluate its performance using the broader domain of near-synonymy among general English words. Our model predicts that over 90% of these relationships are currently undocumented, a result that we support experimentally through "crowd-sourcing." Finally, we apply our model to biomedical terminologies and predict that they are missing the vast majority (>90%) of the synonymous relationships they intend to document. Overall, our results expose the dramatic incompleteness of current biomedical thesauri and suggest the need for "next-generation," high-coverage lexical terminologies.
Linear Amplification in Nonequilibrium Turbulent Boundary Layers
Resolvent analysis is applied to nonequilibrium incompressible adverse pressure gradient (APG) turbulent boundary layers (TBL) and hypersonic boundary layers with high temperature real gas effects, including chemical nonequilibrium. Resolvent analysis is an equation-based, scale-dependent decomposition of the Navier-Stokes equations, linearized about a known mean flow field. The decomposition identifies the optimal response and forcing modes, ranked by their linear amplification. To treat the nonequilibrium APG TBL, a biglobal resolvent analysis approach is used to account for the streamwise and wall-normal inhomogeneities in the streamwise developing flow. For the hypersonic boundary layer in chemical nonequilibrium, the resolvent analysis is constructed using a parallel flow assumption, incorporating N₂, O₂, NO, N, and O as a mixture of chemically reacting gases.
Biglobal resolvent analysis is first applied to the zero pressure gradient (ZPG) TBL. Scaling relationships are determined for the spanwise wavenumber and temporal frequency that admit self-similar resolvent modes in the inner layer, mesolayer, and outer layer regions of the ZPG TBL. The APG effects on the inner scaling of the biglobal modes are shown to diminish as their self-similarity improves with increased Reynolds number. An increase in APG strength is shown to increase the linear amplification of the large-scale biglobal modes in the outer region, similar to the energization of large scale modes observed in simulation. The linear amplification of these modes grows linearly with the APG history, measured as the streamwise averaged APG strength, and relates to a novel pressure-based velocity scale.
Resolvent analysis is then used to identify the length scales most affected by the high-temperature gas effects in hypersonic TBLs. It is shown that the high-temperature gas effects primarily affect modes localized near the peak mean temperature. Due to the chemical nonequilibrium effects, the modes can be linearly amplified through changes in chemical concentration, which have non-negligible effects on the higher order modes. Correlations in the components of the small-scale resolvent modes agree qualitatively with similar correlations in simulation data.
Finally, efficient strategies for resolvent analysis are presented. These include an algorithm to autonomously sample the large amplification regions using a Bayesian Optimization-like approach and a projection-based method to approximate resolvent analysis through a reduced eigenvalue problem, derived from calculus of variations.
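The core computation behind resolvent analysis can be illustrated on a toy system: linearize about a mean state to get an operator A, form the resolvent (iωI − A)⁻¹, and rank forcing/response pairs by its singular values. A minimal NumPy sketch, using a hypothetical 2×2 stable but non-normal operator in place of the linearized Navier-Stokes operator used in the work above:

```python
import numpy as np

# Hypothetical 2x2 stable but non-normal operator standing in for a
# linearized Navier-Stokes operator about a turbulent mean flow.
# Non-normality is what lets a stable system strongly amplify forcing.
A = np.array([[-1.0, 50.0],
              [ 0.0, -2.0]])

def resolvent_gain(A, omega):
    """Largest singular value (optimal linear amplification) of the
    resolvent operator (i*omega*I - A)^{-1} at temporal frequency omega."""
    n = A.shape[0]
    H = np.linalg.inv(1j * omega * np.eye(n) - A)
    return np.linalg.svd(H, compute_uv=False)[0]

# Sweep temporal frequencies and locate the peak amplification.
omegas = np.linspace(0.0, 10.0, 201)
gains = [resolvent_gain(A, w) for w in omegas]
w_peak = omegas[int(np.argmax(gains))]
print(f"peak gain {max(gains):.1f} at omega = {w_peak:.2f}")
```

A full boundary-layer application replaces A with a large discretized operator and uses iterative (matrix-free) SVD methods, which is what motivates the efficient sampling and projection strategies described above.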
Algorithms and complexity for approximately counting hypergraph colourings and related problems
The past decade has witnessed advancements in designing efficient algorithms for approximating the number of solutions to constraint satisfaction problems (CSPs), especially in the local lemma regime. However, the phase transition for computational tractability is not known. This thesis is dedicated to the prototypical problem of this kind of CSP, hypergraph colouring. Parameterised by the number of colours q, the arity of each hyperedge k, and the maximum vertex degree Δ, this problem falls into the regime of the Lovász local lemma when Δ ≲ q^k. Previously, however, fast approximate counting algorithms were known only when Δ ≲ q^(k/3), and there was no known inapproximability result. In pursuit of this, our contribution is twofold:
• When q, k ≥ 4 are even and Δ ≥ 5·q^(k/2), approximating the number of hypergraph colourings is NP-hard.
• When the input hypergraph is linear and Δ ≲ q^(k/2), a fast approximate counting algorithm does exist.
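The underlying counting problem is easy to state: a proper colouring is one in which no hyperedge is monochromatic. A brute-force counter, exponential in the number of vertices and therefore purely illustrative (the thesis concerns efficient *approximate* counting), might look like:

```python
from itertools import product

def count_hypergraph_colourings(n, edges, q):
    """Exactly count q-colourings of vertices 0..n-1 in which no
    hyperedge is monochromatic. Runs in O(q^n) time: illustration
    only, not an efficient counting algorithm."""
    count = 0
    for colouring in product(range(q), repeat=n):
        # A colouring is proper if every hyperedge sees >= 2 colours.
        if all(len({colouring[v] for v in e}) > 1 for e in edges):
            count += 1
    return count

# Single hyperedge of arity k=3 with q=2 colours: 2^3 = 8 assignments
# minus the 2 monochromatic ones.
print(count_hypergraph_colourings(3, [(0, 1, 2)], 2))  # 6
```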
A probabilistic approach for acoustic emission based monitoring techniques: with application to structural health monitoring
It has been demonstrated that acoustic emission (AE) inspection of structures can offer advantages over other types of monitoring techniques in the detection of damage; namely, an increased sensitivity to damage, as well as an ability to localise its source. There are, however, numerous challenges associated with the analysis of AE data. One issue is the high sampling frequencies required to capture AE activity. In just a few seconds, a recording can generate very high volumes of data, of which a significant portion may be of little interest for analysis. Identifying the individual AE events in a recorded time series is therefore a necessary procedure for reducing the size of the dataset and projecting out the influence of background noise from the signal. In this paper, a state-of-the-art technique is presented that can automatically identify and cluster the AE events from a probabilistic perspective. A nonparametric Bayesian approach, based on the Dirichlet process (DP), is employed to overcome some of the challenges associated with this task. Additionally, the developed model is applied for damage detection using AE data collected from an experimental setup. Two main sets of AE data are considered in this work: (1) from a journal bearing in operation, and (2) from an Airbus A320 main landing gear subjected to fatigue testing.
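The appeal of the DP here is that the number of clusters need not be fixed in advance. The clustering prior a DP induces can be illustrated with the Chinese restaurant process; the following is a minimal sketch of that prior only, not the paper's full DP mixture model for AE events:

```python
import random

def chinese_restaurant_process(n, alpha, seed=0):
    """Sample a random partition of n items from the Chinese restaurant
    process, the clustering prior induced by a Dirichlet process with
    concentration alpha. Each item joins an existing cluster with
    probability proportional to its size, or opens a new cluster with
    probability proportional to alpha."""
    rng = random.Random(seed)
    sizes = []        # current cluster sizes
    assignments = []  # cluster index of each item
    for _ in range(n):
        weights = sizes + [alpha]  # last slot = new cluster
        r = rng.random() * sum(weights)
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1
        assignments.append(k)
    return assignments

parts = chinese_restaurant_process(100, alpha=1.0)
print("clusters discovered:", len(set(parts)))
```

Larger alpha favours more, smaller clusters; in the AE setting the data likelihood then determines how many event clusters are actually supported.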
Semiparametric posterior corrections
We present a new approach to semiparametric inference using corrected posterior distributions. The method allows us to leverage the adaptivity, regularization, and predictive power of nonparametric Bayesian procedures to estimate low-dimensional functionals of interest without being restricted by the holistic Bayesian formalism. Starting from a conventional nonparametric posterior, we target the functional of interest by transforming the entire distribution with a Bayesian bootstrap correction. We provide conditions for the resulting corrected posterior to possess calibrated frequentist properties and specialize the results for several canonical examples: the integrated squared density, the mean of a missing-at-random outcome, and the average causal treatment effect on the treated. The procedure is computationally attractive, requiring only a simple, efficient post-processing step that can be attached onto any arbitrary posterior sampling algorithm. Using the ACIC 2016 causal data analysis competition, we illustrate that our approach can outperform the existing state of the art through the propagation of Bayesian uncertainty.
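The Bayesian bootstrap ingredient reweights rather than resamples: draw flat-Dirichlet weights over the observations and re-evaluate the functional under each weighting. A minimal sketch for a simple functional (the mean) on hypothetical data; the paper's procedure applies such a correction to draws from a nonparametric posterior, not to raw data as here:

```python
import numpy as np

rng = np.random.default_rng(0)

def bayesian_bootstrap(data, functional, draws=2000, rng=rng):
    """Posterior draws of a functional under the Bayesian bootstrap:
    instead of resampling observations (classical bootstrap), draw
    flat-Dirichlet weights over them and re-evaluate the functional."""
    n = len(data)
    out = np.empty(draws)
    for b in range(draws):
        w = rng.dirichlet(np.ones(n))  # weights sum to 1
        out[b] = functional(data, w)
    return out

# Hypothetical data: uncertainty about the mean of a skewed sample.
data = rng.exponential(scale=2.0, size=200)
post = bayesian_bootstrap(data, lambda x, w: np.sum(w * x))
lo, hi = np.percentile(post, [2.5, 97.5])
print(f"posterior mean {post.mean():.2f}, 95% interval ({lo:.2f}, {hi:.2f})")
```

Because each draw is just a weighted re-evaluation, the correction is a cheap post-processing step that can sit on top of any posterior sampler, as the abstract notes.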
Classifier Calibration: A survey on how to assess and improve predicted class probabilities
This paper provides both an introduction to and a detailed overview of the principles and practice of classifier calibration. A well-calibrated classifier correctly quantifies the level of uncertainty or confidence associated with its instance-wise predictions. This is essential for critical applications, optimal decision making, cost-sensitive classification, and some types of context change. Calibration research has a rich history that predates the birth of machine learning as an academic field by decades. However, a recent increase in interest in calibration has led to new methods and the extension from the binary to the multiclass setting. The space of options and issues to consider is large, and navigating it requires the right set of concepts and tools. We provide both introductory material and up-to-date technical details of the main concepts and methods, including proper scoring rules and other evaluation metrics, visualisation approaches, a comprehensive account of post-hoc calibration methods for binary and multiclass classification, and several advanced topics.
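One standard evaluation metric from this literature, the (binary) expected calibration error, bins predictions by confidence and compares the average predicted probability in each bin with the observed frequency of the positive class, weighting bins by size. A minimal sketch; note that binning choices (number of bins, equal-width versus equal-mass) vary across the literature:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE with equal-width confidence bins: size-weighted
    average gap between mean predicted probability and observed
    positive-class frequency per bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi], except the first which includes 0.
        mask = (probs > lo) & (probs <= hi) if lo > 0 else (probs <= hi)
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap
    return ece

# A calibrated toy predictor: says 0.25 and is right 1 time in 4.
print(expected_calibration_error([0.25] * 4, [0, 0, 0, 1]))  # 0.0
# An overconfident one: says 0.95 but is right only half the time.
print(expected_calibration_error([0.95] * 4, [1, 1, 0, 0]))
```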
Subjective Crowd Disagreements for Subjective Data: Uncovering Meaningful CrowdOpinion with Population-level Learning
Human-annotated data plays a critical role in the fairness of AI systems, including those that deal with life-altering decisions or moderate human-created web/social media content. Conventionally, annotator disagreements are resolved before any learning takes place. However, researchers are increasingly identifying annotator disagreement as pervasive and meaningful. They also question how a system performs when annotators disagree, particularly when minority views are disregarded, especially among groups that may already be underrepresented in the annotator population. In this paper, we introduce CrowdOpinion (accepted for publication at ACL 2023), an unsupervised-learning-based approach that uses language features and label distributions to pool similar items into larger samples of label distributions. We experiment with four generative and one density-based clustering method, applied to five linear combinations of label distributions and features. We use five publicly available benchmark datasets (with varying levels of annotator disagreement) from social media (Twitter, Gab, and Reddit). We also experiment in the wild using a dataset from Facebook, where annotations come from the platform itself, by users reacting to posts. We evaluate CrowdOpinion as a label distribution prediction task using KL-divergence and as a single-label problem using accuracy measures.
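The KL-divergence evaluation compares a predicted distribution over labels with the distribution observed from annotators. A minimal sketch with hypothetical three-class vote distributions (the epsilon smoothing is an implementation convenience, not taken from the paper):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete label distributions, with a
    small epsilon added so zero-probability labels keep the log finite.
    Note KL is asymmetric: order the arguments deliberately."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical: observed annotator votes vs. a pooled prediction
# over three classes for one item.
observed = [0.6, 0.3, 0.1]
predicted = [0.5, 0.4, 0.1]
print(f"KL = {kl_divergence(observed, predicted):.4f}")
```

A KL of zero means the predicted distribution exactly reproduces the annotator disagreement, which is the target under this evaluation rather than a single majority label.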