Specification of dependence structures and simulation-based estimation for conditionally specified statistical models
Conditionally specified statistical models are frequently constructed from conditional one-parameter exponential family distributions. One way to formulate such a model is to specify the dependence structure among random variables through the use of a Markov random field. When this is done, a common assumption is that dependence is expressed only through pairs of random variables, the 'pairwise-only dependence' assumption. Using a Markov random field structure and the pairwise-only dependence assumption, Besag (1974) formulated exponential family 'auto-models', and showed the form that conditional one-parameter exponential family densities must have in such models. Those results are extended under relaxation of the pairwise-only dependence assumption, and a necessary form for conditional one-parameter exponential family densities is given under more general conditions of multiway dependence.
A strategy is proposed for maximum likelihood estimation of parameters appearing in the joint distribution of a set of random variables modeled through the specification of full conditional probability density or mass functions. This strategy relies on maximization of a sequence of Monte Carlo approximations to the log likelihood function. The fundamental issue addressed in our strategy is formulation of an importance sampling distribution as a product of marginal functions, where those marginals are chosen in a way that reflects the influence of dependence on the first two moments of the actual statistical model under consideration. We address a number of practical issues in the use of Monte Carlo methods to locate maximum likelihood estimates, including criteria for when an additional sampling distribution should be selected and the selection of appropriate starting values. This estimation strategy is extended to mixture models in which the mixing distributions are identified up to the normalizing constants by the specification of full conditional probability density or mass functions.
The large sample theory for the estimates resulting from the proposed strategy is provided under the condition that the negpotential function is continuous over a compact set. In addition, convergence of the Monte Carlo estimate of the log likelihood to the true log likelihood is established, and asymptotic results are given as theoretical support for one of the solutions to the practical issues in the estimation strategy.
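The idea of maximizing a Monte Carlo approximation to the log likelihood can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy negpotential `theta * x - x**2 / 2` (the unnormalized log density of a normal distribution with mean `theta` and unit variance), the fixed normal importance distribution, and the grid search. It does not reproduce the product-of-marginals importance distribution or the sequence of approximations described in the abstract.

```python
import numpy as np

def negpotential(theta, x):
    # Hypothetical toy negpotential: unnormalized log density of N(theta, 1).
    return theta * x - 0.5 * x**2

def mc_log_likelihood(theta, y, samples, log_g):
    # Importance-sampling estimate of the log normalizing constant:
    # log c(theta) ~ log mean_m exp(Q_theta(x_m) - log g(x_m)).
    log_w = negpotential(theta, samples) - log_g
    log_c = np.logaddexp.reduce(log_w) - np.log(len(samples))
    # Monte Carlo log likelihood of the observed data y.
    return np.sum(negpotential(theta, y)) - len(y) * log_c

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=200)           # simulated observations

# Assumed importance distribution g: N(0, sigma_g^2), chosen only for the demo.
sigma_g = 2.0
samples = rng.normal(0.0, sigma_g, size=20000)
log_g = -0.5 * (samples / sigma_g) ** 2 - np.log(sigma_g * np.sqrt(2 * np.pi))

# Maximize the Monte Carlo log likelihood over a coarse grid of theta values.
grid = np.linspace(-1.0, 3.0, 401)
values = [mc_log_likelihood(t, y, samples, log_g) for t in grid]
theta_hat = grid[int(np.argmax(values))]
print(theta_hat, y.mean())
```

For this toy model the exact maximum likelihood estimate is the sample mean, so the printed values should be close; the gap reflects Monte Carlo error and grid resolution.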
Business Integrity, Public Sector Integrity, Income, and National Competitiveness : A Cross-Country Level Analysis
This paper's main objective is to investigate the new paradigm for combating corruption in both the business and public sectors proposed by Kang and Lee (2003). Utilising cross-sectional data for 32 countries from the 2002 GCR, combined with the 2002 opacity index and the 2003 CPI, the empirical results are consistent with the proposition that business integrity and public sector integrity lead to economic efficiency, which in turn enhances national competitiveness. We also suggest that per capita real income, business integrity, and public sector integrity are positively inter-related. On the basis of this study, policy makers should set objectives through a comprehensive approach and develop a system of checks and balances in both sectors. More specifically, business integrity has a greater effect on both per capita real income and national competitiveness than public sector integrity. It is also evident in cross-country comparisons that national competitiveness has a higher elasticity with respect to both business integrity and public sector integrity in Korea than in the other countries.
Application of nonnegative matrix factorization to improve profile-profile alignment features for fold recognition and remote homolog detection
Background: Nonnegative matrix factorization (NMF) is a feature extraction method that has the property of intuitive part-based representation of the original features. This unique ability makes NMF a potentially promising method for biological sequence analysis. Here, we apply NMF to fold recognition and remote homolog detection problems. Recent studies have shown that combining support vector machines (SVM) with profile-profile alignments improves performance of fold recognition and remote homolog detection remarkably. However, it is not clear which parts of sequences are essential for the performance improvement.
Results: The performance of fold recognition and remote homolog detection using NMF features is compared to that of the unmodified profile-profile alignment (PPA) features by estimating Receiver Operating Characteristic (ROC) scores. The overall performance is noticeably improved. For fold recognition at the fold level, SVM with NMF features recognizes 30% of homolog proteins at > 0.99 ROC scores, while the original PPA features, HHsearch, and PSI-BLAST recognize almost none. For detecting remote homologs that are related at the superfamily level, NMF features also achieve higher performance than the original PPA features. At > 0.90 ROC50 scores, 25% of proteins with NMF features correctly detect remotely related proteins, whereas with the original PPA features only 1% of proteins detect remote homologs. In addition, we investigate the effect of the number of positive training examples and the number of basis vectors on performance improvement. We also analyze the ability of NMF to extract essential features by comparing NMF basis vectors with functionally important sites and structurally conserved regions of proteins. The results show that NMF basis vectors have significant overlap with functional sites from PROSITE and with structurally conserved regions from the multiple structural alignments generated by MUSTANG. The correlation between NMF basis vectors and biologically essential parts of proteins supports our conjecture that NMF basis vectors can explicitly represent important sites of proteins.
Conclusion: The present work demonstrates that applying NMF to profile-profile alignments can reveal essential features of proteins and that these features significantly improve the performance of fold recognition and remote homolog detection.
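As a rough illustration of the factorization itself, the sketch below factors a nonnegative matrix `V` into nonnegative factors `W` and `H` using the Lee and Seung multiplicative updates for the Frobenius objective. The abstract does not say which NMF algorithm was used, so this choice is an assumption, and the random matrix stands in for real profile-profile alignment features, which are not reproduced here.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor nonnegative V (n x m) as W (n x rank) @ H (rank x m).

    Uses multiplicative updates, which keep W and H nonnegative and do not
    increase the Frobenius reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis vectors
    return W, H

# Hypothetical stand-in data: a random nonnegative matrix of exact rank 3.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))

W, H = nmf(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(W.shape, H.shape, rel_err)
```

In this sketch the columns of `W` play the role of basis vectors, and each column of `H` gives one sample's nonnegative coefficients over that basis; the additive, nonnegative combination is the source of the part-based interpretation.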
Spread Spurious Attribute: Improving Worst-group Accuracy with Spurious Attribute Estimation
The paradigm of worst-group loss minimization has shown its promise in
avoiding learning spurious correlations, but requires costly additional
supervision on spurious attributes. To resolve this, recent works focus on
developing weaker forms of supervision -- e.g., hyperparameters discovered with
a small number of validation samples with spurious attribute annotation -- but
none of the methods retain comparable performance to methods using full
supervision on the spurious attribute. In this paper, instead of searching for
weaker supervisions, we ask: Given access to a fixed number of samples with
spurious attribute annotations, what is the best achievable worst-group loss if
we "fully exploit" them? To this end, we propose a pseudo-attribute-based
algorithm, coined Spread Spurious Attribute (SSA), for improving the
worst-group accuracy. In particular, we leverage samples both with and without
spurious attribute annotations to train a model to predict the spurious
attribute, then use the pseudo-attribute predicted by the trained model as
supervision on the spurious attribute to train a new robust model having
minimal worst-group loss. Our experiments on various benchmark datasets show
that our algorithm consistently outperforms the baseline methods using the same
number of validation samples with spurious attribute annotations. We also
demonstrate that the proposed SSA can achieve comparable performance to
methods using full (100%) spurious attribute supervision, by using a much
smaller number of annotated samples -- from 0.6% and up to 1.5%, depending on
the dataset.
Comment: ICLR 2022 camera read
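The metric behind this abstract, worst-group accuracy, can be sketched as follows, assuming each group is a (label, spurious attribute) pair. The function name and the toy arrays are hypothetical; in an SSA-style pipeline the `attr` array would hold pseudo-attributes predicted by a model, a step that is not reproduced here.

```python
import numpy as np

def worst_group_accuracy(y_true, y_pred, attr):
    """Return (worst accuracy, per-group accuracies).

    A group is assumed to be a (label, spurious attribute) pair.
    """
    y_true, y_pred, attr = map(np.asarray, (y_true, y_pred, attr))
    accs = {}
    for label, a in sorted(set(zip(y_true.tolist(), attr.tolist()))):
        mask = (y_true == label) & (attr == a)
        accs[(label, a)] = float((y_pred[mask] == y_true[mask]).mean())
    return min(accs.values()), accs

# Toy data: the label mostly agrees with the spurious attribute, and the
# predictions fail on half of each minority group.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
attr   = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

worst, per_group = worst_group_accuracy(y_true, y_pred, attr)
average = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
print(worst, average)  # 0.5 0.8
```

The toy example shows why the average can mislead: overall accuracy is 0.8, while the two minority groups, where label and attribute disagree, reach only 0.5.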
Patch-level Representation Learning for Self-supervised Vision Transformers
Recent self-supervised learning (SSL) methods have shown impressive results
in learning visual representations from unlabeled images. This paper aims to
improve their performance further by utilizing the architectural advantages of
the underlying neural network, as the current state-of-the-art visual pretext
tasks for SSL do not enjoy the benefit, i.e., they are architecture-agnostic.
In particular, we focus on Vision Transformers (ViTs), which have gained much
attention recently as a better architectural choice, often outperforming
convolutional networks for various visual tasks. The unique characteristic of
ViT is that it takes a sequence of disjoint patches from an image and processes
patch-level representations internally. Inspired by this, we design a simple
yet effective visual pretext task, coined SelfPatch, for learning better
patch-level representations. To be specific, we enforce invariance against each
patch and its neighbors, i.e., each patch treats similar neighboring patches as
positive samples. Consequently, training ViTs with SelfPatch learns more
semantically meaningful relations among patches (without using human-annotated
labels), which can be beneficial, in particular, to downstream tasks of a dense
prediction type. Despite its simplicity, we demonstrate that it can
significantly improve the performance of existing SSL methods for various
visual tasks, including object detection and semantic segmentation.
Specifically, SelfPatch significantly improves the recent self-supervised ViT,
DINO, by achieving +1.3 AP on COCO object detection, +1.2 AP on COCO instance
segmentation, and +2.9 mIoU on ADE20K semantic segmentation.
Comment: Accepted to CVPR 2022 (Oral). Code is available at
https://github.com/alinlab/SelfPatc
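The statement that each patch treats similar neighboring patches as positive samples can be made concrete with a small sketch. The selection rule used here (top-k cosine similarity within the 8-connected neighborhood of a patch grid) and the function name are assumptions for illustration; they are not taken from the SelfPatch code, and the training loss is not shown.

```python
import numpy as np

def select_positive_neighbors(patches, k=2):
    """For each patch on an H x W grid, pick its k most similar neighbors.

    patches: array of shape (H, W, D) holding one embedding per patch.
    Returns a dict mapping (i, j) to a list of neighbor coordinates.
    """
    H, W, _ = patches.shape
    unit = patches / np.linalg.norm(patches, axis=-1, keepdims=True)
    positives = {}
    for i in range(H):
        for j in range(W):
            candidates = []
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if (di or dj) and 0 <= ni < H and 0 <= nj < W:
                        sim = float(unit[i, j] @ unit[ni, nj])  # cosine similarity
                        candidates.append((sim, (ni, nj)))
            candidates.sort(key=lambda c: c[0], reverse=True)
            positives[(i, j)] = [pos for _, pos in candidates[:k]]
    return positives

# Hypothetical embeddings for a 3 x 3 patch grid with 8-dimensional features.
rng = np.random.default_rng(0)
patches = rng.normal(size=(3, 3, 8))
patches[0, 1] = patches[0, 0]          # make one neighbor identical to patch (0, 0)

positives = select_positive_neighbors(patches, k=2)
print(positives[(0, 0)])
```

Because patch (0, 1) is a copy of patch (0, 0), it is the first positive chosen for (0, 0); the remaining selections depend on the random embeddings.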
A Self-Supervised Automatic Post-Editing Data Generation Tool
Data building for automatic post-editing (APE) requires extensive and
expert-level human effort, as it contains an elaborate process that involves
identifying errors in sentences and providing suitable revisions. Hence, we
develop a self-supervised data generation tool, deployable as a web
application, that minimizes human supervision and constructs personalized APE
data from a parallel corpus for several language pairs with English as the
target language. Data-centric APE research can be conducted using this tool,
involving many language pairs that have not been studied thus far owing to the
lack of suitable data.
Comment: Accepted for DataPerf workshop at ICML 202
QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation
With the recent advance in neural machine translation demonstrating its
importance, research on quality estimation (QE) has been steadily progressing.
QE aims to automatically predict the quality of machine translation (MT) output
without reference sentences. Despite its high utility in the real world, there
remain several limitations concerning manual QE data creation: inevitably
incurred non-trivial costs due to the need for translation experts, and issues
with data scaling and language expansion. To tackle these limitations, we
present QUAK, a Korean-English synthetic QE dataset generated in a fully
automatic manner. QUAK consists of three sub-datasets, QUAK-M, QUAK-P, and
QUAK-H, produced through three strategies that are relatively free from
language constraints. Since each strategy requires no human effort, which
facilitates scalability, we scale our data up to 1.58M for QUAK-P and QUAK-H and
6.58M for QUAK-M. As an experiment, we quantitatively analyze word-level QE
results in various ways while performing statistical analysis. Moreover, we show
that datasets scaled in an efficient way also contribute to performance
improvements by observing meaningful performance gains in QUAK-M and QUAK-P when
adding data up to 1.58M.
Interhemispheric asymmetry of c-Fos expression in glomeruli and the olfactory tubercle following repeated odor stimulation
Odor adaptation allows the olfactory system to regulate sensitivity to different stimulus intensities, which is essential for preventing saturation of the cell-transducing machinery and maintaining high sensitivity to persistent and repetitive odor stimuli. Although many studies have investigated the structure and mechanisms of the mammalian olfactory system that responds to chemical sensation, few studies have considered differences in neuronal activation that depend on the manner in which the olfactory system is exposed to odorants, or examined activity patterns of olfactory-related regions in the brain under different odor exposure conditions. To address these questions, we designed three different odor exposure conditions that mimicked diverse odor environments and analyzed c-Fos-expressing cells (c-Fos+ cells) in the odor columns of the olfactory bulb (OB). We then measured differences in the proportions of c-Fos-expressing cell types depending on the odor exposure condition. Surprisingly, under the specific odor condition in which the olfactory system was repeatedly exposed to the odorant for 1 min at 5-min intervals, one of the lateral odor columns and the ipsilateral hemisphere of the olfactory tubercle had more c-Fos+ cells than the other three odor columns and the contralateral hemisphere of the olfactory tubercle. However, this interhemispheric asymmetry of c-Fos expression was not observed in the anterior piriform cortex. To confirm whether the anterior olfactory nucleus pars externa (AONpE), which connects the left and right OB, contributes to this asymmetry, AONpE-lesioned mice were analyzed under the specific odor exposure condition. Asymmetric c-Fos expression was not observed in the OB or the olfactory tubercle. 
These data indicate that the c-Fos expression patterns of the olfactory-related regions in the brain are influenced by the odor exposure condition and that asymmetric c-Fos expression in these regions was observed under a specific odor exposure condition due to synaptic linkage via the AONpE. © 2020 The Authors. Published by FEBS Press and John Wiley & Sons Ltd.
Ice Velocity Mapping of Ross Ice Shelf, Antarctica by Matching Surface Undulations Measured by ICESat Laser Altimetry
We present a novel method for estimating the surface horizontal velocity on ice shelves using laser altimetry data from the Ice, Cloud, and land Elevation Satellite (ICESat; 2003–2009). The method matches undulations measured at crossover points between successive campaigns
- …