On the Learning Property of Logistic and Softmax Losses for Deep Neural Networks
Deep convolutional neural networks (CNNs) trained with logistic and softmax
losses have made significant advancement in visual recognition tasks in
computer vision. When training data exhibit class imbalances, the class-wise
reweighted versions of logistic and softmax losses are often used to boost
performance over the unweighted versions. In this paper, motivated to explain the
reweighting mechanism, we explicate the learning property of those two loss
functions by analyzing the necessary condition (e.g., gradient equal to zero)
after training CNNs to converge to a local minimum. The analysis immediately
provides us explanations for understanding (1) quantitative effects of the
class-wise reweighting mechanism: deterministic effectiveness for binary
classification using logistic loss yet indeterministic for multi-class
classification using softmax loss; (2) disadvantage of logistic loss for
single-label multi-class classification via one-vs.-all approach, which is due
to the averaging effect on predicted probabilities for the negative class
(e.g., non-target classes) in the learning process. With the disadvantage and
advantage of logistic loss disentangled, we thereafter propose a novel
reweighted logistic loss for multi-class classification. Our simple yet
effective formulation improves ordinary logistic loss by focusing on learning
hard non-target classes (target vs. non-target class in one-vs.-all) and turns
out to be competitive with softmax loss. We evaluate our method on several
benchmark datasets to demonstrate its effectiveness.
Comment: AAAI2020. Previously this appeared as arXiv:1906.04026v2, which was
submitted as a replacement by acciden
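A minimal sketch of class-wise reweighting for the binary logistic loss discussed in the abstract above. The function names and the weights `w_pos` and `w_neg` are illustrative assumptions, not the paper's notation, and the sketch does not reproduce the paper's proposed multi-class loss.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_logistic_loss(logits, labels, w_pos=1.0, w_neg=1.0):
    """Mean class-wise reweighted logistic loss for labels in {0, 1}.

    w_pos scales the loss of positive samples, w_neg that of negatives
    (hypothetical parameter names chosen for this sketch).
    """
    total = 0.0
    for z, y in zip(logits, labels):
        # Clamp the probability to avoid log(0).
        p = min(max(sigmoid(z), 1e-12), 1.0 - 1e-12)
        total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def weighted_logistic_grad(logits, labels, w_pos=1.0, w_neg=1.0):
    """Gradient of the mean loss with respect to each logit."""
    n = len(logits)
    grads = []
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        # Positive term pulls p toward 1, negative term pulls p toward 0.
        grads.append((-w_pos * y * (1.0 - p) + w_neg * (1 - y) * p) / n)
    return grads
```

The weights scale each class's contribution to the gradient, which is the quantity a zero-gradient (stationary point) analysis would examine.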
Learning Compact Features via In-Training Representation Alignment
Deep neural networks (DNNs) for supervised learning can be viewed as a
pipeline of the feature extractor (i.e., last hidden layer) and a linear
classifier (i.e., output layer) that are trained jointly with stochastic
gradient descent (SGD) on the loss function (e.g., cross-entropy). In each
epoch, the true gradient of the loss function is estimated using a mini-batch
sampled from the training set and model parameters are then updated with the
mini-batch gradients. Although the latter provide an unbiased estimate of
the former, they are subject to substantial variance arising from the size and
number of sampled mini-batches, leading to noisy and jumpy updates. To
stabilize such undesirable variance in estimating the true gradients, we
propose In-Training Representation Alignment (ITRA) that explicitly aligns
feature distributions of two different mini-batches with a matching loss in the
SGD training process. We also provide a rigorous analysis of the desirable
effects of the matching loss on feature representation learning: (1) extracting
compact feature representation; (2) reducing over-adaptation to mini-batches via
an adaptive weighting mechanism; and (3) accommodating multi-modalities.
Finally, we conduct large-scale experiments on both image and text
classification to demonstrate its superior performance over strong
baselines.
Comment: 11 pages, 4 figures, 6 tables. Accepted for publication by AAAI-23.
arXiv admin note: text overlap with arXiv:2002.0991
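The abstract above describes aligning the feature distributions of two mini-batches with a matching loss but does not name the loss in this listing. As an illustration only, the sketch below assumes a squared maximum mean discrepancy (MMD) with an RBF kernel as the matching term; the function names and the `bandwidth` parameter are hypothetical.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(batch_a, batch_b, bandwidth=1.0):
    """Biased estimate of the squared MMD between two batches of features.

    Assumed stand-in for a matching loss: it is zero when both batches
    are identical and grows as their feature distributions move apart.
    """
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(batch_a, batch_a)
            + mean_kernel(batch_b, batch_b)
            - 2.0 * mean_kernel(batch_a, batch_b))
```

In a training loop, a term like this would be added to the classification loss with some weighting, computed on the last-hidden-layer features of two independently sampled mini-batches.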
GOODAT: Towards Test-time Graph Out-of-Distribution Detection
Graph neural networks (GNNs) have found widespread application in modeling
graph data across diverse domains. While GNNs excel in scenarios where the
testing data shares the distribution of their training counterparts
(in-distribution, ID), they often make incorrect predictions when confronted
with samples from an unfamiliar distribution (out-of-distribution, OOD). To
identify and reject OOD samples with GNNs, recent studies have explored graph
OOD detection, often focusing on training a specific model or modifying the
data on top of a well-trained GNN. Despite their effectiveness, these methods
come with heavy training resources and costs, as they need to optimize the
GNN-based models on training data. Moreover, their reliance on modifying the
original GNNs and accessing training data further restricts their universality.
To this end, this paper introduces a method to detect Graph Out-of-Distribution
At Test-time (namely GOODAT), a data-centric, unsupervised, and plug-and-play
solution that operates independently of training data and modifications of GNN
architecture. With a lightweight graph masker, GOODAT can learn informative
subgraphs from test samples, enabling the capture of distinct graph patterns
between OOD and ID samples. To optimize the graph masker, we meticulously
design three unsupervised objective functions based on the graph information
bottleneck principle, motivating the masker to capture compact yet informative
subgraphs for OOD detection. Comprehensive evaluations confirm that our GOODAT
method outperforms state-of-the-art benchmarks across a variety of real-world
datasets. The code is available at GitHub: https://github.com/Ee1s/GOODAT
Comment: 9 pages, 5 figure
Phonon-assisted radiofrequency absorption by gold nanoparticles resulting in hyperthermia
It is suggested that in gold nanoparticles (GNPs) about 5 nm in size used in
radiofrequency (RF) hyperthermia, absorption of the RF photon by the
Fermi electron occurs with the involvement of the longitudinal acoustic vibrational
mode (LAVM), the dominating one in the distribution of vibrational density of
states (VDOS). This physical mechanism helps to explain two observed phenomena:
the size dependence of the heating rate (HR) in GNPs and reduced heat
production in aggregated GNPs. The argument proceeds within the
one-electron approximation, taking into account the discreteness of the energies
and momenta of both electrons and LAVMs. The heating of GNPs is thought to
consist of two consecutive processes: first, the Fermi electron simultaneously
absorbs the RF photon and an LAVM available in the GNP; thereafter, the
excited electron relaxes within the GNP's boundary, exciting an LAVM with
an energy higher than that of the previously absorbed LAVM. GNPs containing
Ta and/or Fe impurities are proposed for RF hyperthermia as promising
heaters with enhanced HRs, and GNPs with rare-earth impurity atoms are also
brought into consideration. It is shown why the maximum HR values should be
expected in GNPs of about 5-7 nm in size.
Comment: proceedings at the NATO Advanced Research workshop FANEM-2015 (Minsk,
May 25-27, 2015). To be published in the final form in: "Fundamental and
Applied NanoElectroMagnetics" (Springer Science + Business Media B.V.
Autonomous Overlapping Community Detection in Temporal Networks: A Dynamic Bayesian Nonnegative Matrix Factorization Approach.
A wide variety of natural or artificial systems can be modeled as time-varying or temporal networks. To understand the structural and functional properties of these time-varying networked systems, it is desirable to detect and analyze the evolving community structure. In temporal networks, the identified communities should reflect the current snapshot network and, at the same time, be similar to the communities identified in the previous snapshot networks. Most existing approaches assume that the number of communities is known or can be obtained by some heuristic method. This assumption is unsuitable and complicated for most real-world networks, especially temporal networks. In this paper, we propose a Bayesian probabilistic model, named Dynamic Bayesian Nonnegative Matrix Factorization (DBNMF), for automatic detection of overlapping communities in temporal networks. Our model not only gives the overlapping community structure based on the probabilistic memberships of nodes in each snapshot network but also automatically determines the number of communities in each snapshot network based on automatic relevance determination. A gradient descent algorithm is then proposed to optimize the objective function of our DBNMF model. The experimental results on both synthetic datasets and real-world temporal networks demonstrate that the DBNMF model has superior performance compared with two widely used methods, especially when the number of communities is unknown and when the network is highly sparse.
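To make the factorization idea in the abstract above concrete, here is a minimal sketch of plain nonnegative matrix factorization on a single snapshot's adjacency matrix, using the standard multiplicative updates for the squared-error objective. It is not the DBNMF model: it has no Bayesian priors, no automatic relevance determination, and no temporal smoothing, and the rank is fixed by hand. All function names are hypothetical.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(a, w, h):
    """Squared Frobenius norm of a - w @ h."""
    wh = matmul(w, h)
    return sum((a[i][j] - wh[i][j]) ** 2
               for i in range(len(a)) for j in range(len(a[0])))


def nmf(a, rank, iters=200, seed=0, eps=1e-9):
    """Factor a (n x m, nonnegative) into w (n x rank) and h (rank x m).

    Row i of w can be read as soft memberships of node i in each of the
    `rank` communities, which is what allows overlapping assignments.
    """
    rng = random.Random(seed)
    n, m = len(a), len(a[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Update h with w fixed: h <- h * (w^T a) / (w^T w h).
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)]
             for r in range(rank)]
        # Update w with h fixed: w <- w * (a h^T) / (w h h^T).
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)]
             for i in range(n)]
    return w, h
```

Because the rank must be chosen in advance here, the sketch also illustrates the limitation the abstract says DBNMF is designed to remove.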
Adjuvant Chemotherapy Versus Adjuvant Concurrent Chemoradiotherapy After Radical Surgery for Early-Stage Cervical Cancer: A Randomized, Non-Inferiority, Multicenter Trial
We conducted a prospective study to assess the non-inferiority of adjuvant chemotherapy alone versus adjuvant concurrent chemoradiotherapy (CCRT) as an alternative strategy for patients with early-stage (FIGO 2009 stage IB-IIA) cervical cancer having risk factors after surgery. Outcomes were assessed in terms of prognosis, adverse effects, and quality of life. This randomized trial involved nine centers across China. Eligible patients were randomized to receive adjuvant chemotherapy or CCRT after surgery. The primary end-point was progression-free survival (PFS). From December 2012 to December 2014, 337 patients were randomized. The final analysis included 329 patients: 165 in the adjuvant chemotherapy group and 164 in the adjuvant CCRT group. The median follow-up was 72.1 months. The three-year PFS rates were both 91.9%, and the five-year overall survival (OS) was 90.6% versus 90.0% in the adjuvant chemotherapy and CCRT groups, respectively. No significant differences were observed in PFS or OS between groups. The adjusted hazard ratio (HR) for PFS was 0.854 (95% confidence interval 0.415-1.757; P = 0.667) favoring adjuvant chemotherapy, with the confidence interval excluding the predefined non-inferiority boundary of 1.9. The chemotherapy group showed a tendency toward good quality of life. In comparison with post-operative adjuvant CCRT, adjuvant chemotherapy showed non-inferior efficacy in patients with early-stage cervical cancer having pathological risk factors. Adjuvant chemotherapy alone is a favorable alternative post-operative treatment.
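The non-inferiority reasoning in the abstract above reduces to comparing the upper confidence bound of the hazard ratio with the predefined margin. The sketch below uses only the numbers reported in the abstract; the function names are hypothetical, and the consistency check assumes the interval is symmetric on the log scale, which the abstract does not state.

```python
import math


def is_non_inferior(ci_upper, margin):
    """Non-inferiority holds when the upper confidence bound is below the margin."""
    return ci_upper < margin


def log_scale_midpoint(ci_lower, ci_upper):
    """Geometric mean of the bounds, i.e. the midpoint on the log scale.

    Assumption: the interval was built symmetrically around log(HR),
    in which case this should recover the point estimate.
    """
    return math.sqrt(ci_lower * ci_upper)


# Values reported in the abstract.
hazard_ratio = 0.854
ci_lower, ci_upper = 0.415, 1.757
margin = 1.9

print(is_non_inferior(ci_upper, margin))
print(round(log_scale_midpoint(ci_lower, ci_upper), 3))
```

The upper bound of 1.757 lies below the margin of 1.9, which is the sense in which the interval excludes the non-inferiority boundary, and the log-scale midpoint of the reported bounds agrees with the reported HR of 0.854 to three decimals.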
Genome-Wide Bovine H3K27me3 Modifications and the Regulatory Effects on Genes Expressions in Peripheral Blood Lymphocytes
Gene expression in lymphocytes has been found to be influenced by histone methylation in mammals, and trimethylation of lysine 27 on histone H3 (H3K27me3) normally represses gene expression. Peripheral blood lymphocytes are the main source of somatic cells in the milk of dairy cows, which vary frequently in response to infection or injury of the mammary gland and the number of parities. The genome-wide status of H3K27me3 modifications in blood lymphocytes of lactating Holsteins was profiled using a ChIP-seq approach. Combined with the digital gene expression (DGE) technique, the regulatory effects of H3K27me3 on gene expression were analyzed. The ChIP-seq results showed that the peaks of H3K27me3 in cow lymphocytes were mainly enriched in the up20K (~50%), down20K (~30%) and intron (~28%) regions of the genes. Only ~3% of peaks were enriched in exon regions. Moreover, the highest H3K27me3 modification levels were mainly around 2 Kb upstream of the transcriptional start sites (TSS) of the genes. Using conjoint analysis with DGE data, we found that H3K27me3 marks tended to repress target gene expression throughout whole gene regions, especially acting on the promoter region. A total of 53 differentially expressed genes were detected in third-parity cows compared to first-parity cows, and the 25 down-regulated genes (PSEN2 etc.) were negatively correlated with H3K27me3 levels on up2Kb to up1Kb of the genes, while the up-regulated genes did not show this relationship. The first blueprint of bovine H3K27me3 marks that mediate gene silencing was generated. H3K27me3 plays its repressive role mainly in the regulatory region in bovine lymphocytes. The up2Kb to up1Kb region of the down-regulated genes in third-parity cows could be a potential target of H3K27me3 regulation. Further studies are warranted to understand the regulation mechanisms of H3K27me3 on somatic cell count increases and milk losses in later parities of cows.