Tiresias: Online Anomaly Detection for Hierarchical Operational Network Data
Operational network data, i.e., management data such as customer care call logs and
equipment system logs, is an important source of information for network
operators to detect problems in their networks. Unfortunately, there is a lack of
efficient tools to automatically track and detect anomalous events in
operational data, so ISP operators must rely on manual inspection of this
data. While anomaly detection has been widely studied in the context of network
data, operational data presents several new challenges, including the
volatility and sparseness of the data and the need for fast detection
(which complicates the application of schemes that require offline processing or
large, stable data sets to converge).
To address these challenges, we propose Tiresias, an automated approach to
locating anomalous events on hierarchical operational data. Tiresias leverages
the hierarchical structure of operational data to identify high-impact
aggregates (e.g., locations in the network, failure modes) likely to be
associated with anomalous events. To accommodate different kinds of operational
network data, Tiresias employs an online detection algorithm with low time
and space complexity that still preserves high detection accuracy. We present
results from two case studies using operational data collected at a large
commercial IP network operated by a Tier-1 ISP: customer care call logs and
set-top box crash logs. By comparing with a reference set verified by the ISP's
operational group, we validate that Tiresias can achieve >94% accuracy in
locating anomalies. Tiresias also discovered several previously unknown
anomalies in the ISP's customer care cases, demonstrating its effectiveness.
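The abstract does not spell out Tiresias's detection algorithm. As a rough illustration of the general idea of monitoring aggregates at every level of a hierarchy online, the sketch below counts each record toward all of its ancestor nodes (e.g., region, then city, then device) and flags nodes whose count in the current time bin exceeds a multiple of an exponentially weighted moving-average (EWMA) forecast. The EWMA forecast, the `ratio` and `min_count` thresholds, and the names `ancestors` and `HierarchicalEwmaDetector` are all hypothetical choices for illustration, not Tiresias's actual method.

```python
from collections import defaultdict


def ancestors(path):
    """Return every prefix of a hierarchical key, e.g. ("east",), ("east", "nyc"), ..."""
    return [tuple(path[:i]) for i in range(1, len(path) + 1)]


class HierarchicalEwmaDetector:
    """Hypothetical online detector over a hierarchy of aggregates (not Tiresias itself)."""

    def __init__(self, alpha=0.3, ratio=3.0, min_count=10):
        self.alpha = alpha          # EWMA smoothing factor (assumed)
        self.ratio = ratio          # flag if observed > ratio * forecast (assumed)
        self.min_count = min_count  # ignore tiny counts to limit noise on sparse data
        self.forecast = {}          # node -> smoothed expected count per time bin

    def step(self, records):
        """Consume one time bin of records (tuples of hierarchy levels); return flagged nodes."""
        counts = defaultdict(int)
        for path in records:
            for node in ancestors(path):  # each record counts toward every aggregate above it
                counts[node] += 1

        flagged = []
        for node in set(counts) | set(self.forecast):
            observed = counts.get(node, 0)
            expected = self.forecast.get(node)
            if (expected is not None and observed >= self.min_count
                    and observed > self.ratio * max(expected, 1.0)):
                flagged.append(node)
            # O(1) update per node, so state grows only with the number of nodes.
            if expected is None:
                self.forecast[node] = float(observed)
            else:
                self.forecast[node] = self.alpha * observed + (1 - self.alpha) * expected
        return sorted(flagged)
```

Because every ancestor of a spiking leaf is flagged as well, a practical system would likely keep only the most specific flagged nodes; that filtering step is omitted here.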
Environmental Controls on Multi-Scale Dynamics of Net Carbon Dioxide Exchange From an Alpine Peatland on the Eastern Qinghai-Tibet Plateau
Peatlands are characterized by their large carbon storage capacity and play an essential role in the global carbon cycle. However, the future of the carbon stored in peatland ecosystems under a changing climate remains unclear. In this study, using the eddy covariance technique, we investigated the net ecosystem CO2 exchange (NEE) and its controlling factors at the Hongyuan peatland, part of the Ruoergai peatland on the eastern Qinghai-Tibet Plateau (QTP). Our results show that the Hongyuan alpine peatland was a CO2 sink, with an annual NEE of -226.61 and -185.35 g C m⁻² in 2014 and 2015, respectively. In contrast, the non-growing season NEE was 53.35 and 75.08 g C m⁻² in 2014 and 2015, respectively, suggesting that non-growing-season carbon emissions should not be neglected. NEE showed clear diurnal variation throughout the observation period, with maximum CO2 uptake at 12:30 (Beijing time, UTC+8). The Q10 values for the non-growing seasons of 2014 and 2015 were significantly higher than those for the growing seasons, suggesting that the CO2 flux in the non-growing season was more sensitive to warming. We investigated the multi-scale temporal variations in NEE during the growing season using wavelet analysis. On daily timescales, photosynthetically active radiation was the primary driver of NEE. Seasonal variation in NEE was mainly driven by soil temperature, whereas the amount of precipitation was mainly responsible for the annual variation of NEE: an increasing number of precipitation events was associated with increasing annual carbon uptake. This study highlights the need for continuous eddy covariance measurements and time series analysis approaches to deepen our understanding of the temporal variability in NEE and the multi-scale correlations between NEE and environmental factors.
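The abstract compares Q10 values between seasons without stating how they were estimated. A common approach, assumed here, fits an exponential relation flux = a·exp(b·T) between soil temperature T and a positive CO2 efflux (for example, nighttime or non-growing-season flux) and converts the slope to Q10 = exp(10b), the factor by which the flux changes per 10 °C of warming. The function name `fit_q10` and the log-linear least-squares fit are illustrative choices, not necessarily the study's procedure.

```python
import numpy as np


def fit_q10(temp_c, flux):
    """Fit flux = a * exp(b * T) by least squares on log(flux) and return (Q10, a).

    Q10 = exp(10 * b) is the multiplicative change in flux per 10 degrees C.
    Requires strictly positive fluxes (e.g., respiration-dominated CO2 release).
    """
    temp_c = np.asarray(temp_c, dtype=float)
    flux = np.asarray(flux, dtype=float)
    if np.any(flux <= 0):
        raise ValueError("exponential fit needs strictly positive fluxes")
    b, ln_a = np.polyfit(temp_c, np.log(flux), 1)  # returns slope, then intercept
    return float(np.exp(10.0 * b)), float(np.exp(ln_a))
```

Fitting in log space keeps the estimate a simple linear regression; a nonlinear fit in the original units would weight large fluxes more heavily and can give a somewhat different Q10.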
Improving Multi-Task Generalization via Regularizing Spurious Correlation
Multi-Task Learning (MTL) is a powerful learning paradigm to improve
generalization performance via knowledge sharing. However, existing studies
find that MTL can sometimes hurt generalization, especially when two tasks
are less correlated. One possible cause is spurious
correlation, i.e., some knowledge is spurious and not causally related to the task
labels, but the model may mistakenly use it and thus fail when such
correlation changes. In the MTL setup, spurious correlation poses several unique
challenges. First, the risk of encoding non-causal knowledge is higher,
as the shared MTL model needs to encode all knowledge from different tasks, and
causal knowledge for one task could be spurious for another.
Second, confounders between task labels introduce a different type of
spurious correlation into MTL. We theoretically prove that MTL is more prone
than single-task learning to picking up non-causal knowledge from other tasks, and
thus generalizes worse. To solve this problem, we propose the Multi-Task Causal
Representation Learning framework, which represents multi-task knowledge via
disentangled neural modules and learns which module is causally related to each
task via MTL-specific invariant regularization. Experiments show that it
improves MTL model performance by 5.5% on average on Multi-MNIST, MovieLens,
Taskonomy, CityScape, and NYUv2 by alleviating the spurious correlation problem.
Comment: Published at NeurIPS 202
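The abstract does not give the form of the MTL-specific invariant regularizer. To illustrate the general idea of an invariance penalty, the sketch below swaps in the IRMv1-style penalty from invariant risk minimization, for a squared loss: for each environment (which might be a task or a data split, an assumption here), it squares the gradient of the risk with respect to a dummy scalar multiplier w at w = 1. A nonzero penalty means that rescaling the predictor would lower that environment's risk, a sign that the predictor is not simultaneously optimal across environments. This is not the paper's regularizer, and `irmv1_penalty` and `invariant_objective` are hypothetical names.

```python
import numpy as np


def irmv1_penalty(preds, targets):
    """IRMv1-style penalty for squared loss.

    Risk R(w) = mean((w * f - y) ** 2); its derivative at w = 1 is
    mean(2 * (f - y) * f). The penalty is that derivative squared.
    """
    preds = np.asarray(preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    grad_w = np.mean(2.0 * (preds - targets) * preds)
    return float(grad_w ** 2)


def invariant_objective(envs, lam=1.0):
    """Sum of per-environment risks plus lam times the invariance penalties.

    envs: list of (preds, targets) pairs, one per environment.
    """
    total = 0.0
    for preds, targets in envs:
        preds = np.asarray(preds, dtype=float)
        targets = np.asarray(targets, dtype=float)
        total += float(np.mean((preds - targets) ** 2))  # ordinary empirical risk
        total += lam * irmv1_penalty(preds, targets)     # invariance term
    return total
```

In a real model the penalty would be differentiated with respect to the network parameters by an autodiff framework; the closed-form derivative here only makes the quantity concrete.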
Decoupled Contrastive Learning
Contrastive learning (CL) is one of the most successful paradigms for
self-supervised learning (SSL). In principle, it treats two augmented
"views" of the same image as a positive pair to be pulled closer, and all other images
as negatives to be pushed further apart. However, despite the impressive success
of CL-based techniques, their formulation often relies on computationally heavy
settings, including large batches and long training schedules. We
are thus motivated to tackle these issues and establish a simple, efficient,
yet competitive baseline of contrastive learning. Specifically, we identify,
from theoretical and empirical studies, a noticeable negative-positive-coupling
(NPC) effect in the widely used InfoNCE loss that makes learning efficiency
overly dependent on the batch size. By removing the NPC effect, we propose the
decoupled contrastive learning (DCL) loss, which removes the positive term from
the denominator and significantly improves the learning efficiency. DCL
achieves competitive performance with less sensitivity to sub-optimal
hyperparameters, requiring neither the large batches of SimCLR, the momentum
encoder of MoCo, nor long training schedules. We demonstrate this on various
benchmarks. Notably, SimCLR with DCL achieves 68.2% ImageNet-1K top-1 accuracy using batch
size 256 within 200 epochs of pre-training, outperforming its SimCLR baseline by
6.4%. Further, DCL can be combined with the SOTA contrastive learning method
NNCLR to achieve 72.3% ImageNet-1K top-1 accuracy with a batch size of 512 in 400
epochs, which represents a new SOTA in contrastive learning. We believe DCL
provides a valuable baseline for future contrastive SSL studies.
Comment: Accepted by ECCV202
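To make the decoupling concrete, the sketch below computes both losses for a batch of N image pairs, using anchors from the first view only for brevity. Embeddings are L2-normalized, similarities are divided by a temperature τ, and the negatives for anchor i are the other 2(N - 1) embeddings from both views. InfoNCE keeps the positive term in the denominator; DCL, as the abstract describes, drops it. The function names, the default τ = 0.1, and the one-view simplification are assumptions; the published implementation may symmetrize over views or weight terms differently.

```python
import numpy as np


def _l2_normalize(z):
    """Scale each row to unit length so dot products are cosine similarities."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def contrastive_losses(z1, z2, tau=0.1):
    """Return (InfoNCE, DCL) losses averaged over anchors taken from view 1.

    z1, z2: (N, d) embeddings of two augmented views; row i of z1 and z2 is a positive pair.
    """
    z1, z2 = _l2_normalize(z1), _l2_normalize(z2)
    n = z1.shape[0]
    sim_12 = z1 @ z2.T / tau           # anchor i vs. every view-2 embedding
    sim_11 = z1 @ z1.T / tau           # anchor i vs. every view-1 embedding
    pos = np.diag(sim_12)              # similarity of each positive pair / tau
    off_diag = ~np.eye(n, dtype=bool)  # drops the positive (sim_12) and self (sim_11)
    # Negatives: the other 2(N - 1) embeddings from both views.
    neg = (np.exp(sim_12) * off_diag).sum(axis=1) + (np.exp(sim_11) * off_diag).sum(axis=1)
    infonce = -pos + np.log(np.exp(pos) + neg)  # positive term kept in the denominator
    dcl = -pos + np.log(neg)                     # positive term removed (decoupled)
    return float(infonce.mean()), float(dcl.mean())
```

Because the DCL denominator omits the positive term, the DCL loss is always lower than InfoNCE on the same batch and, unlike InfoNCE, can be negative.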
Empowering Long-tail Item Recommendation through Cross Decoupling Network (CDN)
Industry recommender systems usually suffer from highly skewed long-tail item
distributions where a small fraction of the items receives most of the user
feedback. This skew hurts recommender quality especially for the item slices
without much user feedback. Although academia has produced many research advances,
deploying these methods in production is very difficult, and very
few improvements have been made in industry. One challenge is that these
methods often hurt overall performance; additionally, they could be complex and
expensive to train and serve. In this work, we aim to improve tail item
recommendations while maintaining overall performance at lower training
and serving cost. We first find that the predictions of user preferences are
biased under long-tail distributions. The bias comes from the differences
between training and serving data from two perspectives: 1) the item
distributions, and 2) users' preferences given an item. Most existing methods
mainly attempt to reduce the bias from the item distribution perspective,
ignoring the discrepancy in user preference given an item. This leads to a
severe forgetting issue and results in sub-optimal performance.
To address the problem, we design a novel Cross Decoupling Network (CDN) that (i)
decouples the learning processes of memorization and generalization on the item
side through a mixture-of-experts architecture; and (ii) decouples the user samples
from different distributions through a regularized bilateral branch network.
Finally, a new adapter is introduced to aggregate the decoupled vectors, and
softly shift the training attention to tail items. Extensive experimental
results show that CDN significantly outperforms state-of-the-art approaches on
benchmark datasets. We also demonstrate its effectiveness by a case study of
CDN in a large-scale recommendation system at Google.
Comment: Accepted by KDD 2023 Applied Data Science (ADS) track
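The abstract names two decoupling components without giving their equations. As a loose illustration only, the sketch below shows (a) a generic softmax-gated mixture of experts that combines several linear expert projections of an item's features, and (b) a two-branch combination whose weight shifts from a main branch toward a tail-focused (rebalanced) branch as training proceeds, using the cumulative schedule α = 1 - (t/T)² associated with earlier bilateral-branch networks. The shapes, the function names, and the schedule are assumptions; CDN's actual experts, regularizer, and adapter may differ.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def moe_item_embedding(item_feat, expert_weights, gate_weights):
    """Softmax-gated mixture of linear experts for one item.

    item_feat:      (d_in,)
    expert_weights: (E, d_in, d_out), one linear projection per expert
    gate_weights:   (d_in, E)
    Returns the (d_out,) combined embedding and the (E,) gate distribution.
    """
    gate = softmax(item_feat @ gate_weights)                        # (E,)
    expert_out = np.einsum("i,eio->eo", item_feat, expert_weights)  # (E, d_out)
    return gate @ expert_out, gate


def bilateral_mix(main_vec, tail_vec, epoch, total_epochs):
    """Hypothetical adapter: shift weight from the main branch to the tail-focused branch."""
    alpha = 1.0 - (epoch / total_epochs) ** 2  # 1 at the start of training, 0 at the end
    return alpha * main_vec + (1.0 - alpha) * tail_vec
```

The gate lets different experts specialize (for instance, on memorizing head items versus generalizing to tail items), while the schedule softly moves training attention toward the tail without discarding the main branch early on.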
Juvenile Dermatomyositis: A 20-year Retrospective Analysis of Treatment and Clinical Outcomes
Background: Juvenile dermatomyositis is a rare multisystem autoimmune disease of childhood that primarily involves the skin and muscles and may lead to long-term disability. This study aimed to describe the clinical course of juvenile dermatomyositis and determine whether any early clinical or laboratory features could predict outcome.
Methods: The medical charts of patients aged ≤18 years who were diagnosed with juvenile dermatomyositis (according to the Bohan and Peter criteria) at the Pediatric Department, National Taiwan University Hospital, between 1989 and 2009 were reviewed. The endpoints for disease assessment were complete clinical response and complete clinical remission. A Cox proportional hazards model was fitted to identify important predictors of complete clinical remission.
Results: A total of 39 patients with juvenile dermatomyositis were reviewed. Two-thirds were female, and the mean age at disease onset was 81.97 ± 46.63 months. The most common initial presentations were Gottron's papules (82.1%) and muscle weakness (82.1%). After excluding one patient with an incomplete record, the remaining 31 patients with muscle weakness were analyzed; among them, 22 (70.97%) achieved complete clinical response, but only six (19.4%) achieved complete clinical remission. Multivariate analysis showed that female sex, a negative Gowers' sign at disease onset, and positive photosensitivity at disease onset were favorable factors for achieving complete clinical remission. Moreover, covariate-adjusted survival curves were drawn to predict complete clinical remission. Only 13 (33.33%) patients were symptom-free at the end of follow-up, whereas the other 26 suffered from various complications. None of them developed malignancy, but two (5.13%) patients died during the follow-up period.
Conclusion: Male sex and a positive Gowers' sign at disease onset were unfavorable factors for achieving complete clinical remission in juvenile dermatomyositis. Certain complications cannot be avoided, and thus more effective treatments and monitoring strategies are needed for better control of juvenile dermatomyositis.
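The abstract reports fitting a Cox proportional hazards model but gives no computational detail. For readers unfamiliar with the method, the sketch below fits a single-covariate Cox model by Newton-Raphson on the partial log-likelihood, assuming no tied event times. It is a textbook illustration on a made-up toy dataset, not the study's analysis; the function names are hypothetical, and a real analysis would typically use an established survival package that handles ties, multiple covariates, and standard errors.

```python
import numpy as np


def cox_score_and_info(beta, time, event, x):
    """Score (gradient) and observed information of the Cox partial log-likelihood.

    Assumes a single covariate x and no tied event times.
    """
    score, info = 0.0, 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]          # risk set at the i-th event time
        w = np.exp(beta * x[at_risk])      # relative hazards within the risk set
        mean = np.sum(w * x[at_risk]) / np.sum(w)
        second = np.sum(w * x[at_risk] ** 2) / np.sum(w)
        score += x[i] - mean               # observed covariate minus risk-weighted mean
        info += second - mean ** 2         # risk-weighted variance of the covariate
    return score, info


def fit_cox_1d(time, event, x, max_iter=50, tol=1e-10):
    """Newton-Raphson estimate of the log hazard ratio beta."""
    time, event, x = (np.asarray(a, dtype=float) for a in (time, event, x))
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_and_info(beta, time, event, x)
        if abs(score) < tol:
            break
        beta += score / info               # ascent step on the concave log-likelihood
    return beta
```

With a binary covariate coded as 1 for one group, exp(beta) estimates that group's hazard ratio relative to the other; here the "hazard" would be that of reaching the endpoint (e.g., remission), so a hazard ratio above 1 marks a favorable factor.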
- …