317 research outputs found
Graph Homomorphism Revisited for Graph Matching
In a variety of emerging applications one needs to decide whether a graph G matches another G_p, i.e., whether G has a topological structure similar to that of G_p. The traditional notions of graph homomorphism and isomorphism often fall short of capturing the structural similarity in these applications. This paper studies revisions of these notions, providing a full treatment from complexity to algorithms. (1) We propose p-homomorphism (p-hom) and 1-1 p-hom, which extend graph homomorphism and subgraph isomorphism, respectively, by mapping edges from one graph to paths in another, and by measuring the similarity of nodes. (2) We introduce metrics to measure graph similarity, and several optimization problems for p-hom and 1-1 p-hom. (3) We show that the decision problems for p-hom and 1-1 p-hom are NP-complete even for DAGs, and that the optimization problems are approximation-hard. (4) Nevertheless, we provide approximation algorithms with provable guarantees on match quality. We experimentally verify the effectiveness of the revised notions and the efficiency of our algorithms in Web site matching, using real-life and synthetic data.
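A minimal sketch of what "mapping edges to paths" can mean in practice: the code below only verifies a given candidate mapping, it does not search for one (the decision problem is NP-complete, as the abstract notes). The function names, the `sim` callback, and the `threshold` parameter are assumptions made for illustration and are not taken from the paper.

```python
from collections import deque


def reachable_by_nonempty_path(adj, src, dst):
    """Return True if dst is reachable from src via a path with at least one edge."""
    seen = set()
    queue = deque(adj.get(src, ()))  # start from the successors, so the path is nonempty
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        queue.extend(adj.get(node, ()))
    return False


def is_p_hom(edges1, adj2, sigma, sim, threshold, injective=False):
    """Check a candidate mapping sigma from nodes of G to nodes of G_p.

    edges1:    list of (u, v) edges of G (hypothetical representation)
    adj2:      adjacency dict of G_p, node -> list of successors
    sim:       node similarity function, assumed to return a float
    injective: set to True for the 1-1 variant
    """
    if injective and len(set(sigma.values())) != len(sigma):
        return False
    # Every mapped pair of nodes must be similar enough.
    if any(sim(v, u) < threshold for v, u in sigma.items()):
        return False
    # Every edge of G must map to a nonempty path in G_p.
    return all(
        reachable_by_nonempty_path(adj2, sigma[a], sigma[b]) for a, b in edges1
    )


# Usage: the single edge a -> b maps to the two-edge path x -> y -> z.
g_edges = [("a", "b")]
gp_adj = {"x": ["y"], "y": ["z"], "z": []}
always_similar = lambda v, u: 1.0
print(is_p_hom(g_edges, gp_adj, {"a": "x", "b": "z"}, always_similar, 0.5))
```

A plain homomorphism check would require `z` to be a direct successor of `x`; relaxing that to reachability is what lets structurally similar but not identical graphs match.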
Propagating functional dependencies with conditions
The dependency propagation problem is to determine, given a view defined on data sources and a set of dependencies on the sources, whether another dependency is guaranteed to hold on the view. This paper investigates dependency propagation for recently proposed conditional functional dependencies (CFDs). The need for this study is evident in data integration, exchange and cleaning since dependencies on data sources often only hold conditionally on the view. We investigate dependency propagation for views defined in various fragments of relational algebra, CFDs as view dependencies, and for source dependencies given as either CFDs or traditional functional dependencies (FDs). (a) We establish lower and upper bounds, all matching, ranging from PTIME to undecidable. These not only provide the first results for CFD propagation, but also extend the classical work of FD propagation by giving new complexity bounds in the presence of finite domains. (b) We provide the first algorithm for computing a minimal cover of all CFDs propagated via SPC views; the algorithm has the same complexity as one of the most efficient algorithms for computing a cover of FDs propagated via a projection view, despite the increased expressive power of CFDs and SPC views. (c) We experimentally verify that the algorithm is efficient.
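To illustrate what it means for a dependency to hold only conditionally, here is a small sketch that checks a single CFD against a relation. It does not implement propagation or the cover algorithm from the paper. The row layout, the `"_"` wildcard convention, and the attribute names `CC`, `zip`, and `street` are assumptions chosen for the example.

```python
WILDCARD = "_"  # assumed convention: "_" in the pattern matches any value


def matches(value, pattern_value):
    """A value matches a pattern entry if the entry is a wildcard or an equal constant."""
    return pattern_value == WILDCARD or value == pattern_value


def satisfies_cfd(rows, lhs, rhs, pattern):
    """Check the CFD (lhs -> rhs, pattern) on a list of dict rows.

    Only rows whose lhs values match the pattern are constrained. Among those,
    equal lhs values must imply equal rhs values, and the rhs values must
    themselves match the pattern.
    """
    seen = {}
    for row in rows:
        if not all(matches(row[a], pattern[a]) for a in lhs):
            continue  # the condition does not apply to this row
        if not all(matches(row[a], pattern[a]) for a in rhs):
            return False
        key = tuple(row[a] for a in lhs)
        rhs_values = tuple(row[a] for a in rhs)
        if seen.setdefault(key, rhs_values) != rhs_values:
            return False
    return True


rows = [
    {"CC": "44", "zip": "EH8", "street": "Mayfield"},
    {"CC": "44", "zip": "EH8", "street": "Mayfield"},
    {"CC": "01", "zip": "07974", "street": "Tree Ave"},
    {"CC": "01", "zip": "07974", "street": "Elm St"},
]

# zip determines street only where CC = 44; the unconditional FD fails.
conditional = satisfies_cfd(rows, ["CC", "zip"], ["street"],
                            {"CC": "44", "zip": "_", "street": "_"})
unconditional = satisfies_cfd(rows, ["CC", "zip"], ["street"],
                              {"CC": "_", "zip": "_", "street": "_"})
print(conditional, unconditional)
```

With an all-wildcard pattern the check degenerates to an ordinary FD check, which is why traditional FDs can be treated as a special case of CFDs.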
Water Pipeline Leakage Detection Based on Machine Learning and Wireless Sensor Networks
The detection of water pipeline leakage is important to ensure that water supply networks can operate safely and conserve water resources. To address the lack of intelligence and the low efficiency of conventional leakage detection methods, this paper designs a leakage detection method based on machine learning and wireless sensor networks (WSNs). The system employs wireless sensors installed on pipelines to collect data and utilizes the 4G network to perform remote data transmission. A leakage-triggered networking method is proposed to reduce the wireless sensor network's energy consumption and effectively prolong the system life cycle. To enhance the precision and intelligence of leakage detection, we propose a leakage identification method that employs the intrinsic mode function, approximate entropy, and principal component analysis to construct a signal feature set and that uses a support vector machine (SVM) as a classifier to perform leakage detection. Simulation analysis and experimental results indicate that the proposed leakage identification method can effectively identify water pipeline leakage and that the leakage-triggered networking method has lower energy consumption than the networking methods used in conventional wireless sensor networks.
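Of the features named above, approximate entropy is the easiest to show in isolation. The sketch below follows the standard textbook definition; the defaults `m=2` and `r=0.2`, and the choice to compute it on a raw series rather than on intrinsic mode functions, are assumptions and not the paper's settings.

```python
import math
import random


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy of a numeric sequence (standard definition).

    m is the embedding dimension and r the tolerance; both defaults are
    illustrative assumptions. Lower values indicate a more regular signal.
    """
    n = len(series)
    if n < m + 2:
        raise ValueError("series is too short for the chosen m")

    def phi(dim):
        windows = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for w in windows:
            # Count windows within Chebyshev distance r (self-match included,
            # so the count is never zero and the logarithm is defined).
            count = sum(
                1 for v in windows
                if max(abs(a - b) for a, b in zip(w, v)) <= r
            )
            total += math.log(count / len(windows))
        return total / len(windows)

    return phi(m) - phi(m + 1)


# Usage: a strictly periodic signal versus uniform noise (seeded for repeatability).
regular = [0.0, 1.0] * 100
rng = random.Random(0)
noisy = [rng.random() for _ in range(200)]
print(approximate_entropy(regular), approximate_entropy(noisy))
```

In a pipeline like the one described, a value of this kind would be one entry in the feature vector that is later reduced by principal component analysis and passed to the classifier.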
Robust Sparse Mean Estimation via Incremental Learning
In this paper, we study the problem of robust sparse mean estimation, where
the goal is to estimate a k-sparse mean from a collection of partially
corrupted samples drawn from a heavy-tailed distribution. Existing estimators
face two critical challenges in this setting. First, they are limited by a
conjectured computational-statistical tradeoff, implying that any
computationally efficient algorithm needs samples, while
its statistically-optimal counterpart only requires samples.
Second, the existing estimators fall short of practical use as they scale
poorly with the ambient dimension. This paper presents a simple mean estimator
that overcomes both challenges under moderate conditions: it runs in
near-linear time and memory (both with respect to the ambient dimension) while
requiring only samples to recover the true mean. At the core of
our method lies an incremental learning phenomenon: we introduce a simple
nonconvex framework that can incrementally learn the top-k nonzero elements
of the mean while keeping the zero elements arbitrarily small. Unlike existing
estimators, our method does not need any prior knowledge of the sparsity level
k. We prove the optimality of our estimator by providing a matching
information-theoretic lower bound. Finally, we conduct a series of simulations
to corroborate our theoretical findings. Our code is available at
https://github.com/huihui0902/Robust_mean_estimation
Association between tea drinking and disability levels in older Chinese adults: a longitudinal analysis
Objective: As the global population ages, disability among the elderly presents unprecedented challenges for healthcare systems. However, limited research has examined whether dietary interventions like tea consumption may alleviate and prevent disability in older adults. As an important dietary therapy, the health benefits of tea drinking have gained recognition across research disciplines. Therefore, this study aimed to investigate the association between tea drinking habits and disability levels in the elderly Chinese population. Methods: Leveraging data from the 2008 to 2018 waves of the Chinese Longitudinal Healthy Longevity Survey, we disaggregated tea drinking frequency and activities of daily living (ADL) measures and deployed fixed-effect ordered logit models to examine the tea-disability association for the first time. We statistically adjusted for potential confounders and conducted stratified analyses to assess heterogeneity across subpopulations. Results: Multivariable fixed-effect ordered logistic regression suggested tea drinking has protective effects against ADL disability. However, only daily tea drinking was associated with lower risks of basic activities of daily living (BADL) disability [odds ratio (OR) = 0.61; 95% confidence interval (CI), 0.41–0.92] and lower levels of instrumental activities of daily living (IADL) disability (OR = 0.78; 95% CI, 0.64–0.95). Stratified analyses indicated heterogeneous effects across age and income groups. Daily tea drinking protected against BADL (OR = 0.26 and OR = 0.28) and IADL disability (OR = 0.48 and OR = 0.45) for adults over 83 years old and high-income households, respectively. Conclusion: We found that drinking tea almost daily was protective against disability in elderly people, warranting further research into optimal dosages. Future studies should utilize more rigorous causal inference methods and control for confounders.
Focus Is What You Need For Chinese Grammatical Error Correction
Chinese Grammatical Error Correction (CGEC) aims to automatically detect and
correct grammatical errors contained in Chinese text. For a long time,
researchers have regarded CGEC as a task with a certain degree of uncertainty, that
is, an ungrammatical sentence may often have multiple references. However, we
argue that even though this is a very reasonable hypothesis, it is too harsh
for the intelligence of the mainstream models in this era. In this paper, we
first discover that multiple references do not actually bring positive gains to
model training. On the contrary, it is beneficial to the CGEC model if the
model can pay attention to small but essential data during the training
process. Furthermore, we propose a simple yet effective training strategy
called OneTarget to improve the focus ability of the CGEC models and thus
improve the CGEC performance. Extensive experiments and detailed analyses
demonstrate the correctness of our discovery and the effectiveness of our
proposed method. Comment: Submitted to ICASSP 2023 (currently under review).
Contextual Similarity is More Valuable than Character Similarity: Curriculum Learning for Chinese Spell Checking
The Chinese Spell Checking (CSC) task aims to detect and correct Chinese spelling
errors. In recent years, related research has focused on introducing character
similarity from confusion sets to enhance CSC models, ignoring the context
of characters that contain richer information. To make better use of contextual
similarity, we propose a simple yet effective curriculum learning framework for
the CSC task. With the help of our designed model-agnostic framework, existing
CSC models will be trained from easy to difficult as humans learn Chinese
characters and achieve further performance improvements. Extensive experiments
and detailed analyses on widely used SIGHAN datasets show that our method
outperforms previous state-of-the-art methods.
LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete Information from Lateral Thinking Puzzles
With the continuous evolution and refinement of LLMs, they are endowed with
impressive logical reasoning or vertical thinking capabilities. But can they
think out of the box? Do they possess proficient lateral thinking abilities?
Following the setup of Lateral Thinking Puzzles, we propose a novel evaluation
benchmark, LatEval, which assesses the model's lateral thinking within an
interactive framework. In our benchmark, we challenge LLMs with 2 aspects: the
quality of questions posed by the model and the model's capability to integrate
information for problem-solving. We find that nearly all LLMs struggle with
employing lateral thinking during interactions. For example, even the most
advanced model, GPT-4, exhibits the advantage to some extent, yet still
maintains a noticeable gap when compared to humans. This evaluation benchmark
provides LLMs with a highly challenging and distinctive task that is crucial to
an effective AI assistant. Comment: Work in progress.
Modelling other agents through evolutionary behaviours
Modelling other agents is a challenging topic in artificial intelligence research, particularly when a subject agent needs to optimise its own decisions by predicting their behaviours under uncertainty. Existing research often leads to a monotonic set of behaviours for other agents, so that a subject agent cannot cope with unexpected decisions from the other agents. It requires creative ideas about developing diversity of behaviours so as to improve the subject agent's decision quality. In this paper, we resort to evolutionary computation approaches to generate a new set of behaviours for other agents and solve the complicated agents' behaviour search and evaluation issues. The new approach starts with the initial behaviours that are ascribed to the other agents and expands the behaviours by using a number of genetic operators in the behaviour evolution. This is the first time that evolutionary techniques have been used to model other agents in a general multiagent decision framework. We examine the new methods in two well-studied problem domains and provide experimental results in support.