Counting hypergraph matchings up to uniqueness threshold
We study the problem of approximately counting matchings in hypergraphs of
bounded maximum degree and maximum size of hyperedges. With an activity
parameter λ, each matching M is assigned a weight λ^|M|.
The counting problem is formulated as computing a partition function that gives
the sum of the weights of all matchings in a hypergraph. This problem unifies
two extensively studied statistical physics models in approximate counting: the
hardcore model (graph independent sets) and the monomer-dimer model (graph
matchings).
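On tiny instances, the partition function Z(λ) = Σ_M λ^|M| (summing over all matchings M, including the empty one) can be brute-forced directly; a minimal sketch in Python, with an illustrative 3-uniform hypergraph (exponential time, for sanity checks only):

```python
from itertools import combinations

def is_matching(edges):
    """A set of hyperedges is a matching iff the edges are pairwise vertex-disjoint."""
    seen = set()
    for e in edges:
        if seen & e:
            return False
        seen |= e
    return True

def partition_function(hyperedges, lam):
    """Brute-force Z(lam) = sum over matchings M of lam^|M|."""
    return sum(
        lam ** r
        for r in range(len(hyperedges) + 1)
        for sub in combinations(hyperedges, r)
        if is_matching(sub)
    )

# Tiny 3-uniform hypergraph: 5 matchings in total ({}, three singletons, {e0, e2}).
H = [frozenset({1, 2, 3}), frozenset({3, 4, 5}), frozenset({4, 5, 6})]
print(partition_function(H, 1.0))  # Z(1) counts matchings: 5.0
```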
For this model, the critical activity
λ_c = d^d / (k(d-1)^(d+1)) is the threshold for the uniqueness of Gibbs measures on the infinite
(d+1)-uniform (k+1)-regular hypertree. Consider hypergraphs of maximum
degree at most k+1 and maximum size of hyperedges at most d+1. We show that
when λ < λ_c, there is an FPTAS for computing the partition
function; and when λ = λ_c, there is a PTAS for computing the
log-partition function. These algorithms are based on the decay of correlation
(strong spatial mixing) property of Gibbs distributions. When λ > 2λ_c, there is no PRAS for the partition function or the log-partition
function unless NP = RP.
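Reading the critical activity as λ_c = d^d / (k(d-1)^(d+1)) (an assumed reconstruction; verify the exact form against the paper), a quick numerical sanity check recovers the classic hardcore uniqueness threshold when k = 1:

```python
def lambda_c(d, k):
    """Critical activity for hypergraph matchings on the (d+1)-uniform
    (k+1)-regular hypertree (formula as read here; verify against the paper)."""
    if d == 1:
        # Graph matchings (2-uniform): the monomer-dimer model is in uniqueness
        # at every activity, so the threshold is infinite.
        return float("inf")
    return d ** d / (k * (d - 1) ** (d + 1))

# k = 1 reduces to the hardcore model on a (d+1)-regular tree, whose
# uniqueness threshold is d^d / (d-1)^(d+1); e.g. 4.0 for degree 3 (d = 2).
print(lambda_c(2, 1))  # 4.0
```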
Towards obtaining a sharp transition of computational complexity of
approximate counting, we study the local convergence from a sequence of finite
hypergraphs to the infinite lattice with specified symmetry. We show a
surprising connection between the local convergence and the reversibility of a
natural random walk. This leads us to a barrier for the hardness result: The
non-uniqueness of the infinite Gibbs measure is not realizable by any finite
gadgets.
Code Prediction by Feeding Trees to Transformers
We advance the state-of-the-art in the accuracy of code prediction (next
token prediction) used in autocomplete systems. First, we report that the
recently proposed Transformer architecture, even used out of the box, outperforms
previous neural and non-neural systems for code prediction. We then show that
by making the Transformer architecture aware of the syntactic structure of
code, we further increase the margin by which a Transformer-based system
outperforms previous systems. With this, it outperforms the accuracy of an
RNN-based system (similar to Hellendoorn et al., 2018) by 18.3%, the Deep3
system (Raychev et al., 2016) by 14.1%, and an adaptation of Code2Seq (Alon et
al., 2018) for code prediction by 14.4%.
In the paper, we present several ways of communicating the code structure to
the Transformer, which is fundamentally built for processing sequence data. We
provide a comprehensive experimental evaluation of our proposal, along with
alternative design choices, on a standard Python dataset, as well as on a
Facebook internal Python corpus. Our code and data preparation pipeline will be
available as open source.
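One simple way to expose a program's syntactic structure to a sequence model is to linearize its parse tree into tokens. A hypothetical pre-order encoding with close markers, using Python's ast module (illustrative only, not the paper's exact scheme):

```python
import ast

def ast_to_tokens(source):
    """Pre-order linearization of a Python AST into a flat token sequence.
    Close markers (END_*) keep the tree shape recoverable from the sequence."""
    tokens = []

    def visit(node):
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)
        tokens.append("END_" + type(node).__name__)

    visit(ast.parse(source))
    return tokens

print(ast_to_tokens("x = 1"))  # starts with ['Module', 'Assign', 'Name', 'Store', ...]
```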
Better Context Makes Better Code Language Models: A Case Study on Function Call Argument Completion
Pretrained code language models have enabled great progress towards program
synthesis. However, common approaches only consider in-file local context and
thus miss information and constraints imposed by other parts of the codebase
and its external dependencies. Existing code completion benchmarks also lack
such context. To resolve these restrictions we curate a new dataset of
permissively licensed Python packages that includes full projects and their
dependencies and provide tools to extract non-local information with the help
of program analyzers. We then focus on the task of function call argument
completion which requires predicting the arguments to function calls. We show
that existing code completion models do not yield good results on our
completion task. To better solve this task, we query a program analyzer for
information relevant to a given function call, and consider ways to provide the
analyzer results to different code completion models during inference and
training. Our experiments show that providing access to the function
implementation and function usages greatly improves the argument completion
performance. Our ablation study provides further insights on how different
types of information available from the program analyzer and different ways of
incorporating the information affect the model performance.
Comment: 12 pages. Accepted to AAAI 202
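The task setup can be sketched in a few lines: mask the arguments of each call and keep them as the prediction target. A simplified, hypothetical version (not the paper's dataset tooling; assumes Python 3.9+ for ast.unparse):

```python
import ast

def argument_completion_examples(source):
    """Build (prompt, target) pairs by masking the arguments of each call:
    the prompt ends at the opening parenthesis, the target is the argument text."""
    tree = ast.parse(source)
    examples = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            call_src = ast.unparse(node)
            func_src = ast.unparse(node.func)
            args_src = call_src[len(func_src) + 1 : -1]  # text between the parentheses
            examples.append((func_src + "(", args_src))
    return examples

print(argument_completion_examples("total = sum(values, 0)"))
```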
Large Language Models of Code Fail at Completing Code with Potential Bugs
Large language models of code (Code-LLMs) have recently brought tremendous
advances to code completion, a fundamental feature of programming assistance
and code intelligence. However, most existing works ignore the possible
presence of bugs in the code context used for generation, even though such
bugs are inevitable in software development. Therefore, we introduce and study the buggy-code
completion problem, inspired by the realistic scenario of real-time code
suggestion where the code context contains potential bugs -- anti-patterns that
can become bugs in the completed program. To systematically study the task, we
introduce two datasets: one with synthetic bugs derived from semantics-altering
operator changes (buggy-HumanEval) and one with realistic bugs derived from
user submissions to coding problems (buggy-FixEval). We find that the presence
of potential bugs significantly degrades the generation performance of the
high-performing Code-LLMs. For instance, the passing rates of CodeGen-2B-mono
on test cases of buggy-HumanEval drop more than 50% given a single potential
bug in the context. Finally, we investigate several post-hoc methods for
mitigating the adverse effect of potential bugs and find that there remains a
large gap in post-mitigation performance.
Comment: 25 page
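A semantics-altering operator change of the kind behind the synthetic-bug dataset can be mimicked with an AST rewrite. A hedged sketch (illustrative only; not the paper's exact mutation set):

```python
import ast

class OperatorFlip(ast.NodeTransformer):
    """Inject a synthetic bug by flipping comparison operators (e.g. < to >),
    a semantics-altering change that leaves the code syntactically valid."""
    SWAP = {ast.Lt: ast.Gt, ast.Gt: ast.Lt, ast.LtE: ast.GtE, ast.GtE: ast.LtE}

    def visit_Compare(self, node):
        node.ops = [self.SWAP.get(type(op), type(op))() for op in node.ops]
        return node

src = "def is_small(x):\n    return x < 10"
tree = ast.fix_missing_locations(OperatorFlip().visit(ast.parse(src)))
buggy = ast.unparse(tree)  # requires Python 3.9+
print(buggy)  # the comparison is now 'x > 10'
```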
Structural Realization with GGNNs
To appear in Proceedings of the 15th Workshop on Graph-Based Natural Language Processing (TextGraphs-15), 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
In this paper, we define an abstract task called structural realization that generates words given a prefix of words and a partial representation of a parse tree. We also present a method for solving instances of this task using a Gated Graph Neural Network (GGNN). We evaluate it with standard accuracy measures, as well as with respect to perplexity, in which its comparison to previous work on language modelling serves to quantify the information added to a lexical selection task by the presence of syntactic knowledge. That the addition of parse-tree-internal nodes to this neural model should improve the model, with respect both to accuracy and to more conventional measures such as perplexity, may seem unsurprising, but previous attempts have not met with nearly as much success. We have also learned that transverse links through the parse tree compromise the model's accuracy at generating adjectival and nominal parts of speech.
Effects of Neighborhood Competition and Stand Structure on the Productivity of Pure and Mixed Larix principis-rupprechtii Forests
Understanding the factors influencing tree productivity is central to forest ecology. However, the relative contributions of neighborhood interactions, tree species diversity, and tree size to larch (Larix principis-rupprechtii) productivity require further study. Three plots in the Guandi Mountains, Shanxi Province, were set up for each of the following forest types: natural pure larch forest (PL), mixed larch and birch (Betula platyphylla) forest (LB), and mixed larch and spruce (Picea asperata) forest (LS). Based on the tree size-stratified sampling method, a total of 318 tree core samples were collected. A linear mixed model was used to analyze the effects of tree size, dominance, mixing, and neighborhood competition on larch productivity. Birch and spruce promoted larch growth at the stand and individual tree levels, and birch exhibited a more significant facilitating effect. Intraspecific competition was the main factor affecting larch growth. When the intensity of competition among trees was low, the basal area increment (BAI) of larch in the mixed forests was higher than that in the pure forest. However, with increasing competition, the BAI of larch was lower in the mixed forests than in the pure forest. Factors including tree size, dominance, and mingling were positively correlated with the BAI of larch. With increasing tree size, the BAI of larch was higher in the mixed forests than in the pure forest and higher in LB than in LS. When the dominance was less than 0.5, the BAI of larch was higher in the pure forest than in the mixed forests and higher in LS than in LB. With increasing dominance, the BAI of larch was higher in the mixed forests than in the pure forest. The BAI of larch increased with an increasing mixing degree in the mixed forests, and the increasing trend of BAI was larger in LB than in LS. Larch productivity was influenced mainly by neighborhood interactions and stand structure. 
Improving neighborhood tree diversity and increasing the large-tree proportion and dominance of larch will help improve larch productivity in mixed forests.
Optical-lattice-like waveguide structures in Ti:Sapphire by femtosecond laser inscription for beam splitting
In this work, we report on the fabrication of deeply embedded optical-lattice-like structures in a Ti:Sapphire crystal by applying femtosecond laser inscription (FLI) to implement two-dimensional (2D) one-to-two and three-dimensional (3D) one-to-four beam splitting. Such a family of photonic microstructures is characterized at near-infrared wavelengths both experimentally and numerically, showing excellent capability of simultaneous light confinement and beam tailoring at two orthogonal polarizations. The confocal micro-Raman image of the obtained structure reveals that the optical properties of the substrate have been well preserved in the waveguide's active volumes. Our results pave the way to constructing complex integrated waveguide splitters in Ti:Sapphire crystals by using FLI for photonic applications.
This work is supported by the National Natural Science Foundation of China (Nos. 11404194 and 11404196). The authors acknowledge support from Junta de Castilla y León (Project SA046U16) and MINECO (FIS2015-71933-REDT). The authors would like to thank Prof. Ajoy K. Kar and Dr. Mark D. Mackenzie from Heriot-Watt University for their help with the µ-Raman intensity measurements.