
    Generative Type Inference for Python

    Full text link
    Python is a popular dynamic programming language, evidenced by its ranking as the second most commonly used language on GitHub. However, its dynamic type system can lead to type errors, prompting researchers to explore automatic type inference approaches for Python programs. Rule-based type inference approaches can ensure the accuracy of predicted variable types, but they suffer from low coverage. Supervised type inference approaches, while feature-agnostic, require large, high-quality annotated datasets and are limited to pre-defined types. As zero-shot approaches, cloze-style approaches reformulate type inference as a fill-in-the-blank problem, but their performance is limited. This paper introduces TypeGen, a few-shot generative type inference approach that incorporates domain knowledge from static analysis. TypeGen creates chain-of-thought (COT) prompts by translating the type inference steps of static analysis into prompts based on type dependency graphs (TDGs), enabling language models to learn how static analysis infers types. By combining COT prompts with code slices and type hints, TypeGen constructs example prompts from human annotations. TypeGen requires only a few annotated examples to teach language models to generate similar COT prompts via in-context learning. Moreover, TypeGen enhances the interpretability of results through an input-explanation-output strategy. Experiments show that TypeGen outperforms the best baseline, Type4Py, by 10.0% for argument type prediction and 22.5% for return value type prediction in terms of top-1 Exact Match, using only five examples. Furthermore, TypeGen achieves substantial improvements of 27% to 84% in top-1 Exact Match over the zero-shot performance of large language models with parameter sizes ranging from 1.3B to 175B.
    Comment: This paper has been accepted by ASE'23
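
    To make the prompt-construction idea concrete, here is a minimal sketch of turning a toy type dependency graph into a chain-of-thought prompt. The TDG structure, the wording of the reasoning steps, and the prompt layout are illustrative assumptions, not TypeGen's exact format.

```python
# Minimal sketch of TypeGen-style chain-of-thought prompt construction.
# A toy type dependency graph: each variable maps to the expressions it
# derives its type from, plus a human-written inference step.
tdg = {
    "n":      {"deps": [], "step": "'n' is assigned the literal 0, so it is int."},
    "items":  {"deps": [], "step": "'items' is assigned a list literal of strings, so it is List[str]."},
    "result": {"deps": ["n", "items"],
               "step": "'result' is items[n], an element of List[str], so it is str."},
}

def cot_prompt(target: str) -> str:
    """Linearize the TDG into chain-of-thought steps, visiting
    dependencies before the target (topological order)."""
    steps, seen = [], set()

    def visit(var: str) -> None:
        if var in seen:
            return
        seen.add(var)
        for dep in tdg[var]["deps"]:
            visit(dep)
        steps.append(tdg[var]["step"])

    visit(target)
    numbered = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(steps))
    return f"Infer the type of '{target}'.\n{numbered}\nAnswer: "

# A few such examples, prepended to the query, form the in-context
# demonstration that teaches the model to emit similar reasoning.
print(cot_prompt("result"))
```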

    CARBON: A Counterfactual Reasoning based Framework for Neural Code Comprehension Debiasing

    Full text link
    Previous studies have demonstrated that code intelligence models are sensitive to program transformations, among which identifier renaming is particularly easy to apply and effective. By simply renaming one identifier in the source code, the models can output completely different results. Prior research generally mitigates the problem by generating more training samples. Such an approach is less than ideal, since its effectiveness depends on the quantity and quality of the generated samples. In contrast, we focus on adjusting models to explicitly distinguish the influence of identifier names on the results, called naming bias in this paper, and thereby making the models robust to identifier renaming. Specifically, we formulate the naming bias with a structural causal model (SCM) and propose a counterfactual reasoning based framework named CARBON for eliminating the naming bias in neural code comprehension. CARBON explicitly captures the naming bias through multi-task learning in the training stage and reduces the bias by counterfactual inference in the inference stage. We evaluate CARBON on three neural code comprehension tasks: function naming, defect detection, and code classification. Experimental results show that CARBON achieves better performance (e.g., +0.5% F1 score on the function naming task) than the baseline models on the original benchmark datasets, and significant improvement (e.g., +37.9% F1 score on the function naming task) on the datasets with identifiers renamed. The proposed framework provides a causal view for improving the robustness of code intelligence models.
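
    The counterfactual inference step can be sketched with stand-in logits: subtract the prediction the naming branch would make from identifiers alone. The two branches, the additive fusion, and the weight lam are assumptions for illustration, not CARBON's exact formulation.

```python
# Minimal sketch of counterfactual debiasing in the spirit of CARBON.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Stand-in logits over 3 classes. In a trained system these would come
# from two heads learned jointly (multi-task): one on the full program,
# one on the identifier names alone, which captures the naming bias.
logits_full = np.array([2.0, 0.5, -1.0])   # full input -> prediction
logits_name = np.array([1.5, -0.5, -1.0])  # identifiers only -> bias

# Counterfactual inference: remove the effect the identifier names
# would have on their own. lam is an assumed fusion weight.
lam = 1.0
debiased = logits_full - lam * logits_name

print("biased  :", softmax(logits_full).round(3))
print("debiased:", softmax(debiased).round(3))
```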

    When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection

    Full text link
    Automated code vulnerability detection has gained increasing attention in recent years. Deep learning (DL)-based methods, which implicitly learn vulnerable code patterns, have proven effective in vulnerability detection. The performance of DL-based methods usually relies on the quantity and quality of labeled data. However, current labeled data are generally collected automatically, e.g., by crawling human-generated commits, making it hard to ensure the quality of the labels. Prior studies have demonstrated that non-vulnerable code (i.e., negative labels) tends to be unreliable in commonly used datasets, while vulnerable code (i.e., positive labels) is more dependable. Given the large amount of unlabeled data available in practice, it is worthwhile to explore leveraging the positive data together with the unlabeled data for more accurate vulnerability detection. In this paper, we focus on the Positive and Unlabeled (PU) learning problem for vulnerability detection and propose a novel model named PILOT, i.e., PositIve and unlabeled Learning mOdel for vulnerability deTection. PILOT learns only from positive and unlabeled data for vulnerability detection. It mainly contains two modules: (1) a distance-aware label selection module, which generates pseudo-labels for selected unlabeled data by means of an inter-class distance prototype and progressive fine-tuning; (2) a mixed-supervision representation learning module, which further alleviates the influence of noise and enhances the discrimination of representations.
    Comment: This paper is accepted by ASE 2023
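
    A minimal sketch of distance-aware pseudo-labeling in this spirit, with random embeddings standing in for learned code representations; the mean-vector prototype and the quantile thresholds are assumptions for illustration.

```python
# Sketch: select confident pseudo-labels by distance to the positive prototype.
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(loc=1.0, size=(50, 8))        # embeddings of labeled vulnerable code
unlabeled = rng.normal(loc=0.2, size=(200, 8)) # embeddings of unlabeled code

prototype = pos.mean(axis=0)                   # positive-class distance prototype
dist = np.linalg.norm(unlabeled - prototype, axis=1)

# Keep only confident extremes; leave the ambiguous middle unlabeled.
lo, hi = np.quantile(dist, [0.1, 0.9])
pseudo_pos = unlabeled[dist <= lo]             # near the positive prototype
pseudo_neg = unlabeled[dist >= hi]             # far from it
print(len(pseudo_pos), "pseudo-positive,", len(pseudo_neg), "pseudo-negative")
```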

    On the Feasibility of Specialized Ability Stealing for Large Language Code Models

    Full text link
    Recent progress in large language code models (LLCMs) has led to a dramatic surge in their use in software development. Nevertheless, training a well-performing LLCM requires substantial human effort for data collection and high-quality annotation. Additionally, the training dataset may be proprietary (or only partially open-sourced), and training is often conducted on a large-scale GPU cluster at high cost. Inspired by the recent success of imitation attacks in stealing computer vision and natural language models, this work launches the first imitation attack on LLCMs: by querying a target LLCM with carefully designed queries and collecting the outputs, the adversary can train an imitation model that closely mimics the behavior of the target LLCM. We systematically investigate the effectiveness of launching imitation attacks under different query schemes and different LLCM tasks. We also design novel methods to polish the LLCM outputs, resulting in an effective imitation training process. We summarize our findings and the lessons learned in this study, which can help better characterize the attack surface of LLCMs. Our research contributes to the growing body of knowledge on imitation attacks and defenses in deep neural models, particularly in the domain of code-related tasks.
    Comment: 11 pages
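
    The query-collect-train loop behind such an attack can be sketched as follows. query_target, polish, and the example prompts are hypothetical stand-ins for the victim model's API and the paper's actual pipeline.

```python
# Sketch of an imitation-attack data-collection loop.
def query_target(prompt: str) -> str:
    """Placeholder for a call to the victim code model's API."""
    return "def add(a, b):\n    return a + b"

def polish(output: str) -> str:
    """Placeholder for output cleaning (e.g., dropping non-code text),
    analogous to the polishing step described above."""
    return output.strip()

queries = [
    "Write a Python function that adds two numbers.",
    "Write a Python function that reverses a string.",
]

# Harvest (query, response) pairs as imitation training data.
dataset = [(q, polish(query_target(q))) for q in queries]
for q, r in dataset:
    print(q, "->", r[:40])
# A student model fine-tuned on `dataset` would then imitate the target.
```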

    VRPTEST: Evaluating Visual Referring Prompting in Large Multimodal Models

    Full text link
    With recent advancements in Large Multimodal Models (LMMs) across various domains, a novel prompting method called visual referring prompting has emerged, showing significant potential in enhancing human-computer interaction within multimodal systems. This method offers a more natural and flexible approach to human interaction with these systems compared to traditional text descriptions or coordinates. However, the categorization of visual referring prompting remains undefined, and its impact on the performance of LMMs has yet to be formally examined. In this study, we conduct the first comprehensive analysis of LMMs using a variety of visual referring prompting strategies. We introduce a benchmark dataset called VRPTEST, comprising 3 different visual tasks and 2,275 images, spanning diverse combinations of prompt strategies. Using VRPTEST, we conduct a comprehensive evaluation of eight versions of prominent open-source and proprietary foundation models, including two early versions of GPT-4V. We develop an automated assessment framework based on software metamorphic testing techniques to evaluate the accuracy of LMMs without the need for human intervention or manual labeling. We find that the current proprietary models generally outperform the open-source ones, showing an average accuracy improvement of 22.70%; however, there is still room for improvement. Moreover, our quantitative analysis shows that the choice of prompt strategy significantly affects the accuracy of LMMs, with variations ranging from -17.5% to +7.3%. Further case studies indicate that an appropriate visual referring prompting strategy can improve LMMs' understanding of context and location information, while an unsuitable one might lead to answer rejection. We also provide insights on minimizing the negative impact of visual referring prompting on LMMs.
    Comment: 13 pages
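
    A metamorphic check of this kind can be sketched as follows: the relation tested is that prompt variants which change only the referring style should yield the same answer, so no ground-truth label is needed. ask_lmm and the prompt variants are hypothetical stand-ins, not VRPTEST's actual framework.

```python
# Sketch of a metamorphic consistency check for visual referring prompting.
def ask_lmm(image_path: str, prompt: str) -> str:
    """Placeholder for a multimodal model query."""
    return "a red car"

def consistent(image_path: str, prompt_variants: list[str]) -> bool:
    """The metamorphic relation: all referring styles must agree."""
    answers = {ask_lmm(image_path, p).lower() for p in prompt_variants}
    return len(answers) == 1

variants = [
    "What is inside the red box?",               # box drawn on the image
    "What is the object marked by the arrow?",   # arrow overlay
]
print(consistent("scene.png", variants))
```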

    Effects of SOCS3 on the development of colon cancer via regulation of HIF-1α

    Get PDF
    Purpose: To investigate the influence of suppressor of cytokine signaling 3 (SOCS3) on rats with colon cancer (CC). Methods: Sprague-Dawley (SD) rats were randomly divided into a CC group and a control group, and a CC rat model was constructed. The expression of SOCS3 in CC tissues was determined by quantitative real time-polymerase chain reaction (qRT-PCR). Hematoxylin-eosin (H&E) staining was used to examine colon tissue morphology. Immunohistochemistry (IHC) staining was performed to determine the expression of SOCS3 protein in colon tissues. The contents of HIF-1α, phosphorylated phosphatidylinositol 3-hydroxy kinase (p-PI3K), and phosphorylated protein kinase B (p-AKT) proteins were determined by Western blotting (WB). Results: Compared with the control group, the number of tumors in the CC group was significantly increased (p < 0.05). Protein and messenger ribonucleic acid (mRNA) expressions of SOCS3 were down-regulated in the CC group (p < 0.05), while protein expressions of p-PI3K, p-AKT and HIF-1α were elevated in the CC group (p < 0.05). Conclusion: SOCS3 is expressed at low levels in CC rats and promotes the expression of HIF-1α by activating the PI3K/AKT signaling pathway. Thus, SOCS3 may provide a therapeutic target for the management of colon cancer. Keywords: Colon cancer; Suppressor of cytokine signaling protein 3 (SOCS3); Hypoxia inducible factor-1α (HIF-1α)

    Effects of suppressor of cytokine signaling 3 (SOCS3) on the development of colon cancer via regulation of HIF-1α

    Get PDF
    Purpose: To investigate the influence of suppressor of cytokine signaling 3 (SOCS3) on rats with colon cancer (CC). Methods: Sprague-Dawley (SD) rats were randomly divided into a CC group and a control group, and CC models were constructed. The expression of SOCS3 in CC tissues was determined by quantitative real time-polymerase chain reaction (qRT-PCR). Hematoxylin-eosin (H&E) staining was used to examine colon tissue morphology, while immunohistochemistry (IHC) staining was performed to determine the expression of SOCS3 protein in colon tissues. The content of HIF-1α, phosphorylated phosphatidylinositol 3-hydroxy kinase (p-PI3K), and phosphorylated protein kinase B (p-AKT) proteins was determined by Western blotting (WB). Results: Compared with the control group, the number of tumors in the CC group was significantly increased (p < 0.05). Protein and messenger ribonucleic acid (mRNA) expressions of SOCS3 were down-regulated in the CC group (p < 0.05), while protein expressions of p-PI3K, p-AKT and HIF-1α were significantly elevated in the CC group (p < 0.05). Conclusion: SOCS3 is expressed at low levels in CC rats and promotes the expression of HIF-1α by activating the PI3K/AKT signaling pathway. The findings thus suggest a potential strategy for the management of colon cancer.

    Once is Enough: A Light-Weight Cross-Attention for Fast Sentence Pair Modeling

    Full text link
    Transformer-based models have achieved great success on sentence pair modeling tasks, such as answer selection and natural language inference (NLI). These models generally perform cross-attention over input pairs, leading to prohibitive computational cost. Recent studies propose dual-encoder and late-interaction architectures for faster computation. However, the trade-off between the expressiveness of cross-attention and computational speedup still needs to be better coordinated. To this end, this paper introduces a novel paradigm, MixEncoder, for efficient sentence pair modeling. MixEncoder involves a light-weight cross-attention mechanism. It encodes the query only once while modeling the query-candidate interactions in parallel. Extensive experiments conducted on four tasks demonstrate that MixEncoder can speed up sentence pairing by over 113x while achieving comparable performance to the more expensive cross-attention models.
    Comment: Accepted to EMNLP 2023
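
    The "encode the query once, attend lightly" idea can be sketched with plain NumPy; the single-head attention, the pooling, and the random embeddings are illustrative assumptions rather than MixEncoder's exact architecture.

```python
# Sketch: score many candidates against one cached query encoding in parallel.
import numpy as np

rng = np.random.default_rng(0)
d = 16
query_states = rng.normal(size=(10, d))   # query encoded ONCE (10 tokens)
candidates = rng.normal(size=(32, 6, d))  # 32 pre-encoded candidates (6 tokens each)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Light-weight cross-attention: every candidate attends to the same
# cached query states, so all candidates are scored in one batched pass.
scores = candidates @ query_states.T / np.sqrt(d)             # (32, 6, 10)
mixed = softmax(scores) @ query_states                        # (32, 6, d)
pair_scores = mixed.mean(axis=1) @ query_states.mean(axis=0)  # (32,)
print("best candidate:", int(pair_scores.argmax()))
```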

    Less is More? An Empirical Study on Configuration Issues in Python PyPI Ecosystem

    Full text link
    Python is widely used in the open-source community, largely owing to the extensive support from diverse third-party libraries within the PyPI ecosystem. Nevertheless, the use of third-party libraries can lead to dependency conflicts, prompting researchers to develop dependency conflict detectors. Moreover, efforts have been made to infer dependencies automatically. These approaches focus on version-level checks and inference, based on the assumption that the configurations of libraries in the PyPI ecosystem are correct. However, our study reveals that this assumption is not universally valid, and that relying solely on version-level checks is inadequate for ensuring compatible run-time environments. In this paper, we conduct an empirical study to comprehensively investigate configuration issues in the PyPI ecosystem. Specifically, we propose PyCon, a source-level detector, for detecting potential configuration issues. PyCon employs three distinct checks, targeting the setup, packaging, and usage stages of libraries, respectively. To evaluate the effectiveness of current automatic dependency inference approaches, we build a benchmark called VLibs, comprising library releases that pass all three checks of PyCon. We identify 15 kinds of configuration issues and find that 183,864 library releases suffer from potential configuration issues. Remarkably, 68% of these issues can only be detected via the source-level check. Our experimental results show that the most advanced automatic dependency inference approach, PyEGo, can successfully infer dependencies for only 65% of library releases. The primary failures stem from dependency conflicts and the absence of required libraries in the generated configurations. Based on the empirical results, we derive six findings and draw two implications for open-source developers and future research in automatic dependency inference.
    Comment: This paper has been accepted by ICSE 2024
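
    A usage-stage, source-level check of the kind described can be sketched with the standard ast module: flag imports that do not appear among the declared dependencies. The sample inputs and the assumption that import names match distribution names are illustrative simplifications, not PyCon's actual implementation.

```python
# Sketch: detect undeclared imports via source-level analysis.
import ast

declared = {"requests"}                    # e.g., parsed from setup.cfg
source = "import requests\nimport yaml\n"  # a file from the release

imported = set()
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.Import):
        imported.update(alias.name.split(".")[0] for alias in node.names)
    elif isinstance(node, ast.ImportFrom) and node.module:
        imported.add(node.module.split(".")[0])

# A version-level check would never see this: the metadata may be
# well-formed while the source still imports an undeclared library.
print("undeclared imports:", imported - declared)  # {'yaml'}
```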

    What Makes Good In-context Demonstrations for Code Intelligence Tasks with LLMs?

    Full text link
    Pre-trained models of source code have gained widespread popularity in many code intelligence tasks. Recently, with the scaling of model and corpus sizes, large language models have shown the ability of in-context learning (ICL). ICL employs task instructions and a few examples as demonstrations, which are then input to the language model for making predictions. This new learning paradigm is training-free and has shown impressive performance in various natural language processing and code intelligence tasks. However, the performance of ICL heavily relies on the quality of the demonstrations, e.g., the selected examples. It is therefore important to systematically investigate how to construct good demonstrations for code-related tasks. In this paper, we empirically explore the impact of three key factors on the performance of ICL in code intelligence tasks: the selection, order, and number of demonstration examples. We conduct extensive experiments on three code intelligence tasks: code summarization, bug fixing, and program synthesis. Our experimental results demonstrate that all three factors dramatically impact the performance of ICL in code intelligence tasks. Additionally, we summarize our findings and provide takeaway suggestions on how to construct effective demonstrations along these three dimensions. We also show that a carefully designed demonstration based on our findings can lead to substantial improvements over widely used demonstration construction methods, e.g., improving BLEU-4, EM, and EM by at least 9.90%, 175.96%, and 50.81% on code summarization, bug fixing, and program synthesis, respectively.
    Comment: This paper is accepted by ASE 2023
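
    A minimal sketch of demonstration construction touching the three factors studied (selection, order, number): pick the k most similar examples and place the most similar one last. The bag-of-words similarity and the ordering heuristic are illustrative choices, not the paper's prescription.

```python
# Sketch: similarity-based selection and ordering of ICL demonstrations.
from collections import Counter

pool = [
    ("def add(a,b): return a+b", "Adds two numbers."),
    ("def rev(s): return s[::-1]", "Reverses a string."),
    ("def sq(x): return x*x", "Squares a number."),
]
query = "def mul(a,b): return a*b"

def sim(a: str, b: str) -> float:
    """Token-overlap similarity between two code snippets."""
    ca, cb = Counter(a.split()), Counter(b.split())
    return sum((ca & cb).values())

k = 2  # number of demonstrations
chosen = sorted(pool, key=lambda ex: sim(ex[0], query))[-k:]  # top-k, most similar last
prompt = "".join(f"Code: {c}\nSummary: {s}\n\n" for c, s in chosen)
prompt += f"Code: {query}\nSummary:"
print(prompt)
```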