76 research outputs found

    Code Generation as a Dual Task of Code Summarization

    Code summarization (CS) and code generation (CG) are two crucial tasks in the field of automatic software development. Various neural network-based approaches have been proposed to solve these two tasks separately. However, there exists a specific intuitive correlation between CS and CG, which has not been exploited in previous work. In this paper, we apply the relations between the two tasks to improve the performance of both. Specifically, exploiting the duality between the two tasks, we propose a dual training framework to train them simultaneously. In this framework, we consider the dualities on probability and attention weights, and design corresponding regularization terms to constrain the duality. We evaluate our approach on two datasets collected from GitHub, and experimental results show that our dual framework can improve the performance of CS and CG tasks over baselines. Comment: To appear at the 33rd Conference on Neural Information Processing Systems (NeurIPS) 201
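
The probability duality mentioned above rests on the identity log P(x) + log P(y|x) = log P(x, y) = log P(y) + log P(x|y) for a code snippet x and its summary y. A minimal sketch of a squared-gap regularizer added to the two models' negative log-likelihoods follows. It assumes the marginals log P(x) and log P(y) come from separately trained language models; the names `PairScores`, `duality_gap`, `dual_loss`, and the weight `lam` are hypothetical, plain floats stand in for tensors, and the attention-weight duality term is not shown. It follows the general dual-supervised-learning formulation and may differ from the paper's exact regularizer.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PairScores:
    """Log-probabilities for one (code x, summary y) pair. Hypothetical container."""
    log_p_x: float          # log P(x), assumed to come from a pre-trained code language model
    log_p_y: float          # log P(y), assumed to come from a pre-trained summary language model
    log_p_y_given_x: float  # code summarization model: log P(y | x)
    log_p_x_given_y: float  # code generation model: log P(x | y)


def duality_gap(s: PairScores) -> float:
    # Both factorizations of log P(x, y) should agree:
    # log P(x) + log P(y|x) == log P(y) + log P(x|y)
    return (s.log_p_x + s.log_p_y_given_x) - (s.log_p_y + s.log_p_x_given_y)


def dual_loss(batch: Sequence[PairScores], lam: float = 0.1) -> float:
    """Joint objective: CS NLL + CG NLL + lam * mean squared duality gap."""
    n = len(batch)
    nll_cs = -sum(s.log_p_y_given_x for s in batch) / n
    nll_cg = -sum(s.log_p_x_given_y for s in batch) / n
    reg = sum(duality_gap(s) ** 2 for s in batch) / n
    return nll_cs + nll_cg + lam * reg


consistent = PairScores(log_p_x=-10.0, log_p_y=-8.0, log_p_y_given_x=-5.0, log_p_x_given_y=-7.0)
print(duality_gap(consistent))  # 0.0
print(dual_loss([consistent]))  # 12.0
```

In an actual training loop these quantities would be differentiable tensors, so the regularizer pushes the summarizer and the generator toward probability estimates that are mutually consistent.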

    Generation of a recombinant rabies Flury LEP virus carrying an additional G gene creates an improved seed virus for inactivated vaccine production

    The rabies Flury Low Egg Passage (LEP) virus has been widely used as a seed virus to produce inactivated vaccines. Here, we established a reverse genetic system for LEP and generated a recombinant LEP virus (rLEP-G) that carries two identical G genes. This recombinant virus showed properties similar to those of LEP with respect to in vitro growth, neurotropism index, and virulence in mice. rLEP-G produced 4.3-fold more G protein than LEP did in BHK-21 cells. The inactivated vaccine generated from rLEP-G induced significantly higher virus neutralization titers in mice and dogs than the LEP-derived vaccine did. Our results suggest that rLEP-G is an improved seed virus candidate for inactivated rabies virus vaccine manufacture.

    Data-Juicer: A One-Stop Data Processing System for Large Language Models

    The rapid evolution of Large Language Models (LLMs) has underscored the importance of massive, diverse, and high-quality data. Despite this, existing open-source tools for LLM data processing remain limited and mostly tailored to specific datasets, emphasizing the reproducibility of released data over adaptability and usability, which inhibits potential applications. In response, we propose Data-Juicer, a one-stop, powerful yet flexible and user-friendly LLM data processing system. Our system offers over 50 built-in versatile operators and pluggable tools, which combine modularity, composability, and extensibility to serve diverse LLM data processing needs. By incorporating visualized and automatic evaluation capabilities, Data-Juicer enables a timely feedback loop to accelerate data processing and gain data insights. To enhance usability, Data-Juicer provides out-of-the-box components for users with various backgrounds, along with rich data recipes for LLM pre-training and post-tuning. Further, we employ multi-faceted system optimization and seamlessly integrate Data-Juicer with both LLM and distributed computing ecosystems to enable efficient and scalable data processing. Empirical validation of the generated data recipes reveals considerable improvements in LLaMA performance for various pre-training and post-tuning cases, with up to a 7.45% relative improvement in the averaged score across 16 LLM benchmarks and a 16.25% higher win rate under pair-wise GPT-4 evaluation. The system's efficiency and scalability are also validated, with up to an 88.7% reduction in single-machine processing time, 77.1% and 73.1% less memory and CPU usage, respectively, and a 7.91x processing acceleration when utilizing distributed computing ecosystems.
Our system, data recipes, and multiple tutorial demos are released, calling for broader research centered on LLM data. Comment: Under continuous maintenance and updating. The system, refined data recipes, and demos are at https://github.com/alibaba/data-juice
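
The modularity and composability described above can be illustrated with a toy pipeline in which each operator takes and returns a list of samples, and a "recipe" is an ordered list of operators applied in sequence. This is a hypothetical sketch: the function names, the `text` field, and the list-of-dicts representation are assumptions for illustration only and are not Data-Juicer's actual API or recipe format.

```python
import re
from typing import Callable

# Hypothetical representations: a sample is a dict with a "text" field,
# and an operator maps a list of samples to a new list of samples.
Sample = dict[str, str]
Op = Callable[[list[Sample]], list[Sample]]


def normalize_whitespace(samples: list[Sample]) -> list[Sample]:
    """Mapper-style op: rewrites each sample's text, collapsing runs of whitespace."""
    return [{**s, "text": re.sub(r"\s+", " ", s["text"]).strip()} for s in samples]


def min_length_filter(min_chars: int) -> Op:
    """Filter-style op factory: drops samples shorter than min_chars characters."""
    def op(samples: list[Sample]) -> list[Sample]:
        return [s for s in samples if len(s["text"]) >= min_chars]
    return op


def exact_dedup(samples: list[Sample]) -> list[Sample]:
    """Deduplication-style op: keeps the first occurrence of each exact text."""
    seen: set[str] = set()
    kept: list[Sample] = []
    for s in samples:
        if s["text"] not in seen:
            seen.add(s["text"])
            kept.append(s)
    return kept


def run_recipe(samples: list[Sample], recipe: list[Op]) -> list[Sample]:
    """Applies each operator in order; swapping or reordering ops changes the recipe."""
    for op in recipe:
        samples = op(samples)
    return samples


raw = [{"text": "  hello   world "}, {"text": "hello world"}, {"text": "hi"}]
recipe = [normalize_whitespace, min_length_filter(5), exact_dedup]
print(run_recipe(raw, recipe))  # [{'text': 'hello world'}]
```

Because every operator shares one signature, operators can be added, removed, or reordered without touching the others, which is the property the abstract's "composability" refers to in general terms.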

    Developmental phosphoproteomics identifies the kinase CK2 as a driver of Hedgehog signaling and a therapeutic target in medulloblastoma

    A major limitation of targeted cancer therapy is the rapid emergence of drug resistance, which often arises through mutations at or downstream of the drug target or through intrinsic resistance of subpopulations of tumor cells. Medulloblastoma (MB), the most common pediatric brain tumor, is no exception, and MBs that are driven by sonic hedgehog (SHH) signaling are particularly aggressive and drug-resistant. To find new drug targets and therapeutics for MB that may be less susceptible to common resistance mechanisms, we used a developmental phosphoproteomics approach in murine granule neuron precursors (GNPs), the developmental cell of origin of MB. The protein kinase CK2 emerged as a driver of hundreds of phosphorylation events during the proliferative, MB-like stage of GNP growth, including the phosphorylation of three of the eight proteins commonly amplified in MB. CK2 was critical for the stabilization and activity of the transcription factor GLI2, a late downstream effector in SHH signaling. CK2 inhibitors decreased the viability of primary SHH-type MB patient cells in culture and blocked the growth of murine MB tumors that were resistant to currently available Hh inhibitors, thereby extending the survival of tumor-bearing mice. Because of structural interactions, one CK2 inhibitor (CX-4945) inhibited both wild-type and mutant CK2, indicating that this drug may avoid at least one common mode of acquired resistance. These findings suggest that CK2 inhibitors may be effective for treating patients with MB and show how phosphoproteomics may be used to gain insight into developmental biology and pathology.

    Herd-level risk factors associated with Leptospira Hardjo seroprevalence in Beef/Suckler herds in the Republic of Ireland

    Background: The aim of the present study was to investigate risk factors for herd seropositivity to Leptospira Hardjo in Irish suckler herds. Herds were considered eligible for the study if they were unvaccinated and contained ≥ 9 breeding animals of beef breed which were ≥ 12 months of age. The country was divided into six regions using county boundaries. Herd and individual animal prevalence data were available from the results of a concurrent seroprevalence study. Herds were classified as either "Free from Infection" or "Infected" based on a minimum expected 40% within-herd prevalence.
    Questionnaires were posted to 320 farmers chosen randomly from 6 regions, encompassing 25 counties, of the Republic of Ireland. The questionnaire was designed to obtain information about vaccination; reproductive disease; breeding herd details; the presence of recognized risk factors from previous studies; and husbandry on each farm. Data collected from 128 eligible herds were subjected to statistical analysis.
    Results: Following the use of Pearson's Chi-Square Test, those variables associated with a herd being "infected" with a significance level of P < 0.2 were considered as candidates for multivariable logistic regression modelling. Breeding herd size was found to be a statistically significant risk factor after multivariable logistic regression. The odds of a herd being positive for leptospiral infection were 5.47 times higher (P = 0.032) in herds with 14 to 23 breeding animals compared with herds with ≤ 13 breeding animals, adjusting for Region, and 7.08 times higher (P = 0.033) in herds with 32.6 to 142 breeding animals.
    Conclusions: Breeding herd size was identified as a significant risk factor for leptospiral infection in Irish suckler herds, which was similar to findings of previous studies of leptospirosis in dairy herds.
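
The screening step described in the Results (Pearson's chi-square, keeping variables with P < 0.2 as candidates for the multivariable model) can be sketched for 2×2 exposure-by-infection tables as follows. This is a minimal illustration assuming no continuity correction; the counts and variable names are hypothetical, not the study's data, and the multivariable logistic regression itself is not reproduced. In a logistic model an adjusted odds ratio equals exp(β), so the reported odds ratio of 5.47 corresponds to a coefficient of about ln 5.47 ≈ 1.70.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]],
    rows = exposed / unexposed, columns = infected / free. Assumes no zero margins."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Crude (unadjusted) odds ratio of infection, exposed vs unexposed."""
    return (a * d) / (b * c)


def screen_candidates(tables: dict[str, tuple[int, int, int, int]], alpha: float = 0.2) -> list[str]:
    """Keeps variables whose univariable chi-square P value is below alpha."""
    return [name for name, t in tables.items() if pearson_chi2_2x2(*t)[1] < alpha]


# Hypothetical counts, for illustration only.
tables = {
    "large_breeding_herd": (20, 10, 15, 25),
    "hypothetical_factor_b": (10, 10, 10, 10),
}
print(screen_candidates(tables))  # ['large_breeding_herd']
print(round(math.log(5.47), 2))   # 1.7
```

The liberal P < 0.2 threshold is a common choice at this stage because it avoids discarding variables that only become significant once other factors, such as Region, are adjusted for.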

    Cyclical changes in seroprevalence of leptospirosis in California sea lions: endemic and epidemic disease in one host species?

    Background: Leptospirosis is a zoonotic disease infecting a broad range of mammalian hosts, and is re-emerging globally. California sea lions (Zalophus californianus) have experienced recurrent outbreaks of leptospirosis since 1970, but it is unknown whether the pathogen persists in the sea lion population or is introduced repeatedly from external reservoirs.
    Methods: We analyzed serum samples collected over an 11-year period from 1344 California sea lions that stranded alive on the California coast, using the microscopic agglutination test (MAT) for antibodies to Leptospira interrogans serovar Pomona. We evaluated seroprevalence among yearlings as a measure of incidence in the population, and characterized antibody persistence times based on temporal changes in the distribution of titer scores. We conducted multinomial logistic regression to determine individual risk factors for seropositivity with high and low titers.
    Results: The serosurvey revealed cyclical patterns in seroprevalence to L. interrogans serovar Pomona, with 4-5 year periodicity and peak seroprevalence above 50%. Seroprevalence in yearling sea lions was an accurate index of exposure among all age classes, and indicated ongoing exposure to leptospires in non-outbreak years. Analysis of titer decay rates showed that some individuals probably maintain high titers for more than a year following exposure.
    Conclusion: This study presents results of an unprecedented long-term serosurveillance program in marine mammals. Our results suggest that leptospirosis is endemic in California sea lions, but also causes periodic epidemics of acute disease. The findings call into question the classical dichotomy between maintenance hosts of leptospirosis, which experience chronic but largely asymptomatic infections, and accidental hosts, which suffer acute illness or death as a result of disease spillover from reservoir species.
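
One simple way to check an annual series for a multi-year cycle such as the 4-5 year periodicity reported above is to look for the lag with the highest sample autocorrelation. This is a minimal sketch and not necessarily the authors' method; the yearly values are synthetic and hypothetical, and with only about 11 annual points such an estimate is coarse.

```python
from statistics import fmean


def autocorrelation(series: list[float], lag: int) -> float:
    """Sample autocorrelation at `lag`, using the full-series variance as denominator."""
    m = fmean(series)
    dev = [x - m for x in series]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + lag] for t in range(len(series) - lag))
    return num / denom


def dominant_period(series: list[float], max_lag: int) -> int:
    """Lag (in sampling steps, here years) with the highest autocorrelation in 1..max_lag."""
    return max(range(1, max_lag + 1), key=lambda k: autocorrelation(series, k))


# Synthetic yearly seroprevalence among yearlings (fractions), built with a 4-year cycle.
yearly = [0.1, 0.3, 0.6, 0.2, 0.1, 0.3, 0.6, 0.2, 0.1, 0.3, 0.6]
print(dominant_period(yearly, max_lag=5))  # 4
```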

    Mother-male bond, but not paternity, influences male-infant affiliation in wild crested macaques

    In promiscuous primates, interactions between adult males and infants have rarely been investigated. However, recent evidence suggests that male affiliation towards infants influences several aspects of the infants’ lives. Furthermore, affiliations may be associated with male reproductive strategy. In this study, we examined which social factors influenced male-infant affiliation initiated by either the male or the infant in wild crested macaques (Macaca nigra). We combined behavioral data and genetic paternity analysis from 30 infants living in three wild groups in Tangkoko Reserve, Indonesia. Our results indicate that adult males and infants do not interact at random, but rather form preferential associations. The social factors with the highest influence on infant-initiated interactions were male rank and male association with the infant’s mother. While infants initiated affiliations with males more often in the absence of their mothers, adult males initiated more affiliations with infants when their mothers were present. Furthermore, males initiated affiliations more often when they were in the same group at the time the infant was conceived, when they held a high dominance rank, or when they had a close relationship with the mother. Interestingly, paternity did not affect male-infant affiliation despite being highly skewed in this species. Overall, our results suggest that adult males may associate with an infant to secure future mating with the mother. Infants are more likely to associate with a male to receive better support, suggesting a strategy to increase the chance of infant survival in a primate society with high infant mortality.