123 research outputs found

    MC^2: A Multilingual Corpus of Minority Languages in China

    Large-scale corpora play a vital role in the construction of large language models (LLMs). However, existing LLMs exhibit limited abilities in understanding low-resource languages, including the minority languages in China, due to a lack of training data. To improve the accessibility of these languages, we present MC^2, a Multilingual Corpus of Minority Languages in China, which is the largest open-source corpus so far. It encompasses four underrepresented languages: Tibetan, Uyghur, Kazakh in the Kazakh Arabic script, and Mongolian in the traditional Mongolian script. Notably, two writing systems in MC^2 have long been neglected in previous corpora. Having identified serious contamination in the low-resource language splits of existing multilingual corpora, we propose a quality-centric solution for collecting MC^2, prioritizing quality and accuracy while enhancing representativeness and diversity. Through in-depth analysis, we demonstrate the new research challenges MC^2 brings, such as long-text modeling and the multiplicity of writing systems. We hope MC^2 can help enhance the equity of the underrepresented languages in China and provide a reliable data foundation for further research on low-resource languages. Comment: Work in progress

    SPar: estimating stellar parameters from multi-band photometries with empirical stellar libraries

    Modern large-scale photometric surveys have provided us with multi-band photometries of billions of stars. Determining the stellar atmospheric parameters, such as the effective temperature ($T_{\rm eff}$) and metallicity ([Fe/H]), together with absolute magnitudes ($M_G$), distances ($d$) and reddening values ($E(G_{\rm BP}-G_{\rm RP})$), is fundamental to studying the stellar populations, structure, kinematics and chemistry of the Galaxy. This work constructs an empirical stellar library that maps the stellar parameters to multi-band photometries, built from a dataset with Gaia parallaxes, LAMOST atmospheric parameters, and optical to near-infrared photometry from several photometric surveys. Based on the stellar library, we developed a new algorithm, SPar (Stellar Parameters from multiband photometry), which fits the multi-band stellar photometries to derive the stellar parameters ($T_{\rm eff}$, [Fe/H], $M_G$, $d$ and $E(G_{\rm BP}-G_{\rm RP})$) of individual stars. The algorithm is applied to the multi-band photometric measurements of a sample of stars selected from the SMSS survey that have stellar parameters derived from spectroscopic surveys. The stellar parameters derived from multi-band photometries by our algorithm are in good agreement with those from the spectroscopic surveys. The typical differences between our results and the literature values are 170 K for $T_{\rm eff}$, 0.23 dex for [Fe/H], 0.13 mag for $M_G$ and 0.05 mag for $E(G_{\rm BP}-G_{\rm RP})$. The algorithm proved to be robust and effective and will be applied to the data of future large-scale photometric surveys such as the Mephisto and CSST surveys. Comment: 16 pages, 10 figures, Accepted by The Astronomical Journal on 7/8/202

    Lawyer LLaMA Technical Report

    Large Language Models (LLMs), like LLaMA, have exhibited remarkable performance across various tasks. Nevertheless, when deployed to specific domains such as law or medicine, the models still lack domain-specific knowledge and the ability to leverage that knowledge to resolve domain-related problems. In this paper, we focus on the legal domain and explore how to inject domain knowledge during the continual training stage and how to design proper supervised fine-tuning tasks to help the model tackle practical issues. Moreover, to alleviate the hallucination problem during the model's generation, we add a retrieval module and extract relevant articles before the model answers any queries. Augmented with the extracted evidence, our model can generate more reliable responses. We release our data and model at https://github.com/AndrewZhe/lawyer-llama. Comment: Work in progress

    Nocaviogua A and B: two lipolanthines from root-nodule-associated Nocardia sp.

    Nocaviogua A (1) and B (2), two lipolanthines featuring a non-canonical avionin (Avi)-containing macrocycle and a long acyl chain, were identified from the mutualistic actinomycete Nocardia sp. XZ19_369, which was isolated from the nodules of sea buckthorn collected in Tibet. Their planar structures were elucidated via extensive analyses of 1D and 2D NMR, as well as HRMS data. The absolute configurations were fully elucidated by advanced Marfey’s analysis and GIAO NMR calculations, the first time that the configurations of this family of lipolanthines have been determined. Nocaviogua A (1) exhibited weak cytotoxicity against human chronic uveal melanoma cells (UM92-1), non-small cell lung cancer (NCI-H2170), and breast cancer (MDA-MB-231). Our work provides valuable information on this burgeoning class of lipolanthines for further investigations.

    Trading strategies of institutional investors in a limit order book market

    The study aims to examine the trading strategies of institutional investors in a limit order book market. The study modifies the assumptions of prior studies [1,2] to match actual situations or facilitate calculations. First, the investors’ objective in the study is profit maximization rather than minimization of trading costs. Second, time is continuous rather than discrete. Third, price impact functions are non-linear and take the quadratic form that features increasing prices. Study results indicate that institutional investors adopt the increasing trading strategy if the permanent price impact dominates, whereas they adopt the decreasing trading strategy if the transient price impact dominates. In addition, the average trading strategy is adopted if and only if the permanent and transient price impacts are combined in some fixed proportions.

    Microstructure Evolution and Shear Strength of the Cu/Au80Sn20/Cu Solder Joints with Multiple Reflow Temperatures

    To represent the multiple reflow process during electronic packaging, the influence of different short-time reheating temperatures on the microstructure and shear strength of the Cu/Au80Sn20/Cu solder joints was studied and discussed. The results showed that high-quality Cu/Au80Sn20/Cu solder joints were obtained with 30 °C for 3 min. The joints were mainly composed of the ζ-(Au,Cu)5Sn intermetallic compound (IMC), with an average thickness of 8 μm, between the Cu and the solder matrix, and a (ζ-(Au,Cu)5Sn + δ-(Au,Cu)Sn) eutectic structure in the solder matrix. With an increase in the multiple reflow temperature from 180 °C to 250 °C, the microstructure of the joint interface showed little change, owing to the barrier effect of the formed ζ IMC layer and the limited element diffusion during short-time reheating. The eutectic structures in the solder matrix coarsened and transformed from a lamellar to a bulk morphology. The shear strength of the as-welded joint reached 31.5 MPa. The joint shear strength decreased slightly with reheating temperatures lower than 200 °C, while it decreased significantly (by about 10%) with reheating temperatures above 250 °C compared to the as-welded joint. The shear strength of the joints was determined by the brittle solder matrix: the joint strength decreased with the coarsening of the δ phase in the eutectic structure.