322 research outputs found
ChineseWebText: Large-scale High-quality Chinese Web Text Extracted with Effective Evaluation Model
During the development of large language models (LLMs), the scale and quality
of the pre-training data play a crucial role in shaping LLMs' capabilities. To
accelerate the research of LLMs, several large-scale datasets, such as C4 [1],
Pile [2], RefinedWeb [3] and WanJuan [4], have been released to the public.
However, most of the released corpora focus mainly on English, and there is
still a lack of a complete tool-chain for extracting clean texts from web data.
Furthermore, fine-grained information about the corpus, e.g. the quality of each
text, is missing. To address these challenges, we propose in this paper a new
complete tool-chain, EvalWeb, to extract clean Chinese texts from noisy web data.
First, similar to previous work, manually crafted rules are employed to discard
explicitly noisy texts from the raw crawled web contents. Second, a well-designed
evaluation model is leveraged to assess the remaining relatively clean data,
and each text is assigned a specific quality score. Finally, we can easily
utilize an appropriate threshold to select the high-quality pre-training data
for Chinese. Using our proposed approach, we release ChineseWebText, the largest
and latest large-scale high-quality Chinese web text dataset, which consists of
1.42 TB of data; each text is associated with a quality score, allowing LLM
researchers to choose data according to their desired quality thresholds. We
also release a much cleaner subset of 600 GB of Chinese data with quality
exceeding 90%
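The three stages described in the abstract above (rule-based filtering, quality scoring, threshold selection) can be sketched roughly as follows. This is a minimal illustration, not the EvalWeb implementation: the rules, their default values, and the names `chinese_ratio`, `passes_rules`, and `score_and_select` are all assumptions, and the scoring function is a placeholder for whatever evaluation model is actually used.

```python
from typing import Callable, Iterable, List, Tuple


def chinese_ratio(text: str) -> float:
    """Fraction of characters in the basic CJK Unified Ideographs block."""
    if not text:
        return 0.0
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return cjk / len(text)


def passes_rules(text: str, min_chars: int = 10, min_ratio: float = 0.5) -> bool:
    """Hypothetical hand-crafted rules: drop very short or mostly non-Chinese texts."""
    return len(text) >= min_chars and chinese_ratio(text) >= min_ratio


def score_and_select(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Apply the rules, score the survivors, and keep texts at or above the threshold."""
    kept = []
    for text in texts:
        if not passes_rules(text):
            continue  # stage 1: rule-based filtering
        score = score_fn(text)  # stage 2: quality score (assumed to lie in [0, 1])
        if score >= threshold:  # stage 3: threshold selection
            kept.append((text, score))
    return kept
```

Keeping the score alongside each text, rather than only a keep/drop flag, is what lets a downstream user re-filter the same data at a stricter threshold such as 0.9.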
An Updated Search of Steady TeV γ-Ray Point Sources in Northern Hemisphere Using the Tibet Air Shower Array
Using the data taken from the Tibet II High Density (HD) Array (1997
February-1999 September) and the Tibet-III array (1999 November-2005 November), our
previous northern sky survey for TeV γ-ray point sources has now been
updated with statistics improved by a factor of 2.8. Within the surveyed
declination (Dec) range, no new TeV γ-ray point
sources with sufficiently high significance were identified, while the
well-known Crab Nebula and Mrk421 remain the brightest TeV γ-ray
sources within the field of view of the Tibet air shower array. Based on the
currently available data and at the 90% confidence level (C.L.), the flux upper
limits for different power-law index assumptions are re-derived; they are
improved by a factor of approximately 1.7 compared with our previously reported
limits.
Comment: This paper has been accepted by hepn
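The two factors quoted in this abstract (2.8 in statistics, roughly 1.7 in the limits) are consistent with a simple scaling argument. The sketch below assumes a background-dominated search in which flux upper limits scale as the inverse square root of the exposure; that scaling is an assumption made here for illustration, not a statement of how the authors derived their limits, and the function name is hypothetical.

```python
import math


def expected_limit_improvement(statistics_factor: float) -> float:
    """Improvement factor for an upper limit, assuming limit ∝ 1/sqrt(exposure)."""
    if statistics_factor <= 0:
        raise ValueError("statistics_factor must be positive")
    return math.sqrt(statistics_factor)


print(round(expected_limit_improvement(2.8), 2))  # 1.67
```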
Structural and Functional Diversity of Acidic Scorpion Potassium Channel Toxins
Background: Although the basic scorpion K+ channel toxins (KTxs) are well-known pharmacological tools and potential drug candidates, characterization of the acidic KTxs is still of great significance because of their potential selectivity towards different K+ channel subtypes. Unfortunately, research on the acidic KTxs has been neglected for several years and has progressed slowly. Principal Findings: Here, we describe the identification of nine new acidic KTxs by cDNA cloning and bioinformatic analyses. Seven of these toxins belong to three new α-KTx subfamilies (α-KTx28, α-KTx29, and α-KTx30), and two are new members of the known κ-KTx2 subfamily. ImKTx104, which contains three disulfide bridges and is the first member of the α-KTx28 subfamily, has low sequence homology with other known KTxs, and its NMR structure suggests that ImKTx104 adopts a modified cystine-stabilized α-helix-loop-β-sheet (CS-α/β) fold motif that has no apparent α-helices or β-sheets but is still stabilized by three disulfide bridges. These newly described acidic KTxs exhibit differential pharmacological effects on potassium channels. The acidic scorpion toxin ImKTx104 was the first peptide inhibitor found to affect the KCNQ1 channel, which is insensitive to the basic KTxs and is strongly associated with human cardiac abnormalities. ImKTx104 selectively inhibited the KCNQ1 channel with a Kd of 11.69 μM, but was less effective against the basic KTxs-sensitive potassium channels. In addition to the ImKTx104 toxin, HeTx204 peptide, containing a cystine-stabilized α-helix-loop-helix (CS-α/α) fold scaffold motif
Aggregation-Induced Emission (AIE), Life and Health
Light has profoundly impacted modern medicine and healthcare, with numerous luminescent agents and imaging techniques currently being used to assess health and treat diseases. As an emerging concept in luminescence, aggregation-induced emission (AIE) has shown great potential in biological applications due to its advantages in terms of brightness, biocompatibility, photostability, and positive correlation with concentration. This review provides a comprehensive summary of AIE luminogens applied in imaging of biological structure and dynamic physiological processes, disease diagnosis and treatment, and detection and monitoring of specific analytes, followed by representative works. Discussions on critical issues and perspectives on future directions are also included. This review aims to stimulate the interest of researchers from different fields, including chemistry, biology, materials science, medicine, etc., thus promoting the development of AIE in the fields of life and health
Corrigendum to: The TianQin project: current progress on science and technology
In the originally published version, this manuscript included an error related to indicating the corresponding author within the author list. This has now been corrected online to reflect the fact that author Jun Luo is the corresponding author of the article
Potential of Core-Collapse Supernova Neutrino Detection at JUNO
JUNO is an underground neutrino observatory under construction in Jiangmen, China. It uses 20 kton of liquid scintillator as its target, which enables it to detect supernova burst neutrinos with large statistics from the next Galactic core-collapse supernova (CCSN) and also pre-supernova neutrinos from nearby CCSN progenitors. All flavors of supernova burst neutrinos can be detected by JUNO via several interaction channels, including inverse beta decay, elastic scattering on electrons and protons, interactions on 12C nuclei, etc. This gives JUNO the possibility of reconstructing the energy spectra of supernova burst neutrinos of all flavors. Real-time monitoring systems based on FPGA and DAQ are under development in JUNO, which allow prompt alerts and trigger-less data acquisition of CCSN events. The alert performance of both monitoring systems has been thoroughly studied using simulations. Moreover, once a CCSN is tagged, the system can give fast characterizations, such as directionality and light curve
Detection of the Diffuse Supernova Neutrino Background with JUNO
As an underground multi-purpose neutrino detector with 20 kton of liquid scintillator, the Jiangmen Underground Neutrino Observatory (JUNO) is competitive with and complementary to the water-Cherenkov detectors in the search for the diffuse supernova neutrino background (DSNB). Typical supernova models predict 2-4 events per year within the optimal observation window in the JUNO detector. The dominant background is from the neutral-current (NC) interaction of atmospheric neutrinos with 12C nuclei, which surpasses the DSNB by more than one order of magnitude. We evaluated the systematic uncertainty of the NC background from the spread of a variety of data-driven models and further developed a method to determine the NC background to within 15% with in situ measurements after ten years of running. In addition, the NC-like backgrounds can be effectively suppressed by the intrinsic pulse-shape discrimination (PSD) capabilities of liquid scintillators. In this talk, I will present in detail the improvements in NC background uncertainty evaluation, PSD discriminator development, and finally, the potential of DSNB sensitivity in JUNO
Real-time Monitoring for the Next Core-Collapse Supernova in JUNO
A core-collapse supernova (CCSN) is one of the most energetic astrophysical
events in the Universe. The early and prompt detection of neutrinos before
(pre-SN) and during the SN burst is a unique opportunity to realize the
multi-messenger observation of the CCSN events. In this work, we describe the
monitoring concept and present the sensitivity of the system to the pre-SN and
SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), which is
a 20 kton liquid scintillator detector under construction in South China. The
real-time monitoring system is designed with both the prompt monitors on the
electronic board and online monitors at the data acquisition stage, in order to
ensure both the alert speed and alert coverage of progenitor stars. By assuming
a false alert rate of 1 per year, this monitoring system can be sensitive to
the pre-SN neutrinos up to the distance of about 1.6 (0.9) kpc and SN neutrinos
up to about 370 (360) kpc for a progenitor mass of 30 solar masses for the case
of normal (inverted) mass ordering. The pointing ability of the CCSN is
evaluated by using the accumulated event anisotropy of the inverse beta decay
interactions from pre-SN or SN neutrinos, which, along with the early alert,
can play important roles for the follow-up multi-messenger observations of the
next Galactic or nearby extragalactic CCSN.
Comment: 24 pages, 9 figure
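The role of the assumed false alert rate of 1 per year can be illustrated with a simple counting monitor. The sketch below is not the JUNO monitoring algorithm: it assumes a constant Poisson background, non-overlapping time windows, and hypothetical function names and inputs (`bkg_rate_hz`, `window_s`), and it finds the smallest event count per window that keeps the expected number of background-only alerts at or below the chosen yearly rate.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def poisson_tail(n_min: int, mean: float) -> float:
    """P(N >= n_min) for N ~ Poisson(mean), with mean > 0."""
    if n_min <= 0:
        return 1.0
    # First term P(N = n_min), computed in log space for numerical stability.
    term = math.exp(-mean + n_min * math.log(mean) - math.lgamma(n_min + 1))
    total = 0.0
    k = n_min
    # Sum upward until the terms underflow (capped as a safeguard).
    while term > 1e-300 and k < n_min + 1000:
        total += term
        k += 1
        term *= mean / k
    return min(total, 1.0)


def alert_threshold(bkg_rate_hz: float, window_s: float,
                    max_false_per_year: float = 1.0) -> int:
    """Smallest count per window whose background-only alert rate stays within budget."""
    mean = bkg_rate_hz * window_s          # expected background counts per window
    trials = SECONDS_PER_YEAR / window_s   # non-overlapping windows per year (assumption)
    n = 1
    while trials * poisson_tail(n, mean) > max_false_per_year:
        n += 1
    return n
```

A higher background rate pushes the threshold up, which in turn requires more signal events per window and therefore shortens the distance out to which a source can trigger an alert.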