141 research outputs found
Crop Phenology Estimation in Rice Fields Using Sentinel-1 GRD SAR Data and Machine Learning-Aided Particle Filtering Approach
Monitoring crop phenology is essential for managing field disasters, protecting the environment, and making decisions about agricultural productivity. Because of its high timeliness, high resolution, great penetration, and sensitivity to specific structural elements, synthetic aperture radar (SAR) is a valuable technique for crop phenology estimation. Particle filtering (PF) belongs to the family of dynamical approaches and can predict crop phenology from SAR data in real time. The observation equation is a key factor affecting the accuracy of particle filtering estimation and depends on fitting. Compared to the common polynomial fitting (POLY), machine learning methods can automatically learn features and handle complex data structures, offering greater flexibility and generalization capabilities. Therefore, incorporating two machine learning algorithms, support vector machine regression (SVR) and random forest regression (RFR), we propose two machine learning-aided particle filtering approaches (PF-SVR and PF-RFR) to estimate crop phenology. One year of time-series Sentinel-1 GRD SAR data from 2017, covering rice fields in the Sevilla region of Spain, was used to establish the observation and prediction equations, and a second year of data from 2018 was used to validate the prediction accuracy of the PF methods. Four polarization features (VV, VH, VH/VV and Radar Vegetation Index (RVI)) were exploited as the observations in modeling. Experimental results reveal that the machine learning-aided methods are superior to the PF-POLY method. The PF-SVR exhibited better performance than the PF-RFR and PF-POLY methods. The optimal outcome from PF-SVR yielded a root-mean-square error (RMSE) of 7.79, compared to 7.94 for PF-RFR and 9.1 for PF-POLY.
Moreover, the results suggest that the RVI is generally more sensitive to crop phenology than the other features, and the ranking of the polarization features was consistent across all methods, i.e., RVI>VV>VH>VH/VV. Our findings offer valuable references for real-time crop phenology monitoring with SAR data.
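To make the role of the observation equation concrete, the following is a minimal sketch of a bootstrap particle filter that tracks a one-dimensional phenology stage from a single SAR feature. It is not the authors' implementation. The synthetic linear backscatter response, the constant stage increment `step`, the noise levels, and the function names `fit_observation_model` and `particle_filter` are all assumptions made for illustration. The polynomial fit stands in for the POLY variant; an SVR or RFR regressor could be substituted for `h` in the same place.

```python
import numpy as np

def fit_observation_model(stages, feature, degree=2):
    """Fit a polynomial observation equation feature = h(stage)."""
    return np.poly1d(np.polyfit(stages, feature, degree))

def particle_filter(observations, h, n_particles=500, step=1.0,
                    process_std=0.5, obs_std=0.5, seed=0):
    """Bootstrap particle filter over a scalar phenology stage (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    # Assumed prior: the initial stage lies somewhere in [0, 10].
    particles = rng.uniform(0.0, 10.0, n_particles)
    estimates = []
    for z in observations:
        # Prediction equation: the stage advances by `step` plus process noise.
        particles = particles + step + rng.normal(0.0, process_std, n_particles)
        # Observation equation: weight particles by a Gaussian likelihood of z.
        residual = z - h(particles)
        weights = np.exp(-0.5 * (residual / obs_std) ** 2) + 1e-300
        weights /= weights.sum()
        estimates.append(float(np.sum(weights * particles)))
        # Multinomial resampling to avoid weight degeneracy.
        particles = particles[rng.choice(n_particles, size=n_particles, p=weights)]
    return np.array(estimates)

# Synthetic "training year": an assumed linear feature response in dB.
rng = np.random.default_rng(42)
train_stage = np.linspace(0.0, 40.0, 41)
train_feat = -20.0 + 0.4 * train_stage + rng.normal(0.0, 0.5, train_stage.size)
h = fit_observation_model(train_stage, train_feat)

# Synthetic "validation year": filter the noisy feature series.
true_stage = 5.0 + np.arange(1, 31)
test_obs = -20.0 + 0.4 * true_stage + rng.normal(0.0, 0.5, true_stage.size)
estimates = particle_filter(test_obs, h)
rmse = float(np.sqrt(np.mean((estimates - true_stage) ** 2)))
print(f"RMSE on synthetic data: {rmse:.2f}")
```

The RMSE printed here only describes the synthetic data and is not comparable to the values reported in the abstract.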
ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation
With large language models (LLMs) achieving remarkable breakthroughs in
natural language processing (NLP) domains, LLM-enhanced recommender systems
have received much attention and are being actively explored. In this
paper, we focus on adapting and empowering a pure large language model for
zero-shot and few-shot recommendation tasks. First and foremost, we identify
and formulate the lifelong sequential behavior incomprehension problem for LLMs
in recommendation domains, i.e., LLMs fail to extract useful information from a
textual context of long user behavior sequence, even if the length of context
is far from reaching the context limitation of LLMs. To address such an issue
and improve the recommendation performance of LLMs, we propose a novel
framework, namely Retrieval-enhanced Large Language models (ReLLa) for
recommendation tasks in both zero-shot and few-shot settings. For zero-shot
recommendation, we perform semantic user behavior retrieval (SUBR) to improve
the data quality of testing samples, which greatly reduces the difficulty for
LLMs to extract the essential knowledge from user behavior sequences. As for
few-shot recommendation, we further design retrieval-enhanced instruction
tuning (ReiT) by adopting SUBR as a data augmentation technique for training
samples. Specifically, we develop a mixed training dataset consisting of both
the original data samples and their retrieval-enhanced counterparts. We conduct
extensive experiments on a real-world public dataset (i.e., MovieLens-1M) to
demonstrate the superiority of ReLLa compared with existing baseline models, as
well as its capability for lifelong sequential behavior comprehension. Comment: Under Review
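The retrieval step can be illustrated with a short sketch: instead of keeping the most recent K behaviors, keep the K behaviors whose embeddings are most similar to the target item. This is a hedged illustration, not the ReLLa code. The function name `retrieve_behaviors`, the toy two-dimensional embeddings, the use of cosine similarity, and the choice to return the retrieved items in chronological order are assumptions of this sketch.

```python
import numpy as np

def retrieve_behaviors(history_emb, target_emb, k):
    """Indices of the k history items most cosine-similar to the target,
    returned in chronological order (an assumption of this sketch)."""
    hist = history_emb / np.linalg.norm(history_emb, axis=1, keepdims=True)
    tgt = target_emb / np.linalg.norm(target_emb)
    sims = hist @ tgt
    top = np.argsort(-sims, kind="stable")[:k]
    return np.sort(top)

# Toy user history of five items (oldest first) with hypothetical embeddings.
history = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.9, 0.1],
                    [-1.0, 0.0],
                    [0.8, 0.6]])
target = np.array([1.0, 0.0])

recent = list(range(len(history)))[-2:]                  # most recent K=2 items
retrieved = retrieve_behaviors(history, target, 2).tolist()
print(recent, retrieved)
```

In this toy case the two most recent items include the one least similar to the target, while retrieval selects the two closest items from earlier in the history.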
ChineseWebText: Large-scale High-quality Chinese Web Text Extracted with Effective Evaluation Model
During the development of large language models (LLMs), the scale and quality
of the pre-training data play a crucial role in shaping LLMs' capabilities. To
accelerate the research of LLMs, several large-scale datasets, such as C4 [1],
Pile [2], RefinedWeb [3] and WanJuan [4], have been released to the public.
However, most of the released corpora focus mainly on English, and there is
still a lack of a complete tool-chain for extracting clean texts from web data.
Furthermore, fine-grained information of the corpus, e.g. the quality of each
text, is missing. To address these challenges, we propose in this paper a new
complete tool-chain, EvalWeb, to extract clean Chinese texts from noisy web data.
First, similar to previous work, manually crafted rules are employed to discard
explicit noisy texts from the raw crawled web contents. Second, a well-designed
evaluation model is leveraged to assess the remaining relatively clean data,
and each text is assigned a specific quality score. Finally, we can easily
utilize an appropriate threshold to select the high-quality pre-training data
for Chinese. Using our proposed approach, we release the largest and latest
large-scale high-quality Chinese web text dataset, ChineseWebText, which consists
of 1.42 TB of data, with each text associated with a quality score, allowing LLM
researchers to choose the data according to the desired quality thresholds. We
also release a much cleaner subset of 600 GB Chinese data with the quality
exceeding 90%.
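Because every text carries a quality score, selecting a subset reduces to a threshold filter. The sketch below assumes records shaped as dictionaries with hypothetical `text` and `score` fields and scores in [0, 1]. The actual ChineseWebText file format and field names are not given in the abstract.

```python
def select_by_quality(records, threshold):
    """Keep records whose quality score strictly exceeds the threshold."""
    return [r for r in records if r["score"] > threshold]

# Hypothetical scored texts; real data would be streamed from disk.
records = [
    {"text": "sample A", "score": 0.95},
    {"text": "sample B", "score": 0.42},
    {"text": "sample C", "score": 0.91},
    {"text": "sample D", "score": 0.90},
]

high_quality = select_by_quality(records, 0.9)
print([r["text"] for r in high_quality])
```

Raising or lowering the threshold trades corpus size against cleanliness, which is the choice the released scores leave to the user.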
How Can Recommender Systems Benefit from Large Language Models: A Survey
Recommender systems (RS) play an important role in matching users' information
needs in Internet applications. In natural language processing (NLP) domains,
large language models (LLMs) have shown astonishing emergent abilities (e.g.,
instruction following, reasoning), thus giving rise to the promising research
direction of adapting LLM to RS for performance enhancements and user
experience improvements. In this paper, we conduct a comprehensive survey on
this research direction from an application-oriented view. We first summarize
existing research works from two orthogonal perspectives: where and how to
adapt LLM to RS. For the "WHERE" question, we discuss the roles that LLM could
play in different stages of the recommendation pipeline, i.e., feature
engineering, feature encoder, scoring/ranking function, and pipeline
controller. For the "HOW" question, we investigate the training and inference
strategies, resulting in two fine-grained taxonomy criteria, i.e., whether to
tune LLMs or not, and whether to involve a conventional recommendation model
(CRM) for inference. Detailed analysis and general development trajectories are
provided for both questions, respectively. Then, we highlight key challenges in
adapting LLM to RS from three aspects, i.e., efficiency, effectiveness, and
ethics. Finally, we summarize the survey and discuss the future prospects. We
also actively maintain a GitHub repository for papers and other related
resources in this rising direction:
https://github.com/CHIANGEL/Awesome-LLM-for-RecSys. Comment: 15 pages; 3 figures; summarization table in appendix
Case report of a Li-Fraumeni syndrome-like phenotype with a de novo mutation in CHEK2
BACKGROUND: Cases of multiple tumors are rarely reported in China. In our study, a 57-year-old female patient had concurrent squamous cell carcinoma, mucoepidermoid carcinoma, brain cancer, bone cancer, and thyroid cancer, a combination that has rarely been reported to date. METHODS: To determine the relationship among these multiple cancers, available DNA samples from the thyroid, lung, and skin tumors and from normal thyroid tissue were sequenced using whole exome sequencing. RESULTS: The notable discrepancies in somatic mutations among the 3 tumor tissues indicated that they arose independently, rather than metastasizing from 1 tumor. A novel deleterious germline mutation (chr22:29091846, G->A, p.H371Y) was identified in CHEK2, a Li–Fraumeni syndrome causal gene. Examining the status of this novel mutation in the patient's healthy siblings revealed its de novo origin. CONCLUSION: Our study reports the first case of a Li–Fraumeni syndrome-like phenotype in a Chinese patient and demonstrates the important contribution of de novo mutations to this type of rare disease.
Detection and analysis of human papillomavirus (HPV) DNA in breast cancer patients by an effective method of HPV capture
Despite an increase in the number of molecular epidemiological studies conducted in recent years to evaluate the association between human papillomavirus (HPV) and the risk of breast carcinoma, these studies remain inconclusive. Here we aim to detect HPV DNA in various tissues from patients with breast carcinoma using HPV capture combined with massively parallel sequencing (MPS). To validate our method, 15 cervical cancer samples were tested by both PCR and the new method. The results showed 100% consistency between the two methods. DNA from peripheral blood, tumor tissue, adjacent lymph nodes and adjacent normal tissue was collected from seven malignant breast cancer patients, and HPV type 16 (HPV16) was detected in 1/7, 1/7, 1/7 and 1/7 of patients, respectively. Peripheral blood, tumor tissue and adjacent normal tissue were also collected from two patients with benign breast tumors, and HPV16 DNA was detected in 1/2, 2/2 and 2/2 of these samples, respectively. MPS metrics including mapping ratio, coverage, depth and SNVs were provided to characterize HPV in the samples. The average coverage was 69% and 61.2% for malignant and benign samples, respectively. A total of 126 SNVs were identified across all 9 samples. The largest number of SNVs was located in the E2 and E4 genes across all samples. Our study not only provides an efficient method to capture HPV DNA but also reports the SNVs, coverage, SNV types and depth. These findings provide a further clue to the association between HPV16 and breast cancer.
miR-486-3p Influences the Neurotoxicity of α-Synuclein by Targeting the SIRT2 Gene and the Polymorphisms at Target Sites Contributing to Parkinson’s Disease
Background/Aims: Increasing evidence suggests the important role of sirtuin 2 (SIRT2) in the pathology of Parkinson’s disease (PD). However, the association between potential functional polymorphisms in the SIRT2 gene and PD still needs to be identified. Exploring the molecular mechanism underlying this potential association could also provide novel insights into the pathogenesis of this disorder. Methods: Bioinformatics analysis and screening were first performed to find potential microRNAs (miRNAs) that could target the SIRT2 gene, and molecular biology experiments were carried out to further identify the regulation between miRNA and SIRT2 and characterize the pivotal role of miRNA in PD models. Moreover, a clinical case-control study was performed with 304 PD patients and 312 healthy controls from the Chinese Han population to identify the possible association of single nucleotide polymorphisms (SNPs) within the miRNA binding sites of SIRT2 with the risk of PD. Results: Here, we demonstrate that miR-486-3p binds to the 3’ UTR of SIRT2 and influences the translation of SIRT2. MiR-486-3p mimics can decrease the level of SIRT2 and reduce α-synuclein (α-syn)-induced aggregation and toxicity, which may contribute to the progression of PD. Interestingly, we find that a SNP, rs2241703, may disrupt miR-486-3p binding sites in the 3’ UTR of SIRT2, subsequently influencing the translation of SIRT2. Through the clinical case-control study, we further verify that rs2241703 is associated with PD risk in the Chinese Han population. Conclusion: The present study confirms that the rs2241703 polymorphism in the SIRT2 gene is associated with PD in the Chinese Han population, provides the potential mechanism of the susceptibility locus in determining PD risk and reveals a potential target of miRNA for the treatment and prevention of PD.
Photoemission Evidence of a Novel Charge Order in Kagome Metal FeGe
A charge order has been discovered to emerge deep into the antiferromagnetic
phase of the kagome metal FeGe. To study its origin, the evolution of the
low-lying electronic structure across the charge order phase transition is
investigated with angle-resolved photoemission spectroscopy. We do not find
signatures of nesting between Fermi surface sections or van-Hove singularities
in zero-frequency joint density of states, and there are no obvious energy gaps
at the Fermi level, which exclude the nesting mechanism for the charge order
formation in FeGe. However, two obvious changes in the band structure have been
detected, i.e., one electron-like band around the K point and another one
around the A point move upward in energy position when the charge order forms.
These features can be well reproduced by our density-functional theory
calculations, where the charge order is primarily driven by magnetic energy
saving via large dimerizations of a quarter of Ge1-sites (in the kagome plane)
along the c-axis. Our results provide strong support for this novel charge
order formation mechanism in FeGe, in contrast to the conventional nesting
mechanism. Comment: 6 pages, 4 figures
Principles and methods of scaling geospatial Earth science data
The properties of geographical phenomena vary with changes in the scale of measurement. The information observed at one scale often cannot be directly used as information at another scale. Scaling addresses these changes in properties in relation to the scale of measurement, and plays an important role in Earth sciences by providing information at the scale of interest, which may be required for a range of applications, and may be useful for inferring geographical patterns and processes. This paper presents a review of geospatial scaling methods for Earth science data. Based on spatial properties, we propose a methodological framework for scaling addressing upscaling, downscaling and side-scaling. This framework combines scale-independent and scale-dependent properties of geographical variables. It allows treatment of the varying spatial heterogeneity of geographical phenomena, combines spatial autocorrelation and heterogeneity, addresses scale-independent and scale-dependent factors, explores changes in information, incorporates geospatial Earth surface processes and uncertainties, and identifies the optimal scale(s) of models. This study shows that the classification of scaling methods according to various heterogeneities has great potential utility as an underpinning conceptual basis for advances in many Earth science research domains. © 2019 Elsevier B.V