141 research outputs found

    Crop Phenology Estimation in Rice Fields Using Sentinel-1 GRD SAR Data and Machine Learning-Aided Particle Filtering Approach

    Monitoring crop phenology is essential for managing field disasters, protecting the environment, and making decisions about agricultural productivity. Because of its high timeliness, high resolution, strong penetration, and sensitivity to specific structural elements, synthetic aperture radar (SAR) is a valuable technique for crop phenology estimation. Particle filtering (PF) belongs to the family of dynamical approaches and can predict crop phenology from SAR data in real time. The observation equation is a key factor affecting the accuracy of particle filtering estimation, and it depends on fitting. Compared with common polynomial fitting (POLY), machine learning methods can automatically learn features and handle complex data structures, offering greater flexibility and generalization capability. Therefore, by incorporating two machine learning algorithms, support vector machine regression (SVR) and random forest regression (RFR), we propose two machine learning-aided particle filtering approaches (PF-SVR and PF-RFR) to estimate crop phenology. One year of time-series Sentinel-1 GRD SAR data from 2017 covering rice fields in the Sevilla region of Spain was used to establish the observation and prediction equations, and a second year of data from 2018 was used to validate the prediction accuracy of the PF methods. Four polarization features (VV, VH, VH/VV and the Radar Vegetation Index (RVI)) were used as the observations in modeling. Experimental results reveal that the machine learning-aided methods are superior to the PF-POLY method, and that PF-SVR performs better than both PF-RFR and PF-POLY. The best PF-SVR result yielded a root-mean-square error (RMSE) of 7.79, compared with 7.94 for PF-RFR and 9.1 for PF-POLY. Moreover, the results suggest that the RVI is generally more sensitive to crop phenology than the other features, and that the ranking of the polarization features was consistent across all methods, i.e., RVI > VV > VH > VH/VV. Our findings offer valuable references for real-time crop phenology monitoring with SAR data.
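    The particle filtering idea in the abstract above can be illustrated with a minimal bootstrap particle filter for a scalar state. This is only a sketch under stated assumptions, not the authors' implementation: the function name `particle_filter`, the observation function `h` (standing in for a fitted observation equation such as POLY, SVR, or RFR), the constant-step prediction model, and all noise levels are hypothetical.

```python
import math
import random


def particle_filter(observations, h, n_particles=500, step=1.0,
                    process_sd=0.3, obs_sd=0.5, seed=0):
    """Bootstrap particle filter for a scalar state (e.g. a phenology stage).

    Assumptions (not from the source): the state advances by roughly `step`
    per observation, and `h` maps a state to the expected observation.
    """
    rng = random.Random(seed)
    # Initial guess: particles spread uniformly over [0, 1).
    particles = [rng.uniform(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: move every particle forward with process noise.
        particles = [p + step + rng.gauss(0.0, process_sd) for p in particles]
        # Update: weight each particle by a Gaussian observation likelihood.
        weights = [math.exp(-0.5 * ((z - h(p)) / obs_sd) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # Degenerate case: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: weighted mean of the particles.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Toy usage: a linear observation function and noise-free observations.
true_states = [float(t) for t in range(1, 11)]
toy_estimates = particle_filter([0.5 * s for s in true_states], h=lambda x: 0.5 * x)
```

    Swapping in a different `h` is the only change needed to compare observation equations, which is the comparison the abstract describes.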

    ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation

    With large language models (LLMs) achieving remarkable breakthroughs in natural language processing (NLP), LLM-enhanced recommender systems have received much attention and are being actively explored. In this paper, we focus on adapting and empowering a pure large language model for zero-shot and few-shot recommendation tasks. First and foremost, we identify and formulate the lifelong sequential behavior incomprehension problem for LLMs in recommendation domains, i.e., LLMs fail to extract useful information from a textual context containing a long user behavior sequence, even when the context length is far below the context limit of the LLM. To address this issue and improve the recommendation performance of LLMs, we propose a novel framework, Retrieval-enhanced Large Language models (ReLLa), for recommendation tasks in both zero-shot and few-shot settings. For zero-shot recommendation, we perform semantic user behavior retrieval (SUBR) to improve the data quality of testing samples, which greatly reduces the difficulty for LLMs of extracting the essential knowledge from user behavior sequences. For few-shot recommendation, we further design retrieval-enhanced instruction tuning (ReiT) by adopting SUBR as a data augmentation technique for training samples. Specifically, we develop a mixed training dataset consisting of both the original data samples and their retrieval-enhanced counterparts. We conduct extensive experiments on a real-world public dataset (i.e., MovieLens-1M) to demonstrate the superiority of ReLLa over existing baseline models, as well as its capability for lifelong sequential behavior comprehension. Comment: Under review.
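    One plausible reading of semantic user behavior retrieval is to keep the behaviors most similar to the target item rather than simply the most recent ones. The sketch below illustrates that reading only; the abstract does not specify the mechanism, and the names `cosine` and `retrieve_behaviors`, the toy embedding vectors, and the use of cosine similarity are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_behaviors(history, target_vec, k):
    """Return the names of the k history items most similar to the target.

    `history` is a list of (item_name, embedding) pairs; both the layout and
    the embeddings are hypothetical stand-ins for real item representations.
    """
    ranked = sorted(history, key=lambda item: cosine(item[1], target_vec),
                    reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy usage with 2-D embeddings.
toy_history = [("A", [1.0, 0.0]), ("B", [0.0, 1.0]), ("C", [0.9, 0.1])]
selected = retrieve_behaviors(toy_history, [1.0, 0.0], k=2)
```

    The selected items would then be written into the prompt in place of the full behavior sequence.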

    ChineseWebText: Large-scale High-quality Chinese Web Text Extracted with Effective Evaluation Model

    During the development of large language models (LLMs), the scale and quality of the pre-training data play a crucial role in shaping LLMs' capabilities. To accelerate LLM research, several large-scale datasets, such as C4 [1], Pile [2], RefinedWeb [3] and WanJuan [4], have been released to the public. However, most of the released corpora focus mainly on English, and there is still no complete tool-chain for extracting clean text from web data. Furthermore, fine-grained information about the corpora, e.g. the quality of each text, is missing. To address these challenges, we propose a new complete tool-chain, EvalWeb, to extract clean Chinese text from noisy web data. First, as in previous work, manually crafted rules are employed to discard explicitly noisy texts from the raw crawled web content. Second, a well-designed evaluation model is used to assess the remaining relatively clean data, and each text is assigned a specific quality score. Finally, an appropriate threshold can be applied to select high-quality pre-training data for Chinese. Using our proposed approach, we release the largest and latest large-scale high-quality Chinese web text dataset, ChineseWebText, which comprises 1.42 TB of data in which each text is associated with a quality score, allowing LLM researchers to choose data according to their desired quality thresholds. We also release a much cleaner subset of 600 GB of Chinese data with quality exceeding 90%.
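    The final selection step described above, keeping texts whose quality score clears a chosen threshold, might look like the following. This is a minimal sketch: the record layout with `text` and `score` fields and the function name `select_by_quality` are hypothetical and are not taken from the released dataset.

```python
def select_by_quality(records, threshold):
    """Keep records whose quality score is at least `threshold`.

    Each record is assumed to be a dict with a "text" string and a "score"
    in [0, 1]; this layout is an illustration, not the dataset's schema.
    """
    return [r for r in records if r["score"] >= threshold]


# Toy usage with made-up scores.
sample = [
    {"text": "clean paragraph", "score": 0.95},
    {"text": "noisy fragment", "score": 0.40},
    {"text": "decent paragraph", "score": 0.90},
]
high_quality = select_by_quality(sample, threshold=0.9)
```

    Raising the threshold trades corpus size for cleanliness, which is the choice the per-text scores leave to the user.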

    How Can Recommender Systems Benefit from Large Language Models: A Survey

    Recommender systems (RS) play an important role in matching users' information needs in Internet applications. In natural language processing (NLP), large language models (LLMs) have shown astonishing emergent abilities (e.g., instruction following, reasoning), giving rise to the promising research direction of adapting LLMs to RS to enhance performance and improve user experience. In this paper, we conduct a comprehensive survey of this research direction from an application-oriented view. We first summarize existing research from two orthogonal perspectives: where and how to adapt LLMs to RS. For the "WHERE" question, we discuss the roles that LLMs could play in different stages of the recommendation pipeline, i.e., feature engineering, feature encoder, scoring/ranking function, and pipeline controller. For the "HOW" question, we investigate the training and inference strategies, resulting in two fine-grained taxonomy criteria, i.e., whether or not to tune LLMs, and whether to involve a conventional recommendation model (CRM) for inference. Detailed analysis and general development trajectories are provided for both questions. Then, we highlight key challenges in adapting LLMs to RS from three aspects, i.e., efficiency, effectiveness, and ethics. Finally, we summarize the survey and discuss future prospects. We also actively maintain a GitHub repository for papers and other related resources in this rising direction: https://github.com/CHIANGEL/Awesome-LLM-for-RecSys. Comment: 15 pages; 3 figures; summarization table in appendix.

    Case report of a Li-Fraumeni syndrome-like phenotype with a de novo mutation in CHEK2

    BACKGROUND: Cases of multiple tumors are rarely reported in China. In our study, a 57-year-old female patient had concurrent squamous cell carcinoma, mucoepidermoid carcinoma, brain cancer, bone cancer, and thyroid cancer, a combination that has rarely been reported to date. METHODS: To determine the relationship among these multiple cancers, available DNA samples from the thyroid, lung, and skin tumors and from normal thyroid tissue were sequenced using whole exome sequencing. RESULTS: The notable discrepancies in somatic mutations among the 3 tumor tissues indicated that they arose independently, rather than metastasizing from 1 tumor. A novel deleterious germline mutation (chr22:29091846, G->A, p.H371Y) was identified in CHEK2, a Li–Fraumeni syndrome causal gene. Examining the status of this novel mutation in the patient's healthy siblings revealed its de novo origin. CONCLUSION: Our study reports the first case of a Li–Fraumeni syndrome-like phenotype in Chinese patients and demonstrates the important contribution of de novo mutations to this type of rare disease.

    Detection and analysis of human papillomavirus (HPV) DNA in breast cancer patients by an effective method of HPV capture

    Despite an increase in the number of molecular epidemiological studies conducted in recent years to evaluate the association between human papillomavirus (HPV) and the risk of breast carcinoma, these studies remain inconclusive. Here we aim to detect HPV DNA in various tissues from patients with breast carcinoma using HPV capture combined with massively parallel sequencing (MPS). To validate our method, 15 cervical cancer samples were tested by both PCR and the new method. The results showed 100% consistency between the two methods. DNA from peripheral blood, tumor tissue, adjacent lymph nodes and adjacent normal tissue was collected from seven malignant breast cancer patients, and HPV type 16 (HPV16) was detected in 1/7, 1/7, 1/7 and 1/7 of patients, respectively. Peripheral blood, tumor tissue and adjacent normal tissue were also collected from two patients with benign breast tumors, and HPV16 DNA was detected in 1/2, 2/2 and 2/2 of these samples, respectively. MPS metrics including mapping ratio, coverage, depth and SNVs were provided to characterize HPV in the samples. The average coverage was 69% and 61.2% for malignant and benign samples, respectively. In total, 126 SNVs were identified across all 9 samples. The largest number of SNVs was located in the E2 and E4 genes across all samples. Our study not only provides an efficient method to capture HPV DNA, but also reports the SNVs, coverage, SNV type and depth. These findings provide a further clue to the association between HPV16 and breast cancer.

    miR-486-3p Influences the Neurotoxicity of a-Synuclein by Targeting the SIRT2 Gene and the Polymorphisms at Target Sites Contributing to Parkinson’s Disease

    Background/Aims: Increasing evidence suggests an important role for sirtuin 2 (SIRT2) in the pathology of Parkinson’s disease (PD). However, the association between potential functional polymorphisms in the SIRT2 gene and PD still needs to be identified. Exploring the molecular mechanism underlying this potential association could also provide novel insights into the pathogenesis of this disorder. Methods: Bioinformatics analysis and screening were first performed to find potential microRNAs (miRNAs) that could target the SIRT2 gene, and molecular biology experiments were carried out to further identify the regulation between miRNA and SIRT2 and to characterize the pivotal role of miRNA in PD models. Moreover, a clinical case-control study was performed with 304 PD patients and 312 healthy controls from the Chinese Han population to identify the possible association of single nucleotide polymorphisms (SNPs) within the miRNA binding sites of SIRT2 with the risk of PD. Results: Here, we demonstrate that miR-486-3p binds to the 3’ UTR of SIRT2 and influences the translation of SIRT2. MiR-486-3p mimics can decrease the level of SIRT2 and reduce α-synuclein (α-syn)-induced aggregation and toxicity, which may contribute to the progression of PD. Interestingly, we find that an SNP, rs2241703, may disrupt miR-486-3p binding sites in the 3’ UTR of SIRT2, subsequently influencing the translation of SIRT2. Through the clinical case-control study, we further verify that rs2241703 is associated with PD risk in the Chinese Han population. Conclusion: The present study confirms that the rs2241703 polymorphism in the SIRT2 gene is associated with PD in the Chinese Han population, provides a potential mechanism by which the susceptibility locus determines PD risk, and reveals a potential miRNA target for the treatment and prevention of PD.

    Photoemission Evidence of a Novel Charge Order in Kagome Metal FeGe

    A charge order has been discovered to emerge deep in the antiferromagnetic phase of the kagome metal FeGe. To study its origin, the evolution of the low-lying electronic structure across the charge order phase transition is investigated with angle-resolved photoemission spectroscopy. We find no signatures of nesting between Fermi surface sections or van Hove singularities in the zero-frequency joint density of states, and there are no obvious energy gaps at the Fermi level, which excludes the nesting mechanism for charge order formation in FeGe. However, two obvious changes in the band structure have been detected: one electron-like band around the K point and another around the A point move upward in energy when the charge order forms. These features are well reproduced by our density-functional theory calculations, in which the charge order is primarily driven by magnetic energy saving via large dimerizations of a quarter of the Ge1-sites (in the kagome plane) along the c-axis. Our results provide strong support for this novel charge order formation mechanism in FeGe, in contrast to the conventional nesting mechanism. Comment: 6 pages, 4 figures.

    Principles and methods of scaling geospatial Earth science data

    The properties of geographical phenomena vary with changes in the scale of measurement. The information observed at one scale often cannot be directly used as information at another scale. Scaling addresses these changes in properties in relation to the scale of measurement, and plays an important role in Earth sciences by providing information at the scale of interest, which may be required for a range of applications, and may be useful for inferring geographical patterns and processes. This paper presents a review of geospatial scaling methods for Earth science data. Based on spatial properties, we propose a methodological framework for scaling addressing upscaling, downscaling and side-scaling. This framework combines scale-independent and scale-dependent properties of geographical variables. It allows treatment of the varying spatial heterogeneity of geographical phenomena, combines spatial autocorrelation and heterogeneity, addresses scale-independent and scale-dependent factors, explores changes in information, incorporates geospatial Earth surface processes and uncertainties, and identifies the optimal scale(s) of models. This study shows that the classification of scaling methods according to various heterogeneities has great potential utility as an underpinning conceptual basis for advances in many Earth science research domains. © 2019 Elsevier B.V.