97 research outputs found

    Instruction Mining: When Data Mining Meets Large Language Model Finetuning

    Large language models (LLMs) are initially pretrained for broad capabilities and then finetuned with instruction-following datasets to improve their performance in interacting with humans. Despite advances in finetuning, a standardized guideline for selecting high-quality datasets to optimize this process remains elusive. In this paper, we first propose InstructMining, an innovative method designed for automatically selecting premium instruction-following data for finetuning LLMs. Specifically, InstructMining utilizes natural language indicators as a measure of data quality, applying them to evaluate unseen datasets. During experimentation, we discover that a double descent phenomenon exists in large language model finetuning. Based on this observation, we further leverage BlendSearch to help find the best subset among the entire dataset (i.e., 2,532 out of 100,000). Experiment results show that InstructMining-7B achieves state-of-the-art performance on two of the most popular benchmarks: LLM-as-a-judge and Huggingface OpenLLM leaderboard. Comment: 22 pages, 7 figures

    Short-Term Wind Speed Prediction Using EEMD-LSSVM Model

    A hybrid of Ensemble Empirical Mode Decomposition (EEMD) and Least Square Support Vector Machine (LSSVM) is proposed to improve short-term wind speed forecasting precision. EEMD is first used to decompose the original wind speed time series into a set of subseries. LSSVM models are then established to forecast these subseries. The partial autocorrelation function is adopted to analyze the inner relationships between the historical wind speed series in order to determine the input variables of the LSSVM model for each subseries. Finally, the superposition principle is employed to sum the predicted values of every subseries as the final wind speed prediction. The performance of the hybrid model is evaluated based on six metrics. Compared with LSSVM, Back Propagation Neural Networks (BP), Auto-Regressive Integrated Moving Average (ARIMA), a combination of Empirical Mode Decomposition (EMD) with LSSVM, and hybrid EEMD with ARIMA models, the wind speed forecasting results show that the proposed hybrid model outperforms these models on all six metrics. Furthermore, scatter diagrams of predicted versus actual wind speed and histograms of prediction errors are presented to verify the superiority of the hybrid model in short-term wind speed prediction.

    Recent Development of Nano-Materials Used in DNA Biosensors

    As knowledge of the structure and function of nucleic acid molecules has increased, sequence-specific DNA detection has gained increased importance. DNA biosensors based on nucleic acid hybridization have been actively developed because of their specificity, speed, portability, and low cost. Recently, there has been considerable interest in using nano-materials for DNA biosensors. Because of their high surface-to-volume ratios and excellent biological compatibilities, nano-materials could be used to increase the amount of DNA immobilization; moreover, DNA bound to nano-materials can maintain its biological activity. Alternatively, signal amplification by labeling a targeted analyte with nano-materials has also been reported for DNA biosensors in many papers. This review summarizes the applications of various nano-materials for DNA biosensors during the past five years. We found that nano-materials of small sizes were advantageous as substrates for DNA attachment or as labels for signal amplification, and that the use of two or more types of nano-materials in the biosensors could improve their overall quality and overcome the deficiencies of the individual nano-components. Most current DNA biosensors require the use of polymerase chain reaction (PCR) in their protocols. However, further development of nano-materials with smaller size and/or with improved biological and chemical properties would substantially enhance the accuracy, selectivity and sensitivity of DNA biosensors. Thus, DNA biosensors without PCR amplification may become a reality in the foreseeable future.

    Association of genetic variation with systolic and diastolic blood pressure among African Americans: the Candidate Gene Association Resource study

    The prevalence of hypertension in African Americans (AAs) is higher than in other US groups; yet, few have performed genome-wide association studies (GWASs) in AAs. Among people of European descent, GWASs have identified genetic variants at 13 loci that are associated with blood pressure. It is unknown if these variants confer susceptibility in people of African ancestry. Here, we examined genome-wide and candidate gene associations with systolic blood pressure (SBP) and diastolic blood pressure (DBP) using the Candidate Gene Association Resource (CARe) consortium consisting of 8591 AAs. Genotypes included genome-wide single-nucleotide polymorphism (SNP) data utilizing the Affymetrix 6.0 array with imputation to 2.5 million HapMap SNPs and candidate gene SNP data utilizing a 50K cardiovascular gene-centric array (ITMAT-Broad-CARe [IBC] array). For Affymetrix data, the strongest signal for DBP was rs10474346 (P = 3.6 × 10⁻⁸) located near GPR98 and ARRDC3. For SBP, the strongest signal was rs2258119 in C21orf91 (P = 4.7 × 10⁻⁸). The top IBC association for SBP was rs2012318 (P = 6.4 × 10⁻⁶) near SLC25A42 and for DBP was rs2523586 (P = 1.3 × 10⁻⁶) near HLA-B. None of the top variants replicated in additional AA (n = 11 882) or European-American (n = 69 899) cohorts. We replicated previously reported European-American blood pressure SNPs in our AA samples (SH2B3, P = 0.009; TBX3-TBX5, P = 0.03; and CSK-ULK3, P = 0.0004). These genetic loci represent the best evidence of genetic influences on SBP and DBP in AAs to date. More broadly, this work supports the notion that blood pressure among AAs is a trait with genetic underpinnings but also with significant complexity.

    New genetic loci link adipose and insulin biology to body fat distribution.

    Body fat distribution is a heritable trait and a well-established predictor of adverse metabolic outcomes, independent of overall adiposity. To increase our understanding of the genetic basis of body fat distribution and its molecular links to cardiometabolic traits, here we conduct genome-wide association meta-analyses of traits related to waist and hip circumferences in up to 224,459 individuals. We identify 49 loci (33 new) associated with waist-to-hip ratio adjusted for body mass index (BMI), and an additional 19 loci newly associated with related waist and hip circumference measures (P < 5 × 10⁻⁸). In total, 20 of the 49 waist-to-hip ratio adjusted for BMI loci show significant sexual dimorphism, 19 of which display a stronger effect in women. The identified loci were enriched for genes expressed in adipose tissue and for putative regulatory elements in adipocytes. Pathway analyses implicated adipogenesis, angiogenesis, transcriptional regulation and insulin resistance as processes affecting fat distribution, providing insight into potential pathophysiological mechanisms.

    Genetic associations at 53 loci highlight cell types and biological pathways relevant for kidney function.

    Reduced glomerular filtration rate defines chronic kidney disease and is associated with cardiovascular and all-cause mortality. We conducted a meta-analysis of genome-wide association studies for estimated glomerular filtration rate (eGFR), combining data across 133,413 individuals with replication in up to 42,166 individuals. We identify 24 new and confirm 29 previously identified loci. Of these 53 loci, 19 associate with eGFR among individuals with diabetes. Using bioinformatics, we show that identified genes at eGFR loci are enriched for expression in kidney tissues and in pathways relevant for kidney development and transmembrane transporter activity, kidney structure, and regulation of glucose metabolism. Chromatin state mapping and DNase I hypersensitivity analyses across adult tissues demonstrate preferential mapping of associated variants to regulatory regions in kidney but not extra-renal tissues. These findings suggest that genetic determinants of eGFR are mediated largely through direct effects within the kidney and highlight important cell types and biological pathways.

    Flavones from Elsholtzia stauntonii

    Pages 1332-1334. Structures of two new flavones isolated from Elsholtzia stauntonii Benth (Chinese name "Muxiangru", Berberidaceae) have been determined by spectroscopic methods as 5-hydroxy-7,5'-dimethoxy-6,8-dimethyl-3',4'-methylenedioxyflavone 1 and 5-hydroxy-7-methoxy-8-methyl-3',4'-methylenedioxy-5'-(3-methyl-but-2-enyl)-3',4'-methylenedioxyflavone 2, along with two known flavones, 5-hydroxy-7,4'-dimethoxy-6-methylflavone 3 and 5-hydroxy-6,7-dimethoxyflavone 4.