72 research outputs found

    How to collect high quality segmentations: use human or computer drawn object boundaries?

    High quality segmentations must be captured consistently for applications such as biomedical image analysis. While human drawn segmentations are often collected because they provide a consistent level of quality, computer drawn segmentations can be collected efficiently and inexpensively. In this paper, we examine how to leverage available human and computer resources to consistently create high quality segmentations. We propose a quality control methodology. We demonstrate how to apply this approach using crowdsourced and domain expert votes for the "best" segmentation from a collection of human and computer drawn segmentations for 70 objects from a public dataset and 274 objects from biomedical images. We publicly share the library of biomedical images which includes 1,879 manual annotations of the boundaries of 274 objects. We found for the 344 objects that no single segmentation source was preferred and that human annotations are not always preferred over computer annotations. These results motivated us to examine the traditional approach to evaluate segmentation algorithms, which involves comparing the segmentations produced by the algorithms to manual annotations on benchmark datasets. We found that algorithm benchmarking results change when the comparison is made to consensus-voted segmentations. Our results led us to suggest a new segmentation approach that uses machine learning to predict the optimal segmentation source and a modified segmentation evaluation approach. National Science Foundation (IIS-0910908

    Archer: A Human-Labeled Text-to-SQL Dataset with Arithmetic, Commonsense and Hypothetical Reasoning

    We present Archer, a challenging bilingual text-to-SQL dataset specific to complex reasoning, including arithmetic, commonsense and hypothetical reasoning. It contains 1,042 English questions and 1,042 Chinese questions, along with 521 unique SQL queries, covering 20 English databases across 20 domains. Notably, this dataset demonstrates a significantly higher level of complexity compared to existing publicly available datasets. Our evaluation shows that Archer challenges the capabilities of current state-of-the-art models, with a high-ranked model on the Spider leaderboard achieving only 6.73% execution accuracy on the Archer test set. Thus, Archer presents a significant challenge for future research in this field. Comment: EACL 202
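
    The abstract above reports execution accuracy without saying how it is computed. As a rough illustration only, the sketch below uses a common generic definition: a predicted query counts as correct when it runs and returns the same rows as the gold query. The function names, the toy `sales` table, and the order-insensitive comparison are all assumptions for this sketch and are not taken from the Archer evaluation code.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if the predicted query runs and yields the same rows as the gold query."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a predicted query that fails to execute counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    # Assumption: row order is ignored; sorting by repr avoids mixed-type comparisons.
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) query pairs whose execution results match."""
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


# Hypothetical toy database, used only to exercise the functions above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, price REAL, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("pen", 2.0, 3), ("book", 10.0, 1)],
)

pairs = [
    # Same result, different surface form.
    ("SELECT SUM(price * qty) FROM sales", "SELECT SUM(qty * price) FROM sales"),
    # Runs, but returns different rows.
    ("SELECT item FROM sales WHERE price > 5", "SELECT item FROM sales WHERE price > 1"),
    # Fails to execute (unknown column).
    ("SELECT nope FROM sales", "SELECT item FROM sales"),
    # Same result again.
    ("SELECT COUNT(*) FROM sales", "SELECT COUNT(item) FROM sales"),
]

print(execution_accuracy(conn, pairs))
```

    Comparing results rather than query strings credits semantically equivalent queries, which is why the first and last pairs count as matches here.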

    Large language models as reliable knowledge bases?

    The NLP community has recently shown a growing interest in leveraging Large Language Models (LLMs) for knowledge-intensive tasks, viewing LLMs as potential knowledge bases (KBs). However, the reliability and extent to which LLMs can function as KBs remain underexplored. While previous studies suggest LLMs can encode knowledge within their parameters, the amount of parametric knowledge alone is not sufficient to evaluate their effectiveness as KBs. This study defines criteria that a reliable LLM-as-KB should meet, focusing on factuality and consistency, and covering both seen and unseen knowledge. We develop several metrics based on these criteria and use them to evaluate 26 popular LLMs, while providing a comprehensive analysis of the effects of model size, instruction tuning, and in-context learning (ICL). Our results paint a worrying picture. Even a high-performing model like GPT-3.5-turbo is not factual or consistent, and strategies like ICL and fine-tuning are unsuccessful at making LLMs better KBs.

    TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness

    Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications. However, concerns have arisen regarding the trustworthiness of LLM outputs, particularly in closed-book question-answering tasks, where non-experts may struggle to identify inaccuracies due to the absence of contextual or ground truth information. This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge. Additionally, TrustScore can seamlessly integrate with fact-checking methods, which assess alignment with external knowledge sources. The experimental results show that TrustScore achieves strong correlations with human judgments, surpassing existing reference-free metrics, and achieving results on par with reference-based metrics.

    EFFECT OF CONSTANT VOLUME STRUCTURE PARAMETERS ON GRAIN VENTILATION DRYING

    The structural parameters of a grain dryer are directly related to its energy consumption and quality formation. Therefore, based on the Ergun model, air state parameters, and a uniformity evaluation method, this work studies theoretically and experimentally how changes in the ventilation area affect grain airflow resistance, drying energy consumption, drying efficiency, and uniformity under the same initial grain weight and air flux. The results show that, at air temperatures of 35 ℃ and 70 ℃, with a paddy weight of 8.547 kg and an air flux of 12.3 m³·h⁻¹, and with hot air introduced into drying chambers with cross-sectional areas of S1 and S2, enlarging the ventilation area by a factor of 2.328 decreased the grain airflow resistance by factors of 7.17 and 6.99. Enlarging the ventilation area effectively improved the drying rate of paddy, especially at 70 ℃, while the unit energy consumption showed the opposite trend. It also accelerated the movement of the saturated humidity line in the drying layer and improved the drying uniformity of the paddy. These experimental results agree with the theoretical analysis, which provides a reference for the design of grain drying equipment and technology.
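
    For orientation, the Ergun model named in the abstract above is usually written in the following standard textbook form. The symbols are assumptions of this sketch and may differ from the paper's notation: pressure drop ΔP, bed depth L, air viscosity μ, air density ρ, bed porosity ε, equivalent particle diameter d_p, superficial air velocity v, volumetric air flux Q, ventilation area A, and grain bulk volume V.

```latex
\frac{\Delta P}{L}
  = \frac{150\,\mu\,(1-\varepsilon)^{2}}{\varepsilon^{3}\,d_p^{2}}\,v
  + \frac{1.75\,\rho\,(1-\varepsilon)}{\varepsilon^{3}\,d_p}\,v^{2},
\qquad v = \frac{Q}{A}, \qquad L = \frac{V}{A}

% Scaling with ventilation area at fixed Q and V (assumed uniform porosity):
\Delta P_{\text{viscous}} \propto L\,v \propto A^{-2},
\qquad
\Delta P_{\text{inertial}} \propto L\,v^{2} \propto A^{-3}
```

    Under these assumptions, enlarging A by a factor of 2.328 would lower the resistance by a factor between 2.328² ≈ 5.42 (purely viscous) and 2.328³ ≈ 12.6 (purely inertial). The reported reductions of 7.17 and 6.99 fall inside that range, although the paper's own derivation may rest on different assumptions.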

    Comparison of in vitro Neuronal Differentiation Capacity Between Mouse Epiblast Stem Cells Derived From Nuclear Transfer and Naturally Fertilized Embryos

    Somatic cell nuclear transfer (SCNT) can give rise to fertile adults, but the rates of successful perinatal and postnatal development are low, with abnormalities including delayed developmental behaviors and respiratory failure. However, the molecular and cellular mechanisms remain elusive. Mouse epiblast stem cells (mEpiSCs) from E5.5-6.5 epiblasts share defining features with human embryonic stem cells (hESCs), providing a new opportunity to study early mammalian development in vitro. In this study, mEpiSCs were established from naturally fertilized mouse embryos (F-mEpiSCs) and SCNT mouse embryos (NT-mEpiSCs), and the in vitro neuronal differentiation capacity of F-mEpiSCs and NT-mEpiSCs was compared. Morphology analysis showed fewer and smaller neurospheres and a lower percentage of early neurons in NT-mEpiSCs. Immunocytochemical analysis and altered mRNA expression levels of neuronal markers in differentiated cells further confirmed that neurogenesis was slower in NT-mEpiSCs than in F-mEpiSCs. Moreover, neuronal differentiation capacity was correlated with the basal expression levels of Atox1 and Vinculin but not Brachyury and Otx2, emphasizing that developmental aberrations in neurogenesis were associated with the NT technique rather than with random variations between clones. This study provides an important in vitro platform using mEpiSCs to study early epigenetic and developmental processes associated with neurogenesis.

    Depiction of immune heterogeneity of peripheral blood from patients with type II diabetic nephropathy based on mass cytometry

    Diabetic nephropathy (DN) is the most prominent cause of chronic kidney disease and end-stage renal failure. However, the pathophysiology of DN, especially the risk factors for early onset, remains elusive. Increasing evidence has revealed the role of the innate immune system in developing DN, but relatively little is known about early immunological change that proceeds from overt DN. This work aims to investigate the immune-driven pathogenesis of DN using mass cytometry (CyTOF). The peripheral blood mononuclear lymphocytes (PBMC) from 6 patients with early-stage nephropathy and 7 type II diabetes patients without nephropathy were analyzed by CyTOF. A panel of 38 lineage markers was designed to monitor immune protein levels in PBMC. Unsupervised clustering analysis was performed to profile the proportion of individual cells. t-Distributed Stochastic Neighbor Embedding (t-SNE) was used to visualize the differences in DN patients' immune phenotypes. Comprehensive immune profiling revealed substantial immune system alterations in the early onset of DN, including a significant decline of B cells and a marked increase of monocytes. The level of CXCR3 was dramatically reduced in the different immune cellular subsets. The CyTOF data classified the fine-grained differential immune cell subsets in the early stage of DN. We identified several significantly changed T cell, B cell, and monocyte subgroups in early-stage DN associated with several potential biomarkers for developing DN, such as CTLA-4, CXCR3, PD-1, CD39, CCR4, and HLA-DR. Correlation analysis further demonstrated a robust relationship between the above immune cell biomarkers and clinical parameters in the DN patients. We therefore provide a convincing view of the immune-driven early pathogenesis of DN. Our findings showed that patients with DN are more susceptible to immune system disorders. The classification of fine-grained immune cell subsets in the present research might provide novel targets for the immunotherapy of DN.

    A Cu2+ (S = 1/2) Kagomé Antiferromagnet: MgxCu4-x(OH)6Cl2

    Spin-frustrated systems are one avenue for inducing macroscopic quantum states in materials. However, experimental realization of this goal has been difficult because of the lack of simple materials and, if available, the separation of the unusual magnetic properties arising from exotic magnetic states from behavior associated with chemical disorder, such as site mixing. Here we report the synthesis and magnetic properties of a new series of magnetically frustrated materials, MgxCu4-x(OH)6Cl2. Because of the substantially different ligand-field chemistry of Mg2+ and Cu2+, site disorder within the kagomé layers is minimized, as directly measured by X-ray diffraction. Our results reveal that many of the properties of these materials and related systems are not due to disorder of the magnetic lattice but rather reflect an unusual ground state. Comment: Accepted for publication in J. Am. Chem. Soc.

    Opposing effects of final population density and stress on Escherichia coli mutation rate

    Evolution depends on mutations. For an individual genotype, the rate at which mutations arise is known to increase with various stressors (stress-induced mutagenesis, SIM) and decrease at high final population density (density-associated mutation-rate plasticity, DAMP). We hypothesised that these two forms of mutation-rate plasticity would have opposing effects across a nutrient gradient. Here we test this hypothesis, culturing Escherichia coli in increasingly rich media. We distinguish an increase in mutation rate with added nutrients through SIM (dependent on error-prone polymerases Pol IV and Pol V) and an opposing effect of DAMP (dependent on MutT, which removes oxidised G nucleotides). The combination of DAMP and SIM results in a mutation rate minimum at intermediate nutrient levels (which can support 7 × 10  cells ml ). These findings demonstrate a strikingly close and nuanced relationship of ecological factors (stress and population density) with mutation, the fuel of all evolution.