72 research outputs found
How to collect high quality segmentations: use human or computer drawn object boundaries?
High-quality segmentations must be captured consistently for applications such as biomedical image analysis. While human-drawn segmentations are often collected because they provide a consistent level of quality, computer-drawn segmentations can be collected efficiently and inexpensively. In this paper, we examine how to leverage available human and computer resources to consistently create high-quality segmentations. We propose a quality control methodology. We demonstrate how to apply this approach using crowdsourced and domain expert votes for the "best" segmentation from a collection of human- and computer-drawn segmentations for 70 objects from a public dataset and 274 objects from biomedical images. We publicly share the library of biomedical images, which includes 1,879 manual annotations of the boundaries of 274 objects. We found for the 344 objects that no single segmentation source was preferred and that human annotations are not always preferred over computer annotations. These results motivated us to examine the traditional approach to evaluating segmentation algorithms, which involves comparing the segmentations produced by the algorithms to manual annotations on benchmark datasets. We found that algorithm benchmarking results change when the comparison is made to consensus-voted segmentations. Our results led us to suggest a new segmentation approach that uses machine learning to predict the optimal segmentation source and a modified segmentation evaluation approach. National Science Foundation (IIS-0910908
Archer: A Human-Labeled Text-to-SQL Dataset with Arithmetic, Commonsense and Hypothetical Reasoning
We present Archer, a challenging bilingual text-to-SQL dataset specific to
complex reasoning, including arithmetic, commonsense and hypothetical
reasoning. It contains 1,042 English questions and 1,042 Chinese questions,
along with 521 unique SQL queries, covering 20 English databases across 20
domains. Notably, this dataset demonstrates a significantly higher level of
complexity compared to existing publicly available datasets. Our evaluation
shows that Archer challenges the capabilities of current state-of-the-art
models, with a high-ranked model on the Spider leaderboard achieving only 6.73%
execution accuracy on the Archer test set. Thus, Archer presents a significant
challenge for future research in this field. Comment: EACL 202
Large language models as reliable knowledge bases?
The NLP community has recently shown a growing interest in leveraging Large Language Models (LLMs) for knowledge-intensive tasks, viewing LLMs as potential knowledge bases (KBs). However, the reliability and extent to which LLMs can function as KBs remain underexplored. While previous studies suggest LLMs can encode knowledge within their parameters, the amount of parametric knowledge alone is not sufficient to evaluate their effectiveness as KBs. This study defines criteria that a reliable LLM-as-KB should meet, focusing on factuality and consistency, and covering both seen and unseen knowledge. We develop several metrics based on these criteria and use them to evaluate 26 popular LLMs, while providing a comprehensive analysis of the effects of model size, instruction tuning, and in-context learning (ICL). Our results paint a worrying picture. Even a high-performing model like GPT-3.5-turbo is not factual or consistent, and strategies like ICL and fine-tuning are unsuccessful at making LLMs better KBs.
TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness
Large Language Models (LLMs) have demonstrated impressive capabilities across
various domains, prompting a surge in their practical applications. However,
concerns have arisen regarding the trustworthiness of LLM outputs,
particularly in closed-book question-answering tasks, where non-experts may
struggle to identify inaccuracies due to the absence of contextual or ground
truth information. This paper introduces TrustScore, a framework based on the
concept of Behavioral Consistency, which evaluates whether an LLM's response
aligns with its intrinsic knowledge. Additionally, TrustScore can seamlessly
integrate with fact-checking methods, which assess alignment with external
knowledge sources. The experimental results show that TrustScore achieves
strong correlations with human judgments, surpassing existing reference-free
metrics, and achieving results on par with reference-based metrics.
EFFECT OF CONSTANT VOLUME STRUCTURE PARAMETERS ON GRAIN VENTILATION DRYING
The structural parameters of a grain dryer are directly related to its energy consumption and to grain quality formation. Therefore, based on the Ergun model, air state parameters, and a uniformity evaluation method, this work studies, theoretically and experimentally, how changes in the ventilation area affect grain airflow resistance, drying energy consumption, drying efficiency, and uniformity under the same initial grain weight and air flux. The results show that, at air temperatures of 35 °C and 70 °C, with a paddy weight of 8.547 kg and an air flux of 12.3 m³·h⁻¹, when hot air was introduced into drying chambers with cross-sectional areas S1 and S2 respectively (a 2.328-fold enlargement of the ventilation area), the grain airflow resistance decreased 7.17-fold and 6.99-fold. Enlarging the ventilation area effectively improved the drying rate of paddy, especially at 70 °C, whereas the unit energy consumption showed the opposite trend. It also accelerated the movement of the saturated humidity line through the drying layer and improved the drying uniformity of the paddy. These experimental results agree with the theoretical analysis, which provides a reference for the design of grain drying equipment and technology.
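As a rough plausibility check on the reported resistance reduction (not the authors' derivation), the Ergun equation can be scaled with the ventilation area. The sketch below assumes a fixed volumetric air flow Q, a fixed grain bed volume V (so that bed depth is L = V/A, one reading of "constant volume" in the title), and unchanged porosity, particle diameter, air viscosity, and air density. All symbols, and the labels A_1 and A_2 for the smaller and larger areas, are introduced here for illustration.

```latex
\begin{align}
\frac{\Delta P}{L} &= 150\,\frac{\mu\,(1-\varepsilon)^2}{\varepsilon^3 d_p^2}\,u
  + 1.75\,\frac{\rho\,(1-\varepsilon)}{\varepsilon^3 d_p}\,u^2,
  \qquad u = \frac{Q}{A},\quad L = \frac{V}{A} \\
\Delta P &= \underbrace{150\,\frac{\mu\,(1-\varepsilon)^2}{\varepsilon^3 d_p^2}\,\frac{Q V}{A^2}}_{\text{viscous term}}
  + \underbrace{1.75\,\frac{\rho\,(1-\varepsilon)}{\varepsilon^3 d_p}\,\frac{Q^2 V}{A^3}}_{\text{inertial term}} \\
k &= \frac{A_2}{A_1} = 2.328
  \;\Rightarrow\;
  k^2 \approx 5.42 \;\le\; \frac{\Delta P_1}{\Delta P_2} \;\le\; k^3 \approx 12.62
\end{align}
```

Under these assumptions, the reported reductions of 7.17 and 6.99 times fall between the purely viscous limit and the purely inertial limit, which is what one would expect if both terms contribute to the pressure drop.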
Comparison of in vitro Neuronal Differentiation Capacity Between Mouse Epiblast Stem Cells Derived From Nuclear Transfer and Naturally Fertilized Embryos
Somatic cell nuclear transfer (SCNT) can give rise to fertile adults, but perinatal and postnatal development is often unsuccessful, with problems including delayed developmental behaviors and respiratory failure. However, the molecular and cellular mechanisms remain elusive. Mouse epiblast stem cells (mEpiSCs) from E5.5-6.5 epiblasts share defining features with human embryonic stem cells (hESCs), providing a new opportunity to study early mammalian development in vitro. In this study, mEpiSCs were established from naturally fertilized mouse embryos (F-mEpiSCs) and SCNT mouse embryos (NT-mEpiSCs), and the in vitro neuronal differentiation capacity of F-mEpiSCs and NT-mEpiSCs was compared. Morphological analysis showed fewer and smaller neurospheres and a lower percentage of early neurons generated in NT-mEpiSCs. Immunocytochemical analysis and altered mRNA expression levels of neuronal markers in differentiated cells further confirmed that neurogenesis was slower in NT-mEpiSCs than in F-mEpiSCs. Moreover, neuronal differentiation capacity was correlated with the basal expression levels of Atox1 and Vinculin but not Brachyury and Otx2, emphasizing that developmental aberrations in neurogenesis were associated with the NT technique rather than with random variations between clones. This study provides an important in vitro platform using mEpiSCs to study early epigenetic and developmental processes associated with neurogenesis.
Depiction of immune heterogeneity of peripheral blood from patients with type II diabetic nephropathy based on mass cytometry
Diabetic nephropathy (DN) is the most prominent cause of chronic kidney disease and end-stage renal failure. However, the pathophysiology of DN, especially the risk factors for early onset, remains elusive. Increasing evidence has revealed the role of the innate immune system in developing DN, but relatively little is known about the early immunological changes that precede overt DN. This work aims to investigate the immune-driven pathogenesis of DN using mass cytometry (CyTOF). Peripheral blood mononuclear cells (PBMCs) from 6 patients with early-stage nephropathy and 7 type II diabetes patients without nephropathy were analyzed by CyTOF. A panel containing 38 lineage markers was designed to monitor immune protein levels in PBMCs. Unsupervised clustering analysis was performed to profile the proportions of individual cell populations. t-Distributed Stochastic Neighbor Embedding (t-SNE) was used to visualize the differences in DN patients' immune phenotypes. Comprehensive immune profiling revealed substantial immune system alterations in the early onset of DN, including a significant decline in B cells and a marked increase in monocytes. The level of CXCR3 was dramatically reduced in the different immune cellular subsets. The CyTOF data classified the fine-grained differential immune cell subsets in the early stage of DN. We identified several significantly changed T cell, B cell, and monocyte subgroups in early-stage DN, associated with several potential biomarkers for developing DN, such as CTLA-4, CXCR3, PD-1, CD39, CCR4, and HLA-DR. Correlation analysis further demonstrated a robust relationship between the above immune cell biomarkers and clinical parameters in the DN patients. We therefore provide a convincing view of the immune-driven early pathogenesis of DN. Our findings indicate that patients with DN are more susceptible to immune system disorders. The classification of fine-grained immune cell subsets in the present research might provide novel targets for the immunotherapy of DN.
A Cu2+ (S = 1/2) Kagomé Antiferromagnet: MgxCu4-x(OH)6Cl2
Spin-frustrated systems are one avenue for inducing macroscopic quantum
states in materials. However, experimental realization of this goal has been
difficult because of the lack of simple materials and, if available, the
separation of the unusual magnetic properties arising from exotic magnetic
states from behavior associated with chemical disorder, such as site mixing.
Here we report the synthesis and magnetic properties of a new series of
magnetically frustrated materials, MgxCu4-x(OH)6Cl2. Because of the
substantially different ligand-field chemistry of Mg2+ and Cu2+, site disorder
within the kagomé layers is minimized, as directly measured by X-ray
diffraction. Our results reveal that many of the properties of these materials
and related systems are not due to disorder of the magnetic lattice but rather
reflect an unusual ground state. Comment: Accepted for publication in J. Am. Chem. Soc
Opposing effects of final population density and stress on Escherichia coli mutation rate
Evolution depends on mutations. For an individual genotype, the rate at which mutations arise is known to increase with various stressors (stress-induced mutagenesis, SIM) and decrease at high final population density (density-associated mutation-rate plasticity, DAMP). We hypothesised that these two forms of mutation-rate plasticity would have opposing effects across a nutrient gradient. Here we test this hypothesis, culturing Escherichia coli in increasingly rich media. We distinguish an increase in mutation rate with added nutrients through SIM (dependent on error-prone polymerases Pol IV and Pol V) and an opposing effect of DAMP (dependent on MutT, which removes oxidised G nucleotides). The combination of DAMP and SIM results in a mutation rate minimum at intermediate nutrient levels (which can support 7 × 10  cells ml ). These findings demonstrate a strikingly close and nuanced relationship of ecological factors (stress and population density) with mutation, the fuel of all evolution.