DataHub: Collaborative Data Science & Dataset Version Management at Scale
Relational databases have limited support for data collaboration, where teams
collaboratively curate and analyze large datasets. Inspired by software version
control systems like git, we propose (a) a dataset version control system,
giving users the ability to create, branch, merge, difference and search large,
divergent collections of datasets, and (b) a platform, DataHub, that gives
users the ability to perform collaborative data analysis building on this
version control system. We outline the challenges in providing dataset version
control at scale.
Comment: 7 pages
Hypercarbons in polyhedral structures
Though carbon is mostly tetravalent and tetracoordinated, there are several examples where the coordination number exceeds four. Structural varieties that exhibit hypercarbons in polyhedral structures such as polyhedral carboranes, sandwich complexes, encapsulated polyhedral structures and novel planar aromatic systems with atoms embedded in the middle are reviewed here. The structural variety anticipated with hypercoordinate carbon among carboranes is large, as there are many modes of condensation that could lead to a large number of new patterns. The relative stabilities of positional isomers of polyhedral carboranes, sandwich structures, and endohedral carboranes are briefly described. The mno rule accounts for the variety of structural patterns. Wheel-shaped and planar hypercoordinated molecules are recent theoretical developments in this area.
Operationalizing Machine Learning: An Interview Study
Organizations rely on machine learning engineers (MLEs) to operationalize ML,
i.e., deploy and maintain ML pipelines in production. The process of
operationalizing ML, or MLOps, consists of a continual loop of (i) data
collection and labeling, (ii) experimentation to improve ML performance, (iii)
evaluation throughout a multi-staged deployment process, and (iv) monitoring of
performance drops in production. When considered together, these
responsibilities seem staggering -- how does anyone do MLOps, what are the
unaddressed challenges, and what are the implications for tool builders?
We conducted semi-structured ethnographic interviews with 18 MLEs working
across many applications, including chatbots, autonomous vehicles, and finance.
Our interviews expose three variables that govern success for a production ML
deployment: Velocity, Validation, and Versioning. We summarize common practices
for successful ML experimentation, deployment, and sustaining production
performance. Finally, we discuss interviewees' pain points and anti-patterns,
with implications for tool design.
Comment: 20 pages, 4 figures
Low-mass Solitons from Fractional Charges in Quantum Chromodynamics
Slansky, Goldman, and Shaw have proposed a model to account for the observation of fractionally charged states. We show that in this model, there are expected to be several low-mass solitons (four being in the mass range ∼20-60 MeV) associated with the third homotopy group π3(SU(3)/SO(3))=Z4, besides a low-mass (∼30 MeV) Z2 monopole. Confirmation of these levels and hence of the model has important implications for Cabrera's results on the magnetic monopole. An efficient algorithm for the calculation of π3(G/H) for a general Lie group G and a subgroup H is developed. It is pointed out that solitons associated with the third homotopy group are predicted by some grand-unified-theory scenarios.
Soliton States in the Quantum-Chromodynamic Effective Lagrangian
The work of Skyrme has shown that the SU(2)×SU(2) chiral model has nontrivial topological sectors which admit solitons for generic chiral Lagrangians. In this paper, we study such models in the presence of baryon fields. The baryon number and strangeness of the solitons, and the bound states of the nucleon to the soliton are investigated. It is found that long-lived levels with large baryon number B and strangeness (≳6 in magnitude) and masses somewhere in the range 1.8 to 5.6 GeV must exist. Some of these levels have half-integral electric charge and an exotic relation between B and spin s (e.g., even B and half-integer s). It is speculated that these levels may be related to the anomalous nuclei whose existence has been confirmed in cosmic-ray and LBL Bevalac experiments.
Revisiting Prompt Engineering via Declarative Crowdsourcing
Large language models (LLMs) are incredibly powerful at comprehending and
generating data in the form of text, but are brittle and error-prone. There has
been an advent of toolkits and recipes centered around so-called prompt
engineering, the process of asking an LLM to do something via a series of
prompts. However, for LLM-powered data processing workflows, in particular,
optimizing for quality, while keeping cost bounded, is a tedious, manual
process. We put forth a vision for declarative prompt engineering. We view LLMs
like crowd workers and leverage ideas from the declarative crowdsourcing
literature (including leveraging multiple prompting strategies, ensuring
internal consistency, and exploring hybrid-LLM-non-LLM approaches) to make
prompt engineering a more principled process. Preliminary case studies on
sorting, entity resolution, and imputation demonstrate the promise of our
approach.