
    Biologic activity in a fragment of recombinant human interferon α

    To attempt to locate functionally important regions of the interferon (IFN) molecule, recombinant human IFN-α2 was subjected to proteolytic digestion. The bacterial proteinase thermolysin produced two major complementary fragments, HuIFN-α2-(1-110) and HuIFN-α2-(111-153). After reduction with 2-mercaptoethanol and separation of the two major fragments on NaDodSO4/polyacrylamide gel electrophoresis, antiviral activity persisted in the larger, Mr 12,000, fragment consisting of the amino-terminal 110 amino acids.

    Characterizing how 'distributional' NLP corpora distance metrics are

    A corpus of vector-embedded text documents has some empirical distribution. Given two corpora, we want to calculate a single metric of distance (e.g., Mauve, Fréchet Inception Distance) between them. We describe an abstract quality, called `distributionality', of such metrics. A non-distributional metric tends to use very local measurements, or uses global measurements in a way that does not fully reflect the distributions' true distance. For example, if individual pairwise nearest-neighbor distances are low, it may judge the two corpora to have low distance, even if their two distributions are in fact far from each other. A more distributional metric will, in contrast, better capture the distributions' overall distance. We quantify this quality by constructing a Known-Similarity Corpora set from two paraphrase corpora and calculating the distance between paired corpora from it. The shape of the distances' trend as set element separation increases should quantify the distributionality of the metric. We propose that Average Hausdorff Distance and energy distance between corpora are representative examples of non-distributional and distributional distance metrics, respectively, to which other metrics can be compared, to evaluate how distributional they are. Comment: Published in the August 2023 Joint Statistical Meetings proceedings.
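The two reference metrics the abstract names can be computed directly on small point sets. A minimal sketch, assuming Euclidean point embeddings (illustrative only, not the paper's implementation):

```python
import math
from itertools import product

def avg_hausdorff(A, B):
    """Average Hausdorff distance: each point's nearest-neighbour distance
    to the other set, averaged over points and symmetrised.  A 'local'
    (non-distributional) metric in the abstract's terminology."""
    ab = sum(min(math.dist(a, b) for b in B) for a in A) / len(A)
    ba = sum(min(math.dist(b, a) for a in A) for b in B) / len(B)
    return 0.5 * (ab + ba)

def mean_cross(A, B):
    # Mean pairwise Euclidean distance between two point sets.
    return sum(math.dist(a, b) for a, b in product(A, B)) / (len(A) * len(B))

def energy_distance(A, B):
    """Energy distance, 2*E|X-Y| - E|X-X'| - E|Y-Y'|, in its biased
    V-statistic form.  A 'distributional' metric: it depends on the whole
    shape of both empirical distributions."""
    return 2 * mean_cross(A, B) - mean_cross(A, A) - mean_cross(B, B)
```

Both metrics are zero when the two samples coincide; the contrast the paper studies is how they grow as the corpora are made progressively less similar.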

    Reliable and Interpretable Drift Detection in Streams of Short Texts

    Data drift, the change in model input data over time, is one of the key factors leading to machine learning model performance degradation. Monitoring drift helps detect these issues and prevent their harmful consequences. Meaningful drift interpretation is a fundamental step towards effective re-training of the model. In this study we propose an end-to-end framework for reliable, model-agnostic change-point detection and interpretation in large task-oriented dialog systems, proven effective in multiple customer deployments. We evaluate our approach and demonstrate its benefits with a novel variant of an intent classification training dataset, simulating customer requests to a dialog system. We make the data publicly available. Comment: ACL2023 industry track (9 pages).

    Detection of data drift and outliers affecting machine learning model performance over time

    A trained ML model is deployed on another `test' dataset where target feature values (labels) are unknown. Drift is distribution change between the training and deployment data, which is concerning if model performance changes. For a cat/dog image classifier, for instance, drift during deployment could be rabbit images (new class) or cat/dog images with changed characteristics (change in distribution). We wish to detect these changes but can't measure accuracy without deployment data labels. We instead detect drift indirectly by nonparametrically testing the distribution of model prediction confidence for changes. This generalizes our method and sidesteps domain-specific feature representation. We address important statistical issues, particularly Type-1 error control in sequential testing, using Change Point Models (CPMs; see Adams and Ross 2012). We also use nonparametric outlier methods to show the user suspicious observations for model diagnosis, since the before/after change confidence distributions overlap significantly. In experiments to demonstrate robustness, we train on a subset of MNIST digit classes, then insert drift (e.g., unseen digit class) in deployment data in various settings (gradual/sudden changes in the drift proportion). A novel loss function is introduced to compare the performance (detection delay, Type-1 and Type-2 errors) of a drift detector under different levels of drift class contamination. Comment: In: JSM Proceedings, Nonparametric Statistics Section, 2020. Philadelphia, PA: American Statistical Association. 144--16
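The core move here, testing the distribution of prediction confidence rather than raw features, can be illustrated with a simple two-sample test. The sketch below uses a Kolmogorov-Smirnov statistic as the nonparametric test; the paper itself uses Change Point Models with sequential Type-1 error control, which this does not reproduce. The Beta-distributed "confidence" samples and the threshold are invented for illustration:

```python
import bisect
import random

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical CDFs, evaluated at every observed value."""
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap

# Reference window: a confident model.  Deployment window: drifted inputs
# pushing confidence down.  Beta samples stand in for real confidences.
random.seed(0)
reference = [random.betavariate(8, 2) for _ in range(500)]
deployment = [random.betavariate(2, 2) for _ in range(500)]

drift_flag = ks_statistic(reference, deployment) > 0.1  # illustrative threshold
```

Because only model outputs are tested, the same detector works unchanged for images, text, or tabular inputs, which is the generalization the abstract claims.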

    Classifier Data Quality: A Geometric Complexity Based Method for Automated Baseline And Insights Generation

    Testing Machine Learning (ML) models and AI-Infused Applications (AIIAs), or systems that contain ML models, is highly challenging. In addition to the challenges of testing classical software, it is acceptable and expected that statistical ML models sometimes output incorrect results. A major challenge is to determine when the level of incorrectness, e.g., model accuracy or F1 score for classifiers, is acceptable and when it is not. In addition to business requirements that should provide a threshold, it is a best practice to require any proposed ML solution to outperform simple baseline models, such as a decision tree. We have developed complexity measures, which quantify how difficult given observations are to assign to their true class label; these measures can then be used to automatically determine a baseline performance threshold. These measures are superior to the best-practice baseline in that, for a linear computation cost, they also quantify each observation's classification complexity in an explainable form, regardless of the classifier model used. Our experiments with both numeric synthetic data and real natural language chatbot data demonstrate that the complexity measures effectively highlight data regions and observations that are likely to be misclassified. Comment: Accepted to the EDSMLS workshop at the AAAI conference.
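The abstract does not specify its complexity measures, but one common geometric measure of how hard an observation is to classify is neighborhood label disagreement. A hedged sketch of that idea, not the paper's actual method:

```python
import math

def knn_disagreement(points, labels, k=3):
    """Per-observation geometric complexity: the fraction of an
    observation's k nearest neighbours (itself excluded) that carry a
    different class label.  Observations deep inside their class score 0;
    boundary or mislabelled observations score near 1."""
    scores = []
    for i, p in enumerate(points):
        neighbours = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points) if j != i
        )[:k]
        scores.append(sum(lab != labels[i] for _, lab in neighbours) / k)
    return scores
```

A naive model-agnostic baseline accuracy would then be the mean of (1 - score) over the dataset; high-scoring observations are the ones a classifier is likely to misclassify, matching the explainability claim in spirit.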

    Predicting Question-Answering Performance of Large Language Models through Semantic Consistency

    Semantic consistency of a language model is broadly defined as the model's ability to produce semantically-equivalent outputs, given semantically-equivalent inputs. We address the task of assessing question-answering (QA) semantic consistency of contemporary large language models (LLMs) by manually creating a benchmark dataset with high-quality paraphrases for factual questions, and release the dataset to the community. We further combine the semantic consistency metric with additional measurements suggested in prior work as correlating with LLM QA accuracy, for building and evaluating a framework for factual QA reference-less performance prediction -- predicting the likelihood of a language model to accurately answer a question. Evaluating the framework on five contemporary LLMs, we demonstrate encouraging results that significantly outperform baselines. Comment: EMNLP2023 GEM workshop, 17 pages.
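One simple way to score consistency over a question's paraphrases is pairwise agreement among the model's answers. A minimal sketch, where exact match after light normalization stands in for the semantic-equivalence check (which in practice needs an entailment or embedding model; the paper's exact metric is not given in the abstract):

```python
from itertools import combinations

def normalize(answer):
    # Light normalisation: lowercase and collapse whitespace.
    return " ".join(answer.lower().split())

def qa_consistency(answers):
    """Fraction of answer pairs that agree, over all answers a model gave
    to paraphrases of the same question.  1.0 means fully consistent."""
    norm = [normalize(a) for a in answers]
    pairs = list(combinations(norm, 2))
    return sum(a == b for a, b in pairs) / len(pairs)
```

A score like this would be one feature among the additional measurements the framework combines to predict answer accuracy without reference answers.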

    Data Drift Monitoring for Log Anomaly Detection Pipelines

    Logs enable the monitoring of infrastructure status and the performance of associated applications. Logs are also invaluable for diagnosing the root causes of any problems that may arise. Log Anomaly Detection (LAD) pipelines automate the detection of anomalies in logs, providing assistance to site reliability engineers (SREs) in system diagnosis. Log patterns change over time, necessitating updates to the LAD model defining the `normal' log activity profile. In this paper, we introduce a Bayes Factor-based drift detection method that identifies when intervention, retraining, and updating of the LAD model are required with human involvement. We illustrate our method using sequences of log activity, both from unaltered data, and simulated activity with controlled levels of anomaly contamination, based on real collected log data.
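The abstract does not give the Bayes Factor construction. As one hedged illustration of the general idea, a Beta-Binomial Bayes factor can compare "anomaly rate changed" against "anomaly rate unchanged" between a reference window and a current window of log lines:

```python
from math import lgamma

def log_beta(a, b):
    # log of the Beta function via log-gamma.
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(k, n, a=1.0, b=1.0):
    """Log marginal likelihood of k anomalous lines out of n under a
    Beta(a, b) prior on the anomaly rate.  The binomial coefficient is
    omitted: it is identical under both hypotheses and cancels."""
    return log_beta(k + a, n - k + b) - log_beta(a, b)

def log_bayes_factor(k_ref, n_ref, k_cur, n_cur):
    """Log Bayes factor of 'rates differ' (drift) vs 'one shared rate'
    (no drift).  Large positive values signal that the LAD model's
    'normal' profile needs human review and retraining."""
    h1 = log_marginal(k_ref, n_ref) + log_marginal(k_cur, n_cur)
    h0 = log_marginal(k_ref + k_cur, n_ref + n_cur)
    return h1 - h0
```

With matching windows the factor favors "no drift" (a negative log value), and it grows quickly as the contamination level in the current window departs from the reference, mirroring the controlled-contamination experiments described above.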

    Distribution and Emergency

    Streaming video requires RealPlayer to view. The University Archives has determined that this item is of continuing value to OSU's history. Humanitarian organizations divide their work into two categories: development aid that improves underlying conditions, and emergency aid, given in response to a natural or manmade disaster. However, Jennifer Rubenstein, a fellow at Princeton University, questioned this distinction. While it might have logistical advantages, she argued, it does not suit the variety of situations and populations requiring aid. Ohio State University, Mershon Center for International Security Studies. Event webpage, streaming video, photos, PowerPoint presentation, and lecture summary.

    An L Band Spectrum of the Coldest Brown Dwarf

    The coldest brown dwarf, WISE 0855, is the closest known planetary-mass, free-floating object and has a temperature nearly as cold as the solar system gas giants. Like Jupiter, it is predicted to have an atmosphere rich in methane, water, and ammonia, with clouds of volatile ices. WISE 0855 is faint at near-infrared wavelengths and emits almost all its energy in the mid-infrared. Skemer et al. (2016) presented a spectrum of WISE 0855 from 4.5-5.1 micron (M band), revealing water vapor features. Here, we present a spectrum of WISE 0855 in L band, from 3.4-4.14 micron. We present a set of atmosphere models that include a range of compositions (metallicities and C/O ratios) and water ice clouds. Methane absorption is clearly present in the spectrum. The mid-infrared color can be better matched with a methane abundance that is depleted relative to solar abundance. We find that there is evidence for water ice clouds in the M band spectrum, and we find a lack of phosphine spectral features in both the L and M band spectra. We suggest that a deep continuum opacity source may be obscuring the near-infrared flux, possibly a deep phosphorus-bearing cloud, ammonium dihydrogen phosphate. Observations of WISE 0855 provide critical constraints for cold planetary atmospheres, bridging the temperature range between the long-studied solar system planets and accessible exoplanets. JWST will soon revolutionize our understanding of cold brown dwarfs with high-precision spectroscopy across the infrared, allowing us to study their compositions and cloud properties, and to infer their atmospheric dynamics and formation processes. Comment: 19 pages, 21 figures. Accepted for publication in ApJ.