65 research outputs found
The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models
In this paper, we introduce Concise Chain-of-Thought (CCoT) prompting. We
compared standard CoT and CCoT prompts to see how conciseness impacts response
length and correct-answer accuracy. We evaluated this using GPT-3.5 and GPT-4
with a multiple-choice question-and-answer (MCQA) benchmark. CCoT reduced
average response length by 48.70% for both GPT-3.5 and GPT-4 while having a
negligible impact on problem-solving performance. However, on math problems,
GPT-3.5 with CCoT incurs a performance penalty of 27.69%. Overall, CCoT leads
to an average per-token cost reduction of 22.67%. These results have practical
implications for AI systems engineers using LLMs to solve real-world problems
with CoT prompt-engineering techniques. In addition, these results provide more
general insight for AI researchers studying the emergent behavior of
step-by-step reasoning in LLMs. Comment: All code, data, and supplemental materials are available on GitHub at
https://github.com/matthewrenze/jhu-concise-co
Correction for Johansson et al., An open challenge to advance probabilistic forecasting for dengue epidemics.
Correction for “An open challenge to advance probabilistic forecasting for dengue epidemics,” by Michael A. Johansson, Karyn M. Apfeldorf, Scott Dobson, Jason Devita, Anna L. Buczak, Benjamin Baugher, Linda J. Moniz, Thomas Bagley, Steven M. Babin, Erhan Guven, Teresa K. Yamana, Jeffrey Shaman, Terry Moschou, Nick Lothian, Aaron Lane, Grant Osborne, Gao Jiang, Logan C. Brooks, David C. Farrow, Sangwon Hyun, Ryan J. Tibshirani, Roni Rosenfeld, Justin Lessler, Nicholas G. Reich, Derek A. T. Cummings, Stephen A. Lauer, Sean M. Moore, Hannah E. Clapham, Rachel Lowe, Trevor C. Bailey, Markel García-Díez, Marilia Sá Carvalho, Xavier Rodó, Tridip Sardar, Richard Paul, Evan L. Ray, Krzysztof Sakrejda, Alexandria C. Brown, Xi Meng, Osonde Osoba, Raffaele Vardavas, David Manheim, Melinda Moore, Dhananjai M. Rao, Travis C. Porco, Sarah Ackley, Fengchen Liu, Lee Worden, Matteo Convertino, Yang Liu, Abraham Reddy, Eloy Ortiz, Jorge Rivero, Humberto Brito, Alicia Juarrero, Leah R. Johnson, Robert B. Gramacy, Jeremy M. Cohen, Erin A. Mordecai, Courtney C. Murdock, Jason R. Rohr, Sadie J. Ryan, Anna M. Stewart-Ibarra, Daniel P. Weikel, Antarpreet Jutla, Rakibul Khan, Marissa Poultney, Rita R. Colwell, Brenda Rivera-García, Christopher M. Barker, Jesse E. Bell, Matthew Biggerstaff, David Swerdlow, Luis Mier-y-Teran-Romero, Brett M. Forshey, Juli Trtanj, Jason Asher, Matt Clay, Harold S. Margolis, Andrew M. Hebbeler, Dylan George, and Jean-Paul Chretien, which was first published November 11, 2019; 10.1073/pnas.1909865116. The authors note that the affiliation for Xavier Rodó should instead appear as Catalan Institution for Research and Advanced Studies (ICREA) and Climate and Health Program, Barcelona Institute for Global Health (ISGlobal). The corrected author and affiliation lines appear below. The online version has been corrected.
An open challenge to advance probabilistic forecasting for dengue epidemics.
A wide range of research has promised new tools for forecasting infectious disease dynamics, but little of that research is currently being applied in practice, because tools do not address key public health needs, do not produce probabilistic forecasts, have not been evaluated on external data, or do not provide sufficient forecast skill to be useful. We developed an open collaborative forecasting challenge to assess probabilistic forecasts for seasonal epidemics of dengue, a major global public health problem. Sixteen teams used a variety of methods and data to generate forecasts for 3 epidemiological targets (peak incidence, the week of the peak, and total incidence) over 8 dengue seasons in Iquitos, Peru, and San Juan, Puerto Rico. Forecast skill was highly variable across teams and targets. While numerous forecasts showed high skill for midseason situational awareness, early season skill was low, and skill was generally lowest for high incidence seasons, those for which forecasts would be most valuable. A comparison of modeling approaches revealed that average forecast skill was lower for models including biologically meaningful data and mechanisms and that both multimodel and multiteam ensemble forecasts consistently outperformed individual model forecasts. Leveraging these insights, data, and the forecasting framework will be critical to improve forecast skill and the application of forecasts in real time for epidemic preparedness and response. Moreover, key components of this project (integration with public health needs, a common forecasting framework, shared and standardized data, and open participation) can help advance infectious disease forecasting beyond dengue.
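As an illustration of how a multimodel ensemble of probabilistic forecasts can be formed and scored, here is a minimal sketch. It assumes forecasts are probability vectors over outcome bins, that the ensemble is an equal-weight average, and that skill is measured with a logarithmic score; none of these choices is specified in the abstract, and the names `ensemble`, `log_score`, `model_a`, and `model_b` are hypothetical.

```python
import math

def ensemble(forecasts):
    """Equal-weight average of several probability vectors (an assumed scheme)."""
    n_models = len(forecasts)
    n_bins = len(forecasts[0])
    return [sum(f[i] for f in forecasts) / n_models for i in range(n_bins)]

def log_score(forecast, observed_bin):
    """Logarithmic score: log of the probability assigned to the observed bin."""
    return math.log(forecast[observed_bin])

# Two hypothetical models that disagree about which bin is most likely.
model_a = [0.7, 0.2, 0.1]
model_b = [0.1, 0.2, 0.7]
combined = ensemble([model_a, model_b])

observed_bin = 2  # illustrative outcome, not data from the challenge
for name, forecast in [("A", model_a), ("B", model_b), ("ensemble", combined)]:
    print(name, round(log_score(forecast, observed_bin), 3))
```

Because the logarithm is concave, the log score of an equal-weight ensemble is never worse than the average of its members' log scores, which is one reason averaging can protect against a single badly wrong model.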
Robust Classification of Emotion in Human Speech Using Spectrogram Features
The recognition of emotions, such as anger, anxiety, and joy, from tonal variations in human speech is an important task for research and applications in human-computer interaction. The objective of this research is to design, implement and test a Speech Emotion Classification (SEC) engine that can extract useful features and accurately classify emotions in human speech in the presence of noise and variations in speaker-dependent characteristics. Current approaches extract several standard global values from the temporal sequence of power spectra, such as pitch, formants, energy, and values from the time signal, such as attack and decay rates. In this work, the frequency dimension of the spectrogram is quantized to simulate the Bark scale in the human auditory system, the time dimension of the spectrogram is quantized in units starting from 50 ms, and the linear regression coefficients of the surface of each spectrogram segment are combined into a feature vector. In this way, complete local features are extracted to establish a larger sample. The accumulated feature vectors for each category of emotion provide a robust training basis for a state-of-the-art classifier, such as an SVM. In order to further improve the performance of the SEC engine and to demonstrate the flexibility and benefit of local features, a backward context scheme is introduced. A series of experiments was designed and conducted using the EMO-DB and LDC-DB speech emotion databases to measure the performance of the SEC engine. First, the accuracy and the precision of the performance are measured in terms of seven to fifteen emotion categories when trained on the speech utterances by random sampling. Next, the generalization performance is measured through a speaker cross-validation scheme. Third, the generalization and robustness of the SEC engine are measured by performing gender, language and speaker classification with the SEC engine, hence measuring the discrimination power of the engine with respect to variations in speaker characteristics. Finally, the robustness of the SEC engine is measured when the SNR is varied between 10 and 50 dB.
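The local-feature idea described above (fitting a regression surface to each spectrogram segment and concatenating the coefficients) can be sketched as follows. This is only an illustration under stated assumptions: it uses uniform frequency bands in place of Bark-scale bands, fits a plane by ordinary least squares, and runs on a synthetic array rather than real speech. The function names `plane_coefficients` and `local_features` and all parameter values are hypothetical.

```python
import numpy as np

def plane_coefficients(patch):
    """Least-squares fit of z = a*row + b*col + c to a 2-D patch; returns (a, b, c)."""
    rows, cols = np.indices(patch.shape)
    design = np.column_stack([rows.ravel(), cols.ravel(), np.ones(patch.size)])
    coeffs, *_ = np.linalg.lstsq(design, patch.ravel(), rcond=None)
    return coeffs

def local_features(spectrogram, band_size, frame_size):
    """Tile a (frequency x time) array into blocks and concatenate the plane fits."""
    n_freq, n_time = spectrogram.shape
    feats = []
    for f0 in range(0, n_freq - band_size + 1, band_size):
        for t0 in range(0, n_time - frame_size + 1, frame_size):
            block = spectrogram[f0:f0 + band_size, t0:t0 + frame_size]
            feats.append(plane_coefficients(block))
    return np.concatenate(feats)

# Synthetic stand-in for a spectrogram: an exactly planar surface.
freq_idx, time_idx = np.indices((8, 10))
toy = 2.0 * freq_idx + 0.5 * time_idx + 1.0

# Uniform 4-bin bands and 5-frame segments are illustrative choices only.
features = local_features(toy, band_size=4, frame_size=5)
print(features.shape)
```

Each block contributes three numbers (two slopes and an offset), so the resulting vector could be passed to a classifier such as an SVM, as the abstract suggests.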
Amniotic Fluid Ischemia Modified Albumin as a Novel Prenatal Diagnostic Marker for Down Syndrome: A Prospective Case-Control Study
Aims: There is no study in the literature about ischemia-modified albumin (IMA) and hepatocyte growth factor (HGF) levels in amniotic fluid for Down syndrome cases. The aim of this study was to investigate the changes of IMA and HGF in Down syndrome cases at 16-20 weeks of gestation compared to normal fetuses. Methods: For this prospective case-control study, after the number of women with a prenatal diagnosis of Down syndrome reached 20 (study group), maternal and gestational age-matched pregnant women with normal constitutional karyotype were selected for the control group (n = 74) from the stored amniotic fluid samples. Results: Mean maternal and gestational ages were comparable between the two groups. Amniotic fluid IMA (1.32 ± 0.13 vs. 1.11 ± 0.11 ABSU, respectively, p < 0.001) and HGF (2743.53 ± 1389.28 vs. 2160.12 ± 654.63 pg/mL, respectively, p = 0.008) levels were significantly higher in pregnant women having Down syndrome fetuses compared with women having normal fetuses. For the diagnosis of Down syndrome using amniotic fluid IMA levels, the sensitivity and specificity were calculated as 95.0% and 71.6%, respectively, for the limit value 1.171 cm3. Conclusion: In cases with suspected Down syndrome, the diagnosis of Down syndrome may be made in approximately 1 hour with high sensitivity and specificity by measuring the IMA level in the amniotic fluid sample taken for fetal karyotyping.
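The reported sensitivity and specificity can be related to the group sizes with a short calculation. The counts used here (19 of 20 cases above the cutoff, 53 of 74 controls below it) are an assumption inferred from the reported percentages and group sizes; they are not stated in the abstract.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of affected cases that the test flags as positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of unaffected controls that the test flags as negative."""
    return true_neg / (true_neg + false_pos)

# Assumed counts, inferred from 95.0% of 20 cases and 71.6% of 74 controls.
sens = sensitivity(true_pos=19, false_neg=1)
spec = specificity(true_neg=53, false_pos=21)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

Under these assumed counts, 21 of the 74 controls would exceed the cutoff, which illustrates why a positive result at this threshold would still need confirmation by karyotyping.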
Analytic Biosurveillance Methods for Resource-Limited Settings
The authors describe the challenges of disease surveillance in settings lacking infrastructure and access to medical care. They address the role of analytic methods and evaluate open-source temporal alerting algorithms chosen for the Suite for Automated Global Electronic bioSurveillance (SAGES), a collection of modular, freely available software tools to enable electronic surveillance in these settings. An algorithm test-bed is described and used to compare algorithm alerting performance for both daily and weekly data streams. Multiple detection performance measures are defined, and a practical means of combining them is applied to recommend preferred alerting methods for common scenarios.