9 research outputs found

    Divide-or-Conquer? Which Part Should You Distill Your LLM?

    Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first. In this paper, we devise a similar strategy that breaks reasoning tasks down into a problem decomposition phase and a problem solving phase, and we show that this strategy outperforms a single-stage solution. Further, we hypothesize that the decomposition should be easier to distill into a smaller model than the problem solving, because the latter requires large amounts of domain knowledge while the former only requires learning general problem solving strategies. We propose methods to distill these two capabilities and evaluate their impact on reasoning outcomes and inference cost. We find that we can distill the problem decomposition phase while achieving good generalization across tasks, datasets, and models. However, it is harder to distill the problem solving capability without losing performance, and the resulting distilled model struggles with generalization. These results indicate that by using smaller, distilled problem decomposition models in combination with problem solving LLMs we can achieve reasoning with cost-efficient inference and local adaptation.
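
    The two-stage setup described here maps naturally onto a small pipeline. Below is a minimal Python sketch, assuming the distilled decomposer and the large solver LLM are available as plain callables; the function names and prompt formats are illustrative assumptions, not the authors' implementation.

        # Hypothetical wiring of the decompose-then-solve strategy described above.
        from typing import Callable, List

        def two_stage_answer(
            question: str,
            decompose: Callable[[str], List[str]],  # small, distilled decomposition model
            solve: Callable[[str], str],            # large problem-solving LLM
        ) -> str:
            """Decomposition phase with the small model, solving phase with the large LLM."""
            subquestions = decompose(question)
            notes: List[str] = []
            for sub in subquestions:
                # The solver sees the original question plus the answers so far, so the
                # domain knowledge stays in the large model; the small model only
                # contributes the general decomposition strategy.
                prompt = question + "\n" + "\n".join(notes) + "\nSubquestion: " + sub
                notes.append(sub + " -> " + solve(prompt))
            final_prompt = question + "\n" + "\n".join(notes) + "\nFinal answer:"
            return solve(final_prompt)

    Any pair of callables with these signatures can be plugged in, for example a locally hosted distilled model for decompose and an API-hosted LLM for solve.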

    Using social media to assess the consumer nutrition environment: comparing Yelp reviews with a direct observation audit instrument for grocery stores

    Objective: To examine the feasibility of using social media to assess the consumer nutrition environment by comparing sentiment expressed in Yelp reviews with information obtained from a direct observation audit instrument for grocery stores. Design: Trained raters used the Nutrition Environment Measures Survey in Stores (NEMS-S) in 100 grocery stores from July 2015 to March 2016. Yelp reviews were available for sixty-nine of these stores and were retrieved in February 2017 using the Yelp Application Program Interface. A sentiment analysis was conducted to quantify perceptions of the consumer nutrition environment in the review text. Pearson correlation coefficients (ρ) were used to compare NEMS-S scores with Yelp review text on food availability, quality, price and shopping experience. Setting: Detroit, Michigan, USA. Participants: None. Results: Yelp reviews contained more comments about food availability and the overall shopping experience than about food price and food quality. Negative sentiment about food prices in Yelp review text and the number of dollar signs on Yelp were positively correlated with observed food prices in stores (ρ = 0·413 and 0·462, respectively). Stores with greater food availability were rated as more expensive on Yelp. Other aspects of the food store environment (e.g. overall quality and shopping experience) were captured only in Yelp. Conclusions: While Yelp cannot replace in-person audits for collecting detailed information on the availability, quality and cost of specific food items, it holds promise as a cost-effective means of gathering information on the overall cost, quality and experience of food stores, which may be relevant for nutrition outcomes.
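
    For illustration only, a sentiment-versus-audit comparison of this kind could be set up as below, using VADER sentiment scores from NLTK and SciPy's Pearson correlation; the store records shown are made-up placeholders, not data from the study.

        # Correlate audited price scores with review sentiment (illustrative data only).
        import nltk
        from nltk.sentiment import SentimentIntensityAnalyzer
        from scipy.stats import pearsonr

        nltk.download("vader_lexicon", quiet=True)
        sia = SentimentIntensityAnalyzer()

        stores = [  # hypothetical records pairing an audit price score with review text
            {"price_score": 8, "reviews": "Great produce but everything is really expensive."},
            {"price_score": 5, "reviews": "Decent selection, prices are okay."},
            {"price_score": 2, "reviews": "Cheap staples, limited fresh food."},
        ]

        # VADER compound score lies in [-1, 1]; lower values mean more negative sentiment.
        sentiments = [sia.polarity_scores(s["reviews"])["compound"] for s in stores]
        prices = [s["price_score"] for s in stores]

        r, p = pearsonr(prices, sentiments)
        print(f"correlation between audited price scores and review sentiment: r={r:.3f}, p={p:.3f}")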

    Using Social Media to Identify Sources of Healthy Food in Urban Neighborhoods.

    An established body of research has used secondary data sources (such as proprietary business databases) to demonstrate the importance of the neighborhood food environment for multiple health outcomes. However, documenting food availability using secondary sources in low-income urban neighborhoods can be particularly challenging, since small businesses play a crucial role in food availability there. These small businesses are typically underrepresented in national databases, which rely on secondary sources to develop data for marketing purposes. Using social media and other crowdsourced data to account for these smaller businesses holds promise, but the quality of these data remains unknown. This paper compares the quality of full-line grocery store information from Yelp, a crowdsourced content service, to a "ground truth" dataset (Detroit Food Map) and a commercially available dataset (Reference USA) for the greater Detroit area. Results suggest that Yelp is more accurate than Reference USA in identifying healthy food stores in urban areas. Researchers investigating the relationship between the nutrition environment and health may consider Yelp a reliable and valid source for identifying sources of healthy food in urban environments.
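
    As a rough sketch of the kind of comparison involved, the share of ground-truth stores listed by each secondary source could be computed by matching on a normalized name-plus-address key; the entries below are placeholders, not actual Detroit Food Map records, and the matching rule is an assumption of this sketch.

        # Fraction of ground-truth full-line grocery stores found in each secondary source.
        def key(name: str, address: str) -> str:
            # Normalize to a crude name|address key for matching.
            return name.strip().lower() + "|" + address.strip().lower()

        ground_truth = {key("Example Fresh Market", "100 Main St"),
                        key("Sample Grocery Co", "200 Oak Ave")}
        yelp = {key("Example Fresh Market", "100 Main St")}
        reference_usa = {key("Sample Grocery Co", "999 Wrong Rd")}  # address mismatch, no match

        for label, source in [("Yelp", yelp), ("Reference USA", reference_usa)]:
            hit_rate = len(ground_truth & source) / len(ground_truth)
            print(f"{label}: {hit_rate:.0%} of ground-truth stores matched")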

    Assessing the readability of ClinicalTrials.gov

    Objective: ClinicalTrials.gov serves the critical functions of disseminating trial information to the public and helping trials recruit participants. This study assessed the readability of trial descriptions at ClinicalTrials.gov using multiple quantitative measures. Materials and Methods: The analysis included all 165 988 trials registered at ClinicalTrials.gov as of April 30, 2014. To obtain benchmarks, the authors also analyzed 2 other medical corpora: (1) all 955 Health Topics articles from MedlinePlus and (2) a random sample of 100 000 clinician notes retrieved from an electronic health records system, intended for internal communication among medical professionals. The authors characterized each corpus using 4 surface metrics and then applied 5 different scoring algorithms to assess readability. The authors hypothesized that clinician notes would be the most difficult to read, followed by trial descriptions and MedlinePlus Health Topics articles. Results: Trial descriptions have the longest average sentence length (26.1 words) across all corpora, and 65% of the words they use are not covered by a basic medical English dictionary. In comparison, the average sentence length of MedlinePlus Health Topics articles is 61% shorter, their vocabulary size is 95% smaller, and their dictionary coverage is 46% higher. All 5 scoring algorithms consistently rated ClinicalTrials.gov trial descriptions the most difficult corpus to read, even harder than clinician notes. According to the readability assessment algorithms, an average of 18 years of education is required to properly understand these trial descriptions. Discussion and Conclusion: Trial descriptions at ClinicalTrials.gov are extremely difficult to read. Significant work is warranted to improve their readability in order to achieve ClinicalTrials.gov’s goal of facilitating information dissemination and subject recruitment.
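
    One readability formula typically applied in such assessments is the Flesch-Kincaid grade level. The self-contained sketch below uses a crude vowel-group syllable heuristic, which is an assumption of this sketch rather than the study's actual implementation.

        # Flesch-Kincaid grade level: 0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59
        import re

        def count_syllables(word: str) -> int:
            # Count groups of consecutive vowels as a rough syllable estimate.
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

        def flesch_kincaid_grade(text: str) -> float:
            sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
            words = re.findall(r"[A-Za-z']+", text)
            syllables = sum(count_syllables(w) for w in words)
            return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

        description = ("This randomized, double-blind, placebo-controlled study evaluates "
                       "pharmacokinetic and safety parameters in adult participants.")
        print(f"Estimated grade level: {flesch_kincaid_grade(description):.1f}")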