Towards Coding Social Science Datasets with Language Models
Researchers often rely on humans to code (label, annotate, etc.) large sets
of texts. This kind of human coding forms an important part of social science
research, yet the coding process is both resource intensive and highly variable
from application to application. Efforts to automate this process have, in some
cases, achieved human-level accuracy, but they frequently rely on thousands of
hand-labeled training examples, which makes them inapplicable to small-scale
research studies and costly for large ones. Recent advances in a specific kind
of artificial intelligence tool, language models (LMs), provide a solution to
this problem. Work in computer science shows that LMs can classify text without
the cost (in financial terms and human effort) of alternative methods. To demonstrate the
possibilities of LMs in this area of political science, we use GPT-3, one of
the most advanced LMs, as a synthetic coder and compare it to human coders. We
find that GPT-3 can match the performance of typical human coders and offers
benefits over other machine learning methods of coding text. We find this
across a variety of domains using very different coding procedures. This
provides exciting evidence that language models can serve as a critical advance
in the coding of open-ended texts in a variety of applications.
An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels
Pre-trained language models derive substantial linguistic and factual
knowledge from the massive corpora on which they are trained, and prompt
engineering seeks to align these models to specific tasks. Unfortunately,
existing prompt engineering methods require significant amounts of labeled
data, access to model parameters, or both. We introduce a new method for
selecting prompt templates without labeled examples and without direct access
to the model. Specifically, over a set of
candidate templates, we choose the template that maximizes the mutual
information between the input and the corresponding model output. Across 8
datasets representing 7 distinct NLP tasks, we show that when a template has
high mutual information, it also has high accuracy on the task. On the largest
model, selecting prompts with our method gets 90% of the way from the average
prompt accuracy to the best prompt accuracy and requires no ground truth
labels.
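The template-selection rule described above can be sketched in Python. This is a minimal sketch, not the authors' implementation, and it rests on stated assumptions: for each candidate template we are assumed to have the model's probability distribution over a fixed set of answer options for every unlabeled input, and mutual information is estimated as I(X; Y) = H(Y) - H(Y | X), the entropy of the average output distribution minus the average per-input entropy. The function names (`mutual_information`, `select_template`, `fraction_of_gap_closed`) and the example distributions are hypothetical. The last helper reads "90% of the way from the average prompt accuracy to the best prompt accuracy" as a ratio, which is our interpretation.

```python
import math
from typing import Dict, List, Sequence


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def mutual_information(output_dists: List[Sequence[float]]) -> float:
    """Estimate I(X; Y) = H(Y) - H(Y | X) from one output distribution per input.

    Assumption: each entry is the model's distribution over the same answer options.
    H(Y) is the entropy of the marginal (averaged) distribution; H(Y | X) is the
    mean entropy of the per-input distributions. No labels are used.
    """
    n = len(output_dists)
    k = len(output_dists[0])
    marginal = [sum(d[j] for d in output_dists) / n for j in range(k)]
    conditional = sum(entropy(d) for d in output_dists) / n
    return entropy(marginal) - conditional


def select_template(dists_by_template: Dict[str, List[Sequence[float]]]) -> str:
    """Return the template whose outputs have the highest estimated mutual information."""
    return max(dists_by_template, key=lambda t: mutual_information(dists_by_template[t]))


def fraction_of_gap_closed(selected_acc: float, average_acc: float, best_acc: float) -> float:
    """Share of the gap between average and best prompt accuracy closed by the selected prompt."""
    return (selected_acc - average_acc) / (best_acc - average_acc)


if __name__ == "__main__":
    # Hypothetical output distributions over two answer options for three unlabeled inputs.
    dists = {
        "template_a": [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],  # uninformative: I is about 0
        "template_b": [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]],  # confident and input-dependent
    }
    print(select_template(dists))  # template_b
```

The estimator rewards templates whose outputs are confident for each individual input (low H(Y | X)) yet vary across inputs (high H(Y)), which is why the uninformative `template_a` scores zero.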
Invasive Mold Infections in Pediatric Cancer Patients Reflect Heterogeneity in Etiology, Presentation, and Outcome: A 10-Year, Single-Institution, Retrospective Study
Background. Data regarding invasive mold infections (IMIs) in children with cancer are scarce.
Methods. We retrospectively identified patients (18 years old or younger) with malignant disease who developed proven or probable IMIs (European Organization for Research and Treatment of Cancer/Mycoses Study Group criteria) during a 10-year period (1998-2008). We reviewed their risk factors and clinical characteristics and assessed their crude mortality rates and treatment outcomes 12 weeks after IMI diagnosis.
Results. Forty-eight patients (30 males) were identified, 30 (63%) of whom had a proven IMI. The most prevalent molds were Aspergillus species (40%), followed by Mucorales (20%) and Fusarium species (11%). Acute leukemia was the most common underlying malignancy (39 patients [81%]). Twenty-three (59%) of them had refractory leukemia. Neutropenia was present on the day of IMI diagnosis in 67% of the patients. Sixty-two percent of the patients had received prior corticosteroids. The dominant site of infection was the lungs (79%), followed by the skin (29%) and sinuses (10%). Seventy-one percent of patients had radiological findings suggestive of fungal pneumonia (either nodules or masses). The mainstay of antifungal therapy was a lipid formulation of amphotericin B. Antifungal therapy resulted in a 54% response rate (33% complete) at 12 weeks. The crude 12-week mortality rate was 31%. Logistic regression analysis demonstrated that monocytopenia (P = .013), malnutrition (P = .012), and intensive care admission in the month prior to IMI diagnosis (P = .027) were risk factors for death within 12 weeks.
Conclusions. Although Aspergillus spp. was the most common mold in our pediatric cancer population, the epidemiology of the IMIs was diverse. Adults and children share similar risk factors for and epidemiology of IMIs.
Clinical next generation sequencing of pediatric-type malignancies in adult patients identifies novel somatic aberrations
Pediatric malignancies in adults, in contrast to the same diseases in children, are clinically more aggressive, resistant to chemotherapeutics, and carry a higher risk of relapse. Molecular profiling of tumor samples using next generation sequencing (NGS) has recently become clinically available. We report the results of targeted exome sequencing of six adult patients with pediatric-type malignancies: Wilms tumor (n=2), medulloblastoma (n=2), Ewing's sarcoma (n=1), and desmoplastic small round cell tumor (n=1), with a median age of 28.8 years. Detection of druggable somatic aberrations in these tumors is feasible. However, identifying actionable targeted therapies in these rare adult patients with pediatric-type malignancies is challenging. Continued efforts to establish a rare disease registry are warranted.
Emotions in Polish and Lithuanian Social Media
Teams of trained native coders in Poland and Lithuania independently annotated 3,659 Polish and 1,946 Lithuanian Facebook posts (in-language and in-country), including all multimedia content but not the comments. These data were sampled from a larger dataset pulled from specific Polish and Lithuanian sociopolitical entities from 2015 to 2020. Each post was annotated on a scale from 0 (none) to 100 (most frequent/intense) for each of 23 emotions (plus Positive Other and Negative Other). Each post was also rated for the annotators' personal reactions using the same emotion scheme, but those data are not shared here. The annotators also coded each post for media type (e.g., whether it contains text and, if so, what type; whether it contains an image), language, and primary and secondary topic, with topic coded using an adapted and expanded version of the Comparative Agendas Project's scheme. These independent annotations went through a consensus process, and only the consensus values are deposited here. Draft papers with more detail about the sampling, the consensus process, and other methodological details are available by emailing [email protected]. This corpus also includes the number of Facebook shares and of the different Facebook reactions, along with other useful information (e.g., account name, link to post).
We applied modern psychological theories of emotion and cross-cultural psychology methods to a range of issues surrounding emotions and social media. We developed an annotation guide for three languages, identified 365 Polish and 188 Lithuanian sociopolitical entities, and built a consensus-annotated corpus of over 3,000 Polish and over 1,500 Lithuanian Facebook posts coded for emotional content, primary topic, post shares, and more. This corpus contains the data, used in papers we hope to publish, that we intend to make shareable.
More detail can be obtained by reading the methodology description and by contacting the study PI, Susannah Paletz, at [email protected]. Office of Naval Research / Minerva Research Initiative Grant number N00014-19-1-2506; Program Manager Dr. Rebecca Goolsb