44 research outputs found

    LexRank: Graph-based Lexical Centrality as Salience in Text Summarization

    We introduce a stochastic graph-based method for computing the relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques, including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.
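The thresholded variant described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain term-frequency cosine similarity (no IDF weighting), and the function names `cosine` and `lexrank` and the parameters `threshold` and `damping` are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, max_iters=200, tol=1e-10):
    """Score sentences by eigenvector centrality on a thresholded similarity graph.

    Assumption: raw term counts stand in for the similarity measure used in the paper.
    """
    bags = [Counter(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    n = len(bags)

    # Binary adjacency matrix: an edge wherever similarity reaches the threshold.
    adj = [[1.0 if cosine(bags[i], bags[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]

    # Row-normalise so each row is a probability distribution (random-walk step).
    for row in adj:
        total = sum(row)
        for j in range(n):
            row[j] = row[j] / total if total else 1.0 / n

    # Power iteration with a damping term, started from the uniform distribution.
    scores = [1.0 / n] * n
    for _ in range(max_iters):
        new = [(1.0 - damping) / n
               + damping * sum(scores[i] * adj[i][j] for i in range(n))
               for j in range(n)]
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


sentences = [
    "graph centrality ranks sentences",
    "graph methods scale well",
    "important sentences appear early",
]
scores = lexrank(sentences)
best = max(range(len(sentences)), key=lambda i: scores[i])
print(sentences[best])
```

In this toy input the first sentence shares a word with each of the other two, which share nothing with each other, so it has the highest degree in the graph and receives the highest score.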

    Multiclass Classification of Policy Documents with Large Language Models

    Classifying policy documents into policy issue topics has been a long-standing effort in the political science and communication disciplines. Efforts to automate text classification for social science research have so far achieved remarkable results, but there is still considerable room for progress. In this work, we test the prediction performance of an alternative strategy that requires much less human involvement than full manual coding. We use OpenAI's GPT 3.5 and GPT 4 models, which are pre-trained, instruction-tuned Large Language Models (LLMs), to classify congressional bills and congressional hearings into the Comparative Agendas Project's 21 major policy issue topics. We propose three use-case scenarios and estimate overall accuracies ranging from 58% to 83%, depending on the scenario and GPT model employed. The three scenarios aim at minimal, moderate, and major human interference, respectively. Overall, our results point towards the insufficiency of complete reliance on GPT with minimal human intervention, an increasing accuracy along with the human effort exerted, and a surprisingly high accuracy achieved in the most humanly demanding use-case. However, the superior use-case achieved 83% accuracy on the 65% of the data in which the two models agreed, suggesting that an approach similar to ours can be implemented relatively easily and allow for mostly automated coding of a majority of a given dataset. This could free up resources, allowing manual human coding of the remaining 35% of the data to achieve an overall higher level of accuracy while reducing costs significantly.
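The agreement-based workflow in this abstract can be sketched in a few lines. This is only an illustration of the arithmetic: the helper names `split_by_agreement` and `blended_accuracy`, the example labels, and above all the accuracy assumed for manual coding are hypothetical and do not come from the paper.

```python
def split_by_agreement(labels_a, labels_b):
    """Return (agreed, disputed) index lists for two models' label sequences."""
    agreed, disputed = [], []
    for i, (a, b) in enumerate(zip(labels_a, labels_b)):
        (agreed if a == b else disputed).append(i)
    return agreed, disputed


def blended_accuracy(agree_share, agree_accuracy, manual_accuracy):
    """Overall accuracy when agreed items keep the model label
    and the remaining items are coded by hand."""
    return agree_share * agree_accuracy + (1.0 - agree_share) * manual_accuracy


# Hypothetical topic labels from two models for five documents.
model_a = ["health", "energy", "defense", "health", "trade"]
model_b = ["health", "energy", "education", "health", "labor"]
agreed, disputed = split_by_agreement(model_a, model_b)
print(agreed, disputed)  # documents 0, 1, 3 agree; 2 and 4 go to manual coding

# Figures from the abstract: 83% accuracy on the 65% of data where models agree.
# ASSUMPTION: manual coding of the remaining 35% is treated as perfectly accurate.
print(round(blended_accuracy(0.65, 0.83, 1.0), 2))
```

Under that optimistic assumption about manual coding, the blended figure comes to roughly 0.89; a lower manual accuracy would reduce it proportionally.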

    The prevalence of nutritional anemia in pregnancy in an east Anatolian province, Turkey

    Background: Anemia is considered a severe public health problem by the World Health Organization when its prevalence is equal to or greater than 40% in the population. The purpose of this study was to determine the prevalence of anemia and its associated factors in pregnant women, and to determine the serum iron, folate and vitamin B12 status of anaemic pregnant women in Malatya province. Methods: This is a cross-sectional survey. A multi-stage stratified probability-proportional-to-size cluster sampling methodology was used. A total of 823 pregnant women from sixty clusters were studied. Women were administered a questionnaire on the subject and blood samples were drawn. Total blood count was performed within four hours, and serum iron, folate and vitamin B12 were studied after storing sera at -20 °C for six months. Results: Anemia prevalence was 27.1% (Hb < 11.0 g/dl). Having four or more living children (OR = 2.2), being in the third trimester (OR = 2.3) and having a low family income (OR = 1.6) were determined to be independent predictors of anemia in pregnancy. Anemia was also associated with soil eating (pica) in the univariate analysis (p < 0.05). Of the anaemic women, 50.0% had a transferrin saturation of less than 10%, indicating iron deficiency, 34.5% were deficient in vitamin B12 and 71.7% were deficient in folate. Most of the anemias were normocytic-normochromic (56.5%), indicating mixed anemia. Conclusions: In Malatya, anemia in pregnant women was a moderate public health problem. Coexisting iron, folate and B vitamin deficiencies were observed among anaemic women. Continuing anemia control strategies with reasonable care and diligence was recommended.

    Using graphs and random walks for discovering latent semantic relationships in text.

    We propose a graph-based representation of text collections where the nodes are textual units such as sentences or documents, and the edges represent the pairwise similarity function between these units. We show how random walks on such a graph can give us better approximations for the latent similarities between two natural language strings. We also derive algorithms based on random walk models to rank the nodes in a text similarity graph to address the text summarization problem in information retrieval. The similarity functions used in the graphs are intentionally chosen to be very simple and language-independent to make our methods as generic as possible, and to show that significant improvements can be achieved even by starting with such similarity functions. We put special emphasis on language modeling-based similarity functions since we use them for the first time on problems such as document clustering and classification, and get improved results compared to classical similarity functions such as cosine. Our graph-based methods are applicable to a diverse set of problems including generic and focused summarization, document clustering, and text classification. The text summarization system we have developed has ranked as one of the top systems in the Document Understanding Conferences over the past few years. In document clustering and classification, using language modeling functions performs consistently better than using the classical cosine measure, reaching as high as a 25% improvement in accuracy. Random walks on the similarity graph achieve additional significant improvements on top of this. We also revisit the nearest neighbor text classification methods and derive semi-supervised versions by using random walks that rival state-of-the-art classification algorithms such as Support Vector Machines.
    Ph.D., Applied Sciences, Computer Science, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/126680/2/3276149.pd

    Investigation of sociodemographic and health characteristics of mothers in low birth weight newborns in Malatya city center

    The prerequisite for a healthy life is to be born healthy. Low birth weight (LBW) is an important risk factor for morbidity and mortality in the early or late period of life. Reducing the incidence of low birth weight therefore not only lowers infant mortality rates but also has multiple benefits over the life cycle. The purpose of this study is to determine whether mothers of LBW newborns differ from mothers of normal-weight newborns in socio-demographic and health characteristics. The study covers mothers who gave birth in the obstetrics and gynecology clinics of two hospitals in Malatya city center during March and June 2010. It is a case-control study with a cross-sectional time frame. A 45-item questionnaire was administered in face-to-face interviews to 350 randomly selected mothers of newborns. The 123 newborns weighing under 2500 grams formed the case group, and the 227 infants weighing 2500 grams or more served as controls. Data were analyzed in SPSS using the chi-square test for independent samples, with a 95% confidence interval and a significance level of p = 0.05. Of the mothers included in the study, 58.3% were aged 20 to 30. 28.0% of the mothers had a primary school education or less, and 28.9% had higher-level education. 57.4% of mothers were housewives and 26.9% of those had a monthly income below $550. 85.1% of mothers lived in urban areas and 14.9% in villages. At the end of this study; 5.4 times (95% CI = 2.2 [Med-Science 2013; 2(3.000): 665-78

    Atypical Metastasis to the Head and Neck Region: An Analysis of 11 Patients

    Objective: We present 11 patients with distant metastases to the head and neck from an infraclavicularly located primary tumor and discuss the management strategies including the clinical presentation, treatment modalities, and prognosis

    Risk, profit, or safety: Sociotechnical systems under stress

    Sociotechnical systems are designed to perform technical functions under organizational management for the benefit of society, but face major challenges in high risk operations such as mining. The mining industry in Turkey confronts a set of conflicting goals. Underground mining is a dangerous operation that creates continuing exposure to risk for miners who extract the coal. Yet, coal is an essential commodity for the growing Turkish economy, with mining operations now largely conducted by private companies seeking to maximize profit. Known strategies for managing mining operations to increase workers' safety exist and have been legally adopted in law and policy in Turkey, but require substantial investment of resources and time to put into practice. These same requirements in practice reduce profit to mining companies and slow production. The challenge is to balance these conflicting pressures in the mining industry to achieve low-cost energy for society, maintain safety for the miners, and ensure reasonable return on investment for mining companies. Achieving this balance in practice represents a classic collective action problem in which maximum benefit to the whole society can only be achieved by reasoned, informed action taken by multiple actors adapting to changing conditions under constraints of limited time and resources. These conflicting demands require a continual process of monitoring uncertain conditions, calibrating investment in safety in relation to cost of failure, and adapting to changing operating conditions in near-real time. We explore this set of conflicting pressures as a policy issue that confronts the mining industry globally, but inquire specifically into conditions that led to the deadly mine fire in Soma, Manisa, Turkey on May 13, 2014 as a study of a sociotechnical system under stress