7 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed
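One of the baseline methods named in the abstract above, Okapi Best Matching 25 (BM25), can be illustrated with a small sketch. This is not the consortium's implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the toy token lists are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per tokenized document for a tokenized query.

    k1 and b are common default choices, not values taken from the abstract.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Smoothed IDF (the "+1 inside the log" variant, always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical seed title and candidate titles, already tokenized.
seed = ["gene", "expression", "cancer"]
candidates = [
    ["gene", "expression", "tumor"],
    ["water", "treatment", "coagulant"],
    ["cancer", "gene", "therapy", "gene"],
]
scores = bm25_scores(seed, candidates)
ranking = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

A candidate that shares no term with the seed scores zero, so it falls to the bottom of `ranking`; a real system would tokenize titles and abstracts first and would likely tune `k1` and `b`.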

    Investigation of new natural coagulant - cationic hemicellulose associated with cationic tannin - for coagulation/dissolved air flotation (C/DAF) in the treatment of industrial effluent

    No full text
    The use of plant-based coagulants (natural coagulants) in wastewater treatment has potential advantages over the inorganic coagulants used commercially. This study evaluated an organic coagulant, cationic hemicellulose (CH) synthesized from peanut shell, associated with commercial cationic tannin (TSG) for use as the primary coagulation/flocculation treatment, followed by solid-liquid separation via sedimentation or dissolved air flotation (DAF). The assay was carried out in a jar test on effluent from a multinational industry in the grain processing sector, located in the city of Uberlândia-MG. Coagulation diagrams were built by spatial interpolation of the data with the Kriging regression model, and the Tukey test was used to assess differences between the results obtained. In sedimentation, turbidity removal efficiency (TRE) greater than 98% was achieved for the TSG/CH association at 200 mg L⁻¹ (pH 10.72), 350 mg L⁻¹ (pH 9.72) and 500 mg L⁻¹ (pH 9.56). For separation by DAF, the TSG/CH association resulted in TRE values greater than 95% at dosages of 350 mg L⁻¹ (pH 9.59) and 500 mg L⁻¹ (pH 7.92). Furthermore, the results indicate that the associated use of TSG/CH, a coagulation aid, favored bubble-particle attachment in DAF, resulting in a smaller volume of sludge. In addition, CH extended the action of TSG into the basic pH region.

    Prevalence and Associated Factors of Alcohol Consumption and Smoking among Medical Students in Northeastern Brazil

    No full text
    Introduction: Tobacco and alcohol consumption is considered a major cause of diseases and disorders worldwide. In Brazil, consumption of these drugs has increased among young people, especially university students. Objective: To determine the prevalence of, and factors associated with, smoking and alcohol consumption among medical students, as well as their level of knowledge about smoking cessation techniques at different stages of their academic life. Methods: Analytical prevalence study among medical students in Fortaleza, Ceará, Brazil. The study sample included all the city’s medical schools and their first year (S1/S2) and fourth year (S7/S8) students and students in the final year of their internship (I3/I4). The sample was calculated considering an expected smoker frequency of 10%, with a 3% margin of error, estimating 726 students in the four institutions. A structured questionnaire containing 46 questions was administered. Data were analyzed using Stata 11.2 software. Results: A total of 1,035 students were interviewed, distributed proportionally across the three periods: 392 (37.87%) from the first year (S1/S2), 319 (30.82%) from the fourth year (S7/S8) and 324 (31.30%) interns (I3/I4). Of these, 553 (53.4%) were female; most of the students were single (993; 96.3%), born in Fortaleza (748; 72.4%), living with their parents (896; 86.8%) and with a household income of more than 10 minimum wages (652; 61.8%). In total, 533 (51.5%) were students at private institutions. Of the total, 254 (24.6%) had smoked. This consumption was significantly higher among males (p = 0.025), with no difference in relation to marital status (p = 0.247) or household income (p = 0.191). All the students who reported having tried any tobacco derivative also reported using alcohol in their lifetime (p < 0.000).
Alcohol consumption was reported by more than 80% of the students, and was higher among those whose family income was more than nine times the minimum wage (p = 0.001). Alcoholic intoxication was reported by over 70% of the students – where this had occurred before the age of 18 years. Beer and vodka were the most consumed beverages. Only 39.5% said they were inclined to advise a patient to avoid alcoholic beverages and only 28.4% had received training on the subject at their university. Conclusion: The prevalence of alcohol consumption is very high among medical students, especially among those who reported smoking. These issues are addressed only in a rudimentary manner in their training. We must strengthen these aspects in the training of future health professionals.

    Evaluation of the Relationships between Simple Anthropometric Measures and Bioelectrical Impedance Assessment Variables with Multivariate Linear Regression Models to Estimate Body Composition and Fat Distribution in Adults: Preliminary Results

    No full text
    Background: Overweight and obesity are conditions associated with sedentary lifestyle and accumulation of abdominal fat, leading to increased mortality, favoring chronic diseases, and increasing cardiovascular risk. Although the evaluation of body composition and fat distribution is highly relevant, the high cost of the gold standard techniques limits their wide utilization. Therefore, the aim of this work was to explore the relationships between simple anthropometric measures and BIA variables using multivariate linear regression models to estimate body composition and fat distribution in adults. Methods: In this cross-sectional study, sixty-eight adult individuals (20 males and 48 females) were subjected to bioelectrical impedance analysis (BIA) and anthropometric measurements (waist circumference (WC), neck circumference (NC), mid-arm circumference (MAC)), allowing the calculation of conicity index (C-index), fat mass/fat-free mass (FM/FFM) ratios, body mass index (BMI) and body shape index (ABSI). Statistical analyses were performed in R. Nonparametric statistical tests were applied to compare the characteristics of participants across the groups (normal weight, overweight and obese). For qualitative variables, Fisher's exact test was applied, and for quantitative variables, the paired Wilcoxon signed-rank test. To evaluate the linear association between each pair of variables, the Pearson correlation coefficient was calculated, and multivariate linear regression models were fitted using the stepwise variable selection method, with the Akaike Information Criterion (p ≤ 0.05). Results: BIA variables with the highest correlations with anthropometric measures were total body water (TBW), body fat percentage (BFP), FM, FFM and FM/FFM. The multiple linear regression analysis showed, in general, that the same variables can be estimated through simple anthropometric measures.
Conclusions: The assessment of fat distribution in the body is desirable for the diagnosis and definition of obesity severity. However, the high cost of the instruments used to assess it (dual energy X-ray absorptiometry, hydrostatic weighing, air displacement plethysmography, computed tomography, magnetic resonance) favors the use of BMI in clinical practice. Nevertheless, BMI does not reflect actual fat distribution or body fat percentage. This highlights the relevance of the findings of the current study, since simple anthropometric variables can be used to estimate important BIA variables that are related to fat distribution and body composition.
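The anthropometric indices named in the abstract above (BMI, C-index, ABSI) can be computed from weight, height and waist circumference. The abstract gives no formulas, so the sketch below assumes the commonly published definitions, with lengths in metres and weight in kilograms; the function names and the example values are hypothetical and are not data from the study.

```python
import math


def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared (kg/m^2)."""
    return weight_kg / height_m ** 2


def conicity_index(waist_m, weight_kg, height_m):
    """Conicity index, assuming the usual form WC / (0.109 * sqrt(weight / height))."""
    return waist_m / (0.109 * math.sqrt(weight_kg / height_m))


def absi(waist_m, weight_kg, height_m):
    """A body shape index, assuming the usual form WC / (BMI^(2/3) * height^(1/2))."""
    return waist_m / (bmi(weight_kg, height_m) ** (2 / 3) * math.sqrt(height_m))


# Hypothetical adult: 70 kg, 1.75 m tall, 0.85 m waist circumference.
example_bmi = bmi(70.0, 1.75)
example_c_index = conicity_index(0.85, 70.0, 1.75)
example_absi = absi(0.85, 70.0, 1.75)
```

Indices like these would be the predictors in the regression models the abstract describes; the fitted coefficients themselves are not reported in the abstract and are not reproduced here.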

    NEOTROPICAL ALIEN MAMMALS: a data set of occurrence and abundance of alien mammals in the Neotropics

    No full text
    Biological invasion is one of the main threats to native biodiversity. For a species to become invasive, it must be voluntarily or involuntarily introduced by humans into a nonnative habitat. Mammals were among the first taxa to be introduced worldwide for game, meat, and labor, yet the number of species introduced in the Neotropics remains unknown. In this data set, we make available occurrence and abundance data on mammal species that (1) transposed a geographical barrier and (2) were voluntarily or involuntarily introduced by humans into the Neotropics. Our data set is composed of 73,738 historical and current georeferenced records on alien mammal species, of which around 96% correspond to occurrence data on 77 species belonging to eight orders and 26 families. Data cover 26 continental countries in the Neotropics, ranging from Mexico and its frontier regions (southern Florida and coastal-central Florida in the southeast United States) to Argentina, Paraguay, Chile, and Uruguay, and the 13 countries of the Caribbean islands. Our data set also includes neotropical species (e.g., Callithrix sp., Myocastor coypus, Nasua nasua) considered alien in particular areas of the Neotropics. The most numerous species in terms of records are from Bos sp. (n = 37,782), Sus scrofa (n = 6,730), and Canis familiaris (n = 10,084); 17 species were represented by only one record (e.g., Syncerus caffer, Cervus timorensis, Cervus unicolor, Canis latrans). Primates have the highest number of species in the data set (n = 20 species), partly because of uncertainties regarding taxonomic identification of the genus Callithrix, which includes the species Callithrix aurita, Callithrix flaviceps, Callithrix geoffroyi, Callithrix jacchus, Callithrix kuhlii, Callithrix penicillata, and their hybrids. This unique data set will be a valuable source of information for invasion risk assessments, biodiversity redistribution and conservation-related research. There are no copyright restrictions.
Please cite this data paper when using the data in publications. We also request that researchers and teachers inform us of how they are using the data.

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    No full text

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    No full text
    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical science. © The Author(s) 2019. Published by Oxford University Press