156 research outputs found

    WikiLinkGraphs: A Complete, Longitudinal and Multi-Language Dataset of the Wikipedia Link Networks

    Wikipedia articles contain multiple links connecting a subject to other pages of the encyclopedia. In Wikipedia parlance, these links are called internal links or wikilinks. We present a complete dataset of the network of internal Wikipedia links for the 9 largest language editions. The dataset contains yearly snapshots of the network and spans 17 years, from the creation of Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on the complete hyperlink graph, which also includes links automatically generated by templates, we parsed each revision of each article to track links appearing in the main text. In this way we obtained a cleaner network, discarding more than half of the links and representing all and only the links intentionally added by editors. We describe in detail how the Wikipedia dumps have been processed and the challenges we have encountered, including the need to handle special pages such as redirects, i.e., alternative article titles. We present descriptive statistics of several snapshots of this network. Finally, we propose several research opportunities that can be explored using this new dataset. Comment: 10 pages, 3 figures, 7 tables, LaTeX. Final camera-ready version accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM 2019) - Munich (Germany), 11-14 June 2019
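A minimal sketch of the idea described in the abstract above, keeping only links written in the main text and ignoring those that appear inside templates. This is not the authors' pipeline: the function names and the regular expressions are hypothetical, and a real parser would also need to handle redirects, `<nowiki>` sections, namespaces, and other wikitext edge cases.

```python
import re

# Hypothetical helper names; this is NOT the WikiLinkGraphs pipeline.
# A template without nested braces, e.g. {{Infobox|capital=[[Paris]]}}
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")
# A wikilink [[Target]], [[Target#Section]] or [[Target|label]];
# group 1 captures the target title only.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def strip_templates(wikitext):
    """Remove {{...}} templates, innermost first, until none remain."""
    previous = None
    while previous != wikitext:
        previous = wikitext
        wikitext = TEMPLATE.sub("", wikitext)
    return wikitext


def extract_wikilinks(wikitext):
    """Return the target titles of [[...]] links found outside templates."""
    body = strip_templates(wikitext)
    return [match.group(1).strip() for match in WIKILINK.finditer(body)]


sample = (
    "The [[Rome|Eternal City]] is in [[Italy]]. "
    "{{Infobox|capital=[[Paris]]}} See [[Europe#History]]."
)
print(extract_wikilinks(sample))
```

The link to Paris is dropped because it sits inside a template, which mirrors the distinction the abstract draws between editor-added and template-generated links.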

    Development and Preliminary Validation of an Electromyography-Scoring Protocol for the Assessment and Grading of Muscle Involvement in Patients With Juvenile Idiopathic Inflammatory Myopathies.

    Introduction: We performed a pilot study to investigate the feasibility of an electromyography (EMG)-scoring protocol for the assessment of disease activity in juvenile idiopathic inflammatory myopathies (JIIM). Methods: Children with JIIM followed up in a tertiary-level care center underwent standardized clinical, laboratory, and EMG assessment. An EMG-scoring protocol was devised by a consensus panel including a pediatric neurophysiologist and two pediatric rheumatologists, based on a combined score obtained as the sum of (1) the presence of denervation signs (fibrillation potentials) and (2) motor unit remodeling (mixed pattern of short- and long-duration motor unit action potentials). The EMG-scoring protocol was then validated following the Outcome Measures in Rheumatoid Arthritis Clinical Trials filter for outcome measures in rheumatology and the consensus-based standards for the selection of health measurement instruments methodology. Results: Thirteen children (77% females) were included in the study, with a median age of 10 years (interquartile range: 7-17 years) and median disease duration of 11.8 months (interquartile range: 2.1-44.5). A total of 39 EMG examinations were evaluated. A strong positive association between a standardized tool for muscle strength assessment and the combined score was observed. No significant associations were found with either creatine kinase or erythrocyte sedimentation rate levels. Discussion: Our EMG-scoring protocol is the first standardized and reproducible tool for the neurophysiologic evaluation and grading of muscle involvement in patients with JIIM and could provide relevant additional information in the assessment and follow-up of these rare conditions.

    Cone-Beam Computed Tomographic Assessment of the Mandibular Condylar Volume in Different Skeletal Patterns: A Retrospective Study in Adult Patients

    The aim of this study was to assess the condylar volume in adult patients with different skeletal classes and vertical patterns using cone-beam computed tomography (CBCT). CBCT scans of 146 condyles from 73 patients (mean age 30 ± 12 years old; 49 female, 24 male) were selected from the archive of the Department of Dentistry and Maxillofacial Surgery of Fondazione IRCCS Ca’ Granda, Milan, Italy, and retrospectively analyzed. The following inclusion criteria were used: adult patients; CBCT performed with the same protocol (0.4 mm slice thickness, 16 × 22 cm field of view, 20 s scan time); no systemic diseases; and no previous orthodontic treatments. Three-dimensional cephalometric tracings were performed for each patient, the mandibular condyles were segmented and the relevant volumes calculated using Mimics Materialize 20.0 software (Materialise, Leuven, Belgium). Right and left variables were analyzed together using random-intercept linear regression models. No significant association between condylar volumes and skeletal class was found. On the other hand, in relation to vertical patterns, the mean values of the mandibular condyle volumes in hyperdivergent subjects (688 mm³) with a post-rotation growth pattern (625 mm³) were smaller than in hypodivergent patients (812 mm³) with a horizontal growth pattern (900 mm³). Patients with an increased divergence angle had smaller condylar volumes than subjects with normal or decreased mandibular plane divergence. This relationship may help the clinician when planning orthodontic treatment.

    Geostatistical integration and uncertainty in pollutant concentration surface under preferential sampling

    This paper focuses on environmental statistics, with the aim of estimating the concentration surface and related uncertainty of an air pollutant. We used air quality data recorded by a network of monitoring stations within a Bayesian framework to overcome difficulties in accounting for prediction uncertainty and to integrate information provided by deterministic models based on emissions, meteorology, and chemico-physical characteristics of the atmosphere. Several authors have proposed such integration, but all the proposed approaches rely on representativeness and completeness of existing air pollution monitoring networks. We considered the situation in which the spatial process of interest and the sampling locations are not independent. This is known in the literature as the preferential sampling problem, which, if ignored in the analysis, can bias geostatistical inferences. We developed a Bayesian geostatistical model to account for preferential sampling, with the main interest in statistical integration and uncertainty. We used PM10 data arising from the air quality network of the Environmental Protection Agency of Lombardy Region (Italy) and numerical outputs from the deterministic model. We specified an inhomogeneous Poisson process for the sampling location intensities and a shared spatial random component model for the dependence between the spatial location of monitors and the pollution surface. We found greater predicted standard deviation differences in areas not properly covered by the air quality network. In conclusion, in this context inferences on prediction uncertainty may be misleading when geostatistical modelling does not take into account preferential sampling.
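For orientation, a shared-component preferential sampling model of the kind the abstract describes is often written in the following generic form. This is a sketch under assumptions, not the specification used in the paper: all symbols ($S$, $X$, $Y_i$, $\alpha$, $\beta$, $\mu$, $\tau^2$, $\sigma^2$, $\rho$, $\phi$) are introduced here for illustration, and the way the deterministic model output enters the mean is not stated in the abstract.

```latex
\begin{align}
  S &\sim \mathrm{GP}\bigl(0,\ \sigma^2 \rho(\cdot\,;\phi)\bigr), \\
  X \mid S &\sim \mathrm{PoissonProcess}\bigl(\lambda(x)\bigr),
    \qquad \lambda(x) = \exp\{\alpha + \beta\, S(x)\}, \\
  Y_i \mid S, X &\sim \mathcal{N}\bigl(\mu + S(x_i),\ \tau^2\bigr),
    \qquad i = 1, \dots, n.
\end{align}
```

Here $S$ is the latent spatial field shared by the monitor locations $X$ and the measurements $Y_i$. Setting $\beta = 0$ makes the locations independent of the field, which recovers the usual non-preferential geostatistical model.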

    Phase I Metabolic Genes and Risk of Lung Cancer: Multiple Polymorphisms and mRNA Expression

    Polymorphisms in genes coding for enzymes that activate tobacco lung carcinogens may generate inter-individual differences in lung cancer risk. Previous studies had limited sample sizes, poor exposure characterization, and a few single nucleotide polymorphisms (SNPs) tested in candidate genes. We analyzed 25 SNPs (some previously untested) in 2101 primary lung cancer cases and 2120 population controls from the Environment And Genetics in Lung cancer Etiology (EAGLE) study from six phase I metabolic genes, including cytochrome P450s, microsomal epoxide hydrolase, and myeloperoxidase. We evaluated the main genotype effects and genotype-smoking interactions in lung cancer risk overall and in the major histology subtypes. We tested the combined effect of multiple SNPs on lung cancer risk and on gene expression. Findings were prioritized based on significance thresholds and consistency across different analyses, and accounted for multiple testing and prior knowledge. Two haplotypes in EPHX1 were significantly associated with lung cancer risk in the overall population. In addition, CYP1B1 and CYP2A6 polymorphisms were inversely associated with adenocarcinoma and squamous cell carcinoma risk, respectively. Moreover, the association between CYP1A1 rs2606345 genotype and lung cancer was significantly modified by intensity of cigarette smoking, suggesting an underlying dose-response mechanism. Finally, increasing number of variants at CYP1A1/A2 genes revealed significant protection in never smokers and risk in ever smokers. Results were supported by differential gene expression in non-tumor lung tissue samples with down-regulation of CYP1A1 in never smokers and up-regulation in smokers from CYP1A1/A2 SNPs. The significant haplotype associations emphasize that the effect of multiple SNPs may be important despite null single SNP-associations, and warrant consideration in genome-wide association studies (GWAS). Our findings emphasize the necessity of post-GWAS fine mapping and SNP functional assessment to further elucidate cancer risk associations.

    Gene Expression Signature of Cigarette Smoking and Its Role in Lung Adenocarcinoma Development and Survival

    Tobacco smoking is responsible for over 90% of lung cancer cases, and yet the precise molecular alterations induced by smoking in lung that develop into cancer and impact survival have remained obscure. We performed gene expression analysis using HG-U133A Affymetrix chips on 135 fresh frozen tissue samples of adenocarcinoma and paired noninvolved lung tissue from current, former and never smokers, with biochemically validated smoking information. ANOVA analysis adjusted for potential confounders, multiple testing procedure, Gene Set Enrichment Analysis, and GO-functional classification were conducted for gene selection. Results were confirmed in independent adenocarcinoma and non-tumor tissues from two studies. We identified a gene expression signature characteristic of smoking that includes cell cycle genes, particularly those involved in the mitotic spindle formation (e.g., NEK2, TTK, PRC1). Expression of these genes strongly differentiated both smokers from non-smokers in lung tumors and early stage tumor tissue from non-tumor tissue (p<0.001 and fold-change >1.5, for each comparison), consistent with an important role for this pathway in lung carcinogenesis induced by smoking. These changes persisted many years after smoking cessation. NEK2 (p<0.001) and TTK (p = 0.002) expression in the noninvolved lung tissue was also associated with a 3-fold increased risk of mortality from lung adenocarcinoma in smokers. Our work provides insight into the smoking-related mechanisms of lung neoplasia, and shows that the very mitotic genes known to be involved in cancer development are induced by smoking and affect survival. These genes are candidate targets for chemoprevention and treatment of lung cancer in smokers.

    Cancer incidence in the population exposed to dioxin after the "Seveso accident": twenty years of follow-up

    Background: The Seveso, Italy accident in 1976 caused the contamination of a large population by 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). Possible long-term effects have been examined through mortality and cancer incidence studies. We have updated the cancer incidence study which now covers the period 1977-96. Methods: The study population includes subjects resident at the time of the accident in three contaminated zones with decreasing TCDD soil levels (zone A, very high; zone B, high; zone R, low) and in a surrounding non-contaminated reference territory. Gender-, age-, and period-adjusted rate ratios (RR) and 95% confidence intervals (95% CI) were calculated by using Poisson regression for subjects aged 0-74 years. Results: All cancer incidence did not differ from expectations in any of the contaminated zones. An excess of lymphatic and hematopoietic tissue neoplasms was observed in zones A (four cases; RR, 1.39; 95% CI, 0.52-3.71) and B (29 cases; RR, 1.56; 95% CI, 1.07-2.27) consistent with the findings of the concurrent mortality study. An increased risk of breast cancer was detected in zone A females after 15 years since the accident (five cases, RR, 2.57; 95% CI, 1.07-6.20). No cases of soft tissue sarcomas occurred in the most exposed zones (A and B, 1.17 expected). No cancer cases were observed among subjects diagnosed with chloracne early after the accident. Conclusion: The extension of the Seveso cancer incidence study confirmed an excess risk of lymphatic and hematopoietic tissue neoplasms in the most exposed zones. No clear pattern by time since the accident and zones was evident partly because of the low number of cases. The elevated risk of breast cancer in zone A females after 15 years since the accident deserves further and thorough investigation. The follow-up is continuing in order to cover the long time period (even decades) usually elapsing from exposure to carcinogenic chemicals and disease occurrence.
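As a reading aid for the rate ratios and 95% confidence intervals reported above, the sketch below shows how such an interval relates to a standard error on the log scale. It assumes a Wald-type interval, exp(log RR ± z·SE), which is the usual output of Poisson regression software but is not confirmed by the abstract. The function names are hypothetical, and the numbers are the zone B figures for lymphatic and hematopoietic neoplasms quoted in the abstract.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Back out the standard error of log(RR) from a reported Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def wald_ci(rate_ratio, se_log, z=Z_95):
    """Wald interval for a rate ratio, built on the log scale (assumed form)."""
    centre = math.log(rate_ratio)
    return math.exp(centre - z * se_log), math.exp(centre + z * se_log)


# Zone B, lymphatic and hematopoietic neoplasms: RR 1.56, 95% CI 1.07-2.27
se_b = se_from_ci(1.07, 2.27)
low, high = wald_ci(1.56, se_b)
print(round(se_b, 3), round(low, 2), round(high, 2))
```

Under this assumption the interval is symmetric on the log scale, so the point estimate should sit close to the geometric mean of the two limits, which gives a quick consistency check on reported figures.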

    Environment And Genetics in Lung cancer Etiology (EAGLE) study: An integrative population-based case-control study of lung cancer

    Background: Lung cancer is the leading cause of cancer mortality worldwide. Tobacco smoking is its primary cause, and yet the precise molecular alterations induced by smoking in lung tissue that lead to lung cancer and impact survival have remained obscure. A new framework of research is needed to address the challenges offered by this complex disease. Methods/Design: We designed a large population-based case-control study that combines a traditional molecular epidemiology design with a more integrative approach to investigate the dynamic process that begins with smoking initiation, proceeds through dependency/smoking persistence, continues with lung cancer development and ends with progression to disseminated disease or response to therapy and survival. The study allows the integration of data from multiple sources in the same subjects (risk factors, germline variation, genomic alterations in tumors, and clinical endpoints) to tackle the disease etiology from different angles. Before beginning the study, we conducted a phone survey and pilot investigations to identify the best approach to ensure an acceptable participation in the study from cases and controls. Between 2002 and 2005, we enrolled 2101 incident primary lung cancer cases and 2120 population controls, with 86.6% and 72.4% participation rate, respectively, from a catchment area including 216 municipalities in the Lombardy region of Italy. Lung cancer cases were enrolled in 13 hospitals and population controls were randomly sampled from the area to match the cases by age, gender and residence. Detailed epidemiological information and biospecimens were collected from each participant, and clinical data and tissue specimens from the cases. Collection of follow-up data on treatment and survival is ongoing. 
Discussion: EAGLE is a new population-based case-control study that explores the full spectrum of lung cancer etiology, from smoking addiction to lung cancer outcome, through examination of epidemiological, molecular, and clinical data. We have provided a detailed description of the study design, field activities, management, and opportunities for research following this integrative approach, which allows a sharper and more comprehensive vision of the complex nature of this disease. The study is poised to accelerate the emergence of new preventive and therapeutic strategies with potentially enormous impact on public health