10 research outputs found
SWEGRAM : A Web-Based Tool for Automatic Annotation and Analysis of Swedish Texts
We present SWEGRAM, a web-based tool for the automatic linguistic annotation and quantitative analysis of Swedish text, enabling researchers in the humanities and social sciences to annotate their own text and produce statistics on linguistic and other text-related features on the basis of this annotation. The tool allows users to upload one or several documents, which are automatically fed into a pipeline of tools for tokenization and sentence segmentation, spell checking, part-of-speech tagging and morpho-syntactic analysis as well as dependency parsing for syntactic annotation of sentences. The analyzer provides statistics on the number of tokens, words and sentences, the number of parts of speech (PoS), readability measures, the average length of various units, and frequency lists of tokens, lemmas, PoS, and spelling errors. SWEGRAM allows users to create their own corpus or compare texts on various linguistic levels. SWE-CLARI
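To make the kind of statistics named in the abstract concrete, here is a minimal Python sketch that counts sentences, tokens and words, computes two average lengths, and builds a word frequency list. It is not SWEGRAM's code or API: the function name `text_statistics`, the regex-based sentence splitting and tokenization, and the sample text are all assumptions made for illustration, and a real pipeline would use trained tools for these steps.

```python
import re
from collections import Counter


def text_statistics(text):
    """Return simple descriptive counts for a plain-text document.

    Hypothetical helper for illustration only; not part of SWEGRAM.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Naive tokenization: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Words are the tokens that contain at least one letter or digit.
    words = [t for t in tokens if re.search(r"\w", t)]
    # Case-insensitive frequency list of word forms.
    freq = Counter(w.lower() for w in words)
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        "words": len(words),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "top_words": freq.most_common(3),
    }


# Made-up Swedish sample text, used only to exercise the function.
sample = "Katten sover. Hunden sover inte. Katten jagar hunden!"
stats = text_statistics(sample)
print(stats)
```

The sketch distinguishes tokens (which include punctuation) from words, mirroring the separate token and word counts mentioned in the abstract. Lemma and PoS frequency lists would additionally need a lemmatizer and a tagger, which are left out here.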
SWEGRAM: Annotering och analys av svenska texter (SWEGRAM: Annotation and Analysis of Swedish Texts)
This document describes the tool SWEGRAM, which lets you carry out automatic annotation and linguistic analysis of Swedish and English texts, or create your own linguistically annotated text collection, a so-called corpus. We present the components of the tool and suggest how it can be used to carry out large-scale empirical linguistic analysis. swe-clari
Prevalence of human papillomavirus (HPV) in oesophageal squamous cell carcinoma in relation to anatomical site of the tumour.
Background: The prevalence and role of human papillomavirus (HPV) in the aetiology of oesophageal squamous cell carcinoma is uncertain. Based on the presence of HPV in the oral cavity and its causal association with squamous cell carcinoma of the oropharynx, we hypothesised that HPV is more strongly associated with proximal than distal oesophageal squamous cell carcinoma. Methods: A population-based study comparing HPV infection in relation to tumour site in patients diagnosed with oesophageal squamous cell carcinoma in Stockholm County in 1999-2006. Multiplex polymerase chain reaction (PCR) genotyping with Luminex was conducted on pre-treatment endoscopic biopsies to identify type-specific HPV. Carcinogenic activity of HPV was assessed by p16(INK4a) expression. Multivariable logistic regression was used to calculate odds ratios and 95% confidence intervals. Results: Among 204 patients, 20 (10%) had tumours harbouring HPV DNA, almost all (90%) of HPV high-risk type, mainly HPV16. Tumours containing HPV were not overrepresented in the upper compared to the middle or lower third of the oesophagus (odds ratio 0.6, 95% confidence interval 0.2-1.9). P16(INK4a) expression was similarly common (24% and 16%) in the HPV-positive and HPV-negative groups. Conclusion: This study found a limited presence of HPV in oesophageal squamous cell carcinoma of uncertain oncogenic relevance and did not demonstrate that HPV was more strongly associated with proximal than distal tumours.
Swe-Clarin : Language Resources and Technology for Digital Humanities
CLARIN is a European Research Infrastructure Consortium (ERIC), which aims at (a) making extensive language-based materials available as primary research data to the humanities and social sciences (HSS); and (b) offering state-of-the-art language technology (LT) as an e-research tool for this purpose, positioning CLARIN centrally in what is often referred to as the digital humanities (DH). The Swedish CLARIN node Swe-Clarin was established in 2015 with funding from the Swedish Research Council. In this paper, we describe the composition and activities of Swe-Clarin, which aim to meet the requirements of all HSS and other researchers whose research involves using text and speech as primary research data, and to spread awareness of what Swe-Clarin can offer these research communities. We focus on one of the central means for doing this: pilot projects conducted in collaboration between HSS researchers and Swe-Clarin, in which the two together formulate a research question that can only be addressed by working with large language-based materials. Four such pilot projects are described in more detail, illustrating research on rhetorical history, second-language acquisition, literature, and political science. A common thread in these projects is an aspiration to meet the challenge of conducting research on the basis of very large amounts of textual data in a consistent way without losing sight of the individual cases making up the mass of data, i.e., to be able to move between Moretti’s “distant” and “close reading” modes. While the pilot projects clearly make substantial contributions to DH, they also reveal some needs for more development, and in particular a need for document-level access to the text materials. As a consequence, work has now been initiated in Swe-Clarin to meet this need, so that Swe-Clarin together with HSS scholars investigating intricate research questions can take on the methodological challenges of big-data language-based digital humanities.
Risk of different sites for oesophageal squamous cell carcinoma when exposed to HPV, expressed as odds ratios (OR) with 95% confidence intervals (CI).
* No adjustments made. # Adjustments made for sex, age and tumour differentiation.
Characteristics of HPV positive versus HPV negative participants.
* p16 analysis was conducted on 130 (64%) out of 204 patients. Percentages not adding to 100% are due to missing data.
Differences between included and excluded patients.
* Nonparticipants include low DNA level (n = 51, 18%) and unable to collect the endoscopic biopsy (n = 26, 9%). Excluded participants include tumour misclassification (n = 3, 1%), tumour detected at autopsy (n = 11, 3%) and unavailable endoscopic material (n = 53, 15%). # Tumour location was similar in the participant and non-participant/excluded groups (p = 0.113, Fisher's exact test), except for more missing data in the non-participant/excluded group (p<0.001, Fisher's exact test).