2,933 research outputs found

    Natural Language Processing and e-Government: Crime Information Extraction from Heterogeneous Data Sources

    Much information that could help solve and prevent crimes is never gathered because the reporting methods available to citizens and law enforcement personnel are not optimal. Detectives do not have sufficient time to interview crime victims and witnesses. Moreover, many victims and witnesses are too scared or embarrassed to report incidents. We are developing an interviewing system that will help collect such information. We report here on one component, the crime information extraction module, which uses natural language processing to extract crime information from police reports, newspaper articles, and victims’ and witnesses’ crime narratives. We tested our approach with two types of documents: police narrative reports and witness narrative reports. Our algorithms extract crime-related information, namely weapons, vehicles, time, people, clothes, and locations. We achieved high precision (96%) and recall (83%) for police narrative reports and comparable precision (93%) but somewhat lower recall (77%) for witness narrative reports. The difference in recall was significant at p < .05. We then used a spell checker to evaluate whether this would help with witness narrative processing. We found that both precision (94%) and recall (79%) improved slightly.
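The precision and recall figures reported above follow the standard definitions: precision is the share of extracted items that are correct, and recall is the share of annotated items that were found. A minimal sketch in Python, assuming each extracted item is a (category, text) pair that counts as correct only on an exact match with a hand-annotated gold set; the matching criterion and the example data are hypothetical and are not taken from the study.

```python
# Minimal sketch: precision and recall of extracted items against a gold annotation.
# Assumption: items are (category, text) pairs matched exactly; the data below is hypothetical.

def precision_recall(extracted: set, gold: set) -> tuple[float, float]:
    """Precision = correct / extracted; recall = correct / gold."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = {
    ("weapon", "knife"), ("vehicle", "blue sedan"), ("time", "10 pm"),
    ("people", "tall man"), ("clothes", "red hoodie"),
}
extracted = {
    ("weapon", "knife"), ("vehicle", "blue sedan"), ("time", "10 pm"),
    ("location", "parking lot"),  # not in the gold set: a false positive
}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Here one of four extracted items is wrong (precision 0.75) and two of five annotated items are missed (recall 0.60).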

    Seeking asylum in the digital era: social-media and mobile-device vetting in asylum procedures in five European countries

    The increasing use of social media and mobile devices by asylum seekers offers new vetting opportunities for immigration authorities, to verify identity or to assess national-security or 1F-exclusion aspects. Based on interviews with practitioners in Belgium, Germany, the Netherlands, Norway and Sweden, first experiences with both of these new methods appear to be mixed, while formal evaluations of the results seem to be lacking. We argue that the increasing reliance on these methods, in combination with the further advancement of technology, raises important questions about possible infringements on the right to private life, as well as the risk of function creep and social sorting. It is questionable to what extent the use of these new vetting tools and methods is proportional to the results they produce, and to what extent fundamental human rights, including privacy, are sufficiently safeguarded.

    Lost in the Infinite Archive: The Promise and Pitfalls of Web Archives

    The version of record can be found at http://www.euppublishing.com/doi/10.3366/ijhac.2016.0161. Contemporary and future historians need to grapple with and confront the challenges posed by web archives. These large collections of material, accessed either through the Internet Archive's Wayback Machine or through other computational methods, represent both a challenge and an opportunity to historians. Through these collections, we have the potential to access the voices of millions of non-elite individuals (recognizing, of course, the cleavages in both Web access and method of access). To put this in perspective, the Old Bailey Online currently describes its monumental holdings of 197,745 trials between 1674 and 1913 as the "largest body of texts detailing the lives of non-elite people ever published." GeoCities.com, a platform for everyday web publishing in the mid-to-late 1990s and early 2000s, amounted to over thirty-eight million individual webpages. Historians will have access, in some form, to millions of pages written by everyday people of various classes, genders, ethnicities, and ages. While the Web was not a perfect democracy by any means – it was and is unevenly accessed across each of those categories – this still represents a massive collection of non-elite speech. Yet a figure like thirty-eight million webpages is both a blessing and a curse. We cannot read every website, and must instead rely upon discovery tools to find the information that we need. Yet these tools largely do not exist for web archives, or are in a very early state of development: what will they look like? What information do historians want to access? We cannot simply map over web tools optimized for discovering current information through online searches or metadata analysis. We need to find information that mattered at the time, to diverse and very large communities. Furthermore, web pages cannot be viewed in isolation, outside of the networks that they inhabited. 
In theory, amongst corpora of millions of pages, researchers can find whatever they want to confirm. The trick is situating it within a larger social and cultural context: is it representative? Unique? In this paper, "Lost in the Infinite Archive," I explore what the future of digital methods for historians will be when they need to explore web archives. Historical research on periods beginning in the mid-1990s will need to use web archives, and right now we are not ready. This article draws on first-hand research with the Internet Archive and Archive-It web archiving teams. It draws upon three exhaustive datasets: the large Web ARChive (WARC) files that make up Wide Web Scrapes of the Web; the metadata-intensive WAT files that provide networked contextual information; and the lifted-straight-from-the-web guerrilla archives generated by groups like Archive Team. Through these case studies, we can see – hands-on – what richness and potential lie in these new cultural records, and what approaches we may need to adopt. It helps underscore the need to have humanists involved at this early, crucial stage. Social Sciences and Humanities Research Council || 430-2013-0616 Ontario Early Researcher Awar

    Happenstance: Utilizing Semantic Search to Track Russian State Media Narratives about the Russo-Ukrainian War On Reddit

    In the buildup to and in the weeks following the Russian Federation's invasion of Ukraine, Russian state media outlets published torrents of misleading and outright false information. In this work, we study this coordinated information campaign in order to understand the most prominent state media narratives touted by the Russian government to English-speaking audiences. To do this, we first perform sentence-level topic analysis using the language model MPNet on articles published by ten different pro-Russian propaganda websites, including the new Russian "fact-checking" website waronfakes.com. Within this ecosystem, we show that smaller websites like katehon.com were highly effective at publishing topics that were later echoed by other Russian sites. After analyzing this set of Russian information narratives, we then analyze their correspondence with narratives and topics of discussion on r/Russia and 10 other political subreddits. Using MPNet and a semantic search algorithm, we map these subreddits' comments to the set of topics extracted from our set of Russian websites, finding that 39.6% of r/Russia comments corresponded to narratives from pro-Russian propaganda websites, compared to 8.86% on r/politics. Comment: Accepted to ICWSM 202
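The comment-to-topic mapping described above can be illustrated as nearest-topic assignment by cosine similarity over sentence embeddings. A minimal Python sketch, assuming comments and topics have already been embedded as vectors (the study used MPNet; toy 3-dimensional vectors stand in here) and assuming a hypothetical similarity threshold of 0.8. The abstract does not give the authors' exact search algorithm or threshold, so this is an illustration of the general technique, not their method.

```python
import math

# Minimal sketch: assign each comment to its most similar topic by cosine similarity.
# Assumptions: embeddings are toy vectors standing in for sentence-encoder output;
# the 0.8 threshold is hypothetical, not taken from the study.

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_fraction(comment_vecs, topic_vecs, threshold=0.8):
    """Fraction of comments whose best-matching topic reaches the similarity threshold."""
    if not comment_vecs:
        return 0.0
    matched = sum(
        1 for c in comment_vecs
        if max(cosine(c, t) for t in topic_vecs) >= threshold
    )
    return matched / len(comment_vecs)

topics = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two hypothetical narrative topics
comments = [
    [0.9, 0.1, 0.0],     # close to topic 0
    [0.0, 0.0, 1.0],     # unrelated to both topics
    [0.1, 0.95, 0.05],   # close to topic 1
    [0.5, 0.5, 0.7],     # only weakly similar to either topic
]
print(match_fraction(comments, topics))  # 0.5
```

The share of matched comments is the kind of quantity behind a figure such as "39.6% of r/Russia comments"; in practice the result depends heavily on the encoder and the threshold chosen.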

    Restorative and Transformative Justice Responses to Sexual Violence

    Background: The #MeToo movement raised the profiles of restorative justice (RJ) and transformative justice (TJ) in the United States (US) as approaches to repairing harm resulting from sexual violence that center survivors’ needs and emphasize meaningful accountability for persons responsible for harm. This focus on RJ and TJ as viable approaches to sexual violence represents a departure from carceral interventions, which have dominated US public discourse for decades. Given this shift, mapping the current state of knowledge is critical for practice, policy and research. This scoping review aims to map the available literature to provide an overview of RJ and TJ as responses to sexual violence. Methods/Design: The proposed scoping review will be conducted in accordance with the Joanna Briggs Institute methodology for scoping reviews (Peters, Godfrey-Smith, & McInerney, 2017). The concept of interest is the use of RJ and TJ as responses to sexual violence. This scoping review will include both peer-reviewed and grey literature. We will employ a standardized extraction form and represent the data using a descriptive summary, charts and tables that align with the stated objectives. Discussion: Since the #MeToo movement emerged in 2017, public interest in RJ and TJ as meaningful responses to sexual violence has grown. This comprehensive scoping review will systematically organize the literature in order to understand the current landscape of evidence related to these approaches. Given the transformative potential of these interventions, past controversies, and current public interest in the approaches, understanding the current state of knowledge is critical for practice, policy and research.

    Audiencing Strategies and Student Collaboration in Digitally-mediated Genres of Writing in English

    This thesis presents an investigation into the experience of young ESL writers in higher education when composing three online genres: academic texts, diary texts, and blog texts. Central to this investigation is the authenticity of audience and directing texts to ‘real’ readers. Hence, technological tools are utilised to approximate the experience of writing for real readers. A qualitative case study was conducted over three months of an academic semester at an Omani Higher Education College. Seventeen students (5 male, 12 female) participated across two cases: 10 students in case 1 and 7 students in case 2. To attain an in-depth understanding of the cases, different data collection tools were deployed, including interviews, classroom observation, and a reflective diary recording student perceptions and experiences; in addition, three forms of written text were collected from the participating students: academic essays, diaries, and blogs. Thus the reflective diary was both a genre of writing and a data collection method. The study findings highlight that having only the teacher as an ‘audience’ restricted students’ attempts to focus on content; most of their attention went to shaping texts according to their perceptions of teacher-approved organisation and representation of text. Blogging, by contrast, provided an opportunity to think of a wider range of readers and therefore a greater tendency to author personally selected texts. Diary writing was mostly associated with the teacher as audience, though some writers enjoyed keeping a diary for personal use; the fact that these diary texts varied with these different understandings of audience lends further credence to claims about the role of real and assumed readers in shaping texts. The significance of the current study is that it offers practical and pedagogical insight for teaching ESL writing that exploits the affordances of technology in teaching process writing. 
It suggests that varying both audience and genres in classroom writing tasks can benefit student writers in terms of their understanding of audience, their shaping of text for an audience, and their increased investment in the content of what they write. It offers insights into problems and issues felt by young writers that are usually unknown to their teachers. Based on those insights, issues such as collaboration, process writing and grading are re-evaluated. Ministry of Higher Education (Oman