38 research outputs found

    Segmenting Chinese Unknown Words by Heuristic Method

    Abstract. Chinese text segmentation is important in Chinese text indexing. Because Chinese text lacks word delimiters, segmenting it is more difficult than segmenting English text. In addition, segmentation ambiguities and out-of-vocabulary words (i.e. unknown words) are the major challenges in Chinese segmentation. Most research on word segmentation has focused on resolving segmentation ambiguities, and the problem of unknown word identification has not drawn much attention. In this paper, we propose a heuristic method for Chinese text segmentation based on the statistical approach. The experimental results show that the proposed heuristic method is promising for segmenting unknown words as well as known words. We further investigate the distribution of the errors of commission and the errors of omission caused by the proposed heuristic method, and benchmark it against our previously proposed technique, boundary detection.

    Intelligent cities? Disentangling the symbolic and material effects of technopole planning practices in Cyberjaya, Malaysia.

    Cyberjaya was heralded in the mid-1990s as the Multimedia Super Corridor's (MSC) flagship 'intelligent city' and designed to prepare Malaysia and its citizens for a giant leap forward into an imagined new 'information age'. The urban mega-project constituted a state-led response to the much-hyped 'Siliconisation of Asia' and was planned to fast-track national development through investment in information and communications technologies (ICTs). The thesis seeks to examine how the discursive architectures of the 'information society' were mobilised, by whom, and with what material consequences as technopole planning practices were inscribed on the Malaysian landscape. Ten years on from the excessive high-tech utopianism and urban boosterism that accompanied the city's launch, the thesis promotes qualitative methodologies to examine the critical human geographies of the MSC. Specifically, empirical analysis addresses the uneven socio-spatial consequences and 'splintering urbanisms' manifesting in Malaysia's emerging spaces of neoliberal modernity. Research methodologies included in-depth interviews with political and business elites in Malaysia, participant observation with residents and workers in Cyberjaya, and a critical discourse analysis of the MSC policy and promotional materials. To this end, the thesis seeks to disentangle the symbolic and material effects of technopole planning practices in Cyberjaya.

    Internet Monitor 2014: Reflections on the Digital World: Platforms, Policy, Privacy, and Public Discourse

    This publication is the second annual report of the Internet Monitor project at the Berkman Center for Internet & Society at Harvard University. As with the inaugural report, this year's edition is a collaborative effort of the extended Berkman community. Internet Monitor 2014: Reflections on the Digital World includes nearly three dozen contributions from friends and colleagues around the world that highlight and discuss some of the most compelling events and trends in the digitally networked environment over the past year. The result, intended for a general interest audience, brings together reflection and analysis on a broad range of issues and regions (from an examination of Europe's "right to be forgotten" to a review of the current state of mobile security to an exploration of a new wave of movements attempting to counter hate speech online) and offers it up for debate and discussion. Our goal remains not to provide a definitive assessment of the "state of the Internet" but rather to provide a rich compendium of commentary on the year's developments with respect to the online space. Last year's report examined the dynamics of Internet controls and online activity through the actions of government, corporations, and civil society. We focus this year on the interplay between technological platforms and policy; growing tensions between protecting personal privacy and using big data for social good; the implications of digital communications tools for public discourse and collective action; and current debates around the future of Internet governance. The report reflects the diversity of ideas and input the Internet Monitor project seeks to invite. Some of the contributions are descriptive; others prescriptive. Some contain purely factual observations; others offer personal opinion. 
In addition to those in traditional essay format, contributions this year include a speculative fiction story exploring what our increasingly data-driven world might bring, a selection of "visual thinking" illustrations that accompany a number of essays, a "Year in Review" timeline that highlights many of the year's most fascinating Internet-related news stories (an interactive version of which is available at thenetmonitor.org), and a slightly tongue-in-cheek "By the Numbers" section that offers a look at the year's important digital statistics. We believe that each contribution offers insights, and hope they provoke further reflection, conversation, and debate in both offline and online settings around the globe.

    A natural language based indexing technique for Chinese information retrieval.

    Pang Chun Kiu.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 1997.
    Includes bibliographical references (leaves 101-107).

    Chapter 1 --- Introduction
    Chapter 1.1 --- Chinese Indexing using Noun Phrases
    Chapter 1.2 --- Objectives
    Chapter 1.3 --- An Overview of the Thesis
    Chapter 2 --- Background
    Chapter 2.1 --- Technology Influences on Information Retrieval
    Chapter 2.2 --- Related Work
    Chapter 2.2.1 --- Statistical/Keyword Approaches
    Chapter 2.2.2 --- Syntactical approaches
    Chapter 2.2.3 --- Semantic approaches
    Chapter 2.2.4 --- Noun Phrases Approach
    Chapter 2.2.5 --- Chinese Information Retrieval
    Chapter 2.3 --- Our Approach
    Chapter 3 --- Chinese Noun Phrases
    Chapter 3.1 --- Different types of Chinese Noun Phrases
    Chapter 3.2 --- Ambiguous noun phrases
    Chapter 3.2.1 --- Ambiguous English Noun Phrases
    Chapter 3.2.2 --- Ambiguous Chinese Noun Phrases
    Chapter 3.2.3 --- Statistical data on the three NPs
    Chapter 4 --- Index Extraction from De-de Conj. NP
    Chapter 4.1 --- Word Segmentation
    Chapter 4.2 --- Part-of-speech tagging
    Chapter 4.3 --- Noun Phrase Extraction
    Chapter 4.4 --- The Chinese noun phrase partial parser
    Chapter 4.5 --- Handling Parsing Ambiguity
    Chapter 4.6 --- Index Building Strategy
    Chapter 4.7 --- The cross-set generation rules
    Chapter 4.8 --- Example 1: Indexing De-de NP
    Chapter 4.9 --- Example 2: Indexing Conjunctive NP
    Chapter 4.10 --- Experimental results and Discussion
    Chapter 5 --- Indexing Compound Nouns
    Chapter 5.1 --- Previous Researches on Compound Nouns
    Chapter 5.2 --- Indexing two-term Compound Nouns
    Chapter 5.2.1 --- About the thesaurus 《同義詞詞林》
    Chapter 5.3 --- Indexing Compound Nouns of three or more terms
    Chapter 5.4 --- Corpus learning approach
    Chapter 5.4.1 --- An Example
    Chapter 5.4.2 --- Experimental Setup
    Chapter 5.4.3 --- An Experiment using the third level of the Cilin
    Chapter 5.4.4 --- An Experiment using the second level of the Cilin
    Chapter 5.5 --- Contextual Approach
    Chapter 5.5.1 --- The algorithm
    Chapter 5.5.2 --- An Illustrative Example
    Chapter 5.5.3 --- Experiments on compound nouns
    Chapter 5.5.4 --- Experiment I: Word Distance Based Extraction
    Chapter 5.5.5 --- Experiment II: Semantic Class Based Extraction
    Chapter 5.5.6 --- Experiments III: On different boundaries
    Chapter 5.5.7 --- The Final Algorithm
    Chapter 5.5.8 --- Experiments on other compounds
    Chapter 5.5.9 --- Discussion
    Chapter 6 --- Overall Effectiveness
    Chapter 6.1 --- Illustrative Example for the Integrated Algorithm
    Chapter 6.2 --- Experimental Setup
    Chapter 6.3 --- Experimental Results & Discussion
    Chapter 7 --- Conclusion
    Chapter 7.1 --- Summary
    Chapter 7.2 --- Contributions
    Chapter 7.3 --- Future Directions
    Chapter 7.3.1 --- Word-sense determination
    Chapter 7.3.2 --- Hybrid approach for compound noun indexing
    Chapter A --- Cross-set Generation Rules
    Chapter B --- Tag set by Tsinghua University
    Chapter C --- Noun Phrases Test Set
    Chapter D --- Compound Nouns Test Set
    Chapter D.1 --- Three-term Compound Nouns
    Chapter D.1.1 --- NVN
    Chapter D.1.2 --- Other three-term compound nouns
    Chapter D.2 --- Four-term Compound Nouns
    Chapter D.3 --- Five-term and six-term Compound Nouns

    The World We Want to Live In

    Digitalisation, digital networks, and artificial intelligence are fundamentally changing our lives! We must understand the various developments and assess how they interact and how they affect our regular, analogue lives. What are the consequences of such changes for me personally and for our society? Digital networks and artificial intelligence are seminal innovations that are going to permeate all areas of society and trigger a comprehensive, disruptive structural change that will evoke numerous new advances in research and development in the coming years. Even though there are numerous books on this subject matter, most of them cover only specific aspects of the profound and multifaceted effects of the digital transformation. An overarching assessment is missing. In 2016, the Federation of German Scientists (VDW) founded a study group to assess the technological impacts of digitalisation holistically. Now we present this compendium to you. We address the interrelations and feedbacks of digital innovation on policy, law, economics, science, and society from various scientific perspectives. Please consider this book as an invitation to contemplate, with other people and with us, what kind of world we want to live in.

    Reputational Privacy and the Internet: A Matter for Law?

    Reputation - we all have one. We do not completely comprehend its workings and are mostly unaware of its import until it is gone. When we lose it, our traditional laws of defamation, privacy, and breach of confidence rarely deliver the vindication and respite we seek due, primarily, to legal systems that cobble new media methods of personal injury onto pre-Internet laws. This dissertation conducts an exploratory study of the relevance of law to loss of individual reputation perpetuated on the Internet. It deals with three interrelated concepts: reputation, privacy, and memory. They are related in that the increasing lack of privacy involved in our online activities has had particularly powerful reputational effects, heightened by the Internet’s duplicative memory. The study is framed within three research questions: 1) how well do existing legal mechanisms address loss of reputation and informational privacy in the new media environment; 2) can new legal or extra-legal solutions fill any gaps; and 3) how is the role of law pertaining to reputation affected by the human-computer interoperability emerging as the Internet of Things? Through a review of international and domestic legislation, case law, and policy initiatives, this dissertation explores the extent of control held by the individual over her reputational privacy. Two emerging regulatory models are studied for improvements they offer over current legal responses: the European Union’s General Data Protection Regulation, and American Do Not Track policies. Underscoring this inquiry are the challenges posed by the Internet’s unique architecture and the fact that the trove of references to reputation in international treaties is not making its way into domestic jurisprudence or daily life. 
This dissertation examines whether online communications might be developing a new form of digital speech requiring new legal responses and new gradients of personal harm; it also proposes extra-legal solutions to the paradox that our reputational needs demand an overt sociality while our desire for privacy has us shunning the limelight. As we embark on the Web 3.0 era of human-machine interoperability and the Internet of Things, our expectations of the role of law become increasingly important.