774 research outputs found
Semantic multimedia modelling & interpretation for annotation
The emergence of multimedia-enabled devices, particularly the incorporation of cameras in mobile phones, together with rapid advances in low-cost storage, has drastically boosted the rate of multimedia data production. Witnessing such ubiquity of digital images and videos, the research community has raised the issue of their effective utilization and management. Stored in monumental multimedia corpora, digital data need to be retrieved and organized intelligently, drawing on the rich semantics involved. The utilization of these image and video collections demands proficient image and video annotation and retrieval techniques. Recently, the multimedia research community has progressively shifted its emphasis to the personalization of these media. The main impediment in image and video analysis is the semantic gap: the discrepancy between a user's high-level interpretation of an image or video and its low-level computational interpretation. Content-based image and video annotation systems are particularly susceptible to the semantic gap because they rely on low-level visual features to describe semantically rich image and video content. Visual similarity, however, is not semantic similarity, so an alternative way through this dilemma is needed. The semantic gap can be narrowed by incorporating high-level and user-generated information into the annotation. High-level descriptions of images and videos are more capable of capturing the semantic meaning of multimedia content, but it is not always feasible to collect this information. It is commonly agreed that the problem of high-level semantic annotation of multimedia is still far from solved. This dissertation puts forward approaches for intelligent multimedia semantic extraction for high-level annotation, aiming to bridge the gap between visual features and semantics. It proposes a framework for annotation enhancement and refinement for object/concept-annotated image and video datasets. The overall theme is to first purify the datasets of noisy keywords and then expand the concepts lexically and commonsensically to fill the vocabulary and lexical gaps, thereby achieving high-level semantics for the corpus. The dissertation also explores a novel approach for propagating high-level semantics (HLS) through image corpora. HLS propagation takes advantage of semantic intensity (SI), the concept-dominance factor in an image, together with annotation-based semantic similarity between images. An image is a combination of various concepts, some of which are more dominant than others; the semantic similarity of a pair of images is therefore based on the SI and the semantic similarity of their concepts. Moreover, HLS propagation exploits clustering techniques to group similar images, so that a single effort by a human expert to assign a high-level semantic to a randomly selected image propagates to the other images in its cluster. The investigation has been made on the LabelMe image and LabelMe video datasets. Experiments show that the proposed approaches yield a noticeable improvement towards bridging the semantic gap and that the proposed system outperforms traditional systems
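To make the propagation idea concrete, here is a minimal Python sketch of SI-weighted image similarity and cluster-based label propagation. The toy images, the similarity threshold, the equality stand-in for concept similarity, and the use of networkx connected components as the clustering step are all illustrative assumptions, not the dissertation's actual method.

```python
from itertools import combinations
import networkx as nx

# Toy annotated images: concept -> semantic intensity (SI), the concept's
# dominance within the image, assumed normalised per image.
images = {
    "img1": {"car": 0.7, "road": 0.3},
    "img2": {"car": 0.6, "tree": 0.4},
    "img3": {"beach": 0.8, "sea": 0.2},
}

def concept_sim(c1: str, c2: str) -> float:
    """Stand-in for a lexical/commonsense concept similarity (e.g. WordNet-based)."""
    return 1.0 if c1 == c2 else 0.0

def image_sim(a: dict, b: dict) -> float:
    """SI-weighted semantic similarity between two annotated images."""
    return sum(sa * sb * concept_sim(ca, cb)
               for ca, sa in a.items() for cb, sb in b.items())

# Cluster by thresholding pairwise similarity and taking connected components.
g = nx.Graph()
g.add_nodes_from(images)
for (i, a), (j, b) in combinations(images.items(), 2):
    if image_sim(a, b) > 0.3:
        g.add_edge(i, j)

# A single expert-assigned high-level label per cluster propagates to all members.
expert_labels = {"img1": "urban traffic scene"}
hls = {}
for cluster in nx.connected_components(g):
    label = next((expert_labels[i] for i in cluster if i in expert_labels), None)
    for img in cluster:
        hls[img] = label
print(hls)  # img2 inherits img1's label; img3 remains unlabeled (None)
```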
Internet Filtering in China 2004-2005
China's Internet filtering regime is the most sophisticated effort of its kind in the world. Compared to similar efforts in other states, China's filtering regime is pervasive, sophisticated, and effective. It comprises multiple levels of legal regulation and technical control. It involves numerous state agencies and thousands of public and private personnel. It censors content transmitted through multiple methods, including Web pages, Web logs, on-line discussion forums, university bulletin board systems, and e-mail messages. Our testing found efforts to prevent access to a wide range of sensitive materials, from pornography to religious material to political dissent. We sought to determine the degree to which China filters sites on topics that the Chinese government finds sensitive, and found that the state does so extensively. Chinese citizens seeking access to Web sites containing content related to Taiwanese and Tibetan independence, Falun Gong, the Dalai Lama, the Tiananmen Square incident, opposition political parties, or a variety of anti-Communist movements will frequently find themselves blocked. Contrary to anecdote, we found that most major American media sites, such as CNN, MSNBC, and ABC, are generally available in China (though the BBC remains blocked). Moreover, most sites we tested in our global list's human rights and anonymizer categories are accessible as well. While it is difficult to describe this widespread filtering with precision, our research documents a system that imposes strong controls on its citizens' ability to view and to publish Internet content. This report was produced by the OpenNet Initiative, a partnership among the Advanced Network Research Group, Cambridge Security Programme at Cambridge University, the Citizen Lab at the Munk Centre for International Studies, University of Toronto, and the Berkman Center for Internet & Society at Harvard Law School
Hindi Complex Predicates: Linguistic and Computational Approaches
Complex predicates that consist of a noun and a verb, e.g. yaad kar 'memory do; remember', are a productive class of multiword expressions in Hindi. In this thesis, we examine the challenges of identifying and representing these complex predicates in Hindi. We design and implement their representation both in a lexical semantic resource and in lexicalized computational grammars. As productive multiword predicates, their accurate identification is a necessity for natural language processing applications. We use a combination of linguistic and computational approaches to address these challenges, demonstrating the semi-automatic creation of subcategorization frames for Hindi and the development of classes for nominal predicates. Finally, we show how linguistic features and computational tools can be used in tandem to automatically identify complex predicates in unseen text
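As a toy illustration of the identification task, the sketch below scans POS-tagged text for a noun immediately followed by a known light verb, the pattern of yaad kar. The light-verb list, tag set, and adjacency heuristic are my simplifying assumptions; the thesis combines much richer linguistic and computational evidence.

```python
# Common Hindi light verbs (illustrative, not an exhaustive list).
LIGHT_VERBS = {"kar", "ho", "de", "le", "aa"}

def find_complex_predicates(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (noun, light verb) candidate pairs from a POS-tagged sentence."""
    pairs = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "NOUN" and t2 == "VERB" and w2 in LIGHT_VERBS:
            pairs.append((w1, w2))
    return pairs

# Simplified, stemmed input containing yaad kar 'memory do; remember'.
sentence = [("usne", "PRON"), ("mujhe", "PRON"), ("yaad", "NOUN"), ("kar", "VERB")]
print(find_complex_predicates(sentence))  # [('yaad', 'kar')]
```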
Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages
The Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages contain 17 papers presented at the conference, organised in Dubrovnik, Croatia, 4-6 October 2010
From Information Overload to Knowledge Graphs: An Automatic Information Process Model
Continuously increasing text data from the Internet, such as news, articles, and scientific papers, have caused the information overload problem. Collecting valuable information, and encoding it efficiently, from enormous amounts of unstructured text has become a major challenge in the information explosion age. Although many solutions have been developed to reduce information overload, such as de-duplication of information and the adoption of personal information management strategies, most existing methods solve the problem only partially. Moreover, many existing solutions are out of date and incompatible with rapidly developing modern technology. An effective and efficient approach that uses modern IT (Information Technology) techniques to collect valuable information and extract high-quality information has therefore become urgent and critical for many researchers in the information overload age. Based on the principles of Design Science Theory, the paper presents a novel approach to tackling information overload. The proposed solution is an automated information process model that employs advanced IT techniques such as web scraping, natural language processing, and knowledge graphs. The model can automatically process the full cycle of information flow, from information search to information collection, information extraction, and information visualization, making it a comprehensive and intelligent information processing tool. The paper presents the model's capability to gather critical information and convert unstructured text data into a structured data model with greater efficiency and effectiveness. In addition, the paper presents multiple use cases to validate the feasibility and practicality of the model, and reports both quantitative and qualitative evaluations of its effectiveness. The results indicate that the proposed model significantly reduces information overload and is valuable for both academic and real-world research
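A minimal Python sketch of the collection, extraction, and visualization stages might look like the following, assuming spaCy for parsing and networkx for the graph; the paper's actual tooling and extraction rules may differ, and the subject-verb-object heuristic here is deliberately naive.

```python
import requests
from bs4 import BeautifulSoup
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

def collect(url: str) -> str:
    """Collection: scrape the visible text of a web page."""
    html = requests.get(url, timeout=10).text
    return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)

def extract_triples(text: str):
    """Extraction: naive (subject, verb, object) triples from parsed sentences."""
    for sent in nlp(text).sents:
        for tok in sent:
            if tok.pos_ == "VERB":
                subjects = [c for c in tok.children if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in tok.children if c.dep_ in ("dobj", "obj", "attr")]
                for s in subjects:
                    for o in objects:
                        yield (s.lemma_, tok.lemma_, o.lemma_)

def build_graph(triples) -> nx.MultiDiGraph:
    """Visualization-ready knowledge graph: entities as nodes, verbs as edge labels."""
    g = nx.MultiDiGraph()
    for s, v, o in triples:
        g.add_edge(s, o, label=v)
    return g

g = build_graph(extract_triples(collect("https://example.com")))
print(g.number_of_nodes(), "entities,", g.number_of_edges(), "relations")
```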
Detecting New, Informative Propositions in Social Media
The ever-growing quantity of text produced online makes it increasingly challenging to find new important or useful information, especially when topics of potential interest are not known a priori, as in "breaking news stories". This thesis examines techniques for detecting the emergence of new, interesting information in Social Media. It sets the investigation in the context of a hypothetical knowledge discovery and acquisition system, and addresses two objectives: the detection of new topics, and the filtering of non-informative text from Social Media. A rolling time-slicing approach is proposed for discovery, in which the daily frequencies of nouns, named entities, and multiword expressions are compared to their expected daily frequencies, as estimated from previous days using a Poisson model. Trending features in Social Media, those showing a significant surge in use, are potentially interesting; features that have not shown a similar recent surge in News are selected as indicative of new information. It is demonstrated that surges in nouns and named entities can be detected that predict corresponding surges in mainstream news. Co-occurring trending features are used to create clusters of potentially topic-related documents; those formed from co-occurrences of named entities are shown to be the most topically coherent.
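Concretely, the Poisson surge test can be sketched as follows; the window length, the significance threshold, and the use of scipy's survival function are illustrative choices, not the thesis's exact parameters.

```python
from scipy.stats import poisson

def is_trending(today_count: int, prior_daily_counts: list[int],
                alpha: float = 0.001) -> bool:
    """Flag a feature whose count today is improbably high under a Poisson null
    whose rate is the mean of its counts on previous days."""
    lam = max(sum(prior_daily_counts) / len(prior_daily_counts), 1e-9)
    # P(X >= today_count) under Poisson(lam); a small value means a surge.
    p_value = poisson.sf(today_count - 1, lam)
    return p_value < alpha

# e.g. a named entity seen ~2 times/day that suddenly appears 15 times:
print(is_trending(15, [2, 1, 3, 2, 2]))  # True
```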
Machine-learning-based filtering models are proposed for finding informative text in Social Media. News/Non-News and Dialogue Act models are explored using the News-annotated Redites corpus of Twitter messages, and a simple 5-act Dialogue scheme, used to annotate a small sample thereof, is presented. For both the News/Non-News and Informative/Non-Informative classification tasks, using non-lexical message features produces more discriminative and robust classification models than using message terms alone; the combination of all investigated features yields the most accurate models
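The sketch below illustrates classification from non-lexical message features alone; the feature set, toy data, and logistic-regression choice are my assumptions standing in for the thesis's models trained on the Redites corpus.

```python
from sklearn.linear_model import LogisticRegression

def non_lexical_features(msg: dict) -> list[float]:
    """Message-level features that do not depend on word identities."""
    text = msg["text"]
    return [
        len(text),                           # message length
        text.count("#"),                     # hashtag count
        text.count("@"),                     # mention count
        float("http" in text),               # contains a link
        sum(c.isupper() for c in text) / max(len(text), 1),  # capitalisation ratio
        float(msg.get("is_retweet", False)),
    ]

# Toy training data; in the thesis this role is played by the annotated corpus.
messages = [
    {"text": "BREAKING: explosion reported downtown http://t.co/x", "label": 1},
    {"text": "ugh monday again @friend", "label": 0},
    {"text": "Election results live updates http://t.co/y #vote", "label": 1},
    {"text": "lol that movie was so good!!!", "label": 0},
]
X = [non_lexical_features(m) for m in messages]
y = [m["label"] for m in messages]
clf = LogisticRegression().fit(X, y)
print(clf.predict([non_lexical_features({"text": "Quake hits city http://t.co/z"})]))
```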
Natural Language Processing Resources for Finnish. Corpus Development in the General and Clinical Domains
Transferred from Doria
The Crisis of Language in Contemporary Japan: Reading, Writing, and New Technology
My dissertation is an ethnographically inspired theoretical exploration of the crises of reading and writing in contemporary Japan. Each of the five chapters examines concrete instances of reading and writing practices that have been problematized in recent decades. By calling attention to underlying moral assumptions, established sociocultural protocols, and the socio-technological conditions of the everyday, I theorize the concept of embodied reading and writing thresholds. The scope of the analysis is partly informed by popular discourse decrying a perceived decline in reading and writing proficiency among Japanese youth. This alleged failing literacy figures as a national crisis under the assumption that the futurity of children's national-language proficiency metonymically correlates with the future well-being of the national cultural body. In light of heightened interest in the past, present, and future of books, and a series of recent state interventions on the prospect of a "national" text culture, I argue that ongoing tensions surrounding the changing media landscape and symbolic relations to the world do not merely reflect changes in styles of language, structures of spatiotemporal awareness, or forms of knowledge production. Rather, they indicate profound transformations and apprehensions in the lives mediated and embodied by the very system of signification that has come under scrutiny in post-Lost Decade Japan (03/1991-01/2002). My dissertation offers a unique point of critical intervention into 1) various forms of tension arising from overlapping media technologies and a polarized population, 2) formations of the reading and writing body (embodiment) at an intersection of heterogeneous elements and everyday disciplining, 3) culturally specific conditions and articulations of the effects of "universal" technologies, 4) prospects of a "proper" national reading and writing culture, and 5) questions of cultural transformation and transmission. I hope that the diverse set of events explored in the respective chapters provides, as a whole, a broad perspective on the institutional and technological background as well as an intimate understanding of culturally specific circumstances in Japan. Insofar as this is an attempt at a nuanced inquiry into the culturally specific configurations and articulations of a global phenomenon, each ethnographic moment is carefully contextualized to reflect Japan-specific conditions while avoiding the pitfall of culturalist assumptions. Understanding how an existing system of representation, technological imperatives, and sociohistorical predicaments have coalesced into a unique constellation is the first step in identifying how the practice of reading and writing becomes a site of heated national debate in Japan. Against theories that problematize the de-corporealizing effects of digital technology on reading and writing, I emphasize the material specificity of contemporary reading and writing practices
- …