BlogForever D2.6: Data Extraction Methodology

Author: Banos V.
Davis R.
Gkotsis G.
Pincent E.
Stepanyan K.
Publication date: 25/10/2013

This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

DESIGN WITH EMOTION: IMPROVING WEB SEARCH EXPERIENCE FOR OLDER ADULTS

Author: Abegaz Tamirat
Publication venue: Clemson University Libraries
Publication date: 01/12/2014

Research indicates that, altogether, older adults search for information about 15% less than younger adults before making decisions. Prior research findings associated such behavior mainly with age-related cognitive difficulties. However, recent studies indicate that emotion is linked to search decision quality. This research approaches questions about why older adults search less and how this search behavior could be improved. The research is motivated by the broader issues of older users' search behavior, while focusing on the emotional usability of search engine user interfaces. Therefore, this research attempts to accomplish the following three objectives: a) to explore the usage of low-level design elements as emotion manipulation tools, b) to seamlessly integrate these design elements into currently existing search engine interfaces, and finally c) to evaluate the impact of emotional design elements on search performance and user satisfaction. To achieve these objectives, two usability studies were conducted. The aim of the first study was to explore the emotion induction capabilities of colors, shapes, and combinations of both. The study was required to determine whether the proposed design elements have strong mood induction capabilities. The results demonstrated that low-level design elements such as color and shape have high visceral effects that could be used as potentially viable alternatives to induce the emotional states of users without the users having knowledge of their presence. The purpose of the second study was to evaluate alternative search engine user interfaces, derived from this research, for search thoroughness and user preference. In general, search-based performance variables showed that participants searched more thoroughly using interface types that integrate angular shape features.
In addition, user preference variables indicated that participants seemed to enjoy search tasks using search engine interfaces that used color/shape combinations. Overall, the results indicated that seamless integration of low-level emotional design elements into currently existing search engine interfaces could potentially improve the web search experience.

Clemson University: TigerPrints

Thumbs up? Sentiment Classification using Machine Learning Techniques

Author: Lee Lillian
Pang Bo
Vaithyanathan Shivakumar
Publication date: 01/01/2002

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging. Comment: To appear in EMNLP-200

arXiv.org e-Print Archive

CiteSeerX

Automatically extracting news articles from the Internet

Author: Jasselette Arnaud
Vanderwhale Mathieu
Publication date: 01/01/2005

Repository of the University of Namur

Tag disambiguation based on social network information

Author: Qasim Syed Sumair
Publication date: 23/09/2011

Within 20 years the Web has grown from a tool for scientists at CERN into a global information space. While returning to its roots as a read/write tool, it is entering a more social and participatory phase. The result is a new, improved version called the Social Web, where users are responsible for generating and sharing content on the global information space and are also accountable for replicating the information. This collaborative activity can be observed in two of the most widely practised Social Web services: social network sites and social tagging systems. Users annotate their interests and inclinations with free-form keywords while they share them with their social connections. Although these keywords (tags) assist information organization and retrieval, they suffer from polysemy. In this study we employ social network sites to address the issue of ambiguity in social tagging. Moreover, we propose that homophily in social network sites can be a useful aspect in disambiguating tags. We have extracted the ‘Likes’ of 20 Facebook users and employ them in disambiguating tags on Flickr. Classifiers are generated on the retrieved clusters from Flickr using the K-Nearest-Neighbour algorithm, and their degree of similarity with user keywords is then calculated. As tag disambiguation techniques lack gold standards for evaluation, we asked the users to indicate the contexts and used them as ground truth while examining the results. We analyse the performance of our approach by quantitative methods and report successful results. Our proposed method is able to classify images with an accuracy of 6 out of 10 (on average). Qualitative analysis reveals some factors that affect the findings and that, if addressed, can produce more precise results.

Southampton (e-Prints Soton)

Multimodal integration of disparate information sources with attribution

Publication venue: Sloan School of Management, Massachusetts Institute of Technology
Publication date: 01/01/1997

Cover title. Includes bibliographical references (p. [9]-[10]). Thomas Y. Lee & Stephane Bressan

DSpace@MIT

Spatially Aware Computing for Natural Interaction

Author: Roudaki Amin
Publication venue: North Dakota State University
Publication date: 01/01/2013

Spatial information refers to the location of an object in a physical or digital world. It also includes the position of an object relative to other objects around it. In this dissertation, three systems are designed and developed. All of them apply spatial information in different fields. The ultimate goal is to increase the user friendliness and efficiency of those applications by utilizing spatial information. The first system is a novel Web page data extraction application, which takes advantage of 2D spatial information to discover structured records from a Web page. The extracted information is useful for re-organizing the layout of a Web page to fit mobile browsing. The second application utilizes the 3D spatial information of a mobile device within a large paper-based workspace to implement interactive paper that combines the merits of paper documents and mobile devices. This application can overlay digital information on top of a paper document based on the location of a mobile device within a workspace. The third application further integrates 3D space information with sound detection to realize an automatic camera management system. This application automatically controls multiple cameras in a conference room, and creates an engaging video by intelligently switching camera shots among meeting participants based on their activities. Evaluations have been made on all three applications, and the results are promising. In summary, this dissertation comprehensively explores the usage of spatial information in various applications to improve usability.

NDSU Libraries Institutional Repository

A Multidisciplinary Approach to the Reuse of Open Learning Resources

Author: FRESCHI Sergio
Publication venue: Faculty of Engineering and Information Technologies, School of Electrical and Information Engineering
Publication date: 01/01/2008

Educational standards are having a significant impact on e-Learning. They allow for better exchange of information among different organizations and institutions. They simplify reusing and repurposing learning materials. They give teachers the possibility of personalizing them according to the student’s background and learning speed. Thanks to these standards, off-the-shelf content can be adapted to a particular student cohort’s context and learning needs. The same course content can be presented in different languages. Overall, all the parties involved in the learning-teaching process (students, teachers and institutions) can benefit from these standards, and so online education can be improved. To materialize the benefits of standards, learning resources should be structured according to these standards. Unfortunately, a large number of existing e-Learning materials lack the intrinsic logical structure required, and even when they have the structure, they are not encoded as required. These problems make it virtually impossible to share these materials. This thesis addresses the following research question: How to make the best use of existing open learning resources available on the Internet by taking advantage of educational standards and specifications and thus improving content reusability? In order to answer this question, I combine different technologies, techniques and standards that make the sharing of publicly available learning resources possible in innovative ways. I developed and implemented a three-stage tool to tackle the above problem. By applying information extraction techniques and open e-Learning standards to legacy learning resources, the tool has proven to improve content reusability. In so doing, it contributes to the understanding of how these technologies can be used in real scenarios and shows how online education can benefit from them.
In particular, three main components were created which enable the conversion process from unstructured educational content into a standard-compliant form in a systematic and automatic way. An increasing number of repositories with educational resources are available, including Wikiversity and the Massachusetts Institute of Technology OpenCourseWare. Wikiversity is an open repository containing over 6,000 learning resources in several disciplines and for all age groups [1]. I used the OpenCourseWare repository to evaluate the effectiveness of my software components and ideas. The results show that it is possible to create standard-compliant learning objects from the publicly available web pages, improving their searchability, interoperability and reusability.

Sydney eScholarship

Proceedings of the International Workshop on Text Mining Research, Practice and Opportunities

Publication date: 24/09/2005

The University of Manchester - Institutional Repository