18 research outputs found

    Advances in Information Retrieval

    Automated retrieval and extraction of training course information from unstructured web pages

    Web Information Extraction (WIE) is the discipline concerned with discovering, processing and extracting specific pieces of information from semi-structured or unstructured web pages. The World Wide Web comprises billions of web pages, and there is a strong need for systems that locate and extract this knowledge and integrate it into organisations' practices. Some commercial automated web extraction packages exist; however, their success depends on heavily involving users in finding the relevant web pages, preparing the system to recognise items of interest on those pages, and manually evaluating and storing the extracted results. This research explored WIE, specifically the automation of the extraction and validation of online training information. The work also includes research and development in automated Web Information Retrieval (WIR), more specifically in Web Searching (or Crawling) and Web Classification. Different technologies were considered; Naïve Bayes Networks were chosen as the most suitable for the classification system. The extraction part of the system used Genetic Programming (GP) to generate web extraction solutions. Specifically, GP was used to evolve Regular Expressions, which were then used to extract specific training course information from the web, such as course names, prices, dates and locations. The experimental results indicate that all three aspects of this research perform very well: the Web Crawler outperforms existing crawling systems, the Web Classifier achieves an accuracy of over 95% and a precision of over 98%, and the Web Extractor achieves an accuracy of over 94% for course titles and just under 67% for other course attributes such as dates, prices and locations. Furthermore, the overall work is of great significance to the sponsoring company, as it simplifies and improves on the existing time-consuming, labour-intensive and error-prone manual techniques, as will be discussed in this thesis. The prototype developed in this research works in the background and requires very little, often no, human assistance.
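
    As a rough illustration of the extraction step described in this abstract, the sketch below applies regular expressions to pull prices and dates out of a course listing. The patterns are hand-written for illustration only; in the thesis they were evolved by Genetic Programming, and neither the patterns nor the name `extract_course_fields` come from the source.

```python
import re

# Hand-written illustrative patterns (assumptions, NOT the GP-evolved
# expressions from the thesis, which the abstract does not reproduce).
PRICE_RE = re.compile(r"£\s?\d+(?:,\d{3})*(?:\.\d{2})?")
DATE_RE = re.compile(
    r"\b\d{1,2}\s+(?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December)\s+\d{4}\b"
)


def extract_course_fields(text):
    """Return every price and date found in a block of page text."""
    return {
        "prices": PRICE_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }


if __name__ == "__main__":
    sample = "Project Management Basics, London, 12 March 2024, £450.00 per delegate"
    print(extract_course_fields(sample))
```

    A GP-based system would presumably search over candidate patterns like these and score each one against labelled pages, rather than relying on a fixed, hand-tuned expression.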

    Lexical innovation on the web and social media

    This dissertation investigates the emergence and diffusion of English neologisms on the web and social media, using a data-driven methodology to identify a substantial sample of 851 neologisms. Neologisms are examined from their coining to successful dissemination within the community, and the study reveals a wide spectrum of degrees of diffusion. It then examines the usage and diffusion of selected neologisms on the web and on Twitter, with a particular focus on social dynamics and variation among different speaker groups. The dissertation also analyses semantic innovation extensively, demonstrating substantial socio-semantic variation between communities and polarized public discourse surrounding certain neologisms. Overall, it sheds light on the social and semantic dynamics underpinning the life cycle of neologisms within a linguistically diverse community.

    A Social Constructivist Approach to Learning Motivation: A Recommender System for Diverse Peer Messages on Social Networking Services

    Contemporary learning theories, and their implementations in information and communication technologies, increasingly integrate social constructivist approaches to assist and facilitate the construction of knowledge. Social constructivism also highlights the important role of culture, learning attitude and behavior in the cognitive process. Modern e-learning systems need to include these psychological aspects in addition to knowledge construction in order to address long-standing pedagogical issues such as declining or absent motivation for education. Motivation is a central part of educational psychology and plays an important role in Computer-Supported Collaborative Learning (CSCL) environments. A prominent factor in motivation is the strong connection between pedagogical goals and purposes for learning: learners want to know why learning is important for them, which makes it more meaningful. However, although pedagogical institutions provide structured curricula with specific outcomes, students are often unable to relate to these goals because they have varied conceptual perceptions and learning purposes. This issue matters even more in informal and self-regulated learning environments, where learners must monitor their own actions, motivation and goals. Contemporary CSCL applications therefore need to integrate a larger social presence in order to provide more diverse purposes for achieving a shared goal. Current social networking services (SNS) provide a platform where peers can, for instance, express their passion, emotion and motivation towards learning. This research therefore uses this platform to recommend motivational content from peers in order to enhance learning motivation (i.e. learners' perception of their goal and purpose for learning).
    The proposed system consists of an SNS platform on which learners 1) express and evaluate their own goals for learning, 2) observe diverse motivational messages expressed by peers who share the same goal, as recommended by an LDA-based (Latent Dirichlet Allocation) model, and 3) evaluate their perceptions of motivational attributes after each observation. The platform initially requires a database of messages from peers who publicly express on SNS their own purposes for learning various subjects. This part of the research focuses on collecting and analyzing messages from Twitter to determine the linguistic features used to express diverse learning purposes. The recommender system was implemented as a Web-based application in an SNS environment, and an experiment was conducted over one semester with students who could observe purposes expressed by other peers. The results compare evaluations from 77 students on motivational attributes before and after observing diverse or similar purposes from peers. Participants who observed diverse purposes significantly and positively enhanced their motivational perceptions, such as goal specificity, attainability and confidence in achieving the desired outcome. 電気通信大学 201

    Learning Methods and Algorithms for Semantic Text Classification across Multiple Domains

    Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from large amounts of data. As most data comes in the form of unstructured natural-language text, research on text mining is currently very active and addresses practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents into predefined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can classify very accurately, but often require a sizeable training set and notable computational effort. Methods for cross-domain text categorization have been proposed, which make it possible to leverage a set of labeled documents from one domain to classify those of another. Most methods use advanced statistical techniques, usually involving parameter tuning. A first contribution presented here is a method based on nearest centroid classification, in which category profiles are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy on most benchmark datasets with fast running times. A second, deeper contribution is the design of a domain-independent model that distinguishes the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between their representative words, which are identified by specific search algorithms. The model is tested on both flat and hierarchical text categorization, where it potentially allows new categories to be added efficiently during classification. Results show that classification accuracy still requires improvement, but that models generated from one domain can be effectively reused in a different one.
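
    The first contribution above (nearest centroid classification with iterative adaptation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the cosine similarity, the stopping rule, the `max_iter` default and all function names are choices made here, since the abstract does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def adapt_centroids(source_vecs, source_labels, target_vecs, max_iter=10):
    """Classify unlabeled target documents, re-estimating category centroids from them."""
    labels = sorted(set(source_labels))
    # Initial category profiles come from the labeled (known) domain.
    cents = {
        c: centroid([v for v, y in zip(source_vecs, source_labels) if y == c])
        for c in labels
    }
    assigned = None
    for _ in range(max_iter):
        # Assign each target document to its most similar centroid.
        new_assigned = [
            max(labels, key=lambda c: cosine(v, cents[c])) for v in target_vecs
        ]
        if new_assigned == assigned:  # assumed stopping rule: assignments are stable
            break
        assigned = new_assigned
        # Move each centroid towards the target domain; keep the old one if a class is empty.
        for c in labels:
            members = [v for v, y in zip(target_vecs, assigned) if y == c]
            if members:
                cents[c] = centroid(members)
    return assigned
```

    For example, with source vectors `[[1, 0, 0], [0, 1, 0]]` labeled `["sport", "tech"]`, the target vectors `[[0.9, 0.1, 0.5], [0.1, 0.8, 0.6]]` are assigned `["sport", "tech"]`, even though their third dimension never appears in the source domain.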

    Data and Text Mining Techniques for In-Domain and Cross-Domain Applications

    In the big data era, a vast amount of data has been generated in different domains, from social media to news feeds, from health care to genomic functionalities. When addressing a problem, we usually need to harness multiple disparate datasets. Data from different domains may follow different modalities, each with a different representation, distribution, scale and density. For example, text is usually represented as discrete, sparse word-count vectors, whereas an image is represented by pixel intensities. Many Data Mining and Machine Learning techniques have been proposed in the literature and have already achieved significant success in many knowledge engineering areas, including classification, regression and clustering. Nevertheless, some challenging issues remain when tackling a new problem. How should the problem be represented? Which approach is best among the huge number of possibilities? What information should be used in the Machine Learning task, and how should it be represented? Are there other domains from which knowledge can be borrowed? This dissertation proposes some possible representation approaches for problems in different domains, from text mining to genomic analysis. In particular, one of the major contributions is a different way to represent a classical classification problem: instead of using one instance for each object to be classified (a document, a gene, a social post, etc.), it proposes using a pair of objects, or an object-class pair, with the relationship between them as the label. This approach is tested on both flat and hierarchical text categorization datasets, where it potentially allows new categories to be added efficiently during classification. Furthermore, the same idea is used to extract conversational threads from an unregulated pool of messages and to classify the biomedical literature according to the genomic features treated.
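
    The pair-based representation mentioned in this abstract can be pictured with a short sketch. It assumes the simplest possible relationship label (whether the class in the pair is the document's true class); the dissertation's actual labels and features are not given in the abstract, and `to_pair_instances` is a hypothetical name.

```python
def to_pair_instances(documents, labels, classes):
    """Re-express a classification dataset as (document, class) pairs.

    Each pair is labeled True when the class is the document's true class
    and False otherwise (an assumed binary relationship label).
    """
    pairs = []
    for doc, true_class in zip(documents, labels):
        for cls in classes:
            pairs.append(((doc, cls), cls == true_class))
    return pairs


if __name__ == "__main__":
    for instance in to_pair_instances(["d1", "d2"], ["a", "b"], ["a", "b"]):
        print(instance)
```

    Because the class is part of the input rather than the output, a model trained on such pairs can in principle be asked about a class it never saw in training, which is consistent with the abstract's remark about adding new categories during classification.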

    Gathering Momentum: Evaluation of a Mobile Learning Initiative

    Comparative study of two LMS applications

    The aim of this work was to carry out a comparative analysis between the LMS of the company Maat (which was selected as the e-learning platform for the EELA 2 project) and an LMS developed as free software (specifically Moodle). E-learning has become established as a strategy for creating a new learning technology, and many educational institutions therefore use LMSs to support their educational processes. Nevertheless, these systems are most often chosen without serious prior investigation. The study covers stress tests, teacher tools, student tools, technical specifications, administration tools and usability. It also analyses the possibility of an LMS that allows mass storage (and possible data federation) for educational entities that generate a wide variety of content. Finally, it stresses that evaluating, selecting and deploying software for administering and controlling e-learning training activities goes beyond purely technological aspects, since the institution's management decisions regarding financial and human resources also have an impact.

    Universitat Oberta de Catalunya UOC

    Contents:
    1. Introduction
    2. Theoretical conceptualization
    2.1 State of the art
    2.2 Theoretical framework
    2.2.1 E-learning
    2.2.2 Learning Management Systems (LMS)
    2.2.3 Learning content management systems
    2.2.4 Characteristics of learning management systems
    2.2.5 Integrated e-learning environments
    2.2.6 Collaborative learning
    2.2.7 CSCL (Computer-Supported Cooperative Learning)
    2.2.8 Social learning networks on computational grids (CSSNs)
    2.2.9 Knowledge Grid
    2.2.10 Pedagogical models and b-learning
    2.2.11 E-learning standards
    2.2.12 Distributed learning objects
    2.2.13 Methodology for deploying a corporate e-learning system
    2.2.14 Analysis of e-learning platforms
    2.2.15 Maat Gknowledge
    2.2.16 Moodle (Modular Object-Oriented Dynamic Learning Environment)
    2.2.17 The Maat company LMS: the Gknowledge Learning Tools LMS platform
    3. Work carried out
    3.1 Installation process for both platforms
    3.1.1 Moodle 1.9.2+ installation process
    3.1.2 Description of the Maat LMS installation process
    3.1.3 Description of the Maat grid configuration process
    3.2 Description of the evaluation process for both platforms
    3.2.1 Moodle platform tests
    3.2.2 Maat LMS tests
    4. Conclusions
    5. Bibliography
    6. Appendices

    Master's degree (Maestría). Modalidad Presencia