531 research outputs found

    Extraction of Semantics from Primitive Concepts

    Abstract: Because of the need to organize the vast number of documents available on the Internet, automated extraction of the semantics that represent webpages has become a popular research topic in both industry and academia. The purpose of this project is to introduce a new method that processes documents to extract their original contextual representations and then, automatically, extends them with additional representations and connects similar ones based on the underlying semantics. Among the proposed steps, the core of the project is to tackle the difficulty of constructing a mechanism by which machines can computationally understand the lexical meaning of the extracted semantic representations. For instance, the word “good” has the same lexical meaning as the word “well”, so both should be treated equally. Furthermore, the 2-gram “wall street” should be kept as-is instead of being tokenized into two single words, whereas “coffee or tea” should be tokenized into the two single words “coffee” and “tea”. Preserving rather than destroying the original semantics in this way is important in text mining, so that documents can be processed further safely, efficiently, and accurately. In the project, I first discuss the machine learning method introduced by Professor Lin for processing documents to extract the original contextual representations, namely primitive concepts. Then, I introduce new methods that use the extracted concepts, together with the WordNet database, to extract additional representations and connect similar ones based on the underlying semantics. In the last section of the report, I examine the proposed data processing method with sample data and validate the empirical results against data provided by Google Search. The project articulates the problems of computation cost reduction and prediction enhancement in contextual extraction for documents.
    In general, most machine-learning articles are well written and informative for general readers with a mathematics background, but not necessarily for readers with an engineering interest. In the report, an engineering mechanism is constructed with mathematical reasoning to persuade readers with a theoretical background. The aim is that readers from both the engineering and mathematical communities come away with an engineering and a theoretical understanding of the methods introduced in the project.
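
The tokenization behaviour described in this abstract (keep “wall street” whole, split “coffee or tea”, treat “well” like “good”) can be illustrated with a minimal sketch. This is not the method from the project: the lookup tables `KNOWN_PHRASES`, `CONNECTIVES`, and `SYNONYM_CANON` and the function `extract_concepts` are hypothetical names, hard-coded here only for illustration, whereas the project is said to derive such knowledge from data and WordNet.

```python
# Hypothetical lookup tables, hard-coded for illustration only.
KNOWN_PHRASES = {("wall", "street")}   # multi-word concepts to keep intact
CONNECTIVES = {"or", "and"}            # words that only join concepts
SYNONYM_CANON = {"well": "good"}       # map a synonym to one canonical form


def extract_concepts(text, max_n=2):
    """Greedy longest-match tokenization that preserves known phrases."""
    words = text.lower().split()
    concepts = []
    i = 0
    while i < len(words):
        # Try the longest known phrase starting at position i first.
        for n in range(max_n, 1, -1):
            candidate = tuple(words[i:i + n])
            if len(candidate) == n and candidate in KNOWN_PHRASES:
                concepts.append(" ".join(candidate))
                i += n
                break
        else:
            # No phrase matched: emit the single word unless it is a connective.
            word = words[i]
            if word not in CONNECTIVES:
                concepts.append(SYNONYM_CANON.get(word, word))
            i += 1
    return concepts


print(extract_concepts("wall street"))
print(extract_concepts("coffee or tea"))
print(extract_concepts("well"))
```

A real system would presumably replace the hard-coded tables with learned phrase statistics and a lexical database lookup; the sketch only shows why phrase matching has to happen before single-word splitting.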

    P1_7 Row, Let's Row Away

    This paper investigates the difference in energy between flying to a destination and rowing to it, i.e. the energy burned per passenger on an airplane compared with that of a person rowing across the sea to the same destination. The route we consider is from London Heathrow to New York JFK airport, and we found that for a Boeing 747 and an Airbus A380 the energy used is 6069.1 MJ and 5677.8 MJ respectively. The amount of energy needed to row the distance in a single lightweight scull was found to be 128.67 MJ; in comparison to the Airbus, the rower uses approximately 98% less energy.
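
The "approximately 98% less" figure can be checked directly from the three energies quoted in the abstract. The short sketch below does only that arithmetic; the constant and function names are chosen here for illustration and do not come from the paper.

```python
# Energies quoted in the abstract, in megajoules.
ENERGY_747_MJ = 6069.1
ENERGY_A380_MJ = 5677.8
ENERGY_ROWING_MJ = 128.67


def percent_saving(baseline_mj, alternative_mj):
    """Percentage by which the alternative uses less energy than the baseline."""
    return 100.0 * (1.0 - alternative_mj / baseline_mj)


saving_vs_a380 = percent_saving(ENERGY_A380_MJ, ENERGY_ROWING_MJ)
saving_vs_747 = percent_saving(ENERGY_747_MJ, ENERGY_ROWING_MJ)

print(f"Rowing vs A380: {saving_vs_a380:.1f}% less energy")
print(f"Rowing vs 747:  {saving_vs_747:.1f}% less energy")
```

Both savings come out just under 98%, which is consistent with the rounded value stated in the abstract.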

    P1_6 Geothermal Power

    Geothermal power is a green power source that could provide substantial renewable power. This paper estimates the energy that the planet could provide from the thermal energy stored beneath the surface of the Earth. It was calculated that the usable energy is 3.9 × 10^30 J. However, actually extracting this energy is unrealistic with today’s technology, as well as hazardous to the planet.

    P1_10 Fus Ro Dah

    The power of the “Thu’um” is unquestionable within the video game Skyrim [1]. This paper investigates the possibility of knocking down an opponent using only one's voice. It was calculated that the minimum force required to do so is 121.2 N, and that an average person can produce only 3.74 N.
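
To put the two forces quoted in the abstract side by side, the sketch below computes how far short an average voice falls. It uses only the two stated values; the variable names are illustrative.

```python
# Forces quoted in the abstract, in newtons.
FORCE_REQUIRED_N = 121.2   # minimum force needed to knock an opponent down
FORCE_AVERAGE_N = 3.74     # force an average person's voice can produce

# How many times stronger the voice would need to be.
shortfall_factor = FORCE_REQUIRED_N / FORCE_AVERAGE_N

# Fraction of the required force that an average person achieves.
fraction_achieved = FORCE_AVERAGE_N / FORCE_REQUIRED_N

print(f"Required force is about {shortfall_factor:.1f} times the average")
print(f"An average person reaches {100 * fraction_achieved:.1f}% of it")
```

On these figures the required force is roughly 32 times what an average person can produce.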

    P1_2 Melting Mirrors

    High-powered lasers have been portrayed as being able to cut through almost anything, yet a simple mirror seems to reflect them easily. The purpose of a mirror is to reflect optical light, but surely there is a limit to the energy it can reflect before it begins to deform and melt. Our research showed that, for a mirror with an optical coating of silver illuminated by a 500 nm laser, it would take 11.65 J to destroy the illuminated area if conductivity and scattering are not considered.

    Training Autoregressive Speech Recognition Models with Limited in-domain Supervision

    Advances in self-supervised learning have significantly reduced the amount of transcribed audio required for training. However, the majority of work in this area is focused on read speech. We explore limited supervision in the domain of conversational speech. While we assume the amount of in-domain data is limited, we augment the model with open source read speech data. The XLS-R model has been shown to perform well with limited adaptation data and serves as a strong baseline. We use untranscribed data for self-supervised learning and semi-supervised training in an autoregressive encoder-decoder model. We demonstrate that by using the XLS-R model for pseudotranscription, a much smaller autoregressive model can outperform a finetuned XLS-R model when transcribed in-domain data is limited, reducing WER by as much as 8% absolute. Comment: Submitted to IEEE ICASSP 202

    eipy: An Open-Source Python Package for Multi-modal Data Integration using Heterogeneous Ensembles

    In this paper, we introduce eipy, an open-source Python package for developing effective, multi-modal heterogeneous ensembles for classification. eipy provides a rigorous and user-friendly framework for comparing and selecting the best-performing multi-modal data integration and predictive modeling methods by systematically evaluating their performance using nested cross-validation. The package is designed to leverage scikit-learn-like estimators as components to build multi-modal predictive models. An up-to-date user guide for eipy, including API reference and tutorials, is maintained at https://eipy.readthedocs.io. The main repository for this project can be found on GitHub at https://github.com/GauravPandeyLab/eipy

    P1_1 Everybody knows the moon is made of cheese...

    This report investigates the possible implications of our moon being made of cheese, as suggested in the Wallace and Gromit film 'A Grand Day Out'. If it were the same size as the current moon, and made of Wensleydale, then it would exert 13.1 × 10^19 N less force on Earth. If it were to exert the same force then its radius would increase by 0.78 × 10^6 m, appearing 144% larger in the night sky.

    Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation

    Target-oriented dialogue systems, designed to proactively steer conversations toward predefined targets or accomplish specific system-side goals, are an exciting area in conversational AI. In this work, by formulating a <dialogue act, topic> pair as the conversation target, we explore a novel problem of personalized target-oriented dialogue by considering personalization during the target accomplishment process. However, there remains an emergent need for high-quality datasets, and building one from scratch requires tremendous human effort. To address this, we propose an automatic dataset curation framework using a role-playing approach. Based on this framework, we construct a large-scale personalized target-oriented dialogue dataset, TopDial, which comprises about 18K multi-turn dialogues. The experimental results show that this dataset is of high quality and could contribute to exploring personalized target-oriented dialogue. Comment: Accepted to EMNLP-2023 main conference