Extraction of Semantics from Primitive Concepts
Due to the need to organize the vast number of documents available on the Internet, automated extraction of semantic representations of webpages has become a popular research topic in both industry and academia. The purpose of this project is to introduce a new method that automatically processes documents to extract their original contextual representations, then extends them with additional representations and connects similar ones based on their underlying semantics. Among the proposed steps, the core of this project is to tackle the difficulty of constructing a mechanism by which machines can computationally understand the lexical meaning of the extracted semantic representations. For instance, the word “good” has the same lexical meaning as the word “well”, so both should be treated equally. Furthermore, the 2-gram “wall street” should be kept as-is instead of being tokenized into two single words, whereas “coffee or tea” should be tokenized into the two single words “coffee” and “tea”. This matters in text mining because the original semantics must be preserved rather than destroyed, so that documents can be further processed safely, efficiently, and accurately. In the project, I first discuss the machine learning method introduced by Professor Lin to process documents and extract the original contextual representations, namely primitive concepts. Then, I introduce new methods that apply the extracted concepts to extract additional representations and connect similar ones based on their underlying semantics, using the WordNet database. In the last section of the report, I examine the proposed data processing method with sample data and justify the empirical results with data provided by Google Search. The project clearly articulates the problems of reducing computation cost and improving prediction in contextual extraction for documents.
In general, most machine-learning articles are well written and informative for general readers with a mathematics background, but not necessarily for readers with engineering interests. In this report, an engineering mechanism is constructed with mathematical reasoning to persuade readers with a theoretical background. Neither readers from the engineering community nor those from the mathematical community are left without an engineering and theoretical understanding of the methods introduced in the project.
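The abstract does not spell out Professor Lin's primitive-concept method, so the following is only a minimal sketch of the tokenization behaviour it describes (“wall street” kept whole, “coffee or tea” split, “well” treated like “good”). It assumes a hand-built phrase lexicon, connector list and synonym map, all hypothetical, standing in for what the project would instead derive from the extracted primitive concepts and WordNet.

```python
import re

# Hypothetical phrase lexicon: multi-word expressions kept as single tokens.
PHRASES = {("wall", "street")}
# Hypothetical connector words dropped when splitting coordinated items.
CONNECTORS = {"or", "and"}
# Hypothetical synonym map: each word maps to one canonical representative.
CANONICAL = {"well": "good"}


def tokenize(text: str) -> list[str]:
    """Tokenize text, keeping known 2-grams whole and normalizing synonyms."""
    words = re.findall(r"[a-z]+", text.lower())
    tokens = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in PHRASES:
            # Keep the multi-word expression as one token.
            tokens.append(" ".join(pair))
            i += 2
            continue
        word = words[i]
        if word not in CONNECTORS:
            # Replace the word by its canonical synonym, if one is listed.
            tokens.append(CANONICAL.get(word, word))
        i += 1
    return tokens


print(tokenize("Wall Street prefers coffee or tea"))
# ['wall street', 'prefers', 'coffee', 'tea']
```

A static lexicon is the simplest stand-in; the point of the project is precisely to replace such hand-written lists with representations learned from documents.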
P1_7 Row, Let's Row Away
This paper investigates the difference in energy between flying to a destination and rowing to it, i.e. the energy burned per passenger on an airplane compared with the energy a passenger would burn rowing across the sea to the same destination. The route considered is London Heathrow to New York JFK airport, and we found that for a Boeing 747 and an Airbus A380, the energy used per passenger is 6069.1 MJ and 5677.8 MJ respectively. The energy required to row the distance in a single lightweight scull was found to be 128.67 MJ; compared with the Airbus, the rower uses approximately 98% less energy.
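A quick check of the “approximately 98% less” claim, using only the per-passenger figures quoted in the abstract (the function name is illustrative):

```python
# Per-passenger energy figures (MJ) quoted in the abstract.
FLIGHT_MJ = {"Boeing 747": 6069.1, "Airbus A380": 5677.8}
ROWING_MJ = 128.67


def percent_saving(flight_mj: float, rowing_mj: float) -> float:
    """Percentage by which rowing uses less energy than flying."""
    return 100.0 * (1.0 - rowing_mj / flight_mj)


for aircraft, mj in FLIGHT_MJ.items():
    print(f"{aircraft}: {percent_saving(mj, ROWING_MJ):.1f}% less energy by rowing")
```

This gives about 97.7% for the A380 and about 97.9% for the 747, both of which round to the quoted 98%.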
P1_6 Geothermal Power
Geothermal power is a green energy source that could provide substantial renewable power. This paper estimates the energy that the planet could provide from the thermal energy stored beneath the Earth's surface. The usable energy was calculated to be 3.9 × 10^30 J. However, actually extracting this energy is unrealistic with today's technology, as well as hazardous to the planet.
P1_10 Fus Ro Dah
The power of the “Thu’um” is unquestionable within the video game Skyrim [1]. This paper investigates the possibility of knocking down an opponent using only one's voice. The minimum force required to do so was calculated to be 121.2 N, whereas an average person can produce only 3.74 N.
P1_2 Melting Mirrors
High-powered lasers have been portrayed as able to cut through almost anything, yet a simple mirror seems to reflect them easily. A mirror's purpose is to reflect optical light, but there must be a limit to the energy it can reflect before it begins to deform and melt. Our research showed that for a mirror with a silver optical coating illuminated by a 500 nm laser, it would take 11.65 J to destroy the illuminated area, neglecting conductivity and scattering.
Training Autoregressive Speech Recognition Models with Limited in-domain Supervision
Advances in self-supervised learning have significantly reduced the amount of
transcribed audio required for training. However, the majority of work in this
area is focused on read speech. We explore limited supervision in the domain of
conversational speech. While we assume the amount of in-domain data is limited,
we augment the model with open-source read speech data. The XLS-R model has
been shown to perform well with limited adaptation data and serves as a strong
baseline. We use untranscribed data for self-supervised learning and
semi-supervised training in an autoregressive encoder-decoder model. We
demonstrate that by using the XLS-R model for pseudotranscription, a much
smaller autoregressive model can outperform a finetuned XLS-R model when
transcribed in-domain data is limited, reducing WER by as much as 8% absolute.
Comment: Submitted to IEEE ICASSP 202
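The result is reported in word error rate (WER), the number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal sketch of the standard word-level Levenshtein computation; it is not the paper's scoring tool, and real evaluations usually also normalize text (case, punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j]: minimum edits turning the first i reference words
    # into the first j hypothesis words (word-level Levenshtein distance).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the): 2 edits / 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

An “8% absolute” reduction means, for example, a drop from 30% WER to 22% (illustrative numbers, not from the paper), as opposed to an 8% relative reduction, which would take 30% only to 27.6%.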
eipy: An Open-Source Python Package for Multi-modal Data Integration using Heterogeneous Ensembles
In this paper, we introduce eipy, an open-source Python package for
developing effective, multi-modal heterogeneous ensembles for classification.
eipy simultaneously provides a rigorous and user-friendly framework for
comparing and selecting the best-performing multi-modal data integration and
predictive modeling methods by systematically evaluating their performance
using nested cross-validation. The package is designed to leverage
scikit-learn-like estimators as components to build multi-modal predictive
models. An up-to-date user guide, including API reference and tutorials, for
eipy is maintained at https://eipy.readthedocs.io . The main repository for
this project can be found on GitHub at https://github.com/GauravPandeyLab/eipy
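eipy's own API is documented in the user guide linked above and is not reproduced here. As a generic illustration of the nested cross-validation the abstract refers to, the following sketch uses plain scikit-learn (not eipy) on a single synthetic feature matrix, so it omits the multi-modal integration step entirely; the candidate models and grid are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for one data modality (illustrative only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Candidate methods: the "clf" step is swapped out by the grid search.
pipeline = Pipeline([("clf", LogisticRegression(max_iter=1000))])
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000)]},
    {"clf": [RandomForestClassifier(n_estimators=50, random_state=0)],
     "clf__max_depth": [2, None]},
]

# Inner loop selects the best candidate; outer loop estimates how well
# that whole selection procedure generalizes to unseen folds.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="roc_auc")
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
print(f"nested-CV AUC: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```

Keeping model selection inside the inner loop is what prevents the outer-loop score from being optimistically biased by the choice of method.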
P1_1 Everybody knows the moon is made of cheese...
This report investigates the possible implications of our moon being made of cheese, as suggested in the Wallace and Gromit film 'A Grand Day Out'. If it were the same size as the current moon and made of Wensleydale, it would exert 13.1 × 10^19 N less force on Earth. If it were to exert the same force, its radius would increase by 0.78 × 10^6 m, making it appear 144% larger in the night sky.
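The report's working is not shown in the abstract. The following sketch reproduces the reasoning under stated assumptions: standard approximate lunar constants, Newtonian gravity at the mean Earth-Moon distance, and a uniform cheese density that must be supplied by hand (the Wensleydale density used below is a hypothetical value, not taken from the report). Equal force at the same distance requires equal mass, so the radius scales with the cube root of the density ratio.

```python
import math

# Standard approximate constants (not taken from the report).
G = 6.674e-11           # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24      # Earth mass, kg
R_MOON = 1.7374e6       # mean lunar radius, m
RHO_MOON = 3344.0       # mean lunar density, kg/m^3
D_EARTH_MOON = 3.844e8  # mean Earth-Moon distance, m


def moon_force(density: float, radius: float = R_MOON) -> float:
    """Newtonian attraction (N) between Earth and a uniform sphere."""
    mass = density * (4.0 / 3.0) * math.pi * radius ** 3
    return G * M_EARTH * mass / D_EARTH_MOON ** 2


def force_deficit(rho_cheese: float) -> float:
    """How much less force (N) a same-size cheese moon would exert."""
    return moon_force(RHO_MOON) - moon_force(rho_cheese)


def same_force_radius(rho_cheese: float) -> float:
    """Radius (m) at which a cheese moon matches the real Moon's mass."""
    # Equal force requires equal mass: rho_c * r^3 = rho_moon * R^3.
    return R_MOON * (RHO_MOON / rho_cheese) ** (1.0 / 3.0)


rho_cheese = 1100.0  # ASSUMED Wensleydale density, kg/m^3 (hypothetical)
r_new = same_force_radius(rho_cheese)
print(f"force deficit: {force_deficit(rho_cheese):.3e} N")
print(f"radius increase: {r_new - R_MOON:.3e} m")
# Small-angle approximation: apparent size scales with radius at fixed distance.
print(f"apparent diameter ratio: {r_new / R_MOON:.2f}")
```

With the assumed density of 1100 kg/m³ the sketch gives a force deficit of about 1.3 × 10^20 N and a radius increase of about 0.78 × 10^6 m, close to the report's figures; the report's own constants and cheese density are not stated, so the match is only approximate.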
Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation
Target-oriented dialogue systems, designed to proactively steer conversations
toward predefined targets or accomplish specific system-side goals, are an
exciting area in conversational AI. In this work, by formulating a <dialogue
act, topic> pair as the conversation target, we explore a novel problem of
personalized target-oriented dialogue by considering personalization during the
target accomplishment process. However, there remains a pressing need for
high-quality datasets, and building one from scratch requires tremendous human
effort. To address this, we propose an automatic dataset curation framework
using a role-playing approach. Based on this framework, we construct a
large-scale personalized target-oriented dialogue dataset, TopDial, which
comprises about 18K multi-turn dialogues. The experimental results show that
this dataset is of high quality and could contribute to exploring personalized
target-oriented dialogue.
Comment: Accepted to EMNLP-2023 main conference
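To make the <dialogue act, topic> target formulation concrete, the following minimal sketch represents a target as a pair and checks whether a system turn realizes it. The class and field names, the act labels, and the per-turn annotations are hypothetical and do not reflect the TopDial schema; the sketch also ignores the personalization aspect that the paper adds on top of this formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Conversation target: a <dialogue act, topic> pair."""
    act: str
    topic: str


@dataclass(frozen=True)
class Turn:
    """One dialogue turn with hypothetical act and topic annotations."""
    speaker: str  # "system" or "user"
    act: str
    topic: str


def target_reached(dialogue: list[Turn], target: Target) -> bool:
    """Return True if some system turn realizes the target act on the target topic."""
    return any(
        turn.speaker == "system"
        and turn.act == target.act
        and turn.topic == target.topic
        for turn in dialogue
    )


target = Target(act="movie recommendation", topic="Inception")
dialogue = [
    Turn("user", "chit-chat", "weekend plans"),
    Turn("system", "movie recommendation", "Inception"),
]
print(target_reached(dialogue, target))  # True
```

Only system turns count, reflecting that the target is a system-side goal the agent steers toward rather than something the user happens to mention.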