531 research outputs found

Extraction of Semantics from Primitive Concepts

Author: Li Chak
Publication venue: SJSU ScholarWorks
Publication date: 01/10/2012

Abstract: Due to the need to organize the vast number of documents available on the Internet, automated extraction of semantic representations of webpages has become a popular research topic in both industry and academia. The purpose of this project is to introduce a new method that processes documents to extract their original contextual representations and then automatically adds further representations and connects similar ones, based on the semantics underlying the extracted representations. Among the proposed steps, the core of this project is to tackle the difficulty of constructing a mechanism by which machines can computationally understand the lexical meaning of the extracted semantic representations. For instance, the word “good” has the same lexical meaning as the word “well”, so both should be treated equally. Furthermore, the 2-gram “wall street” should be kept as-is instead of being tokenized into two single words, but “coffee or tea” should be tokenized into the two single words “coffee” and “tea”. This is important in text mining: preserving rather than destroying the original semantics allows documents to be processed further safely, efficiently, and accurately. In the project, I first discuss the machine learning method introduced by Professor Lin for processing documents to extract the original contextual representations, namely primitive concepts. Then, I introduce new methods that apply the extracted concepts to add further representations and connect similar ones, based on the underlying semantics, using the WordNet database. In the last section of the report, I examine the proposed data processing method with sample data and validate the empirical results against data provided by Google Search. The project articulates the problems of computation cost reduction and prediction enhancement in contextual extraction for documents.
In general, most machine-learning articles are well written and informative for readers with a mathematics background, but not necessarily for readers with engineering interests. In the report, an engineering mechanism is constructed with mathematical reasoning to persuade readers with a theoretical background. Readers from both the engineering and mathematical communities should come away with an engineering and theoretical understanding of the methods introduced in the project.

SJSU ScholarWorks

P1_7 Row, Let's Row Away

Author: East Oliver
Fletcher Mark
Li Chak
Longstaff Emma
Publication venue: The University of Leicester
Publication date: 20/11/2013

This paper investigates the potential difference in energy use between flying to a destination and rowing to it, i.e. the energy used per passenger on an airplane compared with that of a passenger rowing across the sea to the same destination. The route we consider is from London Heathrow to New York JFK airport, and we found that for a Boeing 747 and an Airbus A380, the energy used is 6069.1 MJ and 5677.8 MJ respectively. The amount of energy needed to row the distance in a single lightweight scull was found to be 128.67 MJ; in comparison to the Airbus, the rower uses approximately 98% less energy.
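The percentage quoted in the abstract can be checked directly from the three energy figures it reports. The sketch below is illustrative only: the variable and function names are assumptions introduced here, not taken from the paper, and only the three MJ values come from the abstract.

```python
# Energy figures quoted in the abstract (MJ); names are illustrative.
boeing_747_mj = 6069.1
airbus_a380_mj = 5677.8
rowing_mj = 128.67


def percent_saving(baseline_mj: float, alternative_mj: float) -> float:
    """Return how much less energy the alternative uses, as a percentage of the baseline."""
    return 100.0 * (baseline_mj - alternative_mj) / baseline_mj


saving_vs_a380 = percent_saving(airbus_a380_mj, rowing_mj)
saving_vs_747 = percent_saving(boeing_747_mj, rowing_mj)

print(f"Rowing vs Airbus A380: {saving_vs_a380:.1f}% less energy")
print(f"Rowing vs Boeing 747:  {saving_vs_747:.1f}% less energy")
```

Against the Airbus A380 figure the saving works out to roughly 97.7%, which is consistent with the abstract's "approximately 98%".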

University of Leicester Open Journals

P1_6 Geothermal Power

Author: East Oliver
Fletcher Mark
Li Chak
Longstaff Emma
Publication venue: The University of Leicester
Publication date: 16/01/2014

Geothermal power is a green power source that could provide substantial renewable power. This paper looks at the approximate energy that the planet could provide using the stored thermal energy beneath the surface of the Earth. It was calculated that the energy that could be used was 3.9 × 10^30 J. However, actually extracting this energy is unrealistic with today's technology, as well as hazardous to the planet.

University of Leicester Open Journals

P1_10 Fus Ro Dah

Author: East Oliver
Fletcher Mark
Li Chak
Longstaff Emma
Publication venue: The University of Leicester
Publication date: 11/12/2013

The power of the “Thu’um” is unquestionable within the video game Skyrim [1]. This paper investigates the possibility of knocking down an opponent using only one's voice. It was calculated that the minimum amount of force required to do so is 121.2 N, and that an average person can only produce 3.74 N.
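To put the two forces in the abstract side by side, the short sketch below computes the shortfall and the ratio between them. The variable names are assumptions introduced for illustration; only the two force values come from the abstract.

```python
# Forces quoted in the abstract (newtons); names are illustrative.
required_force_n = 121.2   # minimum force needed to knock down an opponent
voice_force_n = 3.74       # force an average person can produce by voice

# How far short the voice falls, and by what factor.
shortfall_n = required_force_n - voice_force_n
ratio = required_force_n / voice_force_n

print(f"Shortfall: {shortfall_n:.2f} N")
print(f"Required force is about {ratio:.1f} times the available force")
```

On these figures the required force is a little over 32 times what an average voice can deliver.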

University of Leicester Open Journals

P1_2 Melting Mirrors

Author: East Oliver
Fletcher Mark
Li Chak
Longstaff Emma
Publication venue: The University of Leicester
Publication date: 13/11/2013

High-powered lasers have been portrayed as being able to cut through almost anything, but a simple mirror seems to reflect them easily. The purpose of a mirror is to reflect optical light, but surely there is a limit to the energy it can reflect before it begins to deform and melt. Our investigation showed that, for a mirror with a silver optical coating illuminated by a 500 nm laser, it would take 11.65 J to destroy the illuminated area if conductivity and scattering are not considered.

University of Leicester Open Journals

Training Autoregressive Speech Recognition Models with Limited in-domain Supervision

Author: Hartmann William
Keith Francis
Li Chak-Fai
Snover Matthew
Publication date: 26/10/2022

Advances in self-supervised learning have significantly reduced the amount of transcribed audio required for training. However, the majority of work in this area is focused on read speech. We explore limited supervision in the domain of conversational speech. While we assume the amount of in-domain data is limited, we augment the model with open source read speech data. The XLS-R model has been shown to perform well with limited adaptation data and serves as a strong baseline. We use untranscribed data for self-supervised learning and semi-supervised training in an autoregressive encoder-decoder model. We demonstrate that by using the XLS-R model for pseudotranscription, a much smaller autoregressive model can outperform a finetuned XLS-R model when transcribed in-domain data is limited, reducing WER by as much as 8% absolute.
Comment: Submitted to IEEE ICASSP 202

arXiv.org e-Print Archive

eipy: An Open-Source Python Package for Multi-modal Data Integration using Heterogeneous Ensembles

Author: Bennett Jamie J. R.
Li Yan Chak
Pandey Gaurav
Publication date: 17/01/2024

In this paper, we introduce eipy, an open-source Python package for developing effective, multi-modal heterogeneous ensembles for classification. eipy provides a framework that is both rigorous and user-friendly for comparing and selecting the best-performing multi-modal data integration and predictive modeling methods by systematically evaluating their performance using nested cross-validation. The package is designed to leverage scikit-learn-like estimators as components to build multi-modal predictive models. An up-to-date user guide for eipy, including API reference and tutorials, is maintained at https://eipy.readthedocs.io. The main repository for this project can be found on GitHub at https://github.com/GauravPandeyLab/eipy

arXiv.org e-Print Archive

P1_1 Everybody knows the moon is made of cheese...

Author: East Oliver
Fletcher Mark
Li Chak Fong Edward
Longstaff Emma
Publication venue: The University of Leicester
Publication date: 27/10/2013

This report investigates the possible implications of our moon being made of cheese, as suggested in the Wallace and Gromit film 'A Grand Day Out'. If it were the same size as the current moon, and made of Wensleydale, then it would exert 13.1 × 10^19 N less force on Earth. If it were to exert the same force then its radius would increase by 0.78 × 10^6 m, appearing 144% larger in the night sky.

University of Leicester Open Journals

Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation

Author: Cheng Yi
Leong Chak Tou
Li Wenjie
Lin Dongding
Wang Jian
Publication date: 13/10/2023

Target-oriented dialogue systems, designed to proactively steer conversations toward predefined targets or accomplish specific system-side goals, are an exciting area in conversational AI. In this work, by formulating a <dialogue act, topic> pair as the conversation target, we explore a novel problem of personalized target-oriented dialogue by considering personalization during the target accomplishment process. However, there remains an emergent need for high-quality datasets, and building one from scratch requires tremendous human effort. To address this, we propose an automatic dataset curation framework using a role-playing approach. Based on this framework, we construct a large-scale personalized target-oriented dialogue dataset, TopDial, which comprises about 18K multi-turn dialogues. The experimental results show that this dataset is of high quality and could contribute to exploring personalized target-oriented dialogue.
Comment: Accepted to EMNLP-2023 main conference

arXiv.org e-Print Archive