944 research outputs found

Identification of issues faced by international students in first year project-based engineering classes

Author: Chen Shaun
Kavanagh Lydia
Publication venue: Griffith School of Engineering, Griffith University
Publication date: 01/01/2013

TabuLa: Harnessing Language Models for Tabular Data Synthesis

Author: Birke Robert
Chen Lydia
Zhao Zilong
Publication date: 19/10/2023

Given the ubiquitous use of tabular data in industries and the growing concerns in data privacy and security, tabular data synthesis emerges as a critical research area. Recent state-of-the-art methods show that large language models (LLMs) can be adopted to generate realistic tabular data. As LLMs pre-process tabular data as full text, they have the advantage of avoiding the curse of dimensionality associated with one-hot encoding high-dimensional data. However, their long training time and limited re-usability on new tasks prevent them from replacing existing tabular generative models. In this paper, we propose Tabula, a tabular data synthesizer based on the language model structure. Through Tabula, we demonstrate the inherent limitation of employing pre-trained language models designed for natural language processing (NLP) in the context of tabular data synthesis. Our investigation delves into the development of a dedicated foundational model tailored specifically for tabular data synthesis. Additionally, we propose a token sequence compression strategy to significantly reduce training time while preserving the quality of synthetic data. Extensive experiments on six datasets demonstrate that using a language model structure without loading the well-trained model weights yields a better starting model for tabular data synthesis. Moreover, the Tabula model, previously trained on other tabular data, serves as an excellent foundation model for new tabular data synthesis tasks. Additionally, the token sequence compression method substantially reduces the model's training time. Results show that Tabula reduces training time per epoch by 46.2% on average compared to the current LLM-based state-of-the-art algorithm and consistently achieves even higher synthetic data utility.

arXiv.org e-Print Archive

Using a contextualised English support programme to assist international engineering students

Author: Chen Shaun
Gollagher Susan
Kavanagh Lydia
Reidsema Carl
Publication venue: School of Engineering, Deakin University
Publication date: 01/01/2015

University of Queensland eSpace

Virtualization in the Private Cloud: State of the Practice

Author: Birke Robert
Chen Lydia Y.
Podzimek Andrej
Smirni Evgenia
Publication venue: W&M ScholarWorks
Publication date: 01/01/2016

Virtualization has become a mainstream technology that allows efficient and safe resource sharing in data centers. In this paper, we present a large-scale workload characterization study of 90K virtual machines hosted on 8K physical servers, across several geographically distributed corporate data centers of a major service provider. The study covers 19 days of operation and focuses on the state of the practice, i.e., how virtual machines are deployed across different physical resources, with an emphasis on processors and memory. It examines resource sharing and usage of physical resources, virtual machine life cycles, and migration patterns and their frequencies. This paper illustrates that there is a strong tendency toward over-provisioning CPU and memory resources, while certain virtualization features (e.g., migration and collocation) are used rather conservatively, showing that there is significant room for the development of policies that aim to reduce operational costs in data centers.

College of William & Mary: W&M Publish