UCT Computer Science Research Document Archive
    1,269 research outputs found

    Evolving Swarm-Robotic Behavioral Allocations

    This study investigates comparative methods for two-step collective behavior evolution (evolving group behaviors from pre-evolved behaviors) to encourage the evolution of behavioral diversity in swarm-robotic applications. Specifically, we investigate the evolution of behavioral diversity, given pre-evolved behaviors, in collective behaviors that must remain effective across increasingly complex and difficult collective herding task environments. Results indicate that specific complements of pre-evolved (lower task-performance) collective herding behaviors were suitable for achieving high task performance across all environments and task difficulty levels. Results support the efficacy of the two-step approach for evolving behaviorally heterogeneous groups in collective behavior tasks that benefit from groups comprising various complementary behaviors.
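
    As a rough illustration of the second evolutionary step, the sketch below evolves an allocation of pre-evolved behaviours to group members with a simple genetic algorithm. The behaviour pool, group size, and diversity-rewarding fitness function are placeholders, not the paper's herding simulation.

```python
import random

N_BEHAVIOURS = 5   # hypothetical pool of pre-evolved herding behaviours
GROUP_SIZE = 10    # hypothetical number of robots per group

def fitness(allocation):
    """Stand-in for group task performance; the paper would measure this
    in a collective-herding simulation."""
    return len(set(allocation)) + 0.1 * random.random()  # toy diversity proxy

def evolve(pop_size=50, generations=100, mutation_rate=0.1):
    # Each genome assigns one pre-evolved behaviour to each robot.
    population = [[random.randrange(N_BEHAVIOURS) for _ in range(GROUP_SIZE)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]        # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, GROUP_SIZE)    # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(GROUP_SIZE):              # per-gene mutation
                if random.random() < mutation_rate:
                    child[i] = random.randrange(N_BEHAVIOURS)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

print(evolve())  # best behavioural allocation found
```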

    Molecular modeling of Group B Streptococcus type II and III capsular polysaccharides explains low filter retention of type II and lack of cross-reactivity with type III

    Group B Streptococcus (GBS) is a bacterial pathogen associated with significant morbidity and mortality in pregnant women and infants, particularly in resource-limited settings. A hexavalent vaccine candidate in development incorporates the capsular polysaccharides (CPSs) from the most prevalent serotypes: Ia, Ib, II, III, IV, and V. Vaccine production is facilitated by a standardized CPS purification process. In the final purification step, a 30 kDa membrane filter gives high-yield recovery for five of the six CPSs, but <50% for type II (GBSII), despite similar CPS structure and size. However, a smaller 10 kDa membrane improves recovery to about 90%, suggesting that CPS conformation affects retention. Here, comparative molecular modeling, corroborated by through-space NMR correlations, reveals that GBSII forms compact, globular conformations, while type III (GBSIII) forms an elongated zig-zag. This explains GBSII's poor retention during filtration: GBSII's compact globules pass through the 30 kDa membrane more easily than GBSIII's elongated forms. Additionally, we identify distinct epitopes and compare their interactions with a GBSIII-specific fragment antibody to clarify the lack of cross-reactivity between GBSII and GBSIII. This work provides valuable mechanistic insight into physically observed behavior to inform the development of multivalent GBS vaccines and reduce maternal and infant mortality.
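
    To illustrate the kind of conformational distinction described, the sketch below compares the radius of gyration of a compact coil against an extended zig-zag chain. The coordinates are synthetic stand-ins, not the actual GBS CPS structures or the paper's modeling protocol.

```python
import numpy as np

def radius_of_gyration(coords):
    """Rg of a point set (equal masses assumed)."""
    centred = coords - coords.mean(axis=0)
    return np.sqrt((centred ** 2).sum(axis=1).mean())

rng = np.random.default_rng(0)
# Synthetic stand-ins, not real CPS structures:
globule = rng.normal(scale=1.0, size=(200, 3))        # compact, GBSII-like
zigzag = np.stack([0.5 * np.arange(200),              # extended, GBSIII-like
                   np.where(np.arange(200) % 2, 0.5, -0.5),
                   np.zeros(200)], axis=1)

print(f"globular Rg: {radius_of_gyration(globule):.2f}")   # small
print(f"zig-zag  Rg: {radius_of_gyration(zigzag):.2f}")    # large
```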

    Story Generation with Large Language Models for African Languages

    The development of Large Language Models (LLMs) for African languages has been hindered by the lack of large-scale textual data. Previous research has shown that relatively small language models, when trained on synthetic data generated by larger models, can produce fluent, short English stories, providing a data-efficient alternative to large-scale pretraining. In this paper, we apply a similar approach to develop and evaluate small language models for generating children’s stories in isiZulu and Yoruba, using synthetic datasets created through translation and multilingual prompting. We train six language-specific models, based on the GPT-2 architecture, that vary in dataset size and source. Our results show that models trained on synthetic low-resource data are capable of producing coherent and fluent short stories in isiZulu and Yoruba. Models trained on larger synthetic datasets generally perform better in terms of coherence and grammar, and also tend to generalize better, as seen in their lower evaluation perplexities. Models trained on datasets generated through prompting instead of translation generate similar or more coherent stories and display more creativity, but perform worse in terms of generalization to unseen data. In addition to the potential educational applications of automated story generation, our approach could serve as the foundation for more data-efficient low-resource language models.
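
    A minimal sketch of the perplexity evaluation mentioned above, with the public "gpt2" checkpoint standing in for the paper's isiZulu and Yoruba models, whose names are not given here:

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # stand-in checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return math.exp(loss.item())

# Lower perplexity on held-out stories indicates better generalization.
print(perplexity("Once upon a time, a small tortoise set out to find water."))
```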

    GASNet: Geometric Robust Adaptive Spatial-Enhanced Network for Building Extraction

    Accurately extracting building information from high-resolution remote sensing images is of great significance in urban planning, land use management, and related fields. However, shadow interference, tree occlusion, and the complex, diverse structures of buildings make fast and accurate building extraction from remote sensing images highly challenging. To address this challenge, this paper proposes GASNet, a novel framework designed to enhance feature representation through global spatial dependency modeling and multiscale boundary refinement. First, we introduce the Dual-Scale Dependency Module (DSDM), which leverages graph-based reasoning in non-Euclidean space to dynamically aggregate local and global spatial dependencies while suppressing redundant features. In addition, the Scale-Aware Efficient Attention (SAEA) mechanism is proposed to enhance feature representation along both horizontal and vertical directions, enabling comprehensive boundary information capture. By leveraging the integrity of buildings, it effectively mitigates occlusion-induced interference, significantly improving the accuracy and completeness of building extraction. Extensive experiments on three benchmark datasets (WHU, Massachusetts, and Inria Aerial) demonstrate the superiority of GASNet. Our method achieves state-of-the-art performance, surpassing existing approaches by margins of 1.73% IoU on WHU, 0.48% IoU on Massachusetts, and 1.28% IoU on Inria.
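
    The published SAEA module is not reproduced here, but the sketch below illustrates the general idea of attention computed along the horizontal and vertical axes of a feature map; the pooling-and-gating design is our simplification, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DirectionalAttention(nn.Module):
    """Toy attention along horizontal and vertical axes of a feature map;
    inspired by the description of SAEA, not the published module."""
    def __init__(self, channels):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                      # x: (B, C, H, W)
        h_ctx = x.mean(dim=3, keepdim=True)    # pool along width  -> (B, C, H, 1)
        w_ctx = x.mean(dim=2, keepdim=True)    # pool along height -> (B, C, 1, W)
        attn = torch.sigmoid(self.conv_h(h_ctx)) * torch.sigmoid(self.conv_w(w_ctx))
        return x * attn                        # broadcast reweighting

x = torch.randn(1, 16, 64, 64)
print(DirectionalAttention(16)(x).shape)       # torch.Size([1, 16, 64, 64])
```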

    On the usage of semantics, syntax, and morphology for noun classification in isiZulu

    There is limited work aimed at solving the core task of noun classification for Nguni languages. The task focuses on identifying the semantic categorisation of each noun and plays a crucial role in the ability to form semantically and morphologically valid sentences. The work by Byamugisha (2022) was the first to tackle the problem for a related, but non-Nguni, language. While there have been efforts to replicate it for a Nguni language, no work has compared the technique used in the original work with contemporary neural methods or with traditional machine learning classification techniques that do not rely on human-guided knowledge to the same extent. We reproduce Byamugisha (2022)’s work with different configurations to account for differences in access to datasets and resources, and compare the approach with a pre-trained transformer-based model and with traditional machine learning models that rely on less human-guided knowledge. The newly created data-driven models outperform the knowledge-infused models, with the best-performing models achieving an F1 score of 0.97.
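
    As a hypothetical illustration of the traditional machine-learning side of this comparison, the sketch below trains a character n-gram classifier on placeholder nouns and scores it with macro F1. The nouns, classes, and split are toy examples, not the paper's dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Placeholder nouns and noun classes, not the paper's dataset:
train_nouns = ["umuntu", "abantu", "umuthi", "imithi", "into", "izinto"]
train_classes = ["1", "2", "3", "4", "9", "10"]
test_nouns, test_classes = ["umfana", "abafana"], ["1", "2"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),  # prefix-sensitive
    LogisticRegression(max_iter=1000),
)
model.fit(train_nouns, train_classes)
print(f1_score(test_classes, model.predict(test_nouns), average="macro"))
```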

    Herds From Video: Learning a Microscopic Herd Model From Macroscopic Motion Data

    We present a method for animating herds that automatically tunes a microscopic herd model based on a short video clip of real animals. Our method handles videos with dense herds, where individual animal motion cannot be separated out. Our contribution is a novel framework for extracting macroscopic herd behaviour from such video clips and then deriving the microscopic agent parameters that best match this behaviour. To support this learning process, we extend standard agent models to provide a separation between leaders and followers, better match the occlusion and field-of-view limitations of real animals, support differentiable parameter optimization, and improve authoring control. We validate the method by showing that, once optimized, the social force and perception parameters of the resulting herd model are accurate enough to predict subsequent frames in the video, even for macroscopic properties not directly incorporated in the optimization process. Furthermore, the extracted herding characteristics can be applied to any terrain with a palette and region-painting approach that generalizes to different herd sizes and leader trajectories. This enables the authoring of herd animations in new environments while preserving learned behaviour.
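
    The sketch below shows, under simplified assumptions, how a social-force step can be made differentiable so that gradients of a macroscopic loss (here, herd spread) flow back to the force parameters. The force model and weights are illustrative, not the paper's.

```python
import torch

# Learnable force weights (illustrative, not the paper's parameterisation):
cohesion = torch.tensor(0.5, requires_grad=True)
separation = torch.tensor(1.0, requires_grad=True)

def step(pos, vel, dt=0.1):
    to_centroid = pos.mean(dim=0) - pos                 # cohesion force
    diff = pos[:, None, :] - pos[None, :, :]            # pairwise offsets
    dist2 = (diff ** 2).sum(dim=-1, keepdim=True) + 1e-6
    repel = (diff / dist2).sum(dim=1)                   # separation force
    vel = vel + dt * (cohesion * to_centroid + separation * repel)
    return pos + dt * vel, vel

pos, vel = torch.randn(20, 2), torch.zeros(20, 2)
new_pos, _ = step(pos, vel)
# Fit a macroscopic property (herd spread) rather than per-agent tracks:
loss = ((new_pos.std(dim=0) - torch.tensor([2.0, 2.0])) ** 2).sum()
loss.backward()
print(cohesion.grad, separation.grad)  # gradients w.r.t. force weights
```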

    Neural Morphological Tagging for Nguni Languages

    Morphological parsing is the task of decomposing words into morphemes, the smallest units of meaning in a language, and labelling their grammatical roles. It is a particularly challenging task for agglutinative languages, such as the Nguni languages of South Africa, which construct words by concatenating multiple morphemes. A morphological parsing system can be framed as a pipeline with two separate components: a segmenter followed by a tagger. This paper investigates the use of neural methods to build morphological taggers for the four Nguni languages. We compare two classes of approaches: training neural sequence labellers (LSTMs and neural CRFs) from scratch and finetuning pretrained language models. We compare performance across these two categories, as well as against a traditional rule-based morphological parser. Neural taggers comfortably outperform the rule-based baseline, and models trained from scratch tend to outperform pretrained models. We also compare parsing results across different upstream segmenters and with varying linguistic input features. Our findings confirm the viability of employing neural taggers based on pre-existing morphological segmenters for the Nguni languages.
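
    A minimal sketch of a from-scratch neural sequence labeller of the kind described (a BiLSTM over morpheme IDs); vocabulary size, tagset, and dimensions are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MorphTagger(nn.Module):
    """Toy BiLSTM morpheme tagger; vocabulary, tagset, and sizes are
    placeholders, not the paper's configuration."""
    def __init__(self, vocab_size, n_tags, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * dim, n_tags)

    def forward(self, morpheme_ids):               # (batch, seq_len)
        hidden, _ = self.lstm(self.embed(morpheme_ids))
        return self.out(hidden)                    # per-morpheme tag scores

tagger = MorphTagger(vocab_size=1000, n_tags=20)
scores = tagger(torch.randint(0, 1000, (1, 4)))    # e.g. ngi-ya-bon-a
print(scores.shape)                                # torch.Size([1, 4, 20])
```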

    Designing and Contextualising Probes for African Languages

    Pretrained language models (PLMs) for African languages are continually improving, but the reasons behind these advances remain unclear. This paper presents the first systematic investigation into how knowledge about African languages is encoded in PLMs. We train layer-wise probes for six typologically diverse African languages to analyse how linguistic features are distributed. We also design control tasks, a way to interpret probe performance, for the MasakhaPOS dataset. We find that PLMs adapted for African languages encode more linguistic information about target languages than massively multilingual PLMs do. Our results reaffirm previous findings that token-level syntactic information concentrates in middle-to-last layers, while sentence-level semantic information is distributed across all layers. Through control tasks and probing baselines, we confirm that probe performance reflects the internal knowledge of PLMs rather than probe memorisation. Our study applies established interpretability techniques to African-language PLMs. In doing so, we highlight the internal mechanisms underlying the success of strategies like active learning and multilingual adaptation.
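
    A hypothetical layer-wise probing setup is sketched below, with xlm-roberta-base standing in for the probed PLMs and a toy singular/plural task standing in for MasakhaPOS:

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

# xlm-roberta-base stands in for the probed PLMs; the task below is a toy
# singular/plural distinction, not MasakhaPOS.
tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
plm = AutoModel.from_pretrained("xlm-roberta-base")

def layer_features(sentences, layer):
    feats = []
    for s in sentences:
        batch = tok(s, return_tensors="pt")
        with torch.no_grad():
            hidden = plm(**batch, output_hidden_states=True).hidden_states
        feats.append(hidden[layer][0].mean(dim=0).numpy())  # mean-pool tokens
    return feats

sents = ["Umfundi uyafunda.", "Abafundi bayafunda.",
         "Inja iyagijima.", "Izinja ziyagijima."]
labels = [0, 1, 0, 1]  # singular vs. plural (toy labels)
for layer in (1, 6, 12):
    probe = LogisticRegression(max_iter=1000)
    probe.fit(layer_features(sents, layer), labels)
    print(layer, probe.score(layer_features(sents, layer), labels))
```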

    Semi-automatic linguistic annotation for lexicography with machine learning

    Many terminology lists for South African languages provide only the translations of each term, without grammatical information such as part-of-speech and noun class. This makes it difficult to know how to use these words correctly in context. Annotating these terminology lists manually requires linguistic expertise and can be costly, time-intensive, and error-prone. Instead, we propose annotating these terminology lists using a machine-learning classifier. An expert will then review the generated output to ensure accuracy. We will apply this approach to isiXhosa and integrate the results into the IsiXhosa.click online dictionary (Marquard 2024). This advances the annotation of lexicographic works by making it easier to add linguistic information to dictionaries.
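
    One plausible shape for the proposed workflow, sketched under toy data: a classifier annotates terms, and low-confidence predictions are queued for expert review. The terms, labels, and confidence threshold are illustrative, not the project's actual setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data; real part-of-speech and noun-class labels would come
# from an annotated lexicon:
terms = ["umfundi", "ukufunda", "incwadi", "ukubhala"]
labels = ["noun", "verb", "noun", "verb"]
clf = make_pipeline(CountVectorizer(analyzer="char", ngram_range=(1, 3)),
                    MultinomialNB())
clf.fit(terms, labels)

def triage(classifier, new_terms, threshold=0.9):
    """Accept confident predictions; queue the rest for expert review."""
    probs = classifier.predict_proba(new_terms)
    preds = classifier.classes_[probs.argmax(axis=1)]
    auto, review = [], []
    for term, pred, p in zip(new_terms, preds, probs.max(axis=1)):
        (auto if p >= threshold else review).append((term, pred))
    return auto, review

accepted, queued = triage(clf, ["umsebenzi", "ukudlala"])
print("auto-annotated:", accepted, "| for expert review:", queued)
```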

    IsiZulu noun classification based on replicating the ensemble approach for Runyankore

    A noun’s class is a crucial component in NLP because it governs agreement across the sentence in Niger-Congo B (NCB) languages, among others. There is a lack of computational models for determining a noun’s class, owing to the poor documentation of most NCB languages. A promising approach by Byamugisha (2022) used a data-driven method for Runyankore that combined syntax and semantics. However, the code and data are inaccessible, and it remains to be seen whether the approach is suitable for other NCB languages. We address this by reproducing Byamugisha’s experiment for isiZulu. We conducted this as two independent experiments, so that we could also subject it to a meta-analysis. Results showed that the work was reproducible only in part, mainly due to imprecision in the original description and the current impossibility of generating the same kind of source dataset from an existing grammar. The different choices made in attempting to reproduce the pipeline, as well as differences in the choice of training and test data, had a large effect on the eventual accuracy of noun class disambiguation, but the best configuration achieved an accuracy of 83%, in the same range as for Runyankore.
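
    As a rough sketch of what an ensemble over complementary feature views might look like (not Byamugisha's actual pipeline), the example below soft-votes two character n-gram classifiers over placeholder nouns:

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Two complementary feature views over placeholder nouns (not the paper's
# data): short prefix n-grams vs. longer whole-word n-grams.
prefix_view = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(2, 3)),
                            LogisticRegression(max_iter=1000))
word_view = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
                          LogisticRegression(max_iter=1000))
ensemble = VotingClassifier([("prefix", prefix_view), ("word", word_view)],
                            voting="soft")  # average class probabilities

nouns = ["umuntu", "abantu", "isikole", "izikole", "inja", "izinja"]
classes = ["1", "2", "7", "8", "9", "10"]
ensemble.fit(nouns, classes)
print(ensemble.predict(["umngane"]))  # prefix um- suggests class 1 or 3
```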

    1,071 full texts
    1,269 metadata records
    Updated in the last 30 days.
    UCT Computer Science Research Document Archive is based in South Africa.