
    Classification and visualisation of text documents using networks

    Graph/network theoretic methods can be applied effectively to both text classification and text visualisation. For text classification, we assessed the effectiveness of graph/network summary statistics for developing weighting schemes and features to improve test accuracy. For text visualisation, we developed a framework using established visual cues from the graph visualisation literature to communicate information intuitively. The final output of the visualisation component of the dissertation was a tool that allows members of the public to produce a visualisation from a text document. We represented a text document as a graph/network: the words were nodes, and an edge was created when a pair of words appeared within a pre-specified distance (window) of each other. The text document model is a matrix representation of a document collection that can be integrated into a machine or statistical learning algorithm. The entries of this matrix can be weighted according to various schemes. We used the graph/network representation of a text document to create features and weighting schemes that could be applied to the text document model. This approach was not well developed for text classification, so we applied different edge weighting methods, window sizes, weighting schemes and features. We also applied three machine learning algorithms: naïve Bayes, neural networks and support vector machines. We compared our various graph/network approaches to the traditional document model with term frequency-inverse document frequency weighting. We were interested in establishing whether the use of graph weighting schemes and graph features could increase test accuracy for text classification tasks. As far as we can tell from the literature, this is the first attempt to use graph features to weight bag-of-words features for text classification. These methods had been applied to information retrieval (Blanco & Lioma, 2012). 
It seemed they could also be applied to text classification. The text visualisation field seemed divorced from the text summarisation and information retrieval fields, in that text co-occurrence relationships were not treated with equal importance. Developments in the graph/network visualisation literature could be taken advantage of for the purposes of text visualisation. We created a framework for text visualisation using the graph/network representation of a text document. We used force-directed algorithms to visualise the document. We used established visual cues such as colour, size and proximity in space to convey information through the visualisation. We also applied clustering and part-of-speech tagging to allow specific information within the visualised document to be filtered and isolated. We demonstrated this framework with four example texts. We found that total degree, a graph weighting scheme, outperformed term frequency on average. The effect of graph features depended heavily on the machine learning method used: for the problems we considered, graph features increased accuracy for SVM classifiers, had little effect for neural networks and decreased accuracy for naïve Bayes classifiers. The impact on test accuracy of adding graph features to the document model is therefore dependent on the machine learning algorithm used. The visualisation of text graphs is able to convey meaningful information regarding the text at a glance through established visual cues. Related words are close together in visual space and often connected by thick edges. Large nodes often represent important words. Modularity clustering is able to extract thematically consistent clusters from text graphs. This allows the clusters to be isolated and investigated individually to understand specific themes within a document. 
Part-of-speech tagging is effective both in reducing the number of words displayed and in increasing the relevance of the words displayed. This was made clear when part-of-speech tags were applied to the Internal Resistance of Apartheid Wikipedia webpage. The webpage was reduced to its proper nouns, which contained much of the important information in the text. Training accuracy is important in text classification, a task that is often performed on vast numbers of documents. Much of the research in text classification is aimed at increasing classification accuracy, either through feature engineering or by optimising machine learning methods. The finding that total degree outperformed term frequency on average provides an alternative avenue for achieving higher test accuracy. The finding that adding graph features can increase test accuracy when matched with the right machine learning algorithm suggests that further research should be conducted into the role graph features can play in text classification. Text visualisation is used as an exploratory tool and as a means of quickly and easily conveying text information. The framework we developed is able to create automated text visualisations that intuitively convey information for short and long text documents. This can greatly reduce the time it takes to assess the content of a document, which can increase general access to information.
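
The graph representation described in this abstract (words as nodes, an edge whenever two words fall within a fixed window) can be illustrated with a minimal sketch. The function names, the default window size and the reading of "total degree" as the number of distinct neighbours in an undirected graph are assumptions made for illustration; the dissertation may define these differently, for example on a directed graph.

```python
from collections import Counter, defaultdict


def cooccurrence_graph(tokens, window=2):
    """Build an undirected word graph from a token list.

    An edge joins two distinct words that occur within `window`
    positions of each other; its weight counts how often that happens.
    """
    edges = defaultdict(int)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:  # skip self-loops
                edges[frozenset((word, other))] += 1
    return dict(edges)


def degree_weights(edges):
    """Weight each word by its number of distinct neighbours.

    Assumption: this stands in for the 'total degree' scheme named
    in the abstract.
    """
    degree = Counter()
    for pair in edges:
        for word in pair:
            degree[word] += 1
    return degree


tokens = "the cat sat on the mat".split()
graph = cooccurrence_graph(tokens, window=2)
degree = degree_weights(graph)       # graph-based term weights
term_frequency = Counter(tokens)     # traditional term-frequency weights
```

Either `degree` or `term_frequency` could then fill the row for this document in a document-term matrix, which is how a graph weighting scheme would replace plain term frequency in the document model.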

    Mashing with unmalted sorghum using a novel low temperature enzyme system: impacts of sorghum grain composition and microstructure

    Brewing lager beers from unmalted sorghum traditionally requires the use of high-temperature mashing and exogenous enzymes to ensure adequate starch conversion. Here, a novel low-temperature mashing system is compared with a more traditional mash in terms of the wort quality produced (laboratory scale) from five unmalted sorghums (two brewing and three non-brewing varieties). The low-temperature mash generated worts of comparable quality to those resulting from a traditional energy-intensive mash protocol. Furthermore, its performance was less dependent on sorghum raw material quality, such that it may facilitate the use of what were previously considered non-brewing varieties. Whilst brewing sorghums were of lower protein content, protein per se did not correlate with mashing performance. Rather, it was the way in which protein was structured (particularly the strength of protein-starch interactions) that most influenced brewing performance. The RVA profile was the easiest way of identifying this characteristic as potentially problematic.

    The Maltase Involved in Starch Metabolism in Barley Endosperm Is Encoded by a Single Gene

    During germination and early seedling growth of barley (Hordeum vulgare), maltase is responsible for the conversion of maltose produced by starch degradation in the endosperm to glucose for seedling growth. Despite the potential relevance of this enzyme for malting and the production of alcoholic beverages, neither the nature nor the role of maltase is fully understood. Although only one gene encoding maltase has been identified with certainty, there is evidence for the existence of other genes and for multiple forms of the enzyme. It has been proposed that maltase may be involved directly in starch granule degradation as well as in maltose hydrolysis. The aim of our work was to discover the nature of maltase in barley endosperm. We used ion exchange chromatography to fractionate maltase activity from endosperm of young seedlings, and we partially purified activity for protein identification. We compared maltase activity in wild-type barley and transgenic lines with reduced expression of the previously characterised maltase gene Agl97, and we used genomic and transcriptomic information to search for further maltase genes. We show that all of the maltase activity in the barley endosperm can be accounted for by a single gene, Agl97. Multiple forms of the enzyme most likely arise from proteolysis and other post-translational modifications.

    An approach to developing a prediction model of fertility intent among HIV-positive women and men in Cape Town, South Africa: a case study

    As a ‘case study’ to demonstrate an approach to establishing a fertility-intent prediction model, we used data collected from recently diagnosed HIV-positive women (N = 69) and men (N = 55) who reported inconsistent condom use and were enrolled in a sexual and reproductive health intervention in public sector HIV care clinics in Cape Town, South Africa. Three theoretically driven prediction models showed reasonable sensitivity (0.70–1.00), specificity (0.66–0.94), and area under the receiver operating characteristic curve (0.79–0.89) for predicting fertility intent at the 6-month visit. A k-fold cross-validation approach was employed to reduce bias due to over-fitting of data in estimating sensitivity, specificity, and area under the curve. We discuss how the methods presented might be used in future studies to develop a clinical screening tool to identify HIV-positive individuals who are likely to have future fertility intent and who could therefore benefit from sexual and reproductive health counselling around fertility options.
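
The role of k-fold cross-validation in this abstract can be sketched as follows: every participant is predicted by a model fitted without that participant, and sensitivity and specificity are computed from those held-out predictions. This is a generic illustration only. The function names, the `fit` and `predict` callables and the unshuffled contiguous folds are assumptions; the abstract does not state the value of k, the model form or how folds were drawn.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size.

    In practice the data would usually be shuffled (or stratified) first.
    """
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_predictions(X, y, k, fit, predict):
    """Predict every observation with a model that never saw it in fitting.

    `fit(X_train, y_train)` returns a model and `predict(model, x)`
    returns 0 or 1; both are hypothetical callables supplied by the caller.
    """
    preds = [None] * len(y)
    for fold in k_fold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = predict(model, X[i])
    return preds


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

Because each prediction comes from a model that did not see that observation, the resulting sensitivity and specificity are less optimistic than estimates computed on the data used for fitting.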

    Factors Influencing Pregnancy Desires among HIV Positive Women in Sibande District in Mpumalanga, South Africa.

    Fertility issues for HIV-positive women are becoming increasingly important. The study investigated the pregnancy desires of HIV-positive women in Gert Sibande District in Mpumalanga, South Africa. The objective of the study was to present findings on factors influencing pregnancy desires amongst HIV-positive women who had participated in the Prevention of Mother to Child Transmission of HIV programme. A cross-sectional survey was conducted. Interviews were conducted at 47 public health facilities in Gert Sibande District between September 2008 and March 2009. Participants were 815 HIV-infected mothers attending postnatal care with babies aged 3-6 months. Women in the current study had poor knowledge about HIV transmission from mother to child. We found that only 16.6% had a desire to have children. In multivariable regression analysis, the desire to have children was associated with having fewer children, having discussed family planning, having a current partner who knew his HIV status, and not knowing the HIV status of their infant. The main family planning methods currently used were injection (54.8%), followed by condom (33.9%), the pill (22%) and female condom (14.6%). Women with HIV who desire to have children face risks that need special consideration. Family planning for HIV-infected women should be promoted and improved in postnatal care. Key words: Prevention of Mother to Child Transmission of HIV programme (PMTCT), pregnancy desires, family planning, male involvement, HIV knowledge, HIV-positive mothers