82 research outputs found

    Role of semantic indexing for text classification.

    The Vector Space Model (VSM) of text representation suffers a number of limitations for text classification. Firstly, the VSM is based on the Bag-Of-Words (BOW) assumption where terms from the indexing vocabulary are treated independently of one another. However, the expressiveness of natural language means that lexically different terms often have related or even identical meanings. Thus, failure to take into account the semantic relatedness between terms means that document similarity is not properly captured in the VSM. To address this problem, semantic indexing approaches have been proposed for modelling the semantic relatedness between terms in document representations. Accordingly, in this thesis, we empirically review the impact of semantic indexing on text classification. This empirical review allows us to answer one important question: how beneficial is semantic indexing to text classification performance. We also carry out a detailed analysis of the semantic indexing process which allows us to identify reasons why semantic indexing may lead to poor text classification performance. Based on our findings, we propose a semantic indexing framework called Relevance Weighted Semantic Indexing (RWSI) that addresses the limitations identified in our analysis. RWSI uses relevance weights of terms to improve the semantic indexing of documents. A second problem with the VSM is the lack of supervision in the process of creating document representations. This arises from the fact that the VSM was originally designed for unsupervised document retrieval. An important feature of effective document representations is the ability to discriminate between relevant and non-relevant documents. For text classification, relevance information is explicitly available in the form of document class labels. Thus, more effective document vectors can be derived in a supervised manner by taking advantage of available class knowledge. 
Accordingly, we investigate approaches for utilising class knowledge for supervised indexing of documents. Firstly, we demonstrate how the RWSI framework can be utilised for assigning supervised weights to terms for supervised document indexing. Secondly, we present an approach called Supervised Sub-Spacing (S3) for supervised semantic indexing of documents. A further limitation of the standard VSM is that an indexing vocabulary that consists only of terms from the document collection is used for document representation. This is based on the assumption that terms alone are sufficient to model the meaning of text documents. However, for certain classification tasks, terms are insufficient to adequately model the semantics needed for accurate document classification. A solution is to index documents using semantically rich concepts. Accordingly, we present an event extraction framework called Rule-Based Event Extractor (RUBEE) for identifying and utilising event information for concept-based indexing of incident reports. We also demonstrate how certain attributes of these events, e.g. negation, can be taken into consideration to distinguish between documents that describe the occurrence of an event and those that mention the non-occurrence of that event.
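The bag-of-words limitation described in this abstract can be illustrated with a small sketch. This is not the thesis's RWSI or S3 method; it is only a plain term-frequency Vector Space Model with cosine similarity, and the example sentences and the function names `bow_vector` and `cosine` are assumptions made for illustration.

```python
import math
from collections import Counter


def bow_vector(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenised text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical documents: doc_a and doc_b mean roughly the same thing
# but share no terms; doc_a and doc_c share terms but differ in meaning.
doc_a = bow_vector("the car was repaired")
doc_b = bow_vector("an automobile got fixed")
doc_c = bow_vector("the car was sold")

print(cosine(doc_a, doc_b))  # no shared terms, so the similarity is zero
print(cosine(doc_a, doc_c))  # three of four terms shared
```

Because terms are treated independently, the paraphrase pair scores lower than the pair that merely shares surface vocabulary, which is the gap that semantic indexing approaches aim to close.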

    Edge Detection: A Collection of Pixel based Approach for Colored Images

    The existing traditional edge detection algorithms process a single pixel of an image at a time, calculating a value that gives the edge magnitude and the edge orientation at that pixel. Most of these algorithms convert coloured images to gray scale before detecting edges. However, this conversion leads to imprecise edge recognition, producing false and broken edges in the image. This paper presents a profile modelling scheme for a collection of pixels based on step and ramp edges, with a view to reducing the false and broken edges present in the image. The resulting collection-of-pixels scheme is used with Vector Order Statistics to reduce the imprecision of recognized edges that arises when converting from coloured to gray scale images. The Pratt Figure of Merit (PFOM) is used as a quantitative comparison between the existing traditional edge detection algorithms and the developed algorithm as a means of validation. The PFOM value obtained for the developed algorithm is 0.8480, which is an improvement over the existing traditional edge detection algorithms. Comment: 5 pages
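For readers unfamiliar with the validation measure named above, the following is a minimal sketch of the Pratt Figure of Merit as it is commonly defined: the sum of 1 / (1 + alpha * d^2) over detected edge pixels, where d is the distance to the nearest ideal edge pixel, divided by the larger of the two edge-pixel counts. The scaling constant `alpha = 1/9` is the conventional choice and is an assumption here, since the abstract does not state the value used; the function name and the pixel-set representation are also illustrative.

```python
def pratt_fom(ideal_edges, detected_edges, alpha=1.0 / 9.0):
    """Pratt Figure of Merit between two collections of (row, col) edge pixels.

    Returns a value in (0, 1]; 1.0 means the detected edges coincide with
    the ideal edges. alpha = 1/9 is the conventional constant (assumed).
    """
    ideal = list(ideal_edges)
    detected = list(detected_edges)
    if not ideal or not detected:
        return 0.0
    total = 0.0
    for r, c in detected:
        # Squared distance from this detected pixel to the nearest ideal pixel.
        d2 = min((r - ri) ** 2 + (c - ci) ** 2 for ri, ci in ideal)
        total += 1.0 / (1.0 + alpha * d2)
    return total / max(len(ideal), len(detected))


# Hypothetical example: a vertical edge detected three columns off.
ideal = [(0, 0), (1, 0), (2, 0)]
shifted = [(0, 3), (1, 3), (2, 3)]
print(pratt_fom(ideal, ideal))
print(pratt_fom(ideal, shifted))
```

The brute-force nearest-pixel search is quadratic in the number of edge pixels; a distance transform would be the usual choice for full-size images.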

    Locality sensitive batch selection for triplet networks.

    Triplet networks are deep metric learners which learn to optimise a feature space using similarity knowledge gained from training on triplets of data simultaneously. The architecture relies on the triplet loss function to optimise its weights based upon the distance between triplet members. Composition of input triplets therefore directly impacts the quality of the learned representations, meaning that a training scheme which optimises their formation is crucial. However, an exhaustive search for the best triplets is prohibitive unless the search for triplets is confined to smaller training regions or batches. Accordingly, current triplet mining approaches use informed selection applied only to a random minibatch, but the resulting view fails to exploit areas of complexity in the feature space. In this work, we introduce a locality-sensitive batching strategy, which uses the locality of examples to create batches as an alternative to the commonly adopted random minibatching. Our results demonstrate that this method offers better performance on three image and two text classification tasks, with statistical significance. Importantly, most of these gains are incrementally realised with as little as 25% of the training iterations.
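The triplet loss mentioned in this abstract can be sketched as follows. This shows only the standard hinge form of the loss on fixed embedding vectors; it does not implement the paper's locality-sensitive batching, whose details the abstract does not give. The margin value and the use of unsquared Euclidean distance are assumptions (some formulations use squared distances).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings.
anchor, positive = (0.0, 0.0), (0.0, 1.0)
easy_negative = (3.0, 4.0)   # far from the anchor: contributes no loss
hard_negative = (0.0, 1.5)   # close to the anchor: contributes loss
print(triplet_loss(anchor, positive, easy_negative))
print(triplet_loss(anchor, positive, hard_negative))
```

Triplets that already satisfy the margin yield zero loss and no gradient, which is why the composition of triplets within a batch matters for training.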

    Job satisfaction and agripreneurial success of microfinance-beneficiary small-scale rice processors in Jigawa State, Nigeria

    Limited research information on the job satisfaction and enterprise success (remunerative business as a going concern) of rice value chain actors, especially processors, has been a challenge to the sustainability of the policy-supported rice value chain in the study area. This necessitates research to identify possible pitfalls and to proffer viable scientific remedies that will enhance the sustainability of the entire rice value chain in the study area. Using cross-sectional data obtained from 133 par-boilers and 67 millers through a well-structured questionnaire and interview schedule, the job satisfaction and agripreneurial success of microfinance-benefitted rice processors in Nigeria's Jigawa State were investigated. Unlike the millers, the majority of the par-boilers were not satisfied with the job, owing mainly to poor job security and the disincentive attitude of the supporting institutions. However, across the study's target groups, despite a few hitches, the majority of the enterprises were found to be successful, owing to, among other factors, the remunerative turnover ratio of the enterprise. Nonetheless, vulnerable household composition, which exacerbates the pressure on limited resources with negative consequences for the income capital base, affected the job satisfaction and agripreneurial success of the processors. Therefore, the study advises policymakers to strengthen macro-economic policies so as to enhance the sustainability of the entire rice value chain in the study area.
Also, stakeholders involved in policymaking need to intensify their campaign on the importance of sustainable livelihoods by encouraging most of the actors to maintain a fair household size.

    Design, implementation, and evaluation of school-based sexual health education interventions in sub-Saharan Africa

    School-based sexual health education is commonly used to promote the sexual health of young people and guide them in their relationships. This thesis reports on research that aimed to provide evidence-based recommendations to optimise the effectiveness of school-based sexual health education in sub-Saharan Africa (sSA). There are six chapters in the thesis. Chapter 1 introduces the thesis, Chapters 2 to 5 consist of four empirical studies, and Chapter 6 provides an overall discussion and looks at the strengths, limitations, and implications of the findings. Chapter 2 is a systematic review and meta-analysis of school-based sexual health education in sSA. It provides some evidence of the effectiveness of the interventions in promoting self-reported condom use. However, it shows there are no harmful or beneficial effects with respect to sexually transmitted infections (STI) as evidenced by biomarkers. It highlights the paucity of evaluated interventions using biomedical markers, and reports on the process of evaluation, which limits our understanding of why interventions work or do not work. Features associated with effective interventions are noted. Chapter 3 is a case study involving MEMA Kwa Vijana, an adolescent sexual and reproductive health intervention implemented in Tanzania. This study highlights the influence of structural factors in schools and wider environmental factors on the effectiveness of school-based sexual health interventions. Furthermore, it identifies the social and cultural factors that influence young people’s sexual behaviours and that must be addressed beyond the education and health sectors. Chapter 4 is a multiple case study of seven school-based sexual health interventions implemented in five sub-Saharan African countries. It identifies the design, implementation, and evaluation features that differentiate between effective and ineffective interventions.
Chapter 5 is a qualitative study of researchers’ experiences of school-based sexual health education in sSA. This study extends previous work by generating a set of valuable recommendations based on researchers’ experiences of interventions that could improve future interventions in sSA. Overall, this research project demonstrates the potential of school-based sexual health education in promoting sexual health and preventing STIs in sSA. It provides a series of recommendations for the design, implementation, and evaluation of school-based sexual health interventions. This work presents independent research funded by the UK National Institute for Health Research (NIHR), School for Public Health Research and the NIHR Collaboration for Leadership in Applied Health Research and Care of the South West Peninsula (PenCLAHRC). The views expressed in this paper are those of the authors and not necessarily those of NIHR, the University of Exeter or the UK Department of Health.

    Study of similarity metrics for matching network-based personalised human activity recognition.

    Personalised Human Activity Recognition (HAR) models trained using data from the target user (subject-dependent) have been shown to be superior to non-personalised models that are trained on data from a general population (subject-independent). However, from a practical perspective, collecting sufficient training data from end users to create subject-dependent models is not feasible. We have previously introduced an approach based on matching networks which has proved effective for training personalised HAR models while requiring very little data from the end user. Matching networks perform nearest-neighbour classification by reusing the class label of the most similar instances in a provided support set, which makes them very relevant to case-based reasoning. A key advantage of matching networks is that they use metric learning to produce feature embeddings or representations that maximise classification accuracy, given a chosen similarity metric. However, to the best of our knowledge, no study has examined the performance of different similarity metrics for matching networks. In this paper, we present a study of five different similarity metrics for personalised HAR: Euclidean, Manhattan, Dot Product, Cosine and Jaccard. Our evaluation shows that substantial differences in performance are achieved using different metrics, with Cosine and Jaccard producing the best performance.
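The five metrics named in this abstract can be sketched on plain vectors as below. The abstract does not say how Jaccard is defined for real-valued embeddings, so the weighted form used here (sum of element-wise minima over sum of element-wise maxima, for non-negative vectors) is an assumption, as are all function names. Note that Euclidean and Manhattan are distances (smaller means more similar), whereas the other three are similarities.

```python
import math


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def dot_product(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    norm_u = math.sqrt(dot_product(u, u))
    norm_v = math.sqrt(dot_product(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot_product(u, v) / (norm_u * norm_v)


def weighted_jaccard(u, v):
    """Weighted Jaccard for non-negative vectors (assumed definition)."""
    denominator = sum(max(a, b) for a, b in zip(u, v))
    if denominator == 0:
        return 1.0  # two all-zero vectors are treated as identical
    return sum(min(a, b) for a, b in zip(u, v)) / denominator


# Hypothetical embeddings.
u, v = (3.0, 4.0), (4.0, 3.0)
for metric in (euclidean_distance, manhattan_distance, dot_product,
               cosine_similarity, weighted_jaccard):
    print(metric.__name__, metric(u, v))
```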

    Personalised human activity recognition using matching networks.

    Human Activity Recognition (HAR) is typically modelled as a classification task where sensor data associated with activity labels are used to train a classifier to recognise future occurrences of these activities. An important consideration when training HAR models is whether to use training data from a general population (subject-independent), or personalised training data from the target user (subject-dependent). Previous evaluations have shown personalised training to be more accurate because of the ability of resulting models to better capture individual users' activity patterns. From a practical perspective, however, collecting sufficient training data from end users may not be feasible. This has made using subject-independent training far more common in real-world HAR systems. In this paper, we introduce a novel approach to personalised HAR using a neural network architecture called a matching network. Matching networks perform nearest-neighbour classification by reusing the class label of the most similar instances in a provided support set, which makes them very relevant to case-based reasoning. A key advantage of matching networks is that they use metric learning to produce feature embeddings or representations that maximise classification accuracy, given a chosen similarity metric. Evaluations show our approach to substantially outperform general subject-independent models by at least 6% macro-averaged F1 score.
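The nearest-neighbour step that matching networks perform over a support set can be sketched as follows. This is a simplified illustration, not the paper's model: the embeddings are fixed hypothetical vectors rather than the output of a learned network, and the softmax-over-cosine attention follows the general matching network formulation rather than any detail given in this abstract.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def matching_predict(query, support):
    """Classify `query` from a support set of (embedding, label) pairs.

    Attention weights are a softmax over cosine similarities; each class
    score is the total attention placed on support items with that label.
    """
    sims = [cosine_similarity(query, emb) for emb, _ in support]
    highest = max(sims)
    exps = [math.exp(s - highest) for s in sims]  # numerically stable softmax
    normaliser = sum(exps)
    scores = defaultdict(float)
    for weight, (_, label) in zip(exps, support):
        scores[label] += weight / normaliser
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Hypothetical support set: a few labelled embeddings from one user.
support = [((1.0, 0.0), "walk"), ((0.9, 0.1), "walk"), ((0.0, 1.0), "sit")]
label, scores = matching_predict((1.0, 0.05), support)
print(label, scores)
```

Because the class labels come from the support set rather than from fixed output units, swapping in a different user's support set personalises the prediction without retraining.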

    A knowledge-light approach to personalised and open-ended human activity recognition.

    Human Activity Recognition (HAR) is a core component of clinical decision support systems that rely on activity monitoring for self-management of chronic conditions such as Musculoskeletal Disorders. Deployment success of such applications in part depends on their ability to adapt to individual variations in human movement and to facilitate a range of human activity classes. Research in personalised HAR aims to learn models that are sensitive to the subtle nuances in human movement, whilst Open-ended HAR learns models that can recognise activity classes outside the pre-defined set available at training. Current approaches to personalised HAR impose a data collection burden on the end user, whilst Open-ended HAR algorithms are heavily reliant on intermediary-level class descriptions. Instead of these 'knowledge-intensive' HAR algorithms, in this article we propose a 'knowledge-light' method. Specifically, we show how, by using a few seconds of raw sensor data obtained through micro-interactions with the end user, we can effectively personalise HAR models and transfer recognition functionality to new activities with zero re-training of the model after deployment. We introduce a Personalised Open-ended HAR algorithm, MNZ, a user-context-aware Matching Network architecture, and evaluate it on 3 HAR data sources. Performance results show up to 48.9% improvement with personalisation and up to 18.3% improvement compared to the most common 'knowledge-intensive' Open-ended HAR algorithms.

    Determination of fatty acids and physicochemical properties of neem (Azadirachta indica L) seed oil extracts

    The neem tree is a folklore plant mostly used in medicinal preparations. Neem seeds were therefore investigated with the aim of determining the fatty acid composition and physicochemical properties of the oil extract. The oil was extracted from the powdered seed with n-hexane in a Soxhlet apparatus, yielding 29.71% oil. Results revealed that the oil was liquid at room temperature and physically stable at varying temperatures (0, 50 and 100°C). It was pale greenish yellow with a garlic-like odour and a slightly bitter taste, and had a viscosity of 12.2 Pa·s and a pH of 6.78 ± 0.0135. The chemical parameters were 1.22 ± 0.029%, 2.36 ± 0.054 mg NaOH/g oil, 172.84 ± 0.559 mg NaOH/g oil and 1.88 ± 0.059 meq/kg oil for free fatty acids, acid value, saponification value and peroxide value respectively. The GC-MS analysis showed that the oil extract contained six different fatty acids with a total composition of 63.07% oil. The compound with the highest composition was linoleic acid (40%), followed by oleic acid (35%), cis-13-octadecenoic acid (8.9%), palmitic acid (8.5%) and stearic acid (7.5%), while the least abundant was cis-vaccenic acid (0.5%). Whereas previous work reported either oleic acid or linoleic acid as the dominant fatty acid in neem oil, linoleic acid was found to be dominant in the present work. It is recommended that under-utilized neem seeds be explored further with a view to producing viable products.