542 research outputs found

    DNA metabarcoding for the identification of species within vegetarian food samples

    Get PDF
    Magister Scientiae - MSc

    Aims: DNA metabarcoding has recently emerged as a valuable supplementary tool for ensuring food authenticity in the global food market. However, highly processed food samples are widely recognised as one of DNA metabarcoding's greatest shortfalls owing to high DNA degradation, the presence of PCR inhibitors, and the incomplete removal of several undesirable compounds (such as polysaccharides) that make amplification of the target DNA challenging. This project had two main aims. The first was to develop a cost- and time-effective DNA metabarcoding system that could describe, to species level, the ingredient composition of highly processed vegetarian food products. The system was thoroughly evaluated by combining well-researched primers at varying concentrations in a multiplex reaction, and the combination of plant and animal primers that yielded the best results was used to determine the species composition of the samples. The second aim was to determine the possible presence of meat contaminants within the highly processed vegetarian food samples. Numerous studies have shown that food adulteration is a widespread phenomenon throughout the world because of the economic gains it can provide; animal primers were therefore included in the multiplex reaction to aid the identification of any meat products that could have been added to the vegetarian products to lower the overall cost to the company.

    Methodology: Thirty-two highly processed vegetarian food samples were collected in the Cape Town area from local and franchised supermarkets. DNA was extracted using the chloroform/isoamyl alcohol method best suited to plant-based samples, followed by amplification of the following mini-barcoding regions: the mitochondrial 16S rRNA, cytochrome b, the trnL (UAA) intron of the tRNA-Leu gene, and the ribosomal internal transcribed spacer region (ITS2) for plant and fungal identification. PCR products were purified using the QIAquick kit, and library preparation was conducted with the TruSeq DNA PCR-Free Library kit. Final purification was completed with the AMPure XP kit, and the pooled libraries were sequenced on an Illumina MiSeq using a 300 bp paired-end run. Statistical and bioinformatic analysis of the raw NGS sequence reads was performed in R version 3.6.3.

    Results: The cytochrome b primer did not detect any animal DNA in the vegetarian samples; however, animal-derived sequences were detected in the positive controls, validating the efficacy of the multiplex reaction. The mitochondrial 16S rRNA primer detected only plant-derived DNA, owing to the structural homology between chloroplast and mitochondrial DNA. The fungal ITS2 primer detected sequences deriving from Viridiplantae, which could be because the fungal and plant ITS2 regions share a reverse primer during amplification. The trnL region detected undeclared coriander, mustard and wheat in 8 (29%), 6 (21%) and 5 (18%) samples respectively, and additionally detected tobacco in 11 (35%) samples; the latter could have resulted from cross-contamination between samples co-extracted and amplified at the same time for separate studies.
The PITS2 region detected undeclared barley, mustard and wheat in 8 (25%), 4 (14%) and 4 (14%) samples respectively. Our results show the potential of DNA metabarcoding for authenticating a wide range of species present in highly processed vegetarian samples using a single assay. However, further optimization of the technique for the identification of both plant and animal species within vegetarian samples is needed before widespread implementation of this technology would be feasible and viable. Eliminating primer biases, decreasing the risk of homology between different primers in the same assay, and preventing the amplification and sequencing of undesirable DNA need to be further explored and ultimately mitigated before DNA metabarcoding can be widely regarded as a reliable and cost-effective method for authentication and food control.
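
    Once reads have been taxonomically classified, detection frequencies of the kind reported above can be tallied from a per-sample assignment table. The following is only a minimal sketch of that tallying step, not the thesis's actual R pipeline; the detections table and the declared-ingredient lists are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical per-sample detections: sample ID -> set of taxa assigned to its reads.
# In practice this table would come from the NGS read-classification step.
detections = {
    "S01": {"Triticum aestivum", "Sinapis alba"},
    "S02": {"Coriandrum sativum"},
    "S03": {"Glycine max"},
    # ... one entry per sample (32 in the study)
}

# Hypothetical declared ingredients taken from each product label.
declared = {
    "S01": {"Glycine max"},
    "S02": {"Glycine max", "Triticum aestivum"},
    "S03": {"Glycine max"},
}

# Count, for each taxon, the number of samples where it was detected but not declared.
undeclared_counts = Counter(
    taxon
    for sample, taxa in detections.items()
    for taxon in taxa - declared.get(sample, set())
)

n_samples = len(detections)
for taxon, count in undeclared_counts.most_common():
    print(f"{taxon}: {count} samples ({100 * count / n_samples:.0f}%)")
```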

    'Don't Get Too Technical with Me': A Discourse Structure-Based Framework for Science Journalism

    Full text link
    Science journalism refers to the task of reporting the technical findings of a scientific paper as a less technical news article for the general public. We aim to design an automated system to support this real-world task (i.e., automatic science journalism) by 1) introducing a newly constructed, real-world dataset (SciTechNews), with tuples of a publicly available scientific paper, its corresponding news article, and an expert-written short summary snippet; 2) proposing a novel technical framework that integrates a paper's discourse structure with its metadata to guide generation; and 3) demonstrating with extensive automatic and human experiments that our framework outperforms other baseline methods (e.g., Alpaca and ChatGPT) in elaborating a content plan meaningful for the target audience, simplifying the selected information, and producing a coherent final report in a layman's style. Comment: Accepted to EMNLP 2023
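
    The abstract does not spell out how discourse structure and metadata are combined, so the snippet below is only a hedged illustration of the general idea: select discourse-labelled spans, prepend paper metadata, and hand the result to a generator as a content plan. All names, labels and the example record are hypothetical and do not reflect the SciTechNews framework's actual interface.

```python
# Hypothetical paper record: metadata plus sentences labelled with discourse roles.
paper = {
    "title": "A Discourse Structure-Based Framework for Science Journalism",
    "authors": ["A. Author", "B. Author"],
    "venue": "EMNLP",
    "sentences": [
        ("background", "Science journalism rewrites technical findings for lay readers."),
        ("method", "We guide generation with the paper's discourse structure and metadata."),
        ("result", "The framework outperforms baseline systems in human evaluations."),
    ],
}

def build_content_plan(paper: dict, keep_roles=("background", "result")) -> str:
    """Assemble a simple content plan: metadata header plus selected discourse roles."""
    header = (
        f"Title: {paper['title']}\n"
        f"Authors: {', '.join(paper['authors'])}\n"
        f"Venue: {paper['venue']}"
    )
    selected = [f"[{role}] {text}" for role, text in paper["sentences"] if role in keep_roles]
    return header + "\n" + "\n".join(selected) + "\nWrite a short news-style summary for a general audience."

print(build_content_plan(paper))
```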

    Development of Open source Laboratory Information Management System (LIMS) For Human Biobanking

    Get PDF
    Magister Scientiae - MSc (Bioinformatics)

    Biobanks are collections of biological samples and associated data for future use. The day-to-day activities in a biobank laboratory are underpinned by a laboratory information management system (LIMS); for example, the LIMS manages the execution of tests on biospecimens and tracks their movement and processing through the laboratory. A range of commercial biobank LIMS systems is available, but their costs are prohibitive in resource-limited settings: the cost of commercial off-the-shelf software includes the initial cost of acquiring the system as well as the cost of maintenance and support throughout the software's life cycle. The Bika LIMS system, on the other hand, is free and open-source software (FOSS) with a reduced licence cost, used routinely in non-medical laboratories. Ideally, if Bika LIMS could be customised to handle human biospecimens, both biobanks and genetics laboratories could benefit. Central to any biobank functionality in Bika LIMS is the ability to import information from routine biomedical equipment. We identified two instruments that are key to human biobanking and are not supported in Bika LIMS, namely the BioDrop µLITE and the Qubit fluorometric instrument. Import interfaces for DNA/RNA concentration analyses from these instruments, and management of the results with the associated sample information, would add value to the LIMS.

    The aim of the thesis was to customise Bika LIMS for use in a biomedical laboratory. In collaboration with colleagues at Tygerberg medical school, the Bika LIMS software was customised to accommodate DNA and RNA concentration analysis results for a pathology laboratory, and the LIMS workflows were customised for use at Tygerberg medical school; in this process the manual operations of the Tygerberg medical school laboratory would migrate to Bika LIMS. The analytical module in Bika LIMS was implemented in Python, using logic that allows the import of specific analyses. A template was created for the BioDrop µLITE and Qubit fluorometric instruments and used to develop the interface for an analysis import form. The instruments generate results in CSV format. A parser was created to read and parse the files uploaded from the import form, splitting them into parts, extracting the data, and populating key-value pairs. The controller manages submission of the form by initialising the parser, which imports the specific file into the LIMS, where it is managed by the configured Bika LIMS workflow.
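
    The abstract describes the import path at a high level (CSV results file, parser, key-value pairs, workflow). The snippet below is only a minimal illustration of that parsing step, not the actual Bika LIMS interface code; the column names and the example results file are hypothetical.

```python
import csv
import io

# Hypothetical CSV export from a concentration-measuring instrument.
RAW_RESULTS = """Sample ID,Concentration (ng/uL),A260/A280
BB-0001,52.4,1.86
BB-0002,13.7,1.79
"""

def parse_instrument_csv(file_like) -> dict:
    """Split the uploaded file into rows and build sample-ID -> result key-value pairs."""
    reader = csv.DictReader(file_like)
    results = {}
    for row in reader:
        sample_id = row.pop("Sample ID")
        results[sample_id] = row  # remaining columns become the analysis key-value pairs
    return results

# A form controller would call the parser on the uploaded file and hand the
# parsed results to the LIMS workflow for the matching samples.
parsed = parse_instrument_csv(io.StringIO(RAW_RESULTS))
print(parsed["BB-0001"]["Concentration (ng/uL)"])  # -> 52.4
```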

    Modelling of a System for the Detection of Weak Signals Through Text Mining and NLP. Proposal of Improvement by a Quantum Variational Circuit

    Full text link
    Thesis by compendium of publications. In this doctoral thesis, a system to detect weak signals related to momentous future changes is proposed and tested. While most known solutions are based on the use of structured data, the proposed system quantitatively detects these signals using heterogeneous and unstructured information from scientific, journalistic, and social-media sources. Predicting new trends in an environment has many applications; for instance, companies and startups face constant changes in their markets that are very difficult to predict. For this reason, developing systems that automatically detect significant future changes at an early stage is relevant for any organisation that needs to make the right decisions in time. This work has been designed to obtain weak signals of the future in any field, depending only on the input dataset of documents. Text mining and natural language processing techniques are applied to process all these documents. As a result, a map of ranked terms, a list of automatically classified keywords and a list of multi-word expressions are obtained. The overall system has been tested in four different sectors: solar panels, artificial intelligence, remote sensing, and medical imaging. This work has obtained promising results, evaluated with two different methodologies, and the system was able to detect at a very early stage new trends that have since become increasingly important. Quantum computing is a new paradigm for a multitude of computing applications. This doctoral thesis also presents a study of the technologies currently available for the physical implementation of qubits and quantum gates, establishing their main advantages and disadvantages, and of the available frameworks for programming and implementing quantum circuits. In order to improve the effectiveness of the system, a quantum circuit based on support vector machines (SVMs) is designed for solving classification problems. This circuit is specially designed for the noisy intermediate-scale quantum (NISQ) processors that are currently available. As an experiment, the circuit has been tested on a real quantum computer based on superconducting qubits by IBM, as an improvement to the text mining subsystem in the detection of weak signals. The results obtained with the quantum experiment show interesting outcomes, with a performance improvement of close to 20% over conventional systems, but they also confirm that ongoing technological development is still required to take full advantage of quantum computing.

    Griol Barres, I. (2022). Modelling of a System for the Detection of Weak Signals Through Text Mining and NLP. Proposal of Improvement by a Quantum Variational Circuit [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/183029
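
    The text-mining stage described in the abstract (ranked terms, keywords and multi-word expressions) can be approximated with standard NLP tooling. The sketch below uses scikit-learn's TfidfVectorizer over a placeholder corpus; it is only a rough illustration of term ranking, not the thesis's actual weak-signal pipeline.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder corpus; the thesis uses scientific, journalistic and social-media documents.
documents = [
    "Perovskite solar panels reach record efficiency in lab tests.",
    "New deep learning models improve medical imaging diagnosis.",
    "Remote sensing satellites monitor crop health with hyperspectral sensors.",
    "Startups explore quantum computing for classification problems.",
]

# Unigrams to trigrams yield both single keywords and multi-word expressions.
vectorizer = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
tfidf = vectorizer.fit_transform(documents)

# Rank terms by their summed TF-IDF weight across the corpus.
scores = np.asarray(tfidf.sum(axis=0)).ravel()
terms = vectorizer.get_feature_names_out()
ranking = sorted(zip(terms, scores), key=lambda pair: pair[1], reverse=True)

for term, score in ranking[:10]:
    print(f"{term}: {score:.3f}")
```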

    Quantum inspired approach for early classification of time series

    Get PDF
    Is it possible to apply some fundamental principles of quantum computing to time series classification algorithms? This is the initial spark that became the research question I decided to pursue at the very beginning of my PhD studies. The idea came accidentally after reading a note on the ability of entanglement to express the correlation between two particles, even when far away from each other. The test problem was also at hand, because I was investigating possible algorithms for real-time bot detection, a challenging problem today, by means of statistical approaches to sequential classification. The quantum-inspired algorithm presented in this thesis stemmed from an evolution of the statistical method mentioned above: it is a novel approach to binary and multinomial classification of an incoming data stream, inspired by the principles of quantum computing and designed to ensure the shortest decision time with high accuracy. The proposed approach exploits the analogy between the intrinsic correlation of two or more particles and the dependence of each item in a data stream on the preceding ones. Starting from the a-posteriori probability of each item belonging to a particular class, we can assign a qubit state representing a combination of these probabilities over all available observations of the time series. By leveraging superposition and entanglement on subsequences of growing length, it is possible to devise a measure of membership of each class, enabling the system to take a reliable decision when a sufficient level of confidence is met. In order to provide an extensive and thorough analysis of the problem, a well-fitting approach for bot detection was replicated on our dataset and later compared with the statistical algorithm to determine the best option. The winner was subsequently examined against the new quantum-inspired proposal, showing the superior capability of the latter in both binary and multinomial classification of data streams. The validation of the quantum-inspired approach on a synthetically generated use case completes the research framework and opens new perspectives in on-the-fly time series classification that we have just started to explore. To name just a few, the algorithm is currently being tested, with encouraging results, in predictive maintenance and prognostics for automotive applications, in collaboration with the University of Bradford (UK), and in action recognition from video streams.
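
    The early-decision idea (combine per-item class posteriors over a growing subsequence and stop once one class is confident enough) can be illustrated with a few lines of plain Python. The sketch below is not the thesis's algorithm, only a quantum-inspired-flavoured stand-in: item posteriors are treated like squared amplitudes and multiplied over the subsequence seen so far; the stream and the threshold are placeholders.

```python
import numpy as np

def early_classify(posteriors, threshold=0.95):
    """Early classification sketch (NOT the thesis's algorithm). Assumes a non-empty stream.

    posteriors: iterable of per-item class-probability vectors arriving one at a time.
    Each vector is treated like squared amplitudes of a qubit-style state; the running
    class membership is the renormalised product over the subsequence seen so far,
    and a decision is taken as soon as one class exceeds the confidence threshold.
    """
    amplitudes = None
    for t, p in enumerate(posteriors, start=1):
        item_amp = np.sqrt(np.asarray(p, dtype=float))
        amplitudes = item_amp if amplitudes is None else amplitudes * item_amp
        membership = amplitudes ** 2
        membership = membership / membership.sum()  # renormalise after each item
        if membership.max() >= threshold:
            return int(membership.argmax()), t, membership
    return int(membership.argmax()), t, membership  # fall back to a decision at the end

# Toy stream: three-class posteriors that gradually favour class 2.
stream = [[0.4, 0.3, 0.3], [0.3, 0.3, 0.4], [0.2, 0.2, 0.6], [0.1, 0.1, 0.8]]
label, decided_at, scores = early_classify(stream, threshold=0.9)
print(f"decided class {label} after {decided_at} items, membership {scores.round(3)}")
```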

    The 10th Jubilee Conference of PhD Students in Computer Science

    Get PDF

    Reinforcement Learning for Generative AI: A Survey

    Full text link
    Deep generative AI has been a long-standing essential topic in the machine learning community and can impact a number of application areas, such as text generation and computer vision. The major paradigm for training a generative model is maximum likelihood estimation, which pushes the learner to capture and approximate the target data distribution by decreasing the divergence between the model distribution and the target distribution. This formulation successfully establishes the objective of generative tasks, but it is incapable of satisfying all the requirements that a user might expect from a generative model. Reinforcement learning, serving as a competitive option for injecting new training signals through new objectives, has demonstrated its power and flexibility in incorporating human inductive bias from multiple angles, such as adversarial learning, hand-designed rules and learned reward models, to build performant models. Reinforcement learning has thereby become a trending research field and has stretched the limits of generative AI in both model design and application, and it is reasonable to summarize recent advances in a comprehensive review. Although surveys of individual application areas have appeared recently, this survey aims to provide a high-level review that spans a range of application areas. We provide a rigorous taxonomy of the area and broad coverage of various models and applications, and we also survey the fast-developing large language model area. We conclude by pointing out potential directions that might tackle the limits of current models and expand the frontiers of generative AI.
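
    Where the abstract contrasts maximum likelihood training with reward-driven objectives, a toy example may help make the RL signal concrete. The sketch below is a minimal REINFORCE update for a one-step categorical "generator" scored by a placeholder reward model; it only illustrates the general recipe (sample, score with a reward model, update the policy) and is not any specific method from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["good", "bad", "okay"]
logits = np.zeros(len(VOCAB))  # toy one-step "generator": a single categorical over tokens

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reward_model(token: str) -> float:
    """Placeholder standing in for a learned preference/reward network."""
    return {"good": 1.0, "okay": 0.2, "bad": -1.0}[token]

LEARNING_RATE = 0.5
for step in range(200):
    probs = softmax(logits)
    action = rng.choice(len(VOCAB), p=probs)   # sample from the policy
    reward = reward_model(VOCAB[action])       # score the sample with the reward model

    # REINFORCE gradient for a categorical policy: reward * (one_hot(action) - probs)
    one_hot = np.eye(len(VOCAB))[action]
    logits += LEARNING_RATE * reward * (one_hot - probs)

print("final policy:", dict(zip(VOCAB, softmax(logits).round(3))))
# The policy concentrates on the token the reward model prefers ("good").
```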

    Hidden Citations Obscure True Impact in Science

    Full text link
    References, the mechanism scientists rely on to signal previous knowledge, have lately turned into widely used and misused measures of scientific impact. Yet, when a discovery becomes common knowledge, citations suffer from obliteration by incorporation. This leads to the concept of the hidden citation, representing clear textual credit to a discovery without a reference to the publication embodying it. Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations. We find that for influential discoveries hidden citations outnumber citation counts, emerging regardless of publishing venue and discipline. We show that the prevalence of hidden citations is driven not by citation counts, but by the degree of discourse on the topic within the text of the manuscripts, indicating that the more a discovery is discussed, the less visible it is to standard bibliometric analysis. Hidden citations indicate that bibliometric measures offer a limited perspective on the true impact of a discovery, raising the need to extract knowledge from the full text of the scientific corpus.
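
    The paper's own method is unsupervised, interpretable machine learning over full text; as a much cruder stand-in, the notion of a hidden citation can be illustrated with simple phrase matching: a paper that credits a discovery textually while citing none of its canonical publications. Everything below (the phrases, reference keys and mini-corpus) is hypothetical.

```python
import re

# Hypothetical discovery record: phrases that credit it textually and its canonical references.
DISCOVERY_PHRASES = [r"\bBLAST\b", r"basic local alignment search tool"]
CANONICAL_REFERENCES = {"Altschul1990"}

# Hypothetical mini-corpus: paper ID -> (full text, set of cited reference keys).
papers = {
    "P1": ("We aligned reads with BLAST against the nr database.", {"Smith2015"}),
    "P2": ("Sequences were compared using BLAST.", {"Altschul1990", "Lee2018"}),
    "P3": ("We used a hidden Markov model for annotation.", {"Eddy1998"}),
}

def find_hidden_citations(papers, phrases, canonical_refs):
    """Flag papers that credit the discovery in the text but cite none of its canonical papers."""
    pattern = re.compile("|".join(phrases), flags=re.IGNORECASE)
    hidden = []
    for paper_id, (text, cited) in papers.items():
        if pattern.search(text) and not (cited & canonical_refs):
            hidden.append(paper_id)
    return hidden

print(find_hidden_citations(papers, DISCOVERY_PHRASES, CANONICAL_REFERENCES))  # -> ['P1']
```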

    Smart Monitoring and Control in the Future Internet of Things

    Get PDF
    The Internet of Things (IoT) and related technologies have the promise of realizing pervasive and smart applications which, in turn, have the potential of improving the quality of life of people living in a connected world. According to the IoT vision, all things can cooperate amongst themselves and be managed from anywhere via the Internet, allowing tight integration between the physical and cyber worlds and thus improving efficiency, promoting usability, and opening up new application opportunities. Nowadays, IoT technologies have successfully been exploited in several domains, providing both social and economic benefits. The realization of the full potential of the next generation of the Internet of Things still needs further research efforts concerning, for instance, the identification of new architectures, methodologies, and infrastructures dealing with distributed and decentralized IoT systems; the integration of IoT with cognitive and social capabilities; the enhancement of the sensing–analysis–control cycle; the integration of consciousness and awareness in IoT environments; and the design of new algorithms and techniques for managing IoT big data. This Special Issue is devoted to advancements in technologies, methodologies, and applications for IoT, together with emerging standards and research topics which would lead to realization of the future Internet of Things