4,498 research outputs found

    Simple identification tools in FishBase

    Get PDF
    Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools made use of the relational model and characters like fin ray meristics. Soon pictures and drawings were added as a further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further development. It explores the possibility of a holistic and integrated computeraided strategy

    An expert botanical feature extraction technique based on phenetic features for identifying plant species

    Get PDF
    In this paper, we present a new method to recognise the leaf type and identify plant species using phenetic parts of the leaf; lobes, apex and base detection. Most of the research in this area focuses on the popular features such as the shape, colour, vein, and texture, which consumes large amounts of computational processing and are not efficient, especially in the Acer database with a high complexity structure of the leaves. This paper is focused on phenetic parts of the leaf which increases accuracy. Detecting the local maxima and local minima are done based on Centroid Contour Distance for Every Boundary Point, using north and south region to recognise the apex and base. Digital morphology is used to measure the leaf shape and the leaf margin. Centroid Contour Gradient is presented to extract the curvature of leaf apex and base. We analyse 32 leaf images of tropical plants and evaluated with two different datasets, Flavia, and Acer. The best accuracy obtained is 94.76% and 82.6% respectively. Experimental results show the effectiveness of the proposed technique without considering the commonly used features with high computational cost

    High-throughput analysis and advanced search for visually-observed phenotypes

    Get PDF
    Title from PDF of title page (University of Missouri--Columbia, viewed on May 13, 2013).The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file.Dissertation advisor: Dr. Chi-Ren ShyuIncludes bibliographical references.Vita.Ph. D. University of Missouri--Columbia 2012."May 2012"The trend in many scientific disciplines today, especially in biology and genetics, is towards larger scale experiments in which a tremendous amount of data is generated. As imaging of data becomes increasingly more popular in experiments related to phenotypes, the ability to perform high-throughput big data analyses and to efficiently locate specific information within these data based on increasingly complicated and varying search criteria is of great importance to researchers. This research develops several methods for high-throughput phenotype analysis. This notably includes a registration algorithm called variable object pattern matching for mapping multiple indistinct and dynamic objects across images and detecting the presence of missing, extra, and merging objects. Research accomplishments resulted in a number of unique advanced search mechanisms including a retrieval engine that integrates multiple phenotype text sources and domain ontologies and a search method that retrieves objects based on temporal semantics and behavior. These search mechanisms represent the first of their kind in the phenotype community. While this computational framework is developed primarily for the plant community, it has potential applications in other domains including the medical field.Includes bibliographical references

    A look inside the Pl@ntNet experience

    Get PDF
    International audiencePl@ntNet is an innovative participatory sensing platform relying on image-based plants identification as a mean to enlist non-expert contributors and facilitate the production of botanical observation data. One year after the public launch of the mobile application, we carry out a self-critical evaluation of the experience with regard to the requirements of a sustainable and effective ecological surveillance tool. We first demonstrate the attractiveness of the developed multimedia system (with more than 90K end-users) and the nice self-improving capacities of the whole collaborative workflow. We then point out the current limitations of the approach towards producing timely and accurate distribution maps of plants at a very large scale. We discuss in particular two main issues: the bias and the incompleteness of the produced data. We finally open new perspectives and describe upcoming realizations towards bridging these gaps

    ChatGPT in the context of precision agriculture data analytics

    Full text link
    In this study we argue that integrating ChatGPT into the data processing pipeline of automated sensors in precision agriculture has the potential to bring several benefits and enhance various aspects of modern farming practices. Policy makers often face a barrier when they need to get informed about the situation in vast agricultural fields to reach to decisions. They depend on the close collaboration between agricultural experts in the field, data analysts, and technology providers to create interdisciplinary teams that cannot always be secured on demand or establish effective communication across these diverse domains to respond in real-time. In this work we argue that the speech recognition input modality of ChatGPT provides a more intuitive and natural way for policy makers to interact with the database of the server of an agricultural data processing system to which a large, dispersed network of automated insect traps and sensors probes reports. The large language models map the speech input to text, allowing the user to form its own version of unconstrained verbal query, raising the barrier of having to learn and adapt oneself to a specific data analytics software. The output of the language model can interact through Python code and Pandas with the entire database, visualize the results and use speech synthesis to engage the user in an iterative and refining discussion related to the data. We show three ways of how ChatGPT can interact with the database of the remote server to which a dispersed network of different modalities (optical counters, vibration recordings, pictures, and video), report. We examine the potential and the validity of the response of ChatGPT in analyzing, and interpreting agricultural data, providing real time insights and recommendations to stakeholdersComment: 33 pages, 21 figure

    Remote Sensing Information Sciences Research Group, Santa Barbara Information Sciences Research Group, year 3

    Get PDF
    Research continues to focus on improving the type, quantity, and quality of information which can be derived from remotely sensed data. The focus is on remote sensing and application for the Earth Observing System (Eos) and Space Station, including associated polar and co-orbiting platforms. The remote sensing research activities are being expanded, integrated, and extended into the areas of global science, georeferenced information systems, machine assissted information extraction from image data, and artificial intelligence. The accomplishments in these areas are examined

    Unipept: computational exploration of metaproteome data

    Get PDF

    Orymold: ontology based gene expression data integration and analysis tool applied to rice

    Get PDF
    Background: Integration and exploration of data obtained from genome wide monitoring technologies has become a major challenge for many bioinformaticists and biologists due to its heterogeneity and high dimensionality. A widely accepted approach to solve these issues has been the creation and use of controlled vocabularies (ontologies). Ontologies allow for the formalization of domain knowledge, which in turn enables generalization in the creation of querying interfaces as well as in the integration of heterogeneous data, providing both human and machine readable interfaces. Results: We designed and implemented a software tool that allows investigators to create their own semantic model of an organism and to use it to dynamically integrate expression data obtained from DNA microarrays and other probe based technologies. The software provides tools to use the semantic model to postulate and validate of hypotheses on the spatial and temporal expression and function of genes. In order to illustrate the software's use and features, we used it to build a semantic model of rice (Oryza sativa) and integrated experimental data into it. Conclusion: In this paper we describe the development and features of a flexible software application for dynamic gene expression data annotation, integration, and exploration called Orymold. Orymold is freely available for non-commercial users from http://www.oryzon.com/media/orymold.html webcit

    High-dimensional indexing methods utilizing clustering and dimensionality reduction

    Get PDF
    The emergence of novel database applications has resulted in the prevalence of a new paradigm for similarity search. These applications include multimedia databases, medical imaging databases, time series databases, DNA and protein sequence databases, and many others. Features of data objects are extracted and transformed into high-dimensional data points. Searching for objects becomes a search on points in the high-dimensional feature space. The dissimilarity between two objects is determined by the distance between two feature vectors. Similarity search is usually implemented as nearest neighbor search in feature vector spaces. The cost of processing k-nearest neighbor (k-NN) queries via a sequential scan increases as the number of objects and the number of features increase. A variety of multi-dimensional index structures have been proposed to improve the efficiency of k-NN query processing, which work well in low-dimensional space but lose their efficiency in high-dimensional space due to the curse of dimensionality. This inefficiency is dealt in this study by Clustering and Singular Value Decomposition - CSVD with indexing, Persistent Main Memory - PMM index, and Stepwise Dimensionality Increasing - SDI-tree index. CSVD is an approximate nearest neighbor search method. The performance of CSVD with indexing is studied and the approximation to the distance in original space is investigated. For a given Normalized Mean Square Error - NMSE, the higher the degree of clustering, the higher the recall. However, more clusters require more disk page accesses. Certain number of clusters can be obtained to achieve a higher recall while maintaining a relatively lower query processing cost. Clustering and Indexing using Persistent Main Memory - CIPMM framework is motivated by the following consideration: (a) a significant fraction of index pages are accessed randomly, incurring a high positioning time for each access; (b) disk transfer rate is improving 40% annually, while the improvement in positioning time is only 8%; (c) query processing incurs less CPU time for main memory resident than disk resident indices. CIPMM aims at reducing the elapsed time for query processing by utilizing sequential, rather than random disk accesses. A specific instance of the CIPMM framework CIPOP, indexing using Persistent Ordered Partition - OP-tree, is elaborated and compared with clustering and indexing using the SR-tree, CISR. The results show that CIPOP outperforms CISR, and the higher the dimensionality, the higher the performance gains. The SDI-tree index is motivated by fanouts decrease with dimensionality increasing and shorter vectors reduce cache misses. The index is built by using feature vectors transformed via principal component analysis, resulting in a structure with fewer dimensions at higher levels and increasing the number of dimensions from one level to the other. Dimensions are retained in nonincreasing order of their variance according to a parameter p, which specifies the incremental fraction of variance at each level of the index. Experiments on three datasets have shown that SDL-trees with carefully tuned parameters access fewer disk accesses than SR-trees and VAMSR-trees and incur less CPU time than VA-Files in addition
    corecore