230 research outputs found

    CacophonyViz: Visualisation of birdsong derived ecological health indicators

    The purpose of this work was to create an easy-to-interpret visualisation of a simple index that represents the quantity and quality of bird life in New Zealand. The index was calculated by an algorithm that assigned a weight to each species of bird. This work is important as it forms part of the ongoing work by the Cacophony Project, which aims to eradicate the pests that currently destroy New Zealand's native birds and their habitat. The map will be used to promote the Cacophony Project to a wide public audience and to encourage participation by giving relevant feedback on the effects of interventions such as planting and trapping in their communities. The Design Science methodology guided this work through the creation of a series of prototypes whose evaluation built on lessons learnt at each stage, resulting in a final artifact that successfully displayed the index at various locations across a map of New Zealand. It is concluded that the artifact is suitable for deployment once real data from the automatic analysis of audio recordings at multiple locations becomes available.
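    The abstract describes an index computed by assigning weights to species. A minimal sketch of that kind of weighted index is shown below; the species names and weights are purely illustrative assumptions, not values from the Cacophony Project.

```python
# Hypothetical weighted birdsong index: each detected species contributes
# a fixed weight, and a location's index is the sum over its detections.
# Species and weights here are illustrative, not the project's real values.

SPECIES_WEIGHTS = {
    "tui": 3.0,        # native species weighted more heavily
    "bellbird": 3.0,
    "blackbird": 1.0,  # introduced species weighted less
}

def health_index(detections):
    """Sum the weights of each detected species occurrence."""
    return sum(SPECIES_WEIGHTS.get(species, 0.0) for species in detections)

print(health_index(["tui", "tui", "blackbird"]))  # 7.0
```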

    EpiJSON: A unified data-format for epidemiology

    Epidemiology relies on data, but the divergent ways data are recorded and transferred, both within and between outbreaks, and the expanding range of data-types are creating an increasingly complex problem for the discipline. There is a need for a consistent, interpretable and precise way to transfer data while maintaining its fidelity. We introduce ‘EpiJSON’, a new, flexible, and standards-compliant format for the interchange of epidemiological data using JavaScript Object Notation. This format is designed to enable the widest range of epidemiological data to be unambiguously held and transferred between people, software and institutions. In this paper, we provide a full description of the format and a discussion of the design decisions made. We introduce a schema enabling automatic checks of the validity of data stored as EpiJSON, which can serve as a basis for the development of additional tools. In addition, we also present the R package ‘repijson’, which provides conversion tools between this format, line-list data and pre-existing analysis tools. An example is given to illustrate how EpiJSON can be used to store line-list data. EpiJSON, designed around modern standards for interchange of information on the internet, is simple to implement, read and check. As such, it provides an ideal new standard for epidemiological, and other, data transfer to the fast-growing open-source platform for the analysis of disease outbreaks.
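    To make the idea of a JSON-based line-list interchange format concrete, here is a small sketch of a line-list record serialised as JSON. The field names and structure are assumptions for illustration only, not the actual EpiJSON schema.

```python
import json

# Illustrative line-list record expressed as JSON. Field names
# ("records", "events", "attributes", ...) are assumed for this sketch
# and do not reproduce the exact EpiJSON specification.
record = {
    "metadata": {"format": "EpiJSON-like", "version": "0.1"},
    "records": [
        {
            "id": "case-001",
            "events": [
                {"name": "onset", "date": "2015-03-02"},
                {"name": "hospitalisation", "date": "2015-03-05"},
            ],
            "attributes": {"age": 34, "sex": "F"},
        }
    ],
}

# Round-trip through text: the record survives serialisation unchanged,
# which is the core property a data-interchange format needs.
text = json.dumps(record)
print(json.loads(text)["records"][0]["id"])  # case-001
```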

    DNA barcoding and taxonomy: dark taxa and dark texts

    Both classical taxonomy and DNA barcoding are engaged in the task of digitizing the living world. Much of the taxonomic literature remains undigitized. The rise of open access publishing this century and the freeing of older literature from the shackles of copyright have greatly increased the online availability of taxonomic descriptions, but much of the literature of the mid- to late-twentieth century remains offline (‘dark texts’). DNA barcoding is generating a wealth of computable data that in many ways are much easier to work with than classical taxonomic descriptions, but many of the sequences are not identified to species level. These ‘dark taxa’ hamper the classical method of integrating biodiversity data, using shared taxonomic names. Voucher specimens are a potential common currency of both the taxonomic literature and sequence databases, and could be used to help link names, literature and sequences. An obstacle to this approach is the lack of stable, resolvable specimen identifiers. The paper concludes with an appeal for a global ‘digital dashboard’ to assess the extent to which biodiversity data are available online. This article is part of the themed issue ‘From DNA barcodes to biomes’.

    LARCH: Large Language Model-based Automatic Readme Creation with Heuristics

    Writing a readme is a crucial aspect of software development, as it plays a vital role in managing and reusing program code. Though it is a pain point for many developers, automatically creating one remains a challenge even with the recent advancements in large language models (LLMs), because it requires generating an abstract description from thousands of lines of code. In this demo paper, we show that LLMs are capable of generating coherent and factually correct readmes if we can identify a code fragment that is representative of the repository. Building upon this finding, we developed LARCH (LLM-based Automatic Readme Creation with Heuristics), which leverages representative code identification with heuristics and weak supervision. Through human and automated evaluations, we illustrate that LARCH can generate coherent and factually correct readmes in the majority of cases, outperforming a baseline that does not rely on representative code identification. We have made LARCH open-source and provided a cross-platform Visual Studio Code interface and command-line interface, accessible at https://github.com/hitachi-nlp/larch. A demo video showcasing LARCH's capabilities is available at https://youtu.be/ZUKkh5ED-O4.
    Comment: This is a pre-print of a paper accepted at CIKM'23 Demo. Refer to the DOI URL for the original publication.
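    The key step the abstract names is "representative code identification": scoring files so the most repository-descriptive one can be handed to the LLM. The toy heuristic below is a sketch of that idea under assumed rules (entry-point file names score higher, test files lower, longer files slightly higher); LARCH's actual heuristics and weak supervision are more involved.

```python
# Toy "representative code identification": score candidate files with
# simple hand-written heuristics and pick the highest-scoring one.
# The rules and weights are illustrative assumptions, not LARCH's own.

def score_file(path, n_lines):
    score = 0.0
    name = path.rsplit("/", 1)[-1].lower()
    if name in {"main.py", "app.py", "cli.py"}:
        score += 2.0                   # likely an entry point
    if "test" in name:
        score -= 1.0                   # tests rarely describe the project
    score += min(n_lines, 500) / 500   # prefer substantial files, capped
    return score

def pick_representative(files):
    """files: list of (path, line_count) pairs; return the best path."""
    return max(files, key=lambda f: score_file(*f))[0]

files = [("src/main.py", 300), ("tests/test_main.py", 800), ("src/util.py", 50)]
print(pick_representative(files))  # src/main.py
```

The chosen file's contents would then be placed in the LLM prompt in place of the full repository.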

    Geospatial Data Modeling to Support Energy Pipeline Integrity Management

    Several hundred thousand miles of energy pipelines span the whole of North America -- responsible for carrying the natural gas and liquid petroleum that power the continent's homes and economies. These pipelines, so crucial to everyday goings-on, are closely monitored by various operating companies to ensure they perform safely and smoothly. Happenings like earthquakes, erosion, and extreme weather, however -- and human factors like vehicle traffic and construction -- all pose threats to pipeline integrity. As such, there is a tremendous need to measure and indicate useful, actionable data for each region of interest, and operators often use computer-based decision support systems (DSS) to analyze and allocate resources for active and potential hazards. We designed and implemented a geospatial data service, REST API for Pipeline Integrity Data (RAPID), to improve the amount and quality of data available to DSS. More specifically, RAPID -- built with a spatial database and the Django web framework -- allows third-party software to manage and query an arbitrary number of geographic data sources through one centralized REST API. Here, we focus on the process and peculiarities of creating RAPID's model and query interface for pipeline integrity management; this contribution describes the design, implementation, and validation of that model, which builds on existing geospatial standards.
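    The core query pattern behind a service like this is filtering geographic features from heterogeneous sources by a spatial region. As a dependency-free stand-in for the spatial database and Django layer, the sketch below shows a bounding-box query in plain Python; the class and field names are illustrative, not RAPID's actual schema.

```python
from dataclasses import dataclass

# Hypothetical feature record aggregating hazards from multiple sources.
@dataclass
class Feature:
    source: str   # originating data source (e.g. seismic, weather feed)
    kind: str     # hazard type
    lon: float
    lat: float

def query_bbox(features, min_lon, min_lat, max_lon, max_lat):
    """Return features whose point falls inside the bounding box --
    the stand-in for a spatial-database 'within' query."""
    return [f for f in features
            if min_lon <= f.lon <= max_lon and min_lat <= f.lat <= max_lat]

features = [
    Feature("usgs", "earthquake", -120.5, 35.2),
    Feature("noaa", "storm", -80.1, 25.7),
]
hits = query_bbox(features, -125, 30, -115, 40)
print([f.kind for f in hits])  # ['earthquake']
```

A real deployment would push this filter into the spatial database (e.g. a geometry index) rather than scanning in application code.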

    A multi-source dataset of urban life in the city of Milan and the Province of Trentino

    The study of socio-technical systems has been revolutionized by the unprecedented amount of digital records that are constantly being produced by human activities such as accessing Internet services, using mobile devices, and consuming energy and knowledge. In this paper, we describe the richest open multi-source dataset ever released on two geographical areas. The dataset is composed of telecommunications, weather, news, social networks and electricity data from the city of Milan and the Province of Trentino. The unique multi-source composition of the dataset makes it an ideal testbed for methodologies and approaches aimed at tackling a wide range of problems including energy consumption, mobility planning, tourist and migrant flows, urban structures and interactions, event detection, urban well-being and many others.
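    The value of a multi-source dataset comes from joining the sources on a shared spatial and temporal key. The stdlib-only sketch below joins hypothetical telecom and weather records on a (grid cell, hour) key; the field names and keys are assumptions for illustration, not the released dataset's schema.

```python
# Join two hypothetical sources on a shared (grid_cell, hour) key.
# Records and field names are illustrative only.
telecom = {("cell-42", "2013-11-01T10"): {"calls": 120}}
weather = {("cell-42", "2013-11-01T10"): {"rain_mm": 2.5}}

# Inner join: keep only keys present in both sources, merging their fields.
joined = {
    key: {**telecom[key], **weather[key]}
    for key in telecom.keys() & weather.keys()
}
print(joined[("cell-42", "2013-11-01T10")])  # {'calls': 120, 'rain_mm': 2.5}
```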

    Enabling Spatio-Temporal Search in Open Data

    Intuitively, most datasets found in Open Data are organised by spatio-temporal scope; that is, single datasets provide data for a certain region, valid for a certain time period. For many use cases (such as data journalism and fact checking) a predominant need is to scope down the relevant datasets to a particular period or region. Therefore, we argue that spatio-temporal search is a crucial need for Open Data portals and across Open Data portals, yet, to the best of our knowledge, no working solution exists. We argue that, just as for regular Web search, knowledge graphs can be helpful to significantly improve search: in fact, the ingredients for a public knowledge graph of geographic entities as well as time periods and events already exist on the Web of Data, although they have not yet been integrated and applied, in a principled manner, to the use case of Open Data search. In the present paper we aim at doing just that: we (i) present a scalable approach to construct a spatio-temporal knowledge graph that hierarchically structures geographical as well as temporal entities, (ii) annotate a large corpus of tabular datasets from open data portals, and (iii) enable structured, spatio-temporal search over Open Data catalogs through our spatio-temporal knowledge graph, both via a search interface and via a SPARQL endpoint, available at data.wu.ac.at/odgraphsearch/
    Series: Working Papers on Information Systems, Information Business and Operation
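    The search idea rests on two ingredients: a hierarchy of geographic entities, and datasets annotated with an entity plus a validity period. A query for a region then matches datasets annotated with that region or any entity below it in the hierarchy, intersected with the requested time. The sketch below illustrates this under assumed entity names and periods; it is not the paper's actual knowledge graph.

```python
# Minimal spatio-temporal search: expand a region to all entities below
# it in a (toy) hierarchy, then filter datasets by entity and year.
# Entity names, datasets, and periods are illustrative assumptions.

HIERARCHY = {"Austria": ["Vienna", "Tyrol"], "Vienna": [], "Tyrol": []}

def descendants(entity):
    """The entity itself plus everything below it in the hierarchy."""
    found = {entity}
    for child in HIERARCHY.get(entity, []):
        found |= descendants(child)
    return found

# (dataset name, annotated entity, (start year, end year))
datasets = [
    ("traffic-counts", "Vienna", (2014, 2016)),
    ("ski-lifts", "Tyrol", (2010, 2012)),
]

def search(region, year):
    scope = descendants(region)
    return [name for name, entity, (start, end) in datasets
            if entity in scope and start <= year <= end]

print(search("Austria", 2015))  # ['traffic-counts']
```

In the paper's setting the hierarchy comes from the knowledge graph and the same query can be posed over the SPARQL endpoint.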

    Bicycles Mobility Prediction

    The growth in mobile wireless communication requires sharp solutions to mobility problems, which encompass poor handover management, interference at access points, excessive load in macrocells, and other relevant mobility issues. With the deployment of small cell networks in 5G mobile systems, these problems intensify; mobility prediction schemes therefore arise to surpass and mitigate them. Predicting mobility is not a trivial task due to the vast number of variables that characterize a mobility route, which translates into unpredictability and randomness. The task of this work is therefore to overcome these challenges by building a solid mobility prediction architecture that can analyze big data and find patterns in mobility to ultimately perform reliable predictions. The models introduced in this dissertation are two deep learning schemes: an Artificial Neural Network (ANN) architecture and a Long Short-Term Memory (LSTM) architecture. Prediction was made at two levels: short-term and long-term. We verified that in the short-term domain both models performed equivalently, with successful results. In long-term prediction, however, the LSTM model surpassed the ANN model; consequently, the LSTM approach constitutes the stronger model in all prediction aspects. Implementing this model in cellular networks is an important asset for optimizing processes such as routing and caching, as the networks can allocate the necessary resources to provide a better user experience. With this optimization impact and with the emergence of the Internet of Things (IoT), the prediction model can support and improve the development of smart applications related to our daily mobility routine.
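    The dissertation's models are neural (ANN and LSTM). As a lightweight, dependency-free stand-in, the sketch below implements a first-order Markov baseline for the short-term task of predicting the next cell from observed routes; the cell IDs are illustrative, and this is a comparison baseline, not the dissertation's method.

```python
from collections import Counter, defaultdict

# First-order Markov next-location baseline: count observed transitions
# between cells, then predict the most frequent successor of the current
# cell. A toy stand-in for the ANN/LSTM models discussed above.

def fit(routes):
    transitions = defaultdict(Counter)
    for route in routes:
        for here, nxt in zip(route, route[1:]):
            transitions[here][nxt] += 1
    return transitions

def predict_next(transitions, here):
    """Most frequently observed successor of the current cell, or None."""
    counts = transitions.get(here)
    return counts.most_common(1)[0][0] if counts else None

model = fit([["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"]])
print(predict_next(model, "B"))  # C
```

Such a memoryless baseline is exactly what longer-horizon models like the LSTM are meant to beat, since it cannot exploit patterns that span more than one step.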