763 research outputs found

    MISNIS: an intelligent platform for Twitter topic mining

    Get PDF
    Twitter has become a major tool for spreading news, for dissemination of positions and ideas, and for the commenting and analysis of current world events. However, with more than 500 million tweets flowing per day, it is necessary to find efficient ways of collecting, storing, managing, mining and visualizing all this information. This is especially relevant if one considers that Twitter has no ways of indexing tweet contents, and that the only available categorization “mechanism” is the #hashtag, which is totally dependent of a user's will to use it. This paper presents an intelligent platform and framework, named MISNIS - Intelligent Mining of Public Social Networks’ Influence in Society - that facilitates these issues and allows a non-technical user to easily mine a given topic from a very large tweet's corpus and obtain relevant contents and indicators such as user influence or sentiment analysis. When compared to other existent similar platforms, MISNIS is an expert system that includes specifically developed intelligent techniques that: (1) Circumvent the Twitter API restrictions that limit access to 1% of all flowing tweets. The platform has been able to collect more than 80% of all flowing portuguese language tweets in Portugal when online; (2) Intelligently retrieve most tweets related to a given topic even when the tweets do not contain the topic #hashtag or user indicated keywords. A 40% increase in the number of retrieved relevant tweets has been reported in real world case studies. The platform is currently focused on Portuguese language tweets posted in Portugal. However, most developed technologies are language independent (e.g. intelligent retrieval, sentiment analysis, etc.), and technically MISNIS can be easily expanded to cover other languages and locations

    Genome sequence-based species delimitation with confidence intervals and improved distance functions

    Get PDF
    Background For the last 25 years species delimitation in prokaryotes (Archaea and Bacteria) was to a large extent based on DNA-DNA hybridization (DDH), a tedious lab procedure designed in the early 1970s that served its purpose astonishingly well in the absence of deciphered genome sequences. With the rapid progress in genome sequencing time has come to directly use the now available and easy to generate genome sequences for delimitation of species. GBDP (Genome Blast Distance Phylogeny) infers genome-to-genome distances between pairs of entirely or partially sequenced genomes, a digital, highly reliable estimator for the relatedness of genomes. Its application as an in-silico replacement for DDH was recently introduced. The main challenge in the implementation of such an application is to produce digital DDH values that must mimic the wet-lab DDH values as close as possible to ensure consistency in the Prokaryotic species concept. Results Correlation and regression analyses were used to determine the best-performing methods and the most influential parameters. GBDP was further enriched with a set of new features such as confidence intervals for intergenomic distances obtained via resampling or via the statistical models for DDH prediction and an additional family of distance functions. As in previous analyses, GBDP obtained the highest agreement with wet-lab DDH among all tested methods, but improved models led to a further increase in the accuracy of DDH prediction. Confidence intervals yielded stable results when inferred from the statistical models, whereas those obtained via resampling showed marked differences between the underlying distance functions. Conclusions Despite the high accuracy of GBDP-based DDH prediction, inferences from limited empirical data are always associated with a certain degree of uncertainty. It is thus crucial to enrich in-silico DDH replacements with confidence-interval estimation, enabling the user to statistically evaluate the outcomes. Such methodological advancements, easily accessible through the web service at http://ggdc.dsmz.de, are crucial steps towards a consistent and truly genome sequence-based classification of microorganisms

    Advanced Algorithms for 3D Medical Image Data Fusion in Specific Medical Problems

    Get PDF
    Fúze obrazu je dnes jednou z nejběžnějších avšak stále velmi diskutovanou oblastí v lékařském zobrazování a hraje důležitou roli ve všech oblastech lékařské péče jako je diagnóza, léčba a chirurgie. V této dizertační práci jsou představeny tři projekty, které jsou velmi úzce spojeny s oblastí fúze medicínských dat. První projekt pojednává o 3D CT subtrakční angiografii dolních končetin. V práci je využito kombinace kontrastních a nekontrastních dat pro získání kompletního cévního stromu. Druhý projekt se zabývá fúzí DTI a T1 váhovaných MRI dat mozku. Cílem tohoto projektu je zkombinovat stukturální a funkční informace, které umožňují zlepšit znalosti konektivity v mozkové tkáni. Třetí projekt se zabývá metastázemi v CT časových datech páteře. Tento projekt je zaměřen na studium vývoje metastáz uvnitř obratlů ve fúzované časové řadě snímků. Tato dizertační práce představuje novou metodologii pro klasifikaci těchto metastáz. Všechny projekty zmíněné v této dizertační práci byly řešeny v rámci pracovní skupiny zabývající se analýzou lékařských dat, kterou vedl pan Prof. Jiří Jan. Tato dizertační práce obsahuje registrační část prvního a klasifikační část třetího projektu. Druhý projekt je představen kompletně. Další část prvního a třetího projektu, obsahující specifické předzpracování dat, jsou obsaženy v disertační práci mého kolegy Ing. Romana Petera.Image fusion is one of today´s most common and still challenging tasks in medical imaging and it plays crucial role in all areas of medical care such as diagnosis, treatment and surgery. Three projects crucially dependent on image fusion are introduced in this thesis. The first project deals with the 3D CT subtraction angiography of lower limbs. It combines pre-contrast and contrast enhanced data to extract the blood vessel tree. The second project fuses the DTI and T1-weighted MRI brain data. The aim of this project is to combine the brain structural and functional information that purvey improved knowledge about intrinsic brain connectivity. The third project deals with the time series of CT spine data where the metastases occur. In this project the progression of metastases within the vertebrae is studied based on fusion of the successive elements of the image series. This thesis introduces new methodology of classifying metastatic tissue. All the projects mentioned in this thesis have been solved by the medical image analysis group led by Prof. Jiří Jan. This dissertation concerns primarily the registration part of the first project and the classification part of the third project. The second project is described completely. The other parts of the first and third project, including the specific preprocessing of the data, are introduced in detail in the dissertation thesis of my colleague Roman Peter, M.Sc.

    Recuperação de informação multimodal em repositórios de imagem médica

    Get PDF
    The proliferation of digital medical imaging modalities in hospitals and other diagnostic facilities has created huge repositories of valuable data, often not fully explored. Moreover, the past few years show a growing trend of data production. As such, studying new ways to index, process and retrieve medical images becomes an important subject to be addressed by the wider community of radiologists, scientists and engineers. Content-based image retrieval, which encompasses various methods, can exploit the visual information of a medical imaging archive, and is known to be beneficial to practitioners and researchers. However, the integration of the latest systems for medical image retrieval into clinical workflows is still rare, and their effectiveness still show room for improvement. This thesis proposes solutions and methods for multimodal information retrieval, in the context of medical imaging repositories. The major contributions are a search engine for medical imaging studies supporting multimodal queries in an extensible archive; a framework for automated labeling of medical images for content discovery; and an assessment and proposal of feature learning techniques for concept detection from medical images, exhibiting greater potential than feature extraction algorithms that were pertinently used in similar tasks. These contributions, each in their own dimension, seek to narrow the scientific and technical gap towards the development and adoption of novel multimodal medical image retrieval systems, to ultimately become part of the workflows of medical practitioners, teachers, and researchers in healthcare.A proliferação de modalidades de imagem médica digital, em hospitais, clínicas e outros centros de diagnóstico, levou à criação de enormes repositórios de dados, frequentemente não explorados na sua totalidade. Além disso, os últimos anos revelam, claramente, uma tendência para o crescimento da produção de dados. Portanto, torna-se importante estudar novas maneiras de indexar, processar e recuperar imagens médicas, por parte da comunidade alargada de radiologistas, cientistas e engenheiros. A recuperação de imagens baseada em conteúdo, que envolve uma grande variedade de métodos, permite a exploração da informação visual num arquivo de imagem médica, o que traz benefícios para os médicos e investigadores. Contudo, a integração destas soluções nos fluxos de trabalho é ainda rara e a eficácia dos mais recentes sistemas de recuperação de imagem médica pode ser melhorada. A presente tese propõe soluções e métodos para recuperação de informação multimodal, no contexto de repositórios de imagem médica. As contribuições principais são as seguintes: um motor de pesquisa para estudos de imagem médica com suporte a pesquisas multimodais num arquivo extensível; uma estrutura para a anotação automática de imagens; e uma avaliação e proposta de técnicas de representation learning para deteção automática de conceitos em imagens médicas, exibindo maior potencial do que as técnicas de extração de features visuais outrora pertinentes em tarefas semelhantes. Estas contribuições procuram reduzir as dificuldades técnicas e científicas para o desenvolvimento e adoção de sistemas modernos de recuperação de imagem médica multimodal, de modo a que estes façam finalmente parte das ferramentas típicas dos profissionais, professores e investigadores da área da saúde.Programa Doutoral em Informátic

    PeachVar-DB : Curated Collection of Genetic Variations for the Interactive Analysis of Peach Genome Data

    Get PDF
    Applying next-generation sequencing (NGS) technologies to species of agricultural interest has the potential to accelerate the understanding and exploration of genetic resources. The storage, availability and maintenance of huge quantities of NGS-generated data remains a major challenge. The PeachVar-DB portal, available at http://hpc-bioinformatics.cineca.it/peach, is an open-source catalog of genetic variants present in peach (Prunus persica L. Batsch) and wild-related species of Prunus genera, annotated from 146 samples publicly released on the Sequence Read Archive (SRA). We designed a user-friendly web-based interface of the database, providing search tools to retrieve single nucleotide polymorphism (SNP) and InDel variants, along with useful statistics and information. PeachVar-DB results are linked to the Genome Database for Rosaceae (GDR) and the Phytozome database to allow easy access to other external useful plant-oriented resources. In order to extend the genetic diversity covered by the PeachVar-DB further, and to allow increasingly powerful comparative analysis, we will progressively integrate newly released data

    Model-Based Environmental Visual Perception for Humanoid Robots

    Get PDF
    The visual perception of a robot should answer two fundamental questions: What? and Where? In order to properly and efficiently reply to these questions, it is essential to establish a bidirectional coupling between the external stimuli and the internal representations. This coupling links the physical world with the inner abstraction models by sensor transformation, recognition, matching and optimization algorithms. The objective of this PhD is to establish this sensor-model coupling

    The Flora Incognita app - interactive plant species identification

    Get PDF
    Being able to identify plant species is an important factor for understanding biodiversity and its change due to natural and anthropogenic drivers. We discuss the freely available Flora Incognita app for Android, iOS and Harmony OS devices that allows users to interactively identify plant species and capture their observations. Specifically developed deep learning algorithms, trained on an extensive repository of plant observations, classify plant images with yet unprecedented accuracy. By using this technology in a context-adaptive and interactive identification process, users are now able to reliably identify plants regardless of their botanical knowledge level. Users benefit from an intuitive interface and supplementary educational materials. The captured observations in combination with their metadata provide a rich resource for researching, monitoring and understanding plant diversity. Mobile applications such as Flora Incognita stimulate the successful interplay of citizen science, conservation and education
    corecore