NetiNeti: Discovery of Scientific Names from Text Using Machine Learning Methods, Figure 1
Figure 1 shows a series of training experiments with the Naïve Bayes classifier that vary the neighborhood size used for contextual features and the number of positive and negative training examples. Each resulting classifier was evaluated against our annotated gold-standard corpus. The data sets are the results of running NetiNeti, with no stop list, on a subset of 136 tagged PubMed Central open access articles.

A scientific name for an organism can be associated with almost all biological data.
Name identification is an important step in many text mining tasks that aim to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element for linking biological information. We present NetiNeti, a machine-learning-based approach for the identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org. We present comparison results for various machine learning algorithms on our annotated corpus. Naïve Bayes and Maximum Entropy with Generalized Iterative Scaling (GIS) parameter estimation are the two top-performing algorithms.
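Figure 1 varies the context neighborhood and the amount of training data for a Naïve Bayes classifier. The following is a minimal sketch of that kind of setup. It assumes a candidate name is a token span, scored from its own shape plus the words within `window` tokens on either side. The feature names, toy sentences and labels are hypothetical illustrations, not NetiNeti's actual features or training data.

```python
import math
from collections import Counter, defaultdict

def context_features(tokens, start, end, window):
    """Features for the candidate span tokens[start:end] (hypothetical feature set)."""
    cand = tokens[start:end]
    feats = [
        f"ntok={len(cand)}",
        f"first_cap={cand[0][:1].isupper()}",                 # genus is capitalized
        f"rest_lower={all(t.islower() for t in cand[1:])}",   # epithet is lower case
        f"suffix={cand[-1][-2:].lower()}",
    ]
    # Contextual features: words up to `window` positions left and right of the span.
    for k in range(1, window + 1):
        if start - k >= 0:
            feats.append(f"left{k}={tokens[start - k].lower()}")
        if end - 1 + k < len(tokens):
            feats.append(f"right{k}={tokens[end - 1 + k].lower()}")
    return feats

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.label_counts = Counter(label for _, label in examples)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in examples:
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {lbl: sum(c.values()) for lbl, c in self.feat_counts.items()}
        return self

    def log_score(self, feats, label):
        n = sum(self.label_counts.values())
        score = math.log(self.label_counts[label] / n)  # log prior
        denom = self.totals[label] + len(self.vocab)
        for f in feats:
            score += math.log((self.feat_counts[label][f] + 1) / denom)
        return score

    def predict(self, feats):
        return max(self.label_counts, key=lambda lbl: self.log_score(feats, lbl))

# Hypothetical toy training data: (tokens, span start, span end, label).
TRAIN = [
    ("strains of Escherichia coli were isolated".split(), 2, 4, "name"),
    ("cultures of Bacillus subtilis were grown".split(), 2, 4, "name"),
    ("spores of Aspergillus niger were counted".split(), 2, 4, "name"),
    ("samples from New York were collected".split(), 2, 4, "other"),
    ("patients from Los Angeles were enrolled".split(), 2, 4, "other"),
    ("funding from National Institutes was received".split(), 2, 4, "other"),
]

WINDOW = 2  # neighborhood size, the kind of parameter Figure 1 varies
model = NaiveBayes().fit(
    [(context_features(toks, s, e, WINDOW), y) for toks, s, e, y in TRAIN]
)
test = "colonies of Staphylococcus aureus were observed".split()
print(model.predict(context_features(test, 2, 4, WINDOW)))  # name
```

Re-running the fit with a different `WINDOW` or a larger `TRAIN` list reproduces, in miniature, the two axes varied in the experiments. Add-one smoothing keeps unseen context words from zeroing out a class score.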
Information retrieval and text mining technologies for chemistry
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys across chemical disciplines. In most cases, retrieval of important chemical information starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in text, which commonly involves extracting the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing system performance, in particular the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Interest is growing in automatically annotated chemical knowledge bases that integrate chemical information and biological data. We therefore also present cheminformatics approaches for mapping the extracted chemical names onto chemical structures and for their subsequent annotation, together with text mining applications for linking chemistry with biological information. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

A.V. and M.K. acknowledge funding from the European
Community’s Horizon 2020 Program (project reference:
654021 - OpenMinted). M.K. additionally acknowledges the
Encomienda MINETAD-CNIO as part of the Plan for the
Advancement of Language Technology. O.R. and J.O. thank
the Foundation for Applied Medical Research (FIMA),
University of Navarra (Pamplona, Spain). This work was
partially funded by Consellería
de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic
funding of UID/BIO/04469/2013 unit and COMPETE 2020
(POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi
for useful feedback and discussions during the preparation of
the manuscript.
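The Review's emphasis on the CHEMDNER tasks centers on chemical entity mention recognition. The following is a minimal sketch of mention-level scoring. It assumes that predicted and gold mentions are compared by exact character offsets and aggregated as micro-averaged precision, recall and F1, which is how we understand those tasks were scored. The document ids and offsets are made up.

```python
from typing import Iterable, Tuple

# A mention is identified by (document id, start offset, end offset).
Mention = Tuple[str, int, int]

def micro_prf(gold: Iterable[Mention], predicted: Iterable[Mention]):
    """Micro-averaged precision, recall and F1 with strict (exact-offset) matching."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # only exact span matches count as true positives
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical annotations: the partial match ("d1", 20, 25) counts as an error.
gold = [("d1", 0, 7), ("d1", 20, 29), ("d2", 5, 12)]
pred = [("d1", 0, 7), ("d1", 20, 25), ("d2", 5, 12), ("d2", 30, 35)]
p, r, f = micro_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.667 F1=0.571
```

Strict matching penalizes a boundary error twice: once as a false positive and once as a false negative. This is one reason chemical name boundaries are hard to get right.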
Management of Scientific Images: An approach to the extraction, annotation and retrieval of figures in the field of High Energy Physics
The information environment in the first decade of the 21st century is unprecedented. The physical barriers that have limited access to knowledge are disappearing as traditional methods of accessing information are replaced or improved by computer-based systems. Digital systems can manage much larger collections of documents, confronting information users with an avalanche of documents related to their topic of interest. This new situation has created an incentive to develop data mining techniques and more efficient search engines capable of limiting search results to a small subset of the most relevant ones. However, most search engines today work with textual descriptions. These descriptions can be extracted either from the content or from external sources. Retrieval based on the non-textual content of documents is an ongoing research topic. In particular, image retrieval and unravelling the information contained in images are attracting great interest in the scientific community. Digital libraries occupy a special position among the systems that facilitate access to knowledge. They act as repositories of documents that share some common characteristics (for example, belonging to the same area of knowledge or being published by the same institution), and as such they contain documents considered of interest to a particular group of users. In addition, they provide retrieval functionality over the collections they manage. Normally, scientific publications are the smallest units managed by scientific digital libraries. However, the process of scientific creation produces different types of artifacts, among others figures and datasets.
Figures play a particularly important role in the scientific publication process. They represent data in graphical form, which allows patterns over large datasets to be shown and complex ideas to be conveyed in an easily understandable way. Existing digital library systems provide access to figures, but only as part of the files in which the entire publication is serialized. The objective of this thesis is to propose a set of methods and techniques that turn figures into first-class products of the scientific publication process, allowing researchers to obtain the maximum benefit when searching and reviewing the existing literature. The proposed methods and techniques aim to facilitate the acquisition, semantic annotation and search of figures contained in scientific publications. To demonstrate the completeness of the research, the proposed theories are illustrated with examples from the field of Particle Physics (also known as High Energy Physics). Where more detailed proposals were needed, we focused on the figures that appear most frequently in Particle Physics publications: the scientific graphs known in English as plots. The prototypes developed for this thesis have been partially integrated into the Invenio (1) digital library software, as well as into INSPIRE, one of the largest digital libraries in Particle Physics, maintained through the collaboration of major laboratories and research centers such as CERN, SLAC, DESY and Fermilab.
(1) http://invenio-software.org
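The thesis argues for treating figures as first-class records that can be searched on their own rather than only through the enclosing publication. The following is a minimal sketch of that idea. It assumes each extracted figure is stored under a hypothetical id together with its caption, and that figures are ranked by a plain TF-IDF score over caption words. This generic technique is swapped in for illustration. It is not necessarily what the thesis, Invenio or INSPIRE implement.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class CaptionIndex:
    """Ranks figure records by a plain TF-IDF score over their captions."""

    def __init__(self, captions):
        # captions: dict mapping a (hypothetical) figure id to its caption text
        self.tf = {fid: Counter(tokenize(cap)) for fid, cap in captions.items()}
        n = len(self.tf)
        df = Counter(term for counts in self.tf.values() for term in counts)
        # Smoothed IDF: a term present in every caption still gets weight 1.
        self.idf = {term: math.log(n / d) + 1.0 for term, d in df.items()}

    def search(self, query, k=3):
        terms = tokenize(query)
        scores = {}
        for fid, counts in self.tf.items():
            score = sum(counts[t] * self.idf.get(t, 0.0) for t in terms)
            if score > 0:
                scores[fid] = score
        # Highest score first; ties broken by figure id for stable output.
        return sorted(scores, key=lambda fid: (-scores[fid], fid))[:k]

figures = {  # hypothetical figure records
    "fig1": "Invariant mass distribution of muon pairs",
    "fig2": "Layout of the inner tracking detector",
    "fig3": "Cross section as a function of the invariant mass",
}
index = CaptionIndex(figures)
print(index.search("invariant mass"))  # ['fig1', 'fig3']
print(index.search("tracking"))        # ['fig2']
```

Captions are only one textual source. Annotations extracted from the plot itself, such as axis labels, could be added to the same index as extra fields.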
Statistical assessment on Non-cooperative Target Recognition using the Neyman-Pearson statistical test
Electromagnetic simulations of an X-target were performed to obtain its Radar Cross Section (RCS) for several positions and frequencies. The software used is CST MWS©. A 1:5 scale model of the proposed aircraft was created in CATIA© V5 R19 and imported directly into the CST MWS© environment. X-band simulations were run with a variable mesh size because of the considerable variation in wavelength. The aim is to evaluate the performance of the Neyman-Pearson (NP) simple hypothesis test, for recognition purposes, by analyzing its Receiver Operating Characteristics (ROCs) in two radar detection scenarios. The first uses a model coated with Radar Absorbent Material (RAM) and the second a Perfect Electric Conductor (PEC) model.
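A Neyman-Pearson detector fixes the false-alarm probability Pfa and maximizes the detection probability Pd. Its ROC is the curve of Pd against Pfa. The following sketch covers the textbook case of a scalar test statistic that is Gaussian with equal variance under both hypotheses and a larger mean under H1. In that case the likelihood ratio is monotone in the statistic, so thresholding the statistic itself is the NP test. The means, variance and separations are hypothetical. The actual RCS statistics of the study are not given here.

```python
from statistics import NormalDist

def np_threshold(pfa, mu0, sigma):
    """Threshold with P(T > threshold | H0) = pfa, where H0: T ~ N(mu0, sigma^2)."""
    return NormalDist(mu0, sigma).inv_cdf(1.0 - pfa)

def detection_probability(threshold, mu1, sigma):
    """P(T > threshold | H1), where H1: T ~ N(mu1, sigma^2) with mu1 > mu0."""
    return 1.0 - NormalDist(mu1, sigma).cdf(threshold)

def roc(mu0, mu1, sigma, pfas):
    """(Pfa, Pd) pairs of the NP detector for each allowed Pfa in (0, 1)."""
    return [(pfa, detection_probability(np_threshold(pfa, mu0, sigma), mu1, sigma))
            for pfa in pfas]

pfas = [0.01, 0.05, 0.1, 0.2]
for separation in (0.5, 3.0):  # (mu1 - mu0) / sigma, hypothetical values
    print(separation, [round(pd, 3) for _, pd in roc(0.0, separation, 1.0, pfas)])
```

When the two hypotheses overlap strongly (small separation), the ROC stays close to the diagonal Pd = Pfa. This is the signature of the weak recognition performance reported for the NP test.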
In parallel, the radar range equation is used to estimate the maximum detection range for the simulated RAM-coated cases, in order to compare their shielding effectiveness (SE) and its resulting impact on recognition. The specifications of the AN/APG-68(V)9 airborne radar were used to compute these ranges and to simulate a hostile airborne interception in a Non-Cooperative Target Recognition (NCTR) environment. Statistical results showed weak recognition performance using the NP statistical test. Nevertheless, good RCS reductions were obtained for most of the simulated positions. These resulted in a 50.9% maximum detection range gain for the PAniCo RAM coating, consistent with experimental results from the reviewed literature.
The best SE was obtained for the PAniCo and CFC-Fe RAMs.

Electromagnetic simulations of the target were performed to obtain its radar signature (RCS) for several positions and frequencies. The software used is CST MWS©. The proposed 1:5 scale model was built in CATIA© V5 R19 and imported directly into the CST MWS© environment. X-band simulations were carried out with a variable-size mesh because of the considerable variation in wavelength. The aim is to statistically evaluate the Neyman-Pearson (NP) simple decision test, for detection purposes, by analyzing the Receiver Operating Characteristics (ROCs) for two distinct detection scenarios. The first uses a model coated with absorbent material (RAM) and the second a perfect conductor (PEC). In parallel, the radar range equation was used to estimate the maximum detection range for both cases, in order to compare the electromagnetic shielding effectiveness (SE) of the different coatings. The specifications of the F-16's AN/APG-68(V)9 radar were used to calculate the ranges for each material, simulating a hostile interception in a non-cooperative target recognition (NCTR) environment. The results show weak detection performance when the Neyman-Pearson simple decision test is used as the detector, together with a good RCS reduction for all positions in the selected frequency range. A maximum detection range gain of 50.9% was obtained for the PAniCo RAM, in agreement with the experimental results of the reviewed literature. The best SE, in turn, was found for the CFC-Fe and PAniCo RAMs.
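The abstracts use the radar range equation to turn RCS reductions into maximum detection ranges. The following is a sketch of the standard monostatic form, R_max = [P_t G^2 λ^2 σ / ((4π)^3 S_min L)]^(1/4). The radar parameters in the tests are hypothetical placeholders, not the AN/APG-68(V)9 specifications used by the authors. The helpers also assume that "range gain" means the percentage drop in R_max for the coated target, with every other parameter held fixed. Under that reading, because R_max scales as σ^(1/4), a 50.9% drop requires an RCS reduction of roughly 12.4 dB.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def max_detection_range(pt_w, gain_lin, freq_hz, rcs_m2, s_min_w, losses_lin=1.0):
    """Monostatic radar range equation solved for R_max, in metres."""
    wavelength = C / freq_hz
    num = pt_w * gain_lin ** 2 * wavelength ** 2 * rcs_m2
    den = (4 * math.pi) ** 3 * s_min_w * losses_lin
    return (num / den) ** 0.25

def range_reduction_percent(rcs_reduction_db):
    """Percentage drop in R_max when the RCS falls by rcs_reduction_db (all else fixed)."""
    return 100.0 * (1.0 - 10 ** (-rcs_reduction_db / 40.0))

def rcs_reduction_db_for(range_reduction_pct):
    """Inverse: RCS reduction in dB that produces the given percentage drop in R_max."""
    return -40.0 * math.log10(1.0 - range_reduction_pct / 100.0)

print(f"{range_reduction_percent(10.0):.1f}")  # 43.8
print(f"{rcs_reduction_db_for(50.9):.1f}")     # 12.4
```

The fourth-root dependence is why large RCS reductions are needed for modest range gains. Halving R_max, for example, requires cutting σ by a factor of 16 (about 12 dB).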
Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress
Objective: To review recent research in clinical data reuse or secondary use, and to envision future advances in this field. Methods: The review is based on a large literature search in MEDLINE (through PubMed), conference proceedings, and the ACM Digital Library, restricted to research published between 2005 and early 2016. Each selected publication was reviewed by the authors, and a structured analysis and summary of its content was developed. Results: The initial search produced 359 publications, which were narrowed down after manual examination of abstracts and full texts. The following aspects of clinical data reuse are discussed: motivations and challenges, privacy and ethical concerns, data integration and interoperability, data models and terminologies, unstructured data reuse, structured data mining, clinical practice and research integration, and examples of clinical data reuse (quality measurement and learning healthcare systems). Conclusion: Reuse of clinical data is a fast-growing field recognized as essential to realizing the potential for high-quality healthcare, improved healthcare management, reduced healthcare costs, population health management, and effective clinical research.