Getting More out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics.
This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most
widely used systems of its type, with tens of thousands of downloads per year and many active users in both academic
and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in
medicine. First, genome-wide association studies, which have contributed to the discovery of a head and neck cancer
mutation association. Second, medical records analysis, which has significantly increased the statistical power of
treatment/outcome models in the UK’s largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also
explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude
that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process,
and that with the right computational tools and data collection strategies this process can be made defined and repeatable.
The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text
processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and
research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis
systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority
outside of the authors’ own group) who work in text processing for biomedicine and other areas. GATE is available online
under GNU open source licences and runs on all major operating systems. Support is available from an active user and
developer community and also on a commercial basis.
A survey on the development status and application prospects of knowledge graph in smart grids
With the advent of the electric power big data era, semantic interoperability
and interconnection of power data have received extensive attention. Knowledge
graph technology is a new method for describing the complex relationships between
concepts and entities in the objective world, and it has attracted wide attention
because of its robust knowledge inference ability. In particular, with the
proliferation of measurement devices and the exponential growth of electric power
data, the electric power knowledge graph provides new opportunities to resolve the
contradiction between massive power resources and the continuously increasing
demand for intelligent applications. To fulfil the potential of knowledge graphs,
deal with the various challenges they face, and obtain insights for business
applications in smart grids, this work first presents a holistic study of
knowledge-driven intelligent application integration. Specifically, a detailed
overview of electric power knowledge mining is provided. Then, an overview of
knowledge graphs in smart grids is introduced. Moreover, the architecture of a big
knowledge graph platform for smart grids and the critical technologies involved
are described. Furthermore, this paper comprehensively elaborates on the
application prospects of knowledge graphs for smart grids, power consumer service,
decision-making in dispatching, and operation and maintenance of power equipment.
Finally, issues and challenges are summarised.
Comment: IET Generation, Transmission & Distributio
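The abstract credits knowledge graphs with robust knowledge inference. As a minimal sketch of what inference over a knowledge graph can mean, assuming a plain (subject, relation, object) triple representation and a single hand-written transitivity rule, the code below derives implicit location facts for power equipment. The entity names, the `located_in` and `has_fault` relations and the helper functions are all hypothetical; this is not the platform architecture described in the survey.

```python
from collections import defaultdict

# Hypothetical facts about power equipment; names are illustrative only.
TRIPLES = [
    ("Transformer_T1", "located_in", "Substation_A"),
    ("Breaker_B7", "located_in", "Substation_A"),
    ("Substation_A", "located_in", "Region_North"),
    ("Transformer_T1", "has_fault", "Overheating"),
]


def build_index(triples):
    """Index triples as subject -> relation -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        index[subject][relation].add(obj)
    return index


def transitive_closure(index, relation):
    """Infer every (subject, object) pair reachable through `relation`.

    Assumes `relation` is transitive, which holds for located_in but not in general.
    """
    inferred = set()
    for start in list(index):
        stack = list(index[start].get(relation, ()))
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            inferred.add((start, node))
            # .get avoids inserting new keys into the defaultdict while traversing
            stack.extend(index.get(node, {}).get(relation, ()))
    return inferred


index = build_index(TRIPLES)
located = transitive_closure(index, "located_in")
# Which entities lie, directly or indirectly, in Region_North?
in_north = sorted(s for s, o in located if o == "Region_North")
print(in_north)  # ['Breaker_B7', 'Substation_A', 'Transformer_T1']
```

Only `Substation_A` is stated to be in `Region_North`; the two equipment items are placed there by inference, which is the kind of implicit knowledge a rule over the graph can expose.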
Text mining of biomedical literature: discovering new knowledge
Biomedical literature is growing day by day, and the volume of literature on “coronavirus” in particular has expanded at a high rate. In this study, text mining techniques are employed to discover new knowledge from the published literature. The main objectives are to show the growth of the literature (January to June 2020), extract document sections, identify latent topics, find the most frequent words, build a bag-of-words representation, and perform hierarchical clustering. We collected 16,500 documents from PubMed. Most of the documents (11,499) belong to May and June. We identify “betacoronavirus” as the leading document section (3,837) and “covid” (29,890) as the most frequent word in the abstracts, and we examine the positive and negative weights of the topics. Further, we measure the term frequency (TF) of document titles in the bag-of-words model and then compute a hierarchical clustering of the titles. This reveals that the lowest distance for the selected cluster (C133) is 0.30. We also discuss future prospects and note that this paper can be useful to researchers and library professionals for knowledge management.
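The abstract combines bag-of-words term frequencies of titles with hierarchical clustering. The sketch below illustrates that combination under stated assumptions, since the paper's preprocessing, distance measure and linkage method are not given here: it assumes lower-cased whitespace tokens, cosine distance and single-linkage agglomerative clustering, and uses three hypothetical titles rather than the PubMed sample. The helper names are illustrative.

```python
import math
from collections import Counter


def term_frequencies(title):
    """Bag-of-words term frequencies for one title (lower-cased, edge punctuation stripped)."""
    words = (w.strip(".,:;()").lower() for w in title.split())
    return Counter(w for w in words if w)


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def single_linkage(vectors):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance).

    Leaves are numbered 0..n-1; each merge creates a new cluster id n, n+1, ...
    """
    clusters = {i: [i] for i in range(len(vectors))}
    merges = []
    next_id = len(vectors)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                # Single linkage: distance of the closest pair of members.
                d = min(cosine_distance(vectors[i], vectors[j])
                        for i in clusters[ids[x]] for j in clusters[ids[y]])
                if best is None or d < best[2]:
                    best = (ids[x], ids[y], d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, round(d, 2)))
        next_id += 1
    return merges


titles = [  # hypothetical titles, not from the study's corpus
    "COVID-19 vaccine trial results",
    "COVID-19 vaccine safety",
    "Betacoronavirus genome structure",
]
vectors = [term_frequencies(t) for t in titles]
merges = single_linkage(vectors)
print(merges)  # [(0, 1, 0.42), (2, 3, 1.0)]
```

The two vaccine titles share two terms and merge first at distance 0.42. The third title shares no terms with either, so it joins last at the maximum distance of 1.0.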
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting the attention and
investment of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches combine different image features and quantitative methods to
accomplish specific objectives such as object detection, activity recognition
and user-machine interaction. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, the most commonly used features,
methods, challenges and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interactio
Application of pre-training and fine-tuning AI models to machine translation: a case study of multilingual text classification in Baidu
With the development of international information technology, we are constantly
producing huge amounts of information. The ability to process information in
various languages is gradually replacing information itself as the scarcer resource.
How to obtain the most useful information from such a large and complex body of
multilingual text is a major goal of multilingual information
processing.
Multilingual text classification helps users break the language barrier,
accurately locate the information they need and triage it. At the same time,
the rapid development of the Internet has accelerated communication among users
of various languages, giving rise to large numbers of multilingual texts, such as book
and movie reviews, online chats and product descriptions. These texts contain a
large amount of valuable implicit information, and automated tools are urgently
needed to categorize and process them.
This work describes the Natural Language Processing (NLP) sub-task known as
Multilingual Text Classification (MTC) performed within the context of Baidu, a
leading Chinese AI company with a strong Internet base, whose NLP division led the
industry in bringing deep learning technology online in Machine Translation (MT) and
search. Multilingual text classification is an important module in NLP machine
translation and a basic module in NLP tasks. It can be applied to many fields, such as
fake review detection, news headline classification, and analysis of
positive and negative reviews.
In the following work, we first define the AI model paradigm of
'pre-training and fine-tuning' in deep learning in the Baidu NLP department. We then
investigate the application scenarios of multilingual text classification. Most of the
text classification systems currently available on the Chinese market are designed for a
single language, such as Alibaba's text classification system. Users who need to classify
texts of the same category in multiple languages must train multiple single-language
text classification systems and then classify the texts one by one.
However, many internationalized products, such as AliExpress's cross-border e-commerce
business and Airbnb's B&B business, do not operate in a single text language.
Companies need to understand and classify users’ reviews in various languages to
support in-depth statistics and marketing strategy development, and
multilingual text classification is particularly important in this scenario.
Therefore, we focus on interpreting the methodology of the multilingual text
classification model for machine translation in the Baidu NLP department. We collect
sets of multilingual data (reviews, news headlines and other texts) for manual
classification and labeling, use the labeling results to fine-tune the multilingual text
classification model, and output quality evaluation data for the Baidu multilingual text
classification model after fine-tuning. We discuss whether pre-training and
fine-tuning of the large model can substantially improve the quality and performance
of multilingual text classification.
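The paragraph above describes fine-tuning a pre-trained model on manually labeled multilingual data. The toy sketch below shows only the general pattern: start from pre-trained parameters, then keep updating them on labeled task data. It is not Baidu's model or API. The "pre-trained" weights, the six labeled reviews, the bag-of-words logistic classifier, and the learning-rate and epoch values are all hypothetical. A real multilingual pre-trained model would share representations across languages, which this word-level toy cannot do.

```python
import math

# Hypothetical "pre-trained" weights, e.g. learned earlier on a large English
# sentiment corpus. Purely illustrative values.
PRETRAINED = {"great": 1.5, "excellent": 1.5, "terrible": -1.5, "bad": -1.5}

# Hypothetical labeled reviews (1 = positive, 0 = negative), standing in for
# the manually labeled multilingual data.
LABELED = [
    ("great product", 1),
    ("terrible service", 0),
    ("produto ótimo", 1),
    ("serviço péssimo", 0),
    ("producto excelente", 1),
    ("servicio terrible", 0),
]


def tokens(text):
    """Bag of lower-cased whitespace tokens."""
    return set(text.lower().split())


def score(weights, bias, text):
    """Probability of the positive class under a logistic model."""
    z = bias + sum(weights.get(t, 0.0) for t in tokens(text))
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(pretrained, data, lr=0.5, epochs=50):
    """Start from pre-trained weights and update them with SGD on task labels."""
    weights = dict(pretrained)  # copy, so the pre-trained parameters stay intact
    bias = 0.0
    for _ in range(epochs):
        for text, label in data:
            error = label - score(weights, bias, text)  # log-likelihood gradient w.r.t. z
            for t in tokens(text):
                weights[t] = weights.get(t, 0.0) + lr * error
            bias += lr * error
    return weights, bias


weights, bias = fine_tune(PRETRAINED, LABELED)
predictions = [int(score(weights, bias, text) >= 0.5) for text, _ in LABELED]
print(predictions)  # [1, 0, 1, 0, 1, 0]
```

Copying the pre-trained weights before updating them keeps the original parameters reusable. The same starting point could then be fine-tuned separately for a different classification task.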
Finally, based on the machine translation multilingual text classification model,
we derive a method for applying the pre-training and fine-tuning paradigm to
current cutting-edge deep learning AI models within the NLP system, and verify the
generality and cutting-edge nature of the pre-training and fine-tuning paradigm in the
field of deep-learning-based intelligent search.
With the development of international information technology, we are
constantly producing an enormous amount of information, and the scarcest resource is
no longer information but the ability to process information in each language. Most
multilingual information is expressed in the form of text. Obtaining the most useful
information from such a considerable and complex amount of multilingual textual
information is one of the main goals of multilingual information
processing.
Multilingual text classification helps users break the language barrier
and precisely locate and classify the information they need. At the
same time, the rapid development of the Internet has accelerated communication between
users of different languages, giving rise to a large number of multilingual
texts, such as book and film reviews, chats, product descriptions and
other kinds of text, which contain a great deal of valuable implicit information
and urgently require automated tools to categorize and
process them.
This work describes the Natural Language Processing
(NLP) subtask known as Multilingual Text Classification (MTC), carried out in the
context of Baidu, a leading Chinese AI company whose NLP team led the
industry in neural-learning-based technology, standing out in
Machine Translation (MT) and scientific research. Multilingual text
classification is an important module in NLP machine translation and a basic module
in NLP tasks. MTC can be applied to many fields, such as multilingual
sentiment analysis, news categorization and filtering of unwanted content
(spam), among others.
In this work, we first define the 'pre-training and fine-tuning' AI model paradigm
in deep learning in Baidu's NLP department. Next, we survey other
products on the market with text classification capability: the text classification
offered by Alibaba. This survey shows that most of the text classification systems
currently available on the Chinese market are designed for a single language, such as the Alibaba text classification system. If users need to
classify texts of the same category in several languages, they must apply a separate
text classification system for each language and then classify the texts one by one.
However, many internationalized products do not use a single text language,
such as AliExpress cross-border e-commerce and the Airbnb B&B
business. Industry needs to understand and classify user reviews in
several languages. This need has driven in-depth development of statistics and
marketing strategies, and multilingual text classification is
particularly important in this scenario.
We therefore concentrate on interpreting the methodology of the machine translation
multilingual text classification model in Baidu's NLP department.
To this end, we collected multilingual datasets of comments and
reviews, news headlines and other data for manual classification, used the
results of that classification to fine-tune the multilingual
text classification model, and produced quality evaluation data for Baidu's
multilingual text classification model. We discuss whether pre-training and
fine-tuning of the model can substantially improve the quality and
performance of multilingual text classification. Finally, based on the machine
translation multilingual text classification model, we derive the method for
applying the pre-training and fine-tuning paradigm to current cutting-edge
deep learning AI models within the NLP system, and verify the robustness and
positive results of the pre-training and fine-tuning paradigm in the field of
deep learning research.