13,513 research outputs found
Building Program Vector Representations for Deep Learning
Deep learning has made significant breakthroughs in various fields of
artificial intelligence. Advantages of deep learning include the ability to
capture highly complicated features, weak involvement of human engineering,
etc. However, it is still virtually impossible to use deep learning to analyze
programs since deep architectures cannot be trained effectively with pure back
propagation. In this pioneering paper, we propose the "coding criterion" to
build program vector representations, which are the premise of deep learning
for program analysis. Our representation learning approach directly makes deep
learning a reality in this new field. We evaluate the learned vector
representations both qualitatively and quantitatively. We conclude, based on
the experiments, the coding criterion is successful in building program
representations. To evaluate whether deep learning is beneficial for program
analysis, we feed the representations to deep neural networks, and achieve
higher accuracy in the program classification task than "shallow" methods, such
as logistic regression and the support vector machine. This result confirms the
feasibility of deep learning to analyze programs. It also gives primary
evidence of its success in this new field. We believe deep learning will become
an outstanding technique for program analysis in the near future.Comment: This paper was submitted to ICSE'1
GERNERMED: an open German medical NER model
The current state of adoption of well-structured electronic health records
and integration of digital methods for storing medical patient data in
structured formats can often considered as inferior compared to the use of
traditional, unstructured text based patient data documentation. Data mining in
the field of medical data analysis often needs to rely solely on processing of
unstructured data to retrieve relevant data. In natural language processing
(NLP), statistical models have been shown successful in various tasks like
part-of-speech tagging, relation extraction (RE) and named entity recognition
(NER). In this work, we present GERNERMED, the first open, neural NLP model for
NER tasks dedicated to detect medical entity types in German text data. Here,
we avoid the conflicting goals of protection of sensitive patient data from
training data extraction and the publication of the statistical model weights
by training our model on a custom dataset that was translated from publicly
available datasets in foreign language by a pretrained neural machine
translation model. The sample code and the statistical model is available at:
https://github.com/frankkramer-lab/GERNERME
NLP and the Humanities: The Revival of an Old Liaison
This paper presents an overview of some\ud
emerging trends in the application of NLP\ud
in the domain of the so-called Digital Humanities\ud
and discusses the role and nature\ud
of metadata, the annotation layer that is so\ud
characteristic of documents that play a role\ud
in the scholarly practises of the humanities.\ud
It is explained how metadata are the\ud
key to the added value of techniques such\ud
as text and link mining, and an outline is\ud
given of what measures could be taken to\ud
increase the chances for a bright future for\ud
the old ties between NLP and the humanities.\ud
There is no data like metadata
Supporting text mining for e-Science: the challenges for Grid-enabled natural language processing
Over the last few years, language technology has moved rapidly from 'applied research' to 'engineering', and from small-scale to large-scale engineering. Applications such as advanced text mining systems are feasible, but very resource-intensive, while research seeking to address the underlying language processing questions faces very real practical and methodological limitations. The e-Science vision, and the creation of the e-Science Grid, promises the level of integrated large-scale technological support required to sustain this important and successful new technology area. In this paper, we discuss the foundations for the deployment of text mining and other language technology on the Grid - the protocols and tools required to build distributed large-scale language technology systems, meeting the needs of users, application builders and researchers
Recommended from our members
OBOME - Ontology based opinion mining in UBIPOL
Ontologies have a special role in the UBIPOL system, they help to structure the policy related context, provide conceptualization for policy domain and use in the opinion mining process. In this work we presented a system called Ontology Based Opinion Mining Engine (OBOME) for analyzing a domain-specific opinion corpus by first assisting the user with the creation of a domain ontology from the corpus. We determined the polarity of opinion on the various domain aspects. In the former step, the policy domain aspect has are identified (namely which policy category is represented by the concept). This identification is supported by the policy modelling ontology, which describe the most important policy â related classes and structure. Then the most informative documents from the corpus are extracted and asked the user to create a set of aspects and related keywords using these documents. In the latter step, we used the corpus specific ontology to model the domain and extracted aspect-polarity associations using grammatical dependencies between words. Later, summarized results are shown to the user to analyze and store. Finally, in an offline process policy modeling ontology is updated
Using Neural Networks for Relation Extraction from Biomedical Literature
Using different sources of information to support automated extracting of
relations between biomedical concepts contributes to the development of our
understanding of biological systems. The primary comprehensive source of these
relations is biomedical literature. Several relation extraction approaches have
been proposed to identify relations between concepts in biomedical literature,
namely, using neural networks algorithms. The use of multichannel architectures
composed of multiple data representations, as in deep neural networks, is
leading to state-of-the-art results. The right combination of data
representations can eventually lead us to even higher evaluation scores in
relation extraction tasks. Thus, biomedical ontologies play a fundamental role
by providing semantic and ancestry information about an entity. The
incorporation of biomedical ontologies has already been proved to enhance
previous state-of-the-art results.Comment: Artificial Neural Networks book (Springer) - Chapter 1
- âŠ