787 research outputs found
SR4GN: A Species Recognition Software Tool for Gene Normalization
As suggested in recent studies, species recognition and disambiguation are among the most critical and challenging steps in many downstream text-mining applications, such as the gene normalization task and protein-protein interaction extraction. We report SR4GN: an open-source tool for species recognition and disambiguation in biomedical text. In addition to the species-detection function of existing tools, SR4GN is optimized for the gene normalization task: it is designed to link detected species with the corresponding gene mentions in a document. SR4GN achieves 85.42% accuracy and compares favorably to other state-of-the-art techniques in benchmark experiments. Finally, SR4GN is implemented as a standalone software tool, making it convenient and robust to use in many text-mining applications. SR4GN can be downloaded at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/downloads/SR4G
Data Optimization in Deep Learning: A Survey
Large-scale, high-quality data are considered an essential factor for the
successful application of many deep learning techniques. Meanwhile, numerous
real-world deep learning tasks still have to contend with the lack of
sufficient amounts of high-quality data. Additionally, issues such as model
robustness, fairness, and trustworthiness are also closely related to training
data. Consequently, a huge number of studies in the existing literature have
focused on the data aspect in deep learning tasks. Some typical data
optimization techniques include data augmentation, logit perturbation, sample
weighting, and data condensation. These techniques usually come from different
deep learning divisions and their theoretical inspirations or heuristic
motivations may seem unrelated to each other. This study aims to organize a
wide range of existing data optimization methodologies for deep learning from
the previous literature, and makes the effort to construct a comprehensive
taxonomy for them. The constructed taxonomy considers the diversity of split
dimensions, and deep sub-taxonomies are constructed for each dimension. On the
basis of the taxonomy, connections among the extensive data optimization
methods for deep learning are built in terms of four aspects. We also discuss
several promising and interesting future directions. The constructed taxonomy
and the revealed connections will foster a better understanding of existing
methods and inform the design of novel data optimization techniques.
Furthermore, our aspiration for this survey is to promote data optimization as
an independent subdivision of deep learning. A curated, up-to-date list of
resources related to data optimization in deep learning is available at
\url{https://github.com/YaoRujing/Data-Optimization}
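Two of the techniques the survey groups together, data augmentation and sample weighting, can be illustrated compactly. The sketch below is a minimal NumPy illustration under assumed forms (a horizontal-flip augmentation and a softmax-over-losses weighting scheme); the function names and the temperature parameter are illustrative choices, not the survey's notation.

```python
import numpy as np

def augment_flip(batch):
    """Data augmentation: horizontally flip image-like arrays of
    shape (n, h, w), doubling the effective batch size."""
    return np.concatenate([batch, batch[:, :, ::-1]], axis=0)

def sample_weights(losses, temperature=1.0):
    """Loss-based sample weighting: give harder examples (higher
    loss) more weight via a softmax over per-sample losses."""
    scaled = np.asarray(losses, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())  # numerically stabilized
    return exp / exp.sum()

batch = np.arange(12.0).reshape(1, 3, 4)
aug = augment_flip(batch)
print(aug.shape)                 # augmented batch is twice as large
w = sample_weights([0.1, 0.5, 2.0])
print(w)                         # weights sum to 1, largest loss weighted most
```

The temperature controls how sharply the weighting concentrates on hard examples; at high temperature it approaches uniform weighting.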
Machine Learning Methods with Noisy, Incomplete or Small Datasets
In many machine learning applications, available datasets are sometimes incomplete, noisy or affected by artifacts. In supervised scenarios, label information may be of low quality, with problems such as unbalanced training sets and noisy labels. Moreover, in practice, it is very common that the available data samples are insufficient to derive useful supervised or unsupervised classifiers. All these issues are commonly referred to as the low-quality data problem. This book collects novel contributions on machine learning methods for low-quality datasets, to contribute to the dissemination of new ideas to solve this challenging problem, and to provide clear examples of application in real scenarios.
Pretrained Transformers for Text Ranking: BERT and Beyond
The goal of text ranking is to generate an ordered list of texts retrieved
from a corpus in response to a query. Although the most common formulation of
text ranking is search, instances of the task can also be found in many natural
language processing applications. This survey provides an overview of text
ranking with neural network architectures known as transformers, of which BERT
is the best-known example. The combination of transformers and self-supervised
pretraining has been responsible for a paradigm shift in natural language
processing (NLP), information retrieval (IR), and beyond. In this survey, we
provide a synthesis of existing work as a single point of entry for
practitioners who wish to gain a better understanding of how to apply
transformers to text ranking problems and researchers who wish to pursue work
in this area. We cover a wide range of modern techniques, grouped into two
high-level categories: transformer models that perform reranking in multi-stage
architectures and dense retrieval techniques that perform ranking directly.
There are two themes that pervade our survey: techniques for handling long
documents, beyond typical sentence-by-sentence processing in NLP, and
techniques for addressing the tradeoff between effectiveness (i.e., result
quality) and efficiency (e.g., query latency, model and index size). Although
transformer architectures and pretraining techniques are recent innovations,
many aspects of how they are applied to text ranking are relatively well
understood and represent mature techniques. However, there remain many open
research questions, and thus in addition to laying out the foundations of
pretrained transformers for text ranking, this survey also attempts to
prognosticate where the field is heading.
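The two high-level categories the survey names, multi-stage reranking and direct dense retrieval, can be sketched together in a toy pipeline. The encoder below is a deterministic hashed-random-vector stand-in for a learned bi-encoder, and plain token overlap stands in for a cross-encoder reranker; all names and the scoring functions are illustrative assumptions, not the survey's models.

```python
import numpy as np
import zlib

def encode(text, dim=16):
    """Toy stand-in for a learned bi-encoder: each token maps to a
    fixed random vector (seeded by its CRC32 hash); the text
    embedding is the L2-normalized sum of its token vectors."""
    vec = np.zeros(dim)
    for tok in text.lower().split():
        rng = np.random.default_rng(zlib.crc32(tok.encode()))
        vec += rng.standard_normal(dim)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def dense_retrieve(query, docs, k=2):
    """First stage, dense retrieval: rank every document by inner
    product with the query embedding; return top-k indices."""
    q = encode(query)
    scores = [float(q @ encode(d)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])[:k]

def rerank(query, docs, candidates):
    """Second stage, reranking: rescore only the retrieved
    candidates with a stronger scorer (here simple token overlap
    stands in for a cross-encoder)."""
    q_toks = set(query.lower().split())
    return sorted(candidates,
                  key=lambda i: len(q_toks & set(docs[i].lower().split())),
                  reverse=True)

docs = ["bert for text ranking",
        "dense retrieval methods",
        "cooking pasta at home"]
top = dense_retrieve("text ranking with bert", docs)
print(rerank("text ranking with bert", docs, top))
```

The split mirrors the effectiveness/efficiency tradeoff the survey discusses: the cheap first stage scans the whole corpus, while the expensive scorer only sees a handful of candidates.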
Social impact retrieval: measuring author influence on information retrieval
The increased presence of technologies collectively referred to as Web 2.0 means that the entire process of new media production and dissemination has moved away from an author-centric approach. Casual web users and browsers are increasingly able to play a more active role in the information-creation process. This means that the traditional ways in which information sources are validated and scored must adapt accordingly.
In this thesis we propose a new way of looking at a user's contributions to the network in which they are present, using these interactions to provide a measure of the user's authority and centrality. This measure is then used to attribute a query-independent interest score to each contribution the author makes, enabling us to provide other users with relevant information that has been of greatest interest to a community of like-minded users. This is done through the development of two algorithms: AuthorRank and MessageRank.
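A PageRank-style authority measure of the kind AuthorRank builds on can be sketched with power iteration over a directed author-interaction graph. The graph, damping factor, and convergence threshold below are illustrative assumptions, not the thesis's exact formulation.

```python
def authority_scores(links, damping=0.85, tol=1e-10):
    """Power-iteration PageRank over a directed author-interaction
    graph given as {author: [authors they cite or reply to]}."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {a: 1.0 / n for a in nodes}
    while True:
        new = {a: (1 - damping) / n for a in nodes}
        for src in nodes:
            targets = links.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling author: spread their rank uniformly
                for t in nodes:
                    new[t] += damping * rank[src] / n
        if max(abs(new[a] - rank[a]) for a in nodes) < tol:
            return new
        rank = new

# alice cites bob; bob cites carol; carol cites alice and bob
graph = {"alice": ["bob"], "bob": ["carol"], "carol": ["alice", "bob"]}
scores = authority_scores(graph)
print(max(scores, key=scores.get))
```

An author who receives links from many well-linked peers ends up with a high score, which is the intuition behind using interaction networks as a proxy for authority.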
We present two real-world user experiments focused on multimedia annotation and browsing systems that we built; these systems were novel in themselves, bringing together video and text browsing as well as free-text annotation. Using these systems as examples of real-world applications of our approaches, we then look at a larger-scale experiment based on the author and citation networks of a ten-year period (1997-2007) of the ACM SIGIR conference on information retrieval. We use the citation context of SIGIR publications as a proxy for annotations, constructing large social networks between authors. Against these networks we show the effectiveness of incorporating user-generated content, or annotations, to improve information retrieval.
Identifying Semantic Divergences Across Languages
Cross-lingual resources such as parallel corpora and bilingual dictionaries are cornerstones of multilingual natural language processing (NLP). They have been used to study the nature of translation, train automatic machine translation systems, as well as to transfer models across languages for an array of NLP tasks. However, the majority of work in cross-lingual and multilingual NLP assumes that translations recorded in these resources are semantically equivalent. This is often not the case---words and sentences that are considered to be translations of each other frequently diverge in meaning, often in systematic ways.
In this thesis, we focus on such mismatches in meaning in text that we expect to be aligned across languages. We term such mismatches cross-lingual semantic divergences. The core claim of this thesis is that translation is not always meaning-preserving, which leads to cross-lingual semantic divergences that affect multilingual NLP tasks. Detecting such divergences requires ways of directly characterizing differences in meaning across languages through novel cross-lingual tasks, as well as models that account for translation ambiguity and do not rely on expensive, task-specific supervision.
We support this claim through three main contributions. First, we show that a large fraction of data in multilingual resources (such as parallel corpora and bilingual dictionaries) is identified as semantically divergent by human annotators. Second, we introduce cross-lingual tasks that characterize differences in word meaning across languages by identifying the semantic relation between two words. We also develop methods to predict such semantic relations, as well as a model to predict whether sentences in different languages have the same meaning. Finally, we demonstrate the impact of divergences by applying the methods developed in the previous sections to two downstream tasks. We first show that our model for identifying semantic relations between words helps in separating equivalent word translations from divergent translations in the context of bilingual dictionary induction, even when the two words are close in meaning. We also show that identifying and filtering semantic divergences in parallel data helps in training a neural machine translation system twice as fast without sacrificing quality.
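The filtering step in the last contribution amounts to scoring each sentence pair with a divergence model and discarding pairs above a threshold. The sketch below uses a trivial length-ratio proxy as an illustrative stand-in for the thesis's learned divergence classifier; the function names and threshold are assumptions.

```python
def divergence_score(src, tgt):
    """Toy proxy for a learned divergence model: pairs whose token
    counts differ wildly are more likely to be non-equivalent."""
    ls, lt = len(src.split()), len(tgt.split())
    return abs(ls - lt) / max(ls, lt)

def filter_parallel(pairs, threshold=0.4):
    """Keep only sentence pairs scored as (near-)equivalent."""
    return [(s, t) for s, t in pairs if divergence_score(s, t) < threshold]

pairs = [
    ("the cat sleeps", "le chat dort"),                   # equivalent
    ("he left", "il est parti hier soir sans rien dire"), # divergent
]
print(len(filter_parallel(pairs)))  # 1: the divergent pair is dropped
```

Training only on the retained pairs shrinks the corpus, which is how filtering can speed up NMT training while removing the noisiest supervision.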
The doctoral research abstracts. Vol:7 2015 / Institute of Graduate Studies, UiTM
Foreword:
The Seventh Issue of The Doctoral Research Abstracts captures the novelty of
65 doctorates receiving their scrolls in UiTM's 82nd Convocation in the field of
Science and Technology, Business and Administration, and Social Science and
Humanities. To the recipients I would like to say that you have most certainly
done UiTM proud by journeying through the scholastic path with its endless
challenges and impediments, and persevering right till the very end.
This convocation should not be regarded as the end of your highest scholarly
achievement and contribution to the body of knowledge but rather as the
beginning of embarking into high impact innovative research for the
community and country from knowledge gained during this academic
journey.
As alumni of UiTM, we will always hold you dear to our hearts. A new
'handshake' is about to take place between you and UiTM as joint
collaborators in future research undertakings. I envision a strong
research pact between you as our alumni and UiTM in breaking the
frontier of knowledge through research.
I wish you all the best in your endeavour and may I offer my
congratulations to all the graduands. 'UiTM sentiasa dihatiku'
('UiTM is always in my heart').
Tan Sri Dato' Sri Prof Ir Dr Sahol Hamid Abu Bakar, FASc, PEng
Vice Chancellor
Universiti Teknologi MARA
Aerospace medicine and biology: A continuing bibliography with indexes (supplement 327)
This bibliography lists 127 reports, articles and other documents introduced into the NASA Scientific and Technical Information System during August 1989. Subject coverage includes: aerospace medicine and psychology, life support systems and controlled environments, safety equipment, exobiology and extraterrestrial life, and flight crew behavior and performance.
Natural Gas Processing for Removal of Sour Gases and their Storage for Production of LNG and its gasification
This thesis discusses several aspects of natural gas processing, namely sour gas separation, CO2 sequestration for EOR, and natural gas regasification. Two sour gas removal technologies have been simulated using HYSYS. In addition, an experimental study of the effect of rock porosity in an underground reservoir on CO2 injection has been conducted. Finally, an innovative finned vaporizer with an invasive defrosting method has been tested experimentally in a pilot-scale forced-draft unit. Defrosting was performed using a hot MEG-water solution cycle.