Extracting Word Sequence Correspondences with Support Vector Machines
method of word sequence correspondences from non-aligned parallel corpora with Support Vector Machines, which generalize well, rarely overfit the training samples, and can learn dependencies among features by using a kernel function. Our method uses features for the translation model that draw on the translation dictionary, the number of words, part-of-speech, constituent words and neighbor words. Experimental results on Japanese and English parallel corpora achieved 81.1% precision and 69.0% recall for the extracted word sequence correspondences. This demonstrates that our method could reduce the cost of making translation dictionaries
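The precision and recall figures quoted above can be illustrated with a minimal sketch. The pair data and the function name `precision_recall` are hypothetical and are not taken from the paper; the code only shows how the two rates are conventionally computed from a set of extracted correspondences and a gold-standard set.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted pairs against a gold-standard set."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)  # pairs that are both extracted and correct
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: (romanized Japanese sequence, English sequence) pairs.
gold = {
    ("kikai honyaku", "machine translation"),
    ("taiyaku jisho", "translation dictionary"),
    ("hinshi", "part of speech"),
}
extracted = {
    ("kikai honyaku", "machine translation"),
    ("taiyaku jisho", "translation dictionary"),
    ("hinshi", "word"),  # an incorrect extraction
}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Precision penalizes incorrect extractions, while recall penalizes gold-standard correspondences that were missed, which is why the abstract reports both.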
The application of Deep Learning in Persian Documents Sentiment Analysis
Nowadays, the amount of textual information on the web is growing rapidly. This huge volume of textual data calls for more accurate classification algorithms. Sentiment analysis is a branch of text classification that is used to classify user opinions for market decisions, product evaluations, or measuring consumer confidence. With the rising production rate of Persian text data in the commercial domain, improving the efficiency of algorithms for Persian is a must. The structure of the Persian language, such as its word and sentence structures, poses some challenges in this area. Deep learning algorithms have recently been used in NLP, and especially in sentiment text classification, for many dominant languages like Persian. The goal is to improve classification performance using deep learning. In this work, the authors propose a hybrid method that combines structural correspondence learning (SCL) and a convolutional neural network (CNN). The SCL method selects the most effective pivot features so that adaptation from one domain to a similar one does not drastically reduce efficiency. The results showed that the proposed hybrid method, learned from one domain, can act efficiently in a similar domain, and that applying the SCL+CNN combination can improve sentiment classification results for two domains by more than 10 percent
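The pivot-selection step mentioned above can be sketched as follows. This is only an illustration under a stated assumption: pivots are taken to be tokens that occur in at least `min_count` documents of both domains, a common frequency heuristic for SCL. The abstract does not say which criterion the authors use, and the function name, parameters, and toy documents are hypothetical.

```python
from collections import Counter


def select_pivots(source_docs, target_docs, min_count=2, top_k=5):
    """Pick tokens that are frequent in BOTH domains (assumed pivot criterion)."""
    # Document frequency: count each token at most once per document.
    src = Counter(tok for doc in source_docs for tok in set(doc.split()))
    tgt = Counter(tok for doc in target_docs for tok in set(doc.split()))
    shared = [t for t in src if src[t] >= min_count and tgt[t] >= min_count]
    # Rank by the weaker of the two domain frequencies, then alphabetically.
    shared.sort(key=lambda t: (-min(src[t], tgt[t]), t))
    return shared[:top_k]


# Hypothetical toy reviews from two domains (electronics and films).
source_docs = ["great battery great screen", "poor battery", "great price poor screen"]
target_docs = ["great plot", "poor acting", "great cast poor ending"]

pivots = select_pivots(source_docs, target_docs)
print(pivots)
```

Domain-specific words such as "battery" are frequent in only one domain and are therefore not selected, whereas sentiment-bearing words shared across domains are.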
Augmenting Translation Lexica by Learning Generalised Translation Patterns
Bilingual lexicons improve the quality of parallel corpora alignment, of newly extracted
translation pairs, of machine translation, and of cross-language information retrieval, among
other applications. In this regard, the first problem addressed in this thesis pertains to
the classification of automatically extracted translations from parallel corpora, that is, collections
of sentence pairs that are translations of each other. The second problem is concerned
with machine learning of bilingual morphology, with applications in the solution of the first
problem and in the generation of Out-Of-Vocabulary translations.
With respect to the problem of translation classification, two separate classifiers for
handling multi-word and word-to-word translations are trained, using previously extracted
and manually classified translation pairs as correct or incorrect. Several insights
are useful for distinguishing adequate multi-word candidates from inadequate
ones, such as the lack or presence of parallelism, spurious terms at translation ends
(such as determiners and coordinating conjunctions), orthographic similarity
between translations, and the occurrence and co-occurrence frequency of the translation
pairs. Morphological coverage reflecting stem and suffix agreement is explored as a key
feature in classifying word-to-word translations. Given that the evaluation of extracted
translation equivalents depends heavily on the human evaluator, incorporating an
automated filter for appropriate and inappropriate translation pairs prior to human evaluation
greatly reduces this work, thereby saving the time involved
and progressively improving alignment and extraction quality. It can also be applied
to filtering of translation tables used for training machine translation engines, and to
detecting bad translation choices made by translation engines, thus enabling significant
productivity enhancements in the post-editing of machine-made translations.
An important attribute of the translation lexicon is the coverage it provides. Learning
suffixes and suffixation operations from the lexicon or corpus of a language is an extensively
researched task to tackle out-of-vocabulary terms. However, beyond mere words
or word forms are the translations and their variants, a powerful source of information
for automatic structural analysis, which is explored from the perspective of improving
word-to-word translation coverage and constitutes the second part of this thesis. In this
context, as a phase prior to the suggestion of out-of-vocabulary bilingual lexicon entries,
an approach is proposed to automatically induce segmentation and learn bilingual morph-like
units by identifying and pairing word stems and suffixes, using the bilingual
corpus of translations automatically extracted from aligned parallel corpora, manually
validated or automatically classified. A minimally supervised technique is proposed to enable
bilingual morphology learning for language pairs whose bilingual lexicons are highly
deficient in word-to-word translations representing inflection diversity.
Apart from the above mentioned applications in the classification of machine extracted
translations and in the generation of Out-Of-Vocabulary translations, learned bilingual
morph-units may also have a great impact on the establishment of correspondences of
sub-word constituents in the cases of word-to-multi-word and multi-word-to-multi-word
translations and in compression, full text indexing and retrieval applications
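The idea of pairing word stems and suffixes across two languages can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes, for illustration only, that two translation pairs sharing a sufficiently long common prefix on both sides reveal a bilingual stem and its suffix pairs. The function name, the `min_stem` threshold, and the English-Portuguese toy pairs are hypothetical.

```python
from itertools import combinations
from os.path import commonprefix  # character-wise longest common prefix of strings


def bilingual_segments(pairs, min_stem=3):
    """Map each (stem_a, stem_b) to the set of (suffix_a, suffix_b) pairs seen with it."""
    found = {}
    for (a1, b1), (a2, b2) in combinations(pairs, 2):
        stem_a = commonprefix([a1, a2])
        stem_b = commonprefix([b1, b2])
        # Require a plausible stem on BOTH sides before pairing suffixes.
        if len(stem_a) >= min_stem and len(stem_b) >= min_stem:
            suffixes = found.setdefault((stem_a, stem_b), set())
            suffixes.add((a1[len(stem_a):], b1[len(stem_b):]))
            suffixes.add((a2[len(stem_a):], b2[len(stem_b):]))
    return found


# Hypothetical validated word-to-word translations.
pairs = [("declared", "declarado"), ("declaring", "declarando"), ("house", "casa")]

segments = bilingual_segments(pairs)
for stems, suffixes in segments.items():
    print(stems, sorted(suffixes))
```

Suffix pairs learned this way could then be recombined with other stem pairs to suggest translations missing from the lexicon, which is the out-of-vocabulary use the abstract describes.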
CALIPER: Continuous Authentication Layered with Integrated PKI Encoding Recognition
Architectures relying on continuous authentication require a secure way to
challenge the user's identity without trusting that the Continuous
Authentication Subsystem (CAS) has not been compromised, i.e., that the
response to the layer which manages service/application access is not fake. In
this paper, we introduce the CALIPER protocol, in which a separate Continuous
Access Verification Entity (CAVE) directly challenges the user's identity in a
continuous authentication regime. Instead of simply returning authentication
probabilities or confidence scores, CALIPER's CAS uses live hard and soft
biometric samples from the user to extract a cryptographic private key embedded
in a challenge posed by the CAVE. The CAS then uses this key to sign a response
to the CAVE. CALIPER supports multiple modalities, key lengths, and security
levels and can be applied in two scenarios: One where the CAS must authenticate
its user to a CAVE running on a remote server (device-server) for access to
remote application data, and another where the CAS must authenticate its user
to a locally running trusted computing module (TCM) for access to local
application data (device-TCM). We further demonstrate that CALIPER can leverage
device hardware resources to enable privacy and security even when the device's
kernel is compromised, and we show how this authentication protocol can even be
expanded to obfuscate direct kernel object manipulation (DKOM) malware. Comment: Accepted to CVPR 2016 Biometrics Workshop
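The challenge-response flow described above can be sketched in a heavily simplified form. This toy is not the CALIPER protocol. It assumes an exactly reproducible biometric template (real biometric samples are noisy and need error-tolerant key binding), and it replaces the private-key signature with a symmetric HMAC from the Python standard library. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def mask_from_biometric(template: bytes) -> bytes:
    """Derive a 32-byte mask from a template (assumes exact reproducibility)."""
    return hashlib.sha256(template).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cave_make_challenge(enrolled_template: bytes):
    """CAVE side: embed a fresh key in the challenge, locked by the enrolled biometric."""
    key = secrets.token_bytes(32)
    nonce = secrets.token_bytes(16)
    locked_key = xor_bytes(key, mask_from_biometric(enrolled_template))
    return key, nonce, locked_key


def cas_respond(live_template: bytes, nonce: bytes, locked_key: bytes) -> bytes:
    """CAS side: recover the key from the live sample and authenticate the nonce."""
    key = xor_bytes(locked_key, mask_from_biometric(live_template))
    return hmac.new(key, nonce, hashlib.sha256).digest()  # stand-in for a signature


def cave_verify(key: bytes, nonce: bytes, response: bytes) -> bool:
    expected = hmac.new(key, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


key, nonce, locked_key = cave_make_challenge(b"alice-template")
genuine_ok = cave_verify(key, nonce, cas_respond(b"alice-template", nonce, locked_key))
impostor_ok = cave_verify(key, nonce, cas_respond(b"mallory-template", nonce, locked_key))
print(genuine_ok, impostor_ok)
```

The point of the sketch is that the CAS returns proof of possession of a key it could only recover from the live sample, rather than a bare confidence score that a compromised CAS could forge.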
Basic research planning in mathematical pattern recognition and image analysis
Fundamental problems encountered while attempting to develop automated techniques for applications of remote sensing are discussed under the following categories: (1) geometric and radiometric preprocessing; (2) spatial, spectral, temporal, syntactic, and ancillary digital image representation; (3) image partitioning, proportion estimation, and error models in object scene interference; (4) parallel processing and image data structures; and (5) continuing studies in polarization; computer architectures and parallel processing; and the applicability of "expert systems" to interactive analysis
Sensing Highly Non-Rigid Objects with RGBD Sensors for Robotic Systems
The goal of this research is to enable a robotic system to manipulate clothing and other highly non-rigid objects using an RGBD sensor. The focus of this thesis is to define and test various algorithms / models that are used to solve parts of the laundry process (i.e. handling, classifying, sorting, unfolding, and folding). First, a system is presented for automatically extracting and classifying items in a pile of laundry. Using only visual sensors, the robot identifies and extracts items sequentially from the pile. When an item is removed and isolated, a model is captured of the shape and appearance of the object, which is then compared against a dataset of known items. The contributions of this part of the laundry process are a novel method for extracting articles of clothing from a pile of laundry, a novel method of classifying clothing using interactive perception, and a multi-layer approach termed L-M-H, more specifically L-C-S-H for clothing classification. This thesis describes two different approaches to classify clothing into categories. The first approach relies upon silhouettes, edges, and other low-level image measurements of the articles of clothing. Experiments from the first approach demonstrate the ability of the system to efficiently classify and label into one of six categories (pants, shorts, short-sleeve shirt, long-sleeve shirt, socks, or underwear). These results show that, on average, classification rates using robot interaction are 59% higher than those that do not use interaction. The second approach relies upon color, texture, shape, and edge information from 2D and 3D data within a local and global perspective. The multi-layer approach compartmentalizes the problem into a high (H) layer, multiple mid-level (characteristics(C), selection masks(S)) layers, and a low (L) layer. This approach produces 'local' solutions to solve the global classification problem.
Experiments demonstrate the ability of the system to efficiently classify each article of clothing into one of seven categories (pants, shorts, shirts, socks, dresses, cloths, or jackets). The results presented in this paper show that, on average, the classification rates improve by +27.47% for three categories, +17.90% for four categories, and +10.35% for seven categories over the baseline system, using support vector machines. Second, an algorithm is presented for automatically unfolding a piece of clothing. A piece of cloth is pulled in different directions at various points of the cloth in order to flatten the cloth. The features of the cloth are extracted and calculated to determine a valid location and orientation in which to interact with it. The features include the peak region, corner locations, and continuity / discontinuity of the cloth. In this thesis, a two-stage algorithm is presented, introducing a novel solution to the unfolding / flattening problem using interactive perception. Simulations using 3D simulation software, and experiments with robot hardware demonstrate the ability of the algorithm to flatten pieces of laundry using different starting configurations. These results show that, at most, the algorithm flattens out a piece of cloth from 11.1% to 95.6% of the canonical configuration. Third, an energy minimization algorithm is presented that is designed to estimate the configuration of a deformable object. This approach utilizes an RGBD image to calculate feature correspondence (using SURF features), depth values, and boundary locations. Input from a Kinect sensor is used to segment the deformable surface from the background using an alpha-beta swap algorithm. Using this segmentation, the system creates an initial mesh model without prior information of the surface geometry, and it reinitializes the configuration of the mesh model after a loss of input data. 
This approach is able to handle in-plane rotation, out-of-plane rotation, and varying changes in translation and scale. Results display the proposed algorithm over a dataset consisting of seven shirts, two pairs of shorts, two posters, and a pair of pants. The current approach is compared using a simulated shirt model in order to calculate the mean square error of the distance from the vertices on the mesh model to the ground truth, provided by the simulation model
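The evaluation metric in the last sentence, the mean square error of the distance from mesh vertices to the ground truth, can be written out as a short sketch. The vertex coordinates and the function name are hypothetical, and a one-to-one vertex correspondence between the estimated mesh and the simulated model is assumed.

```python
def mean_square_error(mesh_vertices, truth_vertices):
    """Mean of squared Euclidean distances between corresponding 3D vertices."""
    if not mesh_vertices or len(mesh_vertices) != len(truth_vertices):
        raise ValueError("vertex lists must be non-empty and of equal length")
    total = 0.0
    for estimated, truth in zip(mesh_vertices, truth_vertices):
        # Squared distance between one estimated vertex and its ground-truth position.
        total += sum((e - t) ** 2 for e, t in zip(estimated, truth))
    return total / len(mesh_vertices)


# Hypothetical estimated mesh and simulated ground truth (x, y, z).
mesh = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
truth = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.5), (0.0, 1.0, 0.0)]

mse = mean_square_error(mesh, truth)
print(round(mse, 4))
```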