Cross-lingual document retrieval categorisation and navigation based on distributed services
The widespread use of the Internet across countries has increased the need for access to document collections
that are often written in languages different from a user’s native language. In this paper we describe Clarity, a
Cross Language Information Retrieval (CLIR) system for English, Finnish, Swedish, Latvian and Lithuanian.
Clarity is a fully-fledged retrieval system that supports the user during the whole process of query formulation,
text retrieval and document browsing. We address four of the major aspects of Clarity: (i) the user-driven
methodology that formed the basis for the iterative design cycle and framework in the project, (ii) the system
architecture that was developed to support the interaction and coordination of Clarity’s distributed services, (iii)
the data resources and methods for query translation, and (iv) the support for Baltic languages. Clarity is an
example of a distributed CLIR system built with minimal translation resources and, to our knowledge, the only
such system that currently supports Baltic languages.
Conference Program
Proceedings of the 18th Nordic Conference of Computational Linguistics
NODALIDA 2011.
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), xii-xvii.
© 2011 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/16955
An efficient implementation of lattice-ladder multilayer perceptrons in field programmable gate arrays
The implementation efficiency of electronic systems is shaped by conflicting requirements: computation volumes grow and data exchange accelerates while energy consumption rises, which forces researchers not only to optimize the algorithm but also to implement it quickly in specialized hardware. This work therefore tackles the problem of efficient and straightforward implementation of real-time electronic intelligent systems on field-programmable gate arrays (FPGAs). The object of research is specialized FPGA intellectual property (IP) cores that operate in real time. The thesis investigates the following main aspects of the research object: implementation criteria and techniques.
The aim of the thesis is to optimize the FPGA implementation process of a selected class of dynamic artificial neural networks. To solve the stated problem and reach this goal, the following main tasks are formulated: rationalize the selection of the Lattice-Ladder Multi-Layer Perceptron (LLMLP) class and of its electronic intelligent system test-bed, a speaker-dependent Lithuanian speech recognizer, to be created and investigated; develop a dedicated technique for implementing the LLMLP class on FPGA, based on specialized efficiency criteria for circuitry synthesis; and develop and experimentally confirm the efficiency of the optimized FPGA IP cores used in the Lithuanian speech recognizer.
The dissertation consists of an introduction, four chapters and general conclusions. The first chapter presents the fundamentals of computer-aided design, artificial neural networks and speech recognition implementation on FPGA. The second chapter proposes the efficiency criteria and the technique for implementing LLMLP IP cores, enabling multi-objective optimization of throughput, LLMLP complexity and resource utilization. Data flow graphs are applied to optimize the LLMLP computations, and an optimized neuron processing element is proposed. In the third chapter, the IP cores for feature extraction and comparison are developed for the Lithuanian speech recognizer and analyzed. The fourth chapter is devoted to the experimental verification of the numerous LLMLP IP cores developed. Experiments measured isolated-word recognition accuracy and speed for different speakers, signal-to-noise ratios, feature extraction methods and accelerated comparison methods.
The main results of the thesis were published in 12 scientific publications: eight in peer-reviewed scientific journals (four of them indexed in the Thomson Reuters Web of Science database) and four in conference proceedings. The results were presented at 17 scientific conferences.
14th Conference on DATA ANALYSIS METHODS for Software Systems
DAMSS-2023 is the 14th International Conference on Data Analysis Methods for Software Systems, held in Druskininkai, Lithuania. The conference takes place every year at the same venue and time. The exception was in 2020, when the world was gripped by the Covid-19 pandemic and the movement of people was severely restricted. After a year’s break, the conference was back on track, and the next conference was successful in achieving its primary goal of lively scientific communication. The conference focuses on live interaction among participants. For better efficiency of communication among participants, most of the presentations are poster presentations.
This format has proven to be highly effective. However, we have several oral sessions, too. The history of the conference dates back to 2009 when 16 papers were presented. It began as a workshop and has evolved into a well-known conference. The idea of such a workshop originated at the Institute of Mathematics and Informatics, now the Institute of Data Science and Digital Technologies of Vilnius University. The Lithuanian Academy of Sciences and the Lithuanian Computer Society supported this idea, which gained enthusiastic acceptance from both the Lithuanian and international scientific communities. This year’s conference features 84 presentations, with 137 registered participants from 11 countries. The conference serves as a gathering point for researchers from six Lithuanian universities, making it the main annual meeting for Lithuanian computer scientists. The primary aim of the conference is to showcase research conducted at Lithuanian and foreign universities in the fields of data science and software engineering. The annual organization of the conference facilitates the rapid exchange of new ideas within the scientific community. Seven IT companies supported the conference this year, indicating the relevance of the conference topics to the business sector. In addition, the conference is supported by the Lithuanian Research Council and the National Science and Technology Council (Taiwan, R. O. C.). The conference covers a wide range of topics, including Applied Mathematics, Artificial Intelligence, Big Data, Bioinformatics, Blockchain Technologies, Business Rules, Software Engineering, Cybersecurity, Data Science, Deep Learning, High-Performance Computing, Data Visualization, Machine Learning, Medical Informatics, Modelling Educational Data, Ontological Engineering, Optimization, Quantum Computing, Signal Processing. This book provides an overview of all presentations from the DAMSS-2023 conference.
Cross-language Text Classification with Convolutional Neural Networks From Scratch
Cross-language classification is an important task in multilingual learning, where documents in different languages often share the same set of categories. The main goal is to reduce the labeling cost of training a classification model for each individual language. This article proposes a novel approach that uses Convolutional Neural Networks for multilingual language classification. It learns a representation of the knowledge gained from the training languages. Moreover, the method also works for a new language that was not used in training. The results of an empirical study on a large dataset of 21 languages demonstrate the robustness and competitiveness of the presented approach.
Fractionation of heavy metals in sewage sludge and their removal using low-molecular-weight organic acids
The total concentration and the concentrations of individual chemical species of selected heavy metals were estimated in primary and anaerobically digested sewage sludge. The concentration of Zn (1503 mg/kg) was highest, followed by Cu (201 mg/kg), Cr (196 mg/kg), Pb (56 mg/kg), Ni (44 mg/kg) and Cd (3.6 mg/kg). The metals were divided into five fractions (exchangeable (F1), adsorbed (F2), organically bound (F3), bound to carbonates (F4), and residual (F5)) via sequential extraction. The sludge treatment procedure had no significant effect on the fractionation results. In both the primary and anaerobically digested sewage sludge, the heavy metals were ranked according to their mobilities (fractions F1 and F2) in the following order: Ni > Zn > Cu > Cd > Pb >> Cr. Metal stability in the environment was evaluated by the sulphide and residual fraction F5, and the following ranking order was identified: Cr >> Pb ≈ Ni > Cd > Zn ≈ Cu. A leaching experiment with low-molecular-weight organic acids (oxalic, acetic and citric acid) revealed that the metal-removal efficiency varied depending on the number of carboxyl groups in the extracting agent, the chemical speciation of the metal (Ni, Zn or Cu) in the sludge, and the concentration and pH change of the extracting solution. Acid solutions with a 0.5 M concentration ranked as follows according to their Zn-removal efficiency: citric acid (100%) > acetic acid (78%) > oxalic acid (71%). In all cases, citric acid showed the best capacity for removing metal from the sludge, with an extraction efficiency ranging from 30% to 100%, while the Ni and Cu removal efficiencies with acetic and oxalic acid were less than 40%.
First published online: 11 Oct 201
A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition
We propose a quantum kernel learning (QKL) framework to address the inherent
data sparsity issues often encountered in training large-scale acoustic models
in low-resource scenarios. We project acoustic features based on
classical-to-quantum feature encoding. Different from existing quantum
convolution techniques, we utilize QKL with features in the quantum space to
design kernel-based classifiers. Experimental results on challenging spoken
command recognition tasks for a few low-resource languages, such as Arabic,
Georgian, Chuvash, and Lithuanian, show that the proposed QKL-based hybrid
approach attains good improvements over existing classical and quantum
solutions.
Comment: Submitted to ICASSP 202
Comparative Analysis of Word Embeddings for Capturing Word Similarities
Distributed language representation has become the most widely used technique
for language representation in various natural language processing tasks. Most
of the natural language processing models that are based on deep learning
techniques use already pre-trained distributed word representations, commonly
called word embeddings. Determining the highest-quality word embeddings is of
crucial importance for such models. However, selecting the appropriate word
embeddings is a perplexing task since the projected embedding space is not
intuitive to humans. In this paper, we explore different approaches for
creating distributed word representations. We perform an intrinsic evaluation
of several state-of-the-art word embedding methods. Their performance on
capturing word similarities is analysed with existing benchmark datasets for
word-pair similarity. The paper conducts a correlation
analysis between ground truth word similarities and similarities obtained by
different word embedding methods.
Comment: Part of the 6th International Conference on Natural Language Processing (NATP 2020
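The intrinsic evaluation described in this abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the embedding similarity measure and Spearman rank correlation as the correlation statistic; the abstract names neither, so both are illustrative choices. The function names, the toy vectors and the toy benchmark scores are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(embeddings, benchmark):
    """Correlate human similarity scores with embedding cosine similarities.

    Word pairs with a word missing from the embeddings are skipped.
    """
    gold, predicted = [], []
    for word1, word2, score in benchmark:
        if word1 in embeddings and word2 in embeddings:
            gold.append(score)
            predicted.append(cosine(embeddings[word1], embeddings[word2]))
    return spearman(gold, predicted)


# Hypothetical 2-dimensional embeddings and human scores, for illustration only.
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "bus": [0.2, 0.8],
}
TOY_BENCHMARK = [
    ("cat", "dog", 9.0),
    ("car", "bus", 8.5),
    ("cat", "car", 1.0),
    ("dog", "bus", 2.0),
]
```

Calling `evaluate(TOY_EMBEDDINGS, TOY_BENCHMARK)` returns one correlation coefficient per embedding table, so several embedding methods could be compared by running it once per method on the same benchmark.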