93 research outputs found

    Book review

    Kiss Álmos Péter (ed.), Az afrikai terrorista- és szakadárszervezetek (African Terrorist and Secessionist Organizations), HVK TKH, Budapest, 2015. ISBN 978-963-89948-4-4 by András Kocso

    Classification using a sparse combination of basis functions

    Combinations of basis functions are used here to formulate and solve a convex reformulation of several well-known machine learning algorithms, such as certain variants of boosting methods and Support Vector Machines. We call this reformulation the Convex Networks (CN) approach. We prove that the nonlinear Gauss-Seidel iteration used to solve the CN problem converges globally and quickly. A major property of a CN solution is its sparsity, measured by the number of basis functions with nonzero coefficients. Sparsity can be controlled effectively by heuristics inspired by methods from linear algebra. Numerical results and comparisons on publicly available datasets demonstrate the effectiveness of the proposed methods. As a consequence, the CN approach can perform learning tasks using far fewer basis functions and generate sparse solutions.
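
The abstract does not state the CN objective or its sparsity heuristics, so the sketch below swaps in a familiar stand-in rather than the authors' method: cyclic (Gauss-Seidel style) coordinate updates on L1-penalized least squares. The names `sparse_fit`, `soft_threshold`, and `penalty` are hypothetical. It only illustrates how one-coefficient-at-a-time updates can leave many basis functions with a zero coefficient.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_fit(columns, targets, penalty, sweeps=100):
    """Fit weights for a linear combination of basis functions.

    columns[j][i] is the value of basis function j on sample i.
    Assumption: the objective is 0.5 * squared error + penalty * sum(|w_j|),
    which is a stand-in, not the CN objective from the paper.
    """
    weights = [0.0] * len(columns)
    residual = list(targets)  # targets minus the current prediction
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            norm_sq = sum(v * v for v in col)
            if norm_sq == 0.0:
                continue
            # Correlation of column j with the residual that excludes
            # column j's own current contribution.
            rho = sum(v * r for v, r in zip(col, residual)) + norm_sq * weights[j]
            new_w = soft_threshold(rho, penalty) / norm_sq
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * v for r, v in zip(residual, col)]
                weights[j] = new_w
    return weights
```

Each inner step reuses the weights already updated in the same sweep, which is what makes the iteration Gauss-Seidel in flavor; a larger `penalty` drives more coefficients to exactly zero.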

    A hierarchical evaluation methodology in speech recognition

    In speech recognition, vast hypothesis spaces are generated, so both the search methods and the techniques used to speed them up are of great importance. One way to gain speed is to search in multiple steps. In this multipass search technique, the first steps use only a rough estimate, while the later steps build on the results of the previous ones. To construct these rough tests we use simplified phoneme groups, which are based on a distance function defined over phonemes. The tests we performed show that this technique can significantly speed up the recognition process.
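
A toy sketch of the multipass idea, under loud assumptions: the `PHONEME_GROUP` table, the match-count score, and the tiny lexicon format are all hypothetical and stand in for the paper's distance-based phoneme groups and acoustic scoring. The first pass compares coarse group sequences to prune the candidates; the second pass rescans only the survivors at full phoneme resolution.

```python
# Hypothetical coarse classes; the paper derives its groups from a
# distance function over phonemes, which is not reproduced here.
PHONEME_GROUP = {
    "p": "stop", "b": "stop", "t": "stop", "d": "stop",
    "m": "nasal", "n": "nasal",
    "a": "vowel", "e": "vowel", "i": "vowel", "o": "vowel",
}


def to_groups(phonemes):
    """Map a phoneme sequence to its coarse group sequence."""
    return tuple(PHONEME_GROUP[p] for p in phonemes)


def matches(a, b):
    """Count positions where two sequences agree (a crude similarity)."""
    return sum(x == y for x, y in zip(a, b))


def two_pass_recognize(observed, lexicon, keep=2):
    """Return the lexicon word that best matches the observed phonemes.

    Pass 1 ranks every word by coarse group agreement and keeps `keep` words.
    Pass 2 ranks only that shortlist by exact phoneme agreement.
    """
    obs_groups = to_groups(observed)
    shortlist = sorted(
        lexicon,
        key=lambda w: matches(obs_groups, to_groups(lexicon[w])),
        reverse=True,
    )[:keep]
    return max(shortlist, key=lambda w: matches(observed, lexicon[w]))
```

The saving comes from the second, more expensive comparison running on `keep` candidates instead of the whole lexicon; the risk is that the correct word is pruned in the first pass if `keep` is too small.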

    Extracting human protein information from MEDLINE using a full-sentence parser

    Today, a fair number of systems are available for the task of processing biological data. The development of effective systems is of great importance since they can support both the research and the everyday work of biologists. It is well known that biological databases are large both in size and number, hence data processing technologies are required for the fast and effective management of the contents stored in databases like MEDLINE. A possible solution for content management is the application of natural language processing methods to help make this task easier. With our approach we would like to learn more about the interactions of human genes using full-sentence parsing. Given a sentence, the syntactic parser assigns to it a syntactic structure, which consists of a set of labelled links connecting pairs of words. The parser also produces a constituent representation of the sentence (showing noun phrases, verb phrases, and so on). Here we show experimentally that, using the syntactic information of each abstract, the biological interactions of genes can be predicted. Hence, it is worth developing the kind of information extraction (IE) system that can retrieve information about gene interactions just by using the syntactic information contained in these texts. Our IE system can handle certain types of gene interactions with the help of machine learning (ML) methodologies (Hidden Markov Models, Artificial Neural Networks, Decision Trees, Support Vector Machines). The experiments and practical usage show clearly that our system can provide a useful intuitive guide for biological researchers in their investigations and in the design of their experiments.

    Classifier combination in speech recognition

    In statistical pattern recognition, the principal task is to classify abstract data sets. Instead of using robust but computationally expensive algorithms, it is possible to combine 'weak' classifiers to solve complex classification tasks. In this comparative study, we examine the effectiveness of commonly used hybrid schemes, especially those used for speech recognition problems, concentrating on cases that employ different combinations of classifiers.
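
The abstract does not list which hybrid schemes are compared, so the following is only a minimal sketch of two textbook combination rules, majority voting over predicted labels and the sum rule over per-class posteriors. The function names are illustrative, not taken from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    predictions holds one label per classifier for a single sample.
    """
    return Counter(predictions).most_common(1)[0][0]


def sum_rule(posteriors):
    """Average per-class posteriors and return the index of the best class.

    posteriors holds one probability list per classifier, all of the
    same length (one entry per class).
    """
    n_classes = len(posteriors[0])
    averaged = [
        sum(p[c] for p in posteriors) / len(posteriors)
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=averaged.__getitem__)
```

Voting needs only hard decisions from each classifier, while the sum rule uses their confidence as well, so the two can disagree when a minority of classifiers is very certain.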

    Named entity recognition for Hungarian using various machine learning algorithms

    In this paper we introduce a statistical Named Entity recognizer (NER) system for the Hungarian language. We examined three methods for identifying and disambiguating proper nouns (Artificial Neural Network, Support Vector Machine, C4.5 Decision Tree), their combinations, and the effects of dimensionality reduction. For training and validation we used a segment of the Szeged Corpus [5], which consists of short business news articles collected from MTI (Hungarian News Agency, www.mti.hu). Our results were presented at the Second Conference on Hungarian Computational Linguistics [7]. Our system makes use of both language-dependent features (describing the orthography of proper nouns in Hungarian) and other, language-independent information such as capitalization. Since we avoided the inclusion of large gazetteers of pre-classified entities, the system remains portable across languages without requiring any major modification, as long as the few specialized orthographical and syntactic characteristics are collected for a new target language. The best performing model achieved an F-measure of 91.95%.
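
As a rough illustration of the language-independent side of such a feature set, the sketch below computes a few surface features per token. Only capitalization is named in the abstract; every other feature and the name `token_features` are assumptions made for the example, not the system's actual feature list.

```python
def token_features(tokens, index):
    """Return simple orthographic features for tokens[index].

    Only "is_capitalized" is grounded in the abstract; the remaining
    features are hypothetical additions for illustration.
    """
    token = tokens[index]
    return {
        "is_capitalized": token[:1].isupper(),
        "is_all_caps": token.isupper(),
        "sentence_initial": index == 0,
        "has_digit": any(ch.isdigit() for ch in token),
        "has_hyphen": "-" in token,
        "length": len(token),
    }
```

A feature dictionary like this could be fed to any of the three learners mentioned above; marking the sentence-initial position matters because capitalization there says little about whether the token is a proper noun.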

    Sentence alignment of Hungarian-English parallel corpora using a hybrid algorithm

    We present an efficient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The new algorithm combines a length-based method with an anchor-matching method that uses Named Entity recognition, pairing the speed of length-based models with the accuracy of anchor-finding methods. Because the accuracy of finding cognates for the Hungarian-English language pair is extremely low, we adopted a novel approach that uses Named Entity recognition to find anchors. Owing to its well-selected anchors, the method was found to outperform the two best sentence alignment algorithms published so far for the Hungarian-English language pair.
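
The combination of a length signal with shared anchors can be sketched as a small dynamic program. This is an illustrative simplification, not the published algorithm: the cost function, the `anchor_bonus` and `skip_cost` parameters, and the `capitalized_tokens` stand-in for a Named Entity recognizer are all assumptions, and only 1-1, 1-0, and 0-1 links are modeled.

```python
def capitalized_tokens(sentence):
    """Toy anchor finder: capitalized tokens stand in for named entities."""
    return {tok.strip(".,;:!?") for tok in sentence.split() if tok[:1].isupper()}


def pair_cost(src, tgt, anchors_src, anchors_tgt, anchor_bonus=1.0):
    """Relative length mismatch, lowered by each anchor the pair shares."""
    length_cost = abs(len(src) - len(tgt)) / max(len(src), len(tgt), 1)
    return length_cost - anchor_bonus * len(anchors_src & anchors_tgt)


def align(src_sents, tgt_sents, find_anchors=capitalized_tokens, skip_cost=1.0):
    """Return links (i, j); None marks a sentence left unaligned."""
    n, m = len(src_sents), len(tgt_sents)
    a_src = [find_anchors(s) for s in src_sents]
    a_tgt = [find_anchors(t) for t in tgt_sents]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            base = cost[i][j]
            moves = []
            if i < n and j < m:  # align source i with target j
                c = pair_cost(src_sents[i], tgt_sents[j], a_src[i], a_tgt[j])
                moves.append((i + 1, j + 1, base + c, (i, j)))
            if i < n:  # leave source i unaligned
                moves.append((i + 1, j, base + skip_cost, (i, None)))
            if j < m:  # leave target j unaligned
                moves.append((i, j + 1, base + skip_cost, (None, j)))
            for ni, nj, c, link in moves:
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j, link)
    links = []
    i, j = n, m
    while (i, j) != (0, 0):  # walk back from the end to recover the path
        i, j, link = back[i][j]
        links.append(link)
    return links[::-1]
```

Sentence pairs that share an anchor get a strongly reduced cost, so they pin the alignment path in place, while the length term decides among the sentences between anchors.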