Link Graph Analysis for Adult Images Classification
In order to protect an image search engine's users from undesirable results,
an adult-image classifier should be built. The information about links from
websites to images is employed to create such a classifier. These links are
represented as a bipartite website-image graph. Each vertex is equipped with
scores of adultness and decentness. The scores for image vertices are
initialized with zero; those for website vertices are initialized according to
a text-based website classifier. An iterative algorithm that propagates scores
within the website-image graph is described. The scores obtained are used to
classify images by choosing an appropriate threshold. Experiments on
Internet-scale data have shown that, at the same precision level, the algorithm
under consideration increases classification recall by 17% in comparison with a
simple algorithm that classifies an image as adult if it is connected with at
least one adult site.
Comment: 7 pages. Young Scientists Conference, 4th Russian Summer School in
Information Retrieval
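The iterative propagation described above can be sketched as follows. The exact update rule is not given in the abstract, so the averaging scheme, the damping factor, and the decision rule below are illustrative assumptions, not the paper's formulation:

```python
from collections import defaultdict

def propagate_scores(edges, site_scores, iters=10, damping=0.85):
    """Propagate (adult, decent) score pairs over a bipartite site-image graph.

    edges: list of (site, image) links.
    site_scores: dict site -> (adult, decent) prior from a text classifier.
    Image scores start at zero; each iteration, images take the mean score of
    the sites linking to them, and sites blend their prior with the mean score
    of their linked images (damping weights the prior).
    """
    sites = defaultdict(set)   # site  -> linked images
    images = defaultdict(set)  # image -> linking sites
    for s, i in edges:
        sites[s].add(i)
        images[i].add(s)

    cur_site = dict(site_scores)
    img_scores = {i: (0.0, 0.0) for i in images}
    for _ in range(iters):
        img_scores = {
            i: tuple(sum(cur_site[s][k] for s in ss) / len(ss) for k in (0, 1))
            for i, ss in images.items()
        }
        cur_site = {
            s: tuple(
                damping * site_scores[s][k]
                + (1 - damping) * sum(img_scores[i][k] for i in ii) / len(ii)
                for k in (0, 1)
            )
            for s, ii in sites.items()
        }
    return img_scores

def classify_adult(img_scores, threshold=0.5):
    """Threshold the adultness margin, as in the abstract's final step."""
    return {i for i, (adult, decent) in img_scores.items()
            if adult - decent > threshold}
```

With a toy graph where site "s1" is adult and "s2" is decent, an image linked only to "s1" ends up well above the threshold, while an image linked to both sits near zero margin and is not flagged.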
Game interpretation of Kolmogorov complexity
The Kolmogorov complexity function K can be relativized using any oracle A,
and most properties of K remain true for relativized versions. In section 1 we
provide an explanation for this observation by giving a game-theoretic
interpretation and showing that all "natural" properties are either true for
all sufficiently powerful oracles or false for all sufficiently powerful
oracles. This result is a simple consequence of Martin's determinacy theorem,
but its proof is instructive: it shows how one can prove statements about
Kolmogorov complexity by constructing a special game and a winning strategy in
this game. This technique is illustrated by several examples (total conditional
complexity, bijection complexity, randomness extraction, contrasting plain and
prefix complexities).
Comment: 11 pages. Presented in 2009 at the conference on randomness in Madison
FLOating-Window Projective Separator (FloWPS): A Data Trimming Tool for Support Vector Machines (SVM) to Improve Robustness of the Classifier
Here, we propose a heuristic data-trimming technique for SVM termed FLOating Window Projective Separator (FloWPS), tailored for personalized predictions based on molecular data. This procedure can operate with high-throughput genetic datasets such as gene expression or mutation profiles. Its application prevents the SVM from extrapolating by excluding non-informative features. FloWPS requires training on data for individuals with known clinical outcomes to create a clinically relevant classifier. The genetic profiles linked with the outcomes are split, as usual, into training and validation datasets. The unique property of FloWPS is that irrelevant features of the validation dataset that do not have a significant number of neighboring hits in the training dataset are removed from further analyses. Next, similarly to the k-nearest-neighbors (kNN) method, for each point of the validation dataset FloWPS takes into account only the proximal points of the training dataset. Thus, for every point of the validation dataset, the training dataset is adjusted to form a floating window. FloWPS performance was tested on ten gene expression datasets for 992 cancer patients either responding or not responding to different types of chemotherapy. We experimentally confirmed by leave-one-out cross-validation that FloWPS significantly increases the quality of a classifier built on the classical SVM in most of the applications, particularly for polynomial kernels
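The floating-window idea can be sketched as follows. FloWPS fits an SVM inside each window; to keep the sketch dependency-free, a majority vote over the window stands in for that local SVM, and the feature-trimming step is omitted, so this illustrates only the window construction, not the full method:

```python
import numpy as np

def floating_window_predict(X_train, y_train, X_val, k=5):
    """For each validation point, form a 'floating window' of its k nearest
    training samples and classify within that window only.

    X_train: (n, d) array of training profiles; y_train: (n,) labels.
    X_val:   (m, d) array of validation profiles.
    """
    preds = []
    for x in X_val:
        d = np.linalg.norm(X_train - x, axis=1)     # distances to all training points
        window = np.argsort(d)[:k]                  # indices of the k closest points
        labels, counts = np.unique(y_train[window], return_counts=True)
        preds.append(labels[np.argmax(counts)])     # stand-in for a window-local SVM
    return np.array(preds)
```

The design point is that the classifier never sees training samples far from the query, which is what prevents extrapolation outside the region populated by similar patients.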
Induced Layered Clusters, Hereditary Mappings and Convex Geometries
A method for structural clustering proposed by the authors is extended to the case when there are externally defined restrictions on the relations between sets and their elements. This framework appears to be related to the order-theoretic concepts of hereditary mappings and convex geometries, which enables us to give characterizations of those in terms of monotone linkage functions. Key words: layered cluster, monotone linkage, greedy optimization, convex geometry, hereditary mapping.
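The greedy optimization named in the key words can be illustrated with the classical series-of-deletions scheme for a monotone linkage function: maximize F(H) = min over x in H of linkage(x, H) by repeatedly deleting the element of minimum linkage and remembering the best prefix set. This is a generic sketch of that scheme, not the paper's extended restricted version:

```python
def extreme_subset(elements, linkage):
    """Greedy maximization of F(H) = min_{x in H} linkage(x, H).

    linkage(x, H) must be monotone in H; under that assumption the
    series of deletions visits a maximizer of the quasi-concave F.
    """
    H = set(elements)
    best_val, best_set = float("-inf"), set()
    while H:
        vals = {x: linkage(x, H) for x in H}
        worst = min(vals, key=vals.get)      # element of minimum linkage
        if vals[worst] > best_val:           # F(H) improved: remember H
            best_val, best_set = vals[worst], set(H)
        H = H - {worst}                      # delete and continue
    return best_set, best_val
```

With linkage(x, H) = number of neighbors of x inside H, this is exactly degree peeling, and the returned set is the subgraph maximizing the minimum degree.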
Relation between Protein Structure, Sequence Homology and Composition of Amino Acids
A method of quantitative comparison of two classification rules applied to the protein folding problem is presented. Classifications of proteins based on sequence homology and on amino acid composition were compared and analyzed according to this approach. The coefficient of correlation between these classification methods and the procedure for estimating the robustness of the coefficient are discussed. [RRR 6-95] One of the most powerful methods of protein structure prediction is model building by homology (Hilbert et al., 1993). Chothia and Lesk (1986) suggested that if two sequences can be aligned with 50% or greater residue identity, they have a similar fold. This threshold of 50% is usually used as a "safe definition of sequence homology" (Pascarella & Argos, 1992) and in conventional opinion grants reasonable confidence that a protein sequence has the chain conformation of the template, excluding less conserved regions. But it was shown that structure inform..
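A correlation coefficient between two classification rules applied to the same proteins can be computed from their 2x2 agreement table. The phi coefficient below is an illustrative stand-in; the abstract does not specify which coefficient the report uses:

```python
import math

def phi_coefficient(labels_a, labels_b):
    """Phi correlation between two binary classification rules on the same
    objects: +1 for identical rules, -1 for opposite rules, 0 when one rule
    carries no information about the other."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(labels_a, labels_b):
        if a and b:
            n11 += 1          # both rules say "positive"
        elif a and not b:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0
```

The robustness estimation mentioned in the abstract could then be approached by recomputing this coefficient over resampled protein subsets and inspecting its spread.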
Optimization Algorithms for Separable Functions With Tree-Like Adjacency of Variables and Their Application to the Analysis of Massive Data Sets
A massive data set is considered as a set of experimentally acquired values of a number of variables, each of which is associated with the respective node of an undirected adjacency graph that fixes the structure of the data set. The class of data analysis problems under consideration is outlined by the assumption that the ultimate aim of processing can be represented as a transformation of the original data array into a secondary array of the same structure but with node variables of, generally speaking, a different nature, i.e. different ranges. Such a generalized problem is posed as the formal problem of optimization (minimization or maximization) of a real-valued objective function of all the node variables. The objective function is assumed to consist of additive constituents of one or two arguments, respectively node and edge functions. The former carry the data-dependent information on the sought-for values of the secondary variables, whereas the latter ones are mean..
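When the adjacency graph is a tree and the node variables range over a finite label set, such a separable objective can be minimized exactly by leaf-to-root dynamic programming (min-sum message passing). This is a generic sketch of that standard technique, assuming finite labels; the paper's algorithms for continuous ranges are not reproduced here:

```python
def tree_min_sum(n, edges, node_cost, edge_cost, labels, root=0):
    """Exactly minimize sum_v node_cost(v, x_v) + sum_(u,v) edge_cost(u, v, x_u, x_v)
    over labelings x of a tree with nodes 0..n-1 and the given edge list."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent, children = {root: None}, {v: [] for v in range(n)}
    seen, stack, topo = {root}, [root], []
    while stack:                      # preorder: every parent before its children
        u = stack.pop()
        topo.append(u)
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                parent[w] = u
                children[u].append(w)
                stack.append(w)
    # m[v][x]: minimal cost of the subtree rooted at v given x_v = x
    m = {}
    for v in reversed(topo):          # children are processed before v
        m[v] = {x: node_cost(v, x)
                   + sum(min(edge_cost(v, c, x, y) + m[c][y] for y in labels)
                         for c in children[v])
                for x in labels}
    # backtrack an optimal assignment from the root down
    assign = {root: min(m[root], key=m[root].get)}
    for v in topo[1:]:
        p = parent[v]
        assign[v] = min(labels,
                        key=lambda y: edge_cost(p, v, assign[p], y) + m[v][y])
    return assign, m[root][assign[root]]
```

For a three-node chain with unit data-attachment costs and a smoothness edge term, the procedure recovers the labeling that trades one smoothness penalty for agreement with the observations.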
Combinatorial clustering for textual data representation in machine learning models (http://www.datalaundering.com/download/theoretic.pdf)
In text stream analysis, one of the main problems is finding an effective method to classify documents quickly and correctly. This is why dimensionality reduction and related methods of representing significant information are critical to developing a good text classifier. In this report we describe a novel, purely combinatorial approach to obtaining a meaningful representation of text data. There are two basic ideas realized in the current development of this approach: (1) layered clusters, which induce over the entire data a stratification in a tower structure, like a nesting doll (Russian Matreshka) [1][2], and (2) parallel clustering of documents and their features (frequencies of words, in our case). The clusters are sub-matrices of the data which include each other according to the ordering given by the clustering model: the deepest cluster-matrix represents the largest weighted quasi-clique if the input data matrix is interpreted as a hypergraph; its effective weight is also the largest possible; the second cluster includes the first one and represents the second level of a quasi-clique with less valu