
The Google Similarity Distance

Author: Cilibrasi Rudi
Vitanyi Paul M. B.
Publication date: 01/01/2007

Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers, the equivalent of "society" is "database," and the equivalent of "use" is "a way to search the database." We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts, we use the world-wide-web as the database and Google as the search engine; the method also applies to other search engines and databases. We then apply this theory to construct a method, the Google similarity distance, that automatically extracts the similarity of words and phrases from the world-wide-web using Google page counts. The world-wide-web is the largest database on earth, and the context information entered by millions of independent users averages out to provide automatic semantics of useful quality. We give applications in hierarchical clustering, classification, and language translation. Our examples distinguish between colors and numbers; cluster the names of paintings by 17th-century Dutch masters and the names of books by English novelists; demonstrate the ability to understand emergencies and primes; and show a simple automatic English-Spanish translation. Finally, we use the WordNet database as an objective baseline against which to judge the performance of our method. We conduct a massive randomized trial in binary classification, using support vector machines to learn categories based on our Google distance, which results in a mean agreement of 87% with the expert-crafted WordNet categories.

Comment: 15 pages, 10 figures; changed some text/figures/notation/part of theorem. Incorporated referees' comments. This is the final published version up to some minor changes in the galley proof.
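The abstract does not spell out the distance itself. As published, the normalized Google distance combines the page count f(x) for term x, f(y) for term y, f(x,y) for pages containing both, and a normalizing constant N (in practice, roughly the number of indexed pages): NGD(x,y) = (max{log f(x), log f(y)} - log f(x,y)) / (log N - min{log f(x), log f(y)}). A minimal sketch follows; the page counts are hypothetical, and returning infinity when the terms never co-occur is a convention chosen for this sketch.

```python
import math

def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance computed from page counts.

    fx, fy: number of pages containing x (resp. y)
    fxy:    number of pages containing both x and y
    n:      normalizing constant (roughly the number of indexed pages)
    """
    if fxy == 0:
        return math.inf  # sketch convention: terms that never co-occur are maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    # The log base cancels in the ratio, so natural logs are fine.
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Hypothetical page counts, not real Google figures
print(ngd(fx=1_000, fy=100, fxy=10, n=1_000_000))  # approximately 0.5
```

Terms that always appear together get distance 0; the more rarely they co-occur relative to their individual frequencies, the larger the distance.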

arXiv.org e-Print Archive

CiteSeerX

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

Quantum-inspired algorithm for direct multi-class classification

Author: Blank Carsten
Freytes Hector
Giuntini Roberto
Holik Federico
Park Daniel K.
Sergioli Giuseppe
Publication date: 01/01/2023

Over the last few decades, quantum machine learning has emerged as a groundbreaking discipline. Harnessing the peculiarities of quantum computation for machine learning tasks offers promising advantages. Quantum-inspired machine learning has revealed how relevant benefits for machine learning problems can be obtained from quantum information theory even without employing quantum computers. Recent experiments have demonstrated how to design an algorithm for binary classification inspired by the method of quantum state discrimination, which performs well compared with several standard classifiers. However, generalizing this quantum-inspired binary classifier to a multi-class scenario remains nontrivial. A typical solution in machine learning decomposes multi-class classification into a combinatorial number of binary classifications, with a concomitant increase in computational resources. In this study, we introduce a quantum-inspired classifier that avoids this problem. Inspired by quantum state discrimination, our classifier performs multi-class classification directly, without using binary classifiers. We first compare the performance of the quantum-inspired multi-class classifier with eleven standard classifiers; the comparison reveals excellent performance by the quantum-inspired classifier. Comparing these results with those obtained by decomposing the problem into binary classifiers shows that our method improves accuracy and reduces time complexity. The quantum-inspired machine learning algorithm proposed in this work is therefore an effective and efficient framework for multi-class classification. Finally, although these advantages can be attained without any quantum component in the hardware, we discuss how the model could be implemented on quantum hardware.
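The abstract does not give the decision rule. One standard way to discriminate among several quantum states in a single step is the "pretty good measurement" (PGM): each class k gets a density matrix rho_k and prior p_k, and the measurement element E_k = rho^(-1/2) p_k rho_k rho^(-1/2), with rho = sum_k p_k rho_k, scores a new state. The sketch below uses the PGM as an illustrative stand-in, not necessarily the authors' rule; the `encode` function (append a constant, then normalize) and the toy data are assumptions.

```python
import numpy as np

def encode(x):
    """Hypothetical encoding: append a constant 1, then L2-normalise to a unit 'state' vector."""
    v = np.append(np.asarray(x, dtype=float), 1.0)
    return v / np.linalg.norm(v)

def inv_sqrt_psd(m, eps=1e-12):
    """Pseudo-inverse square root of a symmetric positive semidefinite matrix."""
    w, u = np.linalg.eigh(m)
    inv = np.where(w > eps, 1.0 / np.sqrt(np.clip(w, eps, None)), 0.0)
    return (u * inv) @ u.T  # U diag(inv) U^T

def fit_pgm(X, y):
    """Build one pretty-good-measurement element per class (no binary decomposition)."""
    y = np.asarray(y)
    classes = np.unique(y)
    states = np.array([encode(x) for x in X])
    weighted = []
    for c in classes:
        s = states[y == c]
        rho_c = s.T @ s / len(s)                        # class density matrix (uniform mixture)
        weighted.append(len(s) / len(states) * rho_c)   # prior p_c times rho_c
    r = inv_sqrt_psd(sum(weighted))                     # rho^(-1/2), rho = sum_c p_c rho_c
    return classes, [r @ wc @ r for wc in weighted]

def predict_pgm(model, X):
    """Assign each sample to the class whose measurement element gives the highest probability."""
    classes, povm = model
    probs = np.array([[encode(x) @ e @ encode(x) for e in povm] for x in X])
    return classes[probs.argmax(axis=1)]

X = [[2.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 1.0]]  # toy data, two classes
y = [0, 0, 1, 1]
model = fit_pgm(X, y)
print(predict_pgm(model, [[1.0, 0.1], [0.1, 1.0]]))
```

All classes are scored by one measurement, so the cost grows linearly with the number of classes rather than with the number of class pairs.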

CONICET Digital

Archivio istituzionale della ricerca - Università di Cagliari

Arts, Computers and Artificial Intelligence

Author: Neeman Sol, Ph.D.
Publication venue: ScholarsArchive@JWU
Publication date: 09/11/1996

Science and art seem to belong to different cultures. Science and technology, mainly products of the intellect, use terminology and vocabulary that are concise and well defined. In artistic expression, by contrast, ambiguity is a powerful component. Still, the relationship between these two categories of human activity is interesting and fascinating. This paper first introduces a general comparison of the two disciplines and then discusses the possibility of mechanical creation of art using computers and artificial intelligence. Two techniques used to create poetry and music follow. First, a statistical approach to the mechanical composition of music is presented; it uses parameters of existing music to create similar music. Second, a method for the mechanical composition of poetry is presented, which combines linguistic models, a classification dictionary, and semantic information.
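The abstract says only that the music method derives parameters from existing music. A first-order Markov chain over notes is one common statistical approach of that kind; the sketch below assumes it purely for illustration (it is not necessarily the author's method), and the note names and source melody are hypothetical.

```python
import random
from collections import defaultdict

def train_markov(melody):
    """Record first-order transitions between consecutive notes."""
    table = defaultdict(list)
    for a, b in zip(melody, melody[1:]):
        table[a].append(b)  # repeated successors encode transition frequency
    return table

def compose(table, start, length, seed=0):
    """Generate a new melody by sampling successors from the learned table."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        successors = table.get(out[-1])
        if not successors:  # dead end: this note was never followed by another
            break
        out.append(rng.choice(successors))
    return out

source = ["C", "D", "E", "C", "E", "G", "E", "D", "C"]  # hypothetical existing melody
table = train_markov(source)
melody = compose(table, "C", 8)
print(melody)
```

Every transition in the output also occurs in the source melody, which is what makes the result "similar" music while still varying the order of phrases.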

ScholarsArchive at Johnson & Wales University

HELIN Digital Commons


High performance latent dirichlet allocation for text mining

Author: Liu Zelong
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2013

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Latent Dirichlet Allocation (LDA), a fully probabilistic generative model, is a three-tier Bayesian model. LDA computes the latent topic structure of the data and extracts the significant information in documents. However, traditional LDA has several limitations in practical applications. LDA cannot be used directly for classification because it is an unsupervised learning model; it must be embedded in an appropriate classification algorithm. Being a generative model, LDA normally also generates latent topics for categories to which the target documents do not belong, which introduces deviations into the computation and reduces classification accuracy. The number of topics strongly influences how the model parameters are learned. Noise samples in the training data also affect the final text classification result, and the quality of LDA-based classifiers depends to a great extent on the quality of the training samples. Although parallel LDA algorithms have been proposed to handle huge amounts of data, balancing computing loads in a computer cluster poses another challenge. This thesis presents a text classification method that combines the LDA model with the Support Vector Machine (SVM) classification algorithm to improve classification accuracy while reducing the dimension of datasets. Based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the algorithm automatically optimizes the number of topics to be selected, which reduces the number of iterations in computation. Furthermore, this thesis presents a scheme for reducing noisy data. When the noise ratio in the training data set is large, the scheme can always produce a high level of classification accuracy.
Finally, the thesis parallelizes LDA using the MapReduce model, which is the de facto computing standard for data-intensive applications. A genetic-algorithm-based load balancing algorithm is designed to balance the workloads among computers in a heterogeneous MapReduce cluster, where the computers differ in CPU speed, memory space, and hard disk space.
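The abstract does not describe the genetic algorithm's encoding or fitness function. The sketch below is a generic, hypothetical version: a chromosome assigns each task to a node, fitness is the makespan (finish time of the slowest node, with load divided by node speed), and the population evolves by elitism, one-point crossover, and mutation. Task sizes and node speeds are made up, and memory and disk heterogeneity are ignored here.

```python
import random

def makespan(assign, sizes, speeds):
    """Finish time of the slowest node: its total task size divided by its speed."""
    load = [0.0] * len(speeds)
    for task, node in enumerate(assign):
        load[node] += sizes[task] / speeds[node]
    return max(load)

def ga_balance(sizes, speeds, pop=30, gens=200, mut=0.1, seed=0):
    """Hypothetical GA: chromosome[i] is the node that runs task i."""
    rng = random.Random(seed)
    n, m = len(sizes), len(speeds)
    fit = lambda a: makespan(a, sizes, speeds)
    # Seed with a round-robin assignment plus random ones.
    population = [[i % m for i in range(n)]]
    population += [[rng.randrange(m) for _ in range(n)] for _ in range(pop - 1)]
    for _ in range(gens):
        population.sort(key=fit)
        elite = population[: pop // 2]  # elitism: the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n):          # per-gene mutation
                if rng.random() < mut:
                    child[i] = rng.randrange(m)
            children.append(child)
        population = elite + children
    return min(population, key=fit)

sizes = [8, 3, 5, 2, 7, 4, 6, 1]  # hypothetical task sizes (e.g. input-split volumes)
speeds = [1.0, 2.0, 4.0]          # hypothetical relative node speeds
best = ga_balance(sizes, speeds)
print(best, makespan(best, sizes, speeds))
```

Because the best chromosome always survives, the result is never worse than the round-robin starting point (makespan 16.0 for these numbers).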

Brunel University Research Archive

A NEW MODEL ON BENTHIC FORAMINIFER IMAGE CLASSIFICATION AND DEFINITIONS BASED ON CONVOLUTIONAL NEURAL NETWORK (CNN)

Author: Kübra Yayan
Uğur Yayan
Publication venue: Eskisehir Osmangazi Universitesi Muhendislik ve Mimarlik Fakultesi Dergisi
Publication date: 01/04/2023

Fossil studies are of great importance for observing how living species have changed over time, for drawing inferences from the information the observed species provide, and for understanding how the world we live in has developed and changed. However, examining and interpreting fossil specimens is a complex and lengthy process. Artificial intelligence methods have begun to be applied to this field to facilitate the work of paleontologists. Computer-aided detection and classification of fossil specimens greatly simplifies this process compared with manual classification, and reduces dependence on outside specialists for fossil assemblages in which a paleontologist is not an expert. To this end, photographs of 9 benthic foraminiferal species and of non-foraminiferal samples from a selected dataset were used. This study presents a new method for classifying benthic foraminifera using deep convolutional neural networks, which reaches higher accuracy than results reported in the literature. With this method, the trained system achieved accuracy rates of at least 70% in testing. By reaching high accuracy rates with a new method, this study represents a successful development for paleontology in the use of artificial intelligence for microfossil identification.
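The abstract does not give the network architecture. To show what a convolutional classifier computes, here is a minimal, untrained forward pass in NumPy: convolution, ReLU, 2x2 max pooling, and a dense softmax layer. The 32x32 grayscale input, the 8 filters of size 3x3, and the 10 output classes (assuming the 9 species plus one non-foraminifera class) are all hypothetical choices, and the random weights mean the predicted class carries no meaning.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

def conv2d(img, filters):
    """Valid 2D convolution (cross-correlation, as in most CNN libraries).
    img: (H, W); filters: (F, k, k) -> output (F, H-k+1, W-k+1)."""
    k = filters.shape[-1]
    windows = sliding_window_view(img, (k, k))  # (H-k+1, W-k+1, k, k)
    return np.einsum("ijkl,fkl->fij", windows, filters)

def max_pool2(x):
    """2x2 max pooling over each feature map (an odd last row/column is cropped)."""
    f, h, w = x.shape
    x = x[:, : h - h % 2, : w - w % 2]
    return x.reshape(f, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

N_CLASSES = 10  # assumption: 9 species + 1 non-foraminifera class
filters = rng.normal(scale=0.1, size=(8, 3, 3))
feat_dim = 8 * 15 * 15  # 32x32 input -> 30x30 after conv -> 15x15 after pooling
dense_w = rng.normal(scale=0.01, size=(N_CLASSES, feat_dim))
dense_b = np.zeros(N_CLASSES)

def forward(img):
    x = np.maximum(conv2d(img, filters), 0.0)  # convolution + ReLU
    x = max_pool2(x).ravel()                   # pooling + flatten
    return softmax(dense_w @ x + dense_b)      # class probabilities

probs = forward(rng.random((32, 32)))  # stand-in for a 32x32 grayscale micrograph
print(probs.shape, probs.argmax())
```

Training would adjust `filters`, `dense_w`, and `dense_b` by gradient descent on labelled images; deep CNNs stack many such convolution and pooling stages.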

Directory of Open Access Journals

ANALYSIS OF THE WORK POSTURE OF OFFICE EMPLOYEES AT PT XZ

Author: Br Tarigan Elisya Florena
Zetli Sri
Publication venue: LPPM Putera Batam
Publication date: 25/07/2021

Ergonomics is a systematic branch of science that uses information about human nature, capabilities, and limitations to design effective, safe, and comfortable work systems. It covers many aspects of employees' work, one of which is office ergonomics: the entire work environment and the work tools associated with computers, chairs, and other equipment. High demands on office employees at PT XZ require them to work long hours; existing surveys found that office workers spend more than 75% of their working time sitting in front of a computer. Jobs like this carry several ergonomic risks felt by employees, so the level of ergonomic risk among office employees at PT XZ needs to be measured. Rapid Office Strain Assessment (ROSA) is a rapid analysis method for measuring work risks associated with computer use; it is designed to measure the risk of worker injury and to determine the level of change action required based on reports of worker discomfort. Complaints about body discomfort, gathered with the CMDQ questionnaire, showed that the 5 employees reported the most complaints in the lower back (28.5%), followed by the neck (21%), the upper back (18%), and finally the hips/buttocks (12.8%). In the work-posture analysis using the ROSA method, all five employees received the same final score of 5, which falls in the warning level classification. Work posture therefore needs to be improved according to the setup procedure for computer workstations, paying attention to chair height, elbow position, monitor distance, monitor height, computer surface position, backrest position, telephone distance, wrist angle, and mouse position.

Universitas Putera Batam (UPB): Open Journal Systems

Recognition of prokaryotic promoters based on a novel variable-window Z-curve method

Author: Kai Song
Publication venue: Oxford University Press

Transcription is the first step in gene expression, and it is the step at which most regulation of expression occurs. Although sequenced prokaryotic genomes provide a wealth of information, transcriptional regulatory networks remain poorly understood from the available genomic information, largely because accurate promoter prediction is difficult. To improve promoter recognition, a novel variable-window Z-curve method is developed to extract general features of prokaryotic promoters. These features are then classified with the partial least squares (PLS) technique. To verify its prediction performance, the proposed method is applied to predicting promoter fragments of two representative prokaryotic model organisms, Escherichia coli and Bacillus subtilis. Owing to the feature extraction and selection power of the proposed method, promoter prediction accuracies improve markedly over most existing approaches. For E. coli, the accuracies are 96.05% (σ70 promoters, coding negative samples), 90.44% (σ70 promoters, non-coding negative samples), 92.13% (known sigma-factor promoters, coding negative samples), and 92.50% (known sigma-factor promoters, non-coding negative samples); for B. subtilis, the accuracies are 95.83% (known sigma-factor promoters, coding negative samples) and 99.09% (known sigma-factor promoters, non-coding negative samples). Additionally, because it is a linear technique, the proposed method is computationally simple and runs in a matter of minutes on ordinary personal computers or even laptops. More importantly, it requires no parameter optimization, so it is very practical for predicting the promoters of other species without any prior knowledge of the statistical properties of the samples.
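The abstract names the two ingredients, Z-curve features and partial least squares, without giving details. The Z-curve itself is a standard 3-D representation of DNA: after n bases, with cumulative counts A_n, C_n, G_n, T_n, the coordinates are x_n = (A_n + G_n) - (C_n + T_n) (purine vs. pyrimidine), y_n = (A_n + C_n) - (G_n + T_n) (amino vs. keto), and z_n = (A_n + T_n) - (G_n + C_n) (weak vs. strong hydrogen bonding). The sketch below is illustrative only: the multi-size, non-overlapping windows are a hypothetical stand-in for the paper's variable-window scheme, PLS1 fitted by NIPALS with a sign threshold is one textbook way to use PLS as a binary classifier, and the AT-rich vs. GC-rich sequences are synthetic.

```python
import numpy as np

# Z-curve step per base: (purine/pyrimidine, amino/keto, weak/strong H-bond)
STEP = {"A": (1, 1, 1), "G": (1, -1, -1), "C": (-1, 1, -1), "T": (-1, -1, 1)}

def z_curve(seq):
    """Cumulative Z-curve coordinates (x_n, y_n, z_n) after each base."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        dx, dy, dz = STEP[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points

def window_features(seq, sizes=(10, 20, 40)):
    """Hypothetical variable-window features: the mean Z-curve step in
    non-overlapping windows of several sizes, concatenated."""
    steps = np.array([STEP[b] for b in seq.upper()], dtype=float)  # (L, 3)
    feats = []
    for w in sizes:
        for start in range(0, len(seq) - w + 1, w):
            feats.extend(steps[start:start + w].mean(axis=0))  # net displacement / w
    return np.array(feats)

def pls1_fit(X, y, n_comp=2):
    """PLS1 regression via NIPALS; returns coefficients and centering terms."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)        # weight vector
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        q = yr @ t / tt               # y loading
        Xr = Xr - np.outer(t, p)      # deflate X and y
        yr = yr - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P^T W)^-1 q
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    """Classify by the sign of the PLS1 regression output (labels +1 / -1)."""
    coef, x_mean, y_mean = model
    return np.sign((np.asarray(X, float) - x_mean) @ coef + y_mean)

rng = np.random.default_rng(0)

def toy_seq(at_rich):
    """Synthetic 40-bp sequence: AT-rich (+1) or GC-rich (-1)."""
    p = [0.4, 0.4, 0.1, 0.1] if at_rich else [0.1, 0.1, 0.4, 0.4]
    return "".join(rng.choice(list("ATGC"), size=40, p=p))

seqs = [toy_seq(i % 2 == 0) for i in range(40)]
labels = np.array([1 if i % 2 == 0 else -1 for i in range(40)])
X = np.array([window_features(s) for s in seqs])
model = pls1_fit(X, labels, n_comp=2)
acc = (pls1_predict(model, X) == labels).mean()
print(f"training accuracy: {acc:.2f}")
```

Both steps are linear (running sums for the features, a few matrix products for PLS), which is consistent with the abstract's point that the method runs quickly on ordinary computers.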

Crossref

PubMed Central

A descriptive review and classification of organizational information security awareness research

Author: Hutchinson Gershon
Ophoff Jacques
Publication date: 08/03/2020

Information security awareness (ISA) is a vital component of information security in organizations. The purpose of this research is to descriptively review and classify the current body of knowledge on ISA. A sample of 59 peer-reviewed academic journal articles published over the last decade (2008 to 2018) was analyzed. Articles were classified using coding techniques from the grounded-theory literature-review method. The results show that ISA research is evolving, with behavioral research still being explored. Quantitative empirical research is the dominant methodology, and the three most-used theories are general deterrence theory, the theory of planned behavior, and protection motivation theory. Future research could focus on qualitative approaches to provide a deeper understanding of ISA.

Abertay Research Portal