Image Segmentation and Classification of Marine Organisms

Author: Vojjila Krishna Teja
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2018

To automate the arduous task of identifying and classifying images, a task that otherwise depends on domain expertise, pioneers in machine learning and computer vision have invented many algorithms and pre-processing techniques. The classification process is flexible and admits many user- and domain-specific alterations. These techniques are now being used to classify marine organisms in order to study and monitor their populations. Despite advances in programming languages and machine learning, image segmentation and classification for unlabeled data still need improvement. The purpose of this project is to explore the pre-processing techniques and classification algorithms that help cluster and classify images, and hence to choose the best parameters for identifying the marine species present in an image.

SJSU ScholarWorks

Pre Processing Techniques for Arabic Documents Clustering

Author: Alhanjouri Mohammed A.
Publication venue: 'Vandana Publications'
Publication date: 01/01/2017

Clustering text documents is an important technique for document retrieval. It aims to organize documents into meaningful groups, or clusters. Text preprocessing plays a major role in improving the clustering of Arabic documents. This research examines and compares text preprocessing techniques for Arabic document clustering: term pruning, term weighting using TF-IDF, morphological analysis (root-based stemming, light stemming, and raw text), and normalization. The experiments compare the most widely used partitional clustering algorithm, K-means, with another partitional algorithm, Expectation Maximization (EM). They also compare the Euclidean and Manhattan distance functions to find which produces the best clustering results. Results were evaluated by measuring clustering quality under many combinations of preprocessing techniques. The experiments show that clustering quality improves with TF-IDF term weighting and with term pruning at a small minimum term frequency. For morphological analysis, light stemming proves more appropriate than root-based stemming or raw text. Normalization also improves the clustering of Arabic documents.
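
The abstract names TF-IDF weighting, term pruning by minimum term frequency, and the Euclidean and Manhattan distances, but gives no formulas. The following is a minimal sketch of those pieces, not the paper's implementation. It assumes that pruning is applied to total corpus frequency and that the weight is the raw count times log(N / document frequency). The function names and the tiny English corpus are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs, min_tf=1):
    """Return (vocabulary, vectors) of TF-IDF weights for tokenized documents.

    Assumption: a term is pruned when its total count over the whole
    corpus is below min_tf.
    """
    total = Counter(term for doc in docs for term in doc)
    vocab = sorted(term for term, count in total.items() if count >= min_tf)
    n_docs = len(docs)
    # Document frequency: number of documents that contain the term.
    df = {term: sum(1 for doc in docs if term in doc) for term in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical, already tokenized corpus.
docs = [
    ["market", "price", "price"],
    ["market", "trade"],
    ["match", "goal", "goal"],
]
vocab, vecs = tfidf_vectors(docs, min_tf=2)
print(vocab)
print(euclidean(vecs[0], vecs[1]), manhattan(vecs[0], vecs[1]))
```

With `min_tf=2` the terms that occur only once in the corpus are dropped before weighting. Either distance function can then be passed to a clustering routine such as K-means.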

Institutional Repository of the Islamic University of Gaza

Pre-processing techniques to improve HEVC subjective quality

Authors: Botella Guillermo; Del Barrio A. A.; Fernández D. G.; Grecos Christos; Meyer-Baese Anke; Meyer-Baese Uwe
Publication venue: ScholarWorks@CWU
Publication date: 09/04/2017

Nowadays, HEVC is the cutting-edge encoding standard and the most efficient solution for transmitting video content. This paper proposes and evaluates a subjective quality improvement based on pre-processing algorithms that detect homogeneous and chaotic regions, targeting low bit-rate applications at high resolutions. The improvement is achieved by applying texture classification to the input frames. These calculations also help reduce the complexity of the HEVC encoder, so both the subjective quality and the HEVC performance are improved.

ScholarWorks at Central Washington University

TRAP

Privacy evaluation of fairness-enhancing pre-processing techniques

Author: Taillandier Jean-Christophe
Publication date: 01/12/2020

The prevalence of decision-making algorithms, based on increasingly powerful pattern recognition machine learning algorithms, has brought a growing wave of concern about discrimination and the fairness of those algorithms' predictions, as well as their impact on equity and the treatment of minority or under-represented groups. This in turn has fuelled the development of new techniques to mitigate those issues and helped outline the challenges related to them. In this work, we analyse recent advances in fairness-enhancing pre-processing techniques and evaluate how they control the fairness-utility trade-off and the dataset's ability to be used successfully in downstream tasks. We focus on three techniques that attempt to hide a sensitive attribute in a dataset: two based on Generative Adversarial Network architectures (LAFTR [67] and GANSan [6]), and one deterministic transformation of the dataset relying on density functions (Disparate Impact Remover [33]). First, we analyse the control over the fairness-utility trade-off that each of these techniques offers. We then attempt to revert the transformation each technique applied to the data, using a variation of an auto-encoder built specifically for this purpose, which we call the reconstructor. Lastly, we see that even though these techniques offer practical guarantees on specific fairness metrics, basic machine learning classifiers are often able to predict the sensitive attribute from the transformed data, effectively enabling discrimination. This creates what we believe is a major issue in fairness-enhancing technique research, one that is in large part due to the intricate relationship between fairness and privacy.

Dépôt Institutionnel Numérique

Robust pre-processing techniques for non-ideal iris images

Author: Barve Purva M.
Publication venue: The Research Repository @ WVU
Publication date: 01/12/2005

The human iris has been demonstrated to be a very accurate, non-invasive and easy-to-use biometric for personal identification. Most current state-of-the-art iris recognition systems require the iris acquisition to be ideal, which puts many constraints on the user and the acquisition process. Our aim in this research is to relax these conditions and to develop a pre-processing algorithm that can be used in conjunction with any matching algorithm to handle so-called non-ideal iris images. In this thesis we present a few robust techniques for processing non-ideal iris images so as to give a segmented iris image to the matching algorithm. The motivation behind this work is to reduce the false reject rates of current recognition systems and to reduce intra-class variability. A new technique for estimating and compensating the angle in non-frontal iris images is presented. We have also developed a novel segmentation algorithm, which uses an ellipse-fitting approach for localizing the pupil. A fast and simple limbus boundary segmentation algorithm is also presented.

The Research Repository @ WVU (West Virginia University)

IMPACT OF DATA PRE-PROCESSING TECHNIQUES ON MACHINE LEARNING MODELS

Author: Tahir Ali
Publication venue: 'Saint Louis University'
Publication date: 01/01/2022

The Volve dataset, which contains time series from the sensors used at the Volve drilling site, has many flaws that make it hard for machine learning models to learn from it and to provide useful insights and predictions. Three flaws are highlighted: missing data, differing sampling frequencies, and too many attributes (high-dimensional data). To address these issues, a data preprocessing pipeline is proposed. It first removes noise with a rolling mean. It then applies gap analysis to remove the columns whose gaps cannot be filled by data imputation methods, and fills the remaining gaps with a KNN imputer. Next, the data are resampled to make the sampling rate consistent, since the time series prediction model requires a constant sampling rate. To tune the hyper-parameters of the resampling method, AIC and BIC values were computed over a grid of hyper-parameters. After resampling, the top parameters were selected on the basis of Pearson correlation, and AIC and BIC were then used to select the 3 most relevant parameters. These 3 parameters were used to train three models: RNN + MLP, LSTM + MLP, and LSTM + RNN + MLP. On the basis of mean absolute error (MAE), RNN + MLP was selected as the best model.
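
Two of the steps named in this abstract, rolling-mean smoothing and ranking attributes by Pearson correlation, can be sketched as follows. This is an illustration under assumptions, not the thesis pipeline. The trailing window, the use of absolute correlation for ranking, the function names, and the sensor values are all hypothetical. Gap analysis, KNN imputation, resampling, and the AIC/BIC selection are omitted.

```python
import math

def rolling_mean(values, window):
    """Trailing moving average.

    Assumption: the first window-1 outputs average over the shorter
    history that is available instead of being dropped.
    """
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def rank_by_correlation(features, target):
    """Feature names sorted by absolute correlation with target, strongest first."""
    return sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )

# Hypothetical sensor columns and prediction target.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "sensor_a": [2.0, 4.1, 5.9, 8.0, 10.1],  # nearly linear in the target
    "sensor_b": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related to the target
}
smoothed = rolling_mean([1.0, 3.0, 5.0, 7.0], window=2)
ranking = rank_by_correlation(features, target)
print(smoothed, ranking)
```

Keeping only the first few names in `ranking` mirrors the abstract's "top parameters" step before a further selection criterion is applied.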

NORA - Norwegian Open Research Archives

UiS Brage

The Influence of Text Pre-processing on Plagiarism Detection

Authors: Ceska Z; Fox C
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2011

This paper explores the influence of text preprocessing techniques on plagiarism detection. We examine stop-word removal, lemmatization, number replacement, synonymy recognition, and word generalization. We also look into the influence of punctuation and word order within N-grams. All these techniques are evaluated according to their impact on F1-measure and speed of execution. Our experiments were performed on a Czech corpus of plagiarized documents about politics. At the end of this paper, we propose what we consider to be the best combination of text pre-processing techniques.
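
To make the effect of such preprocessing concrete, here is a small sketch that applies stop-word removal, number replacement, and punctuation stripping, then compares word N-grams with and without word order. It is not the authors' system. The Jaccard overlap used as the similarity score, the stop-word list, the `NUM` placeholder, and the English example sentences are all assumptions made for illustration.

```python
import re

# Illustrative stop-word list only; a real system would use a full list
# for the corpus language.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}

def preprocess(text, remove_stop_words=True, replace_numbers=True):
    """Lowercase, drop punctuation, then optionally drop stop words and
    replace every number with the placeholder token NUM."""
    tokens = re.findall(r"[^\W\d_]+|\d+", text.lower())
    out = []
    for tok in tokens:
        if tok.isdigit():
            out.append("NUM" if replace_numbers else tok)
        elif remove_stop_words and tok in STOP_WORDS:
            continue
        else:
            out.append(tok)
    return out

def ngrams(tokens, n, ignore_order=False):
    """Set of word n-grams; with ignore_order the words inside each
    n-gram are sorted, so word order within the n-gram is discarded."""
    grams = set()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        grams.add(tuple(sorted(gram)) if ignore_order else tuple(gram))
    return grams

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

source = "The budget of 2010 was approved in the parliament."
suspect = "In the parliament, the budget of 2011 was approved."

with_num = jaccard(ngrams(preprocess(source), 2), ngrams(preprocess(suspect), 2))
without_num = jaccard(
    ngrams(preprocess(source, replace_numbers=False), 2),
    ngrams(preprocess(suspect, replace_numbers=False), 2),
)
print(with_num, without_num)
```

In this toy pair the two sentences differ only in a year and in clause order, so replacing numbers with a placeholder raises the bigram overlap.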

University of Essex Research Repository

Performance Analysis of Pre-Processing Techniques with Ensemble of 5 Classifiers

Author: Sonali Kadam, Rutuja Pawar, Manisha Kumari, Shweta Phule, Priyansha Kher
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/05/2017

The continuous evolution of network attacks is a difficult problem for the software industry. An intrusion detection system (IDS) is used to identify and analyze attacks on a system, so the IDS must be kept up to date so that it can monitor the system and trigger alerts. Various authors have proposed numerous methods to improve IDS performance, yet none of them provides a proper or complete solution. In the proposed work, the authors considered several classification techniques and, based on accuracy, selected the most suitable classifiers, namely Bayesian Network, Naive Bayes, JRip, MLP, IBK, PART and J48. The selected classifiers were then combined into ensembles, and experiments were performed on the combinations. The combination giving the best accuracy will be used in the IDS to detect various attacks. In addition, two pre-processing techniques were used for the performance analysis. The outcome of these experiments shows an improvement in the detection rate of U2R and R2L attacks.
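
The abstract does not say how the selected classifiers are combined. One common scheme is majority voting, sketched below as an assumption and not as the paper's method. The three rule-based "classifiers", their thresholds, and the feature names are hypothetical stand-ins for trained models.

```python
from collections import Counter

def majority_vote(classifiers, sample):
    """Return the label predicted by the largest number of classifiers.

    Assumption: an odd number of binary classifiers is used, so no tie
    can occur.
    """
    votes = [clf(sample) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained models: each maps a connection
# record (a dict of features) to "attack" or "normal".
def many_failed_logins(sample):
    return "attack" if sample["failed_logins"] > 3 else "normal"

def root_shell_opened(sample):
    return "attack" if sample["root_shell"] else "normal"

def large_transfer(sample):
    return "attack" if sample["bytes_sent"] > 1_000_000 else "normal"

ensemble = [many_failed_logins, root_shell_opened, large_transfer]

suspicious = {"failed_logins": 5, "root_shell": True, "bytes_sent": 200}
benign = {"failed_logins": 0, "root_shell": False, "bytes_sent": 5_000_000}

print(majority_vote(ensemble, suspicious))
print(majority_vote(ensemble, benign))
```

Trying different subsets of classifiers in `ensemble` and comparing their accuracy on held-out data corresponds to the abstract's search for the best combination.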

International Journal on Recent and Innovation Trends in Computing and Communication