
    Process Monitoring and Data Mining with Chemical Process Historical Databases

    Modern chemical plants have distributed control systems (DCS) that handle normal operations and quality control. However, the DCS cannot compensate for fault events such as fouling or equipment failures. When faults occur, human operators must rapidly assess the situation, determine causes, and take corrective action, a challenging task further complicated by the sheer number of sensors. This information overload, as well as measurement noise, can hide information critical to diagnosing and fixing faults. Process monitoring algorithms can highlight key trends in data and detect faults faster, reducing or even preventing the damage that faults can cause. This research improves tools for process monitoring on different chemical processes. Previously successful monitoring methods based on statistics can fail on non-linear processes and processes with multiple operating states. To address these challenges, we develop a process monitoring technique based on multiple self-organizing maps (MSOM) and apply it in industrial case studies, including a simulated plant and a batch reactor. We also use a standard SOM to detect a novel event in a separation tower and produce contribution plots that help isolate the causes of the event. Another key challenge for any engineer designing a process monitoring system is that implementing most algorithms requires data labeled as "normal" or "faulty"; however, data from faulty operations can be difficult to locate in databases storing months or years of operations. To assist in identifying faulty data, we apply data mining algorithms from computer science and compare how they cluster chemical process data from normal and faulty conditions. We identify several techniques that successfully duplicated the normal and faulty labels derived from expert knowledge, and we introduce a process data mining software tool to make analysis simpler for practitioners. The research in this dissertation enhances chemical process monitoring tasks. MSOM-based process monitoring improves upon standard process monitoring algorithms in fault identification and diagnosis tasks. The data mining research reduces a crucial barrier to the implementation of monitoring algorithms. The enhanced monitoring introduced here can help engineers develop effective and scalable process monitoring systems to improve plant safety and reduce losses from fault events.
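
    As an illustration of the general idea (not the dissertation's MSOM implementation), the sketch below trains a small self-organizing map on data from normal operation and flags new samples whose quantization error is unusually large; the grid size, learning schedule, and 99th-percentile threshold are illustrative assumptions.

        import numpy as np

        def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
            """Train a small self-organizing map on standardized (n_samples, n_features) data."""
            rng = np.random.default_rng(seed)
            n_units = grid[0] * grid[1]
            weights = rng.normal(size=(n_units, data.shape[1]))
            # Grid coordinates of each unit, used by the neighbourhood function.
            coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])], float)
            n_iter = epochs * len(data)
            samples = data[rng.integers(0, len(data), n_iter)]
            for t, x in enumerate(samples):
                lr = lr0 * np.exp(-t / n_iter)                       # decaying learning rate
                sigma = sigma0 * np.exp(-t / n_iter)                 # shrinking neighbourhood
                bmu = np.argmin(((weights - x) ** 2).sum(axis=1))    # best-matching unit
                d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)       # grid distance to the BMU
                h = np.exp(-d2 / (2 * sigma ** 2))[:, None]          # neighbourhood kernel
                weights += lr * h * (x - weights)
            return weights

        def quantization_error(weights, data):
            """Distance of each sample to its best-matching SOM unit."""
            d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
            return d.min(axis=1)

        # Usage sketch: train on normal operation only, then flag samples whose
        # quantization error exceeds, e.g., the 99th percentile of the training error.
        # weights = train_som(normal_data)
        # threshold = np.percentile(quantization_error(weights, normal_data), 99)
        # fault_flags = quantization_error(weights, new_data) > threshold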

    Decoding the Encoding of Functional Brain Networks: an fMRI Classification Comparison of Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA), and Sparse Coding Algorithms

    Brain networks in fMRI are typically identified using spatial independent component analysis (ICA), yet mathematical constraints such as sparse coding and positivity both provide alternate biologically plausible frameworks for generating brain networks. Non-negative Matrix Factorization (NMF) would suppress negative BOLD signal by enforcing positivity. Spatial sparse coding algorithms (L1 Regularized Learning and K-SVD) would impose local specialization and a discouragement of multitasking, where the total observed activity in a single voxel originates from a restricted number of possible brain networks. The assumptions of independence, positivity, and sparsity to encode task-related brain networks are compared; the resulting brain networks for different constraints are used as basis functions to encode the observed functional activity at a given time point. These encodings are decoded using machine learning to compare both the algorithms and their assumptions, using the time series weights to predict whether a subject is viewing a video, listening to an audio cue, or at rest, in 304 fMRI scans from 51 subjects. For classifying cognitive activity, the sparse coding algorithm of L1 Regularized Learning consistently outperformed 4 variations of ICA across different numbers of networks and noise levels (p < 0.001). The NMF algorithms, which suppressed negative BOLD signal, had the poorest accuracy. Within each algorithm, encodings using sparser spatial networks (containing more zero-valued voxels) had higher classification accuracy (p < 0.001). The success of sparse coding algorithms suggests that algorithms which enforce sparse coding, discourage multitasking, and promote local specialization may capture the underlying source processes better than those, such as ICA, which allow inexhaustible local processes.
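
    A minimal sketch of the comparison described above, using scikit-learn's FastICA, NMF, and DictionaryLearning as stand-ins for the ICA, NMF, and sparse coding variants; the synthetic data, the number of networks (20), and the logistic-regression decoder are assumptions for illustration only, not the study's pipeline.

        import numpy as np
        from sklearn.decomposition import FastICA, NMF, DictionaryLearning
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = np.abs(rng.normal(size=(300, 500)))   # stand-in data: time points x voxels
        y = rng.integers(0, 3, size=300)          # stand-in labels: video / audio / rest

        encoders = {
            "ICA": FastICA(n_components=20, max_iter=1000, random_state=0),
            "NMF": NMF(n_components=20, max_iter=1000, random_state=0),
            "SparseCoding": DictionaryLearning(n_components=20, alpha=1.0, random_state=0),
        }

        for name, model in encoders.items():
            # fit_transform returns per-time-point weights (the "encodings");
            # model.components_ holds the corresponding spatial networks.
            W = model.fit_transform(X)
            acc = cross_val_score(LogisticRegression(max_iter=1000), W, y, cv=5).mean()
            print(f"{name}: mean cross-validated accuracy = {acc:.3f}")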

    Improved Feature Extraction, Feature Selection, and Identification Techniques That Create a Fast Unsupervised Hyperspectral Target Detection Algorithm

    This research extends the emerging field of hyperspectral image (HSI) target detectors that assume a global linear mixture model (LMM) of HSI and employ independent component analysis (ICA) to unmix HSI images. Via new techniques to fully automate feature extraction, feature selection, and target pixel identification, an autonomous global anomaly detector, AutoGAD, has been developed for potential employment in an operational environment for real-time processing of HSI targets. For dimensionality reduction (initial feature extraction prior to ICA), a geometric solution that effectively approximates the number of distinct spectral signals is presented. The solution is based on the theory of the shape of the eigenvalue curve of the covariance matrix of spectral data containing noise. For feature selection, a subjective definition called significant kurtosis change was previously used to denote the separation between target classes and non-target classes. This research presents two new measures, potential target signal to noise ratio (PT SNR) and max pixel score, which are computed for each of the ICA features to create a new two-dimensional feature space in which the overlap between target and non-target classes is reduced compared to the one-dimensional kurtosis feature space. Finally, after target feature selection, adaptive noise filtering with an iterative approach is applied to the signals. The effect is a reduction in the power of the noise while preserving the power of the target signal prior to target identification, which reduces false positive detections. A zero-detection histogram method is applied to the smoothed signals to identify target locations to the user. MATLAB code for the AutoGAD algorithm is provided.
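
    The sketch below illustrates the overall pipeline shape (dimensionality estimate from the eigenvalue curve, ICA unmixing, kurtosis-based ranking of candidate target features). It is not the AutoGAD MATLAB code: the cumulative-variance cut-off stands in for the geometric eigenvalue-curve rule, and PT SNR, max pixel score, the iterative noise filtering, and the zero-detection histogram are omitted.

        import numpy as np
        from scipy.stats import kurtosis
        from sklearn.decomposition import PCA, FastICA

        def estimate_dimension(pixels, energy=0.999):
            """Rough stand-in for the eigenvalue-curve rule: keep components until the
            cumulative explained variance reaches a threshold (threshold is an assumption)."""
            pca = PCA().fit(pixels)
            return int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), energy) + 1)

        def ica_anomaly_features(cube, n_keep=3):
            """Unmix an HSI cube (rows x cols x bands) with ICA and rank the resulting
            abundance maps by kurtosis; highly kurtotic maps tend to contain rare targets."""
            h, w, bands = cube.shape
            X = cube.reshape(-1, bands).astype(float)
            k = estimate_dimension(X)
            sources = FastICA(n_components=k, max_iter=1000, random_state=0).fit_transform(X)
            scores = kurtosis(sources, axis=0)            # one score per ICA feature
            order = np.argsort(scores)[::-1][:n_keep]     # most target-like features first
            return sources[:, order].reshape(h, w, -1), scores[order]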

    Informed source extraction from a mixture of sources exploiting second order temporal structure

    Extracting a specific signal from among many…

    Hybrid solutions to instantaneous MIMO blind separation and decoding: narrowband, QAM and square cases

    Future wireless communication systems are expected to support high data rates and high-quality transmission, given the growth of multimedia applications. The drive to increase channel throughput has led in recent years to multiple-input multiple-output (MIMO) and blind equalization techniques, and blind MIMO equalization has therefore attracted great interest. Both system performance and computational complexity play important roles in real-time communications; reducing the computational load while providing accurate performance is a central challenge in present systems. In this thesis, a hybrid method that provides affordable complexity with good performance for blind equalization in large-constellation MIMO systems is proposed first. Computational cost is saved both in the signal separation part and in the signal detection part. First, based on the characteristics of quadrature amplitude modulation (QAM) signals, an efficient and simple nonlinear function for Independent Component Analysis (ICA) is introduced. Second, using the idea of sphere decoding, we choose the soft information of the channels within a sphere, overcoming the so-called curse of dimensionality of the Expectation Maximization (EM) algorithm and enhancing the final results at the same time. Mathematically, we demonstrate that in digital communication cases the EM algorithm shows Newton-like convergence. Despite the widespread use of forward-error coding (FEC), most MIMO blind channel estimation techniques ignore its presence and instead make the simplifying assumption that the transmitted symbols are uncoded. However, FEC induces code structure in the transmitted sequence that can be exploited to improve blind MIMO channel estimates. In the final part of this work, we exploit iterative channel estimation and decoding for blind MIMO equalization. Experiments show the improvements achievable by exploiting the existing coding structure, and that the method can approach the performance of a BCJR equalizer with perfect channel information in a reasonable SNR range. All results are confirmed experimentally for the example of blind equalization in block-fading MIMO systems.
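
    As a rough illustration of instantaneous (narrowband) blind separation of QAM streams, the sketch below stacks real and imaginary parts into a real-composite model and applies FastICA. The constellation size, channel, noise level, and the use of scikit-learn's generic FastICA (rather than the thesis's QAM-tailored nonlinearity and the EM/sphere-decoding detection stage) are all assumptions for illustration.

        import numpy as np
        from sklearn.decomposition import FastICA

        rng = np.random.default_rng(0)
        n_tx, n_sym = 2, 5000

        # 16-QAM symbols: each rail drawn independently and uniformly from {-3,-1,1,3}.
        levels = np.array([-3, -1, 1, 3])
        s = rng.choice(levels, (n_tx, n_sym)) + 1j * rng.choice(levels, (n_tx, n_sym))

        # Random instantaneous MIMO channel plus additive noise.
        H = (rng.normal(size=(n_tx, n_tx)) + 1j * rng.normal(size=(n_tx, n_tx))) / np.sqrt(2)
        x = H @ s + 0.05 * (rng.normal(size=(n_tx, n_sym)) + 1j * rng.normal(size=(n_tx, n_sym)))

        # Real-composite model: the 2x2 complex mixture becomes a 4x4 real mixture of the
        # four independent rails (real and imaginary parts of each transmitted stream).
        X = np.vstack([x.real, x.imag]).T
        rails = FastICA(n_components=2 * n_tx, max_iter=2000, random_state=0).fit_transform(X).T

        # ICA recovers the rails only up to permutation, sign, and scale, so a practical
        # receiver still needs a detection/decoding stage to resolve these ambiguities.
        print(rails.shape)   # (4, 5000)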

    Unsupervised spectral classification of astronomical x-ray sources based on independent component analysis

    By virtue of the sensitivity of the XMM-Newton and Chandra X-ray telescopes, astronomers are capable of probing increasingly faint X-ray sources in the universe. On the other hand, we have to face a tremendous amount of X-ray imaging data collected by these observatories. We developed an efficient framework to classify astronomical X-ray sources through natural grouping of their reduced-dimensionality profiles, which can faithfully represent the high-dimensional spectral information. X-ray imaging spectral extraction techniques, which use standard astronomical software (e.g., SAS, FTOOLS and CIAO), provide an efficient means to investigate multiple X-ray sources in one or more observations at the same time. After applying independent component analysis (ICA), the high-dimensional spectra can be expressed by reduced-dimensionality profiles in an independent space. An infrared spectral data set obtained for stars in the Large Magellanic Cloud, observed by the Spitzer Space Telescope Infrared Spectrograph, has been used to test the unsupervised classification algorithms. The smallest classification error is achieved by the hierarchical clustering algorithm with average linkage of the data, in which each spectrum is scaled by its maximum amplitude. We then applied a similar hierarchical clustering algorithm based on ICA to a deep XMM-Newton X-ray observation of the field of the eruptive young star V1647 Ori. Our classification method establishes that V1647 Ori is a spectrally distinct X-ray source in this field. Finally, we classified the X-ray sources in the central field of a large survey, the Subaru/XMM-Newton deep survey, which contains a large population of high-redshift extragalactic sources. A small group of sources with maximum spectral peaks above 1 keV is easily picked out from the spectral data set, and these sources appear to be associated with active galaxies. In general, these experiments confirm that our classification framework is an efficient X-ray imaging spectral analysis tool that gives astronomers insight into the fundamental physical mechanisms responsible for X-ray emission and, furthermore, can be applied to a wide range of the electromagnetic spectrum.
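
    A compact sketch of the classification scheme described above: each spectrum is scaled by its maximum amplitude, reduced to a low-dimensional profile with ICA, and grouped by hierarchical clustering with average linkage. The component count and number of clusters are illustrative assumptions, not values from the study.

        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage
        from sklearn.decomposition import FastICA

        def classify_spectra(spectra, n_components=8, n_groups=5):
            """spectra: (n_sources, n_channels) array of extracted X-ray spectra.
            n_components must not exceed the number of sources or channels."""
            # Scale each spectrum by its maximum amplitude, as in the best-performing scheme.
            scaled = spectra / spectra.max(axis=1, keepdims=True)
            # Reduced-dimensionality profiles in an independent space.
            profiles = FastICA(n_components=n_components, max_iter=1000,
                               random_state=0).fit_transform(scaled)
            # Hierarchical clustering with average linkage on the ICA profiles.
            tree = linkage(profiles, method="average")
            return fcluster(tree, t=n_groups, criterion="maxclust")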

    Application of Singular Spectrum Analysis (SSA), Independent Component Analysis (ICA) and Empirical Mode Decomposition (EMD) for automated solvent suppression and automated baseline and phase correction from multi-dimensional NMR spectra

    A common problem in protein structure determination by NMR spectroscopy is the solvent artifact. Typically, a deuterated solvent is used instead of normal water. However, several experimental methods have been developed to suppress the solvent signal in cases where a protonated solvent must be used or where the signals of the remaining protons, even in a highly deuterated sample, are still too strong. For a protein dissolved in 90% H2O / 10% D2O, the concentration of solvent protons is about five orders of magnitude greater than the concentration of the protons of interest in the solute. Therefore, the evaluation of multi-dimensional NMR spectra may be incomplete, since certain resonances of interest (e.g. Hα proton resonances) are hidden by the solvent signal and since signal parts of the solvent may be misinterpreted as cross peaks originating from the protein. The experimental solvent suppression procedures typically are not able to recover these significant protein signals. Many post-processing methods have been designed to overcome this problem. In this work, several algorithms for the suppression of the water signal have been developed and compared. In particular, it has been shown that Singular Spectrum Analysis (SSA) can be applied advantageously to remove the solvent artifact from NMR spectra of any dimensionality, whether digitally or analogically acquired. The investigated time-domain signals (FIDs) are decomposed into water- and protein-related components by means of an initial embedding of the data in the space of time-delayed coordinates. Eigenvalue decomposition is applied to these data, and the component with the highest variance (typically represented by the dominant solvent signal) is discarded before reverting the embedding. Pre-processing (group delay management and signal normalization) and post-processing (inverse normalization, Fourier transformation, and phase and baseline corrections) of the NMR data are mandatory in order to obtain a better suppression performance. The optimal embedding dimension has been determined empirically according to a specific qualitative and quantitative analysis of the extracted components, applied to a back-calculated two-dimensional spectrum of the HPr protein from Staphylococcus aureus. Moreover, the investigation of experimental data (a three-dimensional 1H-13C HCCH-TOCSY spectrum of the Trx protein from Plasmodium falciparum and two-dimensional NOESY and TOCSY spectra of the HPr protein from Staphylococcus aureus) has revealed the ability of the algorithm to recover resonances hidden underneath the water signal. Pathological conditions and the effects of drugs and lifestyle can also be detected by NMR spectroscopy applied to samples containing biofluids (e.g. urine, blood, saliva), and the detection of signals of interest in such spectra can likewise be hampered by the solvent. The SSA has therefore also been successfully applied to one-dimensional urine, blood and cell spectra. The algorithm for automated solvent suppression has been introduced into the AUREMOL software package (AUREMOL_SSA). It is optionally followed by an automated baseline correction in the frequency domain (AUREMOL_ALS), which can also be used independently of the former algorithm. The automated recognition of baseline points is performed differently depending on the dimensionality of the data.
In order to investigate the limitations of the SSA, it has been applied to spectra whose dominant signal is not the solvent (as in the case of WATERGATE solvent suppression and of back-calculated data not including any experimental water signal), determining the optimal solvent-to-solute ratio. Independent Component Analysis (ICA) represents a valid alternative for water suppression when the solvent signal is not the dominant one in the spectra (when it is smaller than half of the strongest solute resonance). In particular, two components are obtained: the solvent and the solute. The ICA needs as input at least as many different spectra (mixtures) as the number of components (source signals); thus, the definition of a suitable protocol for generating a dataset of one-dimensional ICA-tailored inputs is straightforward. The ICA has been shown to overcome the SSA limitations and to recover resonances of interest that cannot be detected by applying the SSA. The ICA avoids all the pre- and post-processing steps, since it is applied directly in the frequency domain. On the other hand, the component to be removed is detected automatically in the SSA case (it has the highest variance), whereas in the ICA a visual inspection of the extracted components is still required, considering that the output is permutable and that scale and sign ambiguities may occur. The Empirical Mode Decomposition (EMD) has proven to be more suitable for automated phase correction than for solvent suppression purposes. It decomposes the FID into several intrinsic mode functions (IMFs) whose frequency of oscillation decreases from the first to the last (which identifies the solvent signal). The automatically identified non-baseline regions in the Fourier transform of the sum of the first IMFs are evaluated separately, and genetic algorithms are applied to determine the zero- and first-order terms suitable for an optimal phase correction. The SSA and ALS algorithms have been applied before assigning the two-dimensional NOESY spectrum (with the program KNOWNOE) of the PSCD4-domain of the pleuralin protein, in order to increase the number of already existing distance restraints. A new routine to derive 3JHNHα couplings from torsion angles (Karplus relation), and vice versa, has been introduced into the AUREMOL software. Using the newly developed tools, a refined three-dimensional structure of the PSCD4-domain could be obtained.
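
    The following sketch illustrates the basic SSA step described above for a one-dimensional FID: embedding in time-delayed coordinates, singular value decomposition, removal of the highest-variance component, and reversal of the embedding by diagonal averaging. The embedding dimension here is arbitrary, and the pre- and post-processing of AUREMOL_SSA (group delay handling, normalization, Fourier transformation, phase and baseline correction) is omitted; this is not the AUREMOL_SSA implementation.

        import numpy as np

        def ssa_solvent_suppression(fid, embed_dim=30, n_drop=1):
            """Remove the dominant (solvent) component from a 1D FID with basic SSA."""
            n = len(fid)
            k = n - embed_dim + 1
            # Trajectory (Hankel) matrix of time-delayed coordinates.
            traj = np.column_stack([fid[i:i + embed_dim] for i in range(k)])
            u, s, vt = np.linalg.svd(traj, full_matrices=False)
            # Keep everything except the leading, highest-variance component(s),
            # which are assumed to carry the solvent signal.
            recon = (u[:, n_drop:] * s[n_drop:]) @ vt[n_drop:, :]
            # Diagonal averaging maps the reconstructed matrix back to a time series.
            out = np.zeros(n, dtype=fid.dtype)
            counts = np.zeros(n)
            for col in range(k):
                out[col:col + embed_dim] += recon[:, col]
                counts[col:col + embed_dim] += 1
            return out / counts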

    Estimation of offsets in GPS time-series and application to the detection of earthquake deformation in the far-field

    Extracting geophysical signals from Global Positioning System (GPS) coordinate time-series is a well-established practice that has led to great insights into how the Earth deforms. Often small discontinuities are found in such time-series and are traceable either to broad-scale deformation (i.e. earthquakes) or to equipment changes and/or failures. Estimating these offsets accurately enables coseismic deformation to be quantified in the former case and unwanted signals to be removed in the latter, which in turn allows tectonic rates to be estimated more accurately. We develop a method to accurately estimate discontinuities in time-series of GPS positions at specified epochs, based on a so-called ‘offset series’. The offset series is obtained by varying the amount of GPS data before and after an event while estimating the offset. Two methods, a mean and a weighted mean method, are then investigated to produce the estimated discontinuity from the offset series. The mean method estimates coseismic offsets without making assumptions about geophysical processes that may be present in the data (i.e. tectonic rate, seasonal variations), whereas the weighted mean method estimates coseismic offsets together with a model of these processes. We investigate which approach is most appropriate given certain lengths of available data and the noise within the time-series themselves. For the Sumatra–Andaman event, with 4.5 yr of pre-event data, we show that between 2 and 3 yr of post-event data are required to produce accurate offset estimates with the weighted mean method. With less data, the mean method should be used, but the uncertainties of the estimated discontinuity are larger.
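
    A schematic sketch of the offset-series idea, not the authors' implementation: for each of several data windows around the event a step is estimated by least squares, with or without a model of rate and seasonal terms, and the resulting offset series is combined by a mean or an uncertainty-weighted mean. The symmetric windows, the specific seasonal model, and the formal-error weighting are assumptions for illustration.

        import numpy as np

        def offset_series(t, y, t_event, windows, with_model=True):
            """Estimate the step at t_event for several symmetric windows (in years).
            with_model=False mimics the 'mean method' (offset only); with_model=True
            mimics the 'weighted mean method' (offset plus rate and seasonal terms)."""
            offsets, sigmas = [], []
            for w in windows:
                m = (t >= t_event - w) & (t <= t_event + w)
                tt, yy = t[m], y[m]
                cols = [np.ones_like(tt), (tt >= t_event).astype(float)]  # intercept, step
                if with_model:
                    cols.append(tt - t_event)                             # tectonic rate
                    for period in (1.0, 0.5):                             # annual, semi-annual
                        cols += [np.sin(2 * np.pi * tt / period),
                                 np.cos(2 * np.pi * tt / period)]
                A = np.column_stack(cols)
                coef, res, *_ = np.linalg.lstsq(A, yy, rcond=None)
                dof = max(len(yy) - A.shape[1], 1)
                s2 = res[0] / dof if res.size else 0.0                    # residual variance
                offsets.append(coef[1])
                sigmas.append(np.sqrt(s2 * np.linalg.inv(A.T @ A)[1, 1])) # formal offset error
            return np.array(offsets), np.array(sigmas)

        # Mean method: simple average of the offset series.
        # off, _ = offset_series(t, y, t_event, windows=[0.5, 1.0, 2.0, 3.0], with_model=False)
        # print(off.mean())
        # Weighted mean method: weight each estimate by its formal uncertainty.
        # off, sig = offset_series(t, y, t_event, windows=[0.5, 1.0, 2.0, 3.0], with_model=True)
        # w = 1.0 / np.maximum(sig, 1e-9) ** 2
        # print(np.sum(w * off) / np.sum(w))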