A Comparative Study of Dimensionality Reduction Techniques to Enhance Trace Clustering Performances
Technology Management / Information System / Entrepreneurship
Process mining aims at extracting useful information from event logs. In recent years, organizations such as high-tech companies, hospitals, and municipalities have adopted process mining techniques to improve their processes. Real-life process logs from such organizations are usually very large and complicated, since they generally contain numerous activities executed by many employees. Furthermore, many real-life process logs yield spaghetti-like process models because of the complexity of the underlying processes. Traditional process mining techniques have difficulty discovering and analyzing models from real-life logs that come from less structured processes. To overcome these weaknesses, trace clustering has been developed. Trace clustering splits an event log into several subsets, each of which contains homogeneous cases. Although trace clustering is useful for handling complex process logs, it is time-consuming and computationally expensive because of the large number of features generated from complex logs.
In this thesis, we applied dimensionality reduction (preprocessing) techniques to trace clustering in order to reduce the number of features. To validate our approach, we conducted experiments to discover relationships between dimensionality reduction techniques and clustering algorithms, and we performed a case study involving the patient treatment processes of a hospital. Among the many dimensionality reduction techniques available, we used three: singular value decomposition (SVD), random projection, and principal components analysis (PCA).
The results show that trace clustering with dimensionality reduction techniques produces higher average fitness values. Furthermore, the processing time of trace clustering is effectively reduced with dimensionality reduction techniques. Moreover, we measured the similarity between clustering results to observe how much the clustering results change when dimensionality reduction techniques are applied. The similarity varies depending on the clustering algorithm used.
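As a rough illustration of the preprocessing step described in this abstract, the sketch below builds one activity-count feature vector per trace and reduces it with a Gaussian random projection, one of the three techniques the thesis names. It is a minimal sketch under stated assumptions, not the thesis's implementation: the toy log, the function names `activity_profile` and `random_projection`, and the choice of `k` are all hypothetical, and SVD and PCA are omitted to keep the example free of external libraries.

```python
import math
import random
from collections import Counter


def activity_profile(traces):
    """Turn each trace (a list of activity names) into a vector of activity counts."""
    activities = sorted({a for trace in traces for a in trace})
    rows = []
    for trace in traces:
        counts = Counter(trace)
        rows.append([float(counts[a]) for a in activities])
    return activities, rows


def random_projection(rows, k, seed=0):
    """Project d-dimensional rows onto k dimensions using a random Gaussian matrix."""
    rng = random.Random(seed)  # fixed seed so the projection is reproducible
    d = len(rows[0])
    scale = 1.0 / math.sqrt(k)  # usual scaling so distances are roughly preserved
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    return [
        [sum(row[i] * matrix[i][j] for i in range(d)) for j in range(k)]
        for row in rows
    ]


# Hypothetical toy event log: each inner list is one case (trace).
toy_log = [
    ["register", "triage", "x-ray", "discharge"],
    ["register", "triage", "blood test", "blood test", "discharge"],
    ["register", "surgery", "x-ray", "recovery", "discharge"],
]

activities, features = activity_profile(toy_log)
reduced = random_projection(features, k=2)
```

The reduced vectors in `reduced` would then be handed to whichever clustering algorithm is under study in place of the full feature vectors in `features`.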
Alignment-based trace clustering
A novel method to cluster event log traces is presented in this paper. In contrast to the approaches in the literature, the clustering approach of this paper assumes an additional input: a process model that describes the current process. The core idea of the algorithm is to use model traces as centroids of the clusters detected, computed from a generalization of the notion of alignment. This way, model explanations of observed behavior are the driving force to compute the clusters, in contrast to current model-agnostic approaches, which, for example, group log traces merely on their vector-space similarity. We believe alignment-based trace clustering provides results more useful for stakeholders. Moreover, in case of log incompleteness, noisy logs or concept drift, it can be more robust for dealing with highly deviating traces. The technique of this paper can be combined with any clustering technique to provide model explanations to the clusters computed. The proposed technique relies on encoding the individual alignment problems into the (pseudo-)Boolean domain, and has been implemented in our tool DarkSider that uses an open-source solver. Peer Reviewed. Postprint (author's final draft)
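The idea of using model traces as cluster centroids can be sketched in a much simplified form. The example below assumes the candidate model traces are already given and assigns each log trace to the model trace with the lowest alignment cost, where every log-only or model-only move costs 1 and synchronous moves are free; under that cost function the optimal cost against a single model trace equals len(log) + len(model) - 2·LCS. This is not the paper's pseudo-Boolean encoding, and the function names and toy traces are hypothetical.

```python
def alignment_cost(log_trace, model_trace):
    """Unit-cost alignment between two traces: count of log-only plus model-only moves."""
    n, m = len(log_trace), len(model_trace)
    # lcs[i][j] = longest common subsequence length of the first i and first j events
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if log_trace[i - 1] == model_trace[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return n + m - 2 * lcs[n][m]


def assign_to_centroids(log_traces, model_traces):
    """Group log traces by the index of their cheapest-to-align model trace."""
    clusters = {i: [] for i in range(len(model_traces))}
    for trace in log_traces:
        best = min(
            range(len(model_traces)),
            key=lambda i: alignment_cost(trace, model_traces[i]),
        )
        clusters[best].append(trace)
    return clusters


# Hypothetical model traces (centroids) and observed log traces.
model_traces = [["a", "b", "c", "d"], ["a", "e", "d"]]
log_traces = [["a", "b", "d"], ["a", "e", "e", "d"], ["a", "c", "b", "d"]]

clusters = assign_to_centroids(log_traces, model_traces)
```

Here the first and third log traces end up with the centroid `a b c d` and the second with `a e d`, so each cluster comes with a model trace that explains it.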
Angular power spectrum of gamma-ray sources for GLAST: blazars and clusters of galaxies
Blazars, a beamed population of active galactic nuclei, radiate high-energy
gamma-rays, and thus are a good target for the Gamma Ray Large Area Space
Telescope (GLAST). As the blazars trace the large-scale structure of the
universe, one may observe spatial clustering of blazars. We calculate the
angular power spectrum of blazars that would be detected by GLAST. We show that
we have the best chance of detecting their clustering at large angular scales,
\theta >~ 10 deg, where shot noise is less important, and the dominant
contribution to the correlation comes from relatively low redshift, z <~ 0.1.
GLAST can detect the correlation signal if the blazars detected by GLAST
trace the distribution of low-z quasars observed by optical galaxy surveys,
which have a bias of unity. If the bias of blazars is greater than 1.5, GLAST
will detect the correlation signal unambiguously. We also find that GLAST may
detect spatial clustering of clusters of galaxies in gamma-rays. The shape of
the angular power spectrum is different for blazars and clusters of galaxies;
thus, we can separate these two contributions on the basis of the shape of the
power spectrum. Comment: 14 pages, 10 figures; added references; accepted by MNRAS
Large-scale structure in a new deep IRAS galaxy redshift survey
We present here the first results from two recently completed, fully sampled redshift surveys comprising 3703 IRAS Faint Source Survey (FSS) galaxies. An unbiased counts-in-cells analysis finds a clustering strength in broad agreement with other recent redshift surveys and at odds with the standard cold dark matter model. We combine our data with those from the QDOT and 1.2 Jy surveys, producing a single estimate of the IRAS galaxy clustering strength. We compare the data with the power spectrum derived from a mixed dark matter universe. Direct comparison of the clustering strength seen in the IRAS samples with that seen in the APM-Stromlo survey suggests b_O/b_I=1.20+/-0.05, assuming a linear, scale-independent biasing. We also perform a cell-by-cell comparison of our FSS-z sample with galaxies from the first CfA slice, testing the viability of a linear-biasing scheme linking the two. We are able to rule out models in which the FSS-z galaxies identically trace the CfA galaxies on scales 5-20h^{-1}Mpc. On scales of 5 and 10h^{-1}Mpc no linear-biasing model can be found relating the two samples. We argue that this result is expected since the CfA sample includes more elliptical galaxies, which have different clustering properties from spirals. On scales of 20h^{-1}Mpc no linear-biasing model with b_O/b_I < 1.70 is acceptable. When comparing the FSS-z galaxies to the CfA spirals, however, the two populations trace the same structures within our uncertainties.
Clustering of MgII absorption line systems around massive galaxies: an important constraint on feedback processes in galaxy formation
We use the latest version of the metal line absorption catalogue of Zhu &
Ménard (2013) to study the clustering of MgII absorbers around massive
galaxies (~10^11.5 M_sun), quasars and radio-loud AGN with redshifts between
0.4 and 0.75. Clustering is evaluated in two dimensions, by binning absorbers
both in projected radius and in velocity separation. Excess MgII is detected
around massive galaxies out to R_p=20 Mpc. At projected radii less than 800
kpc, the excess extends out to velocity separations of 10,000 km/s. The extent
of the high velocity tail within this radius is independent of the mean stellar
age of the galaxy and whether or not it harbours an active galactic nucleus. We
interpret our results using the publicly available Illustris and Millennium
simulations. Models where the MgII absorbers trace the dark matter particle or
subhalo distributions do not fit the data. They overpredict the clustering on
small scales and do not reproduce the excess high velocity separation MgII
absorbers seen within the virial radius of the halo. The Illustris simulations,
which include thermal but not mechanical feedback from AGN, also do not
provide an adequate fit to the properties of the cool halo gas within the
virial radius. We propose that the large velocity separation MgII absorbers
trace gas that has been pushed out of the dark matter halos, possibly by
multiple episodes of AGN-driven mechanical feedback acting over long
timescales. Comment: 10 pages, 11 figures, accepted in MNRAS