Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity,
however, has received relatively little attention, especially in the setting where
the sample size is fixed and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the last regime applies to exa-scale data
dimension. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that is of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
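The sample-starved regime described in this abstract can be illustrated with a small simulation. The sketch below is not taken from the paper; it assumes independent standard Gaussian variables and uses hypothetical helper names (`pearson`, `max_spurious_correlation`). It shows that, with the sample size held fixed, the largest absolute sample correlation among truly uncorrelated variables cannot shrink as more variables are added.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(n_samples, n_variables, seed=0):
    """Largest |correlation| over all pairs of independent Gaussian variables.

    Assumption for illustration: every variable is i.i.d. N(0, 1), so any
    nonzero sample correlation is spurious.
    """
    rng = random.Random(seed)
    # One row per variable; rows are drawn in order, so with the same seed
    # and n_samples a smaller problem is a prefix of a larger one.
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            for _ in range(n_variables)]
    best = 0.0
    for i in range(n_variables):
        for j in range(i + 1, n_variables):
            best = max(best, abs(pearson(data[i], data[j])))
    return best


if __name__ == "__main__":
    # Fixed sample size, growing dimension (the third regime above).
    for p in (10, 40, 160):
        print(p, round(max_spurious_correlation(10, p), 3))
```

Because the variables are generated in order from the same seed, the smaller problems are subsets of the larger ones, so the printed maximum is non-decreasing in the number of variables.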
When is a Network a Network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks
We introduce a framework for the modeling of sequential data capturing
pathways of varying lengths observed in a network. Such data are important,
e.g., when studying click streams in information networks, travel patterns in
transportation systems, information cascades in social networks, biological
pathways or time-stamped social interactions. While it is common to apply graph
analytics and network analysis to such data, recent works have shown that
temporal correlations can invalidate the results of such methods. This raises a
fundamental question: when is a network abstraction of sequential data
justified? Addressing this open question, we propose a framework which combines
Markov chains of multiple, higher orders into a multi-layer graphical model
that captures temporal correlations in pathways at multiple length scales
simultaneously. We develop a model selection technique to infer the optimal
number of layers of such a model and show that it outperforms previously used
Markov order detection techniques. An application to eight real-world data sets
on pathways and temporal networks shows that it allows the inference of graphical
models which capture both topological and temporal characteristics of such
data. Our work highlights fallacies of network abstractions and provides a
principled answer to the open question of when they are justified. Generalizing
network representations to multi-order graphical models, it opens perspectives
for new data mining and knowledge discovery algorithms.
Comment: 10 pages, 4 figures, 1 table, companion python package pathpy
available on gitHu
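The question of whether pathways carry memory beyond a first-order network model can be made concrete with a plain likelihood-ratio comparison. The sketch below is not the multi-order model selection proposed in the paper and does not use the pathpy API; it is a simplified stand-in that assumes paths are given as lists of node labels, and the function names are hypothetical. It fits first- and second-order Markov chains by maximum likelihood on the same transitions and reports twice the log-likelihood gain.

```python
import math
from collections import Counter


def log_likelihood(paths, order):
    """Maximum-likelihood log-likelihood of a Markov chain of given order.

    Only transitions with at least two predecessors are scored, so that
    the order-1 and order-2 models are compared on identical data.
    """
    counts = Counter()          # (context, next_node) -> count
    context_totals = Counter()  # context -> count
    for path in paths:
        for t in range(2, len(path)):
            context = tuple(path[t - order:t])
            counts[(context, path[t])] += 1
            context_totals[context] += 1
    return sum(c * math.log(c / context_totals[ctx])
               for (ctx, _), c in counts.items())


def likelihood_ratio_statistic(paths):
    """2 * (logL of order 2 - logL of order 1); larger means more memory."""
    return 2.0 * (log_likelihood(paths, 2) - log_likelihood(paths, 1))


if __name__ == "__main__":
    # Hypothetical toy data: the step after "c" depends on where the path
    # came from, which a first-order network model cannot represent.
    with_memory = [["a", "c", "d"]] * 10 + [["b", "c", "e"]] * 10
    without_memory = [["a", "c", "d"], ["a", "c", "e"],
                      ["b", "c", "d"], ["b", "c", "e"]]
    print(likelihood_ratio_statistic(with_memory))
    print(likelihood_ratio_statistic(without_memory))
```

In practice the statistic would be compared against a chi-squared threshold whose degrees of freedom depend on the number of extra parameters in the second-order model; that step is omitted here.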
Start Time and Duration Distribution Estimation in Semi-Structured Processes
Semi-structured processes are business workflows in which execution is not completely controlled by a workflow engine, i.e., an implementation of a formal workflow model. Examples are workflows where actors may interact with customers and report the result of the interaction in a process-aware information system. Building a performance model for resource management in these processes is difficult because the required information is only partially recorded. In this paper we propose a systematic approach for creating an event log that is suitable for available process mining tools. This event log is created by incrementally cleansing the data. The proposed approach is evaluated in an experiment.
Visual analytics in FCA-based clustering
Visual analytics is a subdomain of data analysis which combines both human
and machine analytical abilities and is applied mostly in decision-making and
data mining tasks. Triclustering, based on Formal Concept Analysis (FCA), was
developed to detect groups of objects with similar properties under similar
conditions. It is used in Social Network Analysis (SNA) and is a basis for
certain types of recommender systems. A problem with triclustering algorithms
is that they do not always produce meaningful clusters. This article describes
a specific triclustering algorithm and a prototype of a visual analytics
platform for working with the obtained clusters. This tool is designed as a testing
framework and is intended to help an analyst grasp the results of
triclustering and recommender algorithms, and to make decisions on the
meaningfulness of certain triclusters and recommendations.
Comment: 11 pages, 3 figures, 2 algorithms, 3rd International Conference on
Analysis of Images, Social Networks and Texts (AIST'2014). In Supplementary
Proceedings of the 3rd International Conference on Analysis of Images, Social
Networks and Texts (AIST 2014), Vol. 1197, CEUR-WS.org, 201
Technological safety of sustainable development of coal enterprises
Purpose. To substantiate a conceptual basis for finding ways to prevent threats to the sustainable development of coal enterprises under multidirectional vectors of economic pressure from the external environment.
Methods. Structural and comparative analysis is used to assess how the main definitions of the research are used, their essence, and their connection with other categories that determine the efficiency of a coal enterprise's development; grouping and classification are used to systematize the types of economic security and stability of the enterprise, as well as the factors that cause them.
Findings. The main definitions that reflect the essence of such scientific phenomena as “sustainable development” and “economic security” of the enterprise are analyzed. The relevance and expediency of research on forming a methodological base and tools for assessing the technological component of the safety of sustainable development of coal enterprises are substantiated.
Originality. The study examines the possibilities of creating comprehensive programs for adaptive management of a mining enterprise, including retrospective and prospective assessment of its development trajectory.
Practical implications. Introduction of adaptation activities at coal enterprises as part of the process of ensuring technological safety.
Acknowledgements. The authors express gratitude to Mykhailo Barabash, director of coal mining management at LLC “DTEK Energy”, for his help and consultations during the work.
Special Libraries, October 1919
Volume 10, Issue 7
Special Libraries, February 1916
Volume 7, Issue 2