Histospline Method in Nonparametric Regression Models with Application to Clustered/Longitudinal Data
Kernel and smoothing methods for nonparametric function and curve estimation have been particularly successful in standard settings, where function values are observed subject to independent errors. However, when aspects of the function are known parametrically, or where the sampling scheme has significant structure, it can be quite difficult to adapt standard methods in such a way that they retain good statistical performance and continue to enjoy easy computability and good numerical properties. In particular, when using local linear modeling it is often awkward to both respect the sampling scheme and produce an estimator with good variance properties, without resorting to iterative methods: a good case in point is longitudinal and clustered data. In this paper we suggest a simple approach to overcoming these problems. Using a histospline technique we convert a problem in the continuum to one that is governed by only a finite number of parameters, and which is often explicitly solvable. The simple expedient of running a local linear smoother through the histospline produces a function estimator which achieves optimal nonparametric properties, and the raw histospline-based estimator of the semiparametric component itself attains optimal semiparametric performance. The function estimator can be used in its own right or as the starting value for an iterative scheme based on a different approach to inference.
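The core construction is easy to show in miniature. Below is a minimal sketch, not the paper's estimator: the histospline step is approximated by simple binning of the responses into bin means (a finite number of parameters), and a kernel-weighted local linear smoother is then run through the binned summary. The bin count, Gaussian kernel, and bandwidth are illustrative assumptions.

```python
# Sketch only: binning stands in for the histospline; a local linear smoother is
# then run through the binned values. Bin count, kernel, and bandwidth are assumed.
import numpy as np

def bin_means(x, y, n_bins=20):
    """Histospline surrogate: reduce the data to a finite number of bin means."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    means = np.array([y[idx == b].mean() if np.any(idx == b) else np.nan
                      for b in range(n_bins)])
    keep = ~np.isnan(means)
    return centers[keep], means[keep]

def local_linear(x0, xc, yc, h=0.1):
    """Kernel-weighted local linear fit at x0, run through the bin means."""
    sw = np.sqrt(np.exp(-0.5 * ((xc - x0) / h) ** 2))   # sqrt of Gaussian weights
    X = np.column_stack([np.ones_like(xc), xc - x0])     # local linear design
    beta, *_ = np.linalg.lstsq(X * sw[:, None], yc * sw, rcond=None)
    return beta[0]                                        # intercept = fit at x0

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=500)
xc, yc = bin_means(x, y)                                  # finite-parameter summary
grid = np.linspace(0, 1, 101)
fit = np.array([local_linear(g, xc, yc) for g in grid])   # smoother through it
```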
Fluorescence Lifetime Imaging Microscopy (FLIM) Data Analysis with TIMP
Fluorescence Lifetime Imaging Microscopy (FLIM) allows fluorescence lifetime images of biological objects to be collected at 250 nm spatial resolution and at (sub-)nanosecond temporal resolution. Often n_comp kinetic processes underlie the observed fluorescence at all locations, but the intensity of the fluorescence associated with each process varies per location, i.e., per imaged pixel. Then the statistical challenge is global analysis of the image: use of the fluorescence decay in time at all locations to estimate the n_comp lifetimes associated with the kinetic processes, as well as the amplitude of each kinetic process at each location. Given that typical FLIM images represent on the order of 10^2 timepoints and 10^3 locations, meeting this challenge is computationally intensive. Here the utility of the TIMP package for R to solve parameter estimation problems arising in FLIM image analysis is demonstrated. Case studies on simulated and real data demonstrate the applicability of the partitioned variable projection algorithm implemented in TIMP to the problem domain, and showcase options included in the package for the visual validation of models for FLIM data.
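The separable structure that variable projection exploits can be illustrated in a few lines. The sketch below is a generic illustration of the idea, not TIMP's implementation: for fixed lifetimes the per-pixel amplitudes solve a linear least-squares problem, so only the lifetimes are optimized nonlinearly. The decay model, array sizes, and starting values are assumptions.

```python
# Generic variable-projection (separable least-squares) sketch for global analysis
# of a multi-exponential image; not the TIMP algorithm or its partitioned variant.
import numpy as np
from scipy.optimize import least_squares

def decay_matrix(t, lifetimes):
    """Columns are exp(-t / tau_k), one per kinetic component."""
    return np.exp(-np.outer(t, 1.0 / np.asarray(lifetimes)))

def residuals(log_tau, t, data):
    """Project out the per-pixel amplitudes; return the remaining residual."""
    C = decay_matrix(t, np.exp(log_tau))          # time x n_comp
    A, *_ = np.linalg.lstsq(C, data, rcond=None)  # n_comp x n_pixels amplitudes
    return (data - C @ A).ravel()

# Simulated example: ~10^2 time points, 10^3 pixels, two lifetimes (assumed values).
rng = np.random.default_rng(1)
t = np.linspace(0, 10.0, 128)
true_tau = [0.5, 3.0]
A_true = rng.uniform(0, 1, (2, 1000))
data = decay_matrix(t, true_tau) @ A_true + rng.normal(0, 0.01, (128, 1000))

fit = least_squares(residuals, x0=np.log([1.0, 5.0]), args=(t, data))
print("estimated lifetimes:", np.exp(fit.x))
```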
The Correlation-Based Method for the Movement Compensation in the Analysis of the Results of FRAP Experiments
This paper presents a computational algorithm for the detection and compensation of intracellular movement in FRAP experiments with focal adhesions in living cells. The developed approach is based on the calculation of the correlation coefficient. It was validated on a series of experimental datasets and shows successful results in comparison with other widely established methods.
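A hedged sketch of how such correlation-based compensation might look (an illustration of the general idea, not the authors' algorithm): each frame is shifted by the integer offset that maximizes the Pearson correlation coefficient with a reference frame, and that offset is then undone. The search radius and the use of circular shifts are simplifying assumptions.

```python
# Illustrative correlation-based registration of a FRAP image stack; the exhaustive
# shift search, search radius, and np.roll-based shifting are assumptions.
import numpy as np

def best_shift(frame, reference, max_shift=5):
    """Exhaustive search for the shift with the highest correlation coefficient."""
    best, best_r = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(frame, (dy, dx), axis=(0, 1))
            r = np.corrcoef(shifted.ravel(), reference.ravel())[0, 1]
            if r > best_r:
                best_r, best = r, (dy, dx)
    return best

def compensate(stack):
    """Align every frame of a FRAP time series to the first frame."""
    reference = stack[0]
    aligned = [stack[0]]
    for frame in stack[1:]:
        dy, dx = best_shift(frame, reference)
        aligned.append(np.roll(frame, (dy, dx), axis=(0, 1)))
    return np.stack(aligned)
```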
The CellDataMiner Software Package for the Analysis of Luminescent Images of Cancer Cells
The paper presents the CellDataMiner software package for the data analysis of luminescent images of cancer cells. A comparative analysis of classification and clustering methods is carried out, and the most effective of them are implemented in the software. The software package is tested on a dataset of experimental images of breast cancer.
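As an illustration of the kind of comparison described, the sketch below cross-validates two common classifiers and scores one clustering algorithm on the same feature matrix; the synthetic features stand in for per-cell intensity measurements and this is not CellDataMiner's actual pipeline.

```python
# Hedged sketch of a classifier/clustering comparison on per-cell features; the
# feature matrix is synthetic and the chosen algorithms are assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.datasets import make_classification

# Placeholder for per-cell intensity/texture features and expert labels.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

for name, clf in [("random forest", RandomForestClassifier(random_state=0)),
                  ("SVM", SVC())]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("k-means vs labels (ARI):", adjusted_rand_score(y, labels))
```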
Integrated Data Analysis in the Study of Complex Biomolecular Systems
The progress of biomolecular technology is directly related to the development of effective methods and algorithms for processing the large amounts of information produced by modern high-throughput experimental equipment. A priority task is the development of computational tools for the analysis and interpretation of biophysical information using big-data methods and computer models. An integrated approach to processing large datasets, based on data-analysis methods and simulation modelling, is proposed. This approach makes it possible to determine the parameters of the biophysical and optical processes occurring in complex biomolecular systems. The idea of the integrated approach is to simulate the biophysical processes occurring in the object of study, to compare the simulated data with the most informative experimental data selected by dimension-reduction methods, and to determine the characteristics of the investigated processes using data-analysis algorithms. The application of the developed approach to the study of biomolecular systems in fluorescence spectroscopy experiments is considered. The effectiveness of the algorithms was verified by analyzing simulated and experimental data representing systems of molecules and proteins. The use of this integrated analysis increases the efficiency of studying biophysical systems when analyzing big data.
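A minimal sketch of this workflow under stated assumptions: a library of decay curves is simulated over a grid of candidate lifetimes, both the library and a measured curve are projected onto a few principal components, and the parameter is estimated by nearest-neighbour matching in the reduced space. The mono-exponential model, the parameter grid, and the matching rule are illustrative assumptions, not the authors' pipeline.

```python
# Illustrative simulate -> reduce dimension -> match pipeline for estimating a
# fluorescence decay parameter; model, grid, and matching rule are assumed.
import numpy as np
from sklearn.decomposition import PCA

t = np.linspace(0, 20.0, 200)
taus = np.linspace(0.5, 10.0, 200)                   # candidate parameter grid
library = np.exp(-t[None, :] / taus[:, None])        # simulated decays, one per tau

# "Measured" curve: a decay with tau = 3.2 plus noise (synthetic stand-in).
rng = np.random.default_rng(2)
measured = np.exp(-t / 3.2) + rng.normal(0, 0.02, t.size)

pca = PCA(n_components=3).fit(library)               # dimension reduction on the library
lib_z = pca.transform(library)
meas_z = pca.transform(measured[None, :])

best = np.argmin(np.linalg.norm(lib_z - meas_z, axis=1))
print("estimated lifetime:", taus[best])
```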
RuPersonaChat: a dialog corpus for personalizing conversational agents
Personalization is one of the key ways to improve the performance of conversational agents. It improves the quality of user interaction with a conversational agent and increases user satisfaction by increasing the consistency and specificity of responses: the dialogue with the agent becomes more consistent, the inconsistency of responses is reduced, and the responses become more specific and interesting. Training and testing personalized conversational agents requires specific datasets containing facts about a persona and texts of the persona's dialogues in which the replicas use those facts. There are several datasets in English and Chinese containing an average of five facts about a persona, where the dialogues are composed by crowdsourcing workers who repeatedly imitate different personas. This paper proposes a methodology for collecting an original dataset containing an extended set of facts about a persona and natural dialogues between personas. The new RuPersonaChat dataset is based on three different recording scenarios: an interview, a short conversation, and a long conversation. It is the first dataset collected for dialogue agent personalization that includes both natural dialogues and extended persona descriptions. Additionally, the persona's replicas in the dataset are annotated with the facts about the persona from which they are generated. The proposed methodology for collecting an original corpus of test data allows language models to be tested on various tasks within the framework of personalized dialogue agent development. The collected dataset includes 139 dialogues and 2608 replicas. The dataset was used to test answer and question generation models, and the best results were obtained with the Gpt3-large model (perplexity equal to 15.7). The dataset can be used to test personalized dialogue agents' ability to talk about themselves to the interlocutor, to communicate with the interlocutor using phatic speech, and to take the extended context into account when communicating with the user.
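A hedged sketch of perplexity-based evaluation on dialogue replicas, assuming a Hugging Face causal language model; the checkpoint identifier below is a placeholder and not necessarily the Gpt3-large model used in the paper.

```python
# Illustrative per-replica perplexity computation with a causal LM; the model name
# is an assumed placeholder checkpoint, not the paper's evaluated model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "ai-forever/rugpt3large_based_on_gpt2"  # assumed/placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def perplexity(text: str) -> float:
    """Average-token perplexity of one replica under the language model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss           # mean cross-entropy per token
    return float(torch.exp(loss))

replicas = ["Привет! Чем ты увлекаешься?", "Я люблю играть на гитаре и читать."]
print(sum(perplexity(r) for r in replicas) / len(replicas))
```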
- …