Outlier detection and ranking based on subspace clustering
Detecting outliers is an important task for many applications,
including fraud detection or consistency validation in real-world
data. Particularly in the presence of uncertain or imprecise data,
similar objects regularly deviate in their attribute values. The notion
of an outlier must therefore be defined carefully. When outlier
detection is considered a task complementary to clustering, a binary decision
on whether an object is an outlier seems
natural. For high-dimensional data, however, objects may belong
to different clusters in different subspaces. More fine-grained concepts for
defining outliers are therefore needed. With our new OutRank approach,
we address outlier detection in heterogeneous high-dimensional data and
propose a novel scoring function that provides a consistent model for
ranking outliers in the presence of different attribute types. Preliminary
experiments demonstrate the potential for successful detection and reasonable ranking of outliers in high-dimensional data sets.
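The idea of ranking objects by the subspace clusters they belong to can be illustrated with a minimal sketch. This is not the OutRank scoring function itself, which the abstract does not spell out. The function name `outlier_ranking`, the equal 0.5/0.5 weighting of cluster size and dimensionality, and the toy input are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

# A subspace cluster: (member object ids, dimensions in which it was found).
# Hypothetical representation; any subspace clustering algorithm could supply it.
SubspaceCluster = Tuple[FrozenSet[int], FrozenSet[int]]


def outlier_ranking(
    n_objects: int, clusters: List[SubspaceCluster]
) -> List[Tuple[int, float]]:
    """Rank objects from most to least outlying.

    Each object accumulates evidence of being "regular" from every
    subspace cluster that contains it. Larger clusters and clusters found
    in more dimensions contribute more (assumed weighting: 0.5 each).
    Clusters are assumed to have at least one member and one dimension.
    """
    if not clusters:
        return [(obj, 0.0) for obj in range(n_objects)]

    max_size = max(len(members) for members, _ in clusters)
    max_dims = max(len(dims) for _, dims in clusters)

    scores = [0.0] * n_objects
    for members, dims in clusters:
        weight = 0.5 * len(members) / max_size + 0.5 * len(dims) / max_dims
        for obj in members:
            scores[obj] += weight

    # Objects with the least clustering evidence come first.
    return sorted(enumerate(scores), key=lambda pair: pair[1])


# Toy example: object 3 belongs to no cluster, so it ranks as most outlying.
toy_clusters = [
    (frozenset({0, 1, 2}), frozenset({0, 1})),
    (frozenset({0, 1}), frozenset({2})),
]
ranking = outlier_ranking(4, toy_clusters)
```

Because the score is continuous rather than binary, an object that appears in only a few small or low-dimensional clusters lands between clear outliers and well-clustered objects.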
A Real-World Data Resource of Complex Sensitive Sentences Based on Documents from the Monsanto Trial
Methodology for studying core samples of the Bazhenov Formation to identify natural reservoirs and determine the estimation parameters of reserves and resources
RainAI -- Precipitation Nowcasting from Satellite Data
This paper presents a solution to the Weather4Cast 2023 competition, where
the goal is to forecast high-resolution precipitation with an 8-hour lead time
using lower-resolution satellite radiance images. We propose a simple yet
effective method for spatiotemporal feature learning using a 2D U-Net model
that outperforms the official 3D U-Net baseline in both performance and
efficiency. We place emphasis on refining the dataset through importance
sampling and dataset preparation, and show that such techniques have a
significant impact on performance. We further study an alternative
cross-entropy loss function that improves performance over the standard mean
squared error loss, while also enabling models to produce probabilistic
outputs. We also explore techniques for generating
predictions at different lead times, specifically through Conditioning Lead
Time. Lastly, to generate high-resolution forecasts, we evaluate standard and
learned upsampling methods. The code and trained parameters are available at
https://github.com/rafapablos/w4c23-rainai
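How a cross-entropy loss can replace mean squared error and still yield probabilistic outputs can be sketched for a single pixel as follows. This is a minimal sketch under stated assumptions, not the code from the linked repository: the bin edges, the representative rate per bin, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical rain-rate thresholds in mm/h; the paper's actual bins may differ.
BIN_EDGES = [0.2, 1.0, 5.0]
# Hypothetical representative rate for each of the four resulting bins.
BIN_VALUES = [0.0, 0.5, 2.0, 8.0]


def rain_to_class(rate: float, edges: Sequence[float] = BIN_EDGES) -> int:
    """Map a rain rate to the index of the bin it falls into."""
    return sum(rate >= edge for edge in edges)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-bin logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target_class: int) -> float:
    """Negative log-probability assigned to the observed bin."""
    return -math.log(softmax(logits)[target_class])


def expected_rate(
    logits: Sequence[float], values: Sequence[float] = BIN_VALUES
) -> float:
    """Collapse the predicted distribution to a single rate estimate."""
    return sum(p * v for p, v in zip(softmax(logits), values))


# One pixel: the model is undecided (equal logits), observed rate is 3.0 mm/h.
pixel_logits = [0.0, 0.0, 0.0, 0.0]
loss = cross_entropy(pixel_logits, rain_to_class(3.0))
point_forecast = expected_rate(pixel_logits)
```

The same softmax output serves two purposes: it is the probabilistic forecast over rain-rate bins, and its expectation gives a point estimate comparable to what a regression model trained with mean squared error would produce.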