A Survey on Deep Learning in Medical Image Analysis
Deep learning algorithms, in particular convolutional networks, have rapidly
become a methodology of choice for analyzing medical images. This paper reviews
the major deep learning concepts pertinent to medical image analysis and
summarizes over 300 contributions to the field, most of which appeared in the
last year. We survey the use of deep learning for image classification, object
detection, segmentation, registration, and other tasks and provide concise
overviews of studies per application area. Open challenges and directions for
future research are discussed.
Comment: Revised survey includes expanded discussion section and reworked
introductory section on common deep architectures. Added missed papers from
before Feb 1st 201
MoVideo: Motion-Aware Video Generation with Diffusion Models
While recent years have witnessed great progress on using diffusion models
for video generation, most of them are simple extensions of image generation
frameworks, which fail to explicitly consider one of the key differences
between videos and images, i.e., motion. In this paper, we propose a novel
motion-aware video generation (MoVideo) framework that takes motion into
consideration from two aspects: video depth and optical flow. The former
regulates motion by per-frame object distances and spatial layouts, while the
latter describes motion by cross-frame correspondences that help preserve
fine details and improve temporal consistency. More specifically, given a key
frame that either already exists or is generated from a text prompt, we first design a diffusion
model with spatio-temporal modules to generate the video depth and the
corresponding optical flows. Then, the video is generated in the latent space
by another spatio-temporal diffusion model under the guidance of depth, optical
flow-based warped latent video and the calculated occlusion mask. Lastly, we
use optical flows again to align and refine different frames for better video
decoding from the latent space to the pixel space. In experiments, MoVideo
achieves state-of-the-art results in both text-to-video and image-to-video
generation, showing promising prompt consistency, frame consistency and visual
quality.
Comment: project homepage: https://jingyunliang.github.io/MoVide
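The abstract mentions guiding generation with an optical flow-based warped latent video and an occlusion mask. As a rough illustration only, the sketch below backward-warps a single-channel frame with a dense flow field using bilinear sampling and marks pixels whose flow points outside the frame as invalid. This is a simplification under stated assumptions, not MoVideo's implementation: the function names are hypothetical, the flow convention (each target pixel stores the offset to its source location) is assumed, and practical occlusion masks are usually computed with a forward-backward flow consistency check rather than the simple bounds test used here.

```python
from typing import List, Tuple

Grid = List[List[float]]


def bilinear_sample(frame: Grid, x: float, y: float) -> Tuple[float, bool]:
    """Sample `frame` at sub-pixel (x, y); also report whether (x, y) is in bounds.

    Hypothetical helper for illustration; x is the column, y is the row.
    """
    h, w = len(frame), len(frame[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return 0.0, False  # no source pixel: treat as invalid / occluded
    x0, y0 = int(x), int(y)  # floor, since x and y are non-negative here
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - dx) * frame[y0][x0] + dx * frame[y0][x1]
    bottom = (1 - dx) * frame[y1][x0] + dx * frame[y1][x1]
    return (1 - dy) * top + dy * bottom, True


def warp(frame: Grid, flow_u: Grid, flow_v: Grid) -> Tuple[Grid, List[List[bool]]]:
    """Backward-warp `frame`: target pixel (i, j) reads source (j + u, i + v).

    Returns the warped frame and a validity mask (False = no valid source,
    a crude stand-in for an occlusion mask under the assumptions above).
    """
    h, w = len(frame), len(frame[0])
    warped = [[0.0] * w for _ in range(h)]
    valid = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            val, ok = bilinear_sample(frame, j + flow_u[i][j], i + flow_v[i][j])
            warped[i][j] = val
            valid[i][j] = ok
    return warped, valid


if __name__ == "__main__":
    frame = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    u = [[1.0] * 3 for _ in range(2)]  # every target pixel reads one column to the right
    v = [[0.0] * 3 for _ in range(2)]
    warped, valid = warp(frame, u, v)
    print(warped)
    print(valid)
```

With a constant horizontal flow of +1, the image content shifts one pixel to the left; the right-most column has no source pixel and is flagged as invalid in the mask.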
From Text to Knowledge
The global information space provided by the World Wide Web has dramatically changed
the way knowledge is shared all over the world. To make this enormous information
space accessible, search engines index the uploaded content and provide efficient
algorithmic machinery for ranking the importance of documents with respect to an input
query. All major search engines, such as Google, Yahoo or Bing, are keyword-based, which
is indisputably a very powerful tool for satisfying information needs centered around documents.
However, this unstructured, document-oriented paradigm of the World Wide Web has serious drawbacks when searching for specific knowledge about real-world entities.
When asked for advanced facts about entities, today's search engines are not very good at providing accurate answers. Hand-built knowledge bases such as Wikipedia or its structured counterpart DBpedia are excellent sources of common facts. However, these knowledge bases are far from complete, and most knowledge still lies buried in unstructured documents.
Statistical machine learning methods have great potential to help bridge the gap between text and knowledge by (semi-)automatically transforming the unstructured representation of today's World Wide Web into a more structured representation. This
thesis is devoted to reducing this gap with Probabilistic Graphical Models. Probabilistic
Graphical Models play a crucial role in modern pattern recognition as they merge two important fields of applied mathematics: Graph Theory and Probability Theory.
The first part of the thesis presents a novel system called Text2SemRel that is able to (semi-)automatically construct knowledge bases from textual document collections. The resulting knowledge base consists of facts centered around entities and their relations.
An essential part of the system is a novel algorithm for extracting relations between entity
mentions that is based on Conditional Random Fields, which are Undirected Probabilistic Graphical Models.
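The abstract does not say how the Conditional Random Field is decoded. As a generic illustration, assuming relation extraction is framed as assigning a label to each token of a sentence, the sketch below implements Viterbi decoding for a linear-chain CRF: given per-token label scores and label-to-label transition scores (log-space potentials, assumed to be already computed from features), it returns the highest-scoring label sequence. The function name and the score layout are hypothetical and are not taken from Text2SemRel.

```python
from typing import List, Sequence, Tuple


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the best label sequence of a linear-chain CRF and its score.

    emissions[t][y]   -- score of label y at token position t (assumed given)
    transitions[a][b] -- score of label a being followed by label b
    """
    n_labels = len(transitions)
    # best[y]: score of the best partial path that ends in label y at position t.
    best = list(emissions[0])
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, ptrs = [], []
        for y in range(n_labels):
            # Best predecessor label for y, given the scores accumulated so far.
            prev = max(range(n_labels), key=lambda a: best[a] + transitions[a][y])
            new_best.append(best[prev] + transitions[prev][y] + emissions[t][y])
            ptrs.append(prev)
        best = new_best
        backptrs.append(ptrs)
    # Pick the best final label, then follow back-pointers to recover the path.
    last = max(range(n_labels), key=lambda y: best[y])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, best[last]


if __name__ == "__main__":
    # Two hypothetical labels (0 = outside, 1 = relation-bearing token), three tokens.
    emissions = [[2, 0], [0, 1], [1, 0]]
    print(viterbi(emissions, [[0, 0], [0, 0]]))
    print(viterbi(emissions, [[0, -5], [-5, 0]]))
```

With neutral transitions the decoder follows the per-token scores; a strong penalty on switching labels makes it prefer a constant label sequence instead, which is exactly the kind of dependency between neighbouring labels that a linear-chain CRF captures and independent per-token classifiers do not.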
In the second part of the thesis, we use the power of Directed Probabilistic Graphical Models to solve important knowledge discovery tasks in large, semantically annotated document collections. In particular, we present extensions of the Latent Dirichlet Allocation framework that are able to learn, in an unsupervised way, the statistical semantic
dependencies between unstructured representations, such as documents, and their semantic annotations. Semantic annotations of documents might refer to concepts originating from a thesaurus or ontology, but also to user-generated informal tags in social tagging
systems. These forms of annotation represent a first step towards converting the World Wide Web into a more structured form.
In the last part of the thesis, we demonstrate the large-scale applicability of the proposed fact extraction system Text2SemRel. In particular, we extract semantic relations between genes and diseases from a large biomedical textual repository. The resulting knowledge
base contains far more potential disease genes than are
currently stored in curated databases. Thus, the proposed system is able to unlock
knowledge currently buried in the literature. The literature-derived human gene-disease
network is the subject of further analysis with respect to existing state-of-the-art curated
databases. We analyze the derived knowledge base quantitatively by comparing it with
several curated databases with regard to, among other things, the size of the databases and the properties of known
disease genes. Our experimental analysis shows that the facts extracted
from the literature are of high quality
- …