11,810 research outputs found

    Converging organoids and extracellular matrix: New insights into liver cancer biology


    Graduate Catalog of Studies, 2023-2024


    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper presents a systematic literature review and meta-analysis of code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deeper investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while many programming languages have no support at all. A noteworthy point is the existence of 12 datasets related to source code similarity measurement and duplicate code, of which only eight are publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and attention to multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to maintenance. Comment: 49 pages, 10 figures, 6 tables
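
    As a concrete illustration of the token-based techniques surveyed above, the sketch below scores two Python fragments using only the standard library; it is a hypothetical, minimal example and not one of the 80 tools catalogued in the review.

        # Minimal token-based similarity sketch (illustrative only, not a surveyed tool).
        # Identifiers are abstracted to "<id>" so renamed clones still compare as similar.
        import difflib
        import io
        import keyword
        import tokenize

        def normalize(source: str) -> list[str]:
            """Tokenize Python source, drop comments/whitespace, abstract identifiers."""
            tokens = []
            for tok in tokenize.generate_tokens(io.StringIO(source).readline):
                if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                                tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER):
                    continue
                if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
                    tokens.append("<id>")
                else:
                    tokens.append(tok.string)
            return tokens

        def similarity(a: str, b: str) -> float:
            """Return a ratio in [0, 1]; values near 1 suggest a potential clone."""
            return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()

        print(similarity("def add(x, y):\n    return x + y\n",
                         "def plus(a, b):\n    return a + b\n"))

    Under this crude normalization the two renamed fragments score 1.0, i.e. they behave as a Type-2 clone.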

    Beam scanning by liquid-crystal biasing in a modified SIW structure

    A fixed-frequency beam-scanning 1D antenna based on Liquid Crystals (LCs) is designed for application in 2D scanning with lateral alignment. The 2D array environment imposes full decoupling of adjacent 1D antennas, which often conflicts with the LC requirement of DC biasing: the proposed design accommodates both. The LC medium is placed inside a Substrate Integrated Waveguide (SIW), modified to work as a Groove Gap Waveguide with radiating slots etched on the upper broad wall, which radiates as a Leaky-Wave Antenna (LWA). This allows effective application of the DC bias voltage needed for tuning the LCs. At the same time, the RF field remains laterally confined, so several antennas can be laid in parallel to achieve 2D beam scanning. The design is validated by simulation employing the actual properties of a commercial LC medium.
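
    For orientation only (a generic leaky-wave relation, not a formula quoted from this work), the pointing angle of a leaky-wave antenna follows the phase constant of the guided mode, which the LC permittivity controls through the bias voltage:

        \sin\theta_m \approx \frac{\beta(\varepsilon_{r,\mathrm{LC}})}{k_0}, \qquad k_0 = \frac{2\pi f}{c}

    Sweeping the DC bias between the two permittivity extremes of the LC shifts \beta, and hence the main-beam angle \theta_m, at a fixed operating frequency f.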

    Using machine learning to predict pathogenicity of genomic variants throughout the human genome

    More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many ways: a variant may stop the translation of a protein, interfere with gene regulation, or alter splicing of the transcribed mRNA into an unwanted isoform. All of these processes must be investigated in order to evaluate which variant may be causal for the observed phenotype. A great help in this regard are variant effect scores: implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants in terms of pathogenicity. Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment for the model's application. Here, I present a generalized workflow of this process. It makes it simple to configure how information is converted into model features, enabling the rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation, and ultimately deployment of a selected model via genome-wide scoring of genomic variants. The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that scores SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38. Further, I demonstrate the integration of deep-neural-network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data based on variants selected by allele frequency. In conclusion, the developed workflow presents a flexible and scalable method to train variant effect scores. All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org.
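
    The annotate–train–validate–score loop described above can be pictured with a small, hypothetical sketch; the feature names and data below are invented, and the real CADD workflow uses many more annotations, model selection steps, and genome-wide scoring.

        # Hypothetical variant effect model: annotate -> train -> validate -> score.
        # Toy data only; not the CADD pipeline itself.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        n = 5000
        # Toy annotation matrix: e.g. conservation score, splice distance, allele frequency.
        X = rng.normal(size=(n, 3))
        # Toy labels: 1 = proxy-deleterious, 0 = proxy-benign.
        y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

        X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
        model = LogisticRegression().fit(X_train, y_train)

        # "Scoring" held-out variants: higher probability ~ more likely pathogenic.
        scores = model.predict_proba(X_val)[:, 1]
        print("validation AUC:", round(roc_auc_score(y_val, scores), 3))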

    InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

    This paper introduces InternVid, a large-scale video-centric multimodal dataset that enables learning powerful and transferable video-text representations for multimodal understanding and generation. The InternVid dataset contains over 7 million videos lasting nearly 760K hours, yielding 234M video clips accompanied by detailed descriptions totalling 4.1B words. Our core contribution is to develop a scalable approach to autonomously build a high-quality video-text dataset with large language models (LLMs), thereby showcasing its efficacy in learning video-language representations at scale. Specifically, we utilize a multi-scale approach to generate video-related descriptions. Furthermore, we introduce ViCLIP, a video-text representation learning model based on ViT-L. Learned on InternVid via contrastive learning, this model demonstrates leading zero-shot action recognition and competitive video retrieval performance. Beyond basic video understanding tasks like recognition and retrieval, our dataset and model have broad applications. They are particularly beneficial for generating interleaved video-text data for learning a video-centric dialogue system and advancing video-to-text and text-to-video generation research. These proposed resources provide a tool for researchers and practitioners interested in multimodal video understanding and generation. Comment: Data and Code: https://github.com/OpenGVLab/InternVideo/tree/main/Data/InternVi
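
    The contrastive objective mentioned above is CLIP-style; the sketch below computes the symmetric InfoNCE loss on placeholder embeddings. It assumes nothing about ViCLIP's actual encoders or training code, which live in the linked repository.

        # Symmetric contrastive (InfoNCE) objective in the style of CLIP/ViCLIP.
        # Embeddings are random placeholders, not outputs of the actual encoders.
        import numpy as np

        def log_softmax(z):
            z = z - z.max(axis=1, keepdims=True)           # numerical stability
            return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

        def clip_style_loss(video_emb, text_emb, temperature=0.07):
            """Paired rows are positives; every other row in the batch is a negative."""
            v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
            t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
            logits = v @ t.T / temperature                 # (N, N) similarity matrix
            idx = np.arange(len(v))                        # i-th clip matches i-th caption
            loss_v2t = -log_softmax(logits)[idx, idx].mean()
            loss_t2v = -log_softmax(logits.T)[idx, idx].mean()
            return 0.5 * (loss_v2t + loss_t2v)

        rng = np.random.default_rng(0)
        print(clip_style_loss(rng.normal(size=(8, 512)), rng.normal(size=(8, 512))))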

    Fairness Testing: A Comprehensive Survey and Analysis of Trends

    Unfair behaviors of Machine Learning (ML) software have garnered increasing attention and concern among software engineers. To tackle this issue, extensive research has been dedicated to conducting fairness testing of ML software, and this paper offers a comprehensive survey of existing studies in this field. We collect 100 papers and organize them based on the testing workflow (i.e., how to test) and testing components (i.e., what to test). Furthermore, we analyze the research focus, trends, and promising directions in the realm of fairness testing. We also identify widely-adopted datasets and open-source tools for fairness testing.
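
    As a toy illustration of what a single fairness test can look like, the sketch below checks one common group metric, statistical parity difference; the survey covers many more properties, components, and tools than this.

        # Toy fairness test: statistical parity difference on binary predictions.
        # Illustrative only; real fairness testing covers many metrics and test inputs.
        import numpy as np

        def statistical_parity_difference(y_pred, sensitive):
            """P(yhat=1 | group 1) - P(yhat=1 | group 0) for a binary sensitive attribute."""
            y_pred, sensitive = np.asarray(y_pred), np.asarray(sensitive)
            return y_pred[sensitive == 1].mean() - y_pred[sensitive == 0].mean()

        preds = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # model decisions
        group = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # protected attribute
        spd = statistical_parity_difference(preds, group)
        threshold = 0.2                              # tolerance chosen by the tester
        print("PASS" if abs(spd) <= threshold else "FAIL", f"(SPD = {spd:.2f})")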

    Discriminative Multimodal Learning via Conditional Priors in Generative Models

    Deep generative models with latent variables have lately been used to learn joint representations and generative processes from multi-modal data. These two learning mechanisms can, however, conflict with each other, and the representations can fail to embed information on the data modalities. This research studies the realistic scenario in which all modalities and class labels are available for model training, but some modalities and labels required for downstream tasks are missing. We show that, in this scenario, the variational lower bound limits the mutual information between joint representations and missing modalities. To counteract this problem, we introduce a novel conditional multi-modal discriminative model that uses an informative prior distribution and optimizes a likelihood-free objective function that maximizes mutual information between joint representations and missing modalities. Extensive experimentation demonstrates the benefits of our proposed model: empirical results show that it achieves state-of-the-art results in representative problems such as downstream classification, acoustic inversion, and image and annotation generation.
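
    For context, the standard two-modality variational lower bound that the abstract refers to (written here in generic notation, not the authors' final objective) is

        \log p_\theta(x_1, x_2) \;\ge\; \mathbb{E}_{q_\phi(z \mid x_1, x_2)}\big[\log p_\theta(x_1 \mid z) + \log p_\theta(x_2 \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x_1, x_2)\,\|\,p(z)\big)

    Replacing the fixed prior p(z) with an informative conditional prior such as p(z \mid x_1) is in the spirit of the conditional priors named in the title, encouraging the joint representation z to retain information about a modality x_2 that may be missing at test time.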

    Generalizable deep learning based medical image segmentation

    Deep learning is revolutionizing medical image analysis and interpretation. However, its real-world deployment is often hindered by poor generalization to unseen domains (new imaging modalities and protocols). This lack of generalization ability is further exacerbated by the scarcity of labeled datasets for training: data collection and annotation can be prohibitively expensive in terms of labor and cost, because label quality depends heavily on the expertise of radiologists. Additionally, unreliable predictions caused by poor model generalization pose safety risks to clinical downstream applications. To mitigate labeling requirements, we investigate and develop a series of techniques to strengthen the generalization ability and data efficiency of deep medical image computing models. We further improve model accountability and identify unreliable predictions made on out-of-domain data by designing probability calibration techniques. In the first and second parts of the thesis, we discuss two types of problems for handling unexpected domains: unsupervised domain adaptation and single-source domain generalization. For domain adaptation, we present a data-efficient technique that adapts a segmentation model trained on a labeled source domain (e.g., MRI) to an unlabeled target domain (e.g., CT), using a small number of unlabeled training images from the target domain. For domain generalization, we focus on both image reconstruction and segmentation. For image reconstruction, we design a simple and effective domain generalization technique for cross-domain MRI reconstruction by reusing image representations learned from natural image datasets. For image segmentation, we perform a causal analysis of the challenging cross-domain image segmentation problem. Guided by this causal analysis, we propose an effective data-augmentation-based generalization technique for single-source domains, which outperforms existing approaches on a large variety of cross-domain image segmentation scenarios. In the third part of the thesis, we present a novel self-supervised method for learning generic image representations that can be used to analyze unexpected objects of interest. The proposed method is designed together with a novel few-shot image segmentation framework that can segment unseen objects of interest by taking only a few labeled examples as references. Our few-shot framework demonstrates superior flexibility over conventional fully-supervised models: it does not require any fine-tuning on novel objects of interest. We further build a publicly available, comprehensive evaluation environment for few-shot medical image segmentation. In the fourth part of the thesis, we present a novel probability calibration model. To ensure safety in clinical settings, a deep model is expected to alert human radiologists when it has low confidence, especially when confronted with out-of-domain data. To this end, we present a plug-and-play model that calibrates prediction probabilities on out-of-domain data, aligning predicted probabilities with the actual accuracy on the test data. We evaluate our method on both artifact-corrupted images and images from an unforeseen MRI scanning protocol, and it demonstrates improved calibration accuracy compared with the state-of-the-art method. Finally, we summarize the major contributions and limitations of our work and suggest future research directions that will benefit from the work in this thesis.
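
    The calibration idea in the fourth part can be made concrete with a standard baseline, temperature scaling, sketched below on toy logits; this is a common reference method, not the plug-and-play calibration model proposed in the thesis.

        # Temperature scaling: fit one scalar T on held-out data so that softmax(logits/T)
        # better matches observed accuracy. A standard baseline, not the thesis's method.
        import numpy as np
        from scipy.optimize import minimize_scalar

        def softmax(z):
            z = z - z.max(axis=1, keepdims=True)
            e = np.exp(z)
            return e / e.sum(axis=1, keepdims=True)

        def nll(T, logits, labels):
            p = softmax(logits / T)
            return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

        def fit_temperature(val_logits, val_labels):
            res = minimize_scalar(nll, bounds=(0.05, 10.0), args=(val_logits, val_labels),
                                  method="bounded")
            return res.x

        rng = np.random.default_rng(0)
        logits = rng.normal(scale=4.0, size=(200, 4))   # over-confident, uninformative toy logits
        labels = rng.integers(0, 4, size=200)
        T = fit_temperature(logits, labels)
        print("fitted temperature:", round(T, 2))        # T > 1 here: probabilities get softened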

    Optimization of LiDAR data acquisition planning for 3D indoor modelling

    The main objective of this doctoral thesis is the design, validation and implementation of methodologies that allow the geometric and topological modelling of navigable spaces, whether inside buildings or in urban environments, to be integrated into three-dimensional geographic information systems (GIS-3D). The input data of this work will consist mainly of point clouds (which may be classified) acquired by LiDAR systems both indoors and outdoors. In addition, the use of BIM infrastructure models and cadastral maps is proposed, depending on their availability. Point clouds provide a large amount of environmental information with high accuracy compared to data offered by other acquisition technologies. However, their lack of structure and their volume require a great deal of processing effort. For this reason, the first step is to structure the data by dividing the input cloud into simpler entities that facilitate subsequent processing. This first division will consider the physical elements present in the cloud, such as walls in the case of indoor environments or kerbs in the case of exteriors. In order to generate navigation routes adapted to different mobile agents, the next objective will be to establish a semantic subdivision of space according to its functionality. For indoor environments, BIM models can be used to evaluate the results, while cadastral maps support the division of the urban environment. Once the navigable space is divided, it will be parameterized both geometrically and topologically for the design of topologically coherent navigation networks. For this purpose, several spatial discretization techniques, such as 3D tessellations, will be studied to facilitate the establishment of topological relationships of adjacency, connectivity and inclusion between subspaces. Based on the geometric characterization and the topological relations established in the previous phase, the creation of three-dimensional navigation networks with multimodal support will be addressed, considering different levels of detail according to the mobility specifications of each agent and its purpose. Finally, the possibility of integrating the generated networks into a GIS-3D visualization system will be considered. For correct visualization, the level of detail can be adjusted according to geometry and semantics, and aspects such as the type of user or transport, mobility, and rights of access to spaces must be considered at all times.
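
    One of the steps sketched above, spatial discretization of the cloud and extraction of adjacency relations between free cells, can be illustrated with a toy voxel example; this is illustrative only, since the thesis works with classified LiDAR clouds, BIM models, and richer 3D tessellations.

        # Toy sketch: discretize a point cloud into a voxel grid and link empty,
        # face-adjacent voxels into a candidate navigation graph.
        from itertools import product
        import numpy as np

        def voxel_navigation_graph(points, cell=0.5, extent=10):
            """Return (free_voxels, edges): occupied voxels come from the cloud,
            edges connect face-adjacent free voxels (candidate navigable space)."""
            occupied = {tuple(v) for v in np.floor(points / cell).astype(int)}
            n = int(extent / cell)
            free = [(i, j, k) for i, j, k in product(range(n), repeat=3)
                    if (i, j, k) not in occupied]
            free_set = set(free)
            edges = []
            for (i, j, k) in free:
                for di, dj, dk in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
                    nb = (i + di, j + dj, k + dk)
                    if nb in free_set:
                        edges.append(((i, j, k), nb))
            return free, edges

        rng = np.random.default_rng(0)
        cloud = rng.uniform(0, 10, size=(2000, 3))   # random points standing in for a scan
        voxels, edges = voxel_navigation_graph(cloud)
        print(len(voxels), "free voxels,", len(edges), "adjacency edges")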