2,898 research outputs found

    Multi-label learning based semi-global matching forest

    Semi-Global Matching (SGM) approximates a 2D Markov Random Field (MRF) via multiple 1D scanline optimizations, which offers a good trade-off between accuracy and efficiency in dense matching. Nevertheless, its performance is limited because the final disparity estimate relies on a simple summation of the aggregated costs from all 1D scanline optimizations. SGM-Forest improves on SGM by training a random forest to predict the best scanline from each scanline's disparity proposal. The disparity estimated by the best scanline acts as a reference for adaptively adopting close proposals in further post-processing. However, in many cases more than one scanline can provide a good prediction. Training the random forest with only one scanline labeled may limit or even confuse the learning procedure when other scanlines offer similar contributions. In this paper, we propose a multi-label classification strategy to further improve SGM-Forest. Each training sample may carry multiple labels (or no label) if more than one scanline (or none) gives a proper prediction. We test the proposed method on stereo matching datasets from Middlebury, ETH3D, the EuroSDR image matching benchmark, and the 2019 IEEE GRSS data fusion contest. The results indicate that, within the SGM-Forest framework, the multi-label strategy consistently outperforms the single-label scheme.
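
The difference between the single-label and multi-label training targets described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names and the tolerance `tol` (what counts as a "proper prediction") are hypothetical and are not taken from the paper.

```python
from typing import List


def single_label(proposals: List[float], gt: float) -> int:
    """Index of the one scanline whose disparity proposal is closest to gt."""
    errors = [abs(p - gt) for p in proposals]
    return errors.index(min(errors))


def multi_label(proposals: List[float], gt: float, tol: float = 1.0) -> List[int]:
    """One binary label per scanline: 1 if its proposal lies within tol of gt.

    `tol` is an assumed threshold. The result may contain several 1s,
    or none at all when no scanline is close enough.
    """
    return [1 if abs(p - gt) <= tol else 0 for p in proposals]


# Hypothetical disparity proposals from four scanlines for one pixel.
proposals = [12.0, 12.4, 30.0, 11.8]
ground_truth = 12.1

print(single_label(proposals, ground_truth))  # a single winning scanline index
print(multi_label(proposals, ground_truth))   # every acceptable scanline is marked
```

With a single label, the three nearly equivalent scanlines in this example would compete for one target. The multi-label target marks all of them as acceptable, which is the situation the abstract argues can otherwise confuse training.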

    Efficient Frequent Subtree Mining Beyond Forests

    A common paradigm in distance-based learning is to embed the instance space into an appropriately chosen feature space equipped with a metric and to define the dissimilarity between instances by the distance of their images in the feature space. If the instances are graphs, then frequent connected subgraphs are a well-suited pattern language to define such feature spaces. Identifying the set of frequent connected subgraphs and subsequently computing embeddings for graph instances, however, is computationally intractable. As a result, existing frequent subgraph mining algorithms either restrict the structural complexity of the instance graphs or require exponential delay between the output of subsequent patterns. Hence distance-based learners lack an efficient way to operate on arbitrary graph data. To resolve this problem, in this thesis we present a mining system that gives up the demand for completeness of the pattern set and instead guarantees a polynomial delay between subsequent patterns. Complementing this, we devise efficient methods to compute the embedding of arbitrary graphs into the Hamming space spanned by our pattern set. As a result, we present a system that allows distance-based learning methods to be applied efficiently to arbitrary graph databases. To overcome the computational intractability of the mining step, we consider only frequent subtrees for arbitrary graph databases. This restriction alone, however, does not suffice to make the problem tractable. We reduce the mining problem from arbitrary graphs to forests by replacing each graph with a polynomially sized forest obtained from a random sample of its spanning trees. This results in an incomplete mining algorithm. However, we prove that the probability of missing a frequent subtree pattern is low. We show empirically that this holds in practice even for very small forests.
    As a result, our algorithm is able to mine frequent subtrees in a range of graph databases where state-of-the-art exact frequent subgraph mining systems fail to produce patterns in reasonable time, or at all. Furthermore, the predictive performance of our patterns is comparable to that of exact frequent connected subgraphs, where available. The above method considers polynomially many spanning trees for the forest, while many graphs have exponentially many spanning trees. This exponential gap can reduce the number of patterns found by our mining algorithm. We hence propose a method that can (implicitly) consider forests of exponential size while remaining computationally tractable. This results in a higher recall for our incomplete mining algorithm. Furthermore, the methods extend the known positive results on the tractability of exact frequent subtree mining to a novel class of transaction graphs. We conjecture that the next natural extension of our results to a larger transaction graph class is at least as difficult as resolving whether P = NP. Regarding the graph embedding step, we apply a strategy similar to that of the mining step. We represent a novel graph by a forest of its spanning trees and decide whether the frequent trees from the mining step are subgraph isomorphic to this forest. As a result, the embedding computation has one-sided error with respect to the exact subgraph isomorphism test but is computationally tractable. Furthermore, we show that we can leverage a partial order on the pattern set. This structure can be used to reduce the runtime of the embedding computation dramatically. For the special case of Jaccard similarity between graph embeddings, a further substantial reduction of runtime can be achieved using min-hashing. The Jaccard distance can be approximated using small sketch vectors that can be computed quickly, again using the partial order on the tree patterns.
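
The min-hashing idea mentioned at the end of this abstract can be illustrated with a generic sketch. It assumes each graph embedding is given as the set of integer ids of the patterns it contains. The function names, the affine hash family, and the sketch length are assumptions made for illustration, and the code does not use the partial order on tree patterns that the thesis exploits to speed up sketch computation.

```python
import random
from typing import List, Optional, Set, Tuple

PRIME = 2_147_483_647  # 2^31 - 1; pattern ids are assumed to be smaller than this


def exact_jaccard(a: Set[int], b: Set[int]) -> float:
    """Exact Jaccard similarity of two pattern-id sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hashes(k: int, seed: int = 0) -> List[Tuple[int, int]]:
    """k random affine maps x -> (a*x + b) mod PRIME, each a permutation of ids."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def sketch(pattern_ids: Set[int], hashes: List[Tuple[int, int]]) -> List[Optional[int]]:
    """Min-hash sketch: the smallest hashed id under each map (None for an empty set)."""
    return [
        min(((a * x + b) % PRIME for x in pattern_ids), default=None)
        for a, b in hashes
    ]


def estimate_jaccard(s1: List[Optional[int]], s2: List[Optional[int]]) -> float:
    """Fraction of sketch positions on which the two sketches agree."""
    return sum(u == v for u, v in zip(s1, s2)) / len(s1)


hashes = make_hashes(k=256)
g1 = {1, 2, 3, 4}  # hypothetical ids of patterns found in graph 1
g2 = {3, 4, 5, 6}  # hypothetical ids of patterns found in graph 2
print(exact_jaccard(g1, g2))
print(estimate_jaccard(sketch(g1, hashes), sketch(g2, hashes)))
```

The estimate only compares two short vectors, so its cost depends on the sketch length rather than on the size of the pattern set. The Jaccard distance is one minus the similarity shown here.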

    Parallel Multistage Wide Neural Network

    Deep learning networks have achieved great success in many areas, such as large-scale image processing. They usually need large computing resources and time, and they inefficiently process easy and hard samples in the same way. Another undesirable problem is that the network generally needs to be retrained to learn new incoming data. Efforts have been made to reduce the computing resources and realize incremental learning by adjusting architectures, such as scalable effort classifiers, multi-grained cascade forest (gc forest), conditional deep learning (CDL), tree CNN, decision tree structure with knowledge transfer (ERDK), and forest of decision trees with RBF networks and knowledge transfer (FDRK). In this paper, a parallel multistage wide neural network (PMWNN) is presented. It is composed of multiple stages that classify different parts of the data. First, a wide radial basis function (WRBF) network is designed to learn features efficiently in the wide direction. It can work on both vector and image instances, and it can be trained quickly in one epoch using subsampling and least squares (LS). Second, successive stages of WRBF networks are combined to make up the PMWNN. Each stage focuses on the samples misclassified by the previous stage. The network can stop growing at an early stage, and a stage can be added incrementally when new training data is acquired. Finally, the stages of the PMWNN can be tested in parallel, thus speeding up the testing process. To sum up, the proposed PMWNN has the advantages of (1) fast training, (2) optimized computing resources, (3) incremental learning, and (4) parallel testing with stages.
    The experimental results on MNIST, several large hyperspectral remote sensing datasets, CVL single digits, SVHN, and audio signal datasets show that the WRBF and PMWNN achieve competitive accuracy compared to learning models such as stacked autoencoders, deep belief nets, SVM, MLP, LeNet-5, RBF networks, and the recently proposed CDL, broad learning, and gc forest. In fact, the PMWNN often has the best classification performance.
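
The stage-wise training idea, where each stage is fitted on the samples the previous stage misclassified and growth stops once nothing is left, can be sketched as below. This is not the PMWNN itself: a nearest-centroid classifier stands in for the WRBF network, all names are hypothetical, and the way the paper combines stage outputs at test time is not reproduced.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)
Centroids = Dict[int, List[float]]    # class label -> mean feature vector


def fit_centroids(samples: List[Sample]) -> Centroids:
    """Stand-in stage model: one mean vector per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in samples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model: Centroids, x: Sequence[float]) -> int:
    """Label of the centroid closest to x (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((c - v) ** 2 for c, v in zip(model[y], x)))


def train_stages(samples: List[Sample], max_stages: int = 5):
    """Fit stage k on the samples that stage k-1 misclassified.

    Stops early when a stage classifies all of its training samples correctly.
    Returns the stage models and the number of samples each stage was fitted on.
    """
    stages: List[Centroids] = []
    sizes: List[int] = []
    remaining = list(samples)
    while remaining and len(stages) < max_stages:
        model = fit_centroids(remaining)
        stages.append(model)
        sizes.append(len(remaining))
        remaining = [(x, y) for x, y in remaining if predict(model, x) != y]
    return stages, sizes


# Toy 1D data: the last sample of class 1 sits in the middle of class 0.
data = [([0.0], 0), ([1.0], 0), ([10.0], 1), ([11.0], 1), ([0.5], 1)]
stages, sizes = train_stages(data)
print(len(stages), sizes)
```

Because later stages only see the hard residue of earlier ones, they are fitted on much smaller sets, and a new stage can be appended for newly acquired data without refitting the earlier ones.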

    Realistic reconstruction and rendering of detailed 3D scenarios from multiple data sources

    In recent years, we have witnessed significant improvements in digital terrain modeling, mainly through photogrammetric techniques based on satellite and aerial photography, as well as laser scanning. These techniques allow the creation of Digital Elevation Models (DEM) and Digital Surface Models (DSM) that can be streamed over the network and explored through virtual globe applications like Google Earth or NASA WorldWind. The resolution of these 3D scenes has improved noticeably, reaching 1 m or finer in some urban areas for DEM and buildings, and finer than 10 cm per pixel in the associated aerial imagery. However, in rural, forest, or mountainous areas, the typical resolution of elevation datasets ranges between 5 and 30 m, and the typical resolution of the corresponding aerial photographs ranges between 25 cm and 1 m. This level of detail is sufficient only for aerial points of view; as the viewpoint approaches the surface, the terrain loses its realistic appearance. One approach to augmenting the detail of currently available datasets is to add synthetic details in a plausible manner, i.e. to include elements that match the features perceived in the aerial view. By combining the real dataset with the instancing of models on the terrain and other procedural detail techniques, the effective resolution can potentially become arbitrary. Several applications do not need an exact reproduction of the real elements but would greatly benefit from plausibly enhanced terrain models: videogames and entertainment applications, visual impact assessment (e.g. how a new ski resort would look), virtual tourism, simulations, etc. In this thesis we propose new methods and tools to help the reconstruction and synthesis of high-resolution terrain scenes from currently available data sources, in order to achieve realistic-looking ground-level views. 
    In particular, we focus on rural scenarios, mountains, and forest areas. Our main goal is to combine plausible synthetic elements and procedural detail with publicly available real data to create detailed 3D scenes of existing locations. Our research has focused on the following contributions:
    - An efficient pipeline for aerial imagery segmentation
    - Plausible terrain enhancement from high-resolution examples
    - Super-resolution of DEM by transferring details from the aerial photograph
    - Synthesis of arbitrary tree picture variations from a reduced set of photographs
    - Reconstruction of 3D tree models from a single image
    - A compact and efficient tree representation for real-time rendering of forest landscapes
    Postprint (published version)

    DecisioNet: A Binary-Tree Structured Neural Network

    Deep neural networks (DNNs) and decision trees (DTs) are both state-of-the-art classifiers. DNNs perform well due to their representation learning capabilities, while DTs are computationally efficient because they perform inference along a single root-to-leaf route that depends on the input data. In this paper, we present DecisioNet (DN), a binary-tree structured neural network. We propose a systematic way to convert an existing DNN into a DN to create a lightweight version of the original model. DecisioNet takes the best of both worlds: it uses neural modules to perform representation learning and uses its tree structure to perform only a portion of the computations. We evaluate various DN architectures, along with their corresponding baseline models, on the FashionMNIST, CIFAR10, and CIFAR100 datasets. We show that the DN variants achieve similar accuracy while significantly reducing the computational cost of the original network.
    Comment: We are happy to announce that the paper has been accepted to the ACCV2022 conference. The final version of the paper will be published soon. In the meantime, we are finally able to share the code (link below
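
The single-route inference described in this abstract can be sketched with a toy binary tree. The `Node` class, the scalar "modules", and the routing rule are hypothetical stand-ins chosen for illustration. They are not the DecisioNet architecture or its released code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    name: str
    module: Callable[[float], float]                  # stand-in for a neural module
    router: Optional[Callable[[float], bool]] = None  # True -> right child; None at a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def infer(root: Node, x: float) -> Tuple[float, List[str]]:
    """Run only the modules on one root-to-leaf route chosen from the input."""
    path: List[str] = []
    node: Optional[Node] = root
    while node is not None:
        x = node.module(x)
        path.append(node.name)
        if node.router is None:  # reached a leaf
            break
        node = node.right if node.router(x) else node.left
    return x, path


# Toy tree with three modules, of which only two run for any given input.
tree = Node(
    name="root",
    module=lambda v: v * 2.0,
    router=lambda v: v > 0.0,
    left=Node(name="left", module=lambda v: v - 1.0),
    right=Node(name="right", module=lambda v: v + 1.0),
)

print(infer(tree, 3.0))
print(infer(tree, -3.0))
```

In this toy tree each input touches two of the three modules. In a deeper tree the executed share shrinks further, which is the source of the computational savings the abstract reports.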