13,269 research outputs found

    EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

    Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR). In this paper, we aim to improve distillation methods that pave the way for the resource-efficient deployment of such models in practice. Inspired by our theoretical analysis of the teacher-student generalization gap for IR models, we propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model. Unlike existing teacher score-based distillation methods, our proposed approach employs embedding matching tasks to provide a stronger signal to align the representations of the teacher and student models. In addition, it utilizes query generation to explore the data manifold to reduce the discrepancies between the student and the teacher where training data is sparse. Furthermore, our analysis motivates novel asymmetric architectures for student models that realize better embedding alignment without increasing online inference cost. On standard benchmarks like MSMARCO, we show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance

    GeoYCSB: A Benchmark Framework for the Performance and Scalability Evaluation of Geospatial NoSQL Databases

    The proliferation of geospatial applications has tremendously increased the variety, velocity, and volume of spatial data that data stores have to manage. Traditional relational databases reveal limitations in handling such big geospatial data, mainly due to their rigid schema requirements and limited scalability. Numerous NoSQL databases have emerged and actively serve as alternative data stores for big spatial data. This study presents a framework, called GeoYCSB, developed for benchmarking NoSQL databases with geospatial workloads. To develop GeoYCSB, we extend YCSB, a de facto benchmark framework for NoSQL systems, by integrating into its design architecture the new components necessary to support geospatial workloads. GeoYCSB supports both microbenchmarks and macrobenchmarks and facilitates the use of real datasets in both. It can be extended to evaluate any NoSQL database that supports spatial queries, using geospatial workloads performed on datasets of any geometric complexity. We use GeoYCSB to benchmark two leading document stores, MongoDB and Couchbase, and present the experimental results and analysis. Finally, we demonstrate the extensibility of GeoYCSB by including a new dataset consisting of complex geometries and using it to benchmark a system with a wide variety of geospatial queries: Apache Accumulo, a wide-column store, with the GeoMesa framework applied on top

    Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames

    Automatically discovering composable abstractions from raw perceptual data is a long-standing challenge in machine learning. Recent slot-based neural networks that learn about objects in a self-supervised manner have made exciting progress in this direction. However, they typically fall short at adequately capturing spatial symmetries present in the visual world, which leads to sample inefficiency, such as when entangling object appearance and pose. In this paper, we present a simple yet highly effective method for incorporating spatial symmetries via slot-centric reference frames. We incorporate equivariance to per-object pose transformations into the attention and generation mechanism of Slot Attention by translating, scaling, and rotating position encodings. These changes result in little computational overhead, are easy to implement, and can result in large gains in terms of data efficiency and overall improvements to object discovery. We evaluate our method on a wide range of synthetic object discovery benchmarks, namely CLEVR, Tetrominoes, CLEVRTex, Objects Room and MultiShapeNet, and show promising improvements on the challenging real-world Waymo Open dataset. Comment: Accepted at ICML 2023. Project page: https://invariantsa.github.io

    Dense Video Object Captioning from Disjoint Supervision

    We propose a new task and model for dense video object captioning -- detecting, tracking, and captioning trajectories of all objects in a video. This task unifies spatial and temporal understanding of the video, and requires fine-grained language description. Our model for dense video object captioning is trained end-to-end and consists of different modules for spatial localization, tracking, and captioning. As such, we can train our model with a mixture of disjoint tasks, and leverage diverse, large-scale datasets which supervise different parts of our model. This results in noteworthy zero-shot performance. Moreover, by finetuning a model from this initialization, we can further improve our performance, surpassing strong image-based baselines by a significant margin. Although we are not aware of other work performing this task, we are able to repurpose existing video grounding datasets for our task, namely VidSTG and VLN. We show our task is more general than grounding, and models trained on our task can directly be applied to grounding by finding the bounding box with the maximum likelihood of generating the query sentence. Our model outperforms dedicated, state-of-the-art models for spatial grounding on both VidSTG and VLN

    Complicated objects: artifacts from the Yuanming Yuan in Victorian Britain

    The 1860 spoliation of the Summer Palace at the close of the Second Opium War by British and French troops was a watershed event within the development of Britain as an imperialist nation, which guaranteed a market for opium produced in its colony India and demonstrated the power of its armed forces. The distribution of the spoils to officers and diplomatic corps by campaign leaders in Beijing was also a sign of the British Army’s rising power as an instrument of the imperialist state. These conditions would suggest that objects looted from the site would be integrated into an imperialist aesthetic that reflected and promoted the material benefits of military engagement overseas and foregrounded the circumstances of their removal to Britain for campaign members and the British public. This study mines sources dating to the two decades following the war – including British newspapers, auction house records, exhibition catalogs and works of art – to test this hypothesis. Findings show that initial movements of looted objects through the military and diplomatic corps did reinforce notions of imperialist power by enabling campaign members to profit from the spoliation through sales of looted objects and trophy displays. However, material from the Summer Palace arrived at a moment when British manufacturers and cultural leaders were engaged in a national effort to improve the quality of British goods to compete in the international marketplace and looted art was quickly interpolated in this national conversation. Ironically, the same “free trade” imperatives that motivated the invasion energized a new design movement that embraced Chinese ornament. As a consequence, political interpretations of the material outside of military collections were quickly joined by a strong response to Chinese ornament from cultural institutions and design leaders. 
Art from the Summer Palace held a prominent place at industrial art exhibitions of the postwar period and inspired new designs in a number of mediums. While the availability of Chinese imperial art was the consequence of a military invasion and therefore a product of imperialist expansion, evidence presented here shows that the design response to looted objects was not circumscribed by this political reality. Chinese ornament on imperial wares was ultimately celebrated for its formal qualities and acknowledged links to the Summer Palace were an indicator of good design, not a celebration of victory over a failed Chinese state. Therefore, the looting of the Summer Palace was ultimately an essential factor in the development of modern design, the essence of which is a break with Classical ornament

    Multimodal spatio-temporal deep learning framework for 3D object detection in instrumented vehicles

    This thesis presents the utilization of multiple modalities, such as image and lidar, to incorporate spatio-temporal information from sequence data into deep learning architectures for 3D object detection in instrumented vehicles. The race to autonomy in instrumented vehicles or self-driving cars has stimulated significant research in developing autonomous driver assistance systems (ADAS) technologies related explicitly to perception systems. Object detection plays a crucial role in perception systems by providing spatial information to its subsequent modules; hence, accurate detection is a significant task supporting autonomous driving. The advent of deep learning in computer vision applications and the availability of multiple sensing modalities such as 360° imaging, lidar, and radar have led to state-of-the-art 2D and 3D object detection architectures. Most current state-of-the-art 3D object detection frameworks consider single-frame reference. However, these methods do not utilize temporal information associated with the objects or scenes from the sequence data. Thus, the present research hypothesizes that multimodal temporal information can contribute to bridging the gap between 2D and 3D metric space by improving the accuracy of deep learning frameworks for 3D object estimations. The thesis presents understanding multimodal data representations and selecting hyper-parameters using public datasets such as KITTI and nuScenes with Frustum-ConvNet as a baseline architecture. Secondly, an attention mechanism was employed along with convolutional-LSTM to extract spatial-temporal information from sequence data to improve 3D estimations and to aid the architecture in focusing on salient lidar point cloud features. Finally, various fusion strategies are applied to fuse the modalities and temporal information into the architecture to assess its efficacy on performance and computational complexity. 
Overall, this thesis has established the importance and utility of multimodal systems for refined 3D object detection and proposed a complex pipeline incorporating spatial, temporal and attention mechanisms to improve specific and general class accuracy, demonstrated on key autonomous driving data sets

    The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions

    The Metaverse offers a second world beyond reality, where boundaries are non-existent, and possibilities are endless through engagement and immersive experiences using virtual reality (VR) technology. Many disciplines can benefit from the advancement of the Metaverse when accurately developed, including the fields of technology, gaming, education, art, and culture. Nevertheless, developing the Metaverse environment to its full potential is an ambiguous task that needs proper guidance and directions. Existing surveys on the Metaverse focus only on a specific aspect and discipline of the Metaverse and lack a holistic view of the entire process. To this end, a more holistic, multi-disciplinary, in-depth, and academic and industry-oriented review is required to provide a thorough study of the Metaverse development pipeline. To address these issues, we present in this survey a novel multi-layered pipeline ecosystem composed of (1) the Metaverse computing, networking, communications and hardware infrastructure, (2) environment digitization, and (3) user interactions. For every layer, we discuss the components that detail the steps of its development. Also, for each of these components, we examine the impact of a set of enabling technologies and empowering domains (e.g., Artificial Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on its advancement. In addition, we explain the importance of these technologies to support decentralization, interoperability, user experiences, interactions, and monetization. Our presented study highlights the existing challenges for each component, followed by research directions and potential solutions. To the best of our knowledge, this survey is the most comprehensive and allows users, scholars, and entrepreneurs to get an in-depth understanding of the Metaverse ecosystem to find their opportunities and potentials for contribution

    Subsidiary Entrepreneurial Alertness: Antecedents and Outcomes

    This thesis brings together concepts from both international business and entrepreneurship to develop a framework of the facilitators of subsidiary innovation and performance. This study proposes that Subsidiary Entrepreneurial Alertness (SEA) facilitates the recognition of opportunities (the origin of subsidiary initiatives). First introduced by Kirzner (1979) in the context of the individual, entrepreneurial alertness (EA) is the ability to notice an opportunity without actively searching. Similar to entrepreneurial alertness at the individual level, this study argues that SEA enables the subsidiary to best select opportunities based on resources available. The research further develops our conceptualisation of SEA by drawing on work by Tang et al. (2012) identifying three distinct activities of EA: scanning and search (identifying opportunities unseen by others due to their awareness gaps), association and connection of information, and evaluation and judgement to interpret or anticipate future viability of opportunities. This study then hypothesises that SEA leads to opportunity recognition at the subsidiary level and further hypothesises innovation and performance as outcomes of opportunity recognition. This research brings these arguments together to develop and test a comprehensive theoretical model. The theoretical model is tested through a mail survey of the CEOs/MDs of foreign subsidiaries within the Republic of Ireland (an innovative hub for foreign subsidiaries). This method was selected as the best method to reach the targeted respondent, and due to the depth of knowledge the target respondent holds, the survey can answer the desired question more substantially. The results were examined using partial least squares structural equation modelling (PLS-SEM). The study’s findings confirm that two critical aspects of subsidiary context, subsidiary brokerage and subsidiary credibility, are positively related to SEA. 
The study establishes a positive link between SEA and both the generation of innovation and the subsidiary’s performance. This thesis makes three significant contributions to the subsidiary literature as it 1) introduces and develops the concept of SEA, 2) identifies the antecedents of SEA, and 3) demonstrates the impact of SEA on subsidiary opportunity recognition. Implications for subsidiaries, headquarters and policy makers are discussed along with the limitations of the study

    CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition

    We present CrossLoc3D, a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting. Cross-source point cloud data corresponds to point sets captured by depth sensors with different accuracies or from different distances and perspectives. We address the challenges in terms of developing 3D place recognition methods that account for the representation gap between points captured by different sources. Our method handles cross-source data by utilizing multi-grained features and selecting convolution kernel sizes that correspond to the most prominent features. Inspired by diffusion models, our method uses a novel iterative refinement process that gradually shifts the embedding spaces from different sources to a single canonical space for better metric learning. In addition, we present CS-Campus3D, the first 3D aerial-ground cross-source dataset consisting of point cloud data from both aerial and ground LiDAR scans. The point clouds in CS-Campus3D have representation gaps and other features like different views, point densities, and noise patterns. We show that our CrossLoc3D algorithm can achieve an improvement of 4.74% - 15.37% in terms of the top 1 average recall on our CS-Campus3D benchmark and achieves performance comparable to state-of-the-art 3D place recognition methods on the Oxford RobotCar. We will release the code and CS-Campus3D benchmark

    Deep Unrestricted Document Image Rectification

    In recent years, tremendous efforts have been made on document image rectification, but existing advanced algorithms are limited to processing restricted document images, i.e., the input images must incorporate a complete document. Once the captured image merely involves a local text region, its rectification quality is degraded and unsatisfactory. Our previously proposed DocTr, a transformer-assisted network for document image rectification, also suffers from this limitation. In this work, we present DocTr++, a novel unified framework for document image rectification, without any restrictions on the input distorted images. Our major technical improvements can be summarized in three aspects. Firstly, we upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing. Secondly, we reformulate the pixel-wise mapping relationship between the unrestricted distorted document images and the distortion-free counterparts. The obtained data is used to train our DocTr++ for unrestricted document image rectification. Thirdly, we contribute a real-world test set and metrics applicable for evaluating the rectification quality. To the best of our knowledge, this is the first learning-based method for the rectification of unrestricted document images. Extensive experiments are conducted, and the results demonstrate the effectiveness and superiority of our method. We hope our DocTr++ will serve as a strong baseline for generic document image rectification, prompting the further advancement and application of learning-based algorithms. The source code and the proposed dataset are publicly available at https://github.com/fh2019ustc/DocTr-Plus