EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval
Large neural models (such as Transformers) achieve state-of-the-art
performance for information retrieval (IR). In this paper, we aim to improve
distillation methods that pave the way for the resource-efficient deployment of
such models in practice. Inspired by our theoretical analysis of the
teacher-student generalization gap for IR models, we propose a novel
distillation approach that leverages the relative geometry among queries and
documents learned by the large teacher model. Unlike existing teacher
score-based distillation methods, our proposed approach employs embedding
matching tasks to provide a stronger signal to align the representations of the
teacher and student models. In addition, it utilizes query generation to
explore the data manifold to reduce the discrepancies between the student and
the teacher where training data is sparse. Furthermore, our analysis also
motivates novel asymmetric architectures for student models which realize
better embedding alignment without increasing online inference cost. On
standard benchmarks like MSMARCO, we show that our approach successfully
distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to
1/10th size asymmetric students that can retain 95-97% of the teacher
performance
GeoYCSB: A Benchmark Framework for the Performance and Scalability Evaluation of Geospatial NoSQL Databases
The proliferation of geospatial applications has tremendously increased the variety, velocity, and volume of spatial data that data stores have to manage. Traditional relational databases reveal limitations in handling such big geospatial data, mainly due to their rigid schema requirements and limited scalability. Numerous NoSQL databases have emerged and actively serve as alternative data stores for big spatial data. This study presents a framework, called GeoYCSB, developed for benchmarking NoSQL databases with geospatial workloads. To develop GeoYCSB, we extend YCSB, a de facto benchmark framework for NoSQL systems, by integrating into its design architecture the new components necessary to support geospatial workloads. GeoYCSB supports both microbenchmarks and macrobenchmarks and facilitates the use of real datasets in both. It is extensible to evaluate any NoSQL database, provided it supports spatial queries, using geospatial workloads performed on datasets of any geometric complexity. We use GeoYCSB to benchmark two leading document stores, MongoDB and Couchbase, and present the experimental results and analysis. Finally, we demonstrate the extensibility of GeoYCSB by including a new dataset consisting of complex geometries and using it to benchmark a system with a wide variety of geospatial queries: Apache Accumulo, a wide-column store, with the GeoMesa framework applied on top.
Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames
Automatically discovering composable abstractions from raw perceptual data is
a long-standing challenge in machine learning. Recent slot-based neural
networks that learn about objects in a self-supervised manner have made
exciting progress in this direction. However, they typically fall short at
adequately capturing spatial symmetries present in the visual world, which
leads to sample inefficiency, such as when entangling object appearance and
pose. In this paper, we present a simple yet highly effective method for
incorporating spatial symmetries via slot-centric reference frames. We
incorporate equivariance to per-object pose transformations into the attention
and generation mechanism of Slot Attention by translating, scaling, and
rotating position encodings. These changes result in little computational
overhead, are easy to implement, and can result in large gains in terms of data
efficiency and overall improvements to object discovery. We evaluate our method
on a wide range of synthetic object discovery benchmarks, namely CLEVR,
Tetrominoes, CLEVRTex, Objects Room and MultiShapeNet, and show promising
improvements on the challenging real-world Waymo Open dataset.
Comment: Accepted at ICML 2023. Project page: https://invariantsa.github.io
Dense Video Object Captioning from Disjoint Supervision
We propose a new task and model for dense video object captioning --
detecting, tracking, and captioning trajectories of all objects in a video.
This task unifies spatial and temporal understanding of the video, and requires
fine-grained language description. Our model for dense video object captioning
is trained end-to-end and consists of different modules for spatial
localization, tracking, and captioning. As such, we can train our model with a
mixture of disjoint tasks, and leverage diverse, large-scale datasets which
supervise different parts of our model. This results in noteworthy zero-shot
performance. Moreover, by finetuning a model from this initialization, we can
further improve our performance, surpassing strong image-based baselines by a
significant margin. Although we are not aware of other work performing this
task, we are able to repurpose existing video grounding datasets for our task,
namely VidSTG and VLN. We show our task is more general than grounding, and
models trained on our task can directly be applied to grounding by finding the
bounding box with the maximum likelihood of generating the query sentence. Our
model outperforms dedicated, state-of-the-art models for spatial grounding on
both VidSTG and VLN
Complicated objects: artifacts from the Yuanming Yuan in Victorian Britain
The 1860 spoliation of the Summer Palace at the close of the Second Opium War by British and French troops was a watershed event within the development of Britain as an imperialist nation, which guaranteed a market for opium produced in its colony India and demonstrated the power of its armed forces. The distribution of the spoils to officers and diplomatic corps by campaign leaders in Beijing was also a sign of the British Army’s rising power as an instrument of the imperialist state. These conditions would suggest that objects looted from the site would be integrated into an imperialist aesthetic that reflected and promoted the material benefits of military engagement overseas and foregrounded the circumstances of their removal to Britain for campaign members and the British public.
This study mines sources dating to the two decades following the war – including British newspapers, auction house records, exhibition catalogs and works of art – to test this hypothesis. Findings show that initial movements of looted objects through the military and diplomatic corps did reinforce notions of imperialist power by enabling campaign members to profit from the spoliation through sales of looted objects and trophy displays. However, material from the Summer Palace arrived at a moment when British manufacturers and cultural leaders were engaged in a national effort to improve the quality of British goods to compete in the international marketplace and looted art was quickly interpolated in this national conversation. Ironically, the same “free trade” imperatives that motivated the invasion energized a new design movement that embraced Chinese ornament.
As a consequence, political interpretations of the material outside of military collections were quickly joined by a strong response to Chinese ornament from cultural institutions and design leaders. Art from the Summer Palace held a prominent place at industrial art exhibitions of the postwar period and inspired new designs in a number of mediums. While the availability of Chinese imperial art was the consequence of a military invasion and therefore a product of imperialist expansion, evidence presented here shows that the design response to looted objects was not circumscribed by this political reality. Chinese ornament on imperial wares was ultimately celebrated for its formal qualities and acknowledged links to the Summer Palace were an indicator of good design, not a celebration of victory over a failed Chinese state. Therefore, the looting of the Summer Palace was ultimately an essential factor in the development of modern design, the essence of which is a break with Classical ornament
Multimodal spatio-temporal deep learning framework for 3D object detection in instrumented vehicles
This thesis presents the utilization of multiple modalities, such as image and lidar, to incorporate spatio-temporal information from sequence data into deep learning architectures for 3D object detection in instrumented vehicles. The race to autonomy in instrumented vehicles or self-driving cars has stimulated significant research in developing autonomous driver assistance systems (ADAS) technologies related explicitly to perception systems. Object detection plays a crucial role in perception systems by providing spatial information to its subsequent modules; hence, accurate detection is a significant task supporting autonomous driving. The advent of deep learning in computer vision applications and the availability of multiple sensing modalities such as 360° imaging, lidar, and radar have led to state-of-the-art 2D and 3D object detection architectures. Most current state-of-the-art 3D object detection frameworks consider single-frame reference. However, these methods do not utilize temporal information associated with the objects or scenes from the sequence data. Thus, the present research hypothesizes that multimodal temporal information can contribute to bridging the gap between 2D and 3D metric space by improving the accuracy of deep learning frameworks for 3D object estimations. The thesis presents understanding multimodal data representations and selecting hyper-parameters using public datasets such as KITTI and nuScenes with Frustum-ConvNet as a baseline architecture. Secondly, an attention mechanism was employed along with convolutional-LSTM to extract spatial-temporal information from sequence data to improve 3D estimations and to aid the architecture in focusing on salient lidar point cloud features. Finally, various fusion strategies are applied to fuse the modalities and temporal information into the architecture to assess its efficacy on performance and computational complexity.
Overall, this thesis has established the importance and utility of multimodal systems for refined 3D object detection and proposed a complex pipeline incorporating spatial, temporal and attention mechanisms to improve specific and general class accuracy, demonstrated on key autonomous driving datasets.
The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent, and possibilities are endless through engagement and immersive
experiences using the virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is an
ambiguous task that needs proper guidance and directions. Existing surveys on
the Metaverse focus only on a specific aspect and discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive and allows users,
scholars, and entrepreneurs to get an in-depth understanding of the Metaverse
ecosystem to find their opportunities and potentials for contribution
Subsidiary Entrepreneurial Alertness: Antecedents and Outcomes
This thesis brings together concepts from both international business and entrepreneurship to develop a framework of the facilitators of subsidiary innovation and performance. This study proposes that Subsidiary Entrepreneurial Alertness (SEA) facilitates the recognition of opportunities (the origin of subsidiary initiatives). First introduced by Kirzner (1979) in the context of the individual, entrepreneurial alertness (EA) is the ability to notice an opportunity without actively searching. Similarly to entrepreneurial alertness at the individual level, this study argues that SEA enables the subsidiary to best select opportunities based on resources available. The research further develops our conceptualisation of SEA by drawing on work by Tang et al. (2012) identifying three distinct activities of EA: scanning and search (identifying opportunities unseen by others due to their awareness gaps), association and connection of information, and evaluation and judgement to interpret or anticipate future viability of opportunities. This study then hypothesises that SEA leads to opportunity recognition at the subsidiary level and further hypothesises innovation and performance as outcomes of opportunity recognition. This research brings these arguments together to develop and test a comprehensive theoretical model.
The theoretical model is tested through a mail survey of the CEOs/MDs of foreign subsidiaries within the Republic of Ireland (an innovative hub for foreign subsidiaries). This method was selected as the best method to reach the targeted respondent, and due to the depth of knowledge the target respondent holds, the survey can answer the desired question more substantially. The results were examined using partial least squares structural equation modelling (PLS-SEM). The study’s findings confirm that two critical aspects of subsidiary context, subsidiary brokerage and subsidiary credibility, are positively related to SEA. The study establishes a positive link between SEA and both the generation of innovation and the subsidiary’s performance. This thesis makes three significant contributions to the subsidiary literature as it 1) introduces and develops the concept of SEA, 2) identifies the antecedents of SEA, and 3) demonstrates the impact of SEA on subsidiary opportunity recognition. Implications for subsidiaries, headquarters and policy makers are discussed along with the limitations of the study.
CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition
We present CrossLoc3D, a novel 3D place recognition method that solves a
large-scale point matching problem in a cross-source setting. Cross-source
point cloud data corresponds to point sets captured by depth sensors with
different accuracies or from different distances and perspectives. We address
the challenges in terms of developing 3D place recognition methods that account
for the representation gap between points captured by different sources. Our
method handles cross-source data by utilizing multi-grained features and
selecting convolution kernel sizes that correspond to the most prominent features.
Inspired by diffusion models, our method uses a novel iterative refinement
process that gradually shifts the embedding spaces from different sources to a
single canonical space for better metric learning. In addition, we present
CS-Campus3D, the first 3D aerial-ground cross-source dataset consisting of
point cloud data from both aerial and ground LiDAR scans. The point clouds in
CS-Campus3D have representation gaps and other features like different views,
point densities, and noise patterns. We show that our CrossLoc3D algorithm
achieves an improvement of 4.74% - 15.37% in terms of the top 1 average recall
on our CS-Campus3D benchmark and achieves performance comparable to
state-of-the-art 3D place recognition methods on the Oxford RobotCar. We will
release the code and CS-Campus3D benchmark.
Deep Unrestricted Document Image Rectification
In recent years, tremendous efforts have been made on document image
rectification, but existing advanced algorithms are limited to processing
restricted document images, i.e., the input images must incorporate a complete
document. Once the captured image merely involves a local text region, its
rectification quality is degraded and unsatisfactory. Our previously proposed
DocTr, a transformer-assisted network for document image rectification, also
suffers from this limitation. In this work, we present DocTr++, a novel unified
framework for document image rectification, without any restrictions on the
input distorted images. Our major technical improvements can be summarized in
three aspects. Firstly, we upgrade the original architecture by adopting a
hierarchical encoder-decoder structure for multi-scale representation
extraction and parsing. Secondly, we reformulate the pixel-wise mapping
relationship between the unrestricted distorted document images and the
distortion-free counterparts. The obtained data is used to train our DocTr++
for unrestricted document image rectification. Thirdly, we contribute a
real-world test set and metrics applicable for evaluating the rectification
quality. To the best of our knowledge, this is the first learning-based method for the
rectification of unrestricted document images. Extensive experiments are
conducted, and the results demonstrate the effectiveness and superiority of our
method. We hope our DocTr++ will serve as a strong baseline for generic
document image rectification, prompting the further advancement and application
of learning-based algorithms. The source code and the proposed dataset are
publicly available at https://github.com/fh2019ustc/DocTr-Plus