816 research outputs found

Deep Learning-based Object Detection Models applied to Document Images

Author: Ziran Zahra
Publication date: 01/01/2020

Relational models for visual understanding of graphical documents. Application to architectural drawings

Author: Centre de Visió per Computador
de las Heras Lluís-Pere
Universitat Autònoma de Barcelona. Departament de Ciències de la Computació
Publication venue: [Barcelona] : Universitat Autònoma de Barcelona,
Publication date: 01/01/2015

Graphical documents express complex concepts using a visual language. This language consists of a vocabulary (symbols) and a syntax (structural relations among symbols) that articulate a semantic meaning in a certain context. Therefore, the automatic interpretation of this sort of document by computers entails three main steps: the detection of the symbols, the extraction of the structural relations among these symbols, and the modeling of the knowledge that permits the extraction of the semantics. Domains of graphical documents include architectural and engineering drawings, maps, flowcharts, etc. Graphics Recognition in particular, and Document Image Analysis in general, were born from the industrial need to interpret a massive amount of digitized documents after the emergence of the scanner. Although many years have passed, the graphical document understanding problem still seems far from being solved. The main reason is that the vast majority of the systems in the literature focus on very specific problems, where the domain of the document dictates the implementation of the interpretation. As a result, it is difficult to reuse these strategies on different data and in different contexts, which hinders the natural progress of the field. In this thesis, we face the graphical document understanding problem by proposing several relational models at different levels that are designed from a generic perspective. Firstly, we introduce three different strategies for the detection of symbols. The first method tackles the problem structurally, wherein general knowledge of the domain guides the detection. The second is a statistical method that learns the graphical appearance of the symbols and easily adapts to the large variability of the problem. The third method is a combination of the previous two that inherits their respective strengths, i.e. it copes with the large variability and does not need annotated data. Secondly, we present two relational strategies that tackle the problem of visual context extraction. The first is a fully bottom-up method that heuristically searches a graph representation for the contextual relations among symbols. In contrast, the second is a syntactic method that models the structure of the documents probabilistically. It automatically learns the model, which guides the inference algorithm to find the best structural representation for a given input. Finally, we construct a knowledge-based model consisting of an ontological definition of the domain and real data. This model makes it possible to perform contextual reasoning and to detect semantic inconsistencies within the data. We evaluate the suitability of the proposed contributions in the framework of floor plan interpretation. Since there is no standard for the modeling of these documents, there is enormous variability in notation, and the sort of information included in the documents also varies from plan to plan. Therefore, floor plan understanding is a relevant task within the graphical document understanding problem. It is also worth mentioning that we make freely available all the resources used in this thesis (the data, the tool used to generate the data, and the evaluation scripts), aiming to foster research on the graphical document understanding task.

Diposit Digital de Documents de la UAB

Semantic Interior Mapology: A Toolbox For Indoor Scene Description From Architectural Floor Plans

Author: Manduchi Roberto
Trinh Viet
Publication date: 01/07/2019

We introduce the Semantic Interior Mapology (SIM) toolbox for the conversion of a floor plan and its room contents (such as furniture) to a vectorized form. The toolbox is composed of the Map Conversion toolkit and the Map Population toolkit. The Map Conversion toolkit allows one to quickly trace the layout of a floor plan, and to generate a GeoJSON file that can be rendered in 3D using web applications such as Mapbox. The Map Population toolkit takes the 3D scan of a room in the building (acquired from an RGB-D camera), and, through a semi-automatic process, populates individual objects of interest with a correct dimension and position in the GeoJSON representation of the building. SIM is easy to use and produces accurate results even in the case of complex building layouts. Comment: 9 pages, 12 figures
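
As a rough illustration of what a vectorized floor plan in GeoJSON might look like, the sketch below turns traced room corners into a FeatureCollection. It is not the SIM toolbox's code: the function names, the `height` property, and the use of local metre coordinates (rather than the longitude/latitude pairs that GeoJSON normally carries) are assumptions made for the example.

```python
import json

def room_to_feature(name, corners, height_m=3.0):
    """Build a GeoJSON Feature for one room from its traced corner points.

    `height_m` is a hypothetical property a 3D renderer could use for extrusion.
    """
    ring = [list(p) for p in corners]
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # GeoJSON linear rings must be closed
    return {
        "type": "Feature",
        "properties": {"name": name, "height": height_m},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }

def floor_plan_to_geojson(rooms):
    """Wrap all rooms (name -> corner list) in a FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [room_to_feature(n, c) for n, c in rooms.items()],
    }

# Hypothetical 4 m x 3 m room, corners listed counterclockwise in local metres.
rooms = {"office": [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]}
geojson_text = json.dumps(floor_plan_to_geojson(rooms), indent=2)
```

A real pipeline would still need to georeference the local coordinates before a web map could place the building.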

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Prioritization of responsive maintenance tasks via machine learning-based inference

Author: Broom C
Konstantinou E
Parlikad AK
Wong A
Publication venue: International Conference on Smart Infrastructure and Construction 2019, ICSIC 2019: Driving Data-Informed Decision-Making
Publication date: 01/01/2019

Maintenance task prioritization is essential for allocating resources. It is estimated that almost 1/3 of the maintenance cost is wasted on unnecessary activities. Task prioritization is based on risk assessment that takes into account the probability of failure and the criticality of an asset. The criticality analysis is defined by the asset owner based on several parameters, among them safety, downtime cost, and productivity, whilst the probability of failure is determined based on deterioration models, regular manual inspections, or installed sensors. Currently, the latter is an extremely complicated and labour-intensive procedure when multiple and different types of assets need to be managed. This paper proposes an innovative method that exploits the advances in mobile communications, social networking, Internet of Things and machine learning to address this shortcoming. This approach brings building elements and assets online using asset tags with an online ‘asset profile’ linked to them. Users of assets are able to scan these tags using a mobile phone app to not only see the information about those assets, but also enter ‘comments’ describing issues and problems on the profiles. These comments are processed through machine learning-based inference methods to estimate the probability that a failure has occurred. This paper validates the proposed method using historical data collected from the Estate Management of the University of Cambridge. Innovate U
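
The risk-based ranking described in this abstract can be sketched as follows. The abstract only says that risk combines probability of failure and criticality; taking their product is a common convention and an assumption here, as are the class name, field names, and example values.

```python
from dataclasses import dataclass

@dataclass
class Task:
    asset: str
    failure_probability: float  # in [0, 1], e.g. inferred from user comments
    criticality: float          # owner-defined score (assumed scale: 1 to 10)

    @property
    def risk(self) -> float:
        # Assumption: risk is the product of probability and criticality.
        return self.failure_probability * self.criticality

def prioritize(tasks):
    """Return tasks ordered from highest to lowest risk."""
    return sorted(tasks, key=lambda t: t.risk, reverse=True)

# Hypothetical maintenance backlog.
backlog = [
    Task("boiler", failure_probability=0.2, criticality=9),
    Task("lift", failure_probability=0.7, criticality=8),
    Task("corridor lamp", failure_probability=0.9, criticality=1),
]
ranked = [t.asset for t in prioritize(backlog)]
```

With these made-up numbers, the lift is ranked ahead of the lamp even though the lamp is more likely to have failed, because its criticality is much higher.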

Apollo (Cambridge)

Development and Adaptation of Robotic Vision in the Real-World: the Challenge of Door Detection

Author: Antonazzi Michele
Basilico Nicola
Borghese N. Alberto
Luperto Matteo
Publication date: 31/01/2024

Mobile service robots are increasingly prevalent in human-centric, real-world domains, operating autonomously in unconstrained indoor environments. In such a context, robotic vision plays a central role in enabling service robots to perceive high-level environmental features from visual observations. Although data-driven approaches based on deep learning push the boundaries of vision systems, applying these techniques to real-world robotic scenarios presents unique methodological challenges. Traditional models fail to represent the challenging perception constraints typical of service robots and must be adapted for the specific environment where robots finally operate. We propose a method leveraging photorealistic simulations that balances data quality and acquisition costs for synthesizing visual datasets from the robot perspective used to train deep architectures. Then, we show the benefits of qualifying a general detector for the target domain in which the robot is deployed, showing also the trade-off between the effort for obtaining new examples from such a setting and the performance gain. In our extensive experimental campaign, we focus on the door detection task (namely recognizing the presence and the traversability of doorways) that, in dynamic settings, is useful to infer the topology of the map. Our findings are validated in a real-world robot deployment, comparing prominent deep-learning models and demonstrating the effectiveness of our approach in practical settings.

arXiv.org e-Print Archive

A-Scan2BIM: Assistive Scan to Building Information Modeling

Author: Cheng Chin-Yi
Fu Yan
Furukawa Yasutaka
Luo Jieliang
Song Weilian
Zhao Dale
Publication date: 29/11/2023

This paper proposes an assistive system for architects that converts a large-scale point cloud into a standardized digital representation of a building for Building Information Modeling (BIM) applications. The process is known as Scan-to-BIM, which requires many hours of manual work even for a single building floor by a professional architect. Given its challenging nature, the paper focuses on helping architects with the Scan-to-BIM process, instead of replacing them. Concretely, we propose an assistive Scan-to-BIM system that takes the raw sensor data and edit history (including the current BIM model), then auto-regressively predicts a sequence of model editing operations as APIs of a professional BIM software (i.e., Autodesk Revit). The paper also presents the first building-scale Scan2BIM dataset that contains a sequence of model editing operations as the APIs of Autodesk Revit. The dataset contains 89 hours of Scan2BIM modeling processes by professional architects over 16 scenes, spanning over 35,000 m². We report our system's reconstruction quality with standard metrics, and we introduce a novel metric that measures how natural the order of reconstructed operations is. A simple modification to the reconstruction module helps improve performance, and our method is far superior to two other baselines in the order metric. We will release data, code, and models at a-scan2bim.github.io. Comment: BMVC 2023, order evaluation updated after fixing evaluation bug

arXiv.org e-Print Archive

Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries

Author: Engelmann Francis
Kontogianni Theodora
Schindler Konrad
Yue Yuanwen
Publication date: 27/03/2023

We address 2D floorplan reconstruction from 3D scans. Existing approaches typically employ heuristically designed multi-stage pipelines. Instead, we formulate floorplan reconstruction as a single-stage structured prediction task: find a variable-size set of polygons, which in turn are variable-length sequences of ordered vertices. To solve it, we develop a novel Transformer architecture that generates polygons of multiple rooms in parallel, in a holistic manner without hand-crafted intermediate stages. The model features two-level queries for polygons and corners, and includes polygon matching to make the network end-to-end trainable. Our method achieves a new state of the art on two challenging datasets, Structured3D and SceneCAD, along with significantly faster inference than previous methods. Moreover, it can readily be extended to predict additional information, i.e., semantic room types and architectural elements like doors and windows. Our code and models are available at: https://github.com/ywyue/RoomFormer. Comment: CVPR 2023 camera-ready. Project page: https://ywyue.github.io/RoomForme
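
The output format named in this abstract, a variable-size set of polygons that are each a variable-length sequence of ordered vertices, can be pictured with the small sketch below. It illustrates the data structure only, not the Transformer or its polygon matching; the type aliases, the example rooms, and the area helper are assumptions added for illustration.

```python
from typing import List, Tuple

Vertex = Tuple[float, float]
Polygon = List[Vertex]      # ordered room corners, variable length
Floorplan = List[Polygon]   # variable number of rooms

def polygon_area(poly: Polygon) -> float:
    """Shoelace formula; the sign of the sum depends on vertex order, so take abs."""
    total = 0.0
    # Pair each vertex with its successor, wrapping around to the first vertex.
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

# Hypothetical floorplan: rooms may have different numbers of corners.
floorplan: Floorplan = [
    [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],  # 4-corner room
    [(4.0, 0.0), (6.0, 0.0), (4.0, 2.0)],              # 3-corner room
]
areas = [polygon_area(room) for room in floorplan]
```

Because both the number of rooms and the number of corners per room vary, a fixed-size output tensor does not fit the problem directly, which is the difficulty the structured-prediction formulation addresses.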

arXiv.org e-Print Archive

Guidance on the Stand Down, Mothball, and Reactivation of Ground Test Facilities

Author: Dunn Steven C.
Volkman Gregrey T.
Publication date: 07/01/2013

The development of aerospace and aeronautics products typically requires three distinct types of testing resources across research, development, test, and evaluation: experimental ground testing, computational "testing" and development, and flight testing. Over the last twenty-plus years, computational methods have replaced some physical experiments, and this trend is continuing. The result is decreased utilization of ground test capabilities, which, along with market forces, industry consolidation, and other factors, has resulted in the stand down and oftentimes closure of many ground test facilities. Ground test capabilities are (and very likely will continue to be for many years) required to verify computational results and to provide information for regimes where computational methods remain immature. Ground test capabilities are very costly to build and to maintain, so once constructed and operational it may be desirable to retain access to those capabilities even if not currently needed. One means of doing this while reducing ongoing sustainment costs is to stand down the facility into a "mothball" status, keeping it alive so that it can be brought back when needed. Both NASA and the US Department of Defense have policies to accomplish the mothball of a facility, but with little detail. This paper offers a generic process to follow that can be tailored based on the needs of the owner and the applicable facility.

CiteSeerX

NASA Technical Reports Server

Artificial intelligence and smart vision for building and construction 4.0: Machine and deep learning methods and applications

Author: Arashpour Mehrdad
Baduge Shanaka Kristombu
Mendis Priyan
Perera Jude Shalitha
Sharafi Pejman
Shringi Ankit
Teodosio Bertrand
Thilakarathna Sadeep
Publication venue: Elsevier
Publication date: 24/06/2022

This article presents a state-of-the-art review of the applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) in the building and construction industry 4.0 in the facets of architectural design and visualization; material design and optimization; structural design and analysis; offsite manufacturing and automation; construction management, progress monitoring, and safety; smart operation, building management and health monitoring; and durability, life cycle analysis, and circular economy. This paper presents a unique perspective on applications of AI/DL/ML in these domains for the complete building lifecycle, from the conceptual, design, construction, and operational and maintenance stages until the end of life. Furthermore, data collection strategies using smart vision and sensors, data cleaning methods (post-processing), and data storage for developing these models are discussed, and the challenges in model development and strategies to overcome these challenges are elaborated. Future trends in these domains and possible research avenues are also presented.

Victoria University Eprints Repository

A FIRE PROTECTION AND LIFE SAFETY ANALYSIS OF AN OUPATIENT HEALTHCARE SYSTEM BUILDING

Author: Chandler Conrad
Publication venue: DigitalCommons@CalPoly
Publication date: 01/06/2023

The subject building of the report is a two-story, 184,670 SF outpatient healthcare system building. This building offers primary, specialty, and mental health outpatient care to patients throughout its state of location. The facility is a mixed occupancy building with business as the primary occupancy, a surgery suite that classifies as an ambulatory health care occupancy, and assembly uses in the conference rooms, kitchen, and canteen. The Authority Having Jurisdiction (AHJ) has adopted the National Fire Codes (NFC) published by the National Fire Protection Association (NFPA), and throughout this report, the life safety features of the building are assessed against the requirements of Life Safety Code, NFPA 101. The facility is fully sprinklered, with sprinklers appearing to be provided in all areas. There is also an analog addressable fire alarm system that is electrically supervised by a central station monitoring service. Per NFPA 220, Standard on Types of Building Construction, Table 4.1.1 Fire Resistance Ratings for Type I through Type V Construction (hour), this facility appears to be constructed in accordance with the requirements of a Construction Type II (000) rating. The NFPA Type II (000) rating corresponds to an IBC Construction Type IIB. In the following analysis, the facility was evaluated from both prescriptive and performance-based design perspectives. The total occupant load for the facility was calculated as 3390 persons in accordance with NFPA 101 Chapter 7. Most spaces were determined to have adequate exit capacity. However, the canteen only has one valid exit and does not meet the required two exits per NFPA 101 for assembly occupancies. All other floors and spaces were determined to have an adequate number of exits, separation of exits, and measured travel distances as required by NFPA 101. Code discrepancies were also discovered for fire detection and notification. 
A discrepancy was discovered between the mounting heights of manual pull stations required by NFPA 72 and those of the fire alarm and detection shop drawings. Pull station placement should be verified. Notification devices in the mechanical penthouses appear to be undersized from a visual notification perspective. Further, audible notification devices in these mechanical penthouses may also be undersized. Field verification of the existing ambient sound levels should be performed. The facility is fully sprinklered, with sprinklers appearing to be provided in all areas. Most of the facility is protected by an automatic wet sprinkler suppression system. There is a small dry sprinkler system located at the loading dock, where the system is subject to freezing conditions. The system appears to have been designed per the AHJ’s Fire Protection Design Manual and NFPA 13-2003. The flow and pressure at the base of the riser (BOR) required to meet the sprinkler system demand are 273.3 gpm and 67.6 psi. The hose stream allowance was previously determined to be 250 gpm. Therefore, the total system demand is 523 gpm at 67.6 psi. This demand exceeds the available water supply, which has a static pressure of 62 psi, a residual pressure of 20 psi, and 1940 gallons of flow. A computer-based analysis should be performed to refine the understanding of the complex hydraulics at the facility. The first floor occupant load was calculated to be 839 persons. Using the hydraulic approximation, the egress time for the first floor was evaluated. If all of the 839 occupants on the first floor start evacuation at the same time, the persons on the first floor will require approximately 1.23 minutes to pass through the exit. The total minimum evacuation time for the 839 persons located on floor 1 is estimated at 5.1 minutes. The second floor occupant load was calculated to be 1210 persons. 
The second floor exit capacity was calculated to be 1231 persons, which just exceeds the second floor occupant load of 1210 persons. The total minimum evacuation time for the 1231 persons located on floor 2 is estimated at 8.2 minutes. The assumptions used in the hydraulic approximation model all tend to optimize egress times and therefore will tend to underestimate actual egress times. The occupant characteristics of the user groups within the facility’s building population were reviewed, and the key characteristics of the groups were evaluated. Since the purpose of this outpatient clinic is to provide medical care to patients, a conservative approach is necessary to protect occupants that may have preexisting medical conditions. Employees regularly participate in fire drills and can typically be expected to efficiently respond to the fire alarm system and start evacuating. However, careful consideration of pre-movement times is especially important with employees, as they can be prone to social influence and procedural requirements. Patients are the most likely to have an issue perceiving an alarm, interpreting the alarm, and deciding on a course of action. Three different design fires were evaluated for this facility. Design Fire #1 investigates the impact of a large fuel load of palletized computer equipment on egress in a first floor corridor. Egress is expected to be highly compromised. This design fire is similar to NFPA 101 5.5.3.2 Design Fire Scenario 2, which has the characteristics of an ultrafast developing fire in the primary means of egress. Design Fire #2 investigates the impact of a Christmas tree in the building’s main atrium. This fire offers the opportunity to evaluate the impact of a real-life fuel load on one of the primary egress paths. This design fire is similar to NFPA 101 5.5.3.1 Design Fire Scenario 1 and is an occupancy-specific fire representative of a typical fire for the occupancy. 
Design Fire #3 evaluates the impact of a large fuel load of furniture in a storage room that is adjacent to the facility's six combinable conference rooms. The worst-case scenario for this space is the potential for migration into the adjacent hallway, affecting egress for the nearby conference rooms. This design fire is similar to NFPA 101 5.5.3.3 Design Fire Scenario 3, which includes a fire that starts in a normally unoccupied room, potentially endangering a large number of occupants in a large room or other area. Performance criteria for tenability were investigated, and reference values were proposed. The selected tenability criteria include 13 m for visibility, an FED of 1 for carbon monoxide, 60 °C for exposure temperature, and 1.7 kW·m⁻² for radiant heat exposure. Fire Dynamics Simulator (FDS) was used to model Design Fire #1, which presented an abnormally large fuel load of computer equipment in a hallway outside of the Supply Chain Management office. This fire provides an ultrafast developing fire in the primary means of egress, and addresses a concern regarding a reduction in the number of available means of egress. Visibility is the first tenability criterion to be reached, at 92 seconds, followed by exposure temperature at 105 seconds. The reality of this ultrafast fire is that egress for the Supply Chain Management Office will be severely compromised, and may not provide ample time for the occupants of the Supply Chain Management Office to escape. Further modeling could be performed with additional information on the building's construction materials, ventilation systems, and fire suppression systems. The response of the fire suppression system and its effectiveness on the fuel load should be evaluated and could potentially help egress from the Supply Chain Management Office. 
Since the Required Safe Egress Time (RSET) is calculated as 12.72 minutes, and the Available Safe Egress Time (ASET) is 92 seconds, egress for the Supply Chain Management Office can be expected to be compromised. Based on the results of modeling Design Fire #1, it is recommended to relocate the commodity to a warehouse. The surveyed fuel load is inappropriate for an exit corridor in a Business Occupancy.
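
Two of the numerical checks in this abstract can be reproduced with simple arithmetic, sketched below using only the figures quoted above. The variable names are illustrative, and the supply check is a deliberate simplification: it only compares the demand pressure with the static pressure and does not reconstruct the full water supply curve.

```python
# Sprinkler demand at the base of the riser plus hose stream allowance (from the abstract).
sprinkler_demand_gpm = 273.3
hose_allowance_gpm = 250.0
demand_pressure_psi = 67.6
static_pressure_psi = 62.0

total_demand_gpm = sprinkler_demand_gpm + hose_allowance_gpm

# Simplified check: a supply cannot deliver any flow at a pressure above its
# static pressure, so a demand pressure above static already fails.
supply_adequate = demand_pressure_psi <= static_pressure_psi

# Egress check for Design Fire #1: ASET must exceed RSET for safe egress.
rset_s = 12.72 * 60.0   # Required Safe Egress Time, converted to seconds
aset_s = 92.0           # Available Safe Egress Time (visibility criterion)
egress_margin_s = aset_s - rset_s   # negative means egress is compromised
egress_safe = egress_margin_s >= 0
```

The sum comes to about 523 gpm, matching the stated total demand. The egress margin is negative by more than eleven minutes, consistent with the conclusion that egress would be compromised.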

DigitalCommons@CalPoly