2,814 research outputs found

    Towards Comparative Web Content Mining using Object Oriented Model

    Web content data are heterogeneous in nature, usually composed of different types of content and data structures. Thus, extraction and mining of web content data is a challenging branch of data mining. Traditional web content extraction and mining techniques are classified into three categories: programming language based wrappers, wrapper (data extraction program) induction techniques, and automatic wrapper generation techniques. The first category constructs a data extraction system by providing specialized pattern specification languages, the second is a supervised learning approach that learns data extraction rules, and the third is an automatic extraction process. All these data extraction techniques rely on web document presentation structures, which require complicated matching and tree alignment algorithms and routine maintenance, are hard to unify across the vast variety of websites, and fail to capture heterogeneous data together. To capture more of the diversity of web documents, a feasible implementation of an automatic data extraction technique based on an object-oriented data model, 00Web, was proposed in Annoni and Ezeife (2009). This thesis implements, materializes and extends the structured automatic data extraction technique. We developed a system (called WebOMiner) for extraction and mining of structured web contents based on an object-oriented data model. The thesis extends the extraction algorithms proposed by Annoni and Ezeife (2009) and develops an automata-based automatic wrapper generation algorithm for extraction and mining of structured web content data. Our algorithm identifies data blocks from a flat array data structure and generates a Non-Deterministic Finite Automata (NFA) pattern for each type of content data to be extracted. The objective of this thesis is to extract and mine heterogeneous web content and to relieve the hard effort of matching, tree alignment and routine maintenance. Experimental results show that our system is highly effective and performs the mining task with 100% precision and 96.22% recall.
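    For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how precision and recall are conventionally computed for an extraction task. The function name `precision_recall` and the record identifiers are hypothetical, and the counts are illustrative only; they are not the thesis's data.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for extracted items against a gold-standard set."""
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    # Precision: share of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold-standard items that were extracted.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 gold records, 3 extracted, all 3 correct.
gold = {"p1", "p2", "p3", "p4"}
found = {"p1", "p2", "p3"}
print(precision_recall(found, gold))  # (1.0, 0.75)
```

    A system can therefore reach 100% precision while missing some records, which is the pattern the abstract reports.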

    C2Ideas: Supporting Creative Interior Color Design Ideation with Large Language Model

    Interior color design is a creative process that endeavors to allocate colors to furniture and other elements within an interior space. While much research focuses on generating realistic interior designs, these automated approaches often misalign with user intention and disregard design rationales. Informed by a need-finding preliminary study, we develop C2Ideas, an innovative system for designers to creatively ideate color schemes enabled by an intent-aligned and domain-oriented large language model. C2Ideas integrates a three-stage process: the Idea Prompting stage distills user intentions into color linguistic prompts; the Word-Color Association stage transforms the prompts into semantically and stylistically coherent color schemes; and the Interior Coloring stage assigns colors to interior elements in compliance with design principles. We also develop an interactive interface that enables flexible user refinement and interpretable reasoning. C2Ideas has undergone a series of indoor cases and user studies, demonstrating its effectiveness and high recognition of its interactive functionality by designers. Comment: 26 pages, 11 figures

    Mixed Reality Interiors: Exploring Augmented Reality Immersive Space Planning Design Archetypes for the Creation of Interior Spatial Volume 3D User Interfaces

    Augmented reality is an increasingly relevant medium of interaction and media reception, as advances in user-worn or hand-held input/output technologies enable perception of the digital nested within and reactive to the native physical. Our interior spaces are becoming the media interface, and this emergence affords designers the opportunity to delve further into crafting an aesthetics for the medium. Beyond having the virtual assets and applications in correct registration with the real-world environment, critical topics are addressed, such as the compositional roles of virtual and physical design features, including their purpose, modulation, interaction potentials and implementation into varying indoor settings. Examining and formulating methodologies for mixed reality interior 3D UI schemes derived from the convergence of digital media and interior design disciplines comprise the scope of this design research endeavor. A holistic approach is investigated to produce a framework for augmented reality 3D user interface interiors through research and development of pattern language systems for the balanced blending of complementary digital and physical design elements. These foundational attributes serve in the creation, organization and exploration of interactive possibilities and implications of these hybrid futuristic spatial interface layouts. M.S., Digital Media -- Drexel University, 201

    Mining Multiple Web Sources Using Non-Deterministic Finite State Automata

    Existing web content extraction systems use unsupervised, supervised, and semi-supervised approaches. The WebOMiner system is an automatic web content data extraction system that models a specific Business to Customer (B2C) web site, such as bestbuy.com, using an object-oriented database schema. WebOMiner extracts different web page content types, such as product, list, and text, using a manually generated non-deterministic finite automaton (NFA). This thesis extends the automatic web content data extraction techniques proposed in the WebOMiner system to handle multiple web sites and generate an integrated data warehouse automatically. We develop WebOMiner-2, which generates NFAs of specific domain classes from regular expressions extracted from frequent patterns in web page DOM trees. Our algorithm can also handle NFA epsilon (ε) transitions and convert the NFA to a deterministic finite automaton (DFA) to identify different content tuples from a list of tuples. Experimental results show that our system is highly effective and performs the content extraction task with 100% precision and 98.35% recall.
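    The conversion of an NFA with epsilon transitions into a DFA mentioned above is commonly done with the textbook subset construction. The sketch below illustrates that general technique only; it is not taken from WebOMiner-2. The state numbers, the symbols `image`, `title` and `price`, and all function names are hypothetical.

```python
from collections import deque

EPSILON = ""  # label chosen in this sketch for epsilon transitions (an assumption)


def epsilon_closure(states, transitions):
    """All NFA states reachable from `states` via epsilon transitions only."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in transitions.get((state, EPSILON), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def nfa_to_dfa(start, accepting, transitions, alphabet):
    """Subset construction; each DFA state is a frozenset of NFA states."""
    dfa_start = epsilon_closure({start}, transitions)
    dfa_transitions = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            moved = set()
            for state in current:
                moved.update(transitions.get((state, symbol), ()))
            if not moved:
                continue  # no transition on this symbol: implicit dead state
            target = epsilon_closure(moved, transitions)
            dfa_transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {s for s in seen if s & accepting}
    return dfa_start, dfa_accepting, dfa_transitions


def dfa_accepts(dfa, symbols):
    """Run the DFA over a sequence of symbols."""
    state, accepting, transitions = dfa
    for symbol in symbols:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting


# Hypothetical "product" tuple pattern: image, optional title, then price.
nfa_transitions = {
    (0, "image"): {1},
    (1, "title"): {2},
    (1, EPSILON): {2},  # the title may be skipped
    (2, "price"): {3},
}
dfa = nfa_to_dfa(0, {3}, nfa_transitions, ["image", "title", "price"])
print(dfa_accepts(dfa, ["image", "title", "price"]))  # True
print(dfa_accepts(dfa, ["image", "price"]))           # True
print(dfa_accepts(dfa, ["title", "price"]))           # False
```

    Representing each DFA state as a frozenset of NFA states keeps the construction short, at the cost of a potentially exponential number of DFA states in the worst case.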

    Usefulness of social tagging in organizing and providing access to the web: An analysis of indexing consistency and quality

    This dissertation research points out major challenging problems with current Knowledge Organization (KO) systems, such as subject gateways or web directories: (1) the current systems use traditional knowledge organization systems based on controlled vocabulary which is not very well suited to web resources, and (2) information is organized by professionals not by users, which means it does not reflect intuitively and instantaneously expressed users’ current needs. In order to explore users’ needs, I examined social tags which are user-generated uncontrolled vocabulary. As investment in professionally-developed subject gateways and web directories diminishes (support for both BUBL and Intute, examined in this study, is being discontinued), understanding characteristics of social tagging becomes even more critical. Several researchers have discussed social tagging behavior and its usefulness for classification or retrieval; however, further research is needed to qualitatively and quantitatively investigate social tagging in order to verify its quality and benefit. This research particularly examined the indexing consistency of social tagging in comparison to professional indexing to examine the quality and efficacy of tagging. The data analysis was divided into three phases: analysis of indexing consistency, analysis of tagging effectiveness, and analysis of tag attributes. Most indexing consistency studies have been conducted with a small number of professional indexers, and they tended to exclude users. Furthermore, the studies mainly have focused on physical library collections. This dissertation research bridged these gaps by (1) extending the scope of resources to various web documents indexed by users and (2) employing the Information Retrieval (IR) Vector Space Model (VSM) - based indexing consistency method since it is suitable for dealing with a large number of indexers. 
As a second phase, an analysis of tagging effectiveness with tagging exhaustivity and tag specificity was conducted to ameliorate the drawbacks of consistency analysis based on only the quantitative measures of vocabulary matching. Finally, to investigate tagging pattern and behaviors, a content analysis on tag attributes was conducted based on the FRBR model. The findings revealed that there was greater consistency over all subjects among taggers compared to that for two groups of professionals. The analysis of tagging exhaustivity and tag specificity in relation to tagging effectiveness was conducted to ameliorate difficulties associated with limitations in the analysis of indexing consistency based on only the quantitative measures of vocabulary matching. Examination of exhaustivity and specificity of social tags provided insights into particular characteristics of tagging behavior and its variation across subjects. To further investigate the quality of tags, a Latent Semantic Analysis (LSA) was conducted to determine to what extent tags are conceptually related to professionals’ keywords and it was found that tags of higher specificity tended to have a higher semantic relatedness to professionals’ keywords. This leads to the conclusion that the term’s power as a differentiator is related to its semantic relatedness to documents. The findings on tag attributes identified the important bibliographic attributes of tags beyond describing subjects or topics of a document. The findings also showed that tags have essential attributes matching those defined in FRBR. Furthermore, in terms of specific subject areas, the findings originally identified that taggers exhibited different tagging behaviors representing distinctive features and tendencies on web documents characterizing digital heterogeneous media resources. 
These results have led to the conclusion that there should be an increased awareness of diverse user needs by subject in order to improve metadata in practical applications. This dissertation research is the first necessary step toward utilizing social tagging in digital information organization by verifying the quality and efficacy of social tagging. It combined quantitative (statistics) and qualitative (content analysis using FRBR) approaches to vocabulary analysis of tags, which provided a more complete examination of the quality of tags. Through the detailed analysis of tag properties undertaken in this dissertation, we have a clearer understanding of the extent to which social tagging can be used to replace (and in some cases to improve upon) professional indexing.
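    The Vector Space Model-based consistency measure referred to above can be illustrated with cosine similarity between two term-frequency vectors. This is a minimal sketch assuming raw term counts as weights; the dissertation's actual weighting scheme is not stated in the abstract, and the function name and sample terms are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity between two bags of index terms (raw counts as weights)."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty term list has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical terms assigned to the same web document.
user_tags = ["python", "programming", "python", "tutorial"]
professional_terms = ["python", "programming", "software"]
print(round(cosine_similarity(user_tags, professional_terms), 4))  # 0.7071
```

    Because all tags for a document are pooled into one vector, the measure scales to any number of indexers, which is the property the abstract cites as the reason for using it.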

    A case study from on-road load handling

    This thesis is a research exploration into the commercial viability of advanced driver assistance services in the load handling industry, which eventually enable automation and autonomous activity. Commercial viability is important for understanding the rate of change as well as the capability for digital transformation in the industry. Ongoing trends support the addition of technology onto hardware; this research seeks to understand whether this is commercially viable. A case study method is used to examine in depth the practicality of on-road load handling and the construction of advanced driver assistance services through the installation of software systems and external hardware. The case study company is a large original equipment manufacturer in the industry. A design science methodology is constructed and used, with the case study providing a research artifact for exploration. The main results indicate that technology costs are greater than the monetary value created for customers, meaning that in the short term advanced driver assistance services are not commercially viable. However, trends such as urbanization, digitalization and the declining skills of drivers in the load handling industry support the long-term vision of automation and autonomous activity; in addition, there is a strong demand pull from customers for increased automation, which also supports the construction of driver assistance services.

    A 3D Digital Approach to the Stylistic and Typo-Technological Study of Small Figurines from Ayia Irini, Cyprus

    The thesis aims to develop a 3D digital approach to the stylistic and typo-technological study of coroplastic, focusing on small figurines. The case study to test the method is a sample of terracotta statuettes from an assemblage of approximately 2000 statues and figurines found at the beginning of the 20th century in a rural open-air sanctuary at Ayia Irini (Cyprus) by the archaeologists of the Swedish Cyprus Expedition. The excavators identified continuity of worship at the sanctuary from the Late Cypriot III (circa 1200 BC) to the end of the Cypro-Archaic II period (ca. 475 BC). They attributed the small figurines to the Cypro-Archaic I-II. Although the excavation was one of the first performed through the newly established stratigraphic method, the archaeologists studied the site and its material following a traditional, merely qualitative approach. The analysis of the published results revealed a classification of the material with no clear-cut criteria, and the overlap between types highlights ambiguities in creating groups and classes. Similarly, stratigraphic arguments and differing opinions among archaeologists highlight the need for revision. Moreover, past legislation allowed the excavators to export half of the excavated antiquities, dispersing the assemblage. Today, the assemblage is still partly exhibited at the Cyprus Museum in Nicosia and in four different museums in Sweden. This dispersion prevents the assemblage from being studied, analysed and interpreted holistically. This research proposes a 3D chaîne opératoire methodology to study the collection’s small terracotta figurines, aiming to understand the context’s function and social role as reflected by the classification obtained with the 3D digital approach. 
The integration proposed in this research of traditional archaeological studies and computer-assisted investigation based on quantitative criteria, identified and defined with 3D measurements and analytical investigations, is adopted as a solution to the biases of a solely qualitative approach. The 3D geometric analysis of the figurines focuses on the objects’ shape and components, mode of manufacture, level of expertise, specialisation or skills of the craftsman and production techniques. The analysis leads to the creation of classes of artefacts which allow archaeologists to formulate hypotheses on the production process, identify a common production (e.g., same hand, same workshop) and establish a relative chronological sequence. 3D reconstruction of the excavation’s area contributes to the virtual re-unification of the assemblage for its holistic study, the relative chronological dating of the figurines and the interpretation of their social and ritual purposes. The results obtained from the selected sample prove the efficacy of the proposed 3D approach, support the expansion of the analysis to the whole assemblage, and may initiate quantitative and systematic studies on Cypriot coroplastic production.

    Semantic annotation services for 3D models of cultural heritage artefacts
