Optimizing E-Commerce Product Classification Using Transfer Learning
The global e-commerce market is growing at a rate of 23% per year. In 2017, there were 1.66 billion retail e-commerce users and worldwide sales amounted to 2.3 trillion US dollars; e-retail revenues are projected to grow to 4.88 trillion USD in 2021. With the immense popularity that e-commerce has gained over the past few years comes the responsibility to deliver relevant results and provide a rich user experience. To do this, it is essential that the products on an e-commerce website be organized correctly into their respective categories. Misclassification of products leads to irrelevant results for users, which not only reflects badly on the website but could also lead to lost customers. With e-commerce sites nowadays offering their portals as platforms for third-party merchants to sell their products as well, maintaining consistency in product categorization becomes difficult. Therefore, automating this process could be of great use. Automation based on text could lead to discrepancies, since the website itself, its various merchants, and its users could all use different terminology for a product and its category. Thus, using images becomes a plausible solution to this problem. Images are best handled with deep learning in the form of convolutional neural networks. Training such networks is computationally expensive, so to keep the accuracy of a traditional convolutional neural network while reducing the hours it takes for the model to train, this project uses a technique called transfer learning. Transfer learning refers to reusing the knowledge gained from one task for another, so that the new model does not need to be trained from scratch, which reduces training time.
This project uses product images belonging to five categories from an e-commerce platform to develop an algorithm that can accurately classify products into their respective categories while taking as little time as possible. The goal is first to test the performance of transfer learning against traditional convolutional networks. The next step is to apply transfer learning to the downloaded dataset and assess its performance in terms of accuracy and the time taken to classify test data that the model has never seen before.
Data management for production quality deep learning models: Challenges and solutions
Deep learning (DL) based software systems are difficult to develop and maintain in industrial settings due to several challenges. Data management is one of the most prominent of these challenges, and it complicates DL in industrial deployments. DL models are data-hungry and require high-quality data. Therefore, the volume, variety, velocity, and quality of data cannot be compromised. This study aims to explore the data management challenges encountered by practitioners developing systems with DL components, identify potential solutions from the literature, and validate those solutions through a multiple case study. We identified 20 data management challenges experienced by DL practitioners through a multiple interpretive case study. Further, we identified 48 articles through a systematic literature review that discuss solutions for the data management challenges. Through a second round of the multiple case study, we show that many of these solutions have limitations and are not used in practice due to a combination of four factors: high cost, lack of skill-set and infrastructure, inability to solve the problem completely, and incompatibility with certain DL use cases. Thus, data management for data-intensive DL models in production is complicated. Although DL technology has achieved very promising results, there is still a significant need for further research in the field of data management to build high-quality datasets and streams that can be used for building production-ready DL systems. Furthermore, we have classified the data management challenges into four categories based on the availability of the solutions. © 2022 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Information Extraction on Para-Relational Data.
Para-relational data (such as spreadsheets and diagrams) refers to a type of nearly
relational data that shares the important qualities of relational data but does not
present itself in a relational format. Para-relational data often conveys highly valuable
information and is widely used in many different areas. If we can convert para-relational
data into the relational format, many existing tools can be leveraged for a
variety of interesting applications, such as data analysis with relational query systems
and data integration applications.
This dissertation aims to convert para-relational data into a high-quality relational
form with little user assistance. We have developed four standalone systems, each
addressing a specific type of para-relational data. Senbazuru is a prototype spreadsheet
database management system that extracts relational information from a large
number of spreadsheets. Anthias is an extension of the Senbazuru system to convert
a broader range of spreadsheets into a relational format. Lyretail is an extraction
system to detect long-tail dictionary entities on webpages. Finally, DiagramFlyer is
a web-based search system that obtains a large number of diagrams automatically
extracted from web-crawled PDFs. Together, these four systems demonstrate that
converting para-relational data into the relational format is possible today, and also
suggest directions for future systems.
PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120853/1/chenzhe_1.pd
VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building [Technical Report]
We introduce VOCALExplore, a system designed to support users in building
domain-specific models over video datasets. VOCALExplore supports interactive
labeling sessions and trains models using user-supplied labels. VOCALExplore
maximizes model quality by automatically deciding how to select samples based
on observed skew in the collected labels. It also selects the optimal video
representations to use when training models by casting feature selection as a
rising bandit problem. Finally, VOCALExplore implements optimizations to
achieve low latency without sacrificing model performance. We demonstrate that
VOCALExplore achieves close to the best possible model quality given candidate
acquisition functions and feature extractors, and it does so with low visible
latency (~1 second per iteration) and no expensive preprocessing.
Biomedical Information Extraction Pipelines for Public Health in the Age of Deep Learning
Unstructured texts containing biomedical information from sources such as electronic health records, scientific literature, discussion forums, and social media offer an opportunity to extract information for a wide range of applications in biomedical informatics. Building scalable and efficient pipelines for natural language processing and extraction of biomedical information plays an important role in the implementation and adoption of applications in areas such as public health. Advancements in machine learning and deep learning techniques have enabled rapid development of such pipelines. This dissertation presents entity extraction pipelines for two public health applications: virus phylogeography and pharmacovigilance. For virus phylogeography, geographical locations are extracted from biomedical scientific texts for metadata enrichment in the GenBank database containing 2.9 million virus nucleotide sequences. For pharmacovigilance, tools are developed to extract adverse drug reactions from social media posts to open avenues for post-market drug surveillance from non-traditional sources. Across these pipelines, high variance is observed in extraction performance among the entities of interest while using state-of-the-art neural network architectures. To explain the variation, linguistic measures are proposed to serve as indicators for entity extraction performance and to provide deeper insight into the domain complexity and the challenges associated with entity extraction. For both the phylogeography and pharmacovigilance pipelines presented in this work, the annotated datasets and applications are open source and freely available to the public to foster further research in public health.
Dissertation/Thesis, Doctoral Dissertation, Biomedical Informatics 201
Indexing, learning and content-based retrieval for special purpose image databases
This chapter deals with content-based image retrieval in special purpose image databases. As image data is amassed ever more effortlessly, building efficient systems for searching and browsing image databases becomes increasingly urgent. We provide an overview of the current state of the art by taking a tour along the entir
Review: Artificial Intelligence for Liquid-Vapor Phase-Change Heat Transfer
Artificial intelligence (AI) is shifting the paradigm of two-phase heat
transfer research. Recent innovations in AI and machine learning uniquely offer
the potential for collecting new types of physically meaningful features that
have not been addressed in the past, for making their insights available to
other domains, and for solving for physical quantities based on first
principles for phase-change thermofluidic systems. This review outlines core
ideas of current AI technologies connected to thermal energy science to
illustrate how they can be used to push the limit of our knowledge boundaries
about boiling and condensation phenomena. AI technologies for meta-analysis,
data extraction, and data stream analysis are described with their potential
challenges, opportunities, and alternative approaches. Finally, we offer
outlooks and perspectives regarding physics-centered machine learning,
sustainable cyberinfrastructures, and multidisciplinary efforts that will help
foster the growing trend of AI for phase-change heat and mass transfer.