70 research outputs found
A comparative study on polyp classification using convolutional neural networks
This work is licensed under a Creative Commons Attribution 4.0 International License.Colorectal cancer is the third most common cancer diagnosed in both men and women in the United States. Most colorectal cancers start as a growth on the inner lining of the colon or rectum, called ‘polyp’. Not all polyps are cancerous, but some can develop into cancer. Early detection and recognition of the type of polyps is critical to prevent cancer and change outcomes. However, visual classification of polyps is challenging due to varying illumination conditions of endoscopy, variant texture, appearance, and overlapping morphology between polyps. More importantly, evaluation of polyp patterns by gastroenterologists is subjective leading to a poor agreement among observers. Deep convolutional neural networks have proven very successful in object classification across various object categories. In this work, we compare the performance of the state-of-the-art general object classification models for polyp classification. We trained a total of six CNN models end-to-end using a dataset of 157 video sequences composed of two types of polyps: hyperplastic and adenomatous. Our results demonstrate that the state-of-the-art CNN models can successfully classify polyps with an accuracy comparable or better than reported among gastroenterologists. The results of this study can guide future research in polyp classification.University of Kansas grant (2228901
Large Language Models for Intent-Driven Session Recommendations
Intent-aware session recommendation (ISR) is pivotal in discerning user
intents within sessions for precise predictions. Traditional approaches,
however, face limitations due to their presumption of a uniform number of
intents across all sessions. This assumption overlooks the dynamic nature of
user sessions, where the number and type of intentions can significantly vary.
In addition, these methods typically operate in latent spaces, thus hinder the
model's transparency.Addressing these challenges, we introduce a novel ISR
approach, utilizing the advanced reasoning capabilities of large language
models (LLMs). First, this approach begins by generating an initial prompt that
guides LLMs to predict the next item in a session, based on the varied intents
manifested in user sessions. Then, to refine this process, we introduce an
innovative prompt optimization mechanism that iteratively self-reflects and
adjusts prompts. Furthermore, our prompt selection module, built upon the LLMs'
broad adaptability, swiftly selects the most optimized prompts across diverse
domains. This new paradigm empowers LLMs to discern diverse user intents at a
semantic level, leading to more accurate and interpretable session
recommendations. Our extensive experiments on three real-world datasets
demonstrate the effectiveness of our method, marking a significant advancement
in ISR systems
On the Real-Time Semantic Segmentation of Aphid Clusters in the Wild
Aphid infestations can cause extensive damage to wheat and sorghum fields and
spread plant viruses, resulting in significant yield losses in agriculture. To
address this issue, farmers often rely on chemical pesticides, which are
inefficiently applied over large areas of fields. As a result, a considerable
amount of pesticide is wasted on areas without pests, while inadequate amounts
are applied to areas with severe infestations. The paper focuses on the urgent
need for an intelligent autonomous system that can locate and spray
infestations within complex crop canopies, reducing pesticide use and
environmental impact. We have collected and labeled a large aphid image dataset
in the field, and propose the use of real-time semantic segmentation models to
segment clusters of aphids. A multiscale dataset is generated to allow for
learning the clusters at different scales. We compare the segmentation speeds
and accuracy of four state-of-the-art real-time semantic segmentation models on
the aphid cluster dataset, benchmarking them against nonreal-time models. The
study results show the effectiveness of a real-time solution, which can reduce
inefficient pesticide use and increase crop yields, paving the way towards an
autonomous pest detection system
Colonoscopy polyp detection and classification: Dataset creation and comparative evaluations
Colorectal cancer (CRC) is one of the most common types of cancer with a high mortality rate. Colonoscopy is the preferred procedure for CRC screening and has proven to be effective in reducing CRC mortality. Thus, a reliable computer-aided polyp detection and classification system can significantly increase the effectiveness of colonoscopy. In this paper, we create an endoscopic dataset collected from various sources and annotate the ground truth of polyp location and classification results with the help of experienced gastroenterologists. The dataset can serve as a benchmark platform to train and evaluate the machine learning models for polyp classification. We have also compared the performance of eight state-of-the-art deep learning-based object detection models. The results demonstrate that deep CNN models are promising in CRC screening. This work can serve as a baseline for future research in polyp detection and classification
Yi: Open Foundation Models by 01.AI
We introduce the Yi model family, a series of language and multimodal models
that demonstrate strong multi-dimensional capabilities. The Yi model family is
based on 6B and 34B pretrained language models, then we extend them to chat
models, 200K long context models, depth-upscaled models, and vision-language
models. Our base models achieve strong performance on a wide range of
benchmarks like MMLU, and our finetuned chat models deliver strong human
preference rate on major evaluation platforms like AlpacaEval and Chatbot
Arena. Building upon our scalable super-computing infrastructure and the
classical transformer architecture, we attribute the performance of Yi models
primarily to its data quality resulting from our data-engineering efforts. For
pretraining, we construct 3.1 trillion tokens of English and Chinese corpora
using a cascaded data deduplication and quality filtering pipeline. For
finetuning, we polish a small scale (less than 10K) instruction dataset over
multiple iterations such that every single instance has been verified directly
by our machine learning engineers. For vision-language, we combine the chat
language model with a vision transformer encoder and train the model to align
visual representations to the semantic space of the language model. We further
extend the context length to 200K through lightweight continual pretraining and
demonstrate strong needle-in-a-haystack retrieval performance. We show that
extending the depth of the pretrained checkpoint through continual pretraining
further improves performance. We believe that given our current results,
continuing to scale up model parameters using thoroughly optimized data will
lead to even stronger frontier models
- …