Detecting and Restoring Non-Standard Hands in Stable Diffusion Generated Images

The Australian National University
Conference 2023

* Indicates Equal Contribution

Abstract

Stable Diffusion is a popular and effective model for image generation, but the human hands it generates are sometimes non-standard, for example a hand with fewer or more than five fingers. This research addresses the issue with a comprehensive pipeline that not only detects such hands but also restores them so that they closely resemble real-world hand images, which we term standard hands. Building upon the HaGRID dataset, we curated our own dataset tailored to the specific challenges of non-standard hand representations. Our methodology begins with a detection phase that uses a fine-tuned YOLO model to identify and categorize hand types across diverse data: images generated by Stable Diffusion, real photographs, and redrawn samples from the HaGRID dataset. Following detection, our multi-phase restoration process involves body pose estimation, control image generation, and inpainting, transforming non-standard hands into their standard counterparts. Our experiments validate the robustness and efficacy of the approach, which improves the Stable Diffusion model's hand image generation. For quick and easy use, we have packaged the methodology into an interactive web application that lets users quickly upload images and receive immediate restoration feedback.


Flowchart of the whole pipeline. First, we input an image containing a non-standard hand and use YOLOv8 to detect the hand's bounding box, producing a mask we refer to as the bounding box mask. The body skeleton is then estimated using MediaPipe. Based on this skeleton, a hand template is placed precisely where the non-standard hand is; the resulting image is termed the control image. The template's bounding box and the bounding box mask together form the union mask. We use the control image and a prompt describing the template to inpaint the area covered by the union mask. Using IP2P with its own prompt, we further refine the texture to achieve the final output.
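The union-mask step in the caption can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes both regions are axis-aligned rectangles given as (x1, y1, x2, y2) pixel coordinates with exclusive upper bounds, and the names `box_mask` and `union_mask` are hypothetical. Plain Python lists are used to keep the example dependency-free; in practice an array library would be the natural choice.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2), with x2 and y2 exclusive.
Box = Tuple[int, int, int, int]
Mask = List[List[bool]]


def box_mask(height: int, width: int, box: Box) -> Mask:
    """Return a height x width mask that is True inside the box (clipped to the image)."""
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [
        [(y1 <= y < y2) and (x1 <= x < x2) for x in range(width)]
        for y in range(height)
    ]


def union_mask(a: Mask, b: Mask) -> Mask:
    """Pixel-wise OR of two masks of identical shape."""
    return [[pa or pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical boxes on a tiny 6 x 8 image: the detected non-standard hand
# and the bounding box of the placed template.
detected = box_mask(6, 8, (1, 1, 4, 4))
template = box_mask(6, 8, (3, 2, 7, 5))
union = union_mask(detected, template)
covered_pixels = sum(sum(row) for row in union)
```

The union covers every pixel belonging to either the original hand or the template, so the inpainting region is large enough both to erase the old hand and to draw the new one.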

Poster

BibTeX


@article{zhang2023detecting,
    title={Detecting and Restoring Non-Standard Hands in Stable Diffusion Generated Images},
    author={Zhang, Yiqun and Qin, Zhenyue and Liu, Yang and Campbell, Dylan},
    journal={arXiv preprint arXiv:2312.04236},
    year={2023}
}