UGG: Unified Generative Grasping
ECCV 2024

Abstract

Dexterous grasping aims to produce diverse grasping postures with a high grasping success rate. Regression-based methods that directly predict grasping parameters given the object may achieve a high success rate but often lack diversity. Generation-based methods that generate grasping postures conditioned on the object can often produce diverse grasps, but they tend to fall short of a high success rate because they lack discriminative information. To address this, we introduce a unified diffusion-based dexterous grasp generation model, dubbed UGG, which operates within the object point cloud and hand parameter spaces. Our all-transformer architecture unifies the information from the object, the hand, and the contacts, introducing a novel representation of contact points for improved contact modeling. The flexibility and quality of our model enable the integration of a lightweight discriminator, benefiting from simulated discriminative data, which pushes for a high success rate while preserving high diversity. Beyond grasp generation, our model can also generate objects based on hand information, offering valuable insights into object design and into how the generative model perceives objects. Our model achieves state-of-the-art dexterous grasping on the large-scale DexGraspNet dataset while facilitating human-centric object design, marking a significant advancement in dexterous grasping research.

Method Overview

Overview of the proposed method UGG: The object, contact anchors, and hand are encoded and embedded so that a single unified diffusion model can be learned over them. During inference, random seeds are sampled and passed through a denoising process to generate samples. A physics discriminator then selects the grasps that are likely to succeed. Finally, all selected grasps are refined in an optimization stage that uses the generated contact anchors and the input point cloud.

Grasp Generation


Visualization of the generated diverse grasps of UGG on the DexGraspNet objects (mesh used only for visualization). Top: grasps of objects from seen categories; Bottom: grasps for objects of novel categories.

Object Generation


Visualization of object generation by UGG across three subsets, utilizing grey hand poses subjected to different transformations. The hand poses within each column are the same. The top row presents the results of the model trained on the entire dataset, while the middle and bottom rows exhibit the results of the models trained on the 20 objects subset and the 10 bottles subset, respectively.

Joint Generation


Joint generation visualization of UGG, illustrating simultaneous generation of hand and objects. Results are presented for models trained on two subsets.

3D Visualization

Please refer to this page for more 3D visualizations of our model. (Loading may be slow because the models are large; Google Chrome is recommended for best compatibility.)

Citation



The website template was borrowed from Michaël Gharbi.