IEEE Transactions on Robotics (TRO)

DexRepNet++

Learning Dexterous Robotic Manipulation with
Geometric and Spatial Hand-Object Representations

Qingtao Liu¹, Zhengnan Sun¹, Yu Cui¹, Haoming Li¹, Gaofeng Li¹, Lin Shao², Jiming Chen¹, Qi Ye¹

¹Zhejiang University ²National University of Singapore

Introducing a novel representation that captures object surface features and spatial relations, enabling generalizable skills across grasping, in-hand reorientation, and bimanual handover.


Abstract

Robotic dexterous manipulation is a challenging problem due to high degrees of freedom (DoFs) and complex contacts. While existing methods focus on sample efficiency, less attention has been paid to the representations for generalization in complex hand-object interactions.

We propose DexRep, a novel representation capturing object surface features and spatial relations. Our method achieves an 87.9% success rate on 5,000+ unseen objects and boosts performance by 20% to 40% in reorientation and bimanual handover tasks.

Fig 1. The DexRepNet++ framework: Integrating geometric surface features and spatial hand-object relations.

DexRep Representation

A novel geometric and spatial representation for robust hand-object interaction learning.

Massive Generalization

Policies trained on only 40 objects are evaluated on 5,000+ unseen objects, achieving state-of-the-art success rates.

Sim-to-Real Transfer

Deployment on real robotic platforms with zero-shot transfer or minimal fine-tuning.

Experiments & Results

We evaluate DexRep across single-hand and dual-hand tasks, demonstrating superior generalization and robustness.

Fig 2. Policies learned with DexRep perform grasping, in-hand reorientation, and handover tasks.

Task 1

Dexterous Grasping

Policies learned with only 40 objects generalize to 5,000+ unseen shapes.

88.1% Success Rate

Task 2

In-Hand Reorientation

Dynamic rotation to reach target orientations with high precision.

86.0% Success Rate

Task 3

Bimanual Handover

Smooth transfer between two hands, boosting success rates by up to 40%.

77.3% Success Rate

Quantitative Results & Analysis

Fig 3. Quantitative comparison of component contributions across random seeds.

Ablation & Component Analysis

Through extensive ablation experiments on different representation components, we derive the following core insights:

Surface Dominance

Empirical results show that the surface feature ($f_s$) is the most critical individual component: used on its own, it achieves a success rate of approximately 93.5% on unseen objects.

Synergetic Effects

Peak performance (a 96.6% success rate) is achieved only by combining occupancy ($f_o$), surface distance ($f_s$), and local geometry ($f_l$), which matters most for objects with complex topologies.

Local vs. Global

Integrating global PointNet features (pGlo) degrades performance, suggesting that local geometric representations transfer to unseen objects significantly better than global ones.
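To make the three components named above more concrete, here is a minimal NumPy sketch of how occupancy ($f_o$), surface distance ($f_s$), and local geometry ($f_l$) features might be assembled from an object point cloud and a set of hand keypoints. This page does not specify the actual feature definitions, so everything here is an assumption for illustration: the function names, the 4×4×4 grid, the `half_extent` value, and the use of the nearest surface normal as the local descriptor are all hypothetical and are not taken from the released code.

```python
import numpy as np

def occupancy_feature(points, center, half_extent=0.1, grid=4):
    """f_o (assumed form): binary occupancy of a grid^3 voxel cube around a hand anchor."""
    lo = center - half_extent
    cell = 2.0 * half_extent / grid
    idx = np.floor((points - lo) / cell).astype(int)
    inside = np.all((idx >= 0) & (idx < grid), axis=1)   # drop points outside the cube
    occ = np.zeros((grid, grid, grid), dtype=np.float32)
    i, j, k = idx[inside].T
    occ[i, j, k] = 1.0
    return occ.ravel()

def surface_and_local_features(points, normals, keypoints):
    """f_s (assumed form): distance from each hand keypoint to its nearest surface point.
    f_l (assumed form): the surface normal at that nearest point."""
    diff = keypoints[:, None, :] - points[None, :, :]    # (K, N, 3)
    dist = np.linalg.norm(diff, axis=2)                  # (K, N)
    nearest = dist.argmin(axis=1)                        # (K,)
    f_s = dist[np.arange(len(keypoints)), nearest]
    f_l = normals[nearest].ravel()
    return f_s, f_l

def dexrep_like_features(points, normals, keypoints, anchor):
    """Concatenate the three hypothetical components into one observation vector."""
    f_o = occupancy_feature(points, anchor)
    f_s, f_l = surface_and_local_features(points, normals, keypoints)
    return np.concatenate([f_o, f_s, f_l])
```

In this sketch every feature is defined relative to the hand (a voxel cube at a hand anchor, distances from hand keypoints), so nothing depends on a global encoding of the whole object shape. That is one plausible reading of why local features would transfer better than a global PointNet embedding, not a statement of the authors' design.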

Multi-Morphology Support

Hand configuration	2-Finger	3-Finger	4-Finger
DexRep success rate (%)	65.4	78.2	81.5

Hand Agnosticism: DexRep models the spatial-geometric relationship of interactions rather than joint signals, allowing seamless adaptation to robots with different finger counts.

Robustness Analysis

Real-world Deployment

85.0% Success Rate under partial observations

Noise Tolerance: Voxel-based encoding provides inherent tolerance to sensor noise and occlusions typical of commodity cameras.

Zero-Shot Transfer: Despite incomplete point clouds, DexRep maintains a minimal sim-to-real gap (under 5% drop).
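The noise-tolerance claim above can be illustrated with a toy experiment: voxelize a point cloud, perturb it, and measure how much of the occupancy grid changes. The sketch below is not the authors' pipeline. The function names, grid size, cell size, and noise scale are hypothetical values chosen for illustration, and the crude half-space cut only stands in for real camera occlusion.

```python
import numpy as np

def voxelize(points, lo, cell, grid):
    """Boolean occupancy grid: a voxel is True if at least one point falls inside it."""
    idx = np.floor((points - lo) / cell).astype(int)
    inside = np.all((idx >= 0) & (idx < grid), axis=1)
    occ = np.zeros((grid, grid, grid), dtype=bool)
    i, j, k = idx[inside].T
    occ[i, j, k] = True
    return occ

def occupancy_iou(a, b):
    """Intersection over union of two occupancy grids (1.0 means identical)."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lo, cell, grid = np.full(3, -0.1), 0.025, 8           # assumed grid parameters
    cloud = rng.uniform(-0.08, 0.08, size=(2000, 3))      # synthetic stand-in cloud
    noisy = cloud + rng.normal(scale=0.002, size=cloud.shape)  # assumed sensor jitter
    visible = noisy[noisy[:, 0] > 0.0]                    # crude stand-in for occlusion
    full = voxelize(cloud, lo, cell, grid)
    print("IoU clean vs noisy:   ", occupancy_iou(full, voxelize(noisy, lo, cell, grid)))
    print("IoU clean vs occluded:", occupancy_iou(full, voxelize(visible, lo, cell, grid)))
```

Jitter smaller than a voxel usually leaves a point in the same cell, so the occupancy grid changes little. Occlusion removes occupied cells but does not corrupt the ones that remain visible. Both effects are consistent with the tolerance described above, though the sketch says nothing about how large the effect is on real sensor data.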

Hardware Platform

Fig 4. Our experimental setup: Allegro Hand v4, Unitree Z1 arm, and Azure Kinect DK sensors.

BibTeX

@ARTICLE{liu2026dexrepnet++,
  author={Liu, Qingtao and Sun, Zhengnan and Cui, Yu and Li, Haoming and Li, Gaofeng and Shao, Lin and Chen, Jiming and Ye, Qi},
  journal={IEEE Transactions on Robotics}, 
  title={DexRepNet++: Learning Dexterous Robotic Manipulation With Geometric and Spatial Hand-Object Representations}, 
  year={2026},
  volume={42},
  number={},
  pages={799-818},
  keywords={Hands;Geometry;Grasping;Robots;Encoding;Handover;Training;Shape;Feature extraction;Visualization;Deep learning in robotics and automation;dexterous manipulation;hand-object representation;reinforcement learning (RL)},
  doi={10.1109/TRO.2026.3651669}
}