Benchmark Results
The models below are trained on 50 episodes and evaluated on a separate set of 50 episodes.
In-distribution Evaluation

| Task name | RGB / resnet18 | RGBD / resnet18 | RGBD / ViT | RGBD / MultiViT | PointCloud / pointnet | PointCloud / spUnet |
|---|---|---|---|---|---|---|
| CloseBoxL0 | 0.81 | 0.91 | 0.89 | 0.80 | 0.82 | 0.92 |
| CloseBoxL1 | 0.40 | 0.58 | 0.40 | 0.42 | 0.73 | 0.88 |
| CloseBoxL2 | 0.42 | 0.30 | 0.30 | 0.32 | 0.82 | 0.62 |
| StackCubeL0 | 0.91 | 0.87 | 0.06 | 0.06 | 0.00 | 0.00 |
| StackCubeL1 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| StackCubeL2 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

Out-of-distribution Evaluation (Zero-shot)

| Task name | RGB / resnet18 | RGBD / resnet18 | RGBD / ViT | RGBD / MultiViT | PointCloud / pointnet | PointCloud / spUnet |
|---|---|---|---|---|---|---|
| CloseBoxL0 | 0.52 | 0.72 | 0.68 | 0.80 | 0.60 | 0.94 |
| CloseBoxL1 | 0.20 | 0.50 | 0.36 | 0.34 | 0.77 | 0.88 |
| CloseBoxL2 | 0.32 | 0.38 | 0.40 | 0.32 | 0.38 | 0.42 |
| StackCubeL0 | 0.29 | 0.19 | 0.00 | 0.02 | 0.00 | 0.00 |
| StackCubeL1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| StackCubeL2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
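
To compare backbones across the two evaluations, the scores can be aggregated. The Python sketch below copies the numbers from both tables, averages each column over the six tasks, and computes the per-task change from in-distribution to zero-shot out-of-distribution (OOD minus ID). The column labels assume that RGB covers the first resnet18 column, RGBD covers resnet18/ViT/MultiViT, and PointCloud covers pointnet/spUnet; the helper names `column_means` and `generalization_gap` are hypothetical and not part of any released code.

```python
# Scores copied from the in-distribution and zero-shot OOD tables.
# Column order follows the tables; the modality grouping is an assumption.
BACKBONES = ["RGB resnet18", "RGBD resnet18", "RGBD ViT", "RGBD MultiViT",
             "PointCloud pointnet", "PointCloud spUnet"]

IN_DIST = {
    "CloseBoxL0":  [0.81, 0.91, 0.89, 0.80, 0.82, 0.92],
    "CloseBoxL1":  [0.40, 0.58, 0.40, 0.42, 0.73, 0.88],
    "CloseBoxL2":  [0.42, 0.30, 0.30, 0.32, 0.82, 0.62],
    "StackCubeL0": [0.91, 0.87, 0.06, 0.06, 0.00, 0.00],
    "StackCubeL1": [0.01, 0.00, 0.00, 0.00, 0.00, 0.00],
    "StackCubeL2": [0.01, 0.00, 0.00, 0.00, 0.00, 0.00],
}

OOD = {
    "CloseBoxL0":  [0.52, 0.72, 0.68, 0.80, 0.60, 0.94],
    "CloseBoxL1":  [0.20, 0.50, 0.36, 0.34, 0.77, 0.88],
    "CloseBoxL2":  [0.32, 0.38, 0.40, 0.32, 0.38, 0.42],
    "StackCubeL0": [0.29, 0.19, 0.00, 0.02, 0.00, 0.00],
    "StackCubeL1": [0.00] * 6,
    "StackCubeL2": [0.00] * 6,
}


def column_means(table):
    """Average each backbone's score over all tasks in one table."""
    rows = list(table.values())
    return [sum(col) / len(rows) for col in zip(*rows)]


def generalization_gap(in_dist, ood):
    """Per-task, per-backbone change from in-distribution to OOD (OOD minus ID)."""
    return {
        task: [round(o - i, 2) for i, o in zip(in_dist[task], ood[task])]
        for task in in_dist
    }


if __name__ == "__main__":
    id_means = column_means(IN_DIST)
    ood_means = column_means(OOD)
    # One line per backbone: mean ID score, mean OOD score, and their difference.
    for name, a, b in zip(BACKBONES, id_means, ood_means):
        print(f"{name:20s} ID={a:.3f} OOD={b:.3f} delta={b - a:+.3f}")
```

Averaging over tasks hides the per-task pattern (StackCubeL1 and StackCubeL2 are near zero for every backbone in both tables), so the per-task output of `generalization_gap` is usually more informative than the column means alone.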