University of California, Berkeley
Consumer demand for augmented reality (AR) has driven the development of mobile phone applications such as the Apple ARKit. Such applications have the potential to expand access to robot grasp planning systems such as Dex-Net. AR apps use structure-from-motion methods to compute a point cloud from a sequence of RGB images taken by the camera as it is moved around an object. However, the resulting point clouds are often noisy due to estimation errors. We present a distributed pipeline, Dex-Net AR, that allows point clouds to be uploaded to a server in our lab, cleaned, and evaluated by the Dex-Net grasp planner to generate a grasp axis that is returned and displayed as an overlay on the object. We implement Dex-Net AR using the iPhone and ARKit and compare results with those generated with high-performance depth sensors. Grasp success rates with AR point clouds on harder adversarial objects are higher than those achieved with traditional depth images.
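For illustration, a minimal sketch of the client side of such a distributed pipeline is shown below. The server URL, request format, and response fields are hypothetical placeholders, not the actual Dex-Net AR interface, which is not specified in this excerpt.

```python
import requests

# Hypothetical endpoint; the real Dex-Net AR server interface is not given here.
SERVER_URL = "https://example-lab-server.edu/dexnet_ar/plan_grasp"

def request_grasp(point_cloud_path):
    """Upload an ARKit point cloud file and receive a planned grasp axis."""
    with open(point_cloud_path, "rb") as f:
        response = requests.post(SERVER_URL, files={"point_cloud": f}, timeout=60)
    response.raise_for_status()
    # Assumed response contents: the two 3D endpoints of the grasp axis and a
    # Dex-Net robustness score, to be overlaid on the object in the AR view.
    return response.json()
```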
Point Cloud Cleaning via K-Nearest Neighbors and Random Sample Consensus (RANSAC)
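A minimal sketch of this cleaning step using the open-source Open3D library is shown below; the file names and parameter values are illustrative assumptions, not the settings used in Dex-Net AR. It applies statistical outlier removal based on each point's k nearest neighbors, then RANSAC plane fitting to separate the object from the support surface.

```python
import open3d as o3d

# Load the point cloud recovered by ARKit structure-from-motion tracking.
pcd = o3d.io.read_point_cloud("arkit_scan.ply")

# 1) K-nearest-neighbor statistical outlier removal: drop points whose mean
#    distance to their k neighbors deviates strongly from the global average.
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# 2) RANSAC plane segmentation: fit the dominant plane (the table/ground)
#    and discard its inliers, keeping only the object points.
plane_model, inliers = pcd.segment_plane(distance_threshold=0.005,
                                         ransac_n=3,
                                         num_iterations=1000)
object_pcd = pcd.select_by_index(inliers, invert=True)

o3d.io.write_point_cloud("arkit_scan_clean.ply", object_pcd)
```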
The PhoXi depth camera can only capture top-down depth maps, which limits the grasp degrees of freedom (DoF) and, in turn, grasp robustness.
With AR point clouds, we can instead leverage the full 3D geometry and render depth maps from arbitrary viewpoints, which could result in better physical grasps (see the sketch below).
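As a sketch of this idea (the function and variable names are our own, not the paper's), a cleaned point cloud can be z-buffered into a depth image for any camera pose and pinhole intrinsics. A practical renderer would splat or mesh the points to fill holes and vectorize the inner loop.

```python
import numpy as np

def render_depth(points_world, T_cam_world, K, height, width):
    """Z-buffer a point cloud into a depth map from an arbitrary camera pose.

    points_world: (N, 3) points in the world frame.
    T_cam_world:  (4, 4) rigid transform mapping world -> camera coordinates.
    K:            (3, 3) pinhole intrinsics matrix.
    """
    # Transform points into the camera frame.
    pts_h = np.hstack([points_world, np.ones((points_world.shape[0], 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0]

    # Project with the pinhole model.
    proj = (K @ pts_cam.T).T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    z = pts_cam[:, 2]

    # Z-buffer: keep the closest depth per pixel.
    depth = np.full((height, width), np.inf)
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    for ui, vi, zi in zip(u[valid], v[valid], z[valid]):
        if zi < depth[vi, ui]:
            depth[vi, ui] = zi
    depth[np.isinf(depth)] = 0.0  # mark empty pixels
    return depth
```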
However, grasps planned from arbitrary viewpoints must account for collisions with the ground plane, a constraint that top-down depth grasps rarely face.
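For example, a simplified check (an assumption on our part, not the paper's collision model) rejects grasps whose contact points fall within a clearance margin of the RANSAC-estimated support plane; a full system would test the entire gripper geometry against the scene.

```python
import numpy as np

def grasp_hits_ground(contact1, contact2, plane, jaw_clearance=0.005):
    """Return True if a grasp would penetrate the support plane ax+by+cz+d=0.

    contact1, contact2: (3,) grasp contact points in world coordinates.
    plane:              (4,) coefficients [a, b, c, d] with the unit normal
                        oriented away from the table, e.g. from RANSAC fitting.
    jaw_clearance:      margin (m) the gripper jaws need below the contacts.
    """
    normal, d = np.asarray(plane[:3]), plane[3]
    for p in (contact1, contact2):
        signed_dist = normal.dot(p) + d  # height of the contact above the plane
        if signed_dist < jaw_clearance:
            return True  # gripper would collide with the table/ground
    return False
```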