The human hand is one of the most fascinating creations in nature and one of the most coveted targets of artificial intelligence and robotics researchers. A robotic hand that could manipulate objects as we do would be hugely useful in factories, warehouses, offices, and homes.
Despite huge advances in the field, robotics research is still extremely expensive and limited to a few very wealthy companies and research laboratories.
Now, new research promises to make robotics technology accessible to resource-constrained organizations. In a paper published on arXiv, researchers at the University of Toronto, Nvidia and other organizations have unveiled a new system that uses highly efficient deep reinforcement learning techniques and optimized simulated environments to train robotic hands at a fraction of the cost it would normally take.
Training robotic hands is expensive
For all we know, the technology to create human-like robots is not here yet. However, given enough resources and time, you can make significant progress on specific tasks, such as manipulating objects with a robotic hand.
In 2019, OpenAI presented Dactyl, a robotic hand that could manipulate a Rubik’s cube with impressive dexterity (though still significantly inferior to human dexterity). But it took 13,000 years of training to get it to the point where it could handle objects reliably.
How do you fit 13,000 years of training into a short period of time? Fortunately, many software tasks can be parallelized. You can train multiple reinforcement learning agents simultaneously and merge their learned parameters. Parallelization can help reduce the time it takes to train AI that controls the robotic hand.
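The article does not say how the learned parameters are merged. As a minimal sketch, assuming the simplest possible scheme (an element-wise average across workers), the idea could look like this. The function name, the dictionary layout and the sample values are hypothetical and exist only for illustration.

```python
from typing import Dict, List


def average_parameters(workers: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Merge workers by taking the element-wise mean of each named parameter vector.

    Assumption: every worker holds the same parameter names with equal lengths.
    """
    if not workers:
        raise ValueError("need at least one worker")
    merged: Dict[str, List[float]] = {}
    for name in workers[0]:
        # zip(*...) groups the i-th value of this parameter from every worker
        columns = zip(*(worker[name] for worker in workers))
        merged[name] = [sum(column) / len(workers) for column in columns]
    return merged


# Two hypothetical agents that trained in parallel on separate environments
workers = [
    {"policy": [0.0, 2.0]},
    {"policy": [2.0, 4.0]},
]
merged = average_parameters(workers)
```

Real distributed RL systems more commonly aggregate gradients or experience rather than raw weights, so this should be read as an illustration of the principle only.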
However, speed comes at a price. One solution is to create thousands of physical robot hands and train them simultaneously, a path that would be economically unaffordable, even for the richest technology companies. Another solution is to use a simulated environment. With simulated environments, scientists can train hundreds of AI agents at the same time and then fine-tune the model on a real physical robot. The combination of simulation and physical training has become the norm in robotics, autonomous driving and other areas of research that require interaction with the real world.
However, simulations have their own challenges and the cost of computing can still be too great for smaller businesses.
OpenAI, which has financial backing from some of the richest companies and investors, developed Dactyl using expensive robotic hands and an even more expensive computer cluster comprising around 30,000 CPU cores.
Lowering the cost of robotics research
In 2020, a group of researchers at the Max Planck Institute for Intelligent Systems and New York University proposed an open-source robotic research platform that was dynamic and used affordable hardware. Named TriFinger, the system used the PyBullet physics engine for simulated learning and a low-cost robotic hand with three fingers and six degrees of freedom (6DoF). The researchers later launched the Real Robot Challenge (RRC), a Europe-based platform that gave researchers remote access to physical robots to test their reinforcement learning models.
The TriFinger platform reduced the cost of robotic research, but it still had several challenges. PyBullet, a CPU-based environment, is noisy and slow, making it difficult to train reinforcement learning models efficiently. Poor simulated learning creates complications and widens the “sim2real gap”, the performance drop that a trained RL model suffers when it is transferred to a physical robot. As a result, robotics researchers have to go through several cycles of switching between simulated training and physical testing to tune their RL models.
“Previous work with hand manipulation required large clusters of CPUs to run. In addition, the technical effort required to scale reinforcement learning methods has been prohibitive for most research teams,” Arthur Allshire, lead author of the paper and a simulation and robotics trainee at Nvidia, told TechTalks. “This meant that despite advances in scaling deep RL, further algorithmic or system advances have been difficult. And the hardware cost and maintenance time associated with systems like Shadow Hand [used in OpenAI Dactyl] … has limited hardware availability for testing learning algorithms.”
Building on the work of the TriFinger team, this new group of researchers aimed to improve the quality of simulated learning while keeping costs down.
Training RL agents with single-GPU simulation
The researchers replaced PyBullet with Nvidia’s Isaac Gym, a simulated environment that can run efficiently on desktop-grade GPUs. Isaac Gym leverages Nvidia’s PhysX GPU-accelerated engine to allow thousands of parallel simulations on a single GPU. It can deliver about 100,000 samples per second on an RTX 3090 GPU.
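To put that throughput in perspective, here is a rough back-of-the-envelope sketch. It assumes the simulator sustains the quoted 100,000 samples per second for an entire run, which is an upper bound rather than a measured result, and the 50 Hz control rate used to convert samples into simulated time is a hypothetical value, not a figure from the paper.

```python
SAMPLES_PER_SECOND = 100_000  # rate quoted in the article for an RTX 3090
SECONDS_PER_DAY = 24 * 60 * 60


def samples_collected(hours: float, rate: int = SAMPLES_PER_SECOND) -> int:
    """Upper bound on samples gathered if the simulator ran at `rate` the whole time."""
    return int(hours * 3600 * rate)


def simulated_years(samples: int, control_hz: float) -> float:
    """Convert a sample count to simulated experience for an assumed control rate."""
    seconds_of_experience = samples / control_hz
    return seconds_of_experience / (365 * SECONDS_PER_DAY)


one_day = samples_collected(24)  # 8,640,000,000 samples
years_at_50hz = simulated_years(one_day, control_hz=50.0)  # 50 Hz is hypothetical
```

Under these assumptions, a single day on one GPU yields several billion samples, which is the scale that previously called for a large CPU cluster.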
“Our task is suitable for resource-limited research laboratories. Our method took a day to train on a single desktop-level GPU and CPU. Every academic laboratory working in machine learning has access to this level of resources,” said Allshire.
According to the paper, an entire setup to run the system, including training, inference and physical robot hardware, can be purchased for less than $10,000.
The efficiency of the GPU-powered virtual environment enabled the researchers to train their reinforcement learning models in a high-fidelity simulation without reducing the speed of the training process. Higher fidelity makes the training environment more realistic, reducing the sim2real gap and the need to fine-tune the model with physical robots.
The researchers used a sample object manipulation task to test their reinforcement learning system. As input, the RL model receives proprioceptive data from the simulated robot along with eight keypoints that represent the pose of the target object in three-dimensional Euclidean space. The output of the model is the torques applied to the motors of the robot’s nine joints.
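The article does not spell out how the eight keypoints are chosen. One natural choice, assumed here, is the eight corners of the object's bounding box, which together encode both position and orientation. The sketch below derives them from a position and a unit quaternion; all names and the sample dimensions are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed to be unit length


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by unit quaternion q using v' = v + w*t + cross(q_vec, t)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * cross(q_vec, v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def box_keypoints(position: Vec3, orientation: Quat, half_extents: Vec3) -> List[Vec3]:
    """Return the eight bounding-box corners of an object in world coordinates."""
    hx, hy, hz = half_extents
    keypoints = []
    for sx, sy, sz in product((-1.0, 1.0), repeat=3):
        corner = rotate(orientation, (sx * hx, sy * hy, sz * hz))
        keypoints.append(tuple(c + p for c, p in zip(corner, position)))
    return keypoints


# Hypothetical 6 cm cube resting 5 cm above the origin, with no rotation
keypoints = box_keypoints((0.0, 0.0, 0.05), (1.0, 0.0, 0.0, 0.0), (0.03, 0.03, 0.03))
```

Representing the pose as points in Euclidean space means the policy sees position and orientation errors in the same units, rather than having to interpret a quaternion directly.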
The system uses Proximal Policy Optimization (PPO), a model-free RL algorithm. Model-free algorithms eliminate the need to calculate all the details of the environment, which is very expensive, especially when dealing with the physical world. AI researchers often seek cost-effective, model-free solutions to their reinforcement learning problems.
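For readers unfamiliar with PPO, its core is a clipped surrogate objective that keeps each policy update close to the policy that collected the data. The following is a minimal sketch of that objective alone, not the researchers' training code. The clipping value of 0.2 is a commonly used default and an assumption here, and the function and argument names are illustrative.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed default, not taken from the paper
) -> float:
    """Mean clipped surrogate objective (to be maximized) over a batch of samples."""
    total = 0.0
    for new_lp, old_lp, advantage in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum gives a pessimistic bound on the improvement
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage. Once the ratio moves beyond the clipping range in the direction that would raise the objective, the term stops growing, which removes the incentive for overly large updates.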
The researchers designed the reward for the robotic hand RL model as a balance between the distance of the fingers from the object, the distance of the object from its destination, and the intended pose.
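The article does not give the exact reward terms or their weights. As a rough sketch, a balance of this kind can be written as a negative weighted sum of distances, so that the reward rises as the fingers approach the object and the object approaches its target. The weights, the scalar pose error and all names below are hypothetical.

```python
import math
from typing import Sequence


def manipulation_reward(
    fingertips: Sequence[Sequence[float]],
    object_pos: Sequence[float],
    goal_pos: Sequence[float],
    pose_error: float,
    w_reach: float = 1.0,  # hypothetical weight: fingers close to the object
    w_goal: float = 1.0,   # hypothetical weight: object close to its destination
    w_pose: float = 1.0,   # hypothetical weight: object matches the intended pose
) -> float:
    """Negative weighted sum of three error terms; zero is the best possible value."""
    reach = sum(math.dist(tip, object_pos) for tip in fingertips) / len(fingertips)
    goal = math.dist(object_pos, goal_pos)
    return -(w_reach * reach + w_goal * goal + w_pose * pose_error)


# Fingers already on the object, object 10 cm from its goal, pose matched
reward = manipulation_reward(
    fingertips=[(0.0, 0.0, 0.0)] * 3,
    object_pos=(0.0, 0.0, 0.0),
    goal_pos=(0.1, 0.0, 0.0),
    pose_error=0.0,
)
```

Adjusting the weights changes what the policy prioritizes. A large reaching weight, for example, encourages the fingers to make contact before the policy tries to move the object.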
To further improve the robustness of the model, the researchers added random noise to various elements of the environment during training.
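This technique is commonly called domain randomization. The article does not list which elements were randomized, so the parameters and ranges in the sketch below (object mass, friction, observation noise) are hypothetical examples of the general idea.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EpisodeParams:
    object_mass: float            # kg
    friction: float               # friction coefficient
    observation_noise_std: float  # metres


def sample_episode_params(rng: random.Random) -> EpisodeParams:
    """Draw a fresh set of physics parameters at the start of each episode."""
    return EpisodeParams(
        object_mass=0.1 * rng.uniform(0.8, 1.2),  # hypothetical nominal mass +/- 20%
        friction=rng.uniform(0.5, 1.5),           # hypothetical range
        observation_noise_std=0.002,              # hypothetical sensor noise level
    )


def noisy_observation(obs: Sequence[float], std: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise to each element of an observation vector."""
    return [value + rng.gauss(0.0, std) for value in obs]


rng = random.Random(0)  # seeded so that runs are reproducible
params = sample_episode_params(rng)
observation = noisy_observation([0.0, 0.0, 0.05], params.observation_noise_std, rng)
```

Because the policy never sees exactly the same physics twice, it cannot overfit to one simulator configuration, which tends to help when it meets the unmodeled quirks of a real robot.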
Testing on real robots
Once the reinforcement learning system had been trained in the simulated environment, the researchers tested it in the real world through remote access to the TriFinger robots provided by the Real Robot Challenge. They replaced the simulator’s proprioceptive and image input with the sensor and camera information from the remote robot lab.
The trained system transferred its capabilities to the real robot with a drop of seven percent in accuracy, an impressive sim2real gap improvement over previous methods.
The keypoint-based object tracking was particularly useful in ensuring that the robot’s object-handling capabilities generalize across different scales, positions, conditions, and objects.
“One limitation of our method – implementation on a cluster we did not have direct physical access to – was the difficulty of trying other objects. However, we were able to test other objects in simulation, and our policies proved to be relatively robust with zero-shot transfer performance from the dice,” said Allshire.
The researchers say the same technique can work on robotic hands with more degrees of freedom. They did not have the physical robot to measure the sim2real gap, but the Isaac Gym simulator also contains complex robot hands like the Shadow Hand used in Dactyl.
This system can be integrated with other reinforcement learning systems that address other aspects of robotics, such as navigation and pathfinding, to form a more complete solution for training mobile robots. “For example, you can have our method handle the low-level control of a gripper, while higher-level planners or even learning-based algorithms are able to operate at a higher level of abstraction,” Allshire said.
The researchers believe that their work presents “a path to democratization of robotics and a viable solution through large-scale simulation and robotics-as-a-service.”
Ben Dickson is a software engineer and founder of TechTalks. He writes about technology, business and politics.
This story originally appeared on Bdtechtalks.com. Copyright 2021