Tesla’s autonomy event: Impressive progress with an unrealistic timeline

An old software joke explains Elon Musk's implausible autonomy timeline.

There's an old joke in the software engineering world, sometimes attributed to Tom Cargill of Bell Labs: "the first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time."

 

On Monday, Tesla held a major event to show off the company's impressive progress toward full self-driving technology. The company demonstrated a new neural network computer that seems to be competitive with industry leader Nvidia. And Tesla explained how it leverages its vast fleet of customer-owned vehicles to collect data that helps the company train its neural networks.

 

Elon Musk's big message was that Tesla was close to reaching the holy grail of fully self-driving cars. Musk predicts that by the end of the year, Tesla's cars will be able to navigate both surface streets and freeways, allowing them to drive between any two points without human input.

 

At this point, the cars will be "feature complete," in Musk's terminology, but will still need a human driver to monitor the vehicle and intervene in the case of a malfunction. But Musk predicts it will only take about six more months for the software to become reliable enough to no longer require human supervision. By the end of 2020, Musk expects Tesla to have thousands of Tesla vehicles providing driverless rides to people in an Uber-style taxi service.

 

In other words, Musk seems to believe that once Tesla's cars become "feature complete" later this year, they will be 90 percent of the way to full autonomy. The big question is whether that's actually true—or whether it's only true in the Cargill sense.

Two stages of self-driving car development

Waymo engineers represent road situations using complex diagrams like this.

You can think of self-driving car development as occurring in two stages. Stage one is focused on developing a static understanding of the world. Where is the road? Where are other cars? Are there any pedestrians or bicycles nearby? What are the traffic laws in this particular area?

 

Once software has mastered this part of the self-driving task, it should be able to drive flawlessly between any two points on empty roads—and it should mostly be able to avoid running into things even on crowded roads. This is the level of autonomy Musk has dubbed "feature complete." Waymo achieved this level of autonomy around 2015, while Tesla is aiming to reach it later this year.

 

But building a car suitable for use as a driverless taxi requires a second stage of development—one focused on mastering the complex interactions with other drivers, pedestrians, and other road users. Without such mastery, a self-driving car will frequently get frozen with indecision. It will have trouble merging onto crowded freeways, navigating roundabouts, and making unprotected left turns. It might find it impossible to move forward in areas with a lot of pedestrians crossing the road, for fear one might jump in front of the car. It will have no idea what to do near construction sites or in busy parking lots.

 

A car like this might get you to your destination eventually, but it might be such a slow and erratic ride that no one wants to use it. And its clumsy driving style might drive other road users crazy and turn the public against self-driving technology.

 

In this second stage, a company also needs to handle a "long tail" of increasingly unusual situations: a car driving the wrong way on a one-way road; a truck losing traction on an icy road and slipping backward toward your vehicle; a forest fire, flood, or tornado that makes a road impassable. Some events may be rare enough that a company might test its software for years and still never see them.

 

Waymo has spent the last three years in the second stage of self-driving development. By contrast, Elon Musk seems to view it as trivial. He seems to believe that once Tesla's cars can recognize lane markings and other objects on the road, they will be just about ready for fully driverless operation.

Tesla’s new self-driving chip

A self-driving Tesla prototype using Nvidia Drive PX 2 AI technology. (Credit: Nvidia)

Over the last decade there has been a deep-learning revolution as researchers discovered that the performance of neural networks keeps improving with a combination of deeper networks, more data, and a lot more computing power. Early deep learning experiments were conducted using the parallel processing power of consumer-grade GPUs. More recently, companies like Google and Nvidia have begun designing custom chips specifically for deep learning workloads.

 

Since 2016, Autopilot has been powered by Nvidia's Drive PX hardware. But last year we learned that Tesla was dumping Nvidia in favor of a custom-designed chip. Monday's event served as a coming-out party for that chip—officially known as the Full Self-Driving Computer.

 

Musk invited Pete Bannon, a chip designer Tesla hired away from Apple in 2016, to explain his work. Bannon said that the new system is designed to be a drop-in replacement for the previous Nvidia-based system.

 

"These are two independent computers that boot up and run their own operating systems," Bannon said. Each computer will have an independent source of power. If one of the computers crashes, the car will be able to continue driving.

 

Each self-driving chip has 6 billion transistors, Bannon said, and the system is designed to perform a handful of operations used by neural networks in a massively parallel way. Each chip has two compute engines capable of performing 9,216 multiply-add operations—the heart of neural network computations—every clock cycle. Each Full Self-Driving system will have two of these chips, resulting in a total computing capacity of 144 trillion operations per second.
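
Tesla's headline number is easy to sanity-check. Here's a quick back-of-the-envelope calculation; the 2 GHz clock rate is an assumption (clock speed isn't discussed above), and a multiply-add is counted as two operations, per the usual convention:

```python
# Back-of-the-envelope check of Tesla's 144 TOPS figure.
# Assumes a 2 GHz clock (not stated above) and counts a multiply-add
# as two operations (one multiply plus one add), the usual convention.

MACS_PER_CYCLE = 9_216      # multiply-adds per compute engine per clock
ENGINES_PER_CHIP = 2        # compute engines on each FSD chip
CHIPS_PER_SYSTEM = 2        # redundant chips per Full Self-Driving computer
CLOCK_HZ = 2.0e9            # assumed 2 GHz clock
OPS_PER_MAC = 2             # one multiply + one add

tops = (MACS_PER_CYCLE * ENGINES_PER_CHIP * CHIPS_PER_SYSTEM
        * CLOCK_HZ * OPS_PER_MAC) / 1e12
print(f"{tops:.1f} TOPS")   # -> 147.5 TOPS, close to Tesla's quoted 144
```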

 

Tesla says that's a 21-fold improvement over the Nvidia chips the company was using before. Of course, Nvidia has produced newer chips since 2016, but Tesla says that its chips are more powerful than even Nvidia's current Drive Xavier chip—144 TOPS compared to 21 TOPS.

 

But Nvidia argues that's not a fair comparison. The company says its Xavier chip delivers 30 TOPS, not 21. More importantly, Nvidia says it typically packages the Xavier on a chip with a powerful GPU chip, yielding 160 TOPS of computing power. And like Tesla, Nvidia packages these systems in pairs for redundancy, producing an overall system with 320 TOPS of computing power.

 

Of course, what really matters isn't the number of theoretical operations a system can perform, but how well the system performs on actual workloads. Tesla claims that its chips are specifically designed for high performance and low power consumption for self-driving applications, which could yield better performance than Nvidia's more general-purpose chips. Regardless, both companies are working on next-generation designs, so any advantage either company achieves is likely to be fleeting.

Data from the Tesla fleet trains neural networks

Examples of bikes attached to trucks, as seen by cars in Tesla's customer fleet. (Credit: Tesla)

A classic application for neural networks—one that's highly relevant for self-driving cars—is image recognition. Self-driving software needs to know if a nearby object is a car, a bicycle, a pedestrian, a lamp-post, or a bag of trash. This information helps the software determine how the object is likely to move in the future—and how serious a problem it would be to hit it.

 

Neural networks are well-suited to this kind of image classification problem. To train a neural network, a programmer will typically build a large database of images, each labeled with the type of object it contains. The system then uses a technique called back-propagation to "train" the network to classify images correctly.
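
To make that concrete, here is a minimal training-loop sketch using NumPy, with random synthetic data standing in for a real labeled image database. A production perception network would be a deep convolutional model, not this two-layer toy, but the forward pass, back-propagation, and weight updates follow the same pattern:

```python
# Minimal sketch of supervised training with back-propagation.
# Synthetic "images" stand in for a real labeled dataset.
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a labeled image database: 256 flattened 8x8 "images",
# each assigned one of 4 object classes.
X = rng.normal(size=(256, 64))
y = rng.integers(0, 4, size=256)

# One hidden layer; real perception networks are far deeper.
W1 = rng.normal(scale=0.1, size=(64, 32))
W2 = rng.normal(scale=0.1, size=(32, 4))

for step in range(500):
    # Forward pass: hidden activations, then class probabilities.
    h = np.maximum(X @ W1, 0.0)                 # ReLU
    scores = h @ W2
    probs = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)   # softmax

    # Cross-entropy gradient at the output layer.
    dscores = probs.copy()
    dscores[np.arange(len(y)), y] -= 1.0
    dscores /= len(y)

    # Back-propagation: push the error back through each layer.
    dW2 = h.T @ dscores
    dh = (dscores @ W2.T) * (h > 0)
    dW1 = X.T @ dh

    W1 -= 0.5 * dW1
    W2 -= 0.5 * dW2

print(f"training accuracy: {(probs.argmax(axis=1) == y).mean():.0%}")
```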

 

Over the last decade, researchers have found that deep neural networks become more and more accurate as you throw more data and computing power at them. But crucially, more data only adds value if the data represents the full complexity of the real world. A neural network is extremely literal; it can only learn to recognize a particular type of object if it sees many examples of that object in its training data.

 

Tesla AI guru Andrej Karpathy gave a good example of this in his Monday presentation. He showed some pictures of cars or SUVs with bikes strapped to the back of them. It's important for self-driving software to understand that this should be treated as a single car-like object rather than worrying about the bike spontaneously driving off in a different direction.

 

Karpathy said that Tesla has the ability to query the Tesla fleet for examples of unusual situations encountered in the wild. For example, if the company is worried that its software doesn't do a good enough job of recognizing bikes strapped to vehicles, it can ask cars in the fleet to watch out for images of bikes and cars in close proximity. Tesla can then hire people to spot-check these images and verify that they contain bikes attached to vehicles. Images are then added to Tesla's training set, helping to make future versions of the software better at understanding bike-vehicle combinations.
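
Here's a hypothetical sketch of what such a fleet-side trigger might look like. The data structures, labels, and thresholds are invented for illustration; Tesla hasn't published its actual query interface:

```python
# Hypothetical sketch of the fleet-query idea described above; the
# Detection structure and label names are illustrative, not Tesla's API.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # classifier output, e.g. "car" or "bike"
    box: tuple          # (x, y, width, height) in image coordinates

def overlaps(a: Detection, b: Detection) -> bool:
    """True if two bounding boxes intersect."""
    ax, ay, aw, ah = a.box
    bx, by, bw, bh = b.box
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def bike_on_vehicle_trigger(frame_detections: list[Detection]) -> bool:
    """Fires when a bike's box overlaps a vehicle's box -- the kind of
    condition an engineer might push to the fleet to harvest examples
    of bikes strapped to cars."""
    bikes = [d for d in frame_detections if d.label == "bike"]
    cars = [d for d in frame_detections if d.label in ("car", "truck", "suv")]
    return any(overlaps(b, c) for b in bikes for c in cars)

# A car evaluating this trigger on each frame would upload a short clip
# when it fires; humans then spot-check the clips before the images
# join the training set.
```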

Running down the long tail will take time

The strategy Tesla laid out on Monday seems well-suited to getting Tesla through what I've described as the first stage of self-driving development: it'll get Autopilot to the point where it can drive between any two points as long as the roads aren't too crowded and nothing weird happens. At that point, Tesla will enter the second stage: dealing with complex human interactions and the "long tail" of rare but potentially dangerous situations.

 

Tesla's basic argument on Monday was that its ability to draw data from the Tesla fleet gives it a big advantage in tackling the long-tail problem.

 

"The rate of progress at which we can actually address these problems, iterate on the software, and really feed the neural networks with the right data, that rate of progress is really proportional to how often you encounter these situation in the wild," Karpathy said.

 

But while Tesla has potential access to a vast amount of data, Tesla vehicles don't have enough bandwidth to stream every minute of driving to Tesla headquarters. Instead, Tesla's engineers tell cars what types of situations to watch for, and cars then stream back short video clips when they encounter something that matches one of the queries.

 

This means that Tesla will only be able to collect data on a rare situation if engineers have the foresight to request that type of data from the cars. If a situation is so strange that its engineers never think to look for it, then it may never make its way into Tesla's training data.

 

Fortunately, Tesla has some techniques for flagging situations that its software doesn't understand well enough. Driver interventions provide one clue that Autopilot was doing something wrong. And even when Autopilot is inactive, Tesla can still run its self-driving software in "shadow mode"—computing what the car would have done if Autopilot had been active. If the driver does something different—especially something surprising, like swerving or braking hard—that's a sign that something unusual was happening. The car can then send back sensor data so Tesla staffers can check whether it's a corner case that the self-driving software needs to handle better.
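
Here's an illustrative sketch of that shadow-mode comparison; the control signals and thresholds are assumptions made for the sake of the example, not Tesla's actual code:

```python
# Illustrative sketch of the "shadow mode" comparison described above;
# the signals and thresholds are assumptions, not Tesla's actual code.

def shadow_mode_flag(planned_steer, planned_brake,
                     driver_steer, driver_brake,
                     steer_tol=0.15, brake_tol=0.3):
    """Compare what Autopilot *would* have done against what the human
    driver actually did. Disagreement flags the moment for review;
    an abrupt driver action (a hard swerve or hard braking) marks it
    as high priority."""
    disagrees = (abs(planned_steer - driver_steer) > steer_tol
                 or abs(planned_brake - driver_brake) > brake_tol)
    abrupt = abs(driver_steer) > 0.5 or driver_brake > 0.8
    if not disagrees:
        return None
    return "high-priority clip" if abrupt else "clip"

# The planner would have held the lane; the driver swerved and braked hard:
print(shadow_mode_flag(0.0, 0.0, 0.7, 0.9))  # -> high-priority clip
```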

 

The problem is that this is a pretty noisy signal. Drivers are going to deactivate Autopilot hundreds of thousands of times every day—far too often for a human being to spot-check more than a tiny fraction of these cases. The better the software gets, the more this method will be like looking for a needle in a haystack.

 

This isn't to say that Tesla's approach won't work. Hiring people to pore through billions of miles of data harvested from the Tesla fleet is certainly less labor-intensive than Waymo's approach, which requires hiring human drivers to actually drive millions of miles. But as Karpathy himself puts it, "chasing some of the last few nines is going to be very tricky and very difficult." It's going to take a while, no matter how much raw data a company has access to.

Understanding humans is hard

This is Delphi's driving interface depicting the view of its self-driving car. Blue lines are map information, dots are points spotted by LiDAR, Xes are points spotted by radar. Crosswalks turn green when no pedestrians are in them.

Self-driving cars don't just need a static understanding of the world as it exists this second—they also need a dynamic understanding of how the world could change in the next few seconds. In particular, self-driving software needs to understand how human beings are likely to behave and how they are likely to respond to possible actions by the self-driving car.
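
The simplest possible version of that dynamic understanding is to extrapolate every road user at its current velocity. The sketch below does exactly that, and its failure mode illustrates the problem: humans don't move at constant velocity.

```python
# A deliberately naive sketch of short-horizon prediction: extrapolate
# each road user at its current velocity. Real systems layer learned
# behavior models on top; this only shows why prediction matters.

def predict_positions(x, y, vx, vy, horizon=3.0, dt=0.5):
    """Return (time, x, y) samples over the next few seconds, assuming
    the agent holds its current velocity. Here y = 0 is the center of
    our lane, and y measures lateral distance from it in meters."""
    steps = int(horizon / dt)
    return [(i * dt, x + vx * i * dt, y + vy * i * dt)
            for i in range(1, steps + 1)]

# A pedestrian 10 m ahead and 3 m to the side, walking toward our
# lane at 1.5 m/s:
for t, px, py in predict_positions(x=10.0, y=3.0, vx=0.0, vy=-1.5):
    print(f"t={t:.1f}s  position=({px:.1f}, {py:.1f})")

# Constant velocity says the pedestrian reaches our lane in about two
# seconds. But a human who pauses at the curb, or breaks into a run,
# violates the assumption -- which is exactly the hard part.
```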

 

One of the most insightful questions at Monday's event focused on exactly this question. The answers were revealing.

 

"This is very helpful for understanding annotations, where the objects are, and how the car drives," said a man whose name wasn't identified in Monday's webcast. "But what about the negotiation aspects for parking and roundabouts and other things where there are other cars on the road that are human driven where it's more art than science?"

 

"It does pretty good, actually," Musk said. "Like with cut-ins and stuff it's doing really well."

 

Karpathy offered a more nuanced take:

We're using a lot of machine learning, right now, in terms of predicting, creating an explicit representation of what the world looks like, and then there's an explicit planner and a controller on top of that representation, and there's a lot of heuristics for how to traverse and negotiate and so on. There is a long tail—just like in what visual environments look like—there is a long tail in just those negotiations and the little game of chicken you play with other people and so on. And so we have a lot of confidence that eventually there must be a fleet learning component to how you actually do that. Because writing all of those rules by hand is going to quickly plateau I think.

This exchange was revealing in a couple of ways. First, it seems clear that Karpathy has thought about this problem a lot more deeply than Musk has. Cut-ins are one situation where a driver needs to predict another driver's actions, but it's one of the simplest human-to-human interactions on the road. The fact that Tesla's cars do a "pretty good" job here doesn't prove much about the viability of Tesla's approach.

 

By contrast, Karpathy seems to recognize the complexity here. His statement that "eventually there must be a fleet-learning component" suggests that Tesla hasn't made much progress actually developing a fleet-learning solution to this class of problems. To the contrary, he acknowledges that Autopilot's planning module uses a "lot of heuristics" to handle interactions with other drivers.
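
For a sense of what those hand-written rules look like, here is a minimal cut-in heuristic of the sort Karpathy's "lot of heuristics" remark suggests. The signals and thresholds are invented for illustration:

```python
# Minimal sketch of the kind of hand-written heuristic Karpathy
# alludes to; the signals and thresholds here are invented.

def likely_cut_in(lateral_offset, lateral_speed, gap_ahead):
    """Guess whether the car in the adjacent lane is about to merge
    in front of us. lateral_offset is its distance from our lane
    boundary in meters; negative lateral_speed means it is drifting
    toward us; gap_ahead is the open space in front of us in meters."""
    drifting_in = lateral_offset < 1.0 and lateral_speed < -0.3
    has_room = gap_ahead > 5.0
    return drifting_in and has_room

# A car 0.8 m from our lane line, drifting toward us at 0.5 m/s,
# with a 12 m gap in front of us:
if likely_cut_in(0.8, -0.5, 12.0):
    print("ease off the accelerator to open space")
```

Rules like this handle the common case, but every threshold is a judgment call, and each new situation demands more rules, which is why Karpathy expects hand-coding to "quickly plateau."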

 

Musk acknowledged this point during the question-and-answer period.

 

"Essentially, right now AI and neural nets are used really for object recognition," Musk said. "We're still basically using it as still frames and identifying objects as still frames and tying it together in a perception/path planning layer thereafter."

 

There seems to be a lot of tension between these answers and Musk's aggressive timeline for the overall self-driving project. The argument that Tesla can make faster progress than its rivals rests on its use of neural networks trained on massive amounts of data harvested from Tesla's fleet.

 

But by Karpathy and Musk's own admission, Tesla is just starting to experiment with the use of neural networks for the more complex aspects of perception and path planning. It seems hard to believe that Tesla will be able to completely re-write its perception and path-planning software using neural networks and then rigorously test it—all in the next 15 months. Karpathy says that this will happen "eventually," which suggests that he's not that confident it will happen in the next 15 months.

 

And that makes perfect sense because it's the kind of problem that Waymo—which has plenty of its own machine-learning talent—has been struggling with for three years or more. It's not a knock on Tesla to predict that the company will be wrestling with it for several years to come.

Dismissing lidar and HD maps out of hand is foolish

An example of a three-layer image captured by a camera and a lidar system from Ouster. The top image shows ambient light, the middle one shows reflected laser light, and the bottom one provides depth readings. Below that is a three-dimensional render of the traditional lidar "point cloud" format. (Credit: Ouster)

Another major theme of Monday's presentation was Musk's belief that lidar and high-definition maps are worse than useless. Musk repeatedly argued that self-driving systems that relied on these systems were "brittle" and prone to failure. He predicted that Tesla's rivals would eventually be forced to drop them.

 

Musk is probably right that it's possible to navigate the world using only cameras, since human beings do it. But we have no idea if it will take two, 10, or 30 years to develop software to perform the same feat.

 

It's also true that using lidar or HD maps can be counterproductive if software relies on them too heavily. For example, it wouldn't be good for a self-driving car to blindly follow a path laid out on an HD map without checking to make sure the real world hasn't changed since the map was created.

 

But it seems obvious that lidar and HD maps can add value if used intelligently. Tesla fans like to say that you don't need lidar if you have good enough vision algorithms, but this ignores the fact that self-driving software is inherently probabilistic.

For example, software might calculate that it is 97 percent confident that one object is a lamp-post, and 83 percent certain that a hard-to-classify object is actually just a sensor artifact.

 

If the car has an HD map, it can check to see whether there's supposed to be a lamp-post in that location. If so, then the car can move forward with greater confidence. If there's no lamp-post shown on the HD map, then the car will need to consider other possible interpretations—and to slow down if it can't figure out what the object is.
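
One way to formalize that cross-check is to treat the map as a prior and update the detector's confidence with Bayes' rule. This sketch uses made-up numbers, but it shows how the same detection can justify very different levels of confidence depending on what the map says:

```python
# Hedged sketch of the map cross-check described above: a detection
# gains or loses confidence depending on whether the HD map expects
# such an object nearby. The priors are illustrative, not anyone's
# production values.

def fuse_with_map(p_detected, map_has_object, boost=0.98, penalty=0.2):
    """Scale a detector's confidence with a binary map prior, using
    Bayes' rule in odds form."""
    prior = boost if map_has_object else penalty
    odds = (p_detected / (1.0 - p_detected)) * (prior / (1.0 - prior))
    return odds / (1.0 + odds)

# A 97-percent-confident lamp-post detection, with a lamp-post on the map:
print(f"{fuse_with_map(0.97, True):.4f}")   # ~0.9994: proceed with confidence
# The same detection with no lamp-post on the map:
print(f"{fuse_with_map(0.97, False):.4f}")  # ~0.8899: slow down, reassess
```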

 

Similarly, lidar can help to confirm whether there's actually an object in the direction of the apparent sensor artifact—helping the car figure out if it's safe to ignore it.

 

The more different ways a car has to perceive the world, the less likely it is to make a mistake based on misleading data from any one sensor. Lidar and HD maps both provide a driverless car with data that can help self-driving software to confirm or disconfirm data derived from other sensors.

 

But the more fundamental thing to notice is how much Musk's criticism of lidar focuses on the relatively simple task of constructing a static model of objects in the world around a vehicle.

 

In his presentation, Karpathy spent time describing Tesla's strategy for using cameras to detect lane lines and to determine distances to objects in camera frames. Tesla may be making rapid progress toward solving these problems without lidar. But the reality is that Waymo has had these problems more or less solved for several years. Waymo's heavy reliance on lidar and HD maps may not be the most cost-effective or technically elegant solution. But it works well enough, and it's weird for a company that's still struggling to solve the same problem to dismiss Waymo's solution as unworkable.

 

Source: Tesla’s autonomy event: Impressive progress with an unrealistic timeline (Ars Technica)
