We need to build a map of the vehicle's immediate surroundings, which we then use to decide where it's safe to drive. Here are some notes on how to do that.
Two papers from CMU provide background for this note: Moravec's classic "certainty grid" paper from 1985, and the CMU NavLab offroad driving paper from the 1990s.
Certainty grids were first developed for building local maps using data from sonar rangefinders, which is of very poor quality. Sonar rangefinders return a range to the nearest object in a 20 degree cone. This is like exploring your environment with the blunt end of a broom. Yet good maps, showing objects smaller than the beam, can be constructed by combining multiple sensor readings obtained from a moving vehicle. (I tried this myself in the late 1980s, using a model R/C car with a steerable Polaroid sonar on top. It works.)
The NavLab paper describes the building of a data structure much like a certainty grid from laser rangefinder data. This data was used to drive a HMMWV off-road. Driving was slow, but it worked.
From here on, I assume the reader has read both of those papers.
Large-volume, low-quality data
Stereo vision generates data at a high rate. We may be able to get range images at 640x480x15FPS, with a range and a data-quality value for each pixel. This is about 9MB/sec. We'll need to get that down. More on this later.
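That 9MB/sec figure can be sanity-checked with back-of-the-envelope arithmetic. The two-bytes-per-pixel assumption below (say, one byte of range and one of quality) is mine, not from the sensor spec:

```python
# Back-of-the-envelope check of the stereo data rate.
# bytes_per_pixel = 2 is an assumption (e.g. 1-byte range + 1-byte quality).
width, height, fps = 640, 480, 15
bytes_per_pixel = 2
rate = width * height * fps * bytes_per_pixel  # bytes/sec
print(rate / 1e6)  # about 9.2 MB/sec
```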
Stereo range data has some important error properties. Most obviously, the error in the range information increases with distance. (Note that time-of-flight lidar and radar systems do not have that property; their error is constant with respect to distance, although they have range limits.) Range data will have areas where no range can be obtained. Smooth surfaces, with no detail the correlator can find, won't return range information. The opposite extreme, where noise prevents the correlator from obtaining a match, is also a possibility. Those two situations can be distinguished, at some computational cost. Smooth areas can potentially be filled in using range information from surrounding areas, which might be helpful when looking at smooth roads or sand dunes.
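The growth of range error with distance follows from the standard stereo depth equation z = f*b/d (focal length f, baseline b, disparity d): a fixed disparity matching error produces a range error that grows as z squared. A small sketch, with made-up values for f, b, and the match error:

```python
# Why stereo range error grows with distance: z = f * b / d, so a fixed
# disparity error dd gives range error dz ~ (z**2 / (f * b)) * dd.
# f, b, and dd below are illustrative guesses, not measured values.
f = 800.0   # focal length in pixels (assumed)
b = 0.30    # camera baseline in meters (assumed)
dd = 0.5    # disparity matching error in pixels (assumed)

def range_error(z):
    """Approximate range error at distance z (meters)."""
    return (z * z / (f * b)) * dd

print(range_error(5.0))   # small error up close
print(range_error(50.0))  # 100x larger at 10x the distance
```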
Range image stabilization
The first step is to rotate the range image into world, rather than vehicle, coordinates. For this, we need to know which way is down, and what direction the vehicle is headed. Crossbow sells gyro/accelerometer units which provide that attitude information. We use the roll information from the attitude unit to rotate the range image so the horizon in the image is horizontal. The pitch information gives us the horizon line, and the heading information gives us the view direction of the image.
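The rotation itself is textbook. A sketch of taking a vehicle-frame range point into world coordinates from roll/pitch/heading; the axis conventions and rotation order here are assumptions, and would have to match whatever the attitude unit actually reports:

```python
import math

# Rotate a vehicle-frame point into world coordinates using roll/pitch/yaw.
# Conventions assumed: Z-up world, rotation order R = Rz(yaw)*Ry(pitch)*Rx(roll).
def attitude_matrix(roll, pitch, yaw):
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ]

def to_world(point, roll, pitch, yaw):
    R = attitude_matrix(roll, pitch, yaw)
    return [sum(R[i][j] * point[j] for j in range(3)) for i in range(3)]
```

With zero attitude the point comes back unchanged; a 90-degree heading change rotates the forward axis into the side axis, which is a cheap check to keep in a unit test.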
This is all straightforward, but we need to calibrate and correct for data lag from the different sensor systems, so that image and gyro information are in sync when the vehicle is bouncing around. We can check this by drawing the horizon line and heading tickmarks on stored images collected by driving around, then examining them to make sure they stay in sync. This is a good project to do in a test vehicle; we don't need the autonomous vehicle for this.
Here's where it gets interesting. The map we have to build is a specialized one, designed to answer the question "Can we drive over this place?" We also want to update our map in ways which are least susceptible to the kinds of systematic errors we expect in our sensor data.
The map is a grid of cells. Cells are reasonably large, about the size of a pothole big enough to bother a wheel.
For each cell, I propose to store
[This really needs some good illustrations. Maybe later.]
The first two of these are the same as what the NavLab offroad navigation system stored. The NavLab system also stored an altitude for each cell. Rather than doing that, I propose to store a "neighbor tilt vector", which represents the tilt of a cell's average plane relative to its neighboring cells. The rationale is that very slight errors in the attitude measurement of the stereo vision head result in large errors in altitude. But those large errors are spread across the entire field of view; the local error between nearby pixels is far smaller. A "neighbor tilt vector" is thus relatively immune to the inevitable attitude errors from jouncing around offroad. We'll have to see how this works out in practice, but I have hopes that it may eliminate the need to gyro-stabilize the stereo head. Just mounting the stereo head with enough damping that we get a clear image for each frame may be good enough.
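The error-cancellation argument can be made concrete: if tilt is computed only from altitude *differences* between nearby cells, a global offset in altitude (which is what a small attitude error mostly produces) drops out. A sketch, with an assumed square grid and cell size:

```python
# "Neighbor tilt" from altitude differences only, so a uniform altitude
# bias across the field of view cancels. Grid layout and cell size are
# assumptions for illustration.
def neighbor_tilt(alt, x, y, cell_size=0.5):
    """Relative tilt (east slope, north slope) of cell (x, y) versus its
    4-neighbors, given a dict of mean cell altitudes."""
    east = alt[(x + 1, y)] - alt[(x - 1, y)]
    north = alt[(x, y + 1)] - alt[(x, y - 1)]
    return (east / (2 * cell_size), north / (2 * cell_size))

# A uniform altitude offset (a stand-in for a global attitude error)
# leaves the tilt essentially unchanged:
alt = {(0, 0): 1.0, (1, 0): 1.1, (-1, 0): 0.9, (0, 1): 1.0, (0, -1): 1.0}
biased = {k: v + 5.0 for k, v in alt.items()}
t1 = neighbor_tilt(alt, 0, 0)
t2 = neighbor_tilt(biased, 0, 0)
print(abs(t1[0] - t2[0]) < 1e-9 and abs(t1[1] - t2[1]) < 1e-9)  # True
```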
This map has some unusual properties. First, "neighbor tilt" is entirely relative to neighbor cells; there's no absolute orientation. Second, and more problematic, neighbor tilt isn't guaranteed consistent - it's quite possible to have a set of relative tilts that don't convert consistently to an absolute orientation. This makes displaying the map difficult. At least for display, we'll need a way to determine an absolute orientation for each cell. We could either work outward from cells near the vehicle (we presumably know the vehicle's orientation) or use a numerical relaxation technique to solve the problem as a set of constraints. Working outward from the vehicle probably makes more sense - range data is better near the vehicle, and correct orientation of cells matters more there.
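The work-outward option is just a breadth-first traversal from the vehicle cell, integrating relative tilts into absolute altitudes as it goes; where the tilts are inconsistent, a cell simply keeps whichever estimate reached it first. A minimal sketch, assuming tilt data stored as (east slope, north slope) per cell:

```python
from collections import deque

# Recover absolute cell altitudes for display by working outward from
# the vehicle cell, integrating the stored relative tilts.
# tilt[(x, y)] = (east_slope, north_slope); cell size is assumed.
def absolute_altitudes(tilt, start, start_alt=0.0, cell=0.5):
    alt = {start: start_alt}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        es, ns = tilt[(x, y)]
        steps = (((x + 1, y), es * cell), ((x - 1, y), -es * cell),
                 ((x, y + 1), ns * cell), ((x, y - 1), -ns * cell))
        for nbr, dz in steps:
            if nbr in tilt and nbr not in alt:
                # First estimate to arrive wins; inconsistent tilts are
                # silently tolerated rather than reconciled.
                alt[nbr] = alt[(x, y)] + dz
                queue.append(nbr)
    return alt
```

A relaxation solver would instead treat each tilt as a soft constraint and iterate to a least-squares altitude field; the BFS version is cheaper and favors the good data near the vehicle, which matches the reasoning above.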
For driving purposes, relative orientation matters more than absolute, so this isn't a crucial problem.
Hex grids vs. rectangular grids
Hexagonal grids have some advantages over rectangular grids here. Hexagonal grids have a common edge between all adjacent cells. There's only one kind of adjacent cell; there's no corner vs edge issue, as with rectangular grids. This simplifies the calculations involving neighboring cells. On the other hand, hexagonal grids are more difficult to address. Unclear which way to go.
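The addressing difficulty is real but manageable with one of the standard schemes. A sketch using axial coordinates (q, r), where every cell has exactly six edge-sharing neighbors and distance has a closed form:

```python
# Hex-grid addressing with axial coordinates (q, r), a standard scheme.
# Every cell has exactly six neighbors, all sharing a full edge -- no
# corner-vs-edge distinction as with rectangular grids.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_neighbors(q, r):
    return [(q + dq, r + dr) for dq, dr in HEX_DIRS]

def hex_distance(a, b):
    """Cell-to-cell distance on the hex grid, in cells."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2

print(len(hex_neighbors(0, 0)))       # 6
print(hex_distance((0, 0), (2, -1)))  # 2
```

Axial coordinates store cleanly in an ordinary 2-D array, so the extra addressing cost over a rectangular grid is mostly in distance and neighbor math like the above.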
The system described so far has one serious problem: overhead objects are treated like solid obstacles. The vehicle would refuse to go under an underpass, a tree limb, or even a finish-line banner. Because this is a desert race, overhead objects will be rare, but we probably have to consider them. We may need two maps: one for objects below the horizon (the "ground map"), and an inverted one for objects above it (the "sky map"). Most of the time we won't have any non-infinite depth values for objects above the horizon, and can just skip updating the sky map. (Stereo vision maps are actually 1/distance; no displacement between images means a large distance. This is good; unlike active rangefinding systems, which have a power-limited maximum range, stereo vision distinguishes "clear for a long way ahead" from "can't see anything".)
The sky map needs slightly different data. All we really need is the average (smallest?) distance between a sky cell and its corresponding ground cell. We don't care about flatness in the sky map.
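The clearance test that falls out of this is simple. A sketch; the field names and the vehicle-height constant are placeholders:

```python
# Sky-map clearance test: is the lowest overhead return in a cell high
# enough above the ground cell for the vehicle to pass under?
# VEHICLE_HEIGHT is an assumed placeholder value.
VEHICLE_HEIGHT = 2.5  # meters, assumed

def can_pass_under(ground_alt, sky_alt):
    """sky_alt is the lowest overhead return for the cell, or None if
    nothing was ever seen above the horizon there."""
    if sky_alt is None:
        return True  # no overhead obstacle ever observed
    return (sky_alt - ground_alt) > VEHICLE_HEIGHT

print(can_pass_under(0.0, None))  # True
print(can_pass_under(0.0, 1.8))   # False -- too low to drive under
```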
The "horizon line" used to divide above from below could be the vehicle-relative horizon, the absolute horizon, or some other line derived from the sensor data. This probably doesn't matter too much. The driving system needs to slow down when faced with an overhead obstacle until it gets close, at which point the map data should be good enough that we can tell if the vehicle can get under it. There may be a false-alarm problem going up steep hills, with the hill ahead looking like an overhead obstacle. But that just slows down hill climbing; it doesn't stop it.
Updating the map
It's probably easiest, from a data volume standpoint, if we give the vision systems enough attitude and location info that they can convert their coordinates to map cell coordinates. They can then generate update events for map cells. This gets the data volume down to a reasonable level; we'll only have a few thousand map cells, at most, being updated at any one time. Updates basically consist of a big array of update items containing the same information as a map cell, plus some indication of confidence for each update item.
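One possible shape for an update item, plus the world-to-cell conversion the vision systems would do. The field names (including `roughness`) and the grid layout are assumptions, not a settled format:

```python
from dataclasses import dataclass

# One per-cell update event as a vision system might emit it: the same
# information as a map cell plus a confidence weight for the merge step.
# Field names and cell size are placeholders.
@dataclass
class CellUpdate:
    cx: int            # map cell x index
    cy: int            # map cell y index
    tilt: tuple        # neighbor tilt vector estimate
    roughness: float   # within-cell height variation estimate (assumed field)
    confidence: float  # weight for the merge step

def world_to_cell(x, y, origin, cell_size=0.5):
    """Convert world coordinates to map cell indices (assumed layout)."""
    return (int((x - origin[0]) // cell_size),
            int((y - origin[1]) // cell_size))

print(world_to_cell(3.2, 1.0, (0.0, 0.0)))  # (6, 2)
```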
The map merge operation will be one of the most critical components determining system performance. There's a bit of Bayesian inference involved here; Moravec's papers go into this in some detail. The basic operation is weighted averaging, but averaging alone is not sufficient: we may need to deal with outliers and bimodal distributions. Bayesian theory may help us here, but we probably need to collect some live data first.
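As a starting point before live data is available, a running weighted average with a crude outlier gate might look like this. The gate and discount thresholds are pure guesses to be tuned:

```python
# Merge one update into a cell as a running weighted average, with a
# crude outlier gate: an update far from a well-supported estimate gets
# its weight cut. gate and discount thresholds are guesses.
def merge(cell_value, cell_weight, update_value, update_weight,
          gate=1.0, discount=0.1):
    if cell_weight > 0 and abs(update_value - cell_value) > gate:
        update_weight *= discount  # likely outlier; down-weight it
    total = cell_weight + update_weight
    value = (cell_value * cell_weight + update_value * update_weight) / total
    return value, total

v, w = merge(0.0, 0.0, 1.0, 1.0)  # first observation wins outright
print(v, w)                        # 1.0 1.0
v, w = merge(v, w, 5.0, 1.0)       # far-off reading is down-weighted
print(round(v, 3))
```

This handles single outliers but not bimodal data (e.g. a cell that is half rock, half sand); that case probably needs the Bayesian treatment from Moravec's papers.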
Scrolling the map
Logically, the map is relative to the vehicle, and the data in it is shifted and rotated as the vehicle moves. In practice, it seems to work better if the map never rotates with respect to the ground. Instead, the map is scrolled as the vehicle moves, or the data is moved from cell to cell.
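Scrolling doesn't have to mean copying data. One common trick, sketched here under assumed sizes, is to treat the fixed array as a 2-D ring buffer: index by world cell modulo the grid size, and when the window moves one cell, clear and reuse only the row or column that wraps around:

```python
# Scrolling map as a 2-D ring buffer: index by world cell modulo the
# grid size, so scrolling recycles one column instead of copying the
# whole grid. Grid size is arbitrary for illustration.
N = 8  # grid is N x N cells

class ScrollingMap:
    def __init__(self):
        self.cells = [[None] * N for _ in range(N)]
        self.ox = 0  # world cell x index of the map's west edge
        self.oy = 0

    def cell(self, wx, wy):
        """Access by world cell index; storage location wraps mod N."""
        return self.cells[wy % N][wx % N]

    def scroll_east(self):
        """Move the window one cell east; the old west column is cleared
        and becomes the new east edge."""
        for row in self.cells:
            row[self.ox % N] = None
        self.ox += 1
```

All unscrolled cells keep their data with no copying at all, which matters when the merge step is already the expensive part.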
Decaying the map
Over time, new data added to the map replaces old data, in a low-pass filter sense. This gives us some resilience should the environment not be static. The system doesn't really handle a changing environment, but, over time, the map information about old obstacles decays. So if we're stopped by a tumbleweed, and it then blows away, we can proceed. (The vehicle can drive around obstacles, of course. But not through them.)
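In the weighted-average picture, the low-pass filter can be implemented simply by shrinking each cell's accumulated weight a little on every tick, so old observations gradually lose influence. The decay constant below is a tuning guess:

```python
import math

# Map decay as a first-order low-pass filter: each tick, the stored
# weight shrinks, so stale observations count for less in future merges.
# DECAY is an assumed tuning constant.
DECAY = 0.99  # per-tick retention factor (assumed)

def decay_cell(value, weight):
    """Old data keeps its value but counts for less in future merges."""
    return value, weight * DECAY

def ticks_to_halve(decay=DECAY):
    """How many decay ticks before a cell's weight drops by half."""
    return math.log(0.5) / math.log(decay)

print(round(ticks_to_halve()))  # roughly 69 ticks to halve a weight
```

So a tumbleweed that blew away would stop dominating its cell after on the order of a hundred ticks of fresh "clear" observations, without any explicit obstacle-removal logic.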
The map contains information about places we have seen, but can't currently see. This is marginally useful. Maybe for backing up.
All the sensor systems (the near and far vision systems and the microwave rangefinders) update the same map. This is where the sensor data comes together. Some of those systems may also have more direct control over the vehicle, like stopping it from hitting something or going over a cliff. But they also update the map. We could drive completely blind at creeping speed if we put enough of those little microwave rangefinders on the vehicle. (Say, three on the front bumper pointed down to watch for cliffs, one on either side in front for side obstacles, the main Vortec radar straight ahead, and two on the back bumper for backing up.) That's probably worth doing.