Most of the useful literature is from CMU, which has been struggling with this problem since the late 1980s.
Vision for offroad driving
The CMU offroad NavLab work gives us a clue. Good references to the CMU work are
There's substantial material in there, and it's worth reading quite a bit of it.
As an overview of how they do it, here are some key extracts, with commentary:
From An Integrated System For Off-Road Navigation, D. Langer, J. Rosenblatt, and M. Hebert, 1994. (PostScript, compressed with gzip, file may be damaged)
That's the first step they take after obtaining a depth image. It's almost like constructing a height field, but not quite. The points are mapped into 3D space, after adjusting for vehicle orientation (note that a gyro/accelerometer INS is needed for this), and tallied in a grid map of 20cm cells. They require at least five points per cell. From this we can calculate the system's range. 0.5 degree subtends 20cm at a range of about 23 meters, which sounds good, but if we require 5 points per cell, we get less than half that range at best. This CMU project drove at slow speeds, 3-10 MPH. For higher-speed operation, we're going to need to do this at several scales, with lower-resolution maps for more distant terrain.
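The range arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name and the square-sampling assumption are mine, not CMU's, and it treats the cell as facing the sensor, which is optimistic for ground seen at a grazing angle.

```python
import math

def max_range_m(cell_m, angular_res_deg, points_per_cell=1):
    """Range at which a cell_m-wide patch still collects points_per_cell samples.

    Assumption (not from the CMU paper): samples fall on a square angular
    grid, so a cell needs sqrt(points_per_cell) samples along each side.
    """
    samples_per_side = math.sqrt(points_per_cell)
    spacing_m = cell_m / samples_per_side              # required spacing between samples
    return spacing_m / math.radians(angular_res_deg)   # small angle: arc = range * angle

one_point = max_range_m(0.20, 0.5)
five_points = max_range_m(0.20, 0.5, points_per_cell=5)
print(f"1 point per cell:  {one_point:.1f} m")
print(f"5 points per cell: {five_points:.1f} m")
```

Under these assumptions the five-point requirement cuts the usable range to a bit under half of the single-point figure, which is consistent with the estimate above.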
The local map in that system is a map of untraversable cells. This is simple and straightforward, but relies on decisions about untraversability made very early. We probably need a more quantitative measure of traversability. Less-traversable regions (bumpy or tilted) may need to be traversed, but at lower speeds or from more favorable approach directions.
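As a rough illustration of what a quantitative measure might look like, here is a minimal sketch that bins points into 20cm cells, ignores cells with fewer than five points, and scores the rest by height spread. The 0.30 m threshold, the linear speed scaling, and all the names are hypothetical choices for illustration, not anything from the CMU system.

```python
import math
from collections import defaultdict

CELL_M = 0.20      # grid cell size, from the text
MIN_POINTS = 5     # minimum points per cell, from the text
MAX_STEP_M = 0.30  # hypothetical: height spread treated as untraversable

def traversability(points):
    """Map (x, y, z) points in meters to {cell: cost}, with cost in [0, 1].

    0.0 is flat, 1.0 is untraversable. Cells with fewer than MIN_POINTS
    samples are left out (unknown) rather than guessed.
    """
    cells = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / CELL_M), math.floor(y / CELL_M))
        cells[key].append(z)
    costs = {}
    for key, heights in cells.items():
        if len(heights) < MIN_POINTS:
            continue
        spread = max(heights) - min(heights)   # crude bumpiness measure
        costs[key] = min(spread / MAX_STEP_M, 1.0)
    return costs

def speed_limit_mph(cost, top_mph=10.0):
    """Hypothetical linear scaling: full speed on flat cells, stop at cost 1."""
    return top_mph * (1.0 - cost)
```

A planner could then slow down over a cell with cost 0.5 instead of refusing to enter it. Approach direction would need a slope vector per cell, which this sketch leaves out.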
Above that lies the level that actually makes driving decisions, which is a subject for a separate note.
CMU's most successful on-road driving system, ALVINN, used a completely different approach - one camera recognizing roads with neural nets. This is the system used in the "No Hands Across America" test, accomplishing over a thousand miles of on-road driving with a human standing by to take over at any moment. The neural nets were in control about 98% of the time.
Neural nets are somewhat out of fashion at the moment. Yet the ALVINN project, in the early 1990s, got surprisingly good results with a quite dumb algorithm. The basic idea is simple enough - 30 neural-net recognizers are provided for a range of situations from "road curving sharply to the left" through "straight road" through "road curving sharply to the right". The set of outputs tends to have a peak near the correct result. The nets are all pre-trained from a set of modified images of actual roads.
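To make the "peak near the correct result" idea concrete, here is a small sketch of turning a row of 30 recognizer outputs into one steering value. It takes a local center of mass around the strongest output, which is one common way to read such a row. The function name, the window size, and the [-1, 1] steering scale are assumptions for illustration, and this note does not confirm that ALVINN decoded its outputs exactly this way.

```python
def steering_from_outputs(outputs, window=2):
    """Return a steering value in [-1, 1] from a row of recognizer outputs.

    outputs[0] means "sharp left" and outputs[-1] means "sharp right".
    The result is the activation-weighted center of the units within
    `window` positions of the strongest one.
    """
    n = len(outputs)
    peak = max(range(n), key=lambda i: outputs[i])
    lo, hi = max(0, peak - window), min(n, peak + window + 1)
    total = sum(outputs[lo:hi])
    if total <= 0:
        center = float(peak)   # degenerate case: fall back to the peak itself
    else:
        center = sum(i * outputs[i] for i in range(lo, hi)) / total
    return 2.0 * center / (n - 1) - 1.0   # rescale index 0..n-1 to -1..1

# Example: a bump centered between units 20 and 21, i.e. a moderate right curve.
outputs = [0.0] * 30
outputs[19], outputs[20], outputs[21], outputs[22] = 0.3, 0.9, 0.9, 0.3
steer = steering_from_outputs(outputs)
```

Averaging around the peak gives a finer steering resolution than the 30 discrete recognizers would on their own, since a peak that falls between two units yields an in-between value.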
This is a first cut at the problem.
Currently, I'm thinking in terms of two camera systems. The main system is three cameras behind the windshield, arranged in a triangle. They may be gyro-stabilized, and must be mounted so that enough vibration is filtered out that there's little noticeable motion during a frame time.
Stabilization will be tough, because the cameras have to be stabilized as a unit. This needs to be looked into. One useful trick might be to have a hood ornament visible in the field of each camera, and use it as an alignment guide.
The auxiliary cameras are a pair, mounted in the front brush guard and aimed forward and down, something like the Point Grey Bumblebee. We might install them inside a transparent plastic cylinder. These cameras don't have to be gyro-stabilized, because they're for use only in slow-speed situations. Their main job is to deal with the blind spot when topping a rise, when you can't see the ground through the windshield. This prevents going over a cliff, or, more likely, into a ditch.
It should be possible to drive the vehicle on either set of cameras alone, for redundancy, although at slower speeds and not in as many situations.