PTAM (Parallel Tracking and Mapping) is a robust visual SLAM approach developed by Dr. Georg Klein at the University of Oxford. It tracks the 3D pose of the camera at frame rate, which makes it an ideal platform for implementing marker-less augmented reality. In this post I'm going to share my own insights about PTAM, drawn from my hands-on experience with it.
PTAM runs tracking and mapping in two separate threads. Inside the PTAM implementation we find the two corresponding files, Tracker.cc and MapMaker.cc (not MapViewer.cc). Before tracking can start, PTAM demands an initial map of the environment, and this is built through the tracker. In System.cc there is a function called Run(); the tracking thread runs inside it. To build the initial map, the user supplies a stereo image pair, ideally of a planar surface, by translating the camera slightly between two frames. PTAM calculates the initial pose from a homography between those two frames, while the 3D coordinates of the initial map points are generated by triangulation. From then on, the tracker grabs each frame in a tight loop and calculates the camera pose; this pose calculation is implemented inside the TrackFrame() function (Tracker.cc) and is described step by step after the following homography sketch.
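To make the initialization concrete, here is a minimal sketch of standard homography estimation by the direct linear transform (DLT). This is my own illustration, not PTAM's code: PTAM builds its homography from tracked FAST-corner matches and uses the TooN library rather than Eigen, and the function name below is hypothetical.

```cpp
#include <Eigen/Dense>
#include <vector>

// DLT: estimate H such that x2 ~ H * x1 from at least four point
// correspondences between the two initialization frames.
Eigen::Matrix3d EstimateHomography(const std::vector<Eigen::Vector2d>& x1,
                                   const std::vector<Eigen::Vector2d>& x2) {
    const int n = static_cast<int>(x1.size());
    Eigen::MatrixXd A(2 * n, 9);            // each match adds two rows
    for (int i = 0; i < n; ++i) {
        const double u = x1[i].x(), v = x1[i].y();
        const double up = x2[i].x(), vp = x2[i].y();
        A.row(2 * i)     << -u, -v, -1,  0,  0,  0, up * u, up * v, up;
        A.row(2 * i + 1) <<  0,  0,  0, -u, -v, -1, vp * u, vp * v, vp;
    }
    // The homography is the null vector of A: the right singular
    // vector with the smallest singular value.
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeFullV);
    Eigen::Matrix<double, 9, 1> h = svd.matrixV().col(8);
    return Eigen::Map<Eigen::Matrix<double, 3, 3, Eigen::RowMajor>>(h.data());
}
```

The recovered H is then decomposed into a rotation and translation (PTAM's HomographyInit.cc follows the Faugeras-Lustman style decomposition) to give the relative pose of the two keyframes, after which the matched corners are triangulated. The per-frame tracking steps are then: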
- A prior pose is estimated from the motion model. The motion model is an estimation technique that predicts the pose of the current frame from the pose of the previous frame.
- Iterate through all the map points (features) and re-project the points that are likely to be visible in the current image frame.
- Find a coarse set of matches (around 50) in the current frame. This is done through a patch search. (PTAM's author implemented the patch analysis from scratch – brilliant.)
- Update the camera pose.
- Take another, larger set of map points (this time around 1000) and re-project them onto the image frame.
- Find the matches again.
- Update the pose iteratively until the re-projection error is minimized. This is done using an M-estimator; a weighting sketch follows this list.
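The robust pose update is essentially iteratively reweighted least squares: each point's reprojection residual gets a weight from the Tukey biweight function (PTAM's MEstimator.h also offers Huber and Cauchy). Below is a minimal sketch of the standard Tukey weight; the cut-off c and the helper name are mine, not PTAM's, and PTAM derives its cut-off from the distribution of residuals in the frame.

```cpp
#include <cmath>

// Tukey biweight: weight applied to a reprojection residual r during
// the iteratively reweighted least-squares pose update. Residuals
// beyond the cut-off c are treated as outliers and get zero weight.
double TukeyWeight(double r, double c) {
    if (std::abs(r) >= c) return 0.0;   // outlier: no influence on pose
    const double u = r / c;
    const double t = 1.0 - u * u;
    return t * t;                       // (1 - (r/c)^2)^2, in [0, 1]
}
```

Because the weight falls to exactly zero beyond the cut-off, gross mismatches from the patch search cannot drag the pose, which is a large part of why the tracking stays stable.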
The tracker calculates the final pose of the new keyframe (inside the TrackMap() method), and finally this keyframe is pushed onto a queue; a sketch of this hand-off follows.
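That queue is the whole interface between the two threads. PTAM implements it with its own classes (and libCVD's threading), so the snippet below is only a simplified modern-C++ sketch of the same producer/consumer idea; KeyFrame here is a hypothetical stand-in for PTAM's keyframe structure.

```cpp
// Simplified sketch of PTAM's two-thread split (not the original code):
// the tracker produces keyframes, the map-maker consumes them.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

struct KeyFrame { /* image pyramid, measured features, camera pose... */ };

std::queue<KeyFrame> keyframeQueue;   // Tracker -> MapMaker hand-off
std::mutex queueMutex;
std::condition_variable queueCv;

void TrackingThread() {
    for (;;) {                        // tight per-frame loop (Tracker.cc)
        KeyFrame kf;                  // grab a frame, track pose against the map...
        bool isGoodKeyframe = true;   // e.g. tracking quality + distance heuristics
        if (isGoodKeyframe) {
            std::lock_guard<std::mutex> lock(queueMutex);
            keyframeQueue.push(kf);
            queueCv.notify_one();     // wake the map-maker
        }
    }
}

void MappingThread() {
    for (;;) {                        // runs asynchronously (MapMaker.cc)
        std::unique_lock<std::mutex> lock(queueMutex);
        queueCv.wait(lock, [] { return !keyframeQueue.empty(); });
        KeyFrame kf = keyframeQueue.front();
        keyframeQueue.pop();
        lock.unlock();
        // triangulate new map points, run bundle adjustment, etc.
        (void)kf;
    }
}

int main() {
    std::thread mapper(MappingThread);  // MapMaker gets its own thread
    TrackingThread();                   // tracking runs at frame rate
    mapper.join();
}
```

Because the map-maker only ever blocks on this queue, the tracker never waits on mapping and keeps running at frame rate.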
The MapMaker, which runs asynchronously, checks whether there are any keyframes on the queue. If there are, it fetches each keyframe and finds the 3D coordinates of the new map points. Depth cannot be calculated from a single keyframe, so it is computed by triangulating the current keyframe against the closest existing keyframe. Obviously this triangulation requires a set of correspondences, which are obtained through an epipolar search; a triangulation sketch follows.
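For reference, here is a minimal sketch of standard two-view linear triangulation (again DLT-style). PTAM's own version lives in MapMaker.cc and uses TooN; the projection-matrix interface below is my simplification.

```cpp
#include <Eigen/Dense>

// Triangulate one match between two keyframes. P1 and P2 are the 3x4
// projection matrices of the two views; (u1,v1) and (u2,v2) are the
// matched image positions found by the epipolar search.
Eigen::Vector3d Triangulate(const Eigen::Matrix<double, 3, 4>& P1,
                            const Eigen::Matrix<double, 3, 4>& P2,
                            double u1, double v1, double u2, double v2) {
    Eigen::Matrix4d A;
    A.row(0) = u1 * P1.row(2) - P1.row(0);  // each view contributes two
    A.row(1) = v1 * P1.row(2) - P1.row(1);  // linear constraints on X
    A.row(2) = u2 * P2.row(2) - P2.row(0);
    A.row(3) = v2 * P2.row(2) - P2.row(1);
    // Homogeneous least-squares solution: the singular vector of A
    // with the smallest singular value.
    Eigen::JacobiSVD<Eigen::Matrix4d> svd(A, Eigen::ComputeFullV);
    Eigen::Vector4d X = svd.matrixV().col(3);
    return X.hnormalized();                 // divide out the w component
}
```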
Sometimes there may not be enough keyframes available to find a corresponding point for a new map point (particularly when the camera moves fast). In that case, the MapMaker waits for more keyframes to be added to the queue and later runs its data-association refinement. The depth at which a new map point is searched for depends on the distribution (mean and spread) of the depths of the map points already in the keyframe, as sketched below. Finally, the new map points (called Candidates in the code) are added to the map by the MapMaker. The MapViewer, on the other hand, uses this map to render and display the 3D positions of those map points.
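As a rough illustration of that depth prior, the sketch below gathers the mean and spread of the depths already measured in a keyframe; the names are hypothetical, and PTAM's actual epipolar search in MapMaker.cc uses its own variant of this idea.

```cpp
#include <cmath>
#include <vector>

struct DepthPrior { double mean, sigma; };

// Mean and standard deviation of the depths of the map points already
// measured in a keyframe; assumes the vector is non-empty.
DepthPrior DepthStats(const std::vector<double>& pointDepths) {
    double sum = 0.0, sumSq = 0.0;
    for (double d : pointDepths) { sum += d; sumSq += d * d; }
    const double mean = sum / pointDepths.size();
    const double var = sumSq / pointDepths.size() - mean * mean;
    return { mean, std::sqrt(var > 0.0 ? var : 0.0) };
}
// The epipolar search for a new point can then be limited to depths
// around mean +/- a few sigma instead of the whole epipolar line.
```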