Condensation algorithm

The condensation algorithm (Conditional Density Propagation) is a computer vision algorithm. The principal application is to detect and track the contour of objects moving in a cluttered environment. Object tracking is one of the more basic and difficult aspects of computer vision and is generally a prerequisite to object recognition. Being able to identify which pixels in an image make up the contour of an object is a non-trivial problem. Condensation is a probabilistic algorithm that attempts to solve this problem.

The algorithm itself is described in detail by Isard and Blake in a publication in the International Journal of Computer Vision in 1998.^[1] One of the most interesting facets of the algorithm is that it does not compute on every pixel of the image. Rather, pixels to process are chosen at random, and only a subset of the pixels end up being processed. Multiple hypotheses about what is moving are supported naturally by the probabilistic nature of the approach. The evaluation functions come largely from previous work in the area and include many standard statistical approaches. The original part of this work is the application of particle filter estimation techniques.

The algorithm’s creation was inspired by the inability of Kalman filtering to perform object tracking well in the presence of significant background clutter. The presence of clutter tends to produce probability distributions for the object state which are multi-modal and therefore poorly modeled by the Kalman filter. The condensation algorithm in its most general form requires no assumptions about the probability distributions of the object or measurements.

Algorithm overview

The condensation algorithm seeks to solve the problem of estimating the conformation of an object described by a vector $\mathbf {x_{t}}$ at time $t$ , given observations $\mathbf {z_{1},...,z_{t}}$ of the detected features in the images up to and including the current time. The algorithm outputs an estimate to the state conditional probability density $p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )$ by applying a nonlinear filter based on factored sampling and can be thought of as a development of a Monte-Carlo method.^[1] $p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )$ is a representation of the probability of possible conformations for the objects based on previous conformations and measurements. The condensation algorithm is a generative model^[2] since it models the joint distribution of the object and the observer.

The conditional density of the object at the current time $p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )$ is estimated as a weighted, time-indexed sample set $\{s_{t}^{(n)},n=1,...,N\}$ with weights $\pi _{t}^{(n)}$ . N is a parameter determining the number of sample sets chosen. A realization of $p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )$ is obtained by sampling with replacement from the set $s_{t}$ with probability equal to the corresponding element of $\pi _{t}$ .^[1]

The assumptions that object dynamics form a temporal Markov chain and that observations are independent of each other and the dynamics facilitate the implementation of the condensation algorithm. The first assumption allows the dynamics of the object to be entirely determined by the conditional density $p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )$ . The model of the system dynamics determined by $p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )$ must also be selected for the algorithm, and generally includes both deterministic and stochastic dynamics.

The algorithm can be summarized by initialization at time $t=0$ and three steps at each time t:

Initialization

Form the initial sample set and weights by sampling according to the prior distribution. For example, specify as Gaussian and set the weights equal to each other.

Iterative procedure

Sample with replacement $N$ times from the set $\{s_{0}^{(n)},n=1,...,N\}$ with probability $\{\pi _{0}^{(n)},n=1,...,N\}$ to generate a realization of $p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )$ .
Apply the learned dynamics $p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )$ to each element of this new set, to generate a new set $\{s_{t}^{(n)}\}$ .
To take into account the current observation $\mathbf {z_{t}}$ , set $\pi _{t}^{(n)}={\frac {p(\mathbf {z_{t}} |s^{(n)})}{\sum _{j=1}^{N}p(\mathbf {z_{t}} |s^{(j)})}}$ for each element $\{s_{t}^{(n)}\}$ .

This algorithm outputs the probability distribution $p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )$ which can be directly used to calculate the mean position of the tracked object, as well as the other moments of the tracked object.

Cumulative weights can instead be used to achieve a more efficient sampling.^[1]

Implementation considerations

Since object-tracking can be a real-time objective, consideration of algorithm efficiency becomes important. The condensation algorithm is relatively simple when compared to the computational intensity of the Ricatti equation required for Kalman filtering. The parameter $N$ , which determines the number of samples in the sample set, will clearly hold a trade-off in efficiency versus performance.

One way to increase efficiency of the algorithm is by selecting a low degree of freedom model for representing the shape of the object. The model used by Isard 1998 is a linear parameterization of B-splines in which the splines are limited to certain configurations. Suitable configurations were found by analytically determining combinations of contours from multiple views, of the object in different poses, and through principal component analysis (PCA) on the deforming object.

Isard and Blake model the object dynamics $p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )$ as a second order difference equation with deterministic and stochastic components: $p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )\propto e^{-{\frac {1}{2}}||B^{-1}((\mathbf {x_{t}} -\mathbf {\bar {x}} )-A(\mathbf {x_{t-1}} -\mathbf {\bar {x}} ))||^{2})}$

where $\mathbf {\bar {x}}$ is the mean value of the state, and $A$ , $B$ are matrices representing the deterministic and stochastic components of the dynamical model respectively. $A$ , $B$ , and $\mathbf {\bar {x}}$ are estimated via Maximum Likelihood Estimation while the object performs typical movements.^[1]^[3]

The observation model $p(\mathbf {z} |\mathbf {x} )$ cannot be directly estimated from the data, requiring assumptions to be made in order to estimate it. Isard 1998 assumes that the clutter which may make the object not visible is a Poisson random process with spatial density $\lambda$ and that any true target measurement is unbiased and normally distributed with standard deviation $\sigma$ .

The basic condensation algorithm is used to track a single object in time. It is possible to extend the condensation algorithm using a single probability distribution to describe the likely states of multiple objects to track multiple objects in a scene at the same time.^[4]

Since clutter can cause the object probability distribution to split into multiple peaks, each peak represents a hypothesis about the object configuration. Smoothing is a statistical technique of conditioning the distribution based on both past and future measurements once the tracking is complete in order to reduce the effects of multiple peaks.^[5] Smoothing cannot be directly done in real-time since it requires information of future measurements.

Applications

The algorithm can be used for vision-based robot localization of mobile robots.^[6] Instead of tracking the position of an object in the scene, however, the position of the camera platform is tracked. This allows the camera platform to be globally localized given a visual map of the environment.

Extensions of the condensation algorithm have also been used to recognize human gestures in image sequences. This application of the condensation algorithm impacts the range of human–computer interactions possible. It has been used to recognize simple gestures of a user at a whiteboard to control actions such as selecting regions of the boards to print or save them.^[7] Other extensions have also been used for tracking multiple cars in the same scene.^[8]

The condensation algorithm has also been used for face recognition in a video sequence.^[9]

Resources

An implementation of the condensation algorithm in C can be found on Michael Isard’s website.

An implementation in MATLAB can be found on the Mathworks File Exchange.

An example of implementation using the OpenCV library can be found on the OpenCV forums.

References

^ ^a ^b ^c ^d ^e Isard, M.; Blake, A (August 1998). "CONDENSATION-- conditional density propagation of visual tracking". International Journal of Computer Vision. 29 (1): 5–28. doi:10.1023/A:1008078328650. S2CID 6821810.
^ Sminchisescu, C.; Kanaujia, A.; Metaxas, D.N. (November 2007). "BM3E: Discriminative Density Propagation for Visual Tracking". IEEE Transactions on Pattern Analysis and Machine Intelligence. 29 (11): 2030–2044. CiteSeerX 10.1.1.78.1751. doi:10.1109/tpami.2007.1111. PMID 17848782. S2CID 1949783.
^ Blake, Andrea; Isard, Michael; Reynard, David (October 1995). "Learning to track the visual motion of contours". Artificial Intelligence. 78 (1–2): 179–212. doi:10.1016/0004-3702(95)00032-1.
^ Koller-Meier, Esther B.; Ade, Frank (28 February 2001). "Tracking multiple objects using the Condensation algorithm". Robotics and Autonomous Systems. 34 (2–3): 93–105. doi:10.1016/s0921-8890(00)00114-7.
^ Isard, Michael; Blake, Andrew (28 May 2006). "A smoothing filter for condensation". Computer Vision — ECCV'98. Lecture Notes in Computer Science. Vol. 1406. pp. 767–781. doi:10.1007/BFb0055703. ISBN 978-3-540-64569-6.
^ Dellaert, F.; Burgard, W.; Fox, D.; Thrun, S. (1999). "Using the CONDENSATION algorithm for robust, vision-based mobile robot localization". Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149). Vol. 2. pp. 588–594. doi:10.1109/CVPR.1999.784976. hdl:1853/21565. ISBN 0-7695-0149-4. S2CID 16130780.
^ Black, M.J.; Jepson, A.D. (14 April 1998). "Recognizing temporal trajectories using the condensation algorithm". Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition. pp. 16–21. CiteSeerX 10.1.1.154.1402. doi:10.1109/AFGR.1998.670919. ISBN 0-8186-8344-9. S2CID 5159845.
^ Meier, E.B.; Ade, Frank (1999). "Tracking cars in range images using the CONDENSATION algorithm". Proceedings 199 IEEE/IEEJ/JSAI International Conference on Intelligent Transportation Systems (Cat. No.99TH8383). pp. 129–134. doi:10.1109/ITSC.1999.821040. ISBN 0-7803-4975-X. S2CID 12548469.
^ Zhou, Shaohua; Krueger, V.; Chellappa, R. (21 May 2002). "Face recognition from video: A CONDENSATION approach". Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition. pp. 221–226. doi:10.1109/AFGR.2002.1004158. ISBN 0-7695-1602-5. S2CID 8505547.

[Isard1998-1] Isard, M.; Blake, A (August 1998). "CONDENSATION-- conditional density propagation of visual tracking". International Journal of Computer Vision. 29 (1): 5–28. doi:10.1023/A:1008078328650. S2CID 6821810.

[2] Sminchisescu, C.; Kanaujia, A.; Metaxas, D.N. (November 2007). "BM3E: Discriminative Density Propagation for Visual Tracking". IEEE Transactions on Pattern Analysis and Machine Intelligence. 29 (11): 2030–2044. CiteSeerX 10.1.1.78.1751. doi:10.1109/tpami.2007.1111. PMID 17848782. S2CID 1949783.

[3] Blake, Andrea; Isard, Michael; Reynard, David (October 1995). "Learning to track the visual motion of contours". Artificial Intelligence. 78 (1–2): 179–212. doi:10.1016/0004-3702(95)00032-1.

[4] Koller-Meier, Esther B.; Ade, Frank (28 February 2001). "Tracking multiple objects using the Condensation algorithm". Robotics and Autonomous Systems. 34 (2–3): 93–105. doi:10.1016/s0921-8890(00)00114-7.

[5] Isard, Michael; Blake, Andrew (28 May 2006). "A smoothing filter for condensation". Computer Vision — ECCV'98. Lecture Notes in Computer Science. Vol. 1406. pp. 767–781. doi:10.1007/BFb0055703. ISBN 978-3-540-64569-6.

[6] Dellaert, F.; Burgard, W.; Fox, D.; Thrun, S. (1999). "Using the CONDENSATION algorithm for robust, vision-based mobile robot localization". Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149). Vol. 2. pp. 588–594. doi:10.1109/CVPR.1999.784976. hdl:1853/21565. ISBN 0-7695-0149-4. S2CID 16130780.

[7] Black, M.J.; Jepson, A.D. (14 April 1998). "Recognizing temporal trajectories using the condensation algorithm". Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition. pp. 16–21. CiteSeerX 10.1.1.154.1402. doi:10.1109/AFGR.1998.670919. ISBN 0-8186-8344-9. S2CID 5159845.

[8] Meier, E.B.; Ade, Frank (1999). "Tracking cars in range images using the CONDENSATION algorithm". Proceedings 199 IEEE/IEEJ/JSAI International Conference on Intelligent Transportation Systems (Cat. No.99TH8383). pp. 129–134. doi:10.1109/ITSC.1999.821040. ISBN 0-7803-4975-X. S2CID 12548469.

[9] Zhou, Shaohua; Krueger, V.; Chellappa, R. (21 May 2002). "Face recognition from video: A CONDENSATION approach". Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition. pp. 221–226. doi:10.1109/AFGR.2002.1004158. ISBN 0-7695-1602-5. S2CID 8505547.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]