Sensor or camera networks will play an important role in future applications, ranging from surveillance for workplace safety and general security, through driver assistance systems in the automotive domain, to intelligent homes and assisted living for the elderly. Computer vision in sensor or camera networks poses several currently unsolved problems. First, how can we calibrate cameras distributed arbitrarily in a scene without placing artificial or natural calibration patterns in it? Second, how do we select and fuse the information provided by different, possibly multimodal, sensors to solve a given problem? Finally, can we handle reconstruction, recognition, and tracking tasks in complex and highly dynamic natural scenes, which are in almost all cases the environments camera networks are designed for?

