within an image, e.g. based on computing the cornerness of each pixel (see [5]; a minimal sketch is given after this list). At this stage each feature point is typically assigned some kind of descriptor, which is used in the second stage for re-identification of the feature. This descriptor could be a simple local neighbourhood of pixels around the feature point or a more abstract representation such as the SIFT/SURF descriptors described in [9].
Re-identification—The general task of feature tracking is the successful re-identification of the initial set of features from one image in the subsequent frame. Generally this can be described as an optimisation problem in which the distance between the descriptor of a candidate pixel in the new frame and the given descriptor is minimised by varying the candidate position within the image boundaries. In most cases the optimisation is not driven by varying the image coordinates alone; some kind of motion model is also used, which tries to compensate for the change in the descriptor's appearance based on an estimate of the camera's movement between the two frames. In order to reduce the computational complexity of the minimisation, the ranges over which both the pixel coordinates and the motion model parameters are varied are limited to certain search regions. The general procedure of feature tracking is visualised in Fig. 1.
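For the detection stage, a widely used cornerness measure of the kind mentioned above is the Harris response. The following sketch is an illustration only; the window size and the sensitivity constant k are assumed values and are not taken from [5]:

```python
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def cornerness_map(img, k=0.04, window=5):
    """Harris-style cornerness for every pixel of a grayscale image."""
    img = img.astype(np.float64)
    ix = sobel(img, axis=1)            # horizontal image gradient
    iy = sobel(img, axis=0)            # vertical image gradient
    # Elements of the local structure tensor, averaged over a window.
    ixx = uniform_filter(ix * ix, window)
    iyy = uniform_filter(iy * iy, window)
    ixy = uniform_filter(ix * iy, window)
    det = ixx * iyy - ixy ** 2
    trace = ixx + iyy
    return det - k * trace ** 2        # high values indicate corner-like pixels
```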
Fig. 1
Re-identification of a single feature point in two subsequent frames of an image sequence
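The re-identification step sketched in Fig. 1 can be illustrated by a brute-force variant of the above optimisation problem: the sum of squared differences (SSD) between a stored patch descriptor and candidate patches is minimised over a rectangular search region around a motion-model prediction. The search radius and patch size below are assumed values, and a real implementation would additionally warp the patch according to the motion model:

```python
import numpy as np

def reidentify(prev_desc, img, pred_xy, search_radius=12):
    """Find the position in `img` whose local patch best matches `prev_desc`.

    prev_desc     : (h, w) patch extracted around the feature in the previous frame
    img           : current grayscale frame
    pred_xy       : (x, y) position predicted by the motion model
    search_radius : half-size of the rectangular search region
    """
    h, w = prev_desc.shape
    px, py = pred_xy
    best_xy, best_cost = None, np.inf
    for y in range(py - search_radius, py + search_radius + 1):
        for x in range(px - search_radius, px + search_radius + 1):
            patch = img[y:y + h, x:x + w]
            if patch.shape != prev_desc.shape:     # candidate outside image boundaries
                continue
            cost = np.sum((patch.astype(float) - prev_desc.astype(float)) ** 2)
            if cost < best_cost:
                best_cost, best_xy = cost, (x, y)
    return best_xy, best_cost
```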
As shown by Aufderheide et al. [2], there are many ways in which a feature tracking method can fail completely or produce a non-negligible number of incorrect matches. From a mathematical point of view this corresponds to the optimisation problem either converging to a local minimum or not converging at all.
In Aufderheide et al. [1], we described a general approach for the combination of visual and inertial measurements within a parallel multi-sensory data fusion network for 3D scene reconstruction called VISrec!. Closely related to this work is the adaptation of ideas presented by Hwangbo et al. [6] for using the inertial measurements not only as an aiding modality during the estimation of the camera's egomotion, but also during the feature tracking itself.
The first stage for realising this was the development of an inertial smart sensor system (S³) based on a bank of inertial measurement units in MEMS technology. The S³ is able to compute the actual absolute camera pose (position and orientation) for each frame. The hardware employed and the corresponding navigation algorithm are described in Sect. 2. As a second step, a visual feature tracking algorithm, as described in Sect. 3, needs to be implemented. This algorithm considers prior motion estimates from the inertial S³ in order to enlarge the convergence region of the optimisation problem and deliver an improved overall tracking performance. The results are briefly discussed in Sect. 4. Finally, Sect. 5 concludes the whole work and describes potential future work.
2 Inertial Smart Sensor System
For the implementation of an Inertial Fusion Cell (IFC), a smart sensor system (S³) is suggested here, which is composed of a bank of different micro-electromechanical systems (MEMS). The proposed system contains accelerometers, gyroscopes and magnetometers, all of which are sensory units with three degrees of freedom (DoF). The S³ comprises the sensors themselves, signal conditioning (filtering) and a multi-sensor data fusion (MSDF) scheme for pose (position and orientation) estimation.
2.1 General S³ Architecture
The general architecture of the S³ is shown in Fig. 2: the main 'organ' consists of the sensory units described in Sect. 2.2. A single micro controller is used for analogue-digital conversion (ADC), signal conditioning (SC) and the transfer of sensor data to a PC. The actual sensor fusion scheme is realised on the PC.
Fig. 2
General architecture of the inertial S³
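The division of labour shown in Fig. 2, in which the μC delivers conditioned raw samples and all fusion runs on the PC, can be sketched as follows. The packet layout, port name and baud rate below are purely hypothetical placeholders and do not describe the actual protocol of the S³:

```python
import struct
import serial  # pyserial

PORT, BAUD = "/dev/ttyUSB0", 115200           # hypothetical serial settings
FRAME = struct.Struct("<9h")                  # assumed frame: 9 x int16 (accel, gyro, mag)

def read_samples(n):
    """Read n conditioned 9-DoF samples sent by the micro controller."""
    samples = []
    with serial.Serial(PORT, BAUD, timeout=1.0) as link:
        while len(samples) < n:
            raw = link.read(FRAME.size)
            if len(raw) != FRAME.size:
                continue                       # incomplete frame, try again
            samples.append(FRAME.unpack(raw))  # (ax, ay, az, gx, gy, gz, mx, my, mz)
    return samples
```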
2.2 Hardware
The hardware setup of the S³ is inspired by the standard configuration of a multi-sensor orientation system (MODS) as defined in [13]. The system consists of an LY530AL single-axis gyroscope and an LPR530AL dual-axis gyroscope, both from STMicroelectronics, which measure the rotational velocities around the three main axes of the inertial coordinate system ICS (see Fig. 3). The accelerations of translational movements are measured by a triple-axis accelerometer ADXL345 from Analog Devices. Finally, a 3-DoF magnetometer from Honeywell (HMC5843) is used to measure the earth's magnetic field. All IMU sensors are connected to a micro controller (ATMega328) which is responsible for initialisation, signal conditioning and communication. The interface between sensor and micro controller is based on the I²C bus for the accelerometer and magnetometer, while the gyroscopes are directly connected to ADC channels of the AVR. The resulting sensor setup thus consists of three orthogonally arranged accelerometers measuring a three-dimensional acceleration normalised with the gravitational acceleration constant g, where the subscript b indicates the body coordinate system in which the entities are measured. The gyroscopes measure the corresponding angular velocities around the sensitivity axes of the accelerometers, and the magnetometer senses the earth's magnetic field. Figure 3 shows the general configuration of all sensory units and the corresponding measured entities.
Fig. 3
General architecture of the inertial measurement units and measured entities
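The measured entities shown in Fig. 3 can be collected in a simple container; the field names below are illustrative only:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ImuSample:
    """One raw 9-DoF measurement of the S³, expressed in the body frame b."""
    a_b: np.ndarray      # 3D acceleration, normalised with g
    omega_b: np.ndarray  # 3D angular velocity around the sensitivity axes [rad/s]
    m_b: np.ndarray      # 3D earth magnetic field vector
    t: float             # time stamp [s]
```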
2.3 Sensor Modelling and Signal Conditioning
Measurements from MEMS devices in general, and inertial MEMS sensors in particular, suffer from different error sources. It is therefore necessary to implement both an adequate calibration framework and a signal conditioning routine. The calibration of the sensory units is only possible if a reasonable sensor model is available in advance, and this model should address all relevant error sources. Here the model proposed in [14] was utilised and adapted for the given context. It contains:
Misalignment of sensitivity axes—Ideally the three independent sensitivity axes of each inertial sensor should be orthogonal. Due to the imprecise construction of MEMS-based IMUs this is not the case for the vast majority of sensory packages. The misalignment can be compensated by finding a matrix which transforms the non-orthogonal axes to an orthogonal setup.
Biases—The output of the gyroscopes and accelerometers should be exactly zero if the S³ is not moved at all. However, real sensors typically exhibit a time-varying offset. It is possible to distinguish g-independent biases (e.g. for gyroscopes) from g-dependent biases; for the latter there is a relation between the applied acceleration and the bias. The bias is modelled by incorporating a bias vector.
Measurement noise—The general measurement noise has to be taken into account. The standard sensor model contains a white noise term.
Scaling factors—In most cases there is an unknown scaling factor between the measured physical quantity and the real signal. The scaling can be compensated for by introducing a scale matrix.
A block diagram of the general sensor model is shown in the following figure (Fig. 4).
Fig. 4
General sensor model
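The block diagram of Fig. 4 and the equations below can be read as a forward model that maps the true physical quantity to the sensor output. A minimal sketch of this forward model, with arbitrarily chosen example values for the misalignment matrix, scale matrix, bias and noise level (not calibration results of the S³), looks as follows:

```python
import numpy as np

def sensor_output(true_signal, M, S, b, noise_std):
    """Forward sensor model: misalignment, scaling, bias and white noise."""
    n = np.random.normal(0.0, noise_std, size=3)
    return M @ (S @ true_signal) + b + n

# Example values (illustrative only):
M = np.array([[1.0, 0.01, -0.02],
              [0.0, 1.00,  0.03],
              [0.0, 0.00,  1.00]])   # small axis misalignment
S = np.diag([1.02, 0.98, 1.01])      # per-axis scale factors
b = np.array([0.05, -0.02, 0.01])    # bias vector
print(sensor_output(np.array([0.0, 0.0, 1.0]), M, S, b, noise_std=0.005))
```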
Based on this it is possible to define three separate sensor models, one for each sensor type, as shown in the following equations, where the tilde denotes the measured quantity, M the misalignment matrix, S the scale matrix, b the bias vector and n the white noise term of the respective sensor:

$\tilde{\mathbf{a}}_b = \mathbf{M}_a \mathbf{S}_a \, \mathbf{a}_b + \mathbf{b}_a + \mathbf{n}_a$   (1)

$\tilde{\boldsymbol{\omega}}_b = \mathbf{M}_g \mathbf{S}_g \, \boldsymbol{\omega}_b + \mathbf{b}_g + \mathbf{n}_g$   (2)

$\tilde{\mathbf{m}}_b = \mathbf{M}_m \mathbf{S}_m \, \mathbf{m}_b + \mathbf{b}_m + \mathbf{n}_m$   (3)

It was shown that the misalignment and scale matrices can be determined by a sensor calibration procedure in which the sensor array is moved to different known locations. Due to their time-varying character, the noise and bias terms cannot be determined a-priori. The signal conditioning step on the μC takes care of the measurement noise by employing an FIR digital filter structure. The implementation realises a low-pass FIR filter, based on the assumption that the frequencies of the measurement noise are much higher than those of the signal itself. The complete filter was realised in software on the μC, and the cut-off frequencies for the different sensory units were determined by an experimental evaluation.
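Once the misalignment and scale matrices as well as an estimate of the bias are known, a raw sample can be corrected by inverting the model above; the remaining measurement noise is then suppressed by a low-pass FIR filter as described in the previous paragraph. The filter length, sampling rate and cut-off frequency below are assumptions for illustration, not the experimentally determined values used on the μC:

```python
import numpy as np
from scipy.signal import firwin, lfilter

def correct_sample(y, M, S, b):
    """Invert the sensor model (1)-(3): remove bias, scaling and misalignment."""
    return np.linalg.inv(M @ S) @ (y - b)

def lowpass_fir(samples, fs=100.0, cutoff=15.0, numtaps=31):
    """Low-pass FIR filtering of one sensor channel (noise bandwidth >> signal bandwidth)."""
    taps = firwin(numtaps, cutoff, fs=fs)   # windowed-sinc low-pass design
    return lfilter(taps, 1.0, samples)
```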
2.4 Basic Principles of Inertial Navigation
Classical approaches to inertial navigation are stable-platform systems, in which the sensors are isolated from any external rotational motion by specialised mechanical platforms. In contrast to those classical stable-platform systems, MEMS sensors are mounted rigidly to the device (here: the camera). In such a strapdown system it is necessary to transform the measured quantities of the accelerometers into a global coordinate system, using known orientations computed from the gyroscope measurements. In general, the system-level mechanisation of a strapdown inertial navigation system (INS) can be described by the computational elements indicated in Fig. 5. The main problem with this classical framework is that the pose is determined by integrating measurements from gyroscopes (orientation) and accelerometers (position). Due to superimposed sensor drift and noise, which are especially significant for MEMS devices, the errors of the egomotion estimate tend to grow unbounded.
Fig. 5
Computational elements of an INS
Fig. 6
Drifting error for orientation estimates based on gyroscope measurements, for two separate experiments
The necessary computation of the orientation of the S³ from the gyroscope measurements and a known start orientation can be described as follows:

$\boldsymbol{\Theta}(t) = \boldsymbol{\Theta}(t_0) + \int_{t_0}^{t} \boldsymbol{\omega}_b(\tau)\, d\tau$   (4)

The integration of the measured rotational velocities leads to an unbounded drifting error in the absolute orientation estimates. Figure 6 shows two examples of this typical drifting behaviour for all three Euler angles. For the two experiments shown in Fig. 6, the S³ was not moved at all, yet even after a short period of time a clearly recognisable absolute orientation error has accumulated. For the estimation of the absolute position these problems are even more severe, because the position can be computed from the acceleration measurements, expressed in the inertial reference frame, only by double integration:

$\mathbf{p}_i(t) = \mathbf{p}_i(t_0) + \int_{t_0}^{t} \mathbf{v}_i(\tau)\, d\tau, \qquad \mathbf{v}_i(t) = \mathbf{v}_i(t_0) + \int_{t_0}^{t} \mathbf{a}_i(\tau)\, d\tau$   (5)

Possible errors in the orientation estimation stage also lead to a wrong position, due to the necessity of transforming the accelerations from the body coordinate frame to the inertial reference frame (here indicated by the subscript i).
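A discrete-time sketch of this classical strapdown mechanisation, i.e. of Eqs. (4) and (5), is given below. For brevity a small-angle Euler integration and a simple rotation-matrix construction are used (a real implementation would typically use quaternions), and gravity is subtracted after rotating the body-frame accelerations into the inertial frame; the gravity value and frame conventions are assumptions for illustration:

```python
import numpy as np
from scipy.spatial.transform import Rotation

G = np.array([0.0, 0.0, 9.81])   # gravity in the inertial frame [m/s^2]

def strapdown(omega_b, a_b, dt, theta0=np.zeros(3)):
    """Integrate gyro rates (Eq. 4) and doubly integrate accelerations (Eq. 5)."""
    theta, v_i, p_i = theta0.copy(), np.zeros(3), np.zeros(3)
    for w, a in zip(omega_b, a_b):
        theta += w * dt                                    # Eq. (4), small-angle approximation
        R_ib = Rotation.from_euler("xyz", theta).as_matrix()
        a_i = R_ib @ a - G                                 # body -> inertial, remove gravity
        v_i += a_i * dt                                    # Eq. (5): first integration
        p_i += v_i * dt                                    #          second integration
    return theta, p_i
```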
The following figure (Fig. 7) demonstrates the typical drifting error for the absolute position (one axis) computed using the classical strapdown methodology.
Fig. 7
Drifting error for absolute position estimates based on classical strapdown mechanisation of an inertial navigation system (left: acceleration measurements; right: absolute position estimate)
Using only gyroscopes, there is no way to keep the drifting error of the orientation within reasonable bounds; additional information channels have to be used. The final framework for pose estimation therefore consists of two steps, an orientation estimation and a position estimation, as shown in Fig. 8. In contrast to the classical strapdown method, the approach suggested here also incorporates the accelerometers for orientation estimation. The resulting fusion network is given in the following figure, and the different sub-fusion processes are described in Sects. 2.5 and 2.6.
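The specific sub-fusion processes are described in Sects. 2.5 and 2.6. As a generic illustration of how accelerometer measurements can bound the gyroscope drift in the orientation estimate, the following complementary-filter sketch blends the integrated gyro angles with the roll and pitch angles derived from the measured gravity direction; the blending factor alpha is an assumed tuning value and the scheme is not the actual fusion network of the S³:

```python
import numpy as np

def complementary_filter(omega_b, a_b, dt, alpha=0.98):
    """Drift-bounded roll/pitch estimate: gyro integration corrected by gravity."""
    roll, pitch = 0.0, 0.0
    for w, a in zip(omega_b, a_b):
        # Propagate with the gyroscope (fast, but drifting).
        roll  += w[0] * dt
        pitch += w[1] * dt
        # Absolute (drift-free) tilt angles from the measured gravity direction.
        roll_acc  = np.arctan2(a[1], a[2])
        pitch_acc = np.arctan2(-a[0], np.hypot(a[1], a[2]))
        # Blend: gyro dominates at high frequencies, accelerometer at low ones.
        roll  = alpha * roll  + (1.0 - alpha) * roll_acc
        pitch = alpha * pitch + (1.0 - alpha) * pitch_acc
    return roll, pitch
```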