Motion Estimation Without Integer-Pel Search
IEEE Transactions on Image Processing, 2012, 22(4): 1340 - 135 | November 20, 2012 | 10.1109/TIP.2012.2228495
Typical motion estimation (ME) consists of three main steps: spatial-temporal prediction, integer-pel search, and fractional-pel search. The integer-pel search, which seeks the best-matched integer-pel position within a search window, is considered crucial for video encoding. It occupies over 50% of the overall encoding time (when the full search scheme is adopted) in software encoders, and imposes considerable area cost, memory traffic, and power consumption on hardware encoders. In this paper, we find that video sequences (especially high-resolution videos) can often be encoded effectively and efficiently even without integer-pel search. This counter-intuitive phenomenon arises not only because spatial-temporal prediction and fractional-pel search are accurate enough for the ME of many blocks. In fact, we observe that when the predicted motion vector deviates from the optimal motion vector (mainly for boundary blocks of irregularly moving objects), integer-pel search also struggles to reduce the final rate-distortion cost: the deviation of the reference position can be alleviated by fractional-pel interpolation and rate-distortion optimization techniques (e.g., adaptive macroblock mode). Considering that the proportion of boundary blocks decreases as video resolution increases, integer-pel search may be rather cost-ineffective in the high-resolution era. Experimental results on 36 typical sequences of different resolutions encoded with x264, a widely used video encoder, agree well with our analysis. For 1080p sequences, removing the integer-pel search saves 57.9% of the overall H.264 encoding time on average (compared with the original x264 with full integer-pel search using default parameters), while the resulting performance loss is negligible: the bit rate increases by only 0.18%, and the peak signal-to-noise ratio decreases by only 0.01 dB per frame on average.
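To make the two remaining steps concrete, the following is a minimal sketch of ME with the integer-pel search omitted: a motion vector is predicted from neighbouring blocks and then refined only at fractional positions. It is an illustration under stated assumptions, not the paper's or x264's implementation. The function names (`median_mv`, `sample`, `sad`, `fractional_refine`) are hypothetical, the predictor is assumed to be a component-wise median of three neighbours, the refinement is limited to half-pel positions with bilinear interpolation, and the cost is a plain sum of absolute differences with no rate term.

```python
def median_mv(a, b, c):
    """Component-wise median of three neighbouring motion vectors (assumed predictor)."""
    return (sorted([a[0], b[0], c[0]])[1], sorted([a[1], b[1], c[1]])[1])


def sample(frame, y2, x2):
    """Sample a frame at half-pel coordinates (units of 1/2 pixel).

    Bilinear interpolation is a simplification; real codecs use longer filters.
    Coordinates outside the frame are clamped to the border.
    """
    h, w = len(frame), len(frame[0])
    y0, x0 = y2 // 2, x2 // 2
    y1, x1 = y0 + (y2 % 2), x0 + (x2 % 2)
    y0, y1 = max(0, min(y0, h - 1)), max(0, min(y1, h - 1))
    x0, x1 = max(0, min(x0, w - 1)), max(0, min(x1, w - 1))
    return (frame[y0][x0] + frame[y0][x1] + frame[y1][x0] + frame[y1][x1]) / 4.0


def sad(cur, ref, by, bx, size, mv):
    """Sum of absolute differences for one block; mv is (dy, dx) in half-pel units."""
    total = 0.0
    for dy in range(size):
        for dx in range(size):
            y, x = by + dy, bx + dx
            total += abs(cur[y][x] - sample(ref, 2 * y + mv[0], 2 * x + mv[1]))
    return total


def fractional_refine(cur, ref, by, bx, size, pred_mv):
    """Refine the predicted MV over its 8 half-pel neighbours; no integer-pel search."""
    best_mv, best_cost = pred_mv, sad(cur, ref, by, bx, size, pred_mv)
    for oy in (-1, 0, 1):
        for ox in (-1, 0, 1):
            cand = (pred_mv[0] + oy, pred_mv[1] + ox)
            cost = sad(cur, ref, by, bx, size, cand)
            if cost < best_cost:
                best_mv, best_cost = cand, cost
    return best_mv, best_cost


# Toy data: the current frame is the reference shifted left by one pixel,
# so the true motion vector is (0, 2) in half-pel units.
H = W = 16
ref = [[x * x + 7 * y for x in range(W)] for y in range(H)]
cur = [[ref[y][min(x + 1, W - 1)] for x in range(W)] for y in range(H)]

pred = median_mv((0, 1), (0, 1), (2, 0))  # neighbours' MVs, slightly off
best_mv, best_cost = fractional_refine(cur, ref, 4, 4, 4, pred)
print(pred, best_mv, best_cost)
```

In this toy case the prediction is off by half a pixel and the fractional refinement alone recovers the exact displacement. When the prediction is off by more than the refinement range, this sketch cannot recover it, which corresponds to the boundary-block case the abstract discusses.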