Hiroshi Inoue  Hiroshi Inoue photo       

contact information

Ph.D., Research Staff Member
IBM Research - Tokyo
  +81dash3dash3808dash5345

links

Professional Associations

Professional Associations:  ACM SIGPLAN  |  IEEE Computer Society  |  Information Processing Society of Japan (IPSJ)


"Faster Set Intersection with SIMD instructions by Reducing Branch Mispredictions"
Hiroshi Inoue, Moriyoshi Ohara and Kenjiro Taura
PVLDB 8(3), pp 293-304, to present in 41st International Conference on Very Large Data Bases (VLDB)

Full text [PDF at Publisher web page]: p293-inoue.pdf
Slides [PDF]: VLDB2015_SetIntersection_slides.pdf
Poster [PDF]: SIMDSetIntersection_A0.pdf
Casual explanation (in Japanese): blog post

Abstract
Set intersection is one of the most important operations for many applications such as Web search engines or database management systems. This paper describes our new algorithm to efficiently find set intersections with sorted arrays on modern processors with SIMD instructions and high branch misprediction penalties. Our algorithm efficiently exploits SIMD instructions and can drastically reduce branch mispredictions. Our algorithm extends a merge-based algorithm by reading multiple elements, instead of just one element, from each of two input arrays and compares all of the pairs of elements from the two arrays to find the elements with the same values. The key insight for our improvement is that we can reduce the number of costly hard-to-predict conditional branches by advancing a pointer by more than one element at a time. Although this algorithm increases the total number of comparisons, we can execute these comparisons more efficiently using the SIMD instructions and gain the benefits of the reduced branch misprediction overhead. Our algorithm is suitable to replace existing standard library functions, such as std::set_intersection in C++, thus accelerating many applications, because the algorithm is simple and requires no preprocessing to generate additional data structures. We implemented our algorithm on Xeon and POWER7+ with and without using SIMD instructions. The experimental results show our algorithm outperforms the std::set_intersection implementation delivered with gcc by up to 5.2x using SIMD instructions and by up to 2.1x even without using SIMD instructions for 32-bit and 64-bit integer datasets. Our SIMD algorithm also outperformed an existing algorithm that can leverage SIMD instructions.