Tuesday, May 14, 2013

Sorting Large Objects


As far as I know, the standard sort, is partially implemented with a merge-sort algorithm which has a O(n log(n)) worst case running time. Quick-sort is also inexpensive with an average running time of O(n log(n)). There is an important detail to remember when sorting large objects: 'std::swap' is called allot for both merge-sort and quick-sort. Below, is a possible implementation of quick-sort. Note the use of "std::swap" in the partition function.

template <class ForwardIterator, typename _Compare>
void __quick_sort(ForwardIterator first, ForwardIterator last, _Compare __comp)
{
    if (std::distance(first, last) <= 1) return;

    ForwardIterator mid = __partition(first, last, __comp);
    ForwardIterator first1 = first;
    ForwardIterator last1 = mid;
    ForwardIterator first2 = mid;
    ForwardIterator last2 = last;
    std::advance(first2, 1);    

    __quick_sort(first1, last1, __comp);
    __quick_sort(first2, last2, __comp);
}

template <class ForwardIterator, typename _Compare>
ForwardIterator __partition(
       ForwardIterator first, ForwardIterator last, _Compare __comp)
{
    ForwardIterator pivot = first;  // randomly choose pivot
    auto idx = rand() % std::distance(first, last);
    std::advance(pivot, idx);
    std::swap(*pivot, *first);
    pivot = first;
    
    ForwardIterator i = first;
    ForwardIterator j = first;
    std::advance(i, 1);
    std::advance(j, 1);
    ForwardIterator i_prev=i;
    for ( ;j!=last; ++j) {
        if (__comp(*j, *pivot)) {
            i_prev = i;
            std::swap(*j, *i);
            std::advance(i, 1);
        }
    }
    std::swap(*pivot, *i_prev);
    return i_prev;
}

Each call to 'std::swap' can potentially mean three copies. This can make sorting large objects stored by value an expensive task. See the std::swap function below.

template <typename T>
void swap( T& x, T& y )
{
    T tmp( std::move(x) ); // a hint to use move constructor
    x = std::move(y);
    y = std::move(tmp);
}

The compiler will use the move constructor (or move assignment) if possible and default to the copy constructor otherwise. For example, if class T does not have a move constructor the copy constructor is used. According to my measurements, copying large objects log n times adds at least 2 orders of overhead to the sorting algorithm. I compared sorting large objects by value against sorting large objects by pointer. Swapping pointers is done in constant time. No move or copy constructors are called. When you do not want to deal with raw pointers, you can use std::unique_ptr which provides similar results to ordinary pointers (std::swap is overloaded for unique_ptr).

        by ptr       :         2155  us  
        by unique_ptr:         2732  us  
        by value     :       791678  us     

Below, you can see the "large object" and the test. The large object is implemented in terms of an array which does not have a move constructor and therefore is copied when std::swap is called.

struct LargeArrayObject {
    LargeArrayObject(int id) : id(id) {}
    int id;
    std::array<int,10000> array;
};

std::vector<LargeArrayObject> byval;
std::vector<LargeArrayObject*> byptr;
std::vector<unique_ptr<LargeArrayObject>> byuniqptr;

for (int i = 0; i < 10000; ++i) {
   int id = rand() % 100000;
   byval.push_back(LargeArrayObject(id));
   byptr.push_back(new LargeArrayObject(id));
   byuniqptr.push_back(unique_ptr<T>(new LargeArrayObject(id)));
}

auto start = std::chrono::high_resolution_clock::now();
quick_sort(byval.begin(), byval.end(), [](const T& a, const T& b) {
   return a.id<b.id; });
//quick_sort(byptr.begin(), byptr.end(), [](const T* a, const T* b) {
//   return a->id<b->id;});
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = end - start;

An alternative to using pointers for speeding-up swap operations is to implement the large object class in terms of dynamic memory allocation, effectively making it a small object. One could use a std::vector instead of the std::array:

struct LargeVectorObject {
    LargeVectorObject(int id) : id(id), array(10000, 0) {}
    int id;

    // Swap exchanges the elements between two vectors in constant time.
    std::vector<int> array;
};

Below, you can see the measurements I made using LargeVectorObject instead of LargeArrayObject:


        by ptr       :                2155  us  
        by unique_ptr:                2732  us  
        by value     :                1500  us