As far as I know, the standard sort, is partially implemented with a merge-sort algorithm which has a O(n log(n)) worst case running time. Quick-sort is also inexpensive with an average running time of O(n log(n)). There is an important detail to remember when sorting large objects: 'std::swap' is called allot for both merge-sort and quick-sort. Below, is a possible implementation of quick-sort. Note the use of "std::swap" in the partition function.
template <class ForwardIterator, typename _Compare>
void __quick_sort(ForwardIterator first, ForwardIterator last, _Compare __comp)
{
if (std::distance(first, last) <= 1) return;
ForwardIterator mid = __partition(first, last, __comp);
ForwardIterator first1 = first;
ForwardIterator last1 = mid;
ForwardIterator first2 = mid;
ForwardIterator last2 = last;
std::advance(first2, 1);
__quick_sort(first1, last1, __comp);
__quick_sort(first2, last2, __comp);
}
template <class ForwardIterator, typename _Compare>
ForwardIterator __partition(
ForwardIterator first, ForwardIterator last, _Compare __comp)
{
ForwardIterator pivot = first; // randomly choose pivot
auto idx = rand() % std::distance(first, last);
std::advance(pivot, idx);
std::swap(*pivot, *first);
pivot = first;
ForwardIterator i = first;
ForwardIterator j = first;
std::advance(i, 1);
std::advance(j, 1);
ForwardIterator i_prev=i;
for ( ;j!=last; ++j) {
if (__comp(*j, *pivot)) {
i_prev = i;
std::swap(*j, *i);
std::advance(i, 1);
}
}
std::swap(*pivot, *i_prev);
return i_prev;
}
Each call to 'std::swap' can potentially mean three copies. This can make sorting large objects stored by value an expensive task. See the std::swap function below.
template <typename T>
void swap( T& x, T& y )
{
T tmp( std::move(x) ); // a hint to use move constructor
x = std::move(y);
y = std::move(tmp);
}
The compiler will use the move constructor (or move assignment) if possible and default to the copy constructor otherwise. For example, if class T does not have a move constructor the copy constructor is used. According to my measurements, copying large objects log n times adds at least 2 orders of overhead to the sorting algorithm. I compared sorting large objects by value against sorting large objects by pointer. Swapping pointers is done in constant time. No move or copy constructors are called. When you do not want to deal with raw pointers, you can use std::unique_ptr which provides similar results to ordinary pointers (std::swap is overloaded for unique_ptr).
by ptr : 2155 us
by unique_ptr: 2732 us
by value : 791678 us
Below, you can see the "large object" and the test. The large object is implemented in terms of an array which does not have a move constructor and therefore is copied when std::swap is called.
struct LargeArrayObject {
LargeArrayObject(int id) : id(id) {}
int id;
std::array<int,10000> array;
};
std::vector<LargeArrayObject> byval;
std::vector<LargeArrayObject*> byptr;
std::vector<unique_ptr<LargeArrayObject>> byuniqptr;
for (int i = 0; i < 10000; ++i) {
int id = rand() % 100000;
byval.push_back(LargeArrayObject(id));
byptr.push_back(new LargeArrayObject(id));
byuniqptr.push_back(unique_ptr<T>(new LargeArrayObject(id)));
}
auto start = std::chrono::high_resolution_clock::now();
quick_sort(byval.begin(), byval.end(), [](const T& a, const T& b) {
return a.id<b.id; });
//quick_sort(byptr.begin(), byptr.end(), [](const T* a, const T* b) {
// return a->id<b->id;});
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = end - start;
An alternative to using pointers for speeding-up swap operations is to implement the large object class in terms of dynamic memory allocation, effectively making it a small object. One could use a std::vector instead of the std::array:
struct LargeVectorObject {
LargeVectorObject(int id) : id(id), array(10000, 0) {}
int id;
// Swap exchanges the elements between two vectors in constant time.
std::vector<int> array;
};
Below, you can see the measurements I made using LargeVectorObject instead of LargeArrayObject:
by ptr : 2155 us
by unique_ptr: 2732 us
by value : 1500 us