Jesal Patel

Understanding Go's Sort Implementation: pdqsort

I remember learning that we shouldn’t typically implement and use our own sorting algorithms unless we have a good reason. There are plenty of reasons for this, such as performance, reliability, and simply not creating unnecessary code to maintain. It did make me wonder what the sorting implementations we use on a daily basis actually look like. Some common options seem to be timsort and pattern-defeating quicksort (pdqsort)1. Go’s standard library currently uses pdqsort, so I would like to walk through and understand Go’s implementation.

Pattern-defeating quicksort is an unstable sorting algorithm that dynamically switches between modified versions of quicksort, heapsort, and insertion sort depending on the input. The algorithm was created by Orson Peters and is based on introsort.

Function Structure

Let’s take a look at what happens when we call slices.Sort(x)2. The first thing we encounter is a thin helper function that calls the pdqsort function with its required arguments:

func Sort[S ~[]E, E cmp.Ordered](x S) {
    n := len(x)
    // pdqsortOrdered[E cmp.Ordered](data []E, a, b, limit int)
    pdqsortOrdered(x, 0, n, bits.Len(uint(n)))
}

The function call itself looks pretty straightforward. The a and b parameters are the start (inclusive) and end (exclusive) of the slice. The limit in this context is the upper bound on the number of bad pivots allowed before the algorithm falls back to a different strategy. Go uses the number of bits it takes to represent the length of the input slice to determine the limit, which is just ceil(log2(n+1)). This seems like a logical approach, since that many good pivots would likely have produced a sorted array.
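To get a feel for how this limit scales, here’s a small throwaway demo of my own (not part of the standard library):

package main

import (
    "fmt"
    "math/bits"
)

func main() {
    // bits.Len(uint(n)) == floor(log2(n)) + 1 == ceil(log2(n+1)) for n > 0.
    for _, n := range []int{10, 100, 1000, 1000000} {
        fmt.Println(n, bits.Len(uint(n))) // prints limits of 4, 7, 10, and 20
    }
}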

The structure of the actual pdqsortOrdered function looks like this:

func pdqsortOrdered[E cmp.Ordered](data []E, a, b, limit int) {
    const maxInsertion = 12
    var (
        wasBalanced    = true // whether the last partitioning was reasonably balanced
        wasPartitioned = true // whether the slice was already partitioned
    )

    for {
        // ...
    }
}

The function sets up some variables that it will use to make decisions during each loop. The maxInsertion constant determines whether the slice is small enough that insertion sort should be used, which is a good choice because insertion sort is highly efficient for small arrays. The rest of the function is just the loop that does the sorting. One thing to keep in mind, which we won’t see until later in the implementation, is that the function recursively calls itself for one partition and handles the remaining partition iteratively in the next loop.

Base Case

// ... (pdqsortOrdered func)
for {
    length := b - a
    if length <= maxInsertion {
        insertionSortOrdered(data, a, b)
        return
    }
    // ...
}

The first step in the loop checks whether the current length is small enough for insertion sort; if so, the sorting of this portion is finished. This serves as the base case for the recursion that we’ll see later on in the implementation.
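For reference, the insertion sort helper is tiny. A sketch of roughly what insertionSortOrdered looks like:

func insertionSortOrdered[E cmp.Ordered](data []E, a, b int) {
    for i := a + 1; i < b; i++ {
        // Shift data[i] left until it is no longer smaller than its left neighbor.
        for j := i; j > a && cmp.Less(data[j], data[j-1]); j-- {
            data[j], data[j-1] = data[j-1], data[j]
        }
    }
}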

Heapsort Fallback

// ... (pdqsortOrdered func)
if limit == 0 {
    heapSortOrdered(data, a, b)
    return
}

Next the algorithm checks whether it has reached the limit for poor partitions in previous iterations of the loop, and if so, it falls back to heapsort for the rest of the sorting. Whatever pattern is causing the algorithm to partition poorly, heapsort provides guaranteed O(n log n) performance on any input.
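The heapsort helper itself is a textbook heapsort restricted to the [a, b) window. A sketch of roughly what it looks like (siftDownOrdered restores the max-heap property from a given root):

func heapSortOrdered[E cmp.Ordered](data []E, a, b int) {
    first := a
    hi := b - a

    // Build a max-heap with the greatest element at the top.
    for i := (hi - 1) / 2; i >= 0; i-- {
        siftDownOrdered(data, i, hi, first)
    }

    // Pop elements, largest first, into the end of the subarray.
    for i := hi - 1; i >= 0; i-- {
        data[first], data[first+i] = data[first+i], data[first]
        siftDownOrdered(data, 0, i, first)
    }
}

func siftDownOrdered[E cmp.Ordered](data []E, lo, hi, first int) {
    root := lo
    for {
        child := 2*root + 1
        if child >= hi {
            return
        }
        // Follow the larger of the two children.
        if child+1 < hi && cmp.Less(data[first+child], data[first+child+1]) {
            child++
        }
        if !cmp.Less(data[first+root], data[first+child]) {
            return
        }
        data[first+root], data[first+child] = data[first+child], data[first+root]
        root = child
    }
}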

Pattern Breaking

An interesting property of this algorithm is that it attempts to break up patterns in the input when it encounters imbalance. That’s the next step, taken only if the previous partition was imbalanced.

// ... (pdqsortOrdered func)
if !wasBalanced {
    breakPatternsOrdered(data, a, b)
    limit--
}

Let’s take a quick look at how the breakPatternsOrdered function works:

func breakPatternsOrdered[E cmp.Ordered](data []E, a, b int) {
    length := b - a
    if length >= 8 {
        random := xorshift(length)
        modulus := nextPowerOfTwo(length)

        for idx := a + (length/4)*2 - 1; idx <= a+(length/4)*2+1; idx++ {
            other := int(uint(random.Next()) & (modulus - 1))
            if other >= length {
                other -= length
            }
            data[idx], data[a+other] = data[a+other], data[idx]
        }
    }
}

We can see that the algorithm only tries to break the pattern if the current length is at least 8; it’s likely not worth spending the compute to break the pattern on very small inputs. In this context, we can think of the random variable as a fast and efficient random number generator3. modulus is used as a bitmask to keep the randomly generated number within a specific range.

The loop itself iterates three times over values near the center of the slice. It’s possible for the randomly generated other value to fall outside the bounds of the subarray, since modulus is the next power of two. In those cases, the length is simply subtracted so that other lands on a valid position in the subarray to swap with one of the center elements.
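Both helpers are small. A sketch of roughly what they look like (bits here is math/bits):

type xorshift uint64

// Next advances the generator one xorshift64 step: three shift-and-xor operations.
func (r *xorshift) Next() uint64 {
    *r ^= *r << 13
    *r ^= *r >> 7
    *r ^= *r << 17
    return uint64(*r)
}

// nextPowerOfTwo returns a power of two at least as large as length.
func nextPowerOfTwo(length int) uint {
    return 1 << bits.Len(uint(length))
}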

Pivot Selection

// ... (pdqsortOrdered func)
pivot, hint := choosePivotOrdered(data, a, b)

Now we’re at the point where we can select a pivot for partitioning the current subarray. While doing so, the algorithm also tries to get a hint as to whether the subarray is decreasing or increasing. Let’s see how this is done.

func choosePivotOrdered[E cmp.Ordered](data []E, a, b int) (pivot int, hint sortedHint) {
    const (
        shortestNinther = 50
        maxSwaps        = 4 * 3
    )

    l := b - a
    var (
        swaps int
        i     = a + l/4*1
        j     = a + l/4*2
        k     = a + l/4*3
    )
    // ...
}

This function aims to select a pivot using different strategies based on the slice length. The shortestNinther constant is the threshold for when Tukey’s ninther is used. Tukey’s ninther takes three groups of three elements, finds the median of each group, and then takes the median of those three medians.

The maxSwaps constant helps the algorithm determine whether we have an increasing or decreasing hint. Finding the median of three values can be done by ordering the values with pairwise compare-and-swap steps, which needs at most three swaps. Since we find the median of each of the three groups and then the median of those results, we compute a median-of-three four times, which is where maxSwaps = 4 * 3 comes from.

The algorithm selects three indices which will be our groups: i at around the 25% mark, j at around the 50% mark, and k at around the 75% mark.

// ... (choosePivotOrdered func)
if l >= 8 {
    if l >= shortestNinther {
        i = medianAdjacentOrdered(data, i, &swaps)
        j = medianAdjacentOrdered(data, j, &swaps)
        k = medianAdjacentOrdered(data, k, &swaps)
    }
    j = medianOrdered(data, i, j, k, &swaps)
}

Next, we see that we’re only going to bother finding any medians if the subarray length is 8 or larger. If the subarray length is at least shortestNinther (50)4, then the function does what we mentioned earlier and finds the median of the three selected groups; medianAdjacentOrdered just finds the median of [data[i-1], data[i], data[i+1]]. If the length is in [8, 50), then we skip the extra work on the three groups and just find the median of the three initially selected indices.
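The median helpers look roughly like this. Note that they never move elements around; they reorder indices and count how many comparisons came back reversed:

// order2Ordered returns the two indices ordered so that
// data[first] <= data[second], counting a swap when the pair was reversed.
func order2Ordered[E cmp.Ordered](data []E, a, b int, swaps *int) (int, int) {
    if cmp.Less(data[b], data[a]) {
        *swaps++
        return b, a
    }
    return a, b
}

// medianOrdered returns the index of the median of data[a], data[b], data[c]
// using three pairwise orderings (at most three swaps).
func medianOrdered[E cmp.Ordered](data []E, a, b, c int, swaps *int) int {
    a, b = order2Ordered(data, a, b, swaps)
    b, c = order2Ordered(data, b, c, swaps)
    a, b = order2Ordered(data, a, b, swaps)
    return b
}

// medianAdjacentOrdered returns the index of the median of data[a-1], data[a], data[a+1].
func medianAdjacentOrdered[E cmp.Ordered](data []E, a int, swaps *int) int {
    return medianOrdered(data, a-1, a, a+1, swaps)
}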

// ... (choosePivotOrdered func)
switch swaps {
case 0:
    return j, increasingHint
case maxSwaps:
    return j, decreasingHint
default:
    return j, unknownHint
}

The last part of the function returns the pivot and any hint. The algorithm provides an increasing hint if it never had to make any swaps while finding the medians, meaning all the values checked were already in increasing order. A decreasing hint means all the values checked were in decreasing order, so every comparison required a swap. Swap counts in [1, 11] produce an unknown hint. It’s interesting to note that the algorithm will only ever provide a decreasing hint when the subarray length is at or above the shortestNinther threshold and full Tukey’s ninther is performed. The increasing hint can potentially be given for any length of 8 or more, and will always be given for lengths less than 8.

Using Hints

Now that a pivot has been selected we can head back to the main loop of the pdqsort function:

// ... (pdqsortOrdered func)
pivot, hint := choosePivotOrdered(data, a, b)
if hint == decreasingHint {
    reverseRangeOrdered(data, a, b)
    pivot = (b - 1) - (pivot - a)
    hint = increasingHint
}

The first thing the algorithm does after finding the pivot is check for a decreasing hint and, if there is one, make the changes necessary to treat it like an increasing hint. To do this it reverses the subarray, adjusts the pivot to its new position after the reversal, and changes the hint to increasing.
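The reversal is just a two-pointer swap over the subarray; a sketch of roughly what reverseRangeOrdered looks like is below. The pivot arithmetic mirrors the pivot’s offset: for example, with a = 0, b = 10, and pivot = 3, the pivot’s value ends up at (10 - 1) - (3 - 0) = 6 after the reversal.

func reverseRangeOrdered[E cmp.Ordered](data []E, a, b int) {
    i, j := a, b-1
    for i < j {
        data[i], data[j] = data[j], data[i]
        i++
        j--
    }
}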

// ... (pdqsortOrdered func)
if wasBalanced && wasPartitioned && hint == increasingHint {
    if partialInsertionSortOrdered(data, a, b) {
        return
    }
}

Next, the algorithm makes use of the hint it found, if any. If the previous partition was balanced and already partitioned, and the pivot selection found an increasing hint, then it runs a modified insertion sort, since there’s a decent chance the subarray is already sorted. The insertion sort is modified in a couple of simple ways. It returns true if it leaves the array fully sorted, and false otherwise. While sorting, if it encounters five out-of-place elements it stops and returns false; this early stopping makes sense because we don’t want to waste resources on insertion sort when the array turns out not to be nearly sorted. It also does no shifting and returns false if the provided subarray has fewer than 50 elements (unless it finds the subarray already fully sorted). If the modified insertion sort was able to find or produce a fully sorted array, then pdqsortOrdered returns early from this branch.
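Here’s a condensed sketch of that modified insertion sort, following the behavior described above (the constants match the stdlib’s maxSteps = 5 and shortestShifting = 50, though the details are simplified):

func partialInsertionSortOrdered[E cmp.Ordered](data []E, a, b int) bool {
    const (
        maxSteps         = 5  // give up after this many out-of-place elements
        shortestShifting = 50 // don't shift any elements on short subarrays
    )
    i := a + 1
    for step := 0; step < maxSteps; step++ {
        // Scan forward to the first adjacent out-of-order pair.
        for i < b && !cmp.Less(data[i], data[i-1]) {
            i++
        }
        if i == b {
            return true // reached the end: fully sorted
        }
        if b-a < shortestShifting {
            return false // too short to be worth repairing
        }
        data[i], data[i-1] = data[i-1], data[i]

        // Shift the smaller element toward the left...
        for k := i - 1; k > a && cmp.Less(data[k], data[k-1]); k-- {
            data[k], data[k-1] = data[k-1], data[k]
        }
        // ...and the greater element toward the right.
        for k := i + 1; k < b && cmp.Less(data[k], data[k-1]); k++ {
            data[k], data[k-1] = data[k-1], data[k]
        }
    }
    return false
}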

Optimizing Duplicates

// ... (pdqsortOrdered func)
if a > 0 && !cmp.Less(data[a-1], data[pivot]) {
    mid := partitionEqualOrdered(data, a, b, pivot)
    a = mid
    continue
}

This next part is an optimization first implemented in pdqsort. The idea is that we can peek at the element directly to the left of the current subarray; if that value is greater than or equal to the pivot, the current subarray may contain many duplicates. Since quicksort places each pivot into its final position, we know that everything to the left of the current subarray is less than or equal to everything inside it. So finding that the element to the left and our pivot compare as equal is what signals that there are likely many duplicates here (there may not be that many if the pivot was selected poorly). This optimization can save significant time by letting us skip recursing on the left portion of the subarray after partitioning, since all of those values are equal to the pivot value and are therefore already sorted!

The code above shows that the algorithm skips recursing on the left portion after this partition. It just updates the starting point and continues to the next loop iteration to handle the right portion.
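A sketch of roughly what that equal-elements partition looks like: with the pivot value parked at data[a], everything equal to it is collected on the left, and the first index of the strictly greater part is returned:

func partitionEqualOrdered[E cmp.Ordered](data []E, a, b, pivot int) (newpivot int) {
    data[a], data[pivot] = data[pivot], data[a] // park the pivot value at the front
    i, j := a+1, b-1                            // both indices are inclusive

    for {
        // Advance past elements <= the pivot on the left...
        for i <= j && !cmp.Less(data[a], data[i]) {
            i++
        }
        // ...and retreat past elements > the pivot on the right.
        for i <= j && cmp.Less(data[a], data[j]) {
            j--
        }
        if i > j {
            break
        }
        data[i], data[j] = data[j], data[i]
        i++
        j--
    }
    return i
}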

Partitioning

// ... (pdqsortOrdered func)
mid, alreadyPartitioned := partitionOrdered(data, a, b, pivot)
wasPartitioned = alreadyPartitioned

The partitioning used is a straightforward implementation of the Hoare partition scheme, but it also returns a boolean that is true if no swaps were needed. As we saw before, this information is useful in the next iteration, since it contributes to deciding whether the slice may already be sorted.
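A sketch of roughly what partitionOrdered looks like: the pivot is parked at data[a], two inclusive indices sweep inward swapping misplaced pairs, and alreadyPartitioned is true when the very first sweep crosses without needing a swap:

func partitionOrdered[E cmp.Ordered](data []E, a, b, pivot int) (newpivot int, alreadyPartitioned bool) {
    data[a], data[pivot] = data[pivot], data[a] // park the pivot value at the front
    i, j := a+1, b-1                            // both indices are inclusive

    // First sweep: if the indices cross before any misplaced pair is
    // found, the subarray was already partitioned around the pivot.
    for i <= j && cmp.Less(data[i], data[a]) {
        i++
    }
    for i <= j && !cmp.Less(data[j], data[a]) {
        j--
    }
    if i > j {
        data[j], data[a] = data[a], data[j] // move the pivot to its final slot
        return j, true
    }
    data[i], data[j] = data[j], data[i]
    i++
    j--

    // Continue the Hoare-style sweeps until the indices cross.
    for {
        for i <= j && cmp.Less(data[i], data[a]) {
            i++
        }
        for i <= j && !cmp.Less(data[j], data[a]) {
            j--
        }
        if i > j {
            break
        }
        data[i], data[j] = data[j], data[i]
        i++
        j--
    }
    data[j], data[a] = data[a], data[j] // move the pivot to its final slot
    return j, false
}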

Recursing

// ... (pdqsortOrdered func)
leftLen, rightLen := mid-a, b-mid
balanceThreshold := length / 8
if leftLen < rightLen {
    wasBalanced = leftLen >= balanceThreshold
    pdqsortOrdered(data, a, mid, limit)
    a = mid + 1
} else {
    wasBalanced = rightLen >= balanceThreshold
    pdqsortOrdered(data, mid+1, b, limit)
    b = mid
}
// end of loop

This is the final block of code that makes up pdqsort. The main point of this structure is to recursively handle the smaller partition and handle the larger partition in the next iteration, which bounds the recursion depth to O(log n). We also get to see how the algorithm determines whether a partition was balanced, the metric that decides whether we should fall back to heapsort: the smaller partition must be at least 1/8th the size of the current subarray.

Conclusion

I think it was worthwhile and enjoyable walking through and understanding Go’s implementation of pdqsort. It helped that the Go standard library is very well written and that Go itself is easy to understand. I really enjoyed Tukey’s ninther in pivot selection (plus the clever increasing/decreasing hints) and the optimization that saves a left recursive step when there may be many duplicates. It’s interesting to see the optimizations and strategies, but also the simplicity, that go into a sorting algorithm robust enough to be used in a popular language’s standard library.

  1. Rust recently switched its unstable sort to ipnsort, which was developed with pdqsort as a starting point and was co-authored by Orson Peters, the author of pdqsort ↩︎
  2. The Go standard library seems to call slices.Sort under the hood if you use the sort package’s sort.Ints function ↩︎
  3. You can read George Marsaglia’s “Xorshift RNGs” paper if you’re interested in learning more about xorshift ↩︎
  4. If you check out other implementations you’ll find that it varies what the constants throughout the entire algorithm are set to ↩︎