Monday, October 18, 2021

Merge sort with O(sqrt(n)) auxiliary memory complexity (and even less)

This text describes an algorithm that merges two sorted arrays into one sorted array. It runs in O(n) time, uses O(sqrt(n)) auxiliary memory, and is stable. If this algorithm is used for merge sort, the resulting sorting algorithm is stable, runs in O(nlog(n)) time, and uses O(sqrt(n)) auxiliary memory (even less in some cases).

 

I hope the reader knows how classical merge sort works. That knowledge is necessary to understand this text, because the described algorithm is a modification of the array merging step used inside merge sort.

The essential idea of the algorithm is to reuse the memory that is released while sub-arrays A and B are being merged. To manage that memory, the array is divided into pages of equal size and a page table is used. Every action on a page (filling, copying, moving, swapping) updates the corresponding entries of the page table. Moving (copying, swapping) pages means moving the data inside those pages; the pages themselves are just coordinates and keep their initial locations.

 

Now, the description of the algorithm.

1. The function receives an array (or array interval) that contains two sorted sub-arrays A and B, a temporary buffer 3 pages in size, a page table, the page size, and parameters that indicate the boundary and sizes of sub-arrays A and B. The temporary buffer and the page table are external to the algorithm and are created before the sort is launched.

2. Divide the array into pages. The array, pages, buffer and page table will then look as shown below.

3. Find a free page and merge sub-arrays A and B into it until the page is filled or the elements of A and B run out. Once the page is filled, update its entry in the page table with the position the page should occupy in the final sorted array, that is, where the page will be moved later.

Merging here is the same process as in classical merge sort and follows the same rules.

A free page can be found by comparing the current element index with the index where free space starts in each sub-array, checking A first and then B.

4. Repeat step 3 while there are elements left in sub-arrays A and B.

At the beginning there are three free pages in the buffer. They are filled first. After they are filled, more free pages appear in sub-arrays A and B. See the figure below.

Pages that become free are filled in turn. Somewhere in the middle of the algorithm's run, right after the next page is filled, the data will look as below.

After all elements from sub-arrays A and B have been merged, the data will look as below.

5. Move the last filled page (number 9 on the figure above) to the end of the array. In general that page is only partly filled, yet it occupies an entire page. It needs to be placed in its final position (and forgotten) in order to simplify working with the remaining pages. The data will then look as on the figure below.

Then move the three pages from the buffer into three free pages of the array (pages 4, 5, 9 are free in the picture above); this also simplifies the remaining work. It is not strictly necessary: skipping it might be slightly faster, but more complex. While moving pages, update the entries in the page table. The data will then look as below.

6. Now rearrange the pages into their correct positions. To do this, inspect all entries in the page table from beginning to end. Read the page number where the current page should move. If that number equals the current page number, the page is already in the correct position; go to the next entry in the page table. If the numbers differ, swap the current page with the destination page and update both entries in the page table. Then compare the page numbers again, swap again if needed, and so on.

The step-by-step process of rearranging pages is shown in the example below (10 figures total).

 

After step 5, just before rearranging pages (below)

Pages 1 and 4 swapped (below)

Page 1 is in correct position, go to the next entry (below)

Pages 2 and 6 swapped (below)

Pages 2 and 5 swapped (below)

Page 2 is in correct position, go to the next entry (below)

Pages 3 and 7 swapped (below)

Pages 3 and 8 swapped (below)

Pages 3 and 9 swapped (below)

Page 3 is in correct position, go to the next entry (below)

And so on, until all entries in the page table have been inspected.
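The whole procedure, steps 2 through 6, can be sketched in Python. This is a simplified illustration, not the author's implementation: it assumes the array length and the A/B boundary are multiples of the page size (so there is no partly filled page, and step 5 reduces to moving the buffer pages into the free slots), and it models the array plus buffer as one extended list.

```python
def merge_sorted_halves(arr, mid, p):
    """Paged merge sketch: stable-merges sorted arr[:mid] and arr[mid:]
    using a 3-page buffer and a page table, then rearranges pages.
    Simplifying assumption: len(arr) and mid are multiples of the page
    size p (real code must handle ragged page boundaries)."""
    n = len(arr)
    assert p > 0 and n % p == 0 and mid % p == 0
    npages = n // p
    ext = arr + [None] * (3 * p)            # array followed by 3-page buffer
    table = [None] * (npages + 3)           # table[slot] = final page number
    free = [npages, npages + 1, npages + 2]  # buffer pages are free first
    i, j = 0, mid                           # read heads of sub-arrays A and B
    freed_a = freed_b = 0                   # input pages already released
    for out in range(npages):               # steps 3-4: fill free pages
        slot = free.pop(0)
        for t in range(slot * p, slot * p + p):
            if j >= n or (i < mid and ext[i] <= ext[j]):  # stable: A wins ties
                ext[t] = ext[i]; i += 1
            else:
                ext[t] = ext[j]; j += 1
        table[slot] = out
        while (freed_a + 1) * p <= i:       # pages of A now fully read
            free.append(freed_a); freed_a += 1
        while mid + (freed_b + 1) * p <= j:  # pages of B now fully read
            free.append(mid // p + freed_b); freed_b += 1
    # step 5 analog: the first three output pages sit in the buffer; move
    # them into the three array slots that remain free at the end
    for b, f in zip((npages, npages + 1, npages + 2), free):
        ext[f * p:(f + 1) * p] = ext[b * p:(b + 1) * p]
        table[f] = table[b]
    # step 6: rearrange pages by following the page table
    for s in range(npages):
        while table[s] != s:
            d = table[s]
            ext[s*p:(s+1)*p], ext[d*p:(d+1)*p] = ext[d*p:(d+1)*p], ext[s*p:(s+1)*p]
            table[s], table[d] = table[d], table[s]
    arr[:] = ext[:n]
```

An invariant worth noting: after m pages are filled, m*p elements have been consumed, so at least m - 1 input pages plus the 3 buffer pages have been freed, and a free page is always available in step 3.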

 

 

Auxiliary memory complexity

Memory is required for the three buffer pages and the page table. If the array has n elements and a page holds p elements, the page table contains n/p entries, so the additional memory needed is O(p + n/p). To minimize it, take the page size p = sqrt(n*Size_E / (3*Size_T)) + 1, where Size_T is the size of one array element, Size_E is the size of one page table entry, 3 is the number of pages in the buffer, and +1 ensures p > 0. The auxiliary memory is then O(sqrt(n) + n/sqrt(n)) = O(sqrt(n)).
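The formula above is easy to evaluate; here is a small worked example in Python (the element and entry sizes are illustrative, not prescribed by the text):

```python
import math

def optimal_page_size(n, elem_bytes, entry_bytes, buffer_pages=3):
    # Page size p minimizing total auxiliary bytes:
    #   buffer_pages * p * elem_bytes  +  (n / p) * entry_bytes
    return int(math.sqrt(n * entry_bytes / (buffer_pages * elem_bytes))) + 1

# Worked example: 1M elements of 8 bytes each, 4-byte page table entries.
p = optimal_page_size(1_000_000, elem_bytes=8, entry_bytes=4)   # p = 409
aux_bytes = 3 * p * 8 + (1_000_000 // p) * 4                    # 19592 bytes
```

So an 8 MB array needs only about 19.6 KB of auxiliary memory, in line with the O(sqrt(n)) bound.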

 

The description of the algorithm ends here. Some modifications follow below.

 

How to reduce auxiliary memory complexity to O(log(n))

Some memory is needed for the page table, which stores page-related information. In some cases that information can be stored inside the pages themselves, because pages may contain redundant information. For example, if the pages contain sorted numbers, then almost all pages contain numbers of the same sign, and the sign bit can be used to store additional information.

Another example is pointers. If the array contains pointers and the pointers are aligned (usually true), then several of the least significant bits can be used to store additional information. In the case of 64-bit pointers, some of the most significant bits can be used too.
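The tagged-pointer trick can be sketched as follows; addresses are modeled as plain integers, and the 8-byte alignment is an assumption for the example:

```python
# An 8-byte-aligned address always has three zero low bits,
# so 3 tag bits per element can ride along with the pointer value.
ALIGN = 8
TAG_MASK = ALIGN - 1          # 0b111

def tag(addr, bits):
    """Pack a small tag into the unused low bits of an aligned address."""
    assert addr % ALIGN == 0 and 0 <= bits <= TAG_MASK
    return addr | bits

def untag(tagged):
    """Recover the original address and the tag."""
    return tagged & ~TAG_MASK, tagged & TAG_MASK
```

A real implementation must clear the tag bits before dereferencing, which is exactly why the data has to be checked for suitability first.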

Another way to store additional information is by swapping elements. It is simple if all elements are distinct, but complicated if some elements are equal.

 

How much auxiliary memory is needed when information is stored in the pages?

A page of size p can store c*p bits of extra information, enough to handle 2^(cp/k) pages, where c is determined by the method of storing extra information and k by the amount of information needed per page. This allows an array length of up to n = p*2^(cp/k), so log(n) = log(p) + cp/k, or approximately p = log(n)*k/c. This is the smallest possible page size. Since the sorting algorithm requires a 3-page buffer and the page table is no longer needed, the auxiliary memory complexity is O(3*log(n)*k/c) = O(log(n)).
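To make the bound concrete, the smallest workable page size for given n, k and c can be found numerically; the linear search below is only for illustration (the values of k and c depend on the data representation):

```python
def min_page_size(n, k, c=1):
    """Smallest page size p whose c*p tag bits can address all n/p pages,
    i.e. the smallest p with p * 2**(c*p / k) >= n, where k is the number
    of bookkeeping bits needed per page. Linear search, for illustration."""
    p = 1
    while p * 2 ** (c * p / k) < n:
        p += 1
    return p
```

For n = 10^9 with k = 32 and c = 1 this gives a page of a few hundred elements, far smaller than the sqrt(n)-sized page of the basic variant.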

 

This variant of the algorithm is not universal (except for the element-swapping approach, which is complex to implement). There are data limitations, and there may be architecture and language limitations as well. Before running this variant it is necessary to check that the data are suitable, for example by checking the least significant bits of every pointer in the array.

 

Reducing memory a bit more

Similarly to storing extra information in pages, it is possible to store some array elements inside other elements. In other words, pack the data in the array; the released memory can then be used as the buffer. Auxiliary memory complexity becomes O(1) in this variant. There are more limitations here: the array must be large enough to release enough memory for the buffer. There are also data type limitations: plain numbers are no longer suitable, because the packed data is not sorted. This variant suits elements that have unused bits, for example aligned pointers or integers with a narrowed range. The data should be checked before running this variant.

 

Some modifications

There are several improvements and modifications to this algorithm. Below are some of them.

1. If sub-array A is aligned to the page size, then two buffer pages are enough instead of three. In other words, by adjusting the size of sub-array A it is possible to reduce auxiliary memory by a factor of sqrt(3/2). The optimal page size grows by the same factor of sqrt(3/2), which reduces the time spent handling pages.

2. Obviously, page numbers do not need to be stored anywhere: they are determined by the pages' locations and are easily calculated. Page numbers start from 0 (zero-based), unlike what is shown on the figures.

3. If sub-array A fits entirely into the buffer, use the modified classical merge (which requires n/2 auxiliary memory): move sub-array A into the buffer, then merge A and B into the array. The same applies to sub-array B, though it is a bit more complex to implement.
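The half-memory merge mentioned in this modification is a standard technique; a minimal sketch (the buffer is passed in, as in the main algorithm):

```python
def merge_small_a(arr, mid, buf):
    """When sub-array A = arr[:mid] fits into buf, copy A out and merge
    straight back into arr. Classic half-memory merge; stable."""
    assert mid <= len(buf)
    n = len(arr)
    buf[:mid] = arr[:mid]                 # move A into the buffer
    i, j, out = 0, mid, 0
    while i < mid and j < n:
        if buf[i] <= arr[j]:              # stable: A wins ties
            arr[out] = buf[i]; i += 1
        else:
            arr[out] = arr[j]; j += 1
        out += 1
    arr[out:out + mid - i] = buf[i:mid]   # leftover of A; B's tail is in place
```

The write position always trails the read position in B (out < j while A has elements left), so nothing unread is ever overwritten.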

4. If one of the sub-arrays A or B runs out of elements, merging can be stopped early. Handling such cases makes the algorithm more complex.

5. When swapping pages in step 6 of the algorithm, check that the destination page numbers differ. If they are equal, two pages point to the same final position, which is incorrect. Another approach is to limit the number of page swaps: if the number of swaps exceeds the expected limit, throw an exception. An exception is better than hanging.
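The swap-limit guard is easy to add to the step-6 loop. A sketch, with the page table as a plain list and the actual data movement abstracted into a callback: a valid table is a permutation, and rearranging a permutation never needs more than len(table) - 1 swaps, so exceeding the limit proves the table is corrupt.

```python
def rearrange_pages(table, swap_page):
    """Step-6 rearrangement with the safety check of modification 5.
    table[s] is the final page number of the page in slot s; swap_page(s, d)
    must physically swap the data of slots s and d. Returns the swap count."""
    limit = len(table)
    swaps = 0
    for s in range(len(table)):
        while table[s] != s:
            if swaps >= limit:
                raise RuntimeError("page table is not a permutation")
            d = table[s]
            swap_page(s, d)
            table[s], table[d] = table[d], table[s]
            swaps += 1
    return swaps
```

With a corrupt table (two entries pointing at the same destination) the inner loop would otherwise spin forever; the guard turns that hang into an exception.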

6. Sub-arrays A and B can be merged directly into their final positions. To do this, release the destination page by moving its unprocessed elements of sub-array A (possibly of sub-array B too) into a free page, then merge into the destination page. The page table then needs to store the numbers of the next and previous pages. In this variant there is no need to rearrange pages in step 6, and the elements of sub-array B move only once instead of twice, which adds some performance.

7. While rearranging pages (step 6 of the algorithm), build a chain of moves until a free page is met, then move the pages into free pages, starting from the last page in the chain. The chain of moves can be stored in one of the buffer pages, which are free by this moment, if the data type allows it.

If the chain forms a loop, break the loop by moving any of its pages into the buffer, for example the page where the loop was detected. The loop length is worth analyzing: if the loop is short, page swaps are more efficient than moves to free space. For example, if the loop contains 2 pages, it is faster to simply swap them.

If the chain is too long, break it by moving any page into the buffer, for example the page at the end of the chain. After arranging the pages in the chain, move the buffered page to free space, or start another chain from that page.

8. The page table may additionally store, for each page, the number of the page that will move into it. In this case, when rearranging pages (step 6 of the algorithm), find free pages and move into them the pages that belong there. That is, pages are copied into free space instead of swapped, which makes the chain of moves from the previous point unnecessary. If there are no free pages in the array, free a page that is not in its final position by moving it into the buffer.

 

Modifications 6, 7 and 8 may improve speed if copying pages into free space is faster than swapping pages.

 

References.

1. Merge sort algorithm. https://en.wikipedia.org/wiki/Merge_sort

2. Storing extra data in pointers. https://en.wikipedia.org/wiki/Tagged_pointer
