Selection.java

  1. /*
  2.  * Licensed to the Apache Software Foundation (ASF) under one or more
  3.  * contributor license agreements.  See the NOTICE file distributed with
  4.  * this work for additional information regarding copyright ownership.
  5.  * The ASF licenses this file to You under the Apache License, Version 2.0
  6.  * (the "License"); you may not use this file except in compliance with
  7.  * the License.  You may obtain a copy of the License at
  8.  *
  9.  *      http://www.apache.org/licenses/LICENSE-2.0
  10.  *
  11.  * Unless required by applicable law or agreed to in writing, software
  12.  * distributed under the License is distributed on an "AS IS" BASIS,
  13.  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  14.  * See the License for the specific language governing permissions and
  15.  * limitations under the License.
  16.  */

  17. package org.apache.commons.numbers.arrays;

  18. /**
  19.  * Select indices in array data.
  20.  *
  21.  * <p>Arranges elements such that indices {@code k} correspond to their correctly
  22.  * sorted value in the equivalent fully sorted array. For all indices {@code k}
  23.  * and any index {@code i}:
  24.  *
  25.  * <pre>{@code
  26.  * data[i < k] <= data[k] <= data[k < i]
  27.  * }</pre>
  28.  *
  29.  * <p>Examples:
  30.  *
  31.  * <pre>
  32.  * data    [0, 1, 2, 1, 2, 5, 2, 3, 3, 6, 7, 7, 7, 7]
  33.  *
  34.  *
  35.  * k=4   : [0, 1, 2, 1], [2], [5, 2, 3, 3, 6, 7, 7, 7, 7]
  36.  * k=4,8 : [0, 1, 2, 1], [2], [3, 3, 2], [5], [6, 7, 7, 7, 7]
  37.  * </pre>
  38.  *
  39.  * <p>This implementation can select on multiple indices and will handle duplicate and
  40.  * unordered indices. The method detects ordered indices (with or without duplicates) and
  41.  * uses this during processing. Passing ordered indices is recommended if the order is already
  42.  * known; for example using uniform spacing through the array data, or to select the top and
  43.  * bottom {@code n} values from the data.
  44.  *
  45.  * <p>A quickselect adaptive method is used for single indices. This uses analysis of the
  46.  * partition sizes after each division to update the algorithm mode. If the partition
  47.  * containing the target does not sufficiently reduce in size then the algorithm is
  48.  * progressively changed to use partitions with guaranteed margins. This ensures a set fraction
  49.  * of data is eliminated each step and worse-case linear run time performance. This method can
  50.  * handle a range of indices {@code [ka, kb]} with a small separation by targeting the start of
  51.  * the range {@code ka} and then selecting the remaining elements {@code (ka, kb]} that are at
  52.  * the edge of the partition bounded by {@code ka}.
  53.  *
  54.  * <p>Multiple keys are partitioned collectively using an introsort method which only recurses
  55.  * into partitions containing indices. Excess recursion will trigger use of a heapselect
  56.  * on the remaining range of indices ensuring non-quadratic worse case performance. Any
  57.  * partition containing a single index, adjacent pair of indices, or range of indices with a
  58.  * small separation will use the quickselect adaptive method for single keys. Note that the
  59.  * maximum number of times that {@code n} indices can be split is {@code n - 1} before all
  60.  * indices are handled as singles.
  61.  *
  62.  * <p>Floating-point order
  63.  *
  64.  * <p>The {@code <} relation does not impose a total order on all floating-point values.
  65.  * This class respects the ordering imposed by {@link Double#compare(double, double)}.
  66.  * {@code -0.0} is treated as less than value {@code 0.0}; {@code Double.NaN} is
  67.  * considered greater than any other value; and all {@code Double.NaN} values are
  68.  * considered equal.
  69.  *
  70.  * <p>References
  71.  *
  72.  * <p>Quickselect is introduced in Hoare [1]. This selects an element {@code k} from {@code n}
  73.  * using repeat division of the data around a partition element, recursing into the
  74.  * partition that contains {@code k}.
  75.  *
  76.  * <p>Introsort/select is introduced in Musser [2]. This detects excess recursion in
  77.  * quicksort/select and reverts to a heapsort or linear select to achieve an improved worst
  78.  * case bound.
  79.  *
  80.  * <p>Use of sampling to identify a pivot that places {@code k} in the smaller partition is
  81.  * performed in the SELECT algorithm of Floyd and Rivest [3, 4].
  82.  *
  83.  * <p>A worst-case linear time algorithm PICK is described in Blum <i>et al</i> [5]. This uses
  84.  * the median of medians as a partition element for selection which ensures a minimum fraction of
  85.  * the elements are eliminated per iteration. This was extended to use an asymmetric pivot choice
  86.  * with efficient placement of the medians sample location in the QuickselectAdpative algorithm of
  87.  * Alexandrescu [6].
  88.  *
  89.  * <ol>
  90.  * <li>Hoare (1961)
  91.  * Algorithm 65: Find
  92.  * <a href="https://doi.org/10.1145%2F366622.366647">Comm. ACM. 4 (7): 321–322</a>
  93.  * <li>Musser (1999)
  94.  * Introspective Sorting and Selection Algorithms
  95.  * <a href="https://doi.org/10.1002/(SICI)1097-024X(199708)27:8%3C983::AID-SPE117%3E3.0.CO;2-%23">
  96.  * Software: Practice and Experience 27, 983-993.</a>
  97.  * <li>Floyd and Rivest (1975)
  98.  * Algorithm 489: The Algorithm SELECT—for Finding the ith Smallest of n elements.
  99.  * Comm. ACM. 18 (3): 173.
  100.  * <li>Kiwiel (2005)
  101.  * On Floyd and Rivest's SELECT algorithm.
  102.  * Theoretical Computer Science 347, 214-238.
  103.  * <li>Blum, Floyd, Pratt, Rivest, and Tarjan (1973)
  104.  * Time bounds for selection.
  105.  * <a href="https://doi.org/10.1016%2FS0022-0000%2873%2980033-9">
  106.  * Journal of Computer and System Sciences. 7 (4): 448–461</a>.
  107.  * <li>Alexandrescu (2016)
  108.  * Fast Deterministic Selection
  109.  * <a href="https://arxiv.org/abs/1606.00484">arXiv:1606.00484</a>.
  110.  * <li><a href="https://en.wikipedia.org/wiki/Quickselect">Quickselect (Wikipedia)</a>
  111.  * <li><a href="https://en.wikipedia.org/wiki/Introsort">Introsort (Wikipedia)</a>
  112.  * <li><a href="https://en.wikipedia.org/wiki/Introselect">Introselect (Wikipedia)</a>
  113.  * <li><a href="https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm">Floyd-Rivest algorithm (Wikipedia)</a>
  114.  * <li><a href="https://en.wikipedia.org/wiki/Median_of_medians">Median of medians (Wikipedia)</a>
  115.  * </ol>
  116.  *
  117.  * @since 1.2
  118.  */
  119. public final class Selection {

  120.     /** No instances. */
  121.     private Selection() {}

  122.     /**
  123.      * Partition the array such that index {@code k} corresponds to its correctly
  124.      * sorted value in the equivalent fully sorted array.
  125.      *
  126.      * @param a Values.
  127.      * @param k Index.
  128.      * @throws IndexOutOfBoundsException if index {@code k} is not within the
  129.      * sub-range {@code [0, a.length)}
  130.      */
  131.     public static void select(double[] a, int k) {
  132.         IndexSupport.checkIndex(0, a.length, k);
  133.         doSelect(a, 0, a.length, k);
  134.     }

  135.     /**
  136.      * Partition the array such that indices {@code k} correspond to their correctly
  137.      * sorted value in the equivalent fully sorted array.
  138.      *
  139.      * @param a Values.
  140.      * @param k Indices (may be destructively modified).
  141.      * @throws IndexOutOfBoundsException if any index {@code k} is not within the
  142.      * sub-range {@code [0, a.length)}
  143.      */
  144.     public static void select(double[] a, int[] k) {
  145.         IndexSupport.checkIndices(0, a.length, k);
  146.         doSelect(a, 0, a.length, k);
  147.     }

  148.     /**
  149.      * Partition the array such that index {@code k} corresponds to its correctly
  150.      * sorted value in the equivalent fully sorted array.
  151.      *
  152.      * @param a Values.
  153.      * @param fromIndex Index of the first element (inclusive).
  154.      * @param toIndex Index of the last element (exclusive).
  155.      * @param k Index.
  156.      * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
  157.      * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the
  158.      * sub-range {@code [fromIndex, toIndex)}
  159.      */
  160.     public static void select(double[] a, int fromIndex, int toIndex, int k) {
  161.         IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
  162.         IndexSupport.checkIndex(fromIndex, toIndex, k);
  163.         doSelect(a, fromIndex, toIndex, k);
  164.     }

  165.     /**
  166.      * Partition the array such that indices {@code k} correspond to their correctly
  167.      * sorted value in the equivalent fully sorted array.
  168.      *
  169.      * @param a Values.
  170.      * @param fromIndex Index of the first element (inclusive).
  171.      * @param toIndex Index of the last element (exclusive).
  172.      * @param k Indices (may be destructively modified).
  173.      * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
  174.      * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the
  175.      * sub-range {@code [fromIndex, toIndex)}
  176.      */
  177.     public static void select(double[] a, int fromIndex, int toIndex, int[] k) {
  178.         IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
  179.         IndexSupport.checkIndices(fromIndex, toIndex, k);
  180.         doSelect(a, fromIndex, toIndex, k);
  181.     }

  182.     /**
  183.      * Partition the array such that index {@code k} corresponds to its correctly
  184.      * sorted value in the equivalent fully sorted array.
  185.      *
  186.      * <p>This method pre/post-processes the data and indices to respect the ordering
  187.      * imposed by {@link Double#compare(double, double)}.
  188.      *
  189.      * @param fromIndex Index of the first element (inclusive).
  190.      * @param toIndex Index of the last element (exclusive).
  191.      * @param a Values.
  192.      * @param k Index.
  193.      */
  194.     private static void doSelect(double[] a, int fromIndex, int toIndex, int k) {
  195.         if (toIndex - fromIndex <= 1) {
  196.             return;
  197.         }
  198.         // Sort NaN / count signed zeros.
  199.         // Caution: This loop contributes significantly to the runtime.
  200.         int cn = 0;
  201.         int end = toIndex;
  202.         for (int i = toIndex; --i >= fromIndex;) {
  203.             final double v = a[i];
  204.             // Count negative zeros using a sign bit check
  205.             if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) {
  206.                 cn++;
  207.                 // Change to positive zero.
  208.                 // Data must be repaired after selection.
  209.                 a[i] = 0.0;
  210.             } else if (v != v) {
  211.                 // Move NaN to end
  212.                 a[i] = a[--end];
  213.                 a[end] = v;
  214.             }
  215.         }

  216.         // Partition
  217.         if (end - fromIndex > 1 && k < end) {
  218.             QuickSelect.select(a, fromIndex, end - 1, k);
  219.         }

  220.         // Restore signed zeros
  221.         if (cn != 0) {
  222.             // Use partition index below zero to fast-forward to zero as much as possible
  223.             for (int j = a[k] < 0 ? k : -1;;) {
  224.                 if (a[++j] == 0) {
  225.                     a[j] = -0.0;
  226.                     if (--cn == 0) {
  227.                         break;
  228.                     }
  229.                 }
  230.             }
  231.         }
  232.     }

  233.     /**
  234.      * Partition the array such that indices {@code k} correspond to their correctly
  235.      * sorted value in the equivalent fully sorted array.
  236.      *
  237.      * <p>This method pre/post-processes the data and indices to respect the ordering
  238.      * imposed by {@link Double#compare(double, double)}.
  239.      *
  240.      * @param fromIndex Index of the first element (inclusive).
  241.      * @param toIndex Index of the last element (exclusive).
  242.      * @param a Values.
  243.      * @param k Indices (may be destructively modified).
  244.      */
  245.     private static void doSelect(double[] a, int fromIndex, int toIndex, int[] k) {
  246.         if (k.length == 0 || toIndex - fromIndex <= 1) {
  247.             return;
  248.         }
  249.         // Sort NaN / count signed zeros.
  250.         // Caution: This loop contributes significantly to the runtime for single indices.
  251.         int cn = 0;
  252.         int end = toIndex;
  253.         for (int i = toIndex; --i >= fromIndex;) {
  254.             final double v = a[i];
  255.             // Count negative zeros using a sign bit check
  256.             if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) {
  257.                 cn++;
  258.                 // Change to positive zero.
  259.                 // Data must be repaired after selection.
  260.                 a[i] = 0.0;
  261.             } else if (v != v) {
  262.                 // Move NaN to end
  263.                 a[i] = a[--end];
  264.                 a[end] = v;
  265.             }
  266.         }

  267.         // Partition
  268.         int n = 0;
  269.         if (end - fromIndex > 1) {
  270.             n = k.length;
  271.             // Filter indices invalidated by NaN check
  272.             if (end < toIndex) {
  273.                 for (int i = n; --i >= 0;) {
  274.                     final int index = k[i];
  275.                     if (index >= end) {
  276.                         // Move to end
  277.                         k[i] = k[--n];
  278.                         k[n] = index;
  279.                     }
  280.                 }
  281.             }
  282.             // Return n, the count of used indices in k.
  283.             // Use this to post-process zeros.
  284.             n = QuickSelect.select(a, fromIndex, end - 1, k, n);
  285.         }

  286.         // Restore signed zeros
  287.         if (cn != 0) {
  288.             // Use partition indices below zero to fast-forward to zero as much as possible
  289.             int j = -1;
  290.             if (n < 0) {
  291.                 // Binary search on -n sorted indices: hi = (-n) - 1
  292.                 int lo = 0;
  293.                 int hi = ~n;
  294.                 while (lo <= hi) {
  295.                     final int mid = (lo + hi) >>> 1;
  296.                     if (a[k[mid]] < 0) {
  297.                         j = mid;
  298.                         lo = mid + 1;
  299.                     } else {
  300.                         hi = mid - 1;
  301.                     }
  302.                 }
  303.             } else {
  304.                 // Unsorted, process all indices
  305.                 for (int i = n; --i >= 0;) {
  306.                     if (a[k[i]] < 0) {
  307.                         j = k[i];
  308.                     }
  309.                 }
  310.             }
  311.             for (;;) {
  312.                 if (a[++j] == 0) {
  313.                     a[j] = -0.0;
  314.                     if (--cn == 0) {
  315.                         break;
  316.                     }
  317.                 }
  318.             }
  319.         }
  320.     }

  321.     /**
  322.      * Partition the array such that index {@code k} corresponds to its correctly
  323.      * sorted value in the equivalent fully sorted array.
  324.      *
  325.      * @param a Values.
  326.      * @param k Index.
  327.      * @throws IndexOutOfBoundsException if index {@code k} is not within the
  328.      * sub-range {@code [0, a.length)}
  329.      */
  330.     public static void select(int[] a, int k) {
  331.         IndexSupport.checkIndex(0, a.length, k);
  332.         if (a.length <= 1) {
  333.             return;
  334.         }
  335.         QuickSelect.select(a, 0, a.length - 1, k);
  336.     }

  337.     /**
  338.      * Partition the array such that indices {@code k} correspond to their correctly
  339.      * sorted value in the equivalent fully sorted array.
  340.      *
  341.      * @param a Values.
  342.      * @param k Indices (may be destructively modified).
  343.      * @throws IndexOutOfBoundsException if any index {@code k} is not within the
  344.      * sub-range {@code [0, a.length)}
  345.      */
  346.     public static void select(int[] a, int[] k) {
  347.         IndexSupport.checkIndices(0, a.length, k);
  348.         if (k.length == 0 || a.length <= 1) {
  349.             return;
  350.         }
  351.         QuickSelect.select(a, 0, a.length - 1, k, k.length);
  352.     }

  353.     /**
  354.      * Partition the array such that index {@code k} corresponds to its correctly
  355.      * sorted value in the equivalent fully sorted array.
  356.      *
  357.      * @param a Values.
  358.      * @param fromIndex Index of the first element (inclusive).
  359.      * @param toIndex Index of the last element (exclusive).
  360.      * @param k Index.
  361.      * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
  362.      * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the
  363.      * sub-range {@code [fromIndex, toIndex)}
  364.      */
  365.     public static void select(int[] a, int fromIndex, int toIndex, int k) {
  366.         IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
  367.         IndexSupport.checkIndex(fromIndex, toIndex, k);
  368.         if (toIndex - fromIndex <= 1) {
  369.             return;
  370.         }
  371.         QuickSelect.select(a, fromIndex, toIndex - 1, k);
  372.     }

  373.     /**
  374.      * Partition the array such that indices {@code k} correspond to their correctly
  375.      * sorted value in the equivalent fully sorted array.
  376.      *
  377.      * @param a Values.
  378.      * @param fromIndex Index of the first element (inclusive).
  379.      * @param toIndex Index of the last element (exclusive).
  380.      * @param k Indices (may be destructively modified).
  381.      * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
  382.      * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the
  383.      * sub-range {@code [fromIndex, toIndex)}
  384.      */
  385.     public static void select(int[] a, int fromIndex, int toIndex, int[] k) {
  386.         IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
  387.         IndexSupport.checkIndices(fromIndex, toIndex, k);
  388.         if (k.length == 0 || toIndex - fromIndex <= 1) {
  389.             return;
  390.         }
  391.         QuickSelect.select(a, fromIndex, toIndex - 1, k, k.length);
  392.     }
  393. }