1 /* 2 * Licensed to the Apache Software Foundation (ASF) under one or more 3 * contributor license agreements. See the NOTICE file distributed with 4 * this work for additional information regarding copyright ownership. 5 * The ASF licenses this file to You under the Apache License, Version 2.0 6 * (the "License"); you may not use this file except in compliance with 7 * the License. You may obtain a copy of the License at 8 * 9 * http://www.apache.org/licenses/LICENSE-2.0 10 * 11 * Unless required by applicable law or agreed to in writing, software 12 * distributed under the License is distributed on an "AS IS" BASIS, 13 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 * See the License for the specific language governing permissions and 15 * limitations under the License. 16 */ 17 18 package org.apache.commons.numbers.arrays; 19 20 /** 21 * Select indices in array data. 22 * 23 * <p>Arranges elements such that indices {@code k} correspond to their correctly 24 * sorted value in the equivalent fully sorted array. For all indices {@code k} 25 * and any index {@code i}: 26 * 27 * <pre>{@code 28 * data[i < k] <= data[k] <= data[k < i] 29 * }</pre> 30 * 31 * <p>Examples: 32 * 33 * <pre> 34 * data [0, 1, 2, 1, 2, 5, 2, 3, 3, 6, 7, 7, 7, 7] 35 * 36 * 37 * k=4 : [0, 1, 2, 1], [2], [5, 2, 3, 3, 6, 7, 7, 7, 7] 38 * k=4,8 : [0, 1, 2, 1], [2], [3, 3, 2], [5], [6, 7, 7, 7, 7] 39 * </pre> 40 * 41 * <p>This implementation can select on multiple indices and will handle duplicate and 42 * unordered indices. The method detects ordered indices (with or without duplicates) and 43 * uses this during processing. Passing ordered indices is recommended if the order is already 44 * known; for example using uniform spacing through the array data, or to select the top and 45 * bottom {@code n} values from the data. 46 * 47 * <p>A quickselect adaptive method is used for single indices. This uses analysis of the 48 * partition sizes after each division to update the algorithm mode. If the partition 49 * containing the target does not sufficiently reduce in size then the algorithm is 50 * progressively changed to use partitions with guaranteed margins. This ensures a set fraction 51 * of data is eliminated each step and worse-case linear run time performance. This method can 52 * handle a range of indices {@code [ka, kb]} with a small separation by targeting the start of 53 * the range {@code ka} and then selecting the remaining elements {@code (ka, kb]} that are at 54 * the edge of the partition bounded by {@code ka}. 55 * 56 * <p>Multiple keys are partitioned collectively using an introsort method which only recurses 57 * into partitions containing indices. Excess recursion will trigger use of a heapselect 58 * on the remaining range of indices ensuring non-quadratic worse case performance. Any 59 * partition containing a single index, adjacent pair of indices, or range of indices with a 60 * small separation will use the quickselect adaptive method for single keys. Note that the 61 * maximum number of times that {@code n} indices can be split is {@code n - 1} before all 62 * indices are handled as singles. 63 * 64 * <p>Floating-point order 65 * 66 * <p>The {@code <} relation does not impose a total order on all floating-point values. 67 * This class respects the ordering imposed by {@link Double#compare(double, double)}. 68 * {@code -0.0} is treated as less than value {@code 0.0}; {@code Double.NaN} is 69 * considered greater than any other value; and all {@code Double.NaN} values are 70 * considered equal. 71 * 72 * <p>References 73 * 74 * <p>Quickselect is introduced in Hoare [1]. This selects an element {@code k} from {@code n} 75 * using repeat division of the data around a partition element, recursing into the 76 * partition that contains {@code k}. 77 * 78 * <p>Introsort/select is introduced in Musser [2]. This detects excess recursion in 79 * quicksort/select and reverts to a heapsort or linear select to achieve an improved worst 80 * case bound. 81 * 82 * <p>Use of sampling to identify a pivot that places {@code k} in the smaller partition is 83 * performed in the SELECT algorithm of Floyd and Rivest [3, 4]. 84 * 85 * <p>A worst-case linear time algorithm PICK is described in Blum <i>et al</i> [5]. This uses 86 * the median of medians as a partition element for selection which ensures a minimum fraction of 87 * the elements are eliminated per iteration. This was extended to use an asymmetric pivot choice 88 * with efficient placement of the medians sample location in the QuickselectAdpative algorithm of 89 * Alexandrescu [6]. 90 * 91 * <ol> 92 * <li>Hoare (1961) 93 * Algorithm 65: Find 94 * <a href="https://doi.org/10.1145%2F366622.366647">Comm. ACM. 4 (7): 321–322</a> 95 * <li>Musser (1999) 96 * Introspective Sorting and Selection Algorithms 97 * <a href="https://doi.org/10.1002/(SICI)1097-024X(199708)27:8%3C983::AID-SPE117%3E3.0.CO;2-%23"> 98 * Software: Practice and Experience 27, 983-993.</a> 99 * <li>Floyd and Rivest (1975) 100 * Algorithm 489: The Algorithm SELECT—for Finding the ith Smallest of n elements. 101 * Comm. ACM. 18 (3): 173. 102 * <li>Kiwiel (2005) 103 * On Floyd and Rivest's SELECT algorithm. 104 * Theoretical Computer Science 347, 214-238. 105 * <li>Blum, Floyd, Pratt, Rivest, and Tarjan (1973) 106 * Time bounds for selection. 107 * <a href="https://doi.org/10.1016%2FS0022-0000%2873%2980033-9"> 108 * Journal of Computer and System Sciences. 7 (4): 448–461</a>. 109 * <li>Alexandrescu (2016) 110 * Fast Deterministic Selection 111 * <a href="https://arxiv.org/abs/1606.00484">arXiv:1606.00484</a>. 112 * <li><a href="https://en.wikipedia.org/wiki/Quickselect">Quickselect (Wikipedia)</a> 113 * <li><a href="https://en.wikipedia.org/wiki/Introsort">Introsort (Wikipedia)</a> 114 * <li><a href="https://en.wikipedia.org/wiki/Introselect">Introselect (Wikipedia)</a> 115 * <li><a href="https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm">Floyd-Rivest algorithm (Wikipedia)</a> 116 * <li><a href="https://en.wikipedia.org/wiki/Median_of_medians">Median of medians (Wikipedia)</a> 117 * </ol> 118 * 119 * @since 1.2 120 */ 121 public final class Selection { 122 123 /** No instances. */ 124 private Selection() {} 125 126 /** 127 * Partition the array such that index {@code k} corresponds to its correctly 128 * sorted value in the equivalent fully sorted array. 129 * 130 * @param a Values. 131 * @param k Index. 132 * @throws IndexOutOfBoundsException if index {@code k} is not within the 133 * sub-range {@code [0, a.length)} 134 */ 135 public static void select(double[] a, int k) { 136 IndexSupport.checkIndex(0, a.length, k); 137 doSelect(a, 0, a.length, k); 138 } 139 140 /** 141 * Partition the array such that indices {@code k} correspond to their correctly 142 * sorted value in the equivalent fully sorted array. 143 * 144 * @param a Values. 145 * @param k Indices (may be destructively modified). 146 * @throws IndexOutOfBoundsException if any index {@code k} is not within the 147 * sub-range {@code [0, a.length)} 148 */ 149 public static void select(double[] a, int[] k) { 150 IndexSupport.checkIndices(0, a.length, k); 151 doSelect(a, 0, a.length, k); 152 } 153 154 /** 155 * Partition the array such that index {@code k} corresponds to its correctly 156 * sorted value in the equivalent fully sorted array. 157 * 158 * @param a Values. 159 * @param fromIndex Index of the first element (inclusive). 160 * @param toIndex Index of the last element (exclusive). 161 * @param k Index. 162 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 163 * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the 164 * sub-range {@code [fromIndex, toIndex)} 165 */ 166 public static void select(double[] a, int fromIndex, int toIndex, int k) { 167 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 168 IndexSupport.checkIndex(fromIndex, toIndex, k); 169 doSelect(a, fromIndex, toIndex, k); 170 } 171 172 /** 173 * Partition the array such that indices {@code k} correspond to their correctly 174 * sorted value in the equivalent fully sorted array. 175 * 176 * @param a Values. 177 * @param fromIndex Index of the first element (inclusive). 178 * @param toIndex Index of the last element (exclusive). 179 * @param k Indices (may be destructively modified). 180 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 181 * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the 182 * sub-range {@code [fromIndex, toIndex)} 183 */ 184 public static void select(double[] a, int fromIndex, int toIndex, int[] k) { 185 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 186 IndexSupport.checkIndices(fromIndex, toIndex, k); 187 doSelect(a, fromIndex, toIndex, k); 188 } 189 190 /** 191 * Partition the array such that index {@code k} corresponds to its correctly 192 * sorted value in the equivalent fully sorted array. 193 * 194 * <p>This method pre/post-processes the data and indices to respect the ordering 195 * imposed by {@link Double#compare(double, double)}. 196 * 197 * @param fromIndex Index of the first element (inclusive). 198 * @param toIndex Index of the last element (exclusive). 199 * @param a Values. 200 * @param k Index. 201 */ 202 private static void doSelect(double[] a, int fromIndex, int toIndex, int k) { 203 if (toIndex - fromIndex <= 1) { 204 return; 205 } 206 // Sort NaN / count signed zeros. 207 // Caution: This loop contributes significantly to the runtime. 208 int cn = 0; 209 int end = toIndex; 210 for (int i = toIndex; --i >= fromIndex;) { 211 final double v = a[i]; 212 // Count negative zeros using a sign bit check 213 if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) { 214 cn++; 215 // Change to positive zero. 216 // Data must be repaired after selection. 217 a[i] = 0.0; 218 } else if (v != v) { 219 // Move NaN to end 220 a[i] = a[--end]; 221 a[end] = v; 222 } 223 } 224 225 // Partition 226 if (end - fromIndex > 1 && k < end) { 227 QuickSelect.select(a, fromIndex, end - 1, k); 228 } 229 230 // Restore signed zeros 231 if (cn != 0) { 232 // Use partition index below zero to fast-forward to zero as much as possible 233 for (int j = a[k] < 0 ? k : -1;;) { 234 if (a[++j] == 0) { 235 a[j] = -0.0; 236 if (--cn == 0) { 237 break; 238 } 239 } 240 } 241 } 242 } 243 244 /** 245 * Partition the array such that indices {@code k} correspond to their correctly 246 * sorted value in the equivalent fully sorted array. 247 * 248 * <p>This method pre/post-processes the data and indices to respect the ordering 249 * imposed by {@link Double#compare(double, double)}. 250 * 251 * @param fromIndex Index of the first element (inclusive). 252 * @param toIndex Index of the last element (exclusive). 253 * @param a Values. 254 * @param k Indices (may be destructively modified). 255 */ 256 private static void doSelect(double[] a, int fromIndex, int toIndex, int[] k) { 257 if (k.length == 0 || toIndex - fromIndex <= 1) { 258 return; 259 } 260 // Sort NaN / count signed zeros. 261 // Caution: This loop contributes significantly to the runtime for single indices. 262 int cn = 0; 263 int end = toIndex; 264 for (int i = toIndex; --i >= fromIndex;) { 265 final double v = a[i]; 266 // Count negative zeros using a sign bit check 267 if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) { 268 cn++; 269 // Change to positive zero. 270 // Data must be repaired after selection. 271 a[i] = 0.0; 272 } else if (v != v) { 273 // Move NaN to end 274 a[i] = a[--end]; 275 a[end] = v; 276 } 277 } 278 279 // Partition 280 int n = 0; 281 if (end - fromIndex > 1) { 282 n = k.length; 283 // Filter indices invalidated by NaN check 284 if (end < toIndex) { 285 for (int i = n; --i >= 0;) { 286 final int index = k[i]; 287 if (index >= end) { 288 // Move to end 289 k[i] = k[--n]; 290 k[n] = index; 291 } 292 } 293 } 294 // Return n, the count of used indices in k. 295 // Use this to post-process zeros. 296 n = QuickSelect.select(a, fromIndex, end - 1, k, n); 297 } 298 299 // Restore signed zeros 300 if (cn != 0) { 301 // Use partition indices below zero to fast-forward to zero as much as possible 302 int j = -1; 303 if (n < 0) { 304 // Binary search on -n sorted indices: hi = (-n) - 1 305 int lo = 0; 306 int hi = ~n; 307 while (lo <= hi) { 308 final int mid = (lo + hi) >>> 1; 309 if (a[k[mid]] < 0) { 310 j = mid; 311 lo = mid + 1; 312 } else { 313 hi = mid - 1; 314 } 315 } 316 } else { 317 // Unsorted, process all indices 318 for (int i = n; --i >= 0;) { 319 if (a[k[i]] < 0) { 320 j = k[i]; 321 } 322 } 323 } 324 for (;;) { 325 if (a[++j] == 0) { 326 a[j] = -0.0; 327 if (--cn == 0) { 328 break; 329 } 330 } 331 } 332 } 333 } 334 335 /** 336 * Partition the array such that index {@code k} corresponds to its correctly 337 * sorted value in the equivalent fully sorted array. 338 * 339 * @param a Values. 340 * @param k Index. 341 * @throws IndexOutOfBoundsException if index {@code k} is not within the 342 * sub-range {@code [0, a.length)} 343 */ 344 public static void select(int[] a, int k) { 345 IndexSupport.checkIndex(0, a.length, k); 346 if (a.length <= 1) { 347 return; 348 } 349 QuickSelect.select(a, 0, a.length - 1, k); 350 } 351 352 /** 353 * Partition the array such that indices {@code k} correspond to their correctly 354 * sorted value in the equivalent fully sorted array. 355 * 356 * @param a Values. 357 * @param k Indices (may be destructively modified). 358 * @throws IndexOutOfBoundsException if any index {@code k} is not within the 359 * sub-range {@code [0, a.length)} 360 */ 361 public static void select(int[] a, int[] k) { 362 IndexSupport.checkIndices(0, a.length, k); 363 if (k.length == 0 || a.length <= 1) { 364 return; 365 } 366 QuickSelect.select(a, 0, a.length - 1, k, k.length); 367 } 368 369 /** 370 * Partition the array such that index {@code k} corresponds to its correctly 371 * sorted value in the equivalent fully sorted array. 372 * 373 * @param a Values. 374 * @param fromIndex Index of the first element (inclusive). 375 * @param toIndex Index of the last element (exclusive). 376 * @param k Index. 377 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 378 * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the 379 * sub-range {@code [fromIndex, toIndex)} 380 */ 381 public static void select(int[] a, int fromIndex, int toIndex, int k) { 382 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 383 IndexSupport.checkIndex(fromIndex, toIndex, k); 384 if (toIndex - fromIndex <= 1) { 385 return; 386 } 387 QuickSelect.select(a, fromIndex, toIndex - 1, k); 388 } 389 390 /** 391 * Partition the array such that indices {@code k} correspond to their correctly 392 * sorted value in the equivalent fully sorted array. 393 * 394 * @param a Values. 395 * @param fromIndex Index of the first element (inclusive). 396 * @param toIndex Index of the last element (exclusive). 397 * @param k Indices (may be destructively modified). 398 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 399 * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the 400 * sub-range {@code [fromIndex, toIndex)} 401 */ 402 public static void select(int[] a, int fromIndex, int toIndex, int[] k) { 403 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 404 IndexSupport.checkIndices(fromIndex, toIndex, k); 405 if (k.length == 0 || toIndex - fromIndex <= 1) { 406 return; 407 } 408 QuickSelect.select(a, fromIndex, toIndex - 1, k, k.length); 409 } 410 }