001/* 002 * Licensed to the Apache Software Foundation (ASF) under one or more 003 * contributor license agreements. See the NOTICE file distributed with 004 * this work for additional information regarding copyright ownership. 005 * The ASF licenses this file to You under the Apache License, Version 2.0 006 * (the "License"); you may not use this file except in compliance with 007 * the License. You may obtain a copy of the License at 008 * 009 * http://www.apache.org/licenses/LICENSE-2.0 010 * 011 * Unless required by applicable law or agreed to in writing, software 012 * distributed under the License is distributed on an "AS IS" BASIS, 013 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 014 * See the License for the specific language governing permissions and 015 * limitations under the License. 016 */ 017 018package org.apache.commons.numbers.arrays; 019 020/** 021 * Select indices in array data. 022 * 023 * <p>Arranges elements such that indices {@code k} correspond to their correctly 024 * sorted value in the equivalent fully sorted array. For all indices {@code k} 025 * and any index {@code i}: 026 * 027 * <pre>{@code 028 * data[i < k] <= data[k] <= data[k < i] 029 * }</pre> 030 * 031 * <p>Examples: 032 * 033 * <pre> 034 * data [0, 1, 2, 1, 2, 5, 2, 3, 3, 6, 7, 7, 7, 7] 035 * 036 * 037 * k=4 : [0, 2, 1, 1], [2], [6, 3, 2, 3, 5, 7, 7, 7, 7] 038 * k=4,8 : [0, 1, 2, 1], [2], [3, 3, 2], [5], [7, 7, 6, 7, 7] 039 * </pre> 040 * 041 * <p>Notes: 042 * 043 * <ul> 044 * <li>The algorithm does not require that all data with a value equal to the target index 045 * are ordered consecutively in a range containing the target index. 046 * <li>The algorithm may reorder any part of the array above and below the target indices. 047 * <li>Correct usage for multiple target indices should not call multiple times with each 048 * index but instead call selection only once with all indices. Use of consecutive calls 049 * can reorder previously selected indices. 050 * </ul> 051 * 052 * <p>Selection on multiple indices will handle duplicate and 053 * unordered indices. The method detects ordered indices (with or without duplicates) and 054 * uses this during processing. Passing ordered indices is recommended if the order is already 055 * known; for example using uniform spacing through the array data, or to select the top and 056 * bottom {@code n} values from the data. 057 * 058 * <p>Implementation details 059 * 060 * <p>A quickselect adaptive method is used for single indices. This uses analysis of the 061 * partition sizes after each division to update the algorithm mode. If the partition 062 * containing the target does not sufficiently reduce in size then the algorithm is 063 * progressively changed to use partitions with guaranteed margins. This ensures a set fraction 064 * of data is eliminated each step and worse-case linear run time performance. This method can 065 * handle a range of indices {@code [ka, kb]} with a small separation by targeting the start of 066 * the range {@code ka} and then selecting the remaining elements {@code (ka, kb]} that are at 067 * the edge of the partition bounded by {@code ka}. 068 * 069 * <p>Multiple keys are partitioned collectively using an introsort method which only recurses 070 * into partitions containing indices. Excess recursion will trigger use of a heapselect 071 * on the remaining range of indices ensuring non-quadratic worse case performance. Any 072 * partition containing a single index, adjacent pair of indices, or range of indices with a 073 * small separation will use the quickselect adaptive method for single keys. Note that the 074 * maximum number of times that {@code n} indices can be split is {@code n - 1} before all 075 * indices are handled as singles. 076 * 077 * <p>Floating-point order 078 * 079 * <p>The {@code <} relation does not impose a total order on all floating-point values. 080 * This class respects the ordering imposed by {@link Double#compare(double, double)}. 081 * {@code -0.0} is treated as less than value {@code 0.0}; {@code Double.NaN} is 082 * considered greater than any other value; and all {@code Double.NaN} values are 083 * considered equal. 084 * 085 * <p>References 086 * 087 * <p>Quickselect is introduced in Hoare [1]. This selects an element {@code k} from {@code n} 088 * using repeat division of the data around a partition element, recursing into the 089 * partition that contains {@code k}. 090 * 091 * <p>Introsort/select is introduced in Musser [2]. This detects excess recursion in 092 * quicksort/select and reverts to a heapsort or linear select to achieve an improved worst 093 * case bound. 094 * 095 * <p>Use of sampling to identify a pivot that places {@code k} in the smaller partition is 096 * performed in the SELECT algorithm of Floyd and Rivest [3, 4]. 097 * 098 * <p>A worst-case linear time algorithm PICK is described in Blum <i>et al</i> [5]. This uses 099 * the median of medians as a partition element for selection which ensures a minimum fraction of 100 * the elements are eliminated per iteration. This was extended to use an asymmetric pivot choice 101 * with efficient placement of the medians sample location in the QuickselectAdpative algorithm of 102 * Alexandrescu [6]. 103 * 104 * <ol> 105 * <li>Hoare (1961) 106 * Algorithm 65: Find 107 * <a href="https://doi.org/10.1145%2F366622.366647">Comm. ACM. 4 (7): 321–322</a></li> 108 * <li>Musser (1999) 109 * Introspective Sorting and Selection Algorithms 110 * <a href="https://doi.org/10.1002/(SICI)1097-024X(199708)27:8%3C983::AID-SPE117%3E3.0.CO;2-%23"> 111 * Software: Practice and Experience 27, 983-993.</a></li> 112 * <li>Floyd and Rivest (1975) 113 * Algorithm 489: The Algorithm SELECT—for Finding the ith Smallest of n elements. 114 * Comm. ACM. 18 (3): 173.</li> 115 * <li>Kiwiel (2005) 116 * On Floyd and Rivest's SELECT algorithm. 117 * Theoretical Computer Science 347, 214-238.</li> 118 * <li>Blum, Floyd, Pratt, Rivest, and Tarjan (1973) 119 * Time bounds for selection. 120 * <a href="https://doi.org/10.1016%2FS0022-0000%2873%2980033-9"> 121 * Journal of Computer and System Sciences. 7 (4): 448–461</a>.</li> 122 * <li>Alexandrescu (2016) 123 * Fast Deterministic Selection 124 * <a href="https://arxiv.org/abs/1606.00484">arXiv:1606.00484</a>.</li> 125 * <li><a href="https://en.wikipedia.org/wiki/Quickselect">Quickselect (Wikipedia)</a></li> 126 * <li><a href="https://en.wikipedia.org/wiki/Introsort">Introsort (Wikipedia)</a></li> 127 * <li><a href="https://en.wikipedia.org/wiki/Introselect">Introselect (Wikipedia)</a></li> 128 * <li><a href="https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm">Floyd-Rivest algorithm (Wikipedia)</a></li> 129 * <li><a href="https://en.wikipedia.org/wiki/Median_of_medians">Median of medians (Wikipedia)</a></li> 130 * </ol> 131 * 132 * @since 1.2 133 */ 134public final class Selection { 135 136 /** No instances. */ 137 private Selection() {} 138 139 /** 140 * Partition the array such that index {@code k} corresponds to its correctly 141 * sorted value in the equivalent fully sorted array. 142 * 143 * @param a Values. 144 * @param k Index. 145 * @throws IndexOutOfBoundsException if index {@code k} is not within the 146 * sub-range {@code [0, a.length)} 147 */ 148 public static void select(double[] a, int k) { 149 IndexSupport.checkIndex(0, a.length, k); 150 doSelect(a, 0, a.length, k); 151 } 152 153 /** 154 * Partition the array such that indices {@code k} correspond to their correctly 155 * sorted value in the equivalent fully sorted array. 156 * 157 * @param a Values. 158 * @param k Indices (may be destructively modified). 159 * @throws IndexOutOfBoundsException if any index {@code k} is not within the 160 * sub-range {@code [0, a.length)} 161 */ 162 public static void select(double[] a, int[] k) { 163 IndexSupport.checkIndices(0, a.length, k); 164 doSelect(a, 0, a.length, k); 165 } 166 167 /** 168 * Partition the array such that index {@code k} corresponds to its correctly 169 * sorted value in the equivalent fully sorted array. 170 * 171 * @param a Values. 172 * @param fromIndex Index of the first element (inclusive). 173 * @param toIndex Index of the last element (exclusive). 174 * @param k Index. 175 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 176 * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the 177 * sub-range {@code [fromIndex, toIndex)} 178 */ 179 public static void select(double[] a, int fromIndex, int toIndex, int k) { 180 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 181 IndexSupport.checkIndex(fromIndex, toIndex, k); 182 doSelect(a, fromIndex, toIndex, k); 183 } 184 185 /** 186 * Partition the array such that indices {@code k} correspond to their correctly 187 * sorted value in the equivalent fully sorted array. 188 * 189 * @param a Values. 190 * @param fromIndex Index of the first element (inclusive). 191 * @param toIndex Index of the last element (exclusive). 192 * @param k Indices (may be destructively modified). 193 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 194 * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the 195 * sub-range {@code [fromIndex, toIndex)} 196 */ 197 public static void select(double[] a, int fromIndex, int toIndex, int[] k) { 198 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 199 IndexSupport.checkIndices(fromIndex, toIndex, k); 200 doSelect(a, fromIndex, toIndex, k); 201 } 202 203 /** 204 * Partition the array such that index {@code k} corresponds to its correctly 205 * sorted value in the equivalent fully sorted array. 206 * 207 * <p>This method pre/post-processes the data and indices to respect the ordering 208 * imposed by {@link Double#compare(double, double)}. 209 * 210 * @param fromIndex Index of the first element (inclusive). 211 * @param toIndex Index of the last element (exclusive). 212 * @param a Values. 213 * @param k Index. 214 */ 215 private static void doSelect(double[] a, int fromIndex, int toIndex, int k) { 216 if (toIndex - fromIndex <= 1) { 217 return; 218 } 219 // Sort NaN / count signed zeros. 220 // Caution: This loop contributes significantly to the runtime. 221 int cn = 0; 222 int end = toIndex; 223 for (int i = toIndex; --i >= fromIndex;) { 224 final double v = a[i]; 225 // Count negative zeros using a sign bit check 226 if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) { 227 cn++; 228 // Change to positive zero. 229 // Data must be repaired after selection. 230 a[i] = 0.0; 231 } else if (v != v) { 232 // Move NaN to end 233 a[i] = a[--end]; 234 a[end] = v; 235 } 236 } 237 238 // Partition 239 if (end - fromIndex > 1 && k < end) { 240 QuickSelect.select(a, fromIndex, end - 1, k); 241 } 242 243 // Restore signed zeros 244 if (cn != 0) { 245 // Use partition index below zero to fast-forward to zero as much as possible 246 for (int j = a[k] < 0 ? k : -1;;) { 247 if (a[++j] == 0) { 248 a[j] = -0.0; 249 if (--cn == 0) { 250 break; 251 } 252 } 253 } 254 } 255 } 256 257 /** 258 * Partition the array such that indices {@code k} correspond to their correctly 259 * sorted value in the equivalent fully sorted array. 260 * 261 * <p>This method pre/post-processes the data and indices to respect the ordering 262 * imposed by {@link Double#compare(double, double)}. 263 * 264 * @param fromIndex Index of the first element (inclusive). 265 * @param toIndex Index of the last element (exclusive). 266 * @param a Values. 267 * @param k Indices (may be destructively modified). 268 */ 269 private static void doSelect(double[] a, int fromIndex, int toIndex, int[] k) { 270 if (k.length == 0 || toIndex - fromIndex <= 1) { 271 return; 272 } 273 // Sort NaN / count signed zeros. 274 // Caution: This loop contributes significantly to the runtime for single indices. 275 int cn = 0; 276 int end = toIndex; 277 for (int i = toIndex; --i >= fromIndex;) { 278 final double v = a[i]; 279 // Count negative zeros using a sign bit check 280 if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) { 281 cn++; 282 // Change to positive zero. 283 // Data must be repaired after selection. 284 a[i] = 0.0; 285 } else if (v != v) { 286 // Move NaN to end 287 a[i] = a[--end]; 288 a[end] = v; 289 } 290 } 291 292 // Partition 293 int n = 0; 294 if (end - fromIndex > 1) { 295 n = k.length; 296 // Filter indices invalidated by NaN check 297 if (end < toIndex) { 298 for (int i = n; --i >= 0;) { 299 final int index = k[i]; 300 if (index >= end) { 301 // Move to end 302 k[i] = k[--n]; 303 k[n] = index; 304 } 305 } 306 } 307 // Return n, the count of used indices in k. 308 // Use this to post-process zeros. 309 n = QuickSelect.select(a, fromIndex, end - 1, k, n); 310 } 311 312 // Restore signed zeros 313 if (cn != 0) { 314 // Use partition indices below zero to fast-forward to zero as much as possible 315 int j = -1; 316 if (n < 0) { 317 // Binary search on -n sorted indices: hi = (-n) - 1 318 int lo = 0; 319 int hi = ~n; 320 while (lo <= hi) { 321 final int mid = (lo + hi) >>> 1; 322 if (a[k[mid]] < 0) { 323 j = mid; 324 lo = mid + 1; 325 } else { 326 hi = mid - 1; 327 } 328 } 329 } else { 330 // Unsorted, process all indices 331 for (int i = n; --i >= 0;) { 332 if (a[k[i]] < 0) { 333 j = k[i]; 334 } 335 } 336 } 337 for (;;) { 338 if (a[++j] == 0) { 339 a[j] = -0.0; 340 if (--cn == 0) { 341 break; 342 } 343 } 344 } 345 } 346 } 347 348 /** 349 * Partition the array such that index {@code k} corresponds to its correctly 350 * sorted value in the equivalent fully sorted array. 351 * 352 * @param a Values. 353 * @param k Index. 354 * @throws IndexOutOfBoundsException if index {@code k} is not within the 355 * sub-range {@code [0, a.length)} 356 */ 357 public static void select(int[] a, int k) { 358 IndexSupport.checkIndex(0, a.length, k); 359 if (a.length <= 1) { 360 return; 361 } 362 QuickSelect.select(a, 0, a.length - 1, k); 363 } 364 365 /** 366 * Partition the array such that indices {@code k} correspond to their correctly 367 * sorted value in the equivalent fully sorted array. 368 * 369 * @param a Values. 370 * @param k Indices (may be destructively modified). 371 * @throws IndexOutOfBoundsException if any index {@code k} is not within the 372 * sub-range {@code [0, a.length)} 373 */ 374 public static void select(int[] a, int[] k) { 375 IndexSupport.checkIndices(0, a.length, k); 376 if (k.length == 0 || a.length <= 1) { 377 return; 378 } 379 QuickSelect.select(a, 0, a.length - 1, k, k.length); 380 } 381 382 /** 383 * Partition the array such that index {@code k} corresponds to its correctly 384 * sorted value in the equivalent fully sorted array. 385 * 386 * @param a Values. 387 * @param fromIndex Index of the first element (inclusive). 388 * @param toIndex Index of the last element (exclusive). 389 * @param k Index. 390 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 391 * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the 392 * sub-range {@code [fromIndex, toIndex)} 393 */ 394 public static void select(int[] a, int fromIndex, int toIndex, int k) { 395 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 396 IndexSupport.checkIndex(fromIndex, toIndex, k); 397 if (toIndex - fromIndex <= 1) { 398 return; 399 } 400 QuickSelect.select(a, fromIndex, toIndex - 1, k); 401 } 402 403 /** 404 * Partition the array such that indices {@code k} correspond to their correctly 405 * sorted value in the equivalent fully sorted array. 406 * 407 * @param a Values. 408 * @param fromIndex Index of the first element (inclusive). 409 * @param toIndex Index of the last element (exclusive). 410 * @param k Indices (may be destructively modified). 411 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 412 * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the 413 * sub-range {@code [fromIndex, toIndex)} 414 */ 415 public static void select(int[] a, int fromIndex, int toIndex, int[] k) { 416 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 417 IndexSupport.checkIndices(fromIndex, toIndex, k); 418 if (k.length == 0 || toIndex - fromIndex <= 1) { 419 return; 420 } 421 QuickSelect.select(a, fromIndex, toIndex - 1, k, k.length); 422 } 423 424 /** 425 * Partition the array such that index {@code k} corresponds to its correctly 426 * sorted value in the equivalent fully sorted array. 427 * 428 * @param a Values. 429 * @param k Index. 430 * @throws IndexOutOfBoundsException if index {@code k} is not within the 431 * sub-range {@code [0, a.length)} 432 * @since 1.3 433 */ 434 public static void select(long[] a, int k) { 435 IndexSupport.checkIndex(0, a.length, k); 436 if (a.length <= 1) { 437 return; 438 } 439 QuickSelect.select(a, 0, a.length - 1, k); 440 } 441 442 /** 443 * Partition the array such that indices {@code k} correspond to their correctly 444 * sorted value in the equivalent fully sorted array. 445 * 446 * @param a Values. 447 * @param k Indices (may be destructively modified). 448 * @throws IndexOutOfBoundsException if any index {@code k} is not within the 449 * sub-range {@code [0, a.length)} 450 * @since 1.3 451 */ 452 public static void select(long[] a, int[] k) { 453 IndexSupport.checkIndices(0, a.length, k); 454 if (k.length == 0 || a.length <= 1) { 455 return; 456 } 457 QuickSelect.select(a, 0, a.length - 1, k, k.length); 458 } 459 460 /** 461 * Partition the array such that index {@code k} corresponds to its correctly 462 * sorted value in the equivalent fully sorted array. 463 * 464 * @param a Values. 465 * @param fromIndex Index of the first element (inclusive). 466 * @param toIndex Index of the last element (exclusive). 467 * @param k Index. 468 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 469 * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the 470 * sub-range {@code [fromIndex, toIndex)} 471 * @since 1.3 472 */ 473 public static void select(long[] a, int fromIndex, int toIndex, int k) { 474 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 475 IndexSupport.checkIndex(fromIndex, toIndex, k); 476 if (toIndex - fromIndex <= 1) { 477 return; 478 } 479 QuickSelect.select(a, fromIndex, toIndex - 1, k); 480 } 481 482 /** 483 * Partition the array such that indices {@code k} correspond to their correctly 484 * sorted value in the equivalent fully sorted array. 485 * 486 * @param a Values. 487 * @param fromIndex Index of the first element (inclusive). 488 * @param toIndex Index of the last element (exclusive). 489 * @param k Indices (may be destructively modified). 490 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 491 * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the 492 * sub-range {@code [fromIndex, toIndex)} 493 * @since 1.3 494 */ 495 public static void select(long[] a, int fromIndex, int toIndex, int[] k) { 496 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 497 IndexSupport.checkIndices(fromIndex, toIndex, k); 498 if (k.length == 0 || toIndex - fromIndex <= 1) { 499 return; 500 } 501 QuickSelect.select(a, fromIndex, toIndex - 1, k, k.length); 502 } 503}