001/* 002 * Licensed to the Apache Software Foundation (ASF) under one or more 003 * contributor license agreements. See the NOTICE file distributed with 004 * this work for additional information regarding copyright ownership. 005 * The ASF licenses this file to You under the Apache License, Version 2.0 006 * (the "License"); you may not use this file except in compliance with 007 * the License. You may obtain a copy of the License at 008 * 009 * http://www.apache.org/licenses/LICENSE-2.0 010 * 011 * Unless required by applicable law or agreed to in writing, software 012 * distributed under the License is distributed on an "AS IS" BASIS, 013 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 014 * See the License for the specific language governing permissions and 015 * limitations under the License. 016 */ 017 018package org.apache.commons.numbers.arrays; 019 020/** 021 * Select indices in array data. 022 * 023 * <p>Arranges elements such that indices {@code k} correspond to their correctly 024 * sorted value in the equivalent fully sorted array. For all indices {@code k} 025 * and any index {@code i}: 026 * 027 * <pre>{@code 028 * data[i < k] <= data[k] <= data[k < i] 029 * }</pre> 030 * 031 * <p>Examples: 032 * 033 * <pre> 034 * data [0, 1, 2, 1, 2, 5, 2, 3, 3, 6, 7, 7, 7, 7] 035 * 036 * 037 * k=4 : [0, 1, 2, 1], [2], [5, 2, 3, 3, 6, 7, 7, 7, 7] 038 * k=4,8 : [0, 1, 2, 1], [2], [3, 3, 2], [5], [6, 7, 7, 7, 7] 039 * </pre> 040 * 041 * <p>This implementation can select on multiple indices and will handle duplicate and 042 * unordered indices. The method detects ordered indices (with or without duplicates) and 043 * uses this during processing. Passing ordered indices is recommended if the order is already 044 * known; for example using uniform spacing through the array data, or to select the top and 045 * bottom {@code n} values from the data. 046 * 047 * <p>A quickselect adaptive method is used for single indices. This uses analysis of the 048 * partition sizes after each division to update the algorithm mode. If the partition 049 * containing the target does not sufficiently reduce in size then the algorithm is 050 * progressively changed to use partitions with guaranteed margins. This ensures a set fraction 051 * of data is eliminated each step and worse-case linear run time performance. This method can 052 * handle a range of indices {@code [ka, kb]} with a small separation by targeting the start of 053 * the range {@code ka} and then selecting the remaining elements {@code (ka, kb]} that are at 054 * the edge of the partition bounded by {@code ka}. 055 * 056 * <p>Multiple keys are partitioned collectively using an introsort method which only recurses 057 * into partitions containing indices. Excess recursion will trigger use of a heapselect 058 * on the remaining range of indices ensuring non-quadratic worse case performance. Any 059 * partition containing a single index, adjacent pair of indices, or range of indices with a 060 * small separation will use the quickselect adaptive method for single keys. Note that the 061 * maximum number of times that {@code n} indices can be split is {@code n - 1} before all 062 * indices are handled as singles. 063 * 064 * <p>Floating-point order 065 * 066 * <p>The {@code <} relation does not impose a total order on all floating-point values. 067 * This class respects the ordering imposed by {@link Double#compare(double, double)}. 068 * {@code -0.0} is treated as less than value {@code 0.0}; {@code Double.NaN} is 069 * considered greater than any other value; and all {@code Double.NaN} values are 070 * considered equal. 071 * 072 * <p>References 073 * 074 * <p>Quickselect is introduced in Hoare [1]. This selects an element {@code k} from {@code n} 075 * using repeat division of the data around a partition element, recursing into the 076 * partition that contains {@code k}. 077 * 078 * <p>Introsort/select is introduced in Musser [2]. This detects excess recursion in 079 * quicksort/select and reverts to a heapsort or linear select to achieve an improved worst 080 * case bound. 081 * 082 * <p>Use of sampling to identify a pivot that places {@code k} in the smaller partition is 083 * performed in the SELECT algorithm of Floyd and Rivest [3, 4]. 084 * 085 * <p>A worst-case linear time algorithm PICK is described in Blum <i>et al</i> [5]. This uses 086 * the median of medians as a partition element for selection which ensures a minimum fraction of 087 * the elements are eliminated per iteration. This was extended to use an asymmetric pivot choice 088 * with efficient placement of the medians sample location in the QuickselectAdpative algorithm of 089 * Alexandrescu [6]. 090 * 091 * <ol> 092 * <li>Hoare (1961) 093 * Algorithm 65: Find 094 * <a href="https://doi.org/10.1145%2F366622.366647">Comm. ACM. 4 (7): 321–322</a> 095 * <li>Musser (1999) 096 * Introspective Sorting and Selection Algorithms 097 * <a href="https://doi.org/10.1002/(SICI)1097-024X(199708)27:8%3C983::AID-SPE117%3E3.0.CO;2-%23"> 098 * Software: Practice and Experience 27, 983-993.</a> 099 * <li>Floyd and Rivest (1975) 100 * Algorithm 489: The Algorithm SELECT—for Finding the ith Smallest of n elements. 101 * Comm. ACM. 18 (3): 173. 102 * <li>Kiwiel (2005) 103 * On Floyd and Rivest's SELECT algorithm. 104 * Theoretical Computer Science 347, 214-238. 105 * <li>Blum, Floyd, Pratt, Rivest, and Tarjan (1973) 106 * Time bounds for selection. 107 * <a href="https://doi.org/10.1016%2FS0022-0000%2873%2980033-9"> 108 * Journal of Computer and System Sciences. 7 (4): 448–461</a>. 109 * <li>Alexandrescu (2016) 110 * Fast Deterministic Selection 111 * <a href="https://arxiv.org/abs/1606.00484">arXiv:1606.00484</a>. 112 * <li><a href="https://en.wikipedia.org/wiki/Quickselect">Quickselect (Wikipedia)</a> 113 * <li><a href="https://en.wikipedia.org/wiki/Introsort">Introsort (Wikipedia)</a> 114 * <li><a href="https://en.wikipedia.org/wiki/Introselect">Introselect (Wikipedia)</a> 115 * <li><a href="https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm">Floyd-Rivest algorithm (Wikipedia)</a> 116 * <li><a href="https://en.wikipedia.org/wiki/Median_of_medians">Median of medians (Wikipedia)</a> 117 * </ol> 118 * 119 * @since 1.2 120 */ 121public final class Selection { 122 123 /** No instances. */ 124 private Selection() {} 125 126 /** 127 * Partition the array such that index {@code k} corresponds to its correctly 128 * sorted value in the equivalent fully sorted array. 129 * 130 * @param a Values. 131 * @param k Index. 132 * @throws IndexOutOfBoundsException if index {@code k} is not within the 133 * sub-range {@code [0, a.length)} 134 */ 135 public static void select(double[] a, int k) { 136 IndexSupport.checkIndex(0, a.length, k); 137 doSelect(a, 0, a.length, k); 138 } 139 140 /** 141 * Partition the array such that indices {@code k} correspond to their correctly 142 * sorted value in the equivalent fully sorted array. 143 * 144 * @param a Values. 145 * @param k Indices (may be destructively modified). 146 * @throws IndexOutOfBoundsException if any index {@code k} is not within the 147 * sub-range {@code [0, a.length)} 148 */ 149 public static void select(double[] a, int[] k) { 150 IndexSupport.checkIndices(0, a.length, k); 151 doSelect(a, 0, a.length, k); 152 } 153 154 /** 155 * Partition the array such that index {@code k} corresponds to its correctly 156 * sorted value in the equivalent fully sorted array. 157 * 158 * @param a Values. 159 * @param fromIndex Index of the first element (inclusive). 160 * @param toIndex Index of the last element (exclusive). 161 * @param k Index. 162 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 163 * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the 164 * sub-range {@code [fromIndex, toIndex)} 165 */ 166 public static void select(double[] a, int fromIndex, int toIndex, int k) { 167 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 168 IndexSupport.checkIndex(fromIndex, toIndex, k); 169 doSelect(a, fromIndex, toIndex, k); 170 } 171 172 /** 173 * Partition the array such that indices {@code k} correspond to their correctly 174 * sorted value in the equivalent fully sorted array. 175 * 176 * @param a Values. 177 * @param fromIndex Index of the first element (inclusive). 178 * @param toIndex Index of the last element (exclusive). 179 * @param k Indices (may be destructively modified). 180 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 181 * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the 182 * sub-range {@code [fromIndex, toIndex)} 183 */ 184 public static void select(double[] a, int fromIndex, int toIndex, int[] k) { 185 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 186 IndexSupport.checkIndices(fromIndex, toIndex, k); 187 doSelect(a, fromIndex, toIndex, k); 188 } 189 190 /** 191 * Partition the array such that index {@code k} corresponds to its correctly 192 * sorted value in the equivalent fully sorted array. 193 * 194 * <p>This method pre/post-processes the data and indices to respect the ordering 195 * imposed by {@link Double#compare(double, double)}. 196 * 197 * @param fromIndex Index of the first element (inclusive). 198 * @param toIndex Index of the last element (exclusive). 199 * @param a Values. 200 * @param k Index. 201 */ 202 private static void doSelect(double[] a, int fromIndex, int toIndex, int k) { 203 if (toIndex - fromIndex <= 1) { 204 return; 205 } 206 // Sort NaN / count signed zeros. 207 // Caution: This loop contributes significantly to the runtime. 208 int cn = 0; 209 int end = toIndex; 210 for (int i = toIndex; --i >= fromIndex;) { 211 final double v = a[i]; 212 // Count negative zeros using a sign bit check 213 if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) { 214 cn++; 215 // Change to positive zero. 216 // Data must be repaired after selection. 217 a[i] = 0.0; 218 } else if (v != v) { 219 // Move NaN to end 220 a[i] = a[--end]; 221 a[end] = v; 222 } 223 } 224 225 // Partition 226 if (end - fromIndex > 1 && k < end) { 227 QuickSelect.select(a, fromIndex, end - 1, k); 228 } 229 230 // Restore signed zeros 231 if (cn != 0) { 232 // Use partition index below zero to fast-forward to zero as much as possible 233 for (int j = a[k] < 0 ? k : -1;;) { 234 if (a[++j] == 0) { 235 a[j] = -0.0; 236 if (--cn == 0) { 237 break; 238 } 239 } 240 } 241 } 242 } 243 244 /** 245 * Partition the array such that indices {@code k} correspond to their correctly 246 * sorted value in the equivalent fully sorted array. 247 * 248 * <p>This method pre/post-processes the data and indices to respect the ordering 249 * imposed by {@link Double#compare(double, double)}. 250 * 251 * @param fromIndex Index of the first element (inclusive). 252 * @param toIndex Index of the last element (exclusive). 253 * @param a Values. 254 * @param k Indices (may be destructively modified). 255 */ 256 private static void doSelect(double[] a, int fromIndex, int toIndex, int[] k) { 257 if (k.length == 0 || toIndex - fromIndex <= 1) { 258 return; 259 } 260 // Sort NaN / count signed zeros. 261 // Caution: This loop contributes significantly to the runtime for single indices. 262 int cn = 0; 263 int end = toIndex; 264 for (int i = toIndex; --i >= fromIndex;) { 265 final double v = a[i]; 266 // Count negative zeros using a sign bit check 267 if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) { 268 cn++; 269 // Change to positive zero. 270 // Data must be repaired after selection. 271 a[i] = 0.0; 272 } else if (v != v) { 273 // Move NaN to end 274 a[i] = a[--end]; 275 a[end] = v; 276 } 277 } 278 279 // Partition 280 int n = 0; 281 if (end - fromIndex > 1) { 282 n = k.length; 283 // Filter indices invalidated by NaN check 284 if (end < toIndex) { 285 for (int i = n; --i >= 0;) { 286 final int index = k[i]; 287 if (index >= end) { 288 // Move to end 289 k[i] = k[--n]; 290 k[n] = index; 291 } 292 } 293 } 294 // Return n, the count of used indices in k. 295 // Use this to post-process zeros. 296 n = QuickSelect.select(a, fromIndex, end - 1, k, n); 297 } 298 299 // Restore signed zeros 300 if (cn != 0) { 301 // Use partition indices below zero to fast-forward to zero as much as possible 302 int j = -1; 303 if (n < 0) { 304 // Binary search on -n sorted indices: hi = (-n) - 1 305 int lo = 0; 306 int hi = ~n; 307 while (lo <= hi) { 308 final int mid = (lo + hi) >>> 1; 309 if (a[k[mid]] < 0) { 310 j = mid; 311 lo = mid + 1; 312 } else { 313 hi = mid - 1; 314 } 315 } 316 } else { 317 // Unsorted, process all indices 318 for (int i = n; --i >= 0;) { 319 if (a[k[i]] < 0) { 320 j = k[i]; 321 } 322 } 323 } 324 for (;;) { 325 if (a[++j] == 0) { 326 a[j] = -0.0; 327 if (--cn == 0) { 328 break; 329 } 330 } 331 } 332 } 333 } 334 335 /** 336 * Partition the array such that index {@code k} corresponds to its correctly 337 * sorted value in the equivalent fully sorted array. 338 * 339 * @param a Values. 340 * @param k Index. 341 * @throws IndexOutOfBoundsException if index {@code k} is not within the 342 * sub-range {@code [0, a.length)} 343 */ 344 public static void select(int[] a, int k) { 345 IndexSupport.checkIndex(0, a.length, k); 346 if (a.length <= 1) { 347 return; 348 } 349 QuickSelect.select(a, 0, a.length - 1, k); 350 } 351 352 /** 353 * Partition the array such that indices {@code k} correspond to their correctly 354 * sorted value in the equivalent fully sorted array. 355 * 356 * @param a Values. 357 * @param k Indices (may be destructively modified). 358 * @throws IndexOutOfBoundsException if any index {@code k} is not within the 359 * sub-range {@code [0, a.length)} 360 */ 361 public static void select(int[] a, int[] k) { 362 IndexSupport.checkIndices(0, a.length, k); 363 if (k.length == 0 || a.length <= 1) { 364 return; 365 } 366 QuickSelect.select(a, 0, a.length - 1, k, k.length); 367 } 368 369 /** 370 * Partition the array such that index {@code k} corresponds to its correctly 371 * sorted value in the equivalent fully sorted array. 372 * 373 * @param a Values. 374 * @param fromIndex Index of the first element (inclusive). 375 * @param toIndex Index of the last element (exclusive). 376 * @param k Index. 377 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 378 * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the 379 * sub-range {@code [fromIndex, toIndex)} 380 */ 381 public static void select(int[] a, int fromIndex, int toIndex, int k) { 382 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 383 IndexSupport.checkIndex(fromIndex, toIndex, k); 384 if (toIndex - fromIndex <= 1) { 385 return; 386 } 387 QuickSelect.select(a, fromIndex, toIndex - 1, k); 388 } 389 390 /** 391 * Partition the array such that indices {@code k} correspond to their correctly 392 * sorted value in the equivalent fully sorted array. 393 * 394 * @param a Values. 395 * @param fromIndex Index of the first element (inclusive). 396 * @param toIndex Index of the last element (exclusive). 397 * @param k Indices (may be destructively modified). 398 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of 399 * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the 400 * sub-range {@code [fromIndex, toIndex)} 401 */ 402 public static void select(int[] a, int fromIndex, int toIndex, int[] k) { 403 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); 404 IndexSupport.checkIndices(fromIndex, toIndex, k); 405 if (k.length == 0 || toIndex - fromIndex <= 1) { 406 return; 407 } 408 QuickSelect.select(a, fromIndex, toIndex - 1, k, k.length); 409 } 410}