1 /*
2 * Licensed to the Apache Software Foundation (ASF) under one or more
3 * contributor license agreements. See the NOTICE file distributed with
4 * this work for additional information regarding copyright ownership.
5 * The ASF licenses this file to You under the Apache License, Version 2.0
6 * (the "License"); you may not use this file except in compliance with
7 * the License. You may obtain a copy of the License at
8 *
9 * http://www.apache.org/licenses/LICENSE-2.0
10 *
11 * Unless required by applicable law or agreed to in writing, software
12 * distributed under the License is distributed on an "AS IS" BASIS,
13 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 * See the License for the specific language governing permissions and
15 * limitations under the License.
16 */
17
18 package org.apache.commons.numbers.arrays;
19
20 /**
21 * Select indices in array data.
22 *
23 * <p>Arranges elements such that indices {@code k} correspond to their correctly
24 * sorted value in the equivalent fully sorted array. For all indices {@code k}
25 * and any index {@code i}:
26 *
27 * <pre>{@code
28 * data[i < k] <= data[k] <= data[k < i]
29 * }</pre>
30 *
31 * <p>Examples:
32 *
33 * <pre>
34 * data [0, 1, 2, 1, 2, 5, 2, 3, 3, 6, 7, 7, 7, 7]
35 *
36 *
37 * k=4 : [0, 1, 2, 1], [2], [5, 2, 3, 3, 6, 7, 7, 7, 7]
38 * k=4,8 : [0, 1, 2, 1], [2], [3, 3, 2], [5], [6, 7, 7, 7, 7]
39 * </pre>
40 *
41 * <p>This implementation can select on multiple indices and will handle duplicate and
42 * unordered indices. The method detects ordered indices (with or without duplicates) and
43 * uses this during processing. Passing ordered indices is recommended if the order is already
44 * known; for example using uniform spacing through the array data, or to select the top and
45 * bottom {@code n} values from the data.
46 *
47 * <p>A quickselect adaptive method is used for single indices. This uses analysis of the
48 * partition sizes after each division to update the algorithm mode. If the partition
49 * containing the target does not sufficiently reduce in size then the algorithm is
50 * progressively changed to use partitions with guaranteed margins. This ensures a set fraction
51 * of data is eliminated each step and worse-case linear run time performance. This method can
52 * handle a range of indices {@code [ka, kb]} with a small separation by targeting the start of
53 * the range {@code ka} and then selecting the remaining elements {@code (ka, kb]} that are at
54 * the edge of the partition bounded by {@code ka}.
55 *
56 * <p>Multiple keys are partitioned collectively using an introsort method which only recurses
57 * into partitions containing indices. Excess recursion will trigger use of a heapselect
58 * on the remaining range of indices ensuring non-quadratic worse case performance. Any
59 * partition containing a single index, adjacent pair of indices, or range of indices with a
60 * small separation will use the quickselect adaptive method for single keys. Note that the
61 * maximum number of times that {@code n} indices can be split is {@code n - 1} before all
62 * indices are handled as singles.
63 *
64 * <p>Floating-point order
65 *
66 * <p>The {@code <} relation does not impose a total order on all floating-point values.
67 * This class respects the ordering imposed by {@link Double#compare(double, double)}.
68 * {@code -0.0} is treated as less than value {@code 0.0}; {@code Double.NaN} is
69 * considered greater than any other value; and all {@code Double.NaN} values are
70 * considered equal.
71 *
72 * <p>References
73 *
74 * <p>Quickselect is introduced in Hoare [1]. This selects an element {@code k} from {@code n}
75 * using repeat division of the data around a partition element, recursing into the
76 * partition that contains {@code k}.
77 *
78 * <p>Introsort/select is introduced in Musser [2]. This detects excess recursion in
79 * quicksort/select and reverts to a heapsort or linear select to achieve an improved worst
80 * case bound.
81 *
82 * <p>Use of sampling to identify a pivot that places {@code k} in the smaller partition is
83 * performed in the SELECT algorithm of Floyd and Rivest [3, 4].
84 *
85 * <p>A worst-case linear time algorithm PICK is described in Blum <i>et al</i> [5]. This uses
86 * the median of medians as a partition element for selection which ensures a minimum fraction of
87 * the elements are eliminated per iteration. This was extended to use an asymmetric pivot choice
88 * with efficient placement of the medians sample location in the QuickselectAdpative algorithm of
89 * Alexandrescu [6].
90 *
91 * <ol>
92 * <li>Hoare (1961)
93 * Algorithm 65: Find
94 * <a href="https://doi.org/10.1145%2F366622.366647">Comm. ACM. 4 (7): 321–322</a>
95 * <li>Musser (1999)
96 * Introspective Sorting and Selection Algorithms
97 * <a href="https://doi.org/10.1002/(SICI)1097-024X(199708)27:8%3C983::AID-SPE117%3E3.0.CO;2-%23">
98 * Software: Practice and Experience 27, 983-993.</a>
99 * <li>Floyd and Rivest (1975)
100 * Algorithm 489: The Algorithm SELECT—for Finding the ith Smallest of n elements.
101 * Comm. ACM. 18 (3): 173.
102 * <li>Kiwiel (2005)
103 * On Floyd and Rivest's SELECT algorithm.
104 * Theoretical Computer Science 347, 214-238.
105 * <li>Blum, Floyd, Pratt, Rivest, and Tarjan (1973)
106 * Time bounds for selection.
107 * <a href="https://doi.org/10.1016%2FS0022-0000%2873%2980033-9">
108 * Journal of Computer and System Sciences. 7 (4): 448–461</a>.
109 * <li>Alexandrescu (2016)
110 * Fast Deterministic Selection
111 * <a href="https://arxiv.org/abs/1606.00484">arXiv:1606.00484</a>.
112 * <li><a href="https://en.wikipedia.org/wiki/Quickselect">Quickselect (Wikipedia)</a>
113 * <li><a href="https://en.wikipedia.org/wiki/Introsort">Introsort (Wikipedia)</a>
114 * <li><a href="https://en.wikipedia.org/wiki/Introselect">Introselect (Wikipedia)</a>
115 * <li><a href="https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm">Floyd-Rivest algorithm (Wikipedia)</a>
116 * <li><a href="https://en.wikipedia.org/wiki/Median_of_medians">Median of medians (Wikipedia)</a>
117 * </ol>
118 *
119 * @since 1.2
120 */
121 public final class Selection {
122
123 /** No instances. */
124 private Selection() {}
125
126 /**
127 * Partition the array such that index {@code k} corresponds to its correctly
128 * sorted value in the equivalent fully sorted array.
129 *
130 * @param a Values.
131 * @param k Index.
132 * @throws IndexOutOfBoundsException if index {@code k} is not within the
133 * sub-range {@code [0, a.length)}
134 */
135 public static void select(double[] a, int k) {
136 IndexSupport.checkIndex(0, a.length, k);
137 doSelect(a, 0, a.length, k);
138 }
139
140 /**
141 * Partition the array such that indices {@code k} correspond to their correctly
142 * sorted value in the equivalent fully sorted array.
143 *
144 * @param a Values.
145 * @param k Indices (may be destructively modified).
146 * @throws IndexOutOfBoundsException if any index {@code k} is not within the
147 * sub-range {@code [0, a.length)}
148 */
149 public static void select(double[] a, int[] k) {
150 IndexSupport.checkIndices(0, a.length, k);
151 doSelect(a, 0, a.length, k);
152 }
153
154 /**
155 * Partition the array such that index {@code k} corresponds to its correctly
156 * sorted value in the equivalent fully sorted array.
157 *
158 * @param a Values.
159 * @param fromIndex Index of the first element (inclusive).
160 * @param toIndex Index of the last element (exclusive).
161 * @param k Index.
162 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
163 * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the
164 * sub-range {@code [fromIndex, toIndex)}
165 */
166 public static void select(double[] a, int fromIndex, int toIndex, int k) {
167 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
168 IndexSupport.checkIndex(fromIndex, toIndex, k);
169 doSelect(a, fromIndex, toIndex, k);
170 }
171
172 /**
173 * Partition the array such that indices {@code k} correspond to their correctly
174 * sorted value in the equivalent fully sorted array.
175 *
176 * @param a Values.
177 * @param fromIndex Index of the first element (inclusive).
178 * @param toIndex Index of the last element (exclusive).
179 * @param k Indices (may be destructively modified).
180 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
181 * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the
182 * sub-range {@code [fromIndex, toIndex)}
183 */
184 public static void select(double[] a, int fromIndex, int toIndex, int[] k) {
185 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
186 IndexSupport.checkIndices(fromIndex, toIndex, k);
187 doSelect(a, fromIndex, toIndex, k);
188 }
189
190 /**
191 * Partition the array such that index {@code k} corresponds to its correctly
192 * sorted value in the equivalent fully sorted array.
193 *
194 * <p>This method pre/post-processes the data and indices to respect the ordering
195 * imposed by {@link Double#compare(double, double)}.
196 *
197 * @param fromIndex Index of the first element (inclusive).
198 * @param toIndex Index of the last element (exclusive).
199 * @param a Values.
200 * @param k Index.
201 */
202 private static void doSelect(double[] a, int fromIndex, int toIndex, int k) {
203 if (toIndex - fromIndex <= 1) {
204 return;
205 }
206 // Sort NaN / count signed zeros.
207 // Caution: This loop contributes significantly to the runtime.
208 int cn = 0;
209 int end = toIndex;
210 for (int i = toIndex; --i >= fromIndex;) {
211 final double v = a[i];
212 // Count negative zeros using a sign bit check
213 if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) {
214 cn++;
215 // Change to positive zero.
216 // Data must be repaired after selection.
217 a[i] = 0.0;
218 } else if (v != v) {
219 // Move NaN to end
220 a[i] = a[--end];
221 a[end] = v;
222 }
223 }
224
225 // Partition
226 if (end - fromIndex > 1 && k < end) {
227 QuickSelect.select(a, fromIndex, end - 1, k);
228 }
229
230 // Restore signed zeros
231 if (cn != 0) {
232 // Use partition index below zero to fast-forward to zero as much as possible
233 for (int j = a[k] < 0 ? k : -1;;) {
234 if (a[++j] == 0) {
235 a[j] = -0.0;
236 if (--cn == 0) {
237 break;
238 }
239 }
240 }
241 }
242 }
243
244 /**
245 * Partition the array such that indices {@code k} correspond to their correctly
246 * sorted value in the equivalent fully sorted array.
247 *
248 * <p>This method pre/post-processes the data and indices to respect the ordering
249 * imposed by {@link Double#compare(double, double)}.
250 *
251 * @param fromIndex Index of the first element (inclusive).
252 * @param toIndex Index of the last element (exclusive).
253 * @param a Values.
254 * @param k Indices (may be destructively modified).
255 */
256 private static void doSelect(double[] a, int fromIndex, int toIndex, int[] k) {
257 if (k.length == 0 || toIndex - fromIndex <= 1) {
258 return;
259 }
260 // Sort NaN / count signed zeros.
261 // Caution: This loop contributes significantly to the runtime for single indices.
262 int cn = 0;
263 int end = toIndex;
264 for (int i = toIndex; --i >= fromIndex;) {
265 final double v = a[i];
266 // Count negative zeros using a sign bit check
267 if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) {
268 cn++;
269 // Change to positive zero.
270 // Data must be repaired after selection.
271 a[i] = 0.0;
272 } else if (v != v) {
273 // Move NaN to end
274 a[i] = a[--end];
275 a[end] = v;
276 }
277 }
278
279 // Partition
280 int n = 0;
281 if (end - fromIndex > 1) {
282 n = k.length;
283 // Filter indices invalidated by NaN check
284 if (end < toIndex) {
285 for (int i = n; --i >= 0;) {
286 final int index = k[i];
287 if (index >= end) {
288 // Move to end
289 k[i] = k[--n];
290 k[n] = index;
291 }
292 }
293 }
294 // Return n, the count of used indices in k.
295 // Use this to post-process zeros.
296 n = QuickSelect.select(a, fromIndex, end - 1, k, n);
297 }
298
299 // Restore signed zeros
300 if (cn != 0) {
301 // Use partition indices below zero to fast-forward to zero as much as possible
302 int j = -1;
303 if (n < 0) {
304 // Binary search on -n sorted indices: hi = (-n) - 1
305 int lo = 0;
306 int hi = ~n;
307 while (lo <= hi) {
308 final int mid = (lo + hi) >>> 1;
309 if (a[k[mid]] < 0) {
310 j = mid;
311 lo = mid + 1;
312 } else {
313 hi = mid - 1;
314 }
315 }
316 } else {
317 // Unsorted, process all indices
318 for (int i = n; --i >= 0;) {
319 if (a[k[i]] < 0) {
320 j = k[i];
321 }
322 }
323 }
324 for (;;) {
325 if (a[++j] == 0) {
326 a[j] = -0.0;
327 if (--cn == 0) {
328 break;
329 }
330 }
331 }
332 }
333 }
334
335 /**
336 * Partition the array such that index {@code k} corresponds to its correctly
337 * sorted value in the equivalent fully sorted array.
338 *
339 * @param a Values.
340 * @param k Index.
341 * @throws IndexOutOfBoundsException if index {@code k} is not within the
342 * sub-range {@code [0, a.length)}
343 */
344 public static void select(int[] a, int k) {
345 IndexSupport.checkIndex(0, a.length, k);
346 if (a.length <= 1) {
347 return;
348 }
349 QuickSelect.select(a, 0, a.length - 1, k);
350 }
351
352 /**
353 * Partition the array such that indices {@code k} correspond to their correctly
354 * sorted value in the equivalent fully sorted array.
355 *
356 * @param a Values.
357 * @param k Indices (may be destructively modified).
358 * @throws IndexOutOfBoundsException if any index {@code k} is not within the
359 * sub-range {@code [0, a.length)}
360 */
361 public static void select(int[] a, int[] k) {
362 IndexSupport.checkIndices(0, a.length, k);
363 if (k.length == 0 || a.length <= 1) {
364 return;
365 }
366 QuickSelect.select(a, 0, a.length - 1, k, k.length);
367 }
368
369 /**
370 * Partition the array such that index {@code k} corresponds to its correctly
371 * sorted value in the equivalent fully sorted array.
372 *
373 * @param a Values.
374 * @param fromIndex Index of the first element (inclusive).
375 * @param toIndex Index of the last element (exclusive).
376 * @param k Index.
377 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
378 * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the
379 * sub-range {@code [fromIndex, toIndex)}
380 */
381 public static void select(int[] a, int fromIndex, int toIndex, int k) {
382 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
383 IndexSupport.checkIndex(fromIndex, toIndex, k);
384 if (toIndex - fromIndex <= 1) {
385 return;
386 }
387 QuickSelect.select(a, fromIndex, toIndex - 1, k);
388 }
389
390 /**
391 * Partition the array such that indices {@code k} correspond to their correctly
392 * sorted value in the equivalent fully sorted array.
393 *
394 * @param a Values.
395 * @param fromIndex Index of the first element (inclusive).
396 * @param toIndex Index of the last element (exclusive).
397 * @param k Indices (may be destructively modified).
398 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
399 * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the
400 * sub-range {@code [fromIndex, toIndex)}
401 */
402 public static void select(int[] a, int fromIndex, int toIndex, int[] k) {
403 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
404 IndexSupport.checkIndices(fromIndex, toIndex, k);
405 if (k.length == 0 || toIndex - fromIndex <= 1) {
406 return;
407 }
408 QuickSelect.select(a, fromIndex, toIndex - 1, k, k.length);
409 }
410 }