How to incrementally sample without replacement?(如何在不替换的情况下增量采样?)
问题描述
Python 有 my_sample = random.sample(range(100), 10)
可以从 [0, 100)
随机抽样而不替换.
Python has my_sample = random.sample(range(100), 10)
to randomly sample without replacement from [0, 100)
.
假设我已经采样了 n
个这样的数字,现在我想在不替换的情况下再采样一个(不包括之前采样的任何一个 n
),怎么做超级有效吗?
Suppose I have sampled n
such numbers and now I want to sample one more without replacement (without including any of the previously sampled n
), how to do so super efficiently?
更新:从合理高效"变为超级高效"(但忽略常数因素)
update: changed from "reasonably efficiently" to "super efficiently" (but ignoring constant factors)
推荐答案
OP 读者注意事项:请考虑查看 原先接受的答案 理解逻辑,再理解这个答案.
Note to readers from OP: Please consider looking at the originally accepted answer to understand the logic, and then understand this answer.
Aaaaaand 为完整起见:这是死灵法师的答案的概念,但进行了调整,因此需要一个禁用数字列表作为输入.这与我之前的答案中的代码相同,但我们从forbid
构建了一个状态, 在我们生成数字之前.
Aaaaaand for completeness sake: This is the concept of necromancer’s answer, but adapted so it takes a list of forbidden numbers as input. This is just the same code as in my previous answer, but we build a state from forbid
, before we generate numbers.
- 这是时间
O(f+k)
和内存O(f+k)
.显然,这是最快的方法,不需要forbid
(排序/设置)的格式.我认为这在某种程度上使它成为赢家^^. - 如果
forbid
是一个集合,重复猜测的方法用O(k⋅n/(n-(f+k)))
更快,这是非常接近O(k)
对于f+k
不是 非常 接近n
. - 如果
forbid
已排序,我的荒谬算法更快:
- This is time
O(f+k)
and memoryO(f+k)
. Obviously this is the fastest thing possible without requirements towards the format offorbid
(sorted/set). I think this makes this a winner in some way ^^. - If
forbid
is a set, the repeated guessing method is faster withO(k⋅n/(n-(f+k)))
, which is very close toO(k)
forf+k
not very close ton
. - If
forbid
is sorted, my ridiculous algorithm is faster with:
import random
def sample_gen(n, forbid):
state = dict()
track = dict()
for (i, o) in enumerate(forbid):
x = track.get(o, o)
t = state.get(n-i-1, n-i-1)
state[x] = t
track[t] = x
state.pop(n-i-1, None)
track.pop(o, None)
del track
for remaining in xrange(n-len(forbid), 0, -1):
i = random.randrange(remaining)
yield state.get(i, i)
state[i] = state.get(remaining - 1, remaining - 1)
state.pop(remaining - 1, None)
用法:
gen = sample_gen(10, [1, 2, 4, 8])
print gen.next()
print gen.next()
print gen.next()
print gen.next()
这篇关于如何在不替换的情况下增量采样?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!