如何在不替换的情况下增量采样?

素材狗 2022-01-01 23:06:04 文章分类：Python问题点击数：次

文章标签

How to incrementally sample without replacement?(如何在不替换的情况下增量采样?)

本文介绍了如何在不替换的情况下增量采样?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

Python 有 my_sample = random.sample(range(100), 10) 可以从 [0, 100) 随机抽样而不替换.

Python has my_sample = random.sample(range(100), 10) to randomly sample without replacement from [0, 100).

假设我已经采样了 n 个这样的数字，现在我想在不替换的情况下再采样一个(不包括之前采样的任何一个 n)，怎么做超级有效吗?

Suppose I have sampled n such numbers and now I want to sample one more without replacement (without including any of the previously sampled n), how to do so super efficiently?

更新:从合理高效"变为超级高效"(但忽略常数因素)

update: changed from "reasonably efficiently" to "super efficiently" (but ignoring constant factors)

推荐答案

OP 读者注意事项:请考虑查看原先接受的答案理解逻辑，再理解这个答案.

Note to readers from OP: Please consider looking at the originally accepted answer to understand the logic, and then understand this answer.

Aaaaaand 为完整起见:这是死灵法师的答案的概念，但进行了调整，因此需要一个禁用数字列表作为输入.这与我之前的答案中的代码相同，但我们从forbid构建了一个状态, 在我们生成数字之前.

Aaaaaand for completeness sake: This is the concept of necromancer’s answer, but adapted so it takes a list of forbidden numbers as input. This is just the same code as in my previous answer, but we build a state from forbid, before we generate numbers.

这是时间O(f+k)和内存O(f+k).显然，这是最快的方法，不需要forbid(排序/设置)的格式.我认为这在某种程度上使它成为赢家^^.
如果forbid是一个集合，重复猜测的方法用O(k⋅n/(n-(f+k)))更快，这是非常接近 O(k) 对于 f+k 不是非常接近 n.
如果 forbid 已排序，我的荒谬算法更快:

This is time O(f+k) and memory O(f+k). Obviously this is the fastest thing possible without requirements towards the format of forbid (sorted/set). I think this makes this a winner in some way ^^.
If forbid is a set, the repeated guessing method is faster with O(k⋅n/(n-(f+k))), which is very close to O(k) for f+k not very close to n.
If forbid is sorted, my ridiculous algorithm is faster with:

import random
def sample_gen(n, forbid):
    state = dict()
    track = dict()
    for (i, o) in enumerate(forbid):
        x = track.get(o, o)
        t = state.get(n-i-1, n-i-1)
        state[x] = t
        track[t] = x
        state.pop(n-i-1, None)
        track.pop(o, None)
    del track
    for remaining in xrange(n-len(forbid), 0, -1):
        i = random.randrange(remaining)
        yield state.get(i, i)
        state[i] = state.get(remaining - 1, remaining - 1)
        state.pop(remaining - 1, None)

用法:

gen = sample_gen(10, [1, 2, 4, 8])
print gen.next()
print gen.next()
print gen.next()
print gen.next()

这篇关于如何在不替换的情况下增量采样?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持编程学习网！

上一篇：是否有用于打开 SPSS 文件的 Python 模块?

下一篇：matplotlib 中的箱线图:标记和异常值

如何在不替换的情况下增量采样?

问题描述

推荐答案

相关文章

mysql存储中使用while批量插入数据(批量提交和单个

前端传参数进行Mybatis调用mysql存储过程执行返回

使用SpringBoot+EasyExcel+Vue实现excel表格的导入和导出

关于@Bean的使用方式

MyBatisCodeHelperPro最新激活方法(有效方法)

Spring boot 运用策略模式实现避免多次使用if的操

素材狗