加权随机

什么是“加权随机”

假设你有一个项目列表。项目可以是任何东西。例如，我们可能有一个你喜欢食用的水果和蔬菜列表：['🍌', '🍎', '🥕']。

权重列表代表每个项目的权重（或概率，或重要性）。权重是数字。例如，权重列表[3, 7, 1]表示：

你会更频繁地吃🍎 苹果（在总共3 + 7 + 1 = 11次中选择），
然后你会较少频繁地吃香蕉 🍌（在总共11次中只有3次），
而你真的不喜欢胡萝卜 🥕（在总共11次中只选择1次）。

如果我们用概率来表述，那么权重列表可能是一个浮点数数组，总和为1（即[0.1, 0.5, 0.2, 0.2]）。

在这种情况下的加权随机将是这样一个函数，它会随机从列表中返回一个项目，并且它会考虑每个项目的权重，以便权重较高的项目被选中的频率更高。

函数接口示例：

const items =   [ '🍌', '🍎', '🥕' ];
const weights = [  3,    7,    1  ];

function weightedRandom(items, weights) {
  // 实现代码在这里 ...
}

const nextSnackToEat = weightedRandom(items, weights); // 可能是 '🍎'

加权随机应用

在遗传算法中，加权随机用于“选择”阶段，我们需要根据个体的适应度分数来选择最适/最强个体进行交配和产生下一代更强壮的代。你可以在《500行代码的自驾驶汽车》文章中找到一个例子。
在循环神经网络（RNN）中，当尝试基于下一个字母的概率决定接下来要选择哪个字母（以形成句子）时使用。你可以在使用循环神经网络（RNN）生成食谱 Jupyter笔记本中找到一个例子。
在Nginx负载均衡中，为了更频繁地将HTTP请求发送到权重较高的服务器。
以及其他...

算法

直接方法将是：

根据每个项目的权重重复列表中的每个项目。
从列表中随机选择一个项目。

例如，在我们的水果和蔬菜案例中，我们可以生成以下大小为3 + 7 + 1 = 11的列表：

const items =   [ '🍌', '🍎', '🥕' ];
const weights = [  3,    7,    1  ];

// 根据权重重复项目。
const weightedItems = [
  '🍌', '🍌', '🍌',
  '🍎', '🍎', '🍎', '🍎', '🍎', '🍎', '🍎',
  '🥕',
];

// 现在只需从weightedItems数组中随机选择一个项目。

然而，如你所见，这种方法可能需要大量内存，如果我们在weightedItems列表中有许多需要重复的项目。可以将其视为你需要重复字符串"some-random-string"（18字节）一千万次的情况。仅此数组就需要分配大约180Mb的额外内存空间。

更高效的方法将是：

准备每个项目的累积权重列表（即cumulativeWeights列表，其元素数量与原始weights列表相同）。在我们的案例中，它将如下所示：cumulativeWeights = [3, 10, 11]
从0到最高累积权重值生成随机数randomNumber。在我们的案例中，随机数将在[0..11]范围内。假设我们有一个randomNumber = 8。
从左到右遍历cumulativeWeights列表，并选择第一个大于或等于randomNumber的元素。我们将使用这个元素的索引来从items数组中选择项目。

这种方法背后的思想是，权重较高的项目将“占据”更多的数值空间。因此，随机数落在“较高权重数值桶”的机会更大。

const weights =           [3, 7,  1 ];
const cumulativeWeights = [3, 10, 11];

// 以伪表示法，我们可以这样思考cumulativeWeights数组。
const pseudoCumulativeWeights = [
  1, 2, 3,               // <-- [3] 个数
  4, 5, 6, 7, 8, 9, 10,  // <-- [7] 个数
  11,                    // <-- [1] 个数
];

这里是weightedRandom函数可能实现的一个例子：

/**
 * 根据其权重选择随机项目。
 * 权重较高的项目被选中的频率更高（概率更高）。
 *
 * 例如：
 * - items = ['banana', 'orange', 'apple']
 * - weights = [0, 0.2, 0.8]
 * - weightedRandom(items, weights) 在 80% 的情况下会返回 'apple'，在 20% 的情况下会返回
 * 'orange'，它永远不会返回 'banana'（因为选择香蕉的概率是 0%）
 *
 * @param {any[]} items
 * @param {number[]} weights
 * @returns {{item: any, index: number}}
 */
export default function weightedRandom(items, weights) {
  if (items.length !== weights.length) {
    throw new Error('Items and weights must be of the same size');
  }

  if (!items.length) {
    throw new Error('Items must not be empty');
  }

  // 准备累积权重列表。
  // 例如：
  // - weights = [1, 4, 3]
  // - cumulativeWeights = [1, 5, 8]
  const cumulativeWeights = [];
  for (let i = 0; i < weights.length; i += 1) {
    cumulativeWeights[i] = weights[i] + (cumulativeWeights[i - 1] || 0);
  }

  // 在 [0...sum(weights)] 的范围内生成随机数
  // 例如：
  // - weights = [1, 4, 3]
  // - maxCumulativeWeight = 8
  // - 随机数的范围是 [0...8]
  const maxCumulativeWeight = cumulativeWeights[cumulativeWeights.length - 1];
```javascript
const randomNumber = maxCumulativeWeight * Math.random();

// Picking the random item based on its weight.
// The items with higher weight will be picked more often.
for (let itemIndex = 0; itemIndex < items.length; itemIndex += 1) {
  if (cumulativeWeights[itemIndex] >= randomNumber) {
    return {
      item: items[itemIndex],
      index: itemIndex,
    };
  }
}

实现

查看 weightedRandom.js 文件以了解 weightedRandom() 函数的实现。
查看 weightedRandom.test.js 文件以了解测试用例。