[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[patches] strfry() doesn't use a level random distribution



strfry() uses the output of __random_r mod the length of the input string to decide which characters to swap. Unfortunately, that yields values biased toward the beginning of the string. For example, say that __random_r returns values from 0..7, and that len-1 is 6. In that case:

random | random % len
-------+-------------
  0    |  0
  1    |  1
  2    |  2
  3    |  3
  4    |  4
  5    |  5
  6    |  0
  7    |  1

The solution is to set a threshold value that is the largest multiple of len-1 that is smaller than the highest value that __random_r can return: 2^31-1. Then, call __random_r until it returns a value less than that threshold.

Side note 1: I don't know how to properly handle the case where strlen(string) >= 2^31 - 1, or even if that's possible.

Side note 2: I don't have an FSF assignment on file. If you want to use this patch, let me know what I need to do and I'll take care of it.

Side note 3: Most importantly, the idea behind this patch is generally true for all code, especially crypto stuff. A cursory grep of eglibc's source tree didn't show any similar code involving __random_r but I didn't spend a long time looking. If random distributions are used elsewhere, they should be audited to make sure they're not subject to the same bias.

Index: strfry.c
===================================================================
--- strfry.c	(revision 8417)
+++ strfry.c	(working copy)
@@ -41,8 +41,14 @@
     for (size_t i = 0; i < len - 1; ++i)
       {
 	int32_t j;
-	__random_r (&rdata, &j);
-	j = j % (len - i) + i;
+        int32_t randmax = 2^31 - 1;
+        randmax = randmax - randmax % (len - i);
+        do
+          {
+	    __random_r (&rdata, &j);
+	    j = j % (len - i);
+          } while (j >= randmax);
+        j = j + i;

 	char c = string[i];
 	string[i] = string[j];

--
Kirk Strauser