+ Reply to Thread
Results 1 to 12 of 12

RANDBETWEEN - maybe not so random

  1. #1
    Forum Contributor
    Join Date
    04-30-2010
    Location
    Denmark
    MS-Off Ver
    Office 365
    Posts
    215

    RANDBETWEEN - maybe not so random

    I have not gone into details with the statistic calculation here, but im quite puzzled by the following.

    I need to generate usernames with 2 letters (from A-Z), followed by 2 numbers (from 0-9) and then again 2 letters, for example AB45DE.

    I made a formula to do that:
    Please Login or Register  to view this content.
    Im not much into the calculations of likelihood, but this should have more than 45 million combinations, ie (26*26*10*10*26*26)=45.697.600

    I then do this for 10.000 rows, hoping for 10.000 unique usernames. But what happens more often than not is that I end up with 1, 2, 3 or more duplicates in the 10.000 rows. This seems very strange and unlikely to me.

    I made a sheet to illustrate this - its generates the names in column A and shows if its duplicated in column B (2 or more is a duplicate), and the sum of duplicates in D1 - it runs by clicking the Randomize button (takes a little while to run) and it converts to values at the end so it will stay the same until next run. Try running it a few times and you will get duplicates.

    Can it really be statistically correct that you get duplicates so often in 10.000 rows, outta 45 million possible combinations or is the Excel RANDBETWEEN really not that random after all?
    Attached Files Attached Files

  2. #2
    Administrator 6StringJazzer's Avatar
    Join Date
    01-27-2010
    Location
    Tysons Corner, VA, USA
    MS-Off Ver
    MS365 Family 64-bit
    Posts
    24,721

    Re: RANDBETWEEN - maybe not so random

    Two things to note. First, software does not generate random numbers. It generates pseudo-random numbers. This means, roughly, that a sequence of numbers should be intractably hard to predict and should approximate a random distribution, but is not the same as true randomness. This is a pretty big topic and I am not a mathematician so am limiting my explanation here.

    Second, even with true random numbers, if you generate 10,000 numbers at random with replacement from a population of 45,697,600, the chances are, by rough calculation, at least 1/9 of having at least two duplicates in that list. (This is a bit back-of-the-envelope but I'll show my work if you're really interested.)

    If you need to generate 10,000 strings that are guaranteed to be unique then I would use VBA code.
    Jeff
    | | |會 |會 |會 |會 | |:| | |會 |會
    Read the rules
    Use code tags to [code]enclose your code![/code]

  3. #3
    Forum Moderator Glenn Kennedy's Avatar
    Join Date
    07-08-2012
    Location
    Digital Nomad... occasionally based in Ireland.
    MS-Off Ver
    O365 (PC) V 2403
    Posts
    44,046

    Re: RANDBETWEEN - maybe not so random

    Quote Originally Posted by 6StringJazzer View Post
    This is a bit back-of-the-envelope but I'll show my work if you're really interested.
    I am... and would be interested to see... Since I'm not the OP, don't bother if it's going to be a lengthy discourse!!
    Glenn




    None of us get paid for helping you... we do this for fun. So DON'T FORGET to say "Thank You" to all who have freely given some of their time to help YOU.

    Temporary addition of accented to illustrate ongoing problem to the TT: L? fh?ile P?draig sona dhaoibh

  4. #4
    Forum Moderator Glenn Kennedy's Avatar
    Join Date
    07-08-2012
    Location
    Digital Nomad... occasionally based in Ireland.
    MS-Off Ver
    O365 (PC) V 2403
    Posts
    44,046

    Re: RANDBETWEEN - maybe not so random

    Forget it. I was asleep. The first letter and digit don't count. So it's 26*26*26*10

  5. #5
    Forum Expert ben_hensel's Avatar
    Join Date
    01-26-2012
    Location
    Northeast USA
    MS-Off Ver
    Office 365
    Posts
    2,043

    Re: RANDBETWEEN - maybe not so random

    RAND(), and by extension RANDBETWEEN(), kind of have a reputation for being not all that great. You can read the gossip.

    Specifically to your case though,
    This is in the same family as the Birthday problem of the probability that random events/numbers will duplicate. (The short answer is that duplicates are more likely than naive intuition would assume.)

    If you just want to generate a bunch of mostly-random usernames as a one-off, then I would tell you to generate extras, and then just cull the duplicates out.
    Click the [* Add Reputation] Button to thank people for help!
    Post Attachment: Click [Go Advanced] under the reply box; then click [Manage Attachments] to launch the attachments dialogue.
    Mark Thread Solved: Go to the top of the first post; Select [Thread Tools] dropdown; Pick the [Mark thread as Solved] option.

  6. #6
    Administrator 6StringJazzer's Avatar
    Join Date
    01-27-2010
    Location
    Tysons Corner, VA, USA
    MS-Off Ver
    MS365 Family 64-bit
    Posts
    24,721

    Re: RANDBETWEEN - maybe not so random

    Quote Originally Posted by Glenn Kennedy View Post
    I am... and would be interested to see... Since I'm not the OP, don't bother if it's going to be a lengthy discourse!!
    In writing out my answer I found an error so let's wait on that


    But ben_hensel said it well--sometimes math clashes with intuition. If you have 23 randomly chosen people, chances are 50% that you'll get at least one duplicate birthday.

  7. #7
    Forum Guru MarvinP's Avatar
    Join Date
    07-23-2010
    Location
    Woodinville, WA
    MS-Off Ver
    Office 365
    Posts
    16,167

    Re: RANDBETWEEN - maybe not so random

    Hi Imbizile,

    Random numbers don't mean they won't repeat. If you think of a 6 side die and you role it twice, will you get the same number again (at random)? Of course you would. Just because you have 45 million possible sides on the die, doesn't mean it won't repeat.

    Now if you want to insure you get unique names, simply filter your B column, in your example, by 1 so any duplicate counts won't show. Click your button and sort column B from Large to Small. Then drop down the B column filter and unselect the 2. You have a good answer then, unique random names.
    One test is worth a thousand opinions.
    Click the * Add Reputation below to say thanks.

  8. #8
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: RANDBETWEEN - maybe not so random

    The probability of picking 10,000 distinct items when multi-selecting from 45,697,600 is ...

    {=PRODUCT(1 - (ROW(INDIRECT("1:10000")) - 1)/45697600)} ~ 33%

    You should see the same result if you enter =randbetween(1, 45697600) down 10,000 rows.

    If you just generate a few extra values they way you are, you could copy, paste as values, and then do Remove Duplicates.
    Last edited by shg; 02-28-2018 at 06:16 PM.
    Entia non sunt multiplicanda sine necessitate

  9. #9
    Forum Expert leelnich's Avatar
    Join Date
    03-20-2017
    Location
    Delaware, USA
    MS-Off Ver
    Office 2016
    Posts
    2,807

    Re: RANDBETWEEN - maybe not so random

    Odds of 1 or more duplicates in 10000 rows : 66.52 %
    (The loop calcs the probability of 10000 uniques, which is then subtracted from 1.)
    Please Login or Register  to view this content.
    Last edited by leelnich; 02-28-2018 at 07:10 PM.
    Clicking the Add Reputation star below helpful posts is a great way to show your appreciation.
    Please mark your threads as SOLVED upon conclusion (Thread Tools above Post # 1). - Lee

  10. #10
    Forum Contributor
    Join Date
    04-30-2010
    Location
    Denmark
    MS-Off Ver
    Office 365
    Posts
    215

    Re: RANDBETWEEN - maybe not so random

    Thanks everyone for the input. And yes, I do realize now that the probability of duplicates happening are actually much higher than first anticipated. I dont actually use the exact setup I have in the demo file here, it was just in order to be able to illustrate what I meant. But I made some adjustments to my actual setup, to make sure duplicates are not appearing (or rather they are generated again if they occour).

    I just thought it was not likely to have that many duplicates, but after some researching, from the inputs here, specifically on the Birthday problem, I can see why it is in fact very likely. This site explains it very well, btw: https://betterexplained.com/articles...thday-paradox/

    I believe the 66,52% probability is the correct one.

    Thanks again everyone.

  11. #11
    Forum Guru MarvinP's Avatar
    Join Date
    07-23-2010
    Location
    Woodinville, WA
    MS-Off Ver
    Office 365
    Posts
    16,167

    Re: RANDBETWEEN - maybe not so random

    Hi Imbizele,

    I'm smiling when you wrote:
    I believe the 66,52% probability is the correct one.
    shg (who is rarely wrong) said the probability of picking 10000 distinct items while leelnich said "1 or more duplicates". I think they may both be right.

    I used to teach statitistics and there were two types of problems where reading the problem was imperitive. It was putting black and white marbles in a bag and drawing them out. One problem was the probability of drawing marbles out "with replacements" and the other was with NO replacements. Many time students would confuse the two types of games. Reread both answers above and see if they might both be correct.

  12. #12
    Forum Contributor
    Join Date
    04-30-2010
    Location
    Denmark
    MS-Off Ver
    Office 365
    Posts
    215

    Re: RANDBETWEEN - maybe not so random

    Ahh yes I see that now - misread the first one

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Assign random number (RANDBETWEEN) equal number of times
    By dechesne in forum Excel Formulas & Functions
    Replies: 2
    Last Post: 12-20-2016, 10:58 AM
  2. Replies: 10
    Last Post: 07-03-2015, 04:29 PM
  3. HELP: How to get random data from a 'table' using 'randbetween'
    By faizzsheikh in forum Excel Formulas & Functions
    Replies: 1
    Last Post: 06-17-2015, 08:03 PM
  4. Randbetween or Random Number (1,3)
    By itselflearn in forum Excel Formulas & Functions
    Replies: 2
    Last Post: 02-27-2013, 02:34 PM
  5. Replies: 8
    Last Post: 09-21-2011, 06:50 PM
  6. [SOLVED] Re: Non-random numbers generated by excel's data analysis random gener
    By Harlan Grove in forum Excel General
    Replies: 2
    Last Post: 09-13-2005, 12:05 PM
  7. Random numbers not using randbetween()
    By davehill1974 in forum Excel Programming / VBA / Macros
    Replies: 7
    Last Post: 07-07-2005, 12:05 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1