+ Reply to Thread
Results 1 to 20 of 20

Improving array shuffle

  1. #1
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Improving array shuffle

    After reading an essay by Mike Bostock on the dangers of shuffling arrays in particular ways, I emulated his visualization in Excel.

    I want to visualize whether a shuffle is truly random by visualizing the propensity for any given starting index to end up in another.

    Here's what I came up with:
    Please Login or Register  to view this content.
    The problem is that I don't get what I expect, Fisher-Yates should show an even distribution (with a bit of noise). It doesn't.

    What I don't know is whether this is a problem with my method (most likely), a problem with the shuffle (though I've tried a couple of Chip Pearsons implementations as well as my own - though they exhibit different behaviour), the way random numbers are generated in Excel or something else.

    I've attached for perusal, would be good if someone could spot the error
    Attached Files Attached Files
    Last edited by Kyle123; 09-15-2015 at 11:23 AM.

  2. #2
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Array Shuffle Visualization

    Think I got it, the 1 base arrays were an issue

  3. #3
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Array Shuffle Visualization

    So I'm getting a bit closer on a better shuffle, Chip's (from here http://www.cpearson.com/excel/ShuffleArray.aspx):

    Please Login or Register  to view this content.
    Produces this: http://i.imgur.com/OSLVlsS.png which isn't ideal.

    A raw implementation of Fisher-Yates yields better results:
    Please Login or Register  to view this content.
    Produces this: http://i.imgur.com/u2R0IC9.png, which is exactly what I'm after, unfortunately the use of a worksheet function call has a huuuuuuge impact on performance.

    This leads to my next question, I was under the impression that the VBA Int function would always round down, so should be a replacement for Floor, however I've got artifacts in my visualization showing that something isn't quite right, this is what I've got:
    Please Login or Register  to view this content.
    And these artifacts are consistent in their location:
    http://i.imgur.com/KCYUeh5.png
    http://i.imgur.com/Q0ljZ6I.png

    Any takers on how to improve the algorithm? - latest workbook attached
    Attached Files Attached Files
    Last edited by Kyle123; 09-15-2015 at 11:23 AM. Reason: fixed URL link coding

  4. #4
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Improving array shuffle

    Hy, Kyle,

    How about either of these?

    Please Login or Register  to view this content.
    Test1 always has the numbers 10 to n-1 in each row. Test2 shuffles the numbers 0 to n^2-1.

    An in-situ 2D F-Y shuffle would avoid the need for the copy.

    EDIT: This seems like it's not worth the time:

    Please Login or Register  to view this content.
    Last edited by shg; 09-15-2015 at 12:48 PM. Reason: Corrected dim of Arr in Test1
    Entia non sunt multiplicanda sine necessitate

  5. #5
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Improving array shuffle

    I don't think they do the same thing. I'm looking for the best possible algorithm that shuffles without bias, my Fisher Yates still exhibits bias as shown by the artifacts in the visualisation - this seems to be as a result of the differences in operation between the floor worksheet function and the implicit cast to int (that should always round down).

    In short, I'm not convinced on my sorting algorithm, the question I'm looking to get answered, is:

    Why does this:
    Please Login or Register  to view this content.
    Operate differently to:
    Please Login or Register  to view this content.
    Then pointedly, how does one make a vba solution (as opposed to an Excel object model call) that produces the same result as the worksheet function?

    RE: If N <> J Then

    Agreed, but I needed an established baseline

  6. #6
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Improving array shuffle

    Consistently at Column 40, Row 35 (in the spreadsheet) when using the the Int method is a field that is always significantly in variance to the norm over large iterations - it swings both ways.

    It also occurs in "ShuffleArrayInPlace" in the same location once one reverses the loop order (or the inverse location if one leaves as is)

    Similarly Row 47 is consistently regular.

    One wouldn't expect this over a large enough sample size if the sorting were properly random - and indeed it's not present on the Application.Floor method.

    Interestingly removing Randomize gives a better distribution
    Last edited by Kyle123; 09-15-2015 at 01:30 PM. Reason: Clarification

  7. #7
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Improving array shuffle

    That fixes it, misunderstood randomize in this context, only needs to be called once, not on each iteration

  8. #8
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Improving array shuffle

    Quote Originally Posted by Kyle123 View Post
    Interestingly removing Randomize gives a better distribution
    By your appearance measure, yes. In another thread over the last month, someone made the same argument using the same method - a picture. The application was simulating dice rolls.

    Another measure of randomness, though, is predictability, and Rnd has such a short cycle that in about 12 rolls of a single die, you can predict the balance of the rolls. So randomizing in every call is totally appropriate.

  9. #9
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Improving array shuffle

    Thanks shg.

    So how is it applied in this case? I ask because maths is not my strong point. Why does randomizing more frequently (or setting a new seed as I understand it to be) offer less random results? If the cycle is short, one would assume that reseeding regularly is required to maintain a random distribution - a short cycle would generate a pattern.

    If one increases frequency of generating a new by placing the randomize call within the inner iteration (rather than before it) the distribution becomes yet less even - this is easily demonstrated through the visualisation.

    How was this demonstrated using a picture? Or do you mean a visual representation as I have?
    Last edited by Kyle123; 09-15-2015 at 04:41 PM.

  10. #10
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Improving array shuffle

    Here's the thread: http://www.mrexcel.com/forum/excel-q...dbetweens.html

    Rnd produces a lovely pattern of numbers, it's just so short than you can look at a few numbers (you only need one if you look at the entire Single) and figure out where it is, so you know what's coming next.

    I would not put the Randomize in the shuffle function. I think it is meritorius that Rnd can be seeded (or the VBA reset) to generate a fixed sequence of random numbers for testing.

    RAND() has a hugely longer cycle (about 27 trillion) than Rnd's 2^24 (~ 17M), but it's about 1000 times slower
    Last edited by shg; 09-15-2015 at 04:49 PM.

  11. #11
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Improving array shuffle

    Hmmm, having read that thread I understand your point. What I don't understand though is why I get such consistent results. I understand why Chip's code produces a non desirable shuffle, but I don't understand why the FY method does in this implementation - I know it should work and it does in other environments, but doesn't work to the standard I would expect in Excel.

    Do you have any insight on why I see the expected behaviour when using application.floor vs the implicit cast to int?

  12. #12
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Improving array shuffle

    Int() rounds toward -infinity, and TRUNC() rounds toward 0, but there is no difference for positive numbers.

    Which combination of routines do you find anomalous, Kyle?

    I would add that WF.RandBetween was hosed in UDFs in Excel 2003, and there is a mind-bending anomaly described in http://www.excelforum.com/excel-form...functions.html

    EDIT: And which of Chip's codes?
    Last edited by shg; 09-15-2015 at 05:44 PM.

  13. #13
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Improving array shuffle

    In the workbook, "shufflearrayinplace" is not ideal (it's chips), there is a strong bias against a number remaining where it starts as well as a sliding bias on the last element. Here is the output http://i.imgur.com/OSLVlsS.png note the variance at 11,16 this is the same as below allowing for the off by one (the iteration is reversed)

    Application.Floor provides an even distribution, however take a look at the linked pictures (or run fischeryates using my code in the workbook) and you'll see that there's a bias on row 57, as well as an element at col 40 row 35 in the spreadsheet that is consistently at variance to the norm (can be positive or negative)

    The linked images demonstrate the behaviour, but it's reproduceable by running the code - you can try all three variants in my workbook
    Last edited by Kyle123; 09-15-2015 at 06:07 PM.

  14. #14
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Improving array shuffle

    Also this is where I got the idea originally http://bost.ocks.org/mike/shuffle/compare.html

  15. #15
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Improving array shuffle

    In Sub test from post #3, with one iteration, all of the values in matrix are between 0 and 5.

  16. #16
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Improving array shuffle

    While we're here, a 2D shuffle:

    Please Login or Register  to view this content.

  17. #17
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Improving array shuffle

    Quote Originally Posted by shg View Post
    In Sub test from post #3, with one iteration, all of the values in matrix are between 0 and 5.
    But not over 500 iterations, however Randomizing once outside the shuffle does indeed fix the issue. The (slight) problem that I have though, is that I still don't know why - I wonder if it's a timing issue. As I understand it Randomize uses the tick count to set the seed, it's therefore possible that over enough iterations there is an overlap of cycles (where the number of random numbers is a finite set) - the application.floor method doesn't suffer from this since it is too slow, the tickcount simply sets the seed too far along the cycle for repetitions to occur. Randomize fixes this since the numbers are used only once.

    Does that sound feasible?

  18. #18
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Improving array shuffle

    Thinking further, I think this theory holds (some) weight, using chips algorithm and moving the Randomize call out of the shuffle method removes the same artifacts visible in my method.

    Chip with randomization in the shuffle:
    http://i.imgur.com/A3jTF6h.png
    Chip with no randomization in the shuffle;
    http://i.imgur.com/DeLtf3c.png

    Finally, calibrated (for colour) Fisher Yates - no randomization in shuffle:
    http://i.imgur.com/1uT6U9d.png

  19. #19
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Improving array shuffle

    Quote Originally Posted by Kyle123 View Post
    Does that sound feasible?
    Timer on my 'puter is 256 ticks/s (about 4 ms between ticks), so that loop would use the same value many times. Help sez,

    [when the number argument is omitted] Randomize uses the return value from the Timer function as the new seed value. <...> To repeat sequences of random numbers, call Rnd with a negative argument immediately before using Randomize with a numeric argument. Using Randomize with the same value for number does not repeat the previous sequence.
    ... and a quick test confirms that, but I don't know what other issues it might cause.

    I should have said yesterday that the first thing I did was remove the Randomize statement from your shuffle routine.

    Finally, calibrated (for colour) Fisher Yates - no randomization in shuffle:
    http://i.imgur.com/1uT6U9d.png
    I don't understand how you have the CF set up for that; there isn't the variation across the spectrum of colors that the CF defaults to. Could you post the current workbook and say which routine generates that?
    Last edited by shg; 09-16-2015 at 12:09 PM.

  20. #20
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Improving array shuffle

    Thanks, hmmm dunno then, don't know why it would produce different results in that case. I'll not worry about it, just bear it in mind.

    Calibration is a bit of a strong term, I simply have a cells outside the square (but inside the CF) set to Zero and the maximum possible value a cell may have. Routine is simply FischerYates.

    I can post the workbook if that isn't sufficient.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. [SOLVED] Shuffle array (Scheduling)
    By hcyeap in forum Excel Programming / VBA / Macros
    Replies: 12
    Last Post: 03-22-2015, 08:59 AM
  2. Visualization Plugin/Add-On Excel
    By markvee in forum Excel General
    Replies: 0
    Last Post: 01-15-2015, 10:43 AM
  3. Interesting visualization idea
    By Ferloft in forum Excel Programming / VBA / Macros
    Replies: 8
    Last Post: 06-27-2013, 10:46 AM
  4. Vacation Visualization
    By paulrh in forum Excel General
    Replies: 5
    Last Post: 03-13-2012, 12:51 PM
  5. Shuffle Array
    By Rik Smith in forum Excel Programming / VBA / Macros
    Replies: 4
    Last Post: 08-24-2005, 12:05 PM
  6. [SOLVED] data visualization
    By utefan001 in forum Excel Charting & Pivots
    Replies: 1
    Last Post: 06-01-2005, 12:05 PM
  7. data visualization
    By [email protected] in forum Excel General
    Replies: 1
    Last Post: 05-31-2005, 11:05 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1