+ Reply to Thread
Results 1 to 8 of 8

Frequency for Histograms in Excel

  1. #1
    Jim
    Guest

    Frequency for Histograms in Excel

    I am using Excel 2002. There is a problem with the frequency function, which
    is used for histograms. The frequency counts for some (not all) bins is
    incorrect. The case that I learned of in another posting involves 101 data
    values: 0, 10, and values in between that are created by using an increment
    of 0.1. The frequency value for the first bin is (correctly) 11, and most of
    the other bins have the correct frequency (10). However, the bin
    representing the interval from 1 to 2 has a frequency of 9, which the bin
    representing the interval from 4 to 5 has a frequency of 11! I have also
    used the "Better Histrogram" downloadable file for this same problem, but
    apparently the "Better Histogram" procedure starts with the Frequency
    function, as it ends up with a histogram with the same results. Any
    suggestions for correctly calculating all bin frequencies?
    --
    Jim

  2. #2
    Jim
    Guest

    RE: Frequency for Histograms in Excel (Added Info)

    In my initial question/posting, I forgot to add that the bins used for my
    example data set are: 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10.

    "Jim" wrote:

    > I am using Excel 2002. There is a problem with the frequency function, which
    > is used for histograms. The frequency counts for some (not all) bins is
    > incorrect. The case that I learned of in another posting involves 101 data
    > values: 0, 10, and values in between that are created by using an increment
    > of 0.1. The frequency value for the first bin is (correctly) 11, and most of
    > the other bins have the correct frequency (10). However, the bin
    > representing the interval from 1 to 2 has a frequency of 9, which the bin
    > representing the interval from 4 to 5 has a frequency of 11! I have also
    > used the "Better Histrogram" downloadable file for this same problem, but
    > apparently the "Better Histogram" procedure starts with the Frequency
    > function, as it ends up with a histogram with the same results. Any
    > suggestions for correctly calculating all bin frequencies?
    > --
    > Jim


  3. #3
    Tushar Mehta
    Guest

    Re: Frequency for Histograms in Excel

    Since you don't share the raw data, it is hard to know what the
    frequency function is doing wrong. However, it, by *definition*
    considers the bin value as the upper bound for the bin and not the mid-
    point. So, if that is the problem you allude to, it is an XL design
    feature. If it is something else, maybe you can share the data for 1-
    to-5 so that it becomes easier to understand how the FREQUENCY function
    is getting it wrong.

    --
    Regards,

    Tushar Mehta
    www.tushar-mehta.com
    Excel, PowerPoint, and VBA add-ins, tutorials
    Custom MS Office productivity solutions

    In article <[email protected]>,
    [email protected] says...
    > I am using Excel 2002. There is a problem with the frequency function, which
    > is used for histograms. The frequency counts for some (not all) bins is
    > incorrect. The case that I learned of in another posting involves 101 data
    > values: 0, 10, and values in between that are created by using an increment
    > of 0.1. The frequency value for the first bin is (correctly) 11, and most of
    > the other bins have the correct frequency (10). However, the bin
    > representing the interval from 1 to 2 has a frequency of 9, which the bin
    > representing the interval from 4 to 5 has a frequency of 11! I have also
    > used the "Better Histrogram" downloadable file for this same problem, but
    > apparently the "Better Histogram" procedure starts with the Frequency
    > function, as it ends up with a histogram with the same results. Any
    > suggestions for correctly calculating all bin frequencies?
    >


  4. #4
    Jim
    Guest

    Re: Frequency for Histograms in Excel

    Perhaps I was not clear in my original posting, but the data is the following
    set of 101 data elements: {0.0, 0.1, 0.2, 0.3, ..., 0.9, 1.0, 1.1, ... 1.9,
    2.0, 2.1, ...,2.9, 3.0, 3.1, ...3.9, 4.0, 4.1, ... 4.9, 5.0, 5.1,...5.9, 6.0,
    6.1, ...6.9, 7.0, 7.1,...7.9, 8.0, 8.1,...8.9, 9.0, 9.1,... 9.9, 10.0} and
    the bins are {1,2,3,4,5,6,7,8,9,10}. Midpoints is not relevant to the
    problem. The problem is that theoretically, the calculated bin frequencies
    (beginning with bin 1) should be {11,10,10,10,10,10,10,10,10,10}; instead,
    these calculated bin frequencies are {11,9,10,10, 11, 10,10,10,10,10}

    "Tushar Mehta" wrote:

    > Since you don't share the raw data, it is hard to know what the
    > frequency function is doing wrong. However, it, by *definition*
    > considers the bin value as the upper bound for the bin and not the mid-
    > point. So, if that is the problem you allude to, it is an XL design
    > feature. If it is something else, maybe you can share the data for 1-
    > to-5 so that it becomes easier to understand how the FREQUENCY function
    > is getting it wrong.
    >
    > --
    > Regards,
    >
    > Tushar Mehta
    > www.tushar-mehta.com
    > Excel, PowerPoint, and VBA add-ins, tutorials
    > Custom MS Office productivity solutions
    >
    > In article <[email protected]>,
    > [email protected] says...
    > > I am using Excel 2002. There is a problem with the frequency function, which
    > > is used for histograms. The frequency counts for some (not all) bins is
    > > incorrect. The case that I learned of in another posting involves 101 data
    > > values: 0, 10, and values in between that are created by using an increment
    > > of 0.1. The frequency value for the first bin is (correctly) 11, and most of
    > > the other bins have the correct frequency (10). However, the bin
    > > representing the interval from 1 to 2 has a frequency of 9, which the bin
    > > representing the interval from 4 to 5 has a frequency of 11! I have also
    > > used the "Better Histrogram" downloadable file for this same problem, but
    > > apparently the "Better Histogram" procedure starts with the Frequency
    > > function, as it ends up with a histogram with the same results. Any
    > > suggestions for correctly calculating all bin frequencies?
    > >

    >


  5. #5
    Tushar Mehta
    Guest

    Re: Frequency for Histograms in Excel

    My best guess (and it is only a guess) is that the problem occurs
    because of floating point limitations and is exacerbated by something
    in the FREQUENCY function. I did the following tests:

    Generated 101 numbers in A1:A101 between 0 and 10. Entered 0 in A1,
    =A1+0.1 in A2 and copied A2 down as needed.

    In C1 entered the formula =A1+1. C2 contained =C1+1 and copied C2 to
    C3:C10.

    D1 contained the array formula =FREQUENCY(A1:A101,C1:C10).

    E1 contained the formula =COUNTIF($A$1:$A$101,"<="&C1)-COUNTIF($A$1:$A
    $101,"<="&(C1-1)). E1 was copied down to E2:E10.

    When A1 contains a zero, the COUNTIF formulas returned the correct
    results, whereas FREQUENCY returned the error you found.

    Changing A1 to 60 caused the FREQUENCY function to report 10 in the 61
    bin and 11 in the 65 bin. But, now COUNTIF function reported 9 in the
    64 bin and 11 in the 65 bin.

    Setting the value in A1 to different starting values results in a
    shifting error pattern in the results of the 2 functions.

    However, change the A column so that the numbers are generated with the
    formula =$A$1+0.1*(ROW()-1) and all the results from both FREQUENCY and
    the COUNTIF are always correct.

    --
    Regards,

    Tushar Mehta
    www.tushar-mehta.com
    Excel, PowerPoint, and VBA add-ins, tutorials
    Custom MS Office productivity solutions

    In article <[email protected]>,
    [email protected] says...
    > Perhaps I was not clear in my original posting, but the data is the following
    > set of 101 data elements: {0.0, 0.1, 0.2, 0.3, ..., 0.9, 1.0, 1.1, ... 1.9,
    > 2.0, 2.1, ...,2.9, 3.0, 3.1, ...3.9, 4.0, 4.1, ... 4.9, 5.0, 5.1,...5.9, 6.0,
    > 6.1, ...6.9, 7.0, 7.1,...7.9, 8.0, 8.1,...8.9, 9.0, 9.1,... 9.9, 10.0} and
    > the bins are {1,2,3,4,5,6,7,8,9,10}. Midpoints is not relevant to the
    > problem. The problem is that theoretically, the calculated bin frequencies
    > (beginning with bin 1) should be {11,10,10,10,10,10,10,10,10,10}; instead,
    > these calculated bin frequencies are {11,9,10,10, 11, 10,10,10,10,10}
    >
    > "Tushar Mehta" wrote:
    >
    > > Since you don't share the raw data, it is hard to know what the
    > > frequency function is doing wrong. However, it, by *definition*
    > > considers the bin value as the upper bound for the bin and not the mid-
    > > point. So, if that is the problem you allude to, it is an XL design
    > > feature. If it is something else, maybe you can share the data for 1-
    > > to-5 so that it becomes easier to understand how the FREQUENCY function
    > > is getting it wrong.
    > >
    > > --
    > > Regards,
    > >
    > > Tushar Mehta
    > > www.tushar-mehta.com
    > > Excel, PowerPoint, and VBA add-ins, tutorials
    > > Custom MS Office productivity solutions
    > >
    > > In article <[email protected]>,
    > > [email protected] says...
    > > > I am using Excel 2002. There is a problem with the frequency function, which
    > > > is used for histograms. The frequency counts for some (not all) bins is
    > > > incorrect. The case that I learned of in another posting involves 101 data
    > > > values: 0, 10, and values in between that are created by using an increment
    > > > of 0.1. The frequency value for the first bin is (correctly) 11, and most of
    > > > the other bins have the correct frequency (10). However, the bin
    > > > representing the interval from 1 to 2 has a frequency of 9, which the bin
    > > > representing the interval from 4 to 5 has a frequency of 11! I have also
    > > > used the "Better Histrogram" downloadable file for this same problem, but
    > > > apparently the "Better Histogram" procedure starts with the Frequency
    > > > function, as it ends up with a histogram with the same results. Any
    > > > suggestions for correctly calculating all bin frequencies?
    > > >

    > >

    >


  6. #6
    Jerry W. Lewis
    Guest

    Re: Frequency for Histograms in Excel

    If your 101 data elements were the values that you claim, then the bin
    frequencies would be the values that you expect.

    Instead, I suspect that your data elements were created by successively
    adding 0.1 to the previous value. That algorithm will not produce
    exactly the values that you claim, because computers do binary math, and
    ..1 has no exact binary representation. The result of accumulating these
    approximations are that instad of 9.9, you get 9.89999999999998, etc.

    In A1:A101, put the formula =(ROW()-1)/10
    =FREQUENCY(B$1:B$101,{1,2,3,4,5,6,7,8,9,10})
    returns {11,10,10,10,10,10,10,10,10,10}

    In B1 put 0, in B2 put =B1+0.1, and copy the B2 formula and paste over
    B3:B101
    =FREQUENCY(B$1:B$101,{1,2,3,4,5,6,7,8,9,10})
    returns {11,9,10,10,11,10,10,10,10,10}

    The difference is that the formula in A1:A101 avoids accumulating
    approximations.

    Jerry

    Jim wrote:

    > Perhaps I was not clear in my original posting, but the data is the following
    > set of 101 data elements: {0.0, 0.1, 0.2, 0.3, ..., 0.9, 1.0, 1.1, ... 1.9,
    > 2.0, 2.1, ...,2.9, 3.0, 3.1, ...3.9, 4.0, 4.1, ... 4.9, 5.0, 5.1,...5.9, 6.0,
    > 6.1, ...6.9, 7.0, 7.1,...7.9, 8.0, 8.1,...8.9, 9.0, 9.1,... 9.9, 10.0} and
    > the bins are {1,2,3,4,5,6,7,8,9,10}. Midpoints is not relevant to the
    > problem. The problem is that theoretically, the calculated bin frequencies
    > (beginning with bin 1) should be {11,10,10,10,10,10,10,10,10,10}; instead,
    > these calculated bin frequencies are {11,9,10,10, 11, 10,10,10,10,10}
    >
    > "Tushar Mehta" wrote:
    >
    >
    >>Since you don't share the raw data, it is hard to know what the
    >>frequency function is doing wrong. However, it, by *definition*
    >>considers the bin value as the upper bound for the bin and not the mid-
    >>point. So, if that is the problem you allude to, it is an XL design
    >>feature. If it is something else, maybe you can share the data for 1-
    >>to-5 so that it becomes easier to understand how the FREQUENCY function
    >>is getting it wrong.
    >>
    >>--
    >>Regards,
    >>
    >>Tushar Mehta
    >>www.tushar-mehta.com
    >>Excel, PowerPoint, and VBA add-ins, tutorials
    >>Custom MS Office productivity solutions
    >>
    >>In article <[email protected]>,
    >>[email protected] says...
    >>
    >>>I am using Excel 2002. There is a problem with the frequency function, which
    >>>is used for histograms. The frequency counts for some (not all) bins is
    >>>incorrect. The case that I learned of in another posting involves 101 data
    >>>values: 0, 10, and values in between that are created by using an increment
    >>>of 0.1. The frequency value for the first bin is (correctly) 11, and most of
    >>>the other bins have the correct frequency (10). However, the bin
    >>>representing the interval from 1 to 2 has a frequency of 9, which the bin
    >>>representing the interval from 4 to 5 has a frequency of 11! I have also
    >>>used the "Better Histrogram" downloadable file for this same problem, but
    >>>apparently the "Better Histogram" procedure starts with the Frequency
    >>>function, as it ends up with a histogram with the same results. Any
    >>>suggestions for correctly calculating all bin frequencies?
    >>>
    >>>



  7. #7
    Jerry W. Lewis
    Guest

    Re: Frequency for Histograms in Excel

    Alternately, you could use
    =ROUND(B1+0.1,1)
    etc. to avoid accumulating binary approximations.

    Jerry

    Jerry W. Lewis wrote:

    > If your 101 data elements were the values that you claim, then the bin
    > frequencies would be the values that you expect.
    >
    > Instead, I suspect that your data elements were created by successively
    > adding 0.1 to the previous value. That algorithm will not produce
    > exactly the values that you claim, because computers do binary math, and
    > .1 has no exact binary representation. The result of accumulating these
    > approximations are that instad of 9.9, you get 9.89999999999998, etc.
    >
    > In A1:A101, put the formula =(ROW()-1)/10
    > =FREQUENCY(B$1:B$101,{1,2,3,4,5,6,7,8,9,10})
    > returns {11,10,10,10,10,10,10,10,10,10}
    >
    > In B1 put 0, in B2 put =B1+0.1, and copy the B2 formula and paste over
    > B3:B101
    > =FREQUENCY(B$1:B$101,{1,2,3,4,5,6,7,8,9,10})
    > returns {11,9,10,10,11,10,10,10,10,10}
    >
    > The difference is that the formula in A1:A101 avoids accumulating
    > approximations.
    >
    > Jerry
    >
    > Jim wrote:
    >
    >> Perhaps I was not clear in my original posting, but the data is the
    >> following set of 101 data elements: {0.0, 0.1, 0.2, 0.3, ..., 0.9,
    >> 1.0, 1.1, ... 1.9, 2.0, 2.1, ...,2.9, 3.0, 3.1, ...3.9, 4.0, 4.1, ...
    >> 4.9, 5.0, 5.1,...5.9, 6.0, 6.1, ...6.9, 7.0, 7.1,...7.9, 8.0,
    >> 8.1,...8.9, 9.0, 9.1,... 9.9, 10.0} and the bins are
    >> {1,2,3,4,5,6,7,8,9,10}. Midpoints is not relevant to the problem.
    >> The problem is that theoretically, the calculated bin frequencies
    >> (beginning with bin 1) should be {11,10,10,10,10,10,10,10,10,10};
    >> instead, these calculated bin frequencies are {11,9,10,10, 11,
    >> 10,10,10,10,10}
    >>
    >> "Tushar Mehta" wrote:
    >>
    >>
    >>> Since you don't share the raw data, it is hard to know what the
    >>> frequency function is doing wrong. However, it, by *definition*
    >>> considers the bin value as the upper bound for the bin and not the mid-
    >>> point. So, if that is the problem you allude to, it is an XL design
    >>> feature. If it is something else, maybe you can share the data for 1-
    >>> to-5 so that it becomes easier to understand how the FREQUENCY
    >>> function is getting it wrong.
    >>>
    >>> --
    >>> Regards,
    >>>
    >>> Tushar Mehta
    >>> www.tushar-mehta.com
    >>> Excel, PowerPoint, and VBA add-ins, tutorials
    >>> Custom MS Office productivity solutions
    >>>
    >>> In article <[email protected]>,
    >>> [email protected] says...
    >>>
    >>>> I am using Excel 2002. There is a problem with the frequency
    >>>> function, which is used for histograms. The frequency counts for
    >>>> some (not all) bins is incorrect. The case that I learned of in
    >>>> another posting involves 101 data values: 0, 10, and values in
    >>>> between that are created by using an increment of 0.1. The
    >>>> frequency value for the first bin is (correctly) 11, and most of the
    >>>> other bins have the correct frequency (10). However, the bin
    >>>> representing the interval from 1 to 2 has a frequency of 9, which
    >>>> the bin representing the interval from 4 to 5 has a frequency of
    >>>> 11! I have also used the "Better Histrogram" downloadable file for
    >>>> this same problem, but apparently the "Better Histogram" procedure
    >>>> starts with the Frequency function, as it ends up with a histogram
    >>>> with the same results. Any suggestions for correctly calculating
    >>>> all bin frequencies?



  8. #8
    Jim
    Guest

    RE: Frequency for Histograms in Excel

    Thanks! Tushar Mehta initially suggested that the problem was perhaps caused
    by the internal representation of decimal numbers, and Jerry W. Lewis
    specifically suggested that it was my method of creating data (successive
    increments of 0.1), rather than the Frequency function itself, that caused
    the problem that I noted in my initial posting. Now I no longer view the
    Frequency function with skepticism, because I know that I alone was the
    culprit.

    One suggestion: My original concern resulted from Googling "histogram" +
    "Excel" and reading some earlier email correspondence regarding the Excel
    Frequency function and the data set (0, 0.1, 0.2, ..., 9.9, 10.0). Others
    could easily run across this or similar electronic discussions and might not
    be as lucky as I was in finding a comforting answer. Is there perhaps a
    communication mechanism that Microsoft could use to correct any similar
    misunderstandings, in the community of Excel users, regarding the Frequency
    function? Just a thought!

    "Jim" wrote:

    > I am using Excel 2002. There is a problem with the frequency function, which
    > is used for histograms. The frequency counts for some (not all) bins is
    > incorrect. The case that I learned of in another posting involves 101 data
    > values: 0, 10, and values in between that are created by using an increment
    > of 0.1. The frequency value for the first bin is (correctly) 11, and most of
    > the other bins have the correct frequency (10). However, the bin
    > representing the interval from 1 to 2 has a frequency of 9, which the bin
    > representing the interval from 4 to 5 has a frequency of 11! I have also
    > used the "Better Histrogram" downloadable file for this same problem, but
    > apparently the "Better Histogram" procedure starts with the Frequency
    > function, as it ends up with a histogram with the same results. Any
    > suggestions for correctly calculating all bin frequencies?
    > --
    > Jim


+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1