# FREQUENCY function returns wrong results

1. ## FREQUENCY function returns wrong results

I have a data range which contains the heights of 150 people. Each height is recorded in meters (with 2 decimals) and in centimeters (no decimals). The range in meters is [1.50, 2.05]. I'm using the Frequency function to calculate frequencies of the heights, both in meters and in centimeters. I do the same with the Countif-Countifs functions in order to verify the results.

The results I get for the heights in centimeters are the same whichever function I use. But the results I get for the heights in meters using the Frequency function differ in two of the categories while the rest are OK. (And the Countifs function gives the correct results). The bins are 1.59, 1.69, 1.79, 1.89, 1.99, over 1.99. For some reason, heights equal to 1.89 are not included in the category (1.79, 1.89] but in the next category (1.89, 1.99].

I've tried this in Excel 2010 and 2013.

There are only 2 people recorded in the data with 1.89 m height and I have placed them in the first rows of the data to make it easier to experiment. When I change these heights to a different number, the Frequency function gives the correct results!

Is this some sort of a bug of the Frequency function or am I missing something?

2. ## Re: FREQUENCY function returns wrong results

Although not an answer to your question, but I should solve this with a pivot table.

See the attached file.

3. ## Re: FREQUENCY function returns wrong results

The values in col B were the result of some calculation or conversion. The 1.89 in col B is not exactly the same as the 1.89 in col J ; it's one LSB larger.

If you select col B, Data > Text to columns > Finish, it fixes the problem.

4. ## Re: FREQUENCY function returns wrong results

Originally Posted by imatzav
the results I get for the heights in meters using the Frequency function differ in two of the categories while the rest are OK. (And the Countifs function gives the correct results).
The constants that appear to be 1.89 in B2 and B3 are actually 1.89 + 2.22E-16 (approximately).

You can see the difference with a formula of the form =B2-ROUND(B2,2)-0 formatted as Scientific. The "redundant" -0 is needed to work around a dubious and inconsistently-applied heuristic in Excel whereby sometimes Excel arbitrarily substitutes exact zero for the actual subtraction result if Excel deems the last two operands to be "close enough" to the same value.

COUNTIF "<=" compares the values rounded to 15 significant digits (1.89).

FREQUENCY compares the exact binary values (1.89 + 2.22E-16).

Since 1.89 + 2.22E-16 is larger than 1.89 (I11), the FREQUENCY count for that bin is 2 less, and the FREQUENCY count for the next bin (up to 1.99) is 2 more.

Apparently, the constants in B2 and B3 are the results of calculations that were copy-and-pasted-value into B2 and B3. That operation preserves the exact binary representation when the copy and paste are in the same Excel instance.

IMHO, the best solution is to return to the source of the calculation and explicitly round to 2 decimal places, if you intend the calculation to be accurate to that many decimal places. (That is, explicitly round to whatever number of decimal places that you intend the calculation to be accurate to.)

5. ## Re: FREQUENCY function returns wrong results

Thank you for your reply oeldere. Indeed, I could use a pivot table. The vlookup is not even necessary, I could simply use the grouping option in the pivot table itself to group Height_m. However, functions have the advantage of automatic updating. Also, although I didn't mention it at first, the Frequency function produces the same results as the Histogram tool in Data Analysis, probably because the Frequency function is used internally to make the calculations. If this is a bug then, one should not trust either of them and should make Histograms only with Pivot Table or Countifs. I'm interested in teaching these options to students and I don't want them to learn erroneous tools! If this is indeed a bug, it surprises me that I have found no references of it on the internet!

6. ## Re: FREQUENCY function returns wrong results

Thank you very much joeu2004! That was totally unexpected! I thought that seeing 1.89 in the formula bar meant 1.89 exactly! Shouldn't it be so?

The heights did actually come from a formula: 1+randbetween(50,105)/100 and then I copied-pasted as values. I used your formula =B2-ROUND(B2,2)-0 and filled down to check. Surprisingly, some, not all heights, have this extra 2.22E-16 (with + or -). Since randbetween produces integers, do you have any idea why this happens? Where did that 2.22E-16 come from? Why only for some results? And most importantly, does this mean that the only way to be sure that what I see in the formula bar is exactly that, is to round always??

8. ## Re: FREQUENCY function returns wrong results

Originally Posted by imatzav
The heights did actually come from a formula: 1+randbetween(50,105)/100 and then I copied-pasted as values. [....] Since randbetween produces integers, do you have any idea why this happens?
RANDBETWEEN produces integers, but RANDBETWEEN(...)/100 produces non-integers in your case.

Whenever we do standard binary arithmetic with non-integers or that produces non-integers (as well as integers greater than 2^53), we might encounter these (relatively) infinitesimal differences from decimal arithmetic results.

I would write one of the following equivalent formulas:

=ROUND(1 + RANDBETWEEN(50,105)/100, 2)
or
=ROUND(RANDBETWEEN(150,205)/100, 2)

Ironically, the latter formula does not require the use of ROUND, at least for that RANDBETWEEN range. But that is a coincidence.

Originally Posted by imatzav
does this mean that the only way to be sure that what I see in the formula bar is exactly that, is to round always?
Well, maybe not "always". But generally yes: IMHO, whenever we intend for a calculation to be accurate to n decimal places, we should explicitly round to that number of decimal places, using the ROUND function. And to that number of decimal places, not to an arbitrary number of decimal places like 10, as some people suggest.

The point is: we do not always intend for a calculation to be accurate to n decimal places.

First, usually we do not care about subexpressions. (But sometimes we do.) For example, we might write =ROUND(A1-A2+A3-A4,2), but we might not care enough to write =ROUND(ROUND(A1-A2,2)+ROUND(A3-A4,2),2).

Second, sometimes it is more correct not to round formulas. For example, we might not round periodic interest calculations in amortization schedules, so that the schedule matches the amounts that are calculated by standard formulas (FV, PV, NPER, etc).

So there is some judgment to be applied to the decision about whether or not to round.

That is one of many reasons why I deprecate the use of the Excel option "Precision as displayed" (PAD), which some people suggest. For more about that option, see below.

However, I would only round calculations, not constants.

If we merely "re-enter" constants that we suspect have an infinitesimal binary difference from their decimal representation by selecting the cell and pressing f2, then Enter, the infinitesimal difference will go away because the constant is "re-entered" as it appears in the Formula Bar, not as it was produced by copy-and-paste-value.

Originally Posted by imatzav
Where did that 2.22E-16 come from? Why only for some results?
This is not considered to be a bug. It is simply a side-effect of the industry-standard 64-bit binary floating-point representation, which is how Excel chooses to store numeric values.

There is a wealth of information on the internet about anomalies of 64-bit binary floating-point in general and in Excel specifically (although most anomalies are not limited to Excel). But beware: most references have some misinformation. For example, it is not limited to 15 significant digits.

Perhaps the following description will be helpful. I wrote it in another forum recently, in response to a question about why 14.230 - 14.191 is 0.0389999999999997 instead of 0.0390000000000000, where A1 is 14.230, A2 is 14.191, and A3 is =A1-A2.

-----

Do not set the "Precision as displayed" option without taking appropriate precautions. In particular, make a backup copy of the Excel file before setting PAD.

in any and all worksheets in the Excel file, if you enter them with more decimal places than you choose to display. And that can permanently change the behavior and results of formulas that directly or indirectly depend on those altered constants.

Moreover, PAD only affects the final cell value. It does not affect intermediate or final values of expressions within formulas. For example, IF(A1-A2=0.039,TRUE) still returns FALSE(!), even when PAD is set.

And "Precision as displayed" is a misnomer: it should be called "Precision as formatted".

If we format A3 as General, Excel displays 0.039. But IF(A3=0.039,TRUE) still returns FALSE(!) because PAD rounds cells that are formatted as General to 15 significant digits, regardless of how they are displayed. And if A4 has the same formula, but it is formatted as Number with 3 decimal places, IF(A3=A4,TRUE) returns FALSE(!) even though both cells appear to be the same.

[....]

As for "why?" [the infinitesimal differences], the explanation is relatively simple: Excel stores numeric values in binary, not in decimal.

Specifically, numbers are represented by the sum of 53 consecutive powers of two ("bits") times an exponential factor. Consequently, most decimal fractions cannot be represented exactly. For example:

14.230 is stored as 14.2300000000000,00426325641456060111522674560546875
14.191 is stored as 14.1910000000000,007247535904753021895885467529296875
A1-A2 is stored as 0.0389999999999997,015720509807579219341278076171875

(I use period for the decimal point and comma to demarcate the first 15 significant digits.)

We cannot see that precision in Excel because Excel arbitrarily limits formatting to the first 15 significant digits (rounded, not truncated) [1], padding any additional decimal places to the right with zeros.

[1] Excel truncates to 15 significant digits only data that we enter manually (or read from a text file).

Moreover, the binary approximation of a decimal fraction depends on the magnitude of the integer part of the number, because some of the 53 bits must be used to represent the integer part.

That is why IF(10.01-10=0.01,TRUE) returns FALSE(!).

10.01 is stored as 10.0099999999999,997868371792719699442386627197265625
10.01-10 is stored as 0.00999999999999978,68371792719699442386627197265625
0.01 is stored as 0.0100000000000000,0020816681711721685132943093776702880859375

IF(ROUND(10.01-10,2)=0.01,TRUE) returns TRUE.

-----

As if the anomalies of standard 64-bit binary floating-point representation are not complicated enough, Excel adds some anomalies of its own with dubious heuristics that are poorly and incompletely described in KB 78113 (click here) [2] under the misleading title "when a value reaches zero".

[2] http://support.microsoft.com/kb/78113

The Excel-specific heuristics are intended to "correct" some of the standard anomalies. However, the Excel heuristics are applied inconsistently, which is my definition of "design defect".

As a consequence, where B1 and B2 differ by an infinitesimal amount:

1. IF(B1=B2,TRUE) might return TRUE, but =B1-B2 formatted as Scientific might display an infinitesimal difference.

2. =B1-B2 might return exact zero (0.00E+0), but IF(B1=B2,TRUE) might return FALSE.

3. =B1-B2 might return exact zero (0.00E+0) and IF(B1=B2,TRUE) might return TRUE, but: IF(B1-B2=0,TRUE) might return FALSE; MATCH(B1,B2,0) might return #N/A (no match); and =B1-B2-0 and =(B1-B2) (!) formatted as Scientific might display an infinitesimal difference. The latter demonstrates the absurd inconsistency of the implementation of the heuristics.

9. ## Re: FREQUENCY function returns wrong results

That is an exhaustive answer joeu2004! Thank you very much for taking the time to write all this useful information. I don't think many Excel users are aware of these issues. I could ask many more questions on these but at least my initial problem is now SOLVED!

10. ## Re: FREQUENCY function returns wrong results

Hi imatzav,

I worked about a half hour on this problem before any other answers had been given. I believed it was a "decimal rounding" problem with base 10 vs base 2 numbers but couldn't prove it to myself. I used to teach and also find bugs in software.

The closest I can come to explaining this using student math is with converting fractions to decimals. 1/2 = .5 (exactly) but 1/3 = .333 (repeating). Some fractions can be exactly converted to decimals and others need a "hat" on the repeating digit(s). Because computers only work with a fixed number of places for digits, they could have some rounding error. Excel isn't flawed but a tool that is correct to lots of digits.

I agree that teaching rules and functions that don't work exactly as imagined is hard, but the imagination is a finer tool that a computer.

Another site that explains "wrong results" is at:

11. ## Re: FREQUENCY function returns wrong results

Hi MarvinP,

I agree with all that but still I would expect from Excel to be more "logical" and consistent with how it treats numbers. I used RANDBETWEEN in a simple formula to get numbers with 2 decimals. The formula bar showed that I did get numbers with 2 decimals but internally there was more to that. OK so far. Then I copied-pasted values only and Excel kept the extra "hidden" digits. How was I supposed to know that? To me, the result of copy-paste values should be exactly the values that appear in the formula bar as if I have typed them using the keyboard, nothing extra hidden. Then I used the countifs and the frequency function and they treated the same input differently. Countifs used the values as they appear in the formula bar and frequency used their binary format, hence the problem. I think there is room for improvement by Microsoft...

12. ## Re: FREQUENCY function returns wrong results

Originally Posted by imatzav
Then I copied-pasted values only and Excel kept the extra "hidden" digits. How was I supposed to know that? To me, the result of copy-paste values should be exactly the values that appear in the formula bar as if I have typed them using the keyboard, nothing extra hidden.
I expect the opposite: when I copy-and-paste-value, I expect the pasted value to be identical to the calculated value that I copied. Otherwise, formulas that depend on the pasted-value would behave differently from the same formulas that depend on the copied value.

But I agree that the Formula Bar should display the actual value. The flaw is not that paste-value retains the hidden digits, but that Excel does not display the hidden digits. Excel formats only up to the first 15 significant decimal digits (rounded).

The IEEE 754 requires that applications format at least 17, not 15, significant decimal digits in order to permit accurate conversion between 64-bit binary floating-point and decimal representations. And in fact, Excel does format up to 17 significant decimal digits in XML files, which are used (compressed) for xlsx and xlsm files.

-----

BTW, VBA has a similar problem. Unlike Excel, we can enter constants that have 17 significant decimal digits (and more, IIRC), and VBA uses all decimal digits to convert to 64-bit binary floating-point. But VBA displays only up to 15 significant decimal digits, like Excel. Consequently, if we edit the VBA line, the constant changes value: it is truncated to the first 15 significant decimal digits.

In order to avoid that problem in VBA, we can use the Val function to ensure that all 17 significant decimal digits are preserved. For example: val("1.8900000000000012").

@imatzav: Of course, that is not your intent, nor is it related to your original problem. I am just using that familiar constant to demonstrate VBA behavior.

Originally Posted by imatzav
Countifs used the values as they appear in the formula bar and frequency used their binary format, hence the problem. I think there is room for improvement by Microsoft.
Yes. But IMHO, the flaw is that COUNTIF compares numerical values rounded to the first 15 significant decimal digits instead of their exact binary value.

Again, part of the problem lies in the fact that Excel only formats up to the first 15 significant decimal digits. Since the "criteria" parameter is input as a string to COUNTIF, COUNTIF "must" compare only the first 15 significant decimal digits of the numerical values in the "range" parameter. Otherwise, some values might never be compared correctly.

I'm sure you would argue that it is unfair to burden the user with understanding these subtle details. I would agree. But again, IMHO, the remedy is for Excel to follow the IEEE 754 standard and format up to 17, not 15, significant decimal digits.

There are currently 1 users browsing this thread. (0 members and 1 guests)