# Determine frequency of a serie and amount of data missing

1. ## Determine frequency of a serie and amount of data missing

Hi all,

Here is my trouble. I compile a list of receipt that contains 4 type of information I'm interested in : Date + Product + Amount + Price.
In this process I discovered that I had some information missing and the periodicity of the receipt differs.
Sometimes I get a Quantity without a Price, sometimes I miss the whole receipt (never received).
On top of that, I might receive 2 receipt for the same product on the same month.

My purpose (yours too if you can help) is to determine how much information I am missing. I am drawing charts with these informations and I want to present an indication of the accuracy of the rendering.

It's easy to figure out the accuracy with the set of data I have provided as an attachment.
But I want an automated solution. Since I have much more products to add to the list.
I must note that frequency of receipt can vary. For example, receipt can suddenly be sent monthly for water if I change my provider. So I'll have series with receipt sent every month and every two month.

I have set up a pivot table and segment to easily display the information.

Goal is to determine how much data is missing in each serie since the first record and today. Expected results are :
- Quantity Water : 2 (July & October 2020).
- Quantity Gas : 1 (February 2020)
- Price Water : 5 (April, July & October 2019 + January, July & October 2020)
- Price Gas : 1 (February 2020)

What does not work :
- counting the number of dates in one serie to figure out frequency. It will not show the receipts I haven't received (or lost).
- counting the number of value of either serie of product. Sometimes I get 2 values the same month which ruins this idea.

Whatever works, I'll use it. I've been struggling with that one for weeks now. Time to admit I need help.

Thx.

2. ## Re: Determine frequency of a serie and amount of data missing

It would seem to me that there will need to be a list of dates for which bills are expected such as modeled in columns V and Z.
V4:V31 are populated using: =IF(V2="","",IF(EDATE(V2,1)< TODAY(),EDATE(V2,1),""))
Z4:Z31 are populated using: =IF(Z2="","",IF(DATE(YEAR(Z2),MONTH(Z2)+3,1)< TODAY(),DATE(YEAR(Z2),MONTH(Z2)+3,1),""))
Note that in both columns the cell in row 3 is manually filled.
In each case the next two columns display the quantity and price for that month.
W3:X31 are populated using: =IF(\$V3="","",IFERROR(1/(1/SUMIFS(Tableau1[Quantity],Tableau1[[Energy]:[Energy]],\$V\$2,Tableau1[[Date]:[Date]],\$V3)),"Missing"))
AA3:AB31 are populated using: =IF(\$V3="","",IFERROR(1/(1/SUMIFS(Tableau1[Quantity],Tableau1[[Energy]:[Energy]],\$V\$2,Tableau1[[Date]:[Date]],\$V3)),"Missing"))
The counts of the missing quantities and prices are displayed in row 1 using: =COUNTIFS(W3:W31,"Missing")
Let us know if you have any questions.

3. ## Re: Determine frequency of a serie and amount of data missing

Hi JeteMc,

Thx for your contribution. I don't fully understand the 1/SUMIFS part, but I eventually will. Some sort of control mechanism I've seen around, and I will get to it one day.

As for your way of solving, that would work. I could have a separate table, though I wouldn't input either Price or Energy, but just count the missing data and sum it up in a Pivot Table so that's it's updated depending of the selection I make (my graphs are all coming from Pivot Table).

My remaining problem is that I can't input the expected date it's coming. Too many different contracts to check. What isn't displayed in my sample for sake of simplification, is that all this information is location dependent and I have lots of different locations in my full set of data.
I have to deduce the frequency of a given set of data and count the possibly missing sets.

What I'm doing now is to :
- check the first date entered in one serie
- check today's date
- count the number of month in between those two dates
- count the number of values between those two dates
- Compute and display the number of missing values according to the result (calculation takes into consideration that periodicity could be monthly or bi-monthly, sometimes every semester)
- calculate and display the percentage of accuracy of the serie.

That works without entering any other information into my tables. But the more data my pivot table contains, the more nonesense this method output, because it always end up being more accurate (both in quantity and percentage) if I group everything, as this way of proceeding always finds (multiple) values monthly in both fields.

I'm beginning to think that my lack of statistical skills (in math in general and excel in particular) is what's limiting me
I'll keep on improving those.

There are currently 1 users browsing this thread. (0 members and 1 guests)