Lets say I've been charged with creating a database for Consumer Price Indexes for Canada (and its provinces) and the US. There are 12 sources that we extract data from for one of our reports and each source provides data differently. For example, A major bank will provide quarterly CPI data while another source like Statistics Canada will provide monthly values. Statistics Canada will also break down the CPI into all of its little baskets, which we also want but the banks and other sources do not.
My question is, do I make 1 database to accommodate them all (with different data structures) or do I create multiple databases to accommodate each source? In Excel we use workbooks for each source so that is why I have to ask. Please let me know if I have expressed this clearly enough.
Kindest Regards:
Michael
Last edited by Mordred; 06-20-2011 at 01:01 PM.
Please leave a message after the beep!
Hard to say without seeing the data, but I think I'd use one database - I'd make separate tables to import each source's data, then set up queries to get the data into the format that you want. I'd assume you'd want it all in the same format, in which case I'd set up one table that would hold the formatted results from all the other tables.
So it'd be something like this:
Major Bank Table --> Query --\
Statistics Canada Table --> Query -->--> Master Table Holding all data.
Other source(s) --> Query --/
I'd clear each source table whenever you get fresh data, import the data into the source table, and then run an append query to format the data and place it into the Master Table.
When I say format, I'm not talking about how the data looks, but rather putting it into what ever organization you want, be it based on dates or whatever.
Does that make sense?
Is your code running too slowly?
Does your workbook or database have a bunch of duplicate pieces of data?
Have a look at this article to learn the best ways to set up your projects.
It will save both time and effort in the long run!
Dave
Ok, so a table for each source you say. The way I have my tables set up (soon to change) is:
tblSources
- SourceID(PK)
- Source_Name
- A couple more attributes
tblRegions
- RegionID(PK)
- Region_Name
tblEconomic_Variables
- Eco_Var_ID (PK)
- SourcID(FK)
- RegionID(FK)
- Eco_Var_Name
- Qtr_Val
- Mth_Val
tblEconomic_Variables is where I get confused because of the different data provided from different sources. Based of what you see, should I still create tables for each source?
Please leave a message after the beep!
Well, the different table for each source would just be the table into which the raw data is imported into Access. If the raw data all follows the same format, then you wouldn't need a separate table for each.
If one source gives you five columns of data, but another gives you ten columns, obviously you are going to need different tables to import them into.
This would help then to get your data into tables shown above. If one source provides quarterly data, but another provides only monthly data, you obviously can't use the same query to get them from an imported table to your Economic Variables table.
It does look like you did a good job setting up normalized tables.
Is your code running too slowly?
Does your workbook or database have a bunch of duplicate pieces of data?
Have a look at this article to learn the best ways to set up your projects.
It will save both time and effort in the long run!
Dave
I completely understand what you are saying Dave but I have to ask, do you think I should have just one flat table then for each source instead of the three I have? Something like:
tblStatsCan
• Eco_Var_ID(PK)
• VSeriesNumber
• Source_Name
• Region
• Mth_Val
Please leave a message after the beep!
It's very hard to say without seeing your data, and may be subjective anyway. You want to try to minimize the amount of data that is stored in multiple places. If you use one flat table, are your Source Names and Regions going to repeat a bunch of times?
On the other hand, too much normalization, while good in theory, in practice can be a beast to maintain. I created a database about half a year ago where I made it as normal as possible. It was as close to fourth order normal as could be. However, now it takes me forever to maintain and make user requested changes, and I've realized I should have simplified it (and had duplicate data) because the user's wanted changes so often.
Anyway, I'd probably have 1 regions and 1 sources table that holds all regions and sources data for all the different sources. Then I'd have a table for each source where the only thing that table does is temporarily hold the data from the source when you import it into the database. Lastly, I'd have one master table that hold all data permenantly. I'd use separate querys to get the data from the import tables into the master table.
If you want to upload a database, maybe just leave one record (scrubbed of private data) in each table, and I may be able to get a better idea of what you're looking at.
Is your code running too slowly?
Does your workbook or database have a bunch of duplicate pieces of data?
Have a look at this article to learn the best ways to set up your projects.
It will save both time and effort in the long run!
Dave
Thanks Dave, I have a database created but the tables are all empty at the moment. I'll need a bit of time to add some data to them but I will.
Please leave a message after the beep!
Ok Dave I thought I would take a different approach to showing you my data. I have uploaded an excel file with the data that is extracted from Statistics Canada (don't worry, all data is public). The very last sheet ("Other Sources(BMO)") is from one of the banks we extract from. As you will see, they are very different. My thinking right now is to have a database specifically for data from Statistics Canada, another database for banks, and then a final database for 4 other forecasting agencies. What do you think?
Please leave a message after the beep!
That sounds good, but are you going to be able to compare data between the databases? Or is that not your aim?
Is your code running too slowly?
Does your workbook or database have a bunch of duplicate pieces of data?
Have a look at this article to learn the best ways to set up your projects.
It will save both time and effort in the long run!
Dave
The aim of the user is two fold. 1, create corporate quarterly economic analysis reports, and 2, do compare analysis for inter-departmental reports.
Please leave a message after the beep!
I don't know why but uploaded this database makes me feel like a nervous nelly! I've never created a database before that didn't have to do with student enrollments in university or playing with Northwind. At any rate, here it is Dave (or anyone else). I have two tables, a Regions table and a Statistics_Canada_CPI table. I have populated the Regions table but now I need to populate the Statistics_Canada_CPI table but as seen in the previous Cdn&ProvCPI.zip file that I uploaded, there is a lot to load. My question now becomes, how do I do this?
Please leave a message after the beep!
Sorry, I didn't have time to look at this yesterday, but I should today. What is the variable name in the Statistics_Canada_CPI table going to hold? is that Column B in the spreadsheets? How would you determine the region for all tabs BEFORE MBCPI?
Is your code running too slowly?
Does your workbook or database have a bunch of duplicate pieces of data?
Have a look at this article to learn the best ways to set up your projects.
It will save both time and effort in the long run!
Dave
Hi Dave and thanks for your patience. You are right, the variable name is from column B in the spreadsheets. The region for all tabs before MBCPI would be RegionID 1, Canada.
Please leave a message after the beep!
Is the setup of my tables off? Something seems a miss to me but being new to this, I cannot pinpoint what I am feeling on this.
Please leave a message after the beep!
I think it looks ok. I'll set up how I would import the data real quick, then you can compare it to how you were planning on doing it and see what works better and why.
Is your code running too slowly?
Does your workbook or database have a bunch of duplicate pieces of data?
Have a look at this article to learn the best ways to set up your projects.
It will save both time and effort in the long run!
Dave
There are currently 1 users browsing this thread. (0 members and 1 guests)
Bookmarks