#1
Table Analyzer and Normalization
Hi,
I'm trying to build a datamart using Access as our database. Ideally I'd do it in SQL Server, but I belong to a business group, and SQL Server would make this a low-priority IT project. While we convince the powers-that-be about our specific reporting and analytical needs, I'm tasked with making something work with Access, now.

Currently I can get a flat-file dump from our data warehouse using a reporting tool. It has about 20 columns and about 1.5 million rows for 6 months of data, and it is 1.5 GB in size once inside Access. We apply some specific business rules to categorize the data and do some clean-up within Access. I've been trying, unsuccessfully, to build a proper database out of this in the hope of bringing down the redundancy. The hope is that I can fit 12 months of data into the database after it has been normalized into 8 tables in a dimensional model (one fact table, 7 dimension tables). To begin with, we'd be updating the database only once a month.

Table Analyzer seems like a great tool, but it doesn't seem up to the task: I'm getting all kinds of errors (Temporary Disk Full or System Resources Exceeded) even when I work with just 50,000 rows of the original data. I was able to make it work once on another database, so I know it works, and Excel users can use the select query it generates to build pivot tables seamlessly without worrying about the joins.

I'd like to know if I'm being too ambitious or if I'm on the wrong path here. Should I just stick with one front-end db and multiple back-end db's to solve our needs? I know that would definitely make the job of incremental monthly updates easier.

All suggestions welcome and appreciated,
John H.
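P.S. To make the target concrete, the star schema I have in mind would look roughly like this in Jet DDL, with an AutoNumber surrogate key in each dimension table and a matching Long Integer lookup field in the fact table. The table and field names below are purely illustrative, not our real ones, and each statement would be run as its own query in Access:

    CREATE TABLE DimProduct (
        ProductID COUNTER CONSTRAINT pkDimProduct PRIMARY KEY,
        ProductNumber TEXT(20),
        ProductName TEXT(50)
    );

    CREATE TABLE FactSales (
        SaleID COUNTER CONSTRAINT pkFactSales PRIMARY KEY,
        ProductID LONG CONSTRAINT fkProduct REFERENCES DimProduct (ProductID),
        SaleMonth DATETIME,
        Amount CURRENCY
    );

The real schema would have seven such dimension tables hanging off the one fact table.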
#2
Hi John,
It sounds as if you know more than Table Analyzer does. Here's what I'd probably do:

1) Work out a normalised (or part-normalised) data structure suitable for the reporting needs. Calculate how many bytes of actual data this would involve over 12 months. If this is much over 1GB (to allow for indexes and other overhead), then a single Access back end won't do.

2) Link to the big text file (saving space by not importing it) and run a series of append queries to populate the normalised tables. If any of the normalised tables are really big, it seems to help to remove as many indexes as possible before appending, and to re-create them afterwards, having first compacted the database.

3) Create a series of queries that give views of the data, which the users can base their own queries on.

NB1: To get round the size limitation, given that this data will be read-only and there's therefore no worry about maintaining relational integrity against user editing, you could put some of the tables into a separate mdb file and use linked tables to connect the two. Or perhaps you could put one over on your DBA and install MySQL somewhere.

NB2: If you're faced with humongous text files, they can be brought down to size (e.g. by stripping out unwanted fields or records) with text-file tools from the Unix world (see http://unxutils.sourceforge.net/ and http://gnosis.cx/publish/programming/text_utils.html) or with a scripting language such as Perl.

--
John Nurick [Microsoft Access MVP]
Please respond in the newsgroup and not by email.
#3
Hi John,
Thanks a lot for the reply. It appears that I'm on the right track for the present. See specific follow-up comments below.

"John Nurick" wrote:

> 1) Work out a normalised (or part-normalised) data structure suitable
> for the reporting needs. Calculate how many bytes of actual data this
> would involve over 12 months. If this is much over 1GB (to allow for
> indexes and other overhead) then a single Access back end won't do.

I was finally successful in creating an 8-table db working with 25k rows in Table Analyzer. For my current level of granularity, one back end might do; otherwise I will store some of the tables in another db and use linked tables to create one view for querying purposes, as you suggest.

> 2) Link (save space by not importing) to the big text file and run a
> series of append queries to populate the normalised tables.

I am linking my big Access table. My normalized dimension tables have only one index each, linking to a lookup field in the fact table (thanks to Table Analyzer, this is done automatically). Now the challenge seems to be in populating the remaining million-odd rows from my source table into this schema via update queries, while still maintaining referential integrity. Basically I have to recreate what Table Analyzer did: associate the correct lookup values in the fact table with the new IDs in the dimension tables as I append each of the tables with new data, so that the entire row can be recreated.

> NB1: ... you could put some of the tables into a separate mdb file and
> use linked tables to connect the two. Or perhaps you could put one over
> on your DBA and install MySql somewhere.

Isn't that blasphemous, mentioning MySQL in a Microsoft forum? I have in fact installed MySQL and an evaluation copy of SQL Server too. The preference is for SQL Server, just because the integration between Excel and SQL Server is tighter, especially when it comes to building pivot tables or building cubes from Excel.

> NB2: If you're faced with humongous text files, they can be brought
> down to size (e.g. strip out unwanted fields or records) with text-file
> tools from the Unix world or a scripting language such as Perl.

No such luck regarding reducing the number of fields. If anything, if I am successful with this effort, we'll be adding more granular data (weeks, city, zip code etc.) to the database.

Thanks again for all the suggestions.
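P.S. To make it concrete (the names below are invented, not our real fields), the dimension loads amount to append queries that pull the distinct values out of the linked source, letting the AutoNumber field in each dimension assign the surrogate keys:

    INSERT INTO DimProduct (ProductNumber, ProductName)
    SELECT DISTINCT s.ProductNumber, s.ProductName
    FROM SourceDump AS s;

The fact rows then come straight across from the source, carrying the natural keys (here ProductNumber), and the ID lookups get filled in afterwards, which is the step I still need to work out. I haven't tested this yet, so treat it as a sketch of the idea rather than a finished query.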
#4
On Wed, 23 Feb 2005 12:05:05 -0800, "BI_Specialist" wrote:

> Isn't that blasphemous, mentioning MySQL in a Microsoft forum? I have
> in fact installed MySQL and an evaluation copy of SQL Server too. The
> preference is for SQL Server, just because the integration between
> Excel and SQL Server is tighter, especially when it comes to building
> pivot tables or building cubes from Excel.

I haven't been struck by a thunderbolt yet <g>. Part of being an MVP is being independent of Microsoft and free to suggest what we feel are the best solutions, no matter where they come from. I'm far from an expert in either, but ISTM that MySQL's only advantage in the present context is price. MSDE seems to be ruled out by your need to store more than 2GB, but check out MSDE's successor, which is called something like SQL Server 2005 Express and is now in beta. IIRC the size limit on this is more than 2GB.

>> NB2: If you're faced with humongous text files, they can be brought
>> down to size with text-file tools from the Unix world or a scripting
>> language such as Perl.

More blasphemy on my part <g>.

> No such luck regarding reducing the number of fields. If anything, if I
> am successful with this effort, we'll be adding more granular data
> (weeks, city, zip code etc.) to the database.

With a bit of practice one can actually use either the textutils or a Perl script to normalise an ultra-wide text file into multiple narrower ones that can be imported into related tables in Access.

Good luck!

--
John Nurick [Microsoft Access MVP]
Please respond in the newsgroup and not by email.
#5
Hi John,
Good to know that you take your independence to heart. I really value your unbiased advice on this issue. Hopefully all the good that you're doing improves your chances against any natural disasters <g>.

SQL Server Express seems to be a good way to go, since it has a 4GB limit on file size and is free. As for Perl and the text-based utilities, I know you can do very sophisticated things with them, but I guess I'm not a programmer at heart. Good at vi, but I never made the full transition to Perl. No wonder I find myself struggling at this point in the datamart-building process.

I've got all my dimension tables loaded with unique values and the fact table also loaded from the source. The size of the DB is down from 1.3GB (6 months of data) to 240MB after normalization, so I'm confident that I won't hit the 2GB limit for 12 months. The final step, though, is to populate the lookup fields in the measures (fact) table with the IDs from the dimension tables, and this is where I'm trying to account for the fact that my source is one big dump with only one ID per row. Table Analyzer used composite keys made up of 2-3 fields (from the dimension table) concatenated together as lookup values in the fact table. Except for one dimension table, which doesn't have a unique identifier, I believe I can get by with a simpler approach: each of the dimension tables has an automatically generated ID that I hope to use as a foreign key/lookup in the fact table. My pseudo-code for each lookup field in the fact table is as follows:

    UPDATE fact
    SET fact.prod_lookup =
        (SELECT d.prod_id
         FROM dimension AS d, source AS s
         WHERE s.product_number = d.product_number)
    WHERE fact.id = (SELECT s.id FROM source AS s);

Access has some funky SQL syntax that I have to learn or circumvent to code the above logic. Think I'm on the right track?

Thanks again
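P.S. My guess is that the Access/Jet way to express this is a joined update rather than subqueries, something along these lines (again with my invented names, and assuming product_number has a unique index in the dimension table so that Access treats the joined recordset as updatable):

    UPDATE (fact INNER JOIN source AS s ON fact.id = s.id)
           INNER JOIN dimension AS d ON s.product_number = d.product_number
    SET fact.prod_lookup = d.prod_id;

I haven't run this yet, so it's only a sketch of the syntax I think I need.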