OfficeFrustration - View Single Post

**KenSheridan via AccessMonster.com** · May 21st, 2010, 04:17 PM posted to microsoft.public.access.gettingstarted

alia:

The second example of the append query contained a cut-and-paste error. It
should of course have been:

INSERT INTO AssayResults
(SampleNumber, ResultCategory, Result)
SELECT [sample number], "Pt_ppm", Pt_ppm
FROM [TheLinkedCSVFile]
WHERE Pt_ppm IS NOT NULL;

Ken Sheridan
Stafford, England

KenSheridan wrote:
alia:

The principle issue here is that the structure of the .csv file in which the
assay results are being presented is much more akin to a spreadsheet than a
database table. Rather than having a separate column for each result
category these would, in a relational database, be stored as separate rows in
a related table. However, it's not difficult to take the data from the .csv
file and recast it in a format suitable for a database table. Before coming
to that, though, it looks to me like your existing database might be in need
of a little remodelling.

The design of tables in a relational database is governed by a process known
as normalization. I won't go into the details of this (The Wikipedia article
on the subject is pretty good), but essentially normalization is a set of
rules (normal forms) which ensure the elimination of any redundancies which
can leave the database open to inconsistent data. The end result is a set of
related table, each of which models an entity type and whose tables model the
attributes of each entity type without any redundancy. This is achieved by a
process of 'decomposition' of tables, breaking them down into separate tables
so that the attributes (columns) of each contain no redundant information.
Assuming a location might have more than one sample taken a table of the
structure you cite:

location ID | northing | easting | region name | sample number |

contains redundancies because for each sample per location we are told the
easting, northing and region of the location. This allows for
inconsistencies as there is nothing to prevent different value of one or more
of these attributes being entered in separate rows for the same location.
Separate Locations, Regions and Samples tables are needed, e.g.

Regions
....Region

Locations
....LocationID
....LocationName
....Region

Samples
....SampleNumber
....SampleDate
....LocationID

Region in Locations is a foreign key referencing the primary key of Regions,
and LocationID in samples is a foreign key referencing the primary key of
Samples. The tables might well have other non-key columns of course, but the
important thing is that each of these must be a specific attribute of the
entity type which the table models. In the language of the relational model
each must be 'functionally dependent' solely on the whole of the primary key
of the table.

Turning to the assay results a suitable table for this would be:

AssayResults
....SampleNumber
....ResultCategory
....Result

The primary key of this table would be a composite one made up of the two
columns SampleNumber and ResultCategory, each of which are foreign key
columns, the former referencing the primary key of Samples, the latter the
primary key of a ResultCategories table:

ResultCategories
....Resultcategory

This table would have one row for each type of assay result, so the values
might be Au_ppm, Pt_ppm etc.

When it comes to importing the results data from the .csv file you'd link to
the file and use a set of 'append' queries to insert rows into AssayResults,
with a separate query per result category, so for gold you'd use:

INSERT INTO AssayResults
(SampleNumber, ResultCategory, Result)
SELECT [sample number], "Au_ppm", Au_ppm
FROM [TheLinkedCSVFile]
WHERE Au_ppm IS NOT NULL;

The "Au_ppm" in quotes is a constant which inserts the text value 'Au_ppm'
into the ResultCategory column, the Au_ppm without quotes is the column in
the linked file which contains the result for gold ppm. You might have a
similar append query for platinum for instance:

INSERT INTO AssayResults
(SampleNumber, ResultCategory, Result)
SELECT [sample number], "Pt_ppm", Au_ppm
FROM [TheLinkedCSVFile]
WHERE Pt_ppm IS NOT NULL;

Whenever you receive a .csv file with new assay results it simply a case of
linking to the .csv file and executing the set of append queries, which can
easily be automated so that they can all be run at a single click of a button
on a form. Even if the same queries were accidentally executed more than
once for the same samples no harm would be done as the violation of the
composite primary key of assay results would prevent the same row being
inserted more than once.

When it comes to making the data available to MapInfo I have no experience of
that particular product. In my own work with environmental data of a broadly
similar structure to yours we used ArcInfo as the GIS. However, given a set
of correctly normalized tables as outlined above it should a simple task to
create a query to return the data in a format compatible with MapInfo's
requirements.

Ken Sheridan
Stafford, England

hi Jeff,

[quoted text clipped - 34 lines]
Jeff Boyce
Microsoft Access MVP

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...arted/201005/1