If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
|
Thread Tools | Display Modes |
#1
|
|||
|
|||
multiple data types in normalized database
If one has a normalized database, then different types of measurement data
will not be in different fields, but rather in a structure with measurement.name and measurement.value as the field names. A brief example will illustrate the potential problem. tblMeasurement.lookup: measurement.id, measurement.name, measurement.units, etc. tblMeasurements: measurement.name,measurement.value, plus fields linking to other tables. The problem is that the field measurement.value can only be of one data type (text, numeric, etc.). What if some of the measurement.values are text, some are numeric, and some are logical? If this is the case, it appears that text is the lowest common denominator, and access functions exist that can transform text to numeric or logical in queries. Alternatively, it would be easier to have separate measurement tables for the measurement.names associated with different data types? What is the best way to keep normality but have the correct data types? |
#2
|
|||
|
|||
If text, are we talking discrete values to choose from (e.g. Low, Medium,
High), or free-form text? If discrete values, could you use a lookup table with a Number primary key, a MeasurementID foreign key (what this value applies to), and a MeasurementDescription (the actual text)? That would permit you to store the entires as a Number, which would probably be more efficient. If that is not suitable, take a look at how Duane Hookom suggests handling this kind of data in his At Your Surey database: http://www.rogersaccesslibrary.com/Otherdownload.asp?SampleName='At%20Your%20Survey%2 02000' The sample database illustrates how to store answers to survery questions which can be a mix of quantative and qualitative responses. -- Allen Browne - Microsoft MVP. Perth, Western Australia. Tips for Access users - http://allenbrowne.com/tips.html Reply to group, rather than allenbrowne at mvps dot org. "LAF" wrote in message ... If one has a normalized database, then different types of measurement data will not be in different fields, but rather in a structure with measurement.name and measurement.value as the field names. A brief example will illustrate the potential problem. tblMeasurement.lookup: measurement.id, measurement.name, measurement.units, etc. tblMeasurements: measurement.name,measurement.value, plus fields linking to other tables. The problem is that the field measurement.value can only be of one data type (text, numeric, etc.). What if some of the measurement.values are text, some are numeric, and some are logical? If this is the case, it appears that text is the lowest common denominator, and access functions exist that can transform text to numeric or logical in queries. Alternatively, it would be easier to have separate measurement tables for the measurement.names associated with different data types? What is the best way to keep normality but have the correct data types? |
#3
|
|||
|
|||
On Wed, 9 Feb 2005 22:57:01 -0800, LAF
wrote: If one has a normalized database, then different types of measurement data will not be in different fields, but rather in a structure with measurement.name and measurement.value as the field names. Really? What principle of normalization did you apply to come up with that? What if some of the measurement.values are text, some are numeric, and some are logical? In relational terms, does this translate to "What if the values come from different domains?" -- Mike Sherrill Information Management Systems |
#4
|
|||
|
|||
A concrete example may be useful since Mike Sherrill has identified a
problem. The database in question involves bird banding. There are events based on site, date, and time; bird ID's; and associated junction table. For each event for each bird, a number of things are identified, measured, and sampled. These might be different domains within a particular bird-event. Things identified would be sex, age, breeding condition, molt, disease, ectoparasites. These things identified would include text and logical fields. Things measured could all be numeric. Things sampled, like blood and feathers, could be all text. My initial problem would be solved through 3 tables instead of one. These tables would have one-to-one relationships with each other through the EventID and BirdID combination primary key. Is this the best solution? "Mike Sherrill" wrote: On Wed, 9 Feb 2005 22:57:01 -0800, LAF wrote: If one has a normalized database, then different types of measurement data will not be in different fields, but rather in a structure with measurement.name and measurement.value as the field names. Really? What principle of normalization did you apply to come up with that? What if some of the measurement.values are text, some are numeric, and some are logical? In relational terms, does this translate to "What if the values come from different domains?" -- Mike Sherrill Information Management Systems |
#5
|
|||
|
|||
Forgive me for interjecting but I am doing so partially for my own learning
experience. Is there such a thing as over-normalization? What you are describine sounds akin to havng a Customers table witha CustomerId field and then a Customers_Info table where you store all the contact info. You might have a fields for Name and Value and look like this Customer_Info.Name Customer_Info.Value FirstName John LastNAme Doe Address 123 Some Street City New York State NY I think eveyone who tracks customers has a single customer table whith defined fields for FirstName, LastName, Address, City, etc etc So with your table you might have something like...(forgive my obvious idiocy when it comes to ornathology) measurement.name measurement.value Sex Male Age 2 Molt Yes Disease None Sounds like you are depending on data in one field to defind the data type in another. This would seem overly complicated to query. Since I assume each Event can collect data from multiple birds this is how I would handle it: Tables: Birds, Events and Birds_Events. Birds would contian fields for the stuff that is always true about the bird: BirdID, Breed, Gender, ApproxDOB Events would contain basic data on the Event: EventID, EventDate, EventTime, EventSite Birds_Events would link the two and include all the measured values: BirdID, EventID, Molt, Breeding Condition, Disease, Blood, Feathers. Each bird can be measured at multiple Events. Each Event can be used to measure multiple Birds. This allows you to define each field for its most appropriate data type and makes querying much easie in my opinion. If this isn't right than I have been doing it all wrong. Thanks for entertaining my rant, Tony V "LAF" wrote in message ... A concrete example may be useful since Mike Sherrill has identified a problem. The database in question involves bird banding. There are events based on site, date, and time; bird ID's; and associated junction table. For each event for each bird, a number of things are identified, measured, and sampled. These might be different domains within a particular bird-event. Things identified would be sex, age, breeding condition, molt, disease, ectoparasites. These things identified would include text and logical fields. Things measured could all be numeric. Things sampled, like blood and feathers, could be all text. My initial problem would be solved through 3 tables instead of one. These tables would have one-to-one relationships with each other through the EventID and BirdID combination primary key. Is this the best solution? "Mike Sherrill" wrote: On Wed, 9 Feb 2005 22:57:01 -0800, LAF wrote: If one has a normalized database, then different types of measurement data will not be in different fields, but rather in a structure with measurement.name and measurement.value as the field names. Really? What principle of normalization did you apply to come up with that? What if some of the measurement.values are text, some are numeric, and some are logical? In relational terms, does this translate to "What if the values come from different domains?" -- Mike Sherrill Information Management Systems |
#6
|
|||
|
|||
On Thu, 10 Feb 2005 05:49:04 -0800, LAF
wrote: There are events based on site, date, and time; bird ID's; and associated junction table. For each event for each bird, a number of things are identified, measured, and sampled. These might be different domains within a particular bird-event. I find it hard to believe that the banders talk of bird-events, but I'm not the expert. Are bird ID's just the band numbers? Things identified would be sex, age, breeding condition, molt, disease, ectoparasites. I'm pretty sure a single bird could be diagnosed with more than one disease and more than one kind of ectoparasite during a bird-event. And I suppose a diagnosis of disease could be done weeks later, based on blood work. I'm just guessing about molt, though, because I don't know what the banders say about molt. These things identified would include text and logical fields. Things measured could all be numeric. Things sampled, like blood and feathers, could be all text. My initial problem would be solved through 3 tables instead of one. These tables would have one-to-one relationships with each other through the EventID and BirdID combination primary key. Is this the best solution? I don't know. If you're suggesting tables like [Things_identified], [Things_measured], and [Things_sampled], I'd say not. Post some representative sample data. -- Mike Sherrill Information Management Systems |
#7
|
|||
|
|||
On Thu, 10 Feb 2005 11:52:44 -0600, "Tony Vrolyk" tvrolyk at mlhg dot
net wrote: Is there such a thing as over-normalization? No. There are a lot of ways to go wrong in database design, but "over-normalization" doesn't have a generally accepted meaning. [snip] Sounds like you are depending on data in one field to defind the data type in another. This would seem overly complicated to query. It would be hard. Integrity constraints might be impossible to declare. I'm more concerned with constraints. Birds would contian fields for the stuff that is always true about the bird: BirdID, Breed, Gender, ApproxDOB Wait . . . Events would contain basic data on the Event: EventID, EventDate, EventTime, EventSite In Access, you shouldn't separate date and time into two columns. Birds_Events would link the two and include all the measured values: BirdID, EventID, Molt, Breeding Condition, Disease, Blood, Feathers. Species, gender, and age are all "measured" values. You should probably assess them each time the bird is in hand. If you find a bird's sex has changed from "M" to "F", you know you have a data error. If you just overwrite "M" with "F", you still have a data error, but you'll probably never discover it. An additional problem is that valid values for age might be species-specific. I know that, in field observations, "fourth winter" is an identifiable plumage for some gulls, but not for, say, Carolina Chickadees. I don't know how the OP measures age, and I don't have any idea how to do it myself (other than by plumage). -- Mike Sherrill Information Management Systems |
#8
|
|||
|
|||
I was really questioning the table structure. If one were to take the
original post to it's extreme, no table would ever have more then two fields besides the record ID. Over-normalization seemed an appropriate sounding term for the situation even if it has not generally accepted meaning - In your points you mostly address issues like field types and data locatoin. Maybe Bird tracking is a difficult one to use as an example. Is there such a thing as over-normalization? hey, maybe I just coined a new term - okay maybe not. Sounds like you are depending on data in one field to defind the data type in another. This would seem overly complicated to query. It would be hard. Integrity constraints might be impossible to declare. I'm more concerned with constraints. Not sure what you mean here. Could you clarify? In Access, you shouldn't separate date and time into two columns. True but it was just an example. I was just using the posters example of date and time and I wasn't thinking about that rule at the time. Species, gender, and age are all "measured" values. You should probably assess them each time the bird is in hand. If you find a bird's sex has changed from "M" to "F", you know you have a data error. If you just overwrite "M" with "F", you still have a data error, but you'll probably never discover it. An additional problem is that valid values for age might be species-specific. I know that, in field observations, "fourth winter" is an identifiable plumage for some gulls, but not for, say, Carolina Chickadees. I don't know how the OP measures age, and I don't have any idea how to do it myself (other than by plumage). Again this may be true but it is specific to this application. I am guessing the original poster can make judgments about where the data is tracked I was just using examples to explain in general terms how I wouls structure the tables. Gender never does change, it may be mis-measured. The point was this: Reatlively static data in the Birds Table, Event specific data in the Events Table and measured data in the Birds_Events table. Instead of creating tables with numerous fields for different data the posted wanted two fields - one to name the value and another to store the value itself. This is what I was bringing into question. Again I am just posting for discussion and learning. I am no expert on the subject - just a self-taught access hack - if you call using these groups and a dozen Access support sites being self-taught Tony V |
#9
|
|||
|
|||
On Fri, 11 Feb 2005 09:06:10 -0600, "Tony Vrolyk" tvrolyk at mlhg dot
net wrote: [snippage] I was really questioning the table structure. If one were to take the original post to it's extreme, no table would ever have more then two fields besides the record ID. Over-normalization seemed an appropriate sounding term for the situation even if it has not generally accepted meaning - "Normalization" has a pretty precise meaning in database design. There's such a thing as 1NF, 2NF, and DKNF. There's no such thing as "over NF". The problem here is in hiding metadata--the name of an attribute--within data. It would be hard. Integrity constraints might be impossible to declare. I'm more concerned with constraints. Not sure what you mean here. Could you clarify? Valid values for sex are "male", "female", and "I don't know" (I guess). Let's say that valid values for age are simply "juvenile" and "adult". In order to keep the data right, you'd have to build a validation rule or a CHECK constraint to cover each case, along the lines of "If column1 is 'sex' then column2 is either 'male' or 'female', but if column1 is 'age' then column2 is either 'juvenile' or 'adult', but if column1 is 'ectoparasite' then column2 is ..." It might be impossible to express that as a validation rule (Access has a limit on the length) or as a CHECK constraint (don't know the limits, but I'm sure there are some). OTOH, if the one *column* is "sex", then you can enforce integrity constraints with a simple validation rule, CHECK constraint, or foreign key constraint. Ditto for the other columns. Species, gender, and age are all "measured" values. You should probably assess them each time the bird is in hand. If you find a bird's sex has changed from "M" to "F", you know you have a data error. If you just overwrite "M" with "F", you still have a data error, but you'll probably never discover it. An additional problem is that valid values for age might be species-specific. I know that, in field observations, "fourth winter" is an identifiable plumage for some gulls, but not for, say, Carolina Chickadees. I don't know how the OP measures age, and I don't have any idea how to do it myself (other than by plumage). Again this may be true but it is specific to this application. I am guessing the original poster can make judgments about where the data is tracked Well, we both read the original post. Are you really that confident in the OP's judgment? I'm not. (And that's an observation, not a criticism. Not many of us got this stuff right the first time. I know I certainly didn't.) Instead of creating tables with numerous fields for different data the posted wanted two fields - one to name the value and another to store the value itself. This is what I was bringing into question. And you were right to do that. It's *highly* questionable. -- Mike Sherrill Information Management Systems |
#10
|
|||
|
|||
Thank for all your explanations. The idea of using the validation rules or
CEHCK constraints had not occured to me. I hav eonly used them a few times. Also the whole subject of normalization had been a bit theoritical to me. I think I understand the basics and in msot cases my dbs have been normalized to a large degree but I am sure I have broken some rules in places. I wasn't aware of the specific normalization types 1NF, 2NF and so on. I did a search and found this article - http://www.devshed.com/c/a/MySQL/An-...rmalization/1/ - Assuming it is accuratly written than I am at least designing my dbs to satisfy 3NF which I was happy to find out. Thanks Tony "Mike Sherrill" wrote in message ... On Fri, 11 Feb 2005 09:06:10 -0600, "Tony Vrolyk" tvrolyk at mlhg dot net wrote: [snippage] I was really questioning the table structure. If one were to take the original post to it's extreme, no table would ever have more then two fields besides the record ID. Over-normalization seemed an appropriate sounding term for the situation even if it has not generally accepted meaning - "Normalization" has a pretty precise meaning in database design. There's such a thing as 1NF, 2NF, and DKNF. There's no such thing as "over NF". The problem here is in hiding metadata--the name of an attribute--within data. It would be hard. Integrity constraints might be impossible to declare. I'm more concerned with constraints. Not sure what you mean here. Could you clarify? Valid values for sex are "male", "female", and "I don't know" (I guess). Let's say that valid values for age are simply "juvenile" and "adult". In order to keep the data right, you'd have to build a validation rule or a CHECK constraint to cover each case, along the lines of "If column1 is 'sex' then column2 is either 'male' or 'female', but if column1 is 'age' then column2 is either 'juvenile' or 'adult', but if column1 is 'ectoparasite' then column2 is ..." It might be impossible to express that as a validation rule (Access has a limit on the length) or as a CHECK constraint (don't know the limits, but I'm sure there are some). OTOH, if the one *column* is "sex", then you can enforce integrity constraints with a simple validation rule, CHECK constraint, or foreign key constraint. Ditto for the other columns. Species, gender, and age are all "measured" values. You should probably assess them each time the bird is in hand. If you find a bird's sex has changed from "M" to "F", you know you have a data error. If you just overwrite "M" with "F", you still have a data error, but you'll probably never discover it. An additional problem is that valid values for age might be species-specific. I know that, in field observations, "fourth winter" is an identifiable plumage for some gulls, but not for, say, Carolina Chickadees. I don't know how the OP measures age, and I don't have any idea how to do it myself (other than by plumage). Again this may be true but it is specific to this application. I am guessing the original poster can make judgments about where the data is tracked Well, we both read the original post. Are you really that confident in the OP's judgment? I'm not. (And that's an observation, not a criticism. Not many of us got this stuff right the first time. I know I certainly didn't.) Instead of creating tables with numerous fields for different data the posted wanted two fields - one to name the value and another to store the value itself. This is what I was bringing into question. And you were right to do that. It's *highly* questionable. -- Mike Sherrill Information Management Systems |
|
Thread Tools | |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
multiple docs, one data source | kp | Mailmerge | 12 | January 31st, 2005 04:41 PM |
Format on data to import to Access tables? (I need your advice) | Niklas Östergren | General Discussion | 5 | December 13th, 2004 02:54 PM |
Is this possible with Excel Chart? | q582gmzhi | Charts and Charting | 1 | September 8th, 2004 03:33 AM |
Adding staff photographs to my database | KK | New Users | 2 | September 3rd, 2004 07:41 AM |
How to create graphs in a monthly report where the base data can change | John Clarke | Charts and Charting | 3 | June 25th, 2004 02:22 AM |