If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
Thread Tools | Display Modes |
#21
|
|||
|
|||
Thanks for taking the time to reply. In particular, I now understand that
making a field (or combination of fields) the PK creates what you have been referring to as a clustered index. I am going to combine things from several of your replies here. If I use name and address as the PK, what becomes of relationships if the name and/or address changes? I would want my PK to be something that will not change. Also, having two people with the same name at the same address is not at all unusual. For women in particular the designation Jr., III, etc. is rarely used, so the names are literally indistinguishable without further clarification. Birth date is one way of distinguishing the two, but it may not be appropriate to gather such information. It may be that a comments field or something of the sort is the only way of telling the two apart. A mother and daughter under the same roof, whether at a family business or at home, is far from an unlikely occurence, and one against which I must guard. I use queries (SQL) for ordering. In a Contacts table or some such listing of names I generally use name, for instance. I base forms and reports on the query, not on the table. How does this enter into ordering of data as you have described it? wrote in message oups.com... BruceM wrote: Suppose I wanted to create a clustered index in an Access table. How would I do that? That's it! You've hit on the golden question. You create a clustered index by using the PRIMARY KEY declaration. There is no other way to create a clustered index in Access/Jet. If you want a non-nullable unique CONSTRAINT, you use NOT NULL UNIQUE. If you want a non-nullable unique clustered INDEX, you use PRIMARY KEY. CONSTRAINTs are all about data integrity (logical). INDEXes are all about performance (physical). The term does not appear in Access Help, and discussions of the subject tend to assume the reader knows what a clustered index is and how to create one. There is info out there but it is easy to miss. One view is that there is no 'choice' for a table's clustered index, it's either PK order (PK exists) or data/time order (no PK exists). In SQL Server, for example, you can explicitly specify NONCLUSTERED. In Access/Jet, CLUSTERED is implicit, default and compulsory i.e. comes as standard with PK every time even if you don't want it. The point is, for an autonumber you *don't* want it. Here's a couple of relevant articles you may have missed: ACC2000: Defragment and Compact Database to Improve Performance http://support.microsoft.com/default...b;en-us;209769 New Features in Microsoft Jet Version 3.0 http://support.microsoft.com/default...b;en-us;137039 Even if one is created, what benefits will I notice? What are the benefits? Improved performance, especially with queries that can take advantage of physically contiguous rows e.g. GROUP BY or BETWEEN constructs. That is assuming you've chosen the PK appropriately. Conversely, if you've chosen unwisely, e.g. you've made you autonumber column the PK, you will take a performance hit. Will you notice? There are too many factors to generalize; you must test. With a table of 100 rows, I doubt you would be able to *measure* any performance difference Regarding John Doe, it may well be a name used by more than one person. How does this fit in with clustered indexes? I may need duplication in that field. I suppose an autonumber could help you out here i.e. you only need (last_name, first_name) for you clustered index but you need to satisfy the UNIQUE attribute that PRIMARY KEY requires. Note the ordinal position of the columns in the PRIMARY KEY declaration are significant CREATE TABLE Blah ( first_name VARCHAR(35) DEFAULT '{{NK}}' NOT NULL, last_name VARCHAR(35) DEFAULT '{{NK}}' NOT NULL, ... (other columns) ... uniquifier IDENTITY (1,1) NOT NULL, PRIMARY KEY (last_name, first_name, uniquifier) ); ... However, the autonumber is usually not required because there should be a natural key i.e. attribute(s) which uniquely identify an entity. So use the existing key at the end of the PK declaration. Using an autonumber in place of (rather than in addition to) a natural key will lead to pain sooner or later. |
#22
|
|||
|
|||
Thank you for the reply. I think the person (or people) who argue for a
"natural" PK have made the point that the PK does not need to be used in the relationships. In other words, a unique field can be on the one side of a one-to-many, and the PK can be a separate entity. The point of a PK, in that way of thinking, is to guard against duplication (e.g. entering the same person twice) in a way that would not be possible if an individual field is set to Indexed (No Duplicates). I would rather use a form's Before Update event for such checking. In another posting I referred to the real-life situation of a mother and daughter with the same name in a family-run business. Even if it is uncommon, it is certainly at least possible. I would want the option of allowing the apparent duplication in at least some cases. I would need to reinvent the PK if I was using a multi-field PK for that purpose and discovered an apparent duplication. I am inclined to use autonumber or some other fixed value (e.g. Invoice Number) as the PK. I use code to contrive an incrementing value in the case of the aforementioned Invoice Number and other situations in which the number is part of what the user sees. I don't see a need for two guaranteed unique identifiers in one record. Even in the case of Employee ID numbers I lean toward a meaningless number, because if the company changes hands they may switch to a different numbering system for Employee IDs. However, I am interested in hearing other views on the subject. In the course of the dialogue I am learning some new things, even if I don't decide to change to multi-field PKs as a matter of course. "Amy Blankenship" wrote in message ... "BruceM" wrote in message ... What would you do to guarantee uniqueness in a Contacts table or some such involving names and addresses, in light of the fact that names and addresses are subject to change? SQL underlies Access queries. The design grid is a sort of SQL GUI (as I understand it). So I think you're saying that displayed order (e.g. sorted by last name) is not what you are talking about when you talk about physical order. If I understand, you are saying that the structure of the index determines the order on the disk, not the order in the table when it is viewed directly. I have a database that includes an Employees table. The primary key is the EmployeeID. With it to do over again I might have used something else, because it is at least possible that they will one day change the format of EmployeeID, which is just a sequential 4-digit number. Which is why most people use completely meaningless Autonumber fields as primary keys. Because you can't change the value, format, or anything else of a field that is currently being used as a primary key. Also the autonumber field will usually have a smaller size (on disk, no less) than a more meaningful key. Therefore, if you are using it in a relationship or relationships, the other tables will have to store less information when they are referring to that primary key of this table. So, for instance, if you had an employeeID that was an autonumber, all of the other tables that refer to your EmployeeID would have saved 11 bytes every time they had a foreign key to your employee table, and you could have stored what is now your employeeID primary key just once, for a total of just the one 15 byte storage of the employeeID string. This is the whole point of normalization. Anything that is actually used as data should just be stored once, with the smallest possible reference to it from other places that need to relate to the base data. More than likely you'll eventually have to move to an Autonumber primary key there for the above listed reasons. Most of us encounter this situation at least once, and from that point forward we use Autonumber primary keys, since fixing the problem once it has developed is much more of a pain than preventing it. Hope this clarifies; -Amy |
#23
|
|||
|
|||
Well, the battle between whether to use Natural Keys or Surrogate Keys has
waged for years and we will not resolve it here. But please do not tell me and others what I do not understand just because our opinions differ. A Primary Key has two properites: 1) it must be unique, and 2) it must not be NULL. In any table there may be several fields (or combinations) of fields which could possibly fulfill these conditions. These are called Candidate Keys. The job of the database developer is to decide between these CKs to see which is the absolute best. In the case of the Contacts table (mentioned in a separate post); last_name, first_name and postal_address could provide a key, but it is not a really good choice. While this may provide uniqueness (maybe), there is a real possibility that a person does not have a postal address (no part of a compound primary key can be NULL). This is one of the difficulties of finding a good natural keys to make up the primary key. Now, there are actually two purposes for uniqueness: 1) Real-world uniqueness to make sure an entity appears only once in a table, and 2) Relational uniqueness for relating records between tables. As you noted, autonumber fields are lousy in terms of real-world uniqueness. However, they are perfect for creating uniqueness for relating records. In this sense, the autonumber field is absolutely unique. It can appear ONLY once and can never be repeated. Further, as a long integer, it is VERY efficient for relating records compared to a compound primary key composed of text fields. So when I recommend autonumbers for primary keys, I also recommend creating a separate UNIQUE INDEX that is composed of fields from a Natural Key. This fulfils the function of real-world uniqueness and is more flexible, because you CAN allow Nulls in a unique index if you want. In this way, both functions of a Primary Key are fulfilled to their maximum efficiency. You may not agree with this. I know many respected developers who don't. Opinions differ, but this does not give you the right to disparage those who disagree with you. -- --Roger Carlson Access Database Samples: www.rogersaccesslibrary.com Want answers to your Access questions in your Email? Free subscription: http://peach.ease.lsoft.com/scripts/...UBED1=ACCESS-L wrote in message oups.com... BruceM wrote: That you disagree with somebody does not make that person wrong. That someone has a track record of providing a wide range of assistance does not make that person right ;-) In this case, I disagree with the person because they do not understand what PRIMARY KEY means. Autonumber does not make a good uniquifier, let alone a good PK (different concepts). Remember this list? the advantages of using autonumber a 1. Convenience, because it's provided by the 'system'; 2. It's an 'efficient' data type; 3. Erm... 4. That's it! |
#24
|
|||
|
|||
BruceM wrote: If I use name and address as the PK, what becomes of relationships if the name and/or address changes? If your table only has three attributes (e.g. first_name, last_name, postal_address) and a composite/concatenation does not provide a key then you don't have a key. You must expand your data model e.g. collect more information, create an artificial key and expose it (e.g. assign it to your contact entities). Adding a 'notes' column into the mix is probably not going to lead to an acceptable solution to the problem. If you are going down the road of creating your own encoding system, you should be able to do better than the autonumber algorithm (i.e. MAX+1 or MAX+10000 if something could have gone 'wrong'), as you have done with your CHECK(EmployeeID LIKE '[1-9][0-9][0-9][0-9]') constraint (it has 'internal' validation i.e. a value other than four numeric characters is obviously incorrect). ALTER TABLE Prospects ADD CONSRAINT fk_prospects__contacts FOREIGN KEY (contact_first_name, contact_last_name, contact_postal_address) REFERENCES Contacts (first_name, last_name, postal_address) ON DELETE CASCADE ON UPDATE CASCADE; As I alluded to before, you may have problems with the postal address which may not be captured in a suitable way to be used for in a key. I would want my PK to be something that will not change. Permanency is a boon but one doesn't always have that luxury. |
#25
|
|||
|
|||
I think I understand that your PK and your field(s) on the one side of
one-to-many may not be the same. If so, and if the PK is not part of a relationship, would the purpose of your PK be to guard against duplication? If so, do you regard that as a more efficient use of recources than data validation code in the form's Before Update event? Also, if by "expose it" (in reference to an artificial key) you mean show it to the user, why would that be necessary? wrote in message oups.com... BruceM wrote: If I use name and address as the PK, what becomes of relationships if the name and/or address changes? If your table only has three attributes (e.g. first_name, last_name, postal_address) and a composite/concatenation does not provide a key then you don't have a key. You must expand your data model e.g. collect more information, create an artificial key and expose it (e.g. assign it to your contact entities). Adding a 'notes' column into the mix is probably not going to lead to an acceptable solution to the problem. If you are going down the road of creating your own encoding system, you should be able to do better than the autonumber algorithm (i.e. MAX+1 or MAX+10000 if something could have gone 'wrong'), as you have done with your CHECK(EmployeeID LIKE '[1-9][0-9][0-9][0-9]') constraint (it has 'internal' validation i.e. a value other than four numeric characters is obviously incorrect). ALTER TABLE Prospects ADD CONSRAINT fk_prospects__contacts FOREIGN KEY (contact_first_name, contact_last_name, contact_postal_address) REFERENCES Contacts (first_name, last_name, postal_address) ON DELETE CASCADE ON UPDATE CASCADE; As I alluded to before, you may have problems with the postal address which may not be captured in a suitable way to be used for in a key. I would want my PK to be something that will not change. Permanency is a boon but one doesn't always have that luxury. |
#26
|
|||
|
|||
Roger Carlson wrote: the battle between whether to use Natural Keys or Surrogate Keys has waged for years Again, you've misunderstood. That's a different debate which I don't want to do. A Primary Key has two properites: 1) it must be unique, and 2) it must not be NULL. Again, you've misunderstood. I can add to your list of properties: 3) It has special meaning in the SQL implementation. In the case of Access/Jet, this special meaning is clustered index. Are you aware of this special meaning? If you are, why do you think an incrementing integer with no meaning in the data model makes for a good clustered index? Your two properties can be satisfied by constraining with NOT NULL UNIQUE. There's only one way of creating a clustered index in Access/Jet and that's using PRIMARY KEY. As you noted, autonumber fields are lousy in terms of real-world uniqueness. However, they are perfect for creating uniqueness for relating records. Perfect? As I said up thread, the only good things about autonumber is that it is convenient (because it provided by the 'system') and it is efficient (data type = INTEGER). Do you really think there is no better key generation algorithm than autonumber's? If autonumbers were perfect, we wouldn't see daily posts of the like, 'Why do I have gaps in my ID numbers?' I can see how 'quick and dirty' may equate to 'good' given the right circumstances but 'perfect' may be an overstatement. when I recommend autonumbers for primary keys, I also recommend creating a separate UNIQUE INDEX that is composed of fields from a Natural Key. This fulfils the function of real-world uniqueness and is more flexible, because you CAN allow Nulls in a unique index if you want. Why not use a UNIQUE on the autonumber column and use your clustered index (PRIMARY KEY) for a more worthy candidate? You may not agree with this. I know many respected developers who don't. Opinions differ, but this does not give you the right to disparage those who disagree with you. I am saying I believe you have misunderstood what PRIMARY KEY means. All you have said indicates you are unaware of the special clustered index meaning that PRIMARY KEY has in Access/Jet. I am open minded to there being another explanation: why do you think an autonumber makes a good clustered index? Apologies if you feel disparaged. That was not my intention. We all make misstatements from time to time. |
#27
|
|||
|
|||
BruceM wrote: I think I understand that your PK and your field(s) on the one side of one-to-many may not be the same. If so, and if the PK is not part of a relationship, would the purpose of your PK be to guard against duplication? No, it would be the 'field(s) on the one side of one-to-many' that would guard against duplication. The purpose of the PK would be to avoid a performance-degrading clustered index and, if possible, to provide for a performance-enhancing clustered index. If so, do you regard that as a more efficient use of recources than data validation code in the form's Before Update event? I do not rely on front end applications to enforce data integrity and would strongly discourage such development. If I have a business rule which Jet cannot enforce via constraints then I would recommend porting to MSDE. For me, 'efficiency' doesn't come anything close to data integrity in terms of importance. Also, if by "expose it" (in reference to an artificial key) you mean show it to the user, why would that be necessary? You know Miss N E Person ID=1 of Main Street and Miss N E Person ID=2 of Main Street are different people, but when you speak to one of them on the phone, how do *they* tell you which one they are? |
#28
|
|||
|
|||
wrote in message
oups.com... Again, you've misunderstood. I can add to your list of properties: 3) It has special meaning in the SQL implementation. In the case of Access/Jet, this special meaning is clustered index. Are you aware of this special meaning? If you are, why do you think an incrementing integer with no meaning in the data model makes for a good clustered index? I don't. I don't recall talking about clustered indexes at all. I also believe that a clustered index is a requirement for a primary key. Certainly, EF Codd said nothing about it. This is an implementation issue, not a design issue. Perfect? As I said up thread, the only good things about autonumber is that it is convenient (because it provided by the 'system') and it is efficient (data type = INTEGER). Do you really think there is no better key generation algorithm than autonumber's? If autonumbers were perfect, we wouldn't see daily posts of the like, 'Why do I have gaps in my ID numbers?' I can see how 'quick and dirty' may equate to 'good' given the right circumstances but 'perfect' may be an overstatement. Yes, perfect. (BTW, autonumbers are Long Integer, not Integer). In terms of Relational uniqueness, why do you care if gaps develop? This is internal to the database and the user should never see them. People who ask those questions don't understand the purpose of an autonumber primary key. It's got nothing to do with 'quick and dirty'. And in fact, convenience is only a small part of what makes autonumber PKs perfect. As you will see below, they also minimize redundant data in the database, make query design and application design quicker and easier. And if you don't think that's important, then you are not an application developer. Why not use a UNIQUE on the autonumber column and use your clustered index (PRIMARY KEY) for a more worthy candidate? Because this is backwards. There is no reason to put a Unique constraint on an autonumber. It is unique by definition. One major problem with compound primary keys is that it increases the amount of Redundant data. Since Normalization is the process of removing redundant data, by Codd's definition, the smallest unique identifier is the best. An autonumber will provide the smallest amount of redundancy. In terms of efficiency, not only is the autonumber more efficient for executing Joins, it is much easier to write queries with autonumber primary/foreign keys than compound text fields. Consider the Contact table. Suppose each Contact could have multiple phone numbers. In this case, proper design dictiates you would need a separate table for the phone numbers. The tables would look like this: Contacts PhNumbers ========= =========== LastName(cpk)-----LastName(cpk fk) FirstName(cpk)----FirstName(cpk fk) Address(cpk)------Address(cpk fk) (more fields) Phone (cpk) Creating a query would look like this: SELECT Contacts.*, PHNumbers.Phone FROM Contacts INNER JOIN PHNumbers ON (Contacts.Address = PHNumbers.Address) AND (Contacts.FirstName = PHNumbers.FirstName) AND (Contacts.LastName = PHNumbers.LastName); Whereas with an autonumber primary key, you reduce the size of your database by storing smaller primary/foreign key values, it is also easier to write queries: Contacts PhNumbers ========= =========== ContactID(pk)--| PhoneID (pk) LastName --ContactID( fk) FirstName Phone Address (more fields) Creating a query would look like this: SELECT Contacts.*, PHNumbers.Phone FROM Contacts INNER JOIN PHNumbers ON (Contacts.ContactID = PHNumbers.ContactID); This would be even worse if Contacts participated in a Many-To-Many relationship. I am saying I believe you have misunderstood what PRIMARY KEY means. All you have said indicates you are unaware of the special clustered index meaning that PRIMARY KEY has in Access/Jet. I am open minded to there being another explanation: why do you think an autonumber makes a good clustered index? I understand quite well what a Primary Key means. By database theory it has nothing to do with custered indexes and I don't know why you insist upon it. While a clustered index may be a good thing, it is by no means required by database theory or for a good database design. Especially when it complicates things like creating queries, forms, reports and the like. Database design which complicates application design is not a good thing either. Apologies if you feel disparaged. That was not my intention. We all make misstatements from time to time. I have made no misstatements. Certainly I disagree with you, but that doesn't by definition make me wrong. |
#29
|
|||
|
|||
CORRECTION:
I don't. I don't recall talking about clustered indexes at all. I also DO NOT believe that a clustered index is a requirement for a primary key. Certainly, EF Codd said nothing about it. This is an implementation issue, not a design issue. -- --Roger Carlson Access Database Samples: www.rogersaccesslibrary.com Want answers to your Access questions in your Email? Free subscription: http://peach.ease.lsoft.com/scripts/...UBED1=ACCESS-L |
#30
|
|||
|
|||
"I do not rely on front end applications to enforce data integrity and would
strongly discourage such development. If I have a business rule which Jet cannot enforce via constraints then I would recommend porting to MSDE." I really don't know what you mean by "porting to MSDE". I found out what MSDE is, but for my purposes I will find a way to validate the data, whether at the table level or in the front end. You can strongly discourage using the front end for such purposes, but have not provided a reason why. If there is a Spouse First Name field it may be required if the person is married, but certainly not otherwise. My choice is the front end for such validation rather than another piece of software. Before Update works for my purposes. I will continue to use it. Data integrity is not compromised. The database works smoothly and quickly. If it is "inefficient" it is so on a level that is not important to me. "Miss N E Person ID=1 of Main Street and Miss N E Person ID=2 of Main Street are different people". Then I will figure out a way of telling them apart that is useful to the person who needs to call or contact one or the other. Knowing that they are different records in the database (because they have different ID numbers) is not helpful in telling them apart. wrote in message oups.com... BruceM wrote: I think I understand that your PK and your field(s) on the one side of one-to-many may not be the same. If so, and if the PK is not part of a relationship, would the purpose of your PK be to guard against duplication? No, it would be the 'field(s) on the one side of one-to-many' that would guard against duplication. The purpose of the PK would be to avoid a performance-degrading clustered index and, if possible, to provide for a performance-enhancing clustered index. If so, do you regard that as a more efficient use of recources than data validation code in the form's Before Update event? I do not rely on front end applications to enforce data integrity and would strongly discourage such development. If I have a business rule which Jet cannot enforce via constraints then I would recommend porting to MSDE. For me, 'efficiency' doesn't come anything close to data integrity in terms of importance. Also, if by "expose it" (in reference to an artificial key) you mean show it to the user, why would that be necessary? You know Miss N E Person ID=1 of Main Street and Miss N E Person ID=2 of Main Street are different people, but when you speak to one of them on the phone, how do *they* tell you which one they are? |
Thread Tools | |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Sorting a table by concatenating several fields in the same table | salsaguy | Running & Setting Up Queries | 3 | March 6th, 2005 08:41 PM |
Sorting a table by concatenating several fields in the same table | salsaguy | Running & Setting Up Queries | 0 | March 6th, 2005 01:33 AM |
Additional fields for form based parameter query/null fields | geeksdoitbetter | Running & Setting Up Queries | 2 | January 7th, 2005 10:05 PM |
Selecting Fields for Update | Steve Daigler | Page Layout | 4 | October 15th, 2004 02:13 PM |
My tables lost their AutoNumber fields | Bill Nicholson | Database Design | 2 | July 2nd, 2004 02:20 AM |