Autonumber Fields

#21 September 26th, 2005, 12:32 PM

Thanks for taking the time to reply. In particular, I now understand that
making a field (or combination of fields) the PK creates what you have been
referring to as a clustered index.
I am going to combine things from several of your replies here. If I use
name and address as the PK, what becomes of relationships if the name and/or
address changes? I would want my PK to be something that will not change.
Also, having two people with the same name at the same address is not at all
unusual. For women in particular the designation Jr., III, etc. is rarely
used, so the names are literally indistinguishable without further
clarification. Birth date is one way of distinguishing the two, but it may
not be appropriate to gather such information. It may be that a comments
field or something of the sort is the only way of telling the two apart. A
mother and daughter under the same roof, whether at a family business or at
home, is far from an unlikely occurence, and one against which I must guard.
I use queries (SQL) for ordering. In a Contacts table or some such listing
of names I generally use name, for instance. I base forms and reports on
the query, not on the table. How does this enter into ordering of data as
you have described it?

wrote in message
oups.com...

BruceM wrote:
Suppose I wanted to create a clustered index in an Access table. How
would
I do that?

That's it! You've hit on the golden question. You create a clustered
index by using the PRIMARY KEY declaration. There is no other way to
create a clustered index in Access/Jet.

If you want a non-nullable unique CONSTRAINT, you use NOT NULL UNIQUE.
If you want a non-nullable unique clustered INDEX, you use PRIMARY KEY.
CONSTRAINTs are all about data integrity (logical). INDEXes are all
about performance (physical).

The term does not appear in Access Help, and discussions of the
subject tend to assume the reader knows what a clustered index is and how
to
create one.

There is info out there but it is easy to miss. One view is that there
is no 'choice' for a table's clustered index, it's either PK order (PK
exists) or data/time order (no PK exists). In SQL Server, for example,
you can explicitly specify NONCLUSTERED. In Access/Jet, CLUSTERED is
implicit, default and compulsory i.e. comes as standard with PK every
time even if you don't want it. The point is, for an autonumber you
*don't* want it.

Here's a couple of relevant articles you may have missed:

ACC2000: Defragment and Compact Database to Improve Performance
http://support.microsoft.com/default...b;en-us;209769

New Features in Microsoft Jet Version 3.0
http://support.microsoft.com/default...b;en-us;137039

Even if one is created, what benefits will I notice?

What are the benefits? Improved performance, especially with queries
that can take advantage of physically contiguous rows e.g. GROUP BY or
BETWEEN constructs. That is assuming you've chosen the PK
appropriately. Conversely, if you've chosen unwisely, e.g. you've made
you autonumber column the PK, you will take a performance hit. Will you
notice? There are too many factors to generalize; you must test. With a
table of 100 rows, I doubt you would be able to *measure* any
performance difference

Regarding John Doe, it may well be a name used by more than one person.
How
does this fit in with clustered indexes? I may need duplication in that
field.

I suppose an autonumber could help you out here i.e. you only need
(last_name, first_name) for you clustered index but you need to satisfy
the UNIQUE attribute that PRIMARY KEY requires. Note the ordinal
position of the columns in the PRIMARY KEY declaration are significant

CREATE TABLE Blah (
first_name VARCHAR(35) DEFAULT '{{NK}}' NOT NULL,
last_name VARCHAR(35) DEFAULT '{{NK}}' NOT NULL,
... (other columns) ...
uniquifier IDENTITY (1,1) NOT NULL,
PRIMARY KEY (last_name, first_name, uniquifier)
);

... However, the autonumber is usually not required because there
should be a natural key i.e. attribute(s) which uniquely identify an
entity. So use the existing key at the end of the PK declaration. Using
an autonumber in place of (rather than in addition to) a natural key
will lead to pain sooner or later.

#22 September 26th, 2005, 02:27 PM

Thank you for the reply. I think the person (or people) who argue for a
"natural" PK have made the point that the PK does not need to be used in the
relationships. In other words, a unique field can be on the one side of a
one-to-many, and the PK can be a separate entity. The point of a PK, in
that way of thinking, is to guard against duplication (e.g. entering the
same person twice) in a way that would not be possible if an individual
field is set to Indexed (No Duplicates).
I would rather use a form's Before Update event for such checking. In
another posting I referred to the real-life situation of a mother and
daughter with the same name in a family-run business. Even if it is
uncommon, it is certainly at least possible. I would want the option of
allowing the apparent duplication in at least some cases. I would need to
reinvent the PK if I was using a multi-field PK for that purpose and
discovered an apparent duplication.
I am inclined to use autonumber or some other fixed value (e.g. Invoice
Number) as the PK. I use code to contrive an incrementing value in the case
of the aforementioned Invoice Number and other situations in which the
number is part of what the user sees. I don't see a need for two guaranteed
unique identifiers in one record. Even in the case of Employee ID numbers I
lean toward a meaningless number, because if the company changes hands they
may switch to a different numbering system for Employee IDs. However, I am
interested in hearing other views on the subject. In the course of the
dialogue I am learning some new things, even if I don't decide to change to
multi-field PKs as a matter of course.

"Amy Blankenship" wrote in message
...

"BruceM" wrote in message
...
What would you do to guarantee uniqueness in a Contacts table or some
such involving names and addresses, in light of the fact that names and
addresses are subject to change?

SQL underlies Access queries. The design grid is a sort of SQL GUI (as I
understand it). So I think you're saying that displayed order (e.g.
sorted by last name) is not what you are talking about when you talk
about physical order. If I understand, you are saying that the structure
of the index determines the order on the disk, not the order in the table
when it is viewed directly.

I have a database that includes an Employees table. The primary key is
the EmployeeID. With it to do over again I might have used something
else, because it is at least possible that they will one day change the
format of EmployeeID, which is just a sequential 4-digit number.

Which is why most people use completely meaningless Autonumber fields as
primary keys. Because you can't change the value, format, or anything
else of a field that is currently being used as a primary key. Also the
autonumber field will usually have a smaller size (on disk, no less) than
a more meaningful key. Therefore, if you are using it in a relationship
or relationships, the other tables will have to store less information
when they are referring to that primary key of this table.

So, for instance, if you had an employeeID that was an autonumber, all of
the other tables that refer to your EmployeeID would have saved 11 bytes
every time they had a foreign key to your employee table, and you could
have stored what is now your employeeID primary key just once, for a total
of just the one 15 byte storage of the employeeID string. This is the
whole point of normalization. Anything that is actually used as data
should just be stored once, with the smallest possible reference to it
from other places that need to relate to the base data.

More than likely you'll eventually have to move to an Autonumber primary
key there for the above listed reasons. Most of us encounter this
situation at least once, and from that point forward we use Autonumber
primary keys, since fixing the problem once it has developed is much more
of a pain than preventing it.

Hope this clarifies;

-Amy

#23 September 26th, 2005, 02:45 PM

Well, the battle between whether to use Natural Keys or Surrogate Keys has
waged for years and we will not resolve it here. But please do not tell me
and others what I do not understand just because our opinions differ.

A Primary Key has two properites: 1) it must be unique, and 2) it must not
be NULL. In any table there may be several fields (or combinations) of
fields which could possibly fulfill these conditions. These are called
Candidate Keys. The job of the database developer is to decide between
these CKs to see which is the absolute best. In the case of the Contacts
table (mentioned in a separate post); last_name, first_name and
postal_address could provide a key, but it is not a really good choice.
While this may provide uniqueness (maybe), there is a real possibility that
a person does not have a postal address (no part of a compound primary key
can be NULL). This is one of the difficulties of finding a good natural
keys to make up the primary key.

Now, there are actually two purposes for uniqueness: 1) Real-world
uniqueness to make sure an entity appears only once in a table, and 2)
Relational uniqueness for relating records between tables. As you noted,
autonumber fields are lousy in terms of real-world uniqueness. However,
they are perfect for creating uniqueness for relating records. In this
sense, the autonumber field is absolutely unique. It can appear ONLY once
and can never be repeated. Further, as a long integer, it is VERY efficient
for relating records compared to a compound primary key composed of text
fields.

So when I recommend autonumbers for primary keys, I also recommend creating
a separate UNIQUE INDEX that is composed of fields from a Natural Key. This
fulfils the function of real-world uniqueness and is more flexible, because
you CAN allow Nulls in a unique index if you want. In this way, both
functions of a Primary Key are fulfilled to their maximum efficiency.

You may not agree with this. I know many respected developers who don't.
Opinions differ, but this does not give you the right to disparage those who
disagree with you.

--
--Roger Carlson
Access Database Samples: www.rogersaccesslibrary.com
Want answers to your Access questions in your Email?
Free subscription:
http://peach.ease.lsoft.com/scripts/...UBED1=ACCESS-L

wrote in message
oups.com...

BruceM wrote:
That you disagree with somebody does not make that person wrong.

That someone has a track record of providing a wide range of assistance
does not make that person right ;-)

In this case, I disagree with the person because they do not understand
what PRIMARY KEY means. Autonumber does not make a good uniquifier, let
alone a good PK (different concepts). Remember this list? the
advantages of using autonumber a

1. Convenience, because it's provided by the 'system';
2. It's an 'efficient' data type;
3. Erm...
4. That's it!

#24 September 26th, 2005, 02:59 PM

BruceM wrote:
If I use
name and address as the PK, what becomes of relationships if the name and/or
address changes?

If your table only has three attributes (e.g. first_name, last_name,
postal_address) and a composite/concatenation does not provide a key
then you don't have a key. You must expand your data model e.g. collect
more information, create an artificial key and expose it (e.g. assign
it to your contact entities). Adding a 'notes' column into the mix is
probably not going to lead to an acceptable solution to the problem.

If you are going down the road of creating your own encoding system,
you should be able to do better than the autonumber algorithm (i.e.
MAX+1 or MAX+10000 if something could have gone 'wrong'), as you have
done with your

CHECK(EmployeeID LIKE '[1-9][0-9][0-9][0-9]')

constraint (it has 'internal' validation i.e. a value other than four
numeric characters is obviously incorrect).

ALTER TABLE Prospects
ADD CONSRAINT fk_prospects__contacts FOREIGN KEY
(contact_first_name, contact_last_name, contact_postal_address)
REFERENCES Contacts
(first_name, last_name, postal_address)
ON DELETE CASCADE
ON UPDATE CASCADE;

As I alluded to before, you may have problems with the postal address
which may not be captured in a suitable way to be used for in a key.

I would want my PK to be something that will not change.

Permanency is a boon but one doesn't always have that luxury.

#25 September 26th, 2005, 03:25 PM

I think I understand that your PK and your field(s) on the one side of
one-to-many may not be the same. If so, and if the PK is not part of a
relationship, would the purpose of your PK be to guard against duplication?
If so, do you regard that as a more efficient use of recources than data
validation code in the form's Before Update event?
Also, if by "expose it" (in reference to an artificial key) you mean show it
to the user, why would that be necessary?

wrote in message
oups.com...

BruceM wrote:
If I use
name and address as the PK, what becomes of relationships if the name
and/or
address changes?

If your table only has three attributes (e.g. first_name, last_name,
postal_address) and a composite/concatenation does not provide a key
then you don't have a key. You must expand your data model e.g. collect
more information, create an artificial key and expose it (e.g. assign
it to your contact entities). Adding a 'notes' column into the mix is
probably not going to lead to an acceptable solution to the problem.

If you are going down the road of creating your own encoding system,
you should be able to do better than the autonumber algorithm (i.e.
MAX+1 or MAX+10000 if something could have gone 'wrong'), as you have
done with your

CHECK(EmployeeID LIKE '[1-9][0-9][0-9][0-9]')

constraint (it has 'internal' validation i.e. a value other than four
numeric characters is obviously incorrect).

ALTER TABLE Prospects
ADD CONSRAINT fk_prospects__contacts FOREIGN KEY
(contact_first_name, contact_last_name, contact_postal_address)
REFERENCES Contacts
(first_name, last_name, postal_address)
ON DELETE CASCADE
ON UPDATE CASCADE;

As I alluded to before, you may have problems with the postal address
which may not be captured in a suitable way to be used for in a key.

I would want my PK to be something that will not change.

Permanency is a boon but one doesn't always have that luxury.

#26 September 26th, 2005, 03:59 PM

Roger Carlson wrote:
the battle between whether to use Natural Keys or Surrogate Keys has
waged for years

Again, you've misunderstood. That's a different debate which I don't
want to do.

A Primary Key has two properites: 1) it must be unique,
and 2) it must not be NULL.

Again, you've misunderstood. I can add to your list of properties: 3)
It has special meaning in the SQL implementation. In the case of
Access/Jet, this special meaning is clustered index. Are you aware of
this special meaning? If you are, why do you think an incrementing
integer with no meaning in the data model makes for a good clustered
index?

Your two properties can be satisfied by constraining with NOT NULL
UNIQUE. There's only one way of creating a clustered index in
Access/Jet and that's using PRIMARY KEY.

As you noted,
autonumber fields are lousy in terms of real-world uniqueness.
However, they are perfect for creating uniqueness for relating records.

Perfect? As I said up thread, the only good things about autonumber is
that it is convenient (because it provided by the 'system') and it is
efficient (data type = INTEGER). Do you really think there is no better
key generation algorithm than autonumber's? If autonumbers were
perfect, we wouldn't see daily posts of the like, 'Why do I have gaps
in my ID numbers?' I can see how 'quick and dirty' may equate to 'good'
given the right circumstances but 'perfect' may be an overstatement.

when I recommend autonumbers for primary keys, I also recommend
creating a separate UNIQUE INDEX that is composed of fields from
a Natural Key. This fulfils the function of real-world uniqueness and
is more flexible, because you CAN allow Nulls in a unique index if
you want.

Why not use a UNIQUE on the autonumber column and use your clustered
index (PRIMARY KEY) for a more worthy candidate?

You may not agree with this. I know many respected developers who don't.
Opinions differ, but this does not give you the right to disparage those who
disagree with you.

I am saying I believe you have misunderstood what PRIMARY KEY means.
All you have said indicates you are unaware of the special clustered
index meaning that PRIMARY KEY has in Access/Jet. I am open minded to
there being another explanation: why do you think an autonumber makes a
good clustered index?

Apologies if you feel disparaged. That was not my intention. We all
make misstatements from time to time.

#27 September 26th, 2005, 04:29 PM

BruceM wrote:
I think I understand that your PK and your field(s) on the one side of
one-to-many may not be the same. If so, and if the PK is not part of a
relationship, would the purpose of your PK be to guard against duplication?

No, it would be the 'field(s) on the one side of one-to-many' that
would guard against duplication. The purpose of the PK would be to
avoid a performance-degrading clustered index and, if possible, to
provide for a performance-enhancing clustered index.

If so, do you regard that as a more efficient use of recources than data
validation code in the form's Before Update event?

I do not rely on front end applications to enforce data integrity and
would strongly discourage such development. If I have a business rule
which Jet cannot enforce via constraints then I would recommend porting
to MSDE. For me, 'efficiency' doesn't come anything close to data
integrity in terms of importance.

Also, if by "expose it" (in reference to an artificial key) you mean show it
to the user, why would that be necessary?

You know Miss N E Person ID=1 of Main Street and Miss N E Person ID=2
of Main Street are different people, but when you speak to one of them
on the phone, how do *they* tell you which one they are?

#28 September 26th, 2005, 04:33 PM

wrote in message
oups.com...

Again, you've misunderstood. I can add to your list of properties: 3)
It has special meaning in the SQL implementation. In the case of
Access/Jet, this special meaning is clustered index. Are you aware of
this special meaning? If you are, why do you think an incrementing
integer with no meaning in the data model makes for a good clustered
index?

I don't. I don't recall talking about clustered indexes at all. I also
believe that a clustered index is a requirement for a primary key.
Certainly, EF Codd said nothing about it. This is an implementation issue,
not a design issue.

Perfect? As I said up thread, the only good things about autonumber is
that it is convenient (because it provided by the 'system') and it is
efficient (data type = INTEGER). Do you really think there is no better
key generation algorithm than autonumber's? If autonumbers were
perfect, we wouldn't see daily posts of the like, 'Why do I have gaps
in my ID numbers?' I can see how 'quick and dirty' may equate to 'good'
given the right circumstances but 'perfect' may be an overstatement.

Yes, perfect. (BTW, autonumbers are Long Integer, not Integer). In terms
of Relational uniqueness, why do you care if gaps develop? This is internal
to the database and the user should never see them. People who ask those
questions don't understand the purpose of an autonumber primary key. It's
got nothing to do with 'quick and dirty'. And in fact, convenience is only
a small part of what makes autonumber PKs perfect. As you will see below,
they also minimize redundant data in the database, make query design and
application design quicker and easier. And if you don't think that's
important, then you are not an application developer.

Why not use a UNIQUE on the autonumber column and use your clustered
index (PRIMARY KEY) for a more worthy candidate?

Because this is backwards. There is no reason to put a Unique constraint on
an autonumber. It is unique by definition. One major problem with compound
primary keys is that it increases the amount of Redundant data. Since
Normalization is the process of removing redundant data, by Codd's
definition, the smallest unique identifier is the best. An autonumber will
provide the smallest amount of redundancy.

In terms of efficiency, not only is the autonumber more efficient for
executing Joins, it is much easier to write queries with autonumber
primary/foreign keys than compound text fields. Consider the Contact table.
Suppose each Contact could have multiple phone numbers. In this case,
proper design dictiates you would need a separate table for the phone
numbers. The tables would look like this:

Contacts PhNumbers
========= ===========
LastName(cpk)-----LastName(cpk fk)
FirstName(cpk)----FirstName(cpk fk)
Address(cpk)------Address(cpk fk)
(more fields) Phone (cpk)

Creating a query would look like this:
SELECT Contacts.*, PHNumbers.Phone
FROM Contacts INNER JOIN PHNumbers ON
(Contacts.Address = PHNumbers.Address) AND
(Contacts.FirstName = PHNumbers.FirstName) AND
(Contacts.LastName = PHNumbers.LastName);

Whereas with an autonumber primary key, you reduce the size of your database
by storing smaller primary/foreign key values, it is also easier to write
queries:

Contacts PhNumbers
========= ===========
ContactID(pk)--| PhoneID (pk)
LastName --ContactID( fk)
FirstName Phone
Address
(more fields)

Creating a query would look like this:
SELECT Contacts.*, PHNumbers.Phone
FROM Contacts INNER JOIN PHNumbers ON
(Contacts.ContactID = PHNumbers.ContactID);

This would be even worse if Contacts participated in a Many-To-Many
relationship.

I am saying I believe you have misunderstood what PRIMARY KEY means.
All you have said indicates you are unaware of the special clustered
index meaning that PRIMARY KEY has in Access/Jet. I am open minded to
there being another explanation: why do you think an autonumber makes a
good clustered index?

I understand quite well what a Primary Key means. By database theory it has
nothing to do with custered indexes and I don't know why you insist upon it.
While a clustered index may be a good thing, it is by no means required by
database theory or for a good database design. Especially when it
complicates things like creating queries, forms, reports and the like.
Database design which complicates application design is not a good thing
either.

Apologies if you feel disparaged. That was not my intention. We all
make misstatements from time to time.

I have made no misstatements. Certainly I disagree with you, but that
doesn't by definition make me wrong.

#29 September 26th, 2005, 04:39 PM

CORRECTION:

I don't. I don't recall talking about clustered indexes at all. I also DO
NOT
believe that a clustered index is a requirement for a primary key.
Certainly, EF Codd said nothing about it. This is an implementation issue,
not a design issue.

--
--Roger Carlson
Access Database Samples: www.rogersaccesslibrary.com
Want answers to your Access questions in your Email?
Free subscription:
http://peach.ease.lsoft.com/scripts/...UBED1=ACCESS-L

#30 September 26th, 2005, 05:03 PM

"I do not rely on front end applications to enforce data integrity and would
strongly discourage such development. If I have a business rule which Jet
cannot enforce via constraints then I would recommend porting to MSDE."

I really don't know what you mean by "porting to MSDE". I found out what
MSDE is, but for my purposes I will find a way to validate the data, whether
at the table level or in the front end. You can strongly discourage using
the front end for such purposes, but have not provided a reason why. If
there is a Spouse First Name field it may be required if the person is
married, but certainly not otherwise. My choice is the front end for such
validation rather than another piece of software. Before Update works for
my purposes. I will continue to use it. Data integrity is not compromised.
The database works smoothly and quickly. If it is "inefficient" it is so on
a level that is not important to me.

"Miss N E Person ID=1 of Main Street and Miss N E Person ID=2 of Main Street
are different people". Then I will figure out a way of telling them apart
that is useful to the person who needs to call or contact one or the other.
Knowing that they are different records in the database (because they have
different ID numbers) is not helpful in telling them apart.

wrote in message
oups.com...

BruceM wrote:
I think I understand that your PK and your field(s) on the one side of
one-to-many may not be the same. If so, and if the PK is not part of a
relationship, would the purpose of your PK be to guard against
duplication?

No, it would be the 'field(s) on the one side of one-to-many' that
would guard against duplication. The purpose of the PK would be to
avoid a performance-degrading clustered index and, if possible, to
provide for a performance-enhancing clustered index.

If so, do you regard that as a more efficient use of recources than data
validation code in the form's Before Update event?

I do not rely on front end applications to enforce data integrity and
would strongly discourage such development. If I have a business rule
which Jet cannot enforce via constraints then I would recommend porting
to MSDE. For me, 'efficiency' doesn't come anything close to data
integrity in terms of importance.

Also, if by "expose it" (in reference to an artificial key) you mean show
it
to the user, why would that be necessary?

You know Miss N E Person ID=1 of Main Street and Miss N E Person ID=2
of Main Street are different people, but when you speak to one of them
on the phone, how do *they* tell you which one they are?

Thread Tools
Show Printable Version Email this Page
Display Modes
Linear Mode Switch to Hybrid Mode Switch to Threaded Mode

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
Sorting a table by concatenating several fields in the same table	salsaguy	Running & Setting Up Queries	3	March 6th, 2005 08:41 PM
Sorting a table by concatenating several fields in the same table	salsaguy	Running & Setting Up Queries	0	March 6th, 2005 01:33 AM
Additional fields for form based parameter query/null fields	geeksdoitbetter	Running & Setting Up Queries	2	January 7th, 2005 10:05 PM
Selecting Fields for Update	Steve Daigler	Page Layout	4	October 15th, 2004 02:13 PM
My tables lost their AutoNumber fields	Bill Nicholson	Database Design	2	July 2nd, 2004 02:20 AM