Primary Keys

**onedaywhen**

Lyle Fairfield wrote:
The answer, for Access/Jet the PK determines the (non-maintained)
clustered index, the physical ordering on disk.

From

http://msdn2.microsoft.com/en-us/library/wd9d69b1.aspx

THE CLUSTERED PROPERTY IS IGNORED FOR DATABASES THAT USE THE MICROSOFT
JET DATABASE ENGINE BECAUSE THE JET DATABASE ENGINE DOES NOT SUPPORT
CLUSTERED INDEXES.

No need to shout.

Try reading more widely:

New features in Jet Version 3.0:
http://support.microsoft.com/default.aspx?id=137039

Quote: "Compacting the database now results in the indices being stored

in a clustered-index format. While the clustered index isn't maintained

until the next compact, performance is still improved ... The new
clustered-key compact method is based on the primary key of the table.
New data entered will be in time order."

ACC2000: Defragment and Compact Database to Improve Performance
http://support.microsoft.com/default...b;en-us;209769

Quote: "A disk defragmenter will place all files, including the
database file into contiguous clusters on a hard disk ... If a primary
key exists in the table, compacting re-stores table records into their
Primary Key order. This provides the equivalent of Non-maintained
Clustered Indexes"

I think the phrase 'not supported' is used to convey the fact that in
Jet you cannot specify the clustered index independent of the PRIMARY
KEY as you can in, say, SQL Server. It may just mean that there is no
syntax for CLUSTERED INDEX.

Regardless of what 'not suuported' means, clustered indexes definitely
exist for Jet and PRIMARY KEY is the way to leverage them.

Jamie.

--

**Jamie Collins**

onedaywhen wrote:
No need to shout.

BTW I tend to put SQL keywrods in uppercase e.g. PRIMARY KEY. Sorry if
you thought I was shouting.

Jamie.

--

**David W. Fenton**

"onedaywhen" wrote in
oups.com:

If you've understood the above you should come to the conclusion
that a sole autonumber column will never make a good PRIMARY KEY
in Access/Jet, because a random/incrementing integer/GUID does not
make a good clustered index. I'd suggest that anyone who uses
their autonumber column in a BETWEEN or GROUP BY construct has got
something wrong in design and/or queries. I'd further suggest that
anyone who uses BETWEEN or GROUP BY constructs which do not
include columns that comprise their PKs are likely to have made a
poor choice of PK.

A random PK would result in the placement of records on as many data
pages as possible, thus improving concurrency.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

**Jamie Collins**

David W. Fenton wrote:
If you've understood the above you should come to the conclusion
that a sole autonumber column will never make a good PRIMARY KEY
in Access/Jet, because a random/incrementing integer/GUID does not
make a good clustered index.

A random PK would result in the placement of records on as many data
pages as possible, thus improving concurrency.

Good point, I was thinking too narrowly.

If my table had columns for surname, initials and telephone number and
my queries predominantly use BETWEEN on the surname column, having the
table physically ordered on telephone number may make my queries
perform worse than if the physical order was on surname (can you
imagine trying to use a paper copy telephone directory ordered on
telephone number g ?!)

As I said, the choice of PK should be determined by the SQL DML e.g.
you are interested in page locks for updates in a multi-user
environment, I'm interested in query performance, etc.

Jamie.

--

**Jamie Collins**

Jamie Collins wrote:
If my table had columns for surname, initials and telephone number and
my queries predominantly use BETWEEN on the surname column, having the
table physically ordered on telephone number may make my queries
perform worse than if the physical order was on surname (can you
imagine trying to use a paper copy telephone directory ordered on
telephone number g ?!)

As I said, the choice of PK should be determined by the SQL DML e.g.
you are interested in page locks for updates in a multi-user
environment, I'm interested in query performance, etc.

Oops! I meant to add:

In other words I want to fetch rows on the same page and contiguous
pages; you want to maximise the chances of the rows each user will be
interested in are on different pages (am I correct?) I think in my
simple contacts example physically ordering on surname would provide
good concurrency as well. Whatever, it's clear we are both thinking
about the Jet implementation (i.e. contiguous storage on disk) when
considering PKs. Can everyone else say the same?

Jamie.

--

**Lyle Fairfield**

Regardless of what 'not suuported' means, clustered indexes definitely
exist for Jet and PRIMARY KEY is the way to leverage them.

One can get some of the advantages of clustered indexes by choosing a
meaningful primary key, by compacting ... and, perhaps, by defragging.
This is a far cry from the convenience and power or a clustered index.

**Jamie Collins**

Lyle Fairfield wrote:
One can get some of the advantages of clustered indexes by choosing a
meaningful primary key, by compacting ... and, perhaps, by defragging.

We seem to be in agreement here i.e. what to consider when choosing a
PK.

This is a far cry from the convenience and power [of] a clustered index.

The gap doesn't appear that wide to me but I'm willing to learn
otherwise. Details?

Thanks,
Jamie.

--

**Lyle Fairfield**

Jamie Collins wrote:
Lyle Fairfield wrote:
One can get some of the advantages of clustered indexes by choosing a
meaningful primary key, by compacting ... and, perhaps, by defragging.

We seem to be in agreement here i.e. what to consider when choosing a
PK.

This is a far cry from the convenience and power [of] a clustered index.

The gap doesn't appear that wide to me but I'm willing to learn
otherwise. Details?

Thanks,
Jamie.

It appears wide to me. A clustered index in SQL-Server is maintained.
No compacting or defragging is required.
In Jet, if the table/file is small, the performance advantages will,
probably, not be noticed.
If the table/file is large, there will be a penalty (time/resources) in
compacting.
Even if we compact, there is, TTBOMK, no guarantee that the compacting
will use contiguous sectors, although we might hope for that if the
disk is new. Pages containing consecutive (from the pimary key point
of view) might be distant from each other (from the disk's point of
view). Defragging is likely to cure this. But defragging is slow. And
defragging may result in the space after the MDB/E being used by
another file. So that, as soon as we update or insert another record it
may again be remoteness (from the disk's point of view) from records
with "adjacent" primary keys. And to correct this remoteness we may
have to compact and defrag.
In a maintained clustered index all of this (we hope) is planned and
managed by the database engine.
This seems to me to be wide gap.
I understand that with a static database, the gap would be considerably
less.
In general, I agree with you that if one were doing a lot of sql work
based on = 'Mainwaring; and = 'Milne' it might, depending on other
needs of the db, it might be efficient to use some primary key that
helped with identifying the records wanted or processed.
I think I have said sufficient about this ...if you reply you shall
have the field to yourself.

**David W. Fenton**

"Jamie Collins" wrote in
ps.com:

David W. Fenton wrote:
If you've understood the above you should come to the
conclusion that a sole autonumber column will never make a good
PRIMARY KEY in Access/Jet, because a random/incrementing
integer/GUID does not make a good clustered index.

A random PK would result in the placement of records on as many
data pages as possible, thus improving concurrency.

Good point, I was thinking too narrowly.

If my table had columns for surname, initials and telephone number
and my queries predominantly use BETWEEN on the surname column,
having the table physically ordered on telephone number may make
my queries perform worse than if the physical order was on surname
(can you imagine trying to use a paper copy telephone directory
ordered on telephone number g ?!)

As I said, the choice of PK should be determined by the SQL DML
e.g. you are interested in page locks for updates in a multi-user
environment, I'm interested in query performance, etc.

Given that with Jet you can only have the one clustered index, I
really think this is a pretty irrelevant consideration. I never use
the BETWEEN operator on anything but date fields, which in the vast
majority of tables could not possibly ever be a candidate for PK
(and very seldom even a candidate for inclusion in a compound
natural key, which wouldn't give you the clustered index benefit,
anyway, unless the date was the first field of the compound key).

So, I just don't see any practical benefit in your pointing out the
performance advantage of the clustered index.

Have you tested SEEKs on non-PK indexes?

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

**David W. Fenton**

"Jamie Collins" wrote in
ps.com:

Jamie Collins wrote:
If my table had columns for surname, initials and telephone
number and my queries predominantly use BETWEEN on the surname
column, having the table physically ordered on telephone number
may make my queries perform worse than if the physical order was
on surname (can you imagine trying to use a paper copy telephone
directory ordered on telephone number g ?!)

As I said, the choice of PK should be determined by the SQL DML
e.g. you are interested in page locks for updates in a multi-user
environment, I'm interested in query performance, etc.

Oops! I meant to add:

In other words I want to fetch rows on the same page and
contiguous pages; you want to maximise the chances of the rows
each user will be interested in are on different pages (am I
correct?) . . .

No. I want to minimize the chance that two users will be editing
data on the same data page.

. . . I think in my
simple contacts example physically ordering on surname would
provide good concurrency as well. Whatever, it's clear we are both
thinking about the Jet implementation (i.e. contiguous storage on
disk) when considering PKs. Can everyone else say the same?

Well, either way, it's irrelevant for newly added records before the
database is compacted, since those are all going to land in their
own data page that is not written back in PK order.

As long as you've got records added after the last compact, there's
always going to be some data pages (and index data pages) that are
not in the final order (whether your natural key order or my random
Autonumber order), and thus the performance gain can never be fully
realized in an actively used database.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

Thread Tools
Show Printable Version Email this Page
Display Modes
Linear Mode Switch to Hybrid Mode Switch to Threaded Mode