OfficeFrustration - View Single Post - General solution for missing sequence numbers

Peter Danes

You're right, they don't know. Nor do they care. They're scientists, not
computer techs. They expect a database to fill a need, not dictate to them
in matters that do not interest them.

The numbering convention is something used all over the world in this
particular field (parasitic lichenology, to be specific) and calling them
stupid for using a numbering convention seems pointless to me. Records are
generally not deleted, this entire issue came about because gaps in the
numbering appeared due to misuse of a tool not designed for the job.
Duplicates of a sample are sometimes sent to other institutions and the
database now has tools for handling that, but the only time an actual record
might be deleted is if a sample is lost or accidentally destroyed, and in
that case the number is NOT re-used. In fact, I doubt that even then would
the record be deleted, it would probably simply have a remark put in a
comment field to the effect that the sample is no longer in existence, but
the information is still valuable.

And sequential numbering is not 'just wrong'. Thousands of things in daily
life are numbered so, streets, houses, rooms, movie tickets, store receipts.
Can you imagine a library where book catalog numbers were not in sequence?
Or zip codes assigned randomly, instead of at least being geographically
grouped? Certainly, users occasionally have unrealistic expectations and
some education is often necessary. But expecting an entire scientific
community to alter their standards to suit my notions of order is
ridiculous.

--
Pete

"Rob Oldfield" pí¹e v diskusním pøíspìvku
...

In that situation the issue isn't about how to 'fill in the gaps', it's
how
to explain to a particular group of scientists that they don't have a clue
about how relational databases work. Are they really stupid enough to not
understand that, although my highest ID number is 10000, that I only have
1000 samples? Even if I point out that 9000 of those records are marked
as
'not really a sample' (i.e. the idea of disallowing deletions and marking
the record as inactive instead)?

Expecting sequential numbering is just wrong. It's up to those of us who
actually know about the issues raised by John Vinson to not allow those
who
don't to tell us how to put databases together.

"Peter Danes" wrote in message
...
Such situations are common, for a variety of reasons. Depends on the
database and the user and what the data is for. The particular example
that
inspired this outburst is a mycological database, where the numbers are
used
to sequentially number the scientist's samples. She told me that
numbering
is important for others in the field to know roughly how many samples a
particular researcher has, and for internal inventory purposes, that
they
don't expect to have holes in the numbering sequence.

If someone who has 1,000 samples in their collection publishes something
about their sample number 10,000 and it is known that the person does
not
have anywhere near 10,000 samples, it would be viewed as odd at the very
least, possibly unethical and such a person would find himself not taken
seriously by other researchers. One or two numbers amiss in this
situation
is obviously not a major concern.

And for the internal inventory controls, if someone sees sample 152 next
to
150, they are going to wonder where is number 151. The inventory methods
used expect sequential numbering and a missing number is an indication
of
something wrong. For inventory numbers in the original database, she
used
the record number that appears in the text box of Access's navigation
control in conjunction with an autonumber field. You may guess what sort
of
hash resulted from that. I started out trying to fix a few things for
her
and wound up doing almost a complete re-write of the entire thing and
this
numbering issue is one of the things that surfaced. She wants to be able
to
fill in all the gaps as well as add new numbers to the end as she
collects
new samples.

Pete

"David C. Holley" píse v diskusním príspevku
...
What is the specific *NEED* to find the missing numbers?

Peter Danes wrote:
I occasionally need to determine a number that I don't have in a
sequence, either the first missing one in a gap in a set of
sequential
numbers or the next one in line at the end of a numbered series.
Always
it meant some fumbling around, with either VBA at first or later with
SQL
when I got good enough at it, establishing the proper join parameters
and
such. For SQL experts, this is probably routine and trivial, but for
me
it was always a bit of a chore. The last straw came with a database
which
I recently wrote, where the converted data had such a numbered
series,
and the owner wanted to be able to do both, fill in missing numbers
in
the gaps AND add new numbers at the end.

Walking home from a bar last night, I got to thinking about it and
realized that both problems are actually fairly similar and that a
simple
and general solution is possible.
I put together a simple table containing one field with the following
entries:

1,2,3,4, 8,9,10, 15,16,17,18, 20, 22,23,24,25, 28,29,30

Missing a

5,6,7, 11,12,13,14, 19, 21, 26,27 and 31 on up.

This is the dataset used for all of the following examples.

Finding the next new number at the end of a series with SQL is
trivial;
here is a simplified version of a statement that I found somewhere in
the
discussion groups a few years ago:

SELECT Max(MyTable.MySeqFld)+1 FROM MyTable;

This will return a one-record, one-field recordset containing exactly
one
value: 31, which is one greater than the largest value so far used in
that field. This is what you would want to use instead of Access's
autonumber, if the field is to contain meaningful sequence numbering,
rather than just a unique identifier.

Locating gaps is a little more complicated: it involves a self-join
from
N to N+1 and finding where N+1 doesn't exist, indicating a gap at
that
point.

SELECT MT1.MySeqFld+1
FROM MyTable AS MT1 LEFT JOIN MyTable AS MT2 ON
MT1.MySeqFld+1=MT2.MySeqFld
WHERE MT2.MySeqFld IS NULL;

This generates a recordset of 5, 11, 19, 21, 26, 31, where each value
is
the first missing value in a gap, including the "open gap" at the
end,
and that's where the trick to a general solution begins. Since these
situations normally call for either the first (lowest number) gap or
last
(end of recordset) gap, you need either the first or last record
returned
by this query. Sorting and using the TOP predicate gives you exactly
that.

SELECT TOP 1 MT1.MySeqFld+1
FROM MyTable AS MT1 LEFT JOIN MyTable AS MT2 ON
MT1.MySeqFld+1=MT2.MySeqFld
WHERE MT2.MySeqFld IS NULL
ORDER BY MT1.MySeqFld;

This will again return a one-record, one-field recordset containing
exactly one value: 5, the first missing number in the first gap in
the
sequence. Ascending sort order is the default, so the smallest number
is
the first returned.

SELECT TOP 1 MT1.MySeqFld+1
FROM MyTable AS MT1 LEFT JOIN MyTable AS MT2 ON
MT1.MySeqFld+1=MT2.MySeqFld
WHERE MT2.MySeqFld IS NULL
ORDER BY MT1.MySeqFld DESC;

This will return a one-record, one-field recordset containing exactly
one
value: 31, the same "one greater than the highest value so far used
in
that field" that is returned by the first simple example. Specifying
the
descending order here is necessary, since we want the last (greatest)
record from the set and Access SQL does not have a BOTTOM predicate.

Finally, an even more general statement can be used:

SELECT TOP 1 MT1.MySeqFld+1
FROM MyTable AS MT1 LEFT JOIN MyTable AS MT2 ON
MT1.MySeqFld+1=MT2.MySeqFld
WHERE MT2.MySeqFld IS NULL
ORDER BY ((INSTR("LF",[First or Last (F or L)]) *2)-3)*MT1.MySeqFld;

This expects one parameter, F or L and will return either the first
missing number or the next number at the end of the line. The INSTR
expression evaluates to either -1 or 1 (or -3 if the parameter
supplied
is neither F nor L, but that has the same effect as -1 in this
instance),
that is then used as a multiplier for the sort field, so the sort is
either by the field or by the negative of the field (or 3 times the
negative of the field), giving either ascending or descending order
and
with the TOP 1 predicate again returns exactly the one value of
interest.