If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
Thread Tools | Display Modes |
#1
|
|||
|
|||
Query to group similar number combinations?
I'm trying to create a make-table query that will group records by two
fields. The fields, named "Left" and "Right," both contain 4-digit identification numbers, and each combination of the two fields should identify a unique individual. Unfortunately, data entry/recording errors are pretty common in these fields. The point of the query is to assign a unique, single-field ID to each individual, using the primary key from the table created by this query. In a perfect world, I could just group records by "Left" and "Right," add the results to a temporary table, and use the PK from that table as an ID. But as I mentioned above, the data is not exactly perfect. I have created queries to find individuals with similar Left and Right codes, but I can't think of a good way to group all these records together. Here's and input example: RecordID Left Right 1 1001 1002 2 1001 1002 3 1002 1001 4 1003 1004 5 1005 1006 6 1005 1009 7 1005 1006 What I'm trying to do is create some field, based on "Left" and "Right," that I could use in a group by statement to return three rows from the table above, since there are probably 3 individuals represented there. (The first three records are probably all the same individual, as are the last three records). Not even sure if this would be possible, since in different cases things are grouped based on different criteria, but I thought I would put it out there and see what people think. |
#2
|
|||
|
|||
Query to group similar number combinations?
The first three records are probably all the same individual, as are the
last three records I do not see your logic as all three have 1005, two have 1006, but another has 1009 that does not fit. -- Build a little, test a little. "esn" wrote: I'm trying to create a make-table query that will group records by two fields. The fields, named "Left" and "Right," both contain 4-digit identification numbers, and each combination of the two fields should identify a unique individual. Unfortunately, data entry/recording errors are pretty common in these fields. The point of the query is to assign a unique, single-field ID to each individual, using the primary key from the table created by this query. In a perfect world, I could just group records by "Left" and "Right," add the results to a temporary table, and use the PK from that table as an ID. But as I mentioned above, the data is not exactly perfect. I have created queries to find individuals with similar Left and Right codes, but I can't think of a good way to group all these records together. Here's and input example: RecordID Left Right 1 1001 1002 2 1001 1002 3 1002 1001 4 1003 1004 5 1005 1006 6 1005 1009 7 1005 1006 What I'm trying to do is create some field, based on "Left" and "Right," that I could use in a group by statement to return three rows from the table above, since there are probably 3 individuals represented there. (The first three records are probably all the same individual, as are the last three records). Not even sure if this would be possible, since in different cases things are grouped based on different criteria, but I thought I would put it out there and see what people think. . |
#3
|
|||
|
|||
Query to group similar number combinations?
The example was chosen to represent some of the more common errors in
recording/entering data - one common error is to replace a 6 with a 9 (I also see a lot of 3s replaced with 8s, etc). From poring through this data for a few months, I know (based largely on other data stored with these records and checking entered data against recorded data) that it's much more likely that those last three records represent the same individual and that the "1009" is an error, than that there are two distinct individuals with numbers that are so similar. Had this particular project been designed a little more carefully, I could actually guarantee that, as each four-digit code would only be available once, so there would be no way for two different individuals to share a code in one column but have different codes in the other. In reality, there have been a few codes used more than once. Still, the odds of two different individuals ending up with 7 of the 8 digits matching is miniscule compared to the probability of error. The same goes for the first three records - technically, it may be possible for two different individuals to have the same codes but in opposite columns. However, the probability that the codes were accidentally transposed somewhere in the path from the real world to the database is orders of magnitude higher than the probability that those codes are correct and actually represent two different individuals. Sounds like a nasty assumption, I know, but I've looked into it quite a bit and in every case I've found (somewhere in the hundreds, out of around 10000 records) the evidence from other fields in the records overwhelmingly pointed to errors in the data, as opposed to those records actually representing unique individuals. I should also have mentioned that it is possible for these codes to be lost and replaced (they refer to physical markers) - in which case an single individual can be represented by more than one combination of left and right codes. It's a very messy situation, and I have a feeling that automating the process of identifying which records represent the same individual may not be possible, but was hoping someone might have some ideas for me. I've built a form where users can select from a group of records with "similar" codes to identify the individual, but that doesn't do any good until I've assigned an individual ID to most of the records. |
#4
|
|||
|
|||
Query to group similar number combinations?
one common error is to replace a 6 with a 9 (I also see a lot of 3s
replaced with 8s, etc). You can not create a query to find those that are the typos you mentioned unless either you build a table of possible error combinations and let it suggest possible errors. The other way that might be possible is if there is supposed to be a minimum number of records in a set and it fill a set with something that is an half match. What do you want the output to look like for those that are easy to find such as the reversed field? -- Build a little, test a little. "esn" wrote: The example was chosen to represent some of the more common errors in recording/entering data - one common error is to replace a 6 with a 9 (I also see a lot of 3s replaced with 8s, etc). From poring through this data for a few months, I know (based largely on other data stored with these records and checking entered data against recorded data) that it's much more likely that those last three records represent the same individual and that the "1009" is an error, than that there are two distinct individuals with numbers that are so similar. Had this particular project been designed a little more carefully, I could actually guarantee that, as each four-digit code would only be available once, so there would be no way for two different individuals to share a code in one column but have different codes in the other. In reality, there have been a few codes used more than once. Still, the odds of two different individuals ending up with 7 of the 8 digits matching is miniscule compared to the probability of error. The same goes for the first three records - technically, it may be possible for two different individuals to have the same codes but in opposite columns. However, the probability that the codes were accidentally transposed somewhere in the path from the real world to the database is orders of magnitude higher than the probability that those codes are correct and actually represent two different individuals. Sounds like a nasty assumption, I know, but I've looked into it quite a bit and in every case I've found (somewhere in the hundreds, out of around 10000 records) the evidence from other fields in the records overwhelmingly pointed to errors in the data, as opposed to those records actually representing unique individuals. I should also have mentioned that it is possible for these codes to be lost and replaced (they refer to physical markers) - in which case an single individual can be represented by more than one combination of left and right codes. It's a very messy situation, and I have a feeling that automating the process of identifying which records represent the same individual may not be possible, but was hoping someone might have some ideas for me. I've built a form where users can select from a group of records with "similar" codes to identify the individual, but that doesn't do any good until I've assigned an individual ID to most of the records. . |
#5
|
|||
|
|||
Query to group similar number combinations?
What do you want the output to look like for those that are easy to find such as the reversed field? That's the next hurdle - once I have those records grouped correctly, the idea is to make a temporary table from them, then use the autonumber ID from the temporary table as an "IndividualID," and insert that ID back into the original records. The eventual result would be, using the example above, RecordID Left Right IndividualID 1 1001 1002 1 2 1001 1002 1 3 1002 1001 1 4 1003 1004 2 5 1005 1006 3 6 1005 1009 3 7 1005 1006 3 There are more elegant ways to do that as well, as any search for "sequential numbering group" will turn up. That would be fine too - the initial problem is getting access to understand that those records comprise a group, so that I can tackle the problem of numbering them. |
Thread Tools | |
Display Modes | |
|
|