If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
Thread Tools | Display Modes |
#1
|
|||
|
|||
DISTINCT vs GROUPBY ?
DISTINCT vs GROUPBY ?
Hello all, I'm trying to figure out if there is any difference of performance using either GROUPBY or DISTINCT in the following scenario. I have a table named [Orders] that contains many fields, among them two called [Supplier_ID] and [InvoiceNumber]. The combo of the 2 is not unique and may repeat over several item records. I'm trying to generate a unique list of all [Supplier_ID][InvoiceNumber] Is it faster/better to use GROUPBY or DISTINCT ? many thx |
#2
|
|||
|
|||
Do you have separate Supplier and Invoice Tables? If not, I would start
there, and use one to many relationships. Ex. One Supplier will have many Invoices; One Invoice will have many Orders. K Board Stef wrote: DISTINCT vs GROUPBY ? Hello all, I'm trying to figure out if there is any difference of performance using either GROUPBY or DISTINCT in the following scenario. I have a table named [Orders] that contains many fields, among them two called [Supplier_ID] and [InvoiceNumber]. The combo of the 2 is not unique and may repeat over several item records. I'm trying to generate a unique list of all [Supplier_ID][InvoiceNumber] Is it faster/better to use GROUPBY or DISTINCT ? many thx -- Message posted via http://www.accessmonster.com |
#3
|
|||
|
|||
Hi,
GROUP BY should be faster, since both are likely to relay on sorting, and DISTINCT has no idea about which fields are important to consider (it will sort the whole SELECTed list of expressions), while GROUP BY can be instructed. Furthermore, use a dummy aggregate for fields in the list, when used with GROUP BY, such as LAST( ), on fields that are not defining a "group": SELECT FirstName, LastName, LAST(address), LAST(phoneNumber) FROM myTable GROUP BY FirstName, LastName could be faster than SELECT FirstName, LastName, address, phoneNumber FROM myTable GROUP BY FirstName, LastName, address, phoneNumber if FirstName and LastName define the "groups", because here, the sorting list has a larger number of bytes to handle, for each entry in the sorted list. Hoping it may help, Vanderghast, Access MVP "Stef" wrote in message ... DISTINCT vs GROUPBY ? Hello all, I'm trying to figure out if there is any difference of performance using either GROUPBY or DISTINCT in the following scenario. I have a table named [Orders] that contains many fields, among them two called [Supplier_ID] and [InvoiceNumber]. The combo of the 2 is not unique and may repeat over several item records. I'm trying to generate a unique list of all [Supplier_ID][InvoiceNumber] Is it faster/better to use GROUPBY or DISTINCT ? many thx |
#4
|
|||
|
|||
Michel Walsh wrote: GROUP BY should be faster, since both are likely to relay on sorting, and DISTINCT has no idea about which fields are important to consider (it will sort the whole SELECTed list of expressions), while GROUP BY can be instructed. Furthermore, use a dummy aggregate for fields in the list, when used with GROUP BY, such as LAST( ), on fields that are not defining a "group" I'm not convinced by your argument Neither rely on sorting. Instinct tells me DISTINCT would be faster because it doesn't have to perform any aggregation, dummy or otherwise. DISTINCT merely ignores duplicated rows, which is easy for the optimizer e.g. a hash table. Here are my test results: SELECT DISTINCT data_col FROM MillionRowTable; 1.266 secs SELECT COUNT(*), data_col FROM MillionRowTable GROUP BY data_col; 1.953 secs SELECT LAST(data_col), data_col FROM MillionRowTable GROUP BY data_col; 1.937 secs Based on the above, I wouldn't consider GROUP BY to be significantly slower but still think DISTINCT is the best choice. |
#5
|
|||
|
|||
Hi,
Even if it hashes, the hash has to be computed on each and every selected expression, in the case you use more that ONE expression in the SELECT statement, and nothing insures that two different row would never produce the SAME hash value.... You are just sure that if two rows produce a different hash, then they are different, but not the reverse. Hashing is O(1), while plain sorting by comparison is O(2), so internally, the database engine should perform the hashing, anyhow, when sorting on large set. I fail to really see your point. Your test did not include the GROUP BY statement equivalent to your distinct, which is: SELECT data_col FROM MillionRowTable GROUP BY data_col; All your actual GROUP BY statement are penalized by returning TWO columns (on a million rows) while your DISTINCT just return one column. That is plainly unfair (almost twice the IO to produce the result). I can be unfair too, what about: SELECT DISTINCT pk, f1, f2, f3, f4, f5 FROM myTableWithAprimaryKey versus SELECT pk, LAST(f1), LAST(f2), LAST(f3), LAST(f4), LAST(f5) FROM myTableWithAprimaryKey GROUP BY pk Vanderghast, Access MVP wrote in message ups.com... Michel Walsh wrote: GROUP BY should be faster, since both are likely to relay on sorting, and DISTINCT has no idea about which fields are important to consider (it will sort the whole SELECTed list of expressions), while GROUP BY can be instructed. Furthermore, use a dummy aggregate for fields in the list, when used with GROUP BY, such as LAST( ), on fields that are not defining a "group" I'm not convinced by your argument Neither rely on sorting. Instinct tells me DISTINCT would be faster because it doesn't have to perform any aggregation, dummy or otherwise. DISTINCT merely ignores duplicated rows, which is easy for the optimizer e.g. a hash table. Here are my test results: SELECT DISTINCT data_col FROM MillionRowTable; 1.266 secs SELECT COUNT(*), data_col FROM MillionRowTable GROUP BY data_col; 1.953 secs SELECT LAST(data_col), data_col FROM MillionRowTable GROUP BY data_col; 1.937 secs Based on the above, I wouldn't consider GROUP BY to be significantly slower but still think DISTINCT is the best choice. |
#6
|
|||
|
|||
Michel Walsh wrote: Even if it hashes I don't know that it does. I was simply guessing that a DISTINCT would be easy to optimize. Your test did not include the GROUP BY statement equivalent to your distinct, which is: SELECT data_col FROM MillionRowTable GROUP BY data_col; I get 0.985 secs. I can be unfair too I wasn't trying to be unfair, I was merely being foolish |
Thread Tools | |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Counting Distinct Entries | Rick Deemer | New Users | 2 | September 10th, 2005 06:55 PM |
DISTINCT option of an aggregate function in Access 2003 | [email protected] | Running & Setting Up Queries | 1 | May 5th, 2005 04:46 PM |
Newbie: Distinct values | steve | Running & Setting Up Queries | 3 | October 28th, 2004 08:35 PM |
A2K Strange Characters in Distinct Query/Form Reference/combo box | david epsom dot com dot au | Running & Setting Up Queries | 0 | September 15th, 2004 10:33 AM |
SELECT DISTINCT | MacDuff | Running & Setting Up Queries | 3 | July 24th, 2004 07:46 PM |