If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
Thread Tools | Display Modes |
#1
|
|||
|
|||
Duplicate Difficulties
Hello All,
I am working a list with over 40,000 rows... I need to see if there is a way that can delete duplicate rows... Now to complicate things... Have a list that has acct numbers and times... but i want delete duplicates that are in the same 5 mins. Make sense??? Please Help!!! -- Still Learning... |
#2
|
|||
|
|||
Duplicate Difficulties
Couple of questions:
How are your timestamps formated? Are they a traditional stamp, like 4/28/2009 10:00 PM, or are they something custom When you that you need to delete entries in the 'same five minutes', is this a static window of time? For example if the time is between 10:00:00 and 10:04:59, then are these the same five minutes? 10:05:00 to 10:09:59 would be the next, etc. Let me know. -- Regards, PJ Please rate this post using the vote buttons if it was helpful. ":: Amy ::" wrote: Hello All, I am working a list with over 40,000 rows... I need to see if there is a way that can delete duplicate rows... Now to complicate things... Have a list that has acct numbers and times... but i want delete duplicates that are in the same 5 mins. Make sense??? Please Help!!! -- Still Learning... |
#3
|
|||
|
|||
Duplicate Difficulties
:: Amy :: wrote:
Hello All, I am working a list with over 40,000 rows... I need to see if there is a way that can delete duplicate rows... Now to complicate things... Have a list that has acct numbers and times... but i want delete duplicates that are in the same 5 mins. Make sense??? Please Help!!! Hello, Consider this example in A1:C15 Acct Time Dups J 17 0 E 1 1 K 3 0 B 18 0 K 19 1 J 7 0 E 6 2 G 4 0 F 9 0 C 6 0 E 11 2 K 16 1 E 16 1 B 12 0 Columns A and B are random data. The formula in C2 and filled down is: =SUMPRODUCT((A2=$A$2:$A$15)*(ABS(B2-$B$2:$B$15)=5))-1 A "Dups" value greater than 0 means the Acct has at least one other row matching Acct with a Time within 5 units. (I am using whole "Time" numbers to simplify the explanation. Since you are probably working with a date/time-type field you will need to adjust the comparison =5 to something like =(5/24/60) in order to convert to Excel's date/time numbering.) But before you sort on Dups and delete 0 I would draw your attention to the special case of Acct "E". Acct "E" has times of 1,6,11 and 16; each is within 5 units of some other. However, the total Time span of this Acct is 15. Do all these count as duplicates? |
#4
|
|||
|
|||
Duplicate Difficulties
ok... here is an example of the info that i am working w/
Unfortunately when I export this report to excel the time stamp is like a text... I can fix that however, we are looking at about 40,000 rows Employee Account # Date Time John Doe 83773011 3/20/2009 4:46PM John Doe 83773011 3/20/2009 4:27PM John Doe 97868507 3/20/2009 4:26PM John Doe 97868507 3/20/2009 4:25PM John Doe 2433904 3/20/2009 4:19PM John Doe 2433904 3/20/2009 4:18PM John Doe 2433904 3/20/2009 4:18PM John Doe 92396306 3/20/2009 4:15PM John Doe 103451304 3/20/2009 4:08PM John Doe 54459201 3/20/2009 3:56PM John Doe 54459201 3/20/2009 3:52PM John Doe 6101201 3/20/2009 3:39PM John Doe 108010401 3/20/2009 3:24PM John Doe 81673524 3/20/2009 3:15PM Thank you sooooo much for your time... i appreciate any help that I can get -- Still Learning... "smartin" wrote: :: Amy :: wrote: Hello All, I am working a list with over 40,000 rows... I need to see if there is a way that can delete duplicate rows... Now to complicate things... Have a list that has acct numbers and times... but i want delete duplicates that are in the same 5 mins. Make sense??? Please Help!!! Hello, Consider this example in A1:C15 Acct Time Dups J 17 0 E 1 1 K 3 0 B 18 0 K 19 1 J 7 0 E 6 2 G 4 0 F 9 0 C 6 0 E 11 2 K 16 1 E 16 1 B 12 0 Columns A and B are random data. The formula in C2 and filled down is: =SUMPRODUCT((A2=$A$2:$A$15)*(ABS(B2-$B$2:$B$15)=5))-1 A "Dups" value greater than 0 means the Acct has at least one other row matching Acct with a Time within 5 units. (I am using whole "Time" numbers to simplify the explanation. Since you are probably working with a date/time-type field you will need to adjust the comparison =5 to something like =(5/24/60) in order to convert to Excel's date/time numbering.) But before you sort on Dups and delete 0 I would draw your attention to the special case of Acct "E". Acct "E" has times of 1,6,11 and 16; each is within 5 units of some other. However, the total Time span of this Acct is 15. Do all these count as duplicates? |
#5
|
|||
|
|||
Duplicate Difficulties
Excel 2007
Text to Columns, MRound, Remove Duplicates http://www.mediafire.com/file/igjn2mwewtm/04_29_09.xlsx |
#6
|
|||
|
|||
Duplicate Difficulties
No worries. Divide and conquer.
Your sample appears nicely in A115. I added in E2: =VALUE(LEFT(D2,LEN(D2)-2))+0.5*(RIGHT(D2,2)="PM") (Removes the "AM/PM" tail and converts to a number representing the time of day. Format this column as Time to see for real.) And in F2: =C2+E2 (Adds the date to the time of day. Now we have a proper date/time value.) Finally in G2: =SUMPRODUCT((B2=$B$2:$B$15)*(ABS(F2-$F$2:$F$15)=5/24/60))-1 Maybe one worry... the SUMPRODUCT will probably be very slow on 40k rows. :: Amy :: wrote: ok... here is an example of the info that i am working w/ Unfortunately when I export this report to excel the time stamp is like a text... I can fix that however, we are looking at about 40,000 rows Employee Account # Date Time John Doe 83773011 3/20/2009 4:46PM John Doe 83773011 3/20/2009 4:27PM John Doe 97868507 3/20/2009 4:26PM John Doe 97868507 3/20/2009 4:25PM John Doe 2433904 3/20/2009 4:19PM John Doe 2433904 3/20/2009 4:18PM John Doe 2433904 3/20/2009 4:18PM John Doe 92396306 3/20/2009 4:15PM John Doe 103451304 3/20/2009 4:08PM John Doe 54459201 3/20/2009 3:56PM John Doe 54459201 3/20/2009 3:52PM John Doe 6101201 3/20/2009 3:39PM John Doe 108010401 3/20/2009 3:24PM John Doe 81673524 3/20/2009 3:15PM Thank you sooooo much for your time... i appreciate any help that I can get |
#7
|
|||
|
|||
Duplicate Difficulties
Are the duplicates next to each other(row to row)?
Dennis ":: Amy ::" :: Amy wrote in message ... Hello All, I am working a list with over 40,000 rows... I need to see if there is a way that can delete duplicate rows... Now to complicate things... Have a list that has acct numbers and times... but i want delete duplicates that are in the same 5 mins. Make sense??? Please Help!!! -- Still Learning... |
#8
|
|||
|
|||
Duplicate Difficulties
Are the Account#'s and the Time values in separate columns?
A B C D John Doe 83773011 3/20/2009 4:46PM If they are, I would do the following 1. Add a column on the end(right side) that contains the formula =if(B1=B2,"Duplicate Account") note: lets say this is in cell E1 (are the account numbers duplicated?) 2. Add another column on the end(right side) that contains the formula =if(E1="Duplicate Account",D1-D2) note: lets say this is in cell F1 (if the account numbers duplicated, what is the time difference?) 3. Add another column on the end(right side) that contains the formula =if(F10:05,"Less Than 5 Minutes") note: lets say this is in cell G1 (if the account numbers duplicated, is the time difference less than 5 minutes?) Then use a macro to look through the G column and start erasing the bad line(duplicate records under 5 minutes). Dennis ":: Amy ::" :: Amy wrote in message ... ok... here is an example of the info that i am working w/ Unfortunately when I export this report to excel the time stamp is like a text... I can fix that however, we are looking at about 40,000 rows Employee Account # Date Time John Doe 83773011 3/20/2009 4:46PM John Doe 83773011 3/20/2009 4:27PM John Doe 97868507 3/20/2009 4:26PM John Doe 97868507 3/20/2009 4:25PM John Doe 2433904 3/20/2009 4:19PM John Doe 2433904 3/20/2009 4:18PM John Doe 2433904 3/20/2009 4:18PM John Doe 92396306 3/20/2009 4:15PM John Doe 103451304 3/20/2009 4:08PM John Doe 54459201 3/20/2009 3:56PM John Doe 54459201 3/20/2009 3:52PM John Doe 6101201 3/20/2009 3:39PM John Doe 108010401 3/20/2009 3:24PM John Doe 81673524 3/20/2009 3:15PM Thank you sooooo much for your time... i appreciate any help that I can get -- Still Learning... "smartin" wrote: :: Amy :: wrote: Hello All, I am working a list with over 40,000 rows... I need to see if there is a way that can delete duplicate rows... Now to complicate things... Have a list that has acct numbers and times... but i want delete duplicates that are in the same 5 mins. Make sense??? Please Help!!! Hello, Consider this example in A1:C15 Acct Time Dups J 17 0 E 1 1 K 3 0 B 18 0 K 19 1 J 7 0 E 6 2 G 4 0 F 9 0 C 6 0 E 11 2 K 16 1 E 16 1 B 12 0 Columns A and B are random data. The formula in C2 and filled down is: =SUMPRODUCT((A2=$A$2:$A$15)*(ABS(B2-$B$2:$B$15)=5))-1 A "Dups" value greater than 0 means the Acct has at least one other row matching Acct with a Time within 5 units. (I am using whole "Time" numbers to simplify the explanation. Since you are probably working with a date/time-type field you will need to adjust the comparison =5 to something like =(5/24/60) in order to convert to Excel's date/time numbering.) But before you sort on Dups and delete 0 I would draw your attention to the special case of Acct "E". Acct "E" has times of 1,6,11 and 16; each is within 5 units of some other. However, the total Time span of this Acct is 15. Do all these count as duplicates? |
Thread Tools | |
Display Modes | |
|
|