A Microsoft Office (Excel, Word) forum. OfficeFrustration

If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

Go Back   Home » OfficeFrustration forum » Microsoft Excel » Worksheet Functions
Site Map Home Register Authors List Search Today's Posts Mark Forums Read  

Duplicate Difficulties



 
 
Thread Tools Display Modes
  #1  
Old April 28th, 2009, 09:00 PM posted to microsoft.public.excel.worksheet.functions
:: Amy ::
external usenet poster
 
Posts: 4
Default Duplicate Difficulties

Hello All,
I am working a list with over 40,000 rows...
I need to see if there is a way that can delete duplicate rows...
Now to complicate things... Have a list that has acct numbers and times...
but i want delete duplicates that are in the same 5 mins. Make sense???
Please Help!!!
--
Still Learning...
  #2  
Old April 29th, 2009, 12:19 AM posted to microsoft.public.excel.worksheet.functions
PJFry
external usenet poster
 
Posts: 148
Default Duplicate Difficulties

Couple of questions:

How are your timestamps formated? Are they a traditional stamp, like
4/28/2009 10:00 PM, or are they something custom

When you that you need to delete entries in the 'same five minutes', is this
a static window of time? For example if the time is between 10:00:00 and
10:04:59, then are these the same five minutes? 10:05:00 to 10:09:59 would
be the next, etc.

Let me know.

--
Regards,

PJ
Please rate this post using the vote buttons if it was helpful.



":: Amy ::" wrote:

Hello All,
I am working a list with over 40,000 rows...
I need to see if there is a way that can delete duplicate rows...
Now to complicate things... Have a list that has acct numbers and times...
but i want delete duplicates that are in the same 5 mins. Make sense???
Please Help!!!
--
Still Learning...

  #3  
Old April 29th, 2009, 01:32 AM posted to microsoft.public.excel.worksheet.functions
smartin
external usenet poster
 
Posts: 780
Default Duplicate Difficulties

:: Amy :: wrote:
Hello All,
I am working a list with over 40,000 rows...
I need to see if there is a way that can delete duplicate rows...
Now to complicate things... Have a list that has acct numbers and times...
but i want delete duplicates that are in the same 5 mins. Make sense???
Please Help!!!


Hello,

Consider this example in A1:C15

Acct Time Dups
J 17 0
E 1 1
K 3 0
B 18 0
K 19 1
J 7 0
E 6 2
G 4 0
F 9 0
C 6 0
E 11 2
K 16 1
E 16 1
B 12 0

Columns A and B are random data. The formula in C2 and filled down is:
=SUMPRODUCT((A2=$A$2:$A$15)*(ABS(B2-$B$2:$B$15)=5))-1

A "Dups" value greater than 0 means the Acct has at least one other row
matching Acct with a Time within 5 units.

(I am using whole "Time" numbers to simplify the explanation. Since you
are probably working with a date/time-type field you will need to adjust
the comparison
=5
to something like
=(5/24/60)
in order to convert to Excel's date/time numbering.)

But before you sort on Dups and delete 0 I would draw your attention
to the special case of Acct "E".

Acct "E" has times of 1,6,11 and 16; each is within 5 units of some
other. However, the total Time span of this Acct is 15. Do all these
count as duplicates?
  #4  
Old April 29th, 2009, 06:22 PM posted to microsoft.public.excel.worksheet.functions
:: Amy ::
external usenet poster
 
Posts: 4
Default Duplicate Difficulties

ok... here is an example of the info that i am working w/
Unfortunately when I export this report to excel the time stamp is like a
text...
I can fix that however, we are looking at about 40,000 rows

Employee Account # Date Time
John Doe 83773011 3/20/2009 4:46PM
John Doe 83773011 3/20/2009 4:27PM
John Doe 97868507 3/20/2009 4:26PM
John Doe 97868507 3/20/2009 4:25PM
John Doe 2433904 3/20/2009 4:19PM
John Doe 2433904 3/20/2009 4:18PM
John Doe 2433904 3/20/2009 4:18PM
John Doe 92396306 3/20/2009 4:15PM
John Doe 103451304 3/20/2009 4:08PM
John Doe 54459201 3/20/2009 3:56PM
John Doe 54459201 3/20/2009 3:52PM
John Doe 6101201 3/20/2009 3:39PM
John Doe 108010401 3/20/2009 3:24PM
John Doe 81673524 3/20/2009 3:15PM
Thank you sooooo much for your time... i appreciate any help that I can
get
--
Still Learning...


"smartin" wrote:

:: Amy :: wrote:
Hello All,
I am working a list with over 40,000 rows...
I need to see if there is a way that can delete duplicate rows...
Now to complicate things... Have a list that has acct numbers and times...
but i want delete duplicates that are in the same 5 mins. Make sense???
Please Help!!!


Hello,

Consider this example in A1:C15

Acct Time Dups
J 17 0
E 1 1
K 3 0
B 18 0
K 19 1
J 7 0
E 6 2
G 4 0
F 9 0
C 6 0
E 11 2
K 16 1
E 16 1
B 12 0

Columns A and B are random data. The formula in C2 and filled down is:
=SUMPRODUCT((A2=$A$2:$A$15)*(ABS(B2-$B$2:$B$15)=5))-1

A "Dups" value greater than 0 means the Acct has at least one other row
matching Acct with a Time within 5 units.

(I am using whole "Time" numbers to simplify the explanation. Since you
are probably working with a date/time-type field you will need to adjust
the comparison
=5
to something like
=(5/24/60)
in order to convert to Excel's date/time numbering.)

But before you sort on Dups and delete 0 I would draw your attention
to the special case of Acct "E".

Acct "E" has times of 1,6,11 and 16; each is within 5 units of some
other. However, the total Time span of this Acct is 15. Do all these
count as duplicates?

  #5  
Old April 29th, 2009, 09:26 PM posted to microsoft.public.excel.worksheet.functions
Herbert Seidenberg
external usenet poster
 
Posts: 1,113
Default Duplicate Difficulties

Excel 2007
Text to Columns, MRound, Remove Duplicates
http://www.mediafire.com/file/igjn2mwewtm/04_29_09.xlsx
  #6  
Old April 30th, 2009, 12:49 AM posted to microsoft.public.excel.worksheet.functions
smartin
external usenet poster
 
Posts: 780
Default Duplicate Difficulties

No worries. Divide and conquer.

Your sample appears nicely in A115.

I added in E2:
=VALUE(LEFT(D2,LEN(D2)-2))+0.5*(RIGHT(D2,2)="PM")
(Removes the "AM/PM" tail and converts to a number representing the time
of day. Format this column as Time to see for real.)

And in F2:
=C2+E2
(Adds the date to the time of day. Now we have a proper date/time value.)

Finally in G2:
=SUMPRODUCT((B2=$B$2:$B$15)*(ABS(F2-$F$2:$F$15)=5/24/60))-1

Maybe one worry... the SUMPRODUCT will probably be very slow on 40k rows.


:: Amy :: wrote:
ok... here is an example of the info that i am working w/
Unfortunately when I export this report to excel the time stamp is like a
text...
I can fix that however, we are looking at about 40,000 rows

Employee Account # Date Time
John Doe 83773011 3/20/2009 4:46PM
John Doe 83773011 3/20/2009 4:27PM
John Doe 97868507 3/20/2009 4:26PM
John Doe 97868507 3/20/2009 4:25PM
John Doe 2433904 3/20/2009 4:19PM
John Doe 2433904 3/20/2009 4:18PM
John Doe 2433904 3/20/2009 4:18PM
John Doe 92396306 3/20/2009 4:15PM
John Doe 103451304 3/20/2009 4:08PM
John Doe 54459201 3/20/2009 3:56PM
John Doe 54459201 3/20/2009 3:52PM
John Doe 6101201 3/20/2009 3:39PM
John Doe 108010401 3/20/2009 3:24PM
John Doe 81673524 3/20/2009 3:15PM
Thank you sooooo much for your time... i appreciate any help that I can
get

  #7  
Old May 2nd, 2009, 03:02 AM posted to microsoft.public.excel.worksheet.functions
Dennis Tucker
external usenet poster
 
Posts: 67
Default Duplicate Difficulties

Are the duplicates next to each other(row to row)?

Dennis


":: Amy ::" :: Amy wrote in message
...
Hello All,
I am working a list with over 40,000 rows...
I need to see if there is a way that can delete duplicate rows...
Now to complicate things... Have a list that has acct numbers and times...
but i want delete duplicates that are in the same 5 mins. Make sense???
Please Help!!!
--
Still Learning...


  #8  
Old May 3rd, 2009, 08:22 PM posted to microsoft.public.excel.worksheet.functions
Dennis Tucker
external usenet poster
 
Posts: 67
Default Duplicate Difficulties

Are the Account#'s and the Time values in separate columns?

A B C D
John Doe 83773011 3/20/2009 4:46PM

If they are, I would do the following
1. Add a column on the end(right side) that contains the formula
=if(B1=B2,"Duplicate Account") note: lets say this is in cell E1 (are the
account numbers duplicated?)
2. Add another column on the end(right side) that contains the formula
=if(E1="Duplicate Account",D1-D2) note: lets say this is in cell F1 (if the
account numbers duplicated, what is the time difference?)
3. Add another column on the end(right side) that contains the formula
=if(F10:05,"Less Than 5 Minutes") note: lets say this is in cell G1 (if
the account numbers duplicated, is the time difference less than 5 minutes?)

Then use a macro to look through the G column and start erasing the bad
line(duplicate records under 5 minutes).

Dennis



":: Amy ::" :: Amy wrote in message
...
ok... here is an example of the info that i am working w/
Unfortunately when I export this report to excel the time stamp is like a
text...
I can fix that however, we are looking at about 40,000 rows

Employee Account # Date Time
John Doe 83773011 3/20/2009 4:46PM
John Doe 83773011 3/20/2009 4:27PM
John Doe 97868507 3/20/2009 4:26PM
John Doe 97868507 3/20/2009 4:25PM
John Doe 2433904 3/20/2009 4:19PM
John Doe 2433904 3/20/2009 4:18PM
John Doe 2433904 3/20/2009 4:18PM
John Doe 92396306 3/20/2009 4:15PM
John Doe 103451304 3/20/2009 4:08PM
John Doe 54459201 3/20/2009 3:56PM
John Doe 54459201 3/20/2009 3:52PM
John Doe 6101201 3/20/2009 3:39PM
John Doe 108010401 3/20/2009 3:24PM
John Doe 81673524 3/20/2009 3:15PM
Thank you sooooo much for your time... i appreciate any help that I can
get
--
Still Learning...


"smartin" wrote:

:: Amy :: wrote:
Hello All,
I am working a list with over 40,000 rows...
I need to see if there is a way that can delete duplicate rows...
Now to complicate things... Have a list that has acct numbers and
times...
but i want delete duplicates that are in the same 5 mins. Make
sense???
Please Help!!!


Hello,

Consider this example in A1:C15

Acct Time Dups
J 17 0
E 1 1
K 3 0
B 18 0
K 19 1
J 7 0
E 6 2
G 4 0
F 9 0
C 6 0
E 11 2
K 16 1
E 16 1
B 12 0

Columns A and B are random data. The formula in C2 and filled down is:
=SUMPRODUCT((A2=$A$2:$A$15)*(ABS(B2-$B$2:$B$15)=5))-1

A "Dups" value greater than 0 means the Acct has at least one other row
matching Acct with a Time within 5 units.

(I am using whole "Time" numbers to simplify the explanation. Since you
are probably working with a date/time-type field you will need to adjust
the comparison
=5
to something like
=(5/24/60)
in order to convert to Excel's date/time numbering.)

But before you sort on Dups and delete 0 I would draw your attention
to the special case of Acct "E".

Acct "E" has times of 1,6,11 and 16; each is within 5 units of some
other. However, the total Time span of this Acct is 15. Do all these
count as duplicates?

 




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Forum Jump


All times are GMT +1. The time now is 07:14 AM.


Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 OfficeFrustration.
The comments are property of their posters.