A Microsoft Office (Excel, Word) forum. OfficeFrustration

If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

Go Back   Home » OfficeFrustration forum » Microsoft Excel » General Discussion
Site Map Home Register Authors List Search Today's Posts Mark Forums Read  

Deleting duplicates - kind of



 
 
Thread Tools Display Modes
  #1  
Old May 23rd, 2008, 03:13 PM posted to microsoft.public.excel.misc
tamara
external usenet poster
 
Posts: 47
Default Deleting duplicates - kind of

My boss had me merge together a ton of data.
Group Name, Renewal Month, Broker Name, Broker Phone, Address,
Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact,
EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status,
RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns.

Due to merging data from 10 different spreadsheets into one big spreadsheet
(40k records), I have been trying to find a way to consolidate all
information for each unique record onto one line. Some are entirely
duplicate and some just need to consolidate the information. (i.e. Group name
is the same for three records but one record has the address and another has
the SIC code and the third is blank)

Is there a simple way to do this? I tried running the EasyFilter Add-In
that was mentioned on another post and it took all of my memory and timed out
after an hour. I looked at the pearson guy's site but it wouldn't
consolidate AND delete duplicates. Any ideas?
  #2  
Old May 25th, 2008, 08:12 AM posted to microsoft.public.excel.misc
JLatham
external usenet poster
 
Posts: 3,017
Default Deleting duplicates - kind of

In these 40K rows of data, what field (column) or combination of
fields/columns would constitute a unique identification. I mean that like
this: You mentioned Group Name - would the Group Name be considered unique;
all entries with the same Group Name would be considered as different parts
of the same row and so could be consolidated into one single row?

With that information someone here (probably end up being me) can come up
with recommendations or, in my case, some VBA code to run as a macro to
consolidate the data for you and eliminate the extra entries.


"Tamara" wrote:

My boss had me merge together a ton of data.
Group Name, Renewal Month, Broker Name, Broker Phone, Address,
Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact,
EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status,
RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns.

Due to merging data from 10 different spreadsheets into one big spreadsheet
(40k records), I have been trying to find a way to consolidate all
information for each unique record onto one line. Some are entirely
duplicate and some just need to consolidate the information. (i.e. Group name
is the same for three records but one record has the address and another has
the SIC code and the third is blank)

Is there a simple way to do this? I tried running the EasyFilter Add-In
that was mentioned on another post and it took all of my memory and timed out
after an hour. I looked at the pearson guy's site but it wouldn't
consolidate AND delete duplicates. Any ideas?

  #3  
Old May 25th, 2008, 07:43 PM posted to microsoft.public.excel.misc
JLatham
external usenet poster
 
Posts: 3,017
Default Deleting duplicates - kind of

Tamara,
Perhaps this will do what you need. I've made my best guesses based on your
initial post.
Start by making a copy of your workbook so that if things do go bad you'll
still have the source information. Press [Alt]+[F11] to put the code below
(with any changes to the 1st 2 Const values you need to make) into one of the
workbooks. Once the VB Editor opens, use Insert | Module to start a new code
module. Copy the code below and paste it into the code module. Select the
sheet with data on it and use Tools | Macro | Macros and the [Run] button to
run the code. As noted in it, it will take some time - somewhere between 9
and 20 minutes probably, depending on your setup.


Sub CollateAndRemoveExtraEntries()
'sheet to be processed must be selected
'before calling this Macro
'Tested Time-To-Run:
' Excel 2003 on Win XP
' Single Core AMD 3200+: 9m 37s
' Excel 2007 on Win XP
' Dual Core AMD 4800+: 18m 07s
' Excel 2007 on Vista
' Intel CoreDuo 6600: 14m 10s
'
'There's often help to be found at
' HelpFrom @ jlathamsite. com
'
'change these Const values to
'match your worksheet layout
Const uniqueIDColumn = "A" ' where to look for dupes
'assumes all columns from A to 'lastUsedColumn' are involved
Const lastUsedColumn = "W" ' last column of data

Dim dataOffsets() As Long ' offsets to columns of data
Dim sortRange As Range
Dim baseCell As Range
Dim anyColOffset As Long
Dim LC As Long ' loop counter
Dim RO As Long ' row offset pointer
Dim BCO As Long ' row offset from baseCell
Dim lastUsedRow As Long
Dim tempString As String ' work space

'set up to hold data offsets
ReDim dataOffsets(1 To _
Range(lastUsedColumn & "1").Column)
'fill offset array, assumes all columns
'from A to 'lastUsedColumn' are involved
For LC = LBound(dataOffsets) To UBound(dataOffsets)
dataOffsets(LC) = Cells(1, LC).Column - _
Range(uniqueIDColumn & 1).Column
Next
'begin by sorting the data
'borrow sortRange for the sorting
Set sortRange = Range("A1:" & _
Range(uniqueIDColumn & Rows.Count).End(xlUp).Address)
'speed up the process
Application.ScreenUpdating = False
'sort the range assuming it has a header row and
'sort by the uniqueIDColumn column.
tempString = uniqueIDColumn & "1"
sortRange.Sort Key1:=Range(tempString), _
Order1:=xlAscending, Header:=xlYes, _
OrderCustom:=1, MatchCase:=False, _
Orientation:=xlTopToBottom
Set sortRange = Nothing ' free up resources
'now we can start collating data
Set baseCell = Range(uniqueIDColumn & 1)
BCO = 1 ' initialize
lastUsedRow = Range(uniqueIDColumn & _
Rows.Count).End(xlUp).Row
Do Until baseCell.Row lastUsedRow
Set baseCell = Range(uniqueIDColumn & BCO)
RO = 1 ' reset/initialize
Do While baseCell.Offset(RO, 0).Row = lastUsedRow And _
baseCell.Offset(RO, 0) = baseCell
'matched item, collate
For LC = LBound(dataOffsets) To UBound(dataOffsets)
If dataOffsets(LC) 0 Then
'not the unique ID column
If IsEmpty(baseCell.Offset(0, dataOffsets(LC))) And _
Not IsEmpty(baseCell.Offset(RO, dataOffsets(LC))) Then
'copy data
baseCell.Offset(0, dataOffsets(LC)) = _
baseCell.Offset(RO, dataOffsets(LC))
End If
End If
Next ' end column content loop
RO = RO + 1 ' ready to look at next row
Loop ' end of matched loop
'move the base cell
BCO = baseCell.Row + RO
Loop ' end of comparisons & collating
'now on to remove the extra entries
'again assumes that row 1 has labels
'work from bottom up to 2nd row with data in it
For RO = lastUsedRow To 2 Step -1
If Range(uniqueIDColumn & RO) = _
Range(uniqueIDColumn & RO - 1) Then
Range(uniqueIDColumn & RO).EntireRow.Delete
End If
Next
Application.ScreenUpdating = True
End Sub


"Tamara" wrote:

My boss had me merge together a ton of data.
Group Name, Renewal Month, Broker Name, Broker Phone, Address,
Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact,
EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status,
RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns.

Due to merging data from 10 different spreadsheets into one big spreadsheet
(40k records), I have been trying to find a way to consolidate all
information for each unique record onto one line. Some are entirely
duplicate and some just need to consolidate the information. (i.e. Group name
is the same for three records but one record has the address and another has
the SIC code and the third is blank)

Is there a simple way to do this? I tried running the EasyFilter Add-In
that was mentioned on another post and it took all of my memory and timed out
after an hour. I looked at the pearson guy's site but it wouldn't
consolidate AND delete duplicates. Any ideas?

  #4  
Old May 27th, 2008, 05:20 PM posted to microsoft.public.excel.misc
tamara
external usenet poster
 
Posts: 47
Default Deleting duplicates - kind of

I kind of gave up on having someone help me and was doing it the very long,
long, way! Thank you so much!!

"JLatham" wrote:

Tamara,
Perhaps this will do what you need. I've made my best guesses based on your
initial post.
Start by making a copy of your workbook so that if things do go bad you'll
still have the source information. Press [Alt]+[F11] to put the code below
(with any changes to the 1st 2 Const values you need to make) into one of the
workbooks. Once the VB Editor opens, use Insert | Module to start a new code
module. Copy the code below and paste it into the code module. Select the
sheet with data on it and use Tools | Macro | Macros and the [Run] button to
run the code. As noted in it, it will take some time - somewhere between 9
and 20 minutes probably, depending on your setup.


Sub CollateAndRemoveExtraEntries()
'sheet to be processed must be selected
'before calling this Macro
'Tested Time-To-Run:
' Excel 2003 on Win XP
' Single Core AMD 3200+: 9m 37s
' Excel 2007 on Win XP
' Dual Core AMD 4800+: 18m 07s
' Excel 2007 on Vista
' Intel CoreDuo 6600: 14m 10s
'
'There's often help to be found at
' HelpFrom @ jlathamsite. com
'
'change these Const values to
'match your worksheet layout
Const uniqueIDColumn = "A" ' where to look for dupes
'assumes all columns from A to 'lastUsedColumn' are involved
Const lastUsedColumn = "W" ' last column of data

Dim dataOffsets() As Long ' offsets to columns of data
Dim sortRange As Range
Dim baseCell As Range
Dim anyColOffset As Long
Dim LC As Long ' loop counter
Dim RO As Long ' row offset pointer
Dim BCO As Long ' row offset from baseCell
Dim lastUsedRow As Long
Dim tempString As String ' work space

'set up to hold data offsets
ReDim dataOffsets(1 To _
Range(lastUsedColumn & "1").Column)
'fill offset array, assumes all columns
'from A to 'lastUsedColumn' are involved
For LC = LBound(dataOffsets) To UBound(dataOffsets)
dataOffsets(LC) = Cells(1, LC).Column - _
Range(uniqueIDColumn & 1).Column
Next
'begin by sorting the data
'borrow sortRange for the sorting
Set sortRange = Range("A1:" & _
Range(uniqueIDColumn & Rows.Count).End(xlUp).Address)
'speed up the process
Application.ScreenUpdating = False
'sort the range assuming it has a header row and
'sort by the uniqueIDColumn column.
tempString = uniqueIDColumn & "1"
sortRange.Sort Key1:=Range(tempString), _
Order1:=xlAscending, Header:=xlYes, _
OrderCustom:=1, MatchCase:=False, _
Orientation:=xlTopToBottom
Set sortRange = Nothing ' free up resources
'now we can start collating data
Set baseCell = Range(uniqueIDColumn & 1)
BCO = 1 ' initialize
lastUsedRow = Range(uniqueIDColumn & _
Rows.Count).End(xlUp).Row
Do Until baseCell.Row lastUsedRow
Set baseCell = Range(uniqueIDColumn & BCO)
RO = 1 ' reset/initialize
Do While baseCell.Offset(RO, 0).Row = lastUsedRow And _
baseCell.Offset(RO, 0) = baseCell
'matched item, collate
For LC = LBound(dataOffsets) To UBound(dataOffsets)
If dataOffsets(LC) 0 Then
'not the unique ID column
If IsEmpty(baseCell.Offset(0, dataOffsets(LC))) And _
Not IsEmpty(baseCell.Offset(RO, dataOffsets(LC))) Then
'copy data
baseCell.Offset(0, dataOffsets(LC)) = _
baseCell.Offset(RO, dataOffsets(LC))
End If
End If
Next ' end column content loop
RO = RO + 1 ' ready to look at next row
Loop ' end of matched loop
'move the base cell
BCO = baseCell.Row + RO
Loop ' end of comparisons & collating
'now on to remove the extra entries
'again assumes that row 1 has labels
'work from bottom up to 2nd row with data in it
For RO = lastUsedRow To 2 Step -1
If Range(uniqueIDColumn & RO) = _
Range(uniqueIDColumn & RO - 1) Then
Range(uniqueIDColumn & RO).EntireRow.Delete
End If
Next
Application.ScreenUpdating = True
End Sub


"Tamara" wrote:

My boss had me merge together a ton of data.
Group Name, Renewal Month, Broker Name, Broker Phone, Address,
Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact,
EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status,
RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns.

Due to merging data from 10 different spreadsheets into one big spreadsheet
(40k records), I have been trying to find a way to consolidate all
information for each unique record onto one line. Some are entirely
duplicate and some just need to consolidate the information. (i.e. Group name
is the same for three records but one record has the address and another has
the SIC code and the third is blank)

Is there a simple way to do this? I tried running the EasyFilter Add-In
that was mentioned on another post and it took all of my memory and timed out
after an hour. I looked at the pearson guy's site but it wouldn't
consolidate AND delete duplicates. Any ideas?

  #5  
Old May 27th, 2008, 07:04 PM posted to microsoft.public.excel.misc
tamara
external usenet poster
 
Posts: 47
Default Deleting duplicates - kind of

That worked perfectly!!! There is still some clean up to do, of course, as
some of the group names had minor punctuation differences. All in all
though, that knocked out over 10k records of nothing but crap!! Thank you so
much!!! You just saved me HOURS!!

Tamara

"JLatham" wrote:

Tamara,
Perhaps this will do what you need. I've made my best guesses based on your
initial post.
Start by making a copy of your workbook so that if things do go bad you'll
still have the source information. Press [Alt]+[F11] to put the code below
(with any changes to the 1st 2 Const values you need to make) into one of the
workbooks. Once the VB Editor opens, use Insert | Module to start a new code
module. Copy the code below and paste it into the code module. Select the
sheet with data on it and use Tools | Macro | Macros and the [Run] button to
run the code. As noted in it, it will take some time - somewhere between 9
and 20 minutes probably, depending on your setup.


Sub CollateAndRemoveExtraEntries()
'sheet to be processed must be selected
'before calling this Macro
'Tested Time-To-Run:
' Excel 2003 on Win XP
' Single Core AMD 3200+: 9m 37s
' Excel 2007 on Win XP
' Dual Core AMD 4800+: 18m 07s
' Excel 2007 on Vista
' Intel CoreDuo 6600: 14m 10s
'
'There's often help to be found at
' HelpFrom @ jlathamsite. com
'
'change these Const values to
'match your worksheet layout
Const uniqueIDColumn = "A" ' where to look for dupes
'assumes all columns from A to 'lastUsedColumn' are involved
Const lastUsedColumn = "W" ' last column of data

Dim dataOffsets() As Long ' offsets to columns of data
Dim sortRange As Range
Dim baseCell As Range
Dim anyColOffset As Long
Dim LC As Long ' loop counter
Dim RO As Long ' row offset pointer
Dim BCO As Long ' row offset from baseCell
Dim lastUsedRow As Long
Dim tempString As String ' work space

'set up to hold data offsets
ReDim dataOffsets(1 To _
Range(lastUsedColumn & "1").Column)
'fill offset array, assumes all columns
'from A to 'lastUsedColumn' are involved
For LC = LBound(dataOffsets) To UBound(dataOffsets)
dataOffsets(LC) = Cells(1, LC).Column - _
Range(uniqueIDColumn & 1).Column
Next
'begin by sorting the data
'borrow sortRange for the sorting
Set sortRange = Range("A1:" & _
Range(uniqueIDColumn & Rows.Count).End(xlUp).Address)
'speed up the process
Application.ScreenUpdating = False
'sort the range assuming it has a header row and
'sort by the uniqueIDColumn column.
tempString = uniqueIDColumn & "1"
sortRange.Sort Key1:=Range(tempString), _
Order1:=xlAscending, Header:=xlYes, _
OrderCustom:=1, MatchCase:=False, _
Orientation:=xlTopToBottom
Set sortRange = Nothing ' free up resources
'now we can start collating data
Set baseCell = Range(uniqueIDColumn & 1)
BCO = 1 ' initialize
lastUsedRow = Range(uniqueIDColumn & _
Rows.Count).End(xlUp).Row
Do Until baseCell.Row lastUsedRow
Set baseCell = Range(uniqueIDColumn & BCO)
RO = 1 ' reset/initialize
Do While baseCell.Offset(RO, 0).Row = lastUsedRow And _
baseCell.Offset(RO, 0) = baseCell
'matched item, collate
For LC = LBound(dataOffsets) To UBound(dataOffsets)
If dataOffsets(LC) 0 Then
'not the unique ID column
If IsEmpty(baseCell.Offset(0, dataOffsets(LC))) And _
Not IsEmpty(baseCell.Offset(RO, dataOffsets(LC))) Then
'copy data
baseCell.Offset(0, dataOffsets(LC)) = _
baseCell.Offset(RO, dataOffsets(LC))
End If
End If
Next ' end column content loop
RO = RO + 1 ' ready to look at next row
Loop ' end of matched loop
'move the base cell
BCO = baseCell.Row + RO
Loop ' end of comparisons & collating
'now on to remove the extra entries
'again assumes that row 1 has labels
'work from bottom up to 2nd row with data in it
For RO = lastUsedRow To 2 Step -1
If Range(uniqueIDColumn & RO) = _
Range(uniqueIDColumn & RO - 1) Then
Range(uniqueIDColumn & RO).EntireRow.Delete
End If
Next
Application.ScreenUpdating = True
End Sub


"Tamara" wrote:

My boss had me merge together a ton of data.
Group Name, Renewal Month, Broker Name, Broker Phone, Address,
Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact,
EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status,
RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns.

Due to merging data from 10 different spreadsheets into one big spreadsheet
(40k records), I have been trying to find a way to consolidate all
information for each unique record onto one line. Some are entirely
duplicate and some just need to consolidate the information. (i.e. Group name
is the same for three records but one record has the address and another has
the SIC code and the third is blank)

Is there a simple way to do this? I tried running the EasyFilter Add-In
that was mentioned on another post and it took all of my memory and timed out
after an hour. I looked at the pearson guy's site but it wouldn't
consolidate AND delete duplicates. Any ideas?

  #6  
Old May 28th, 2008, 12:24 AM posted to microsoft.public.excel.misc
JLatham
external usenet poster
 
Posts: 3,017
Default Deleting duplicates - kind of

Always nice to turn hours of work into minutes, isn't it?

Glad I could help. Thanks for letting us know it worked.

"Tamara" wrote:

That worked perfectly!!! There is still some clean up to do, of course, as
some of the group names had minor punctuation differences. All in all
though, that knocked out over 10k records of nothing but crap!! Thank you so
much!!! You just saved me HOURS!!

Tamara

"JLatham" wrote:

Tamara,
Perhaps this will do what you need. I've made my best guesses based on your
initial post.
Start by making a copy of your workbook so that if things do go bad you'll
still have the source information. Press [Alt]+[F11] to put the code below
(with any changes to the 1st 2 Const values you need to make) into one of the
workbooks. Once the VB Editor opens, use Insert | Module to start a new code
module. Copy the code below and paste it into the code module. Select the
sheet with data on it and use Tools | Macro | Macros and the [Run] button to
run the code. As noted in it, it will take some time - somewhere between 9
and 20 minutes probably, depending on your setup.


Sub CollateAndRemoveExtraEntries()
'sheet to be processed must be selected
'before calling this Macro
'Tested Time-To-Run:
' Excel 2003 on Win XP
' Single Core AMD 3200+: 9m 37s
' Excel 2007 on Win XP
' Dual Core AMD 4800+: 18m 07s
' Excel 2007 on Vista
' Intel CoreDuo 6600: 14m 10s
'
'There's often help to be found at
' HelpFrom @ jlathamsite. com
'
'change these Const values to
'match your worksheet layout
Const uniqueIDColumn = "A" ' where to look for dupes
'assumes all columns from A to 'lastUsedColumn' are involved
Const lastUsedColumn = "W" ' last column of data

Dim dataOffsets() As Long ' offsets to columns of data
Dim sortRange As Range
Dim baseCell As Range
Dim anyColOffset As Long
Dim LC As Long ' loop counter
Dim RO As Long ' row offset pointer
Dim BCO As Long ' row offset from baseCell
Dim lastUsedRow As Long
Dim tempString As String ' work space

'set up to hold data offsets
ReDim dataOffsets(1 To _
Range(lastUsedColumn & "1").Column)
'fill offset array, assumes all columns
'from A to 'lastUsedColumn' are involved
For LC = LBound(dataOffsets) To UBound(dataOffsets)
dataOffsets(LC) = Cells(1, LC).Column - _
Range(uniqueIDColumn & 1).Column
Next
'begin by sorting the data
'borrow sortRange for the sorting
Set sortRange = Range("A1:" & _
Range(uniqueIDColumn & Rows.Count).End(xlUp).Address)
'speed up the process
Application.ScreenUpdating = False
'sort the range assuming it has a header row and
'sort by the uniqueIDColumn column.
tempString = uniqueIDColumn & "1"
sortRange.Sort Key1:=Range(tempString), _
Order1:=xlAscending, Header:=xlYes, _
OrderCustom:=1, MatchCase:=False, _
Orientation:=xlTopToBottom
Set sortRange = Nothing ' free up resources
'now we can start collating data
Set baseCell = Range(uniqueIDColumn & 1)
BCO = 1 ' initialize
lastUsedRow = Range(uniqueIDColumn & _
Rows.Count).End(xlUp).Row
Do Until baseCell.Row lastUsedRow
Set baseCell = Range(uniqueIDColumn & BCO)
RO = 1 ' reset/initialize
Do While baseCell.Offset(RO, 0).Row = lastUsedRow And _
baseCell.Offset(RO, 0) = baseCell
'matched item, collate
For LC = LBound(dataOffsets) To UBound(dataOffsets)
If dataOffsets(LC) 0 Then
'not the unique ID column
If IsEmpty(baseCell.Offset(0, dataOffsets(LC))) And _
Not IsEmpty(baseCell.Offset(RO, dataOffsets(LC))) Then
'copy data
baseCell.Offset(0, dataOffsets(LC)) = _
baseCell.Offset(RO, dataOffsets(LC))
End If
End If
Next ' end column content loop
RO = RO + 1 ' ready to look at next row
Loop ' end of matched loop
'move the base cell
BCO = baseCell.Row + RO
Loop ' end of comparisons & collating
'now on to remove the extra entries
'again assumes that row 1 has labels
'work from bottom up to 2nd row with data in it
For RO = lastUsedRow To 2 Step -1
If Range(uniqueIDColumn & RO) = _
Range(uniqueIDColumn & RO - 1) Then
Range(uniqueIDColumn & RO).EntireRow.Delete
End If
Next
Application.ScreenUpdating = True
End Sub


"Tamara" wrote:

My boss had me merge together a ton of data.
Group Name, Renewal Month, Broker Name, Broker Phone, Address,
Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact,
EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status,
RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns.

Due to merging data from 10 different spreadsheets into one big spreadsheet
(40k records), I have been trying to find a way to consolidate all
information for each unique record onto one line. Some are entirely
duplicate and some just need to consolidate the information. (i.e. Group name
is the same for three records but one record has the address and another has
the SIC code and the third is blank)

Is there a simple way to do this? I tried running the EasyFilter Add-In
that was mentioned on another post and it took all of my memory and timed out
after an hour. I looked at the pearson guy's site but it wouldn't
consolidate AND delete duplicates. Any ideas?

  #7  
Old December 18th, 2009, 05:16 PM posted to microsoft.public.excel.misc
Martinl
external usenet poster
 
Posts: 18
Default Deleting duplicates - kind of

Excellent JLatham!!! Also worked for me. That's some know how.

"JLatham" wrote:

Always nice to turn hours of work into minutes, isn't it?

Glad I could help. Thanks for letting us know it worked.

"Tamara" wrote:

That worked perfectly!!! There is still some clean up to do, of course, as
some of the group names had minor punctuation differences. All in all
though, that knocked out over 10k records of nothing but crap!! Thank you so
much!!! You just saved me HOURS!!

Tamara

"JLatham" wrote:

Tamara,
Perhaps this will do what you need. I've made my best guesses based on your
initial post.
Start by making a copy of your workbook so that if things do go bad you'll
still have the source information. Press [Alt]+[F11] to put the code below
(with any changes to the 1st 2 Const values you need to make) into one of the
workbooks. Once the VB Editor opens, use Insert | Module to start a new code
module. Copy the code below and paste it into the code module. Select the
sheet with data on it and use Tools | Macro | Macros and the [Run] button to
run the code. As noted in it, it will take some time - somewhere between 9
and 20 minutes probably, depending on your setup.


Sub CollateAndRemoveExtraEntries()
'sheet to be processed must be selected
'before calling this Macro
'Tested Time-To-Run:
' Excel 2003 on Win XP
' Single Core AMD 3200+: 9m 37s
' Excel 2007 on Win XP
' Dual Core AMD 4800+: 18m 07s
' Excel 2007 on Vista
' Intel CoreDuo 6600: 14m 10s
'
'There's often help to be found at
' HelpFrom @ jlathamsite. com
'
'change these Const values to
'match your worksheet layout
Const uniqueIDColumn = "A" ' where to look for dupes
'assumes all columns from A to 'lastUsedColumn' are involved
Const lastUsedColumn = "W" ' last column of data

Dim dataOffsets() As Long ' offsets to columns of data
Dim sortRange As Range
Dim baseCell As Range
Dim anyColOffset As Long
Dim LC As Long ' loop counter
Dim RO As Long ' row offset pointer
Dim BCO As Long ' row offset from baseCell
Dim lastUsedRow As Long
Dim tempString As String ' work space

'set up to hold data offsets
ReDim dataOffsets(1 To _
Range(lastUsedColumn & "1").Column)
'fill offset array, assumes all columns
'from A to 'lastUsedColumn' are involved
For LC = LBound(dataOffsets) To UBound(dataOffsets)
dataOffsets(LC) = Cells(1, LC).Column - _
Range(uniqueIDColumn & 1).Column
Next
'begin by sorting the data
'borrow sortRange for the sorting
Set sortRange = Range("A1:" & _
Range(uniqueIDColumn & Rows.Count).End(xlUp).Address)
'speed up the process
Application.ScreenUpdating = False
'sort the range assuming it has a header row and
'sort by the uniqueIDColumn column.
tempString = uniqueIDColumn & "1"
sortRange.Sort Key1:=Range(tempString), _
Order1:=xlAscending, Header:=xlYes, _
OrderCustom:=1, MatchCase:=False, _
Orientation:=xlTopToBottom
Set sortRange = Nothing ' free up resources
'now we can start collating data
Set baseCell = Range(uniqueIDColumn & 1)
BCO = 1 ' initialize
lastUsedRow = Range(uniqueIDColumn & _
Rows.Count).End(xlUp).Row
Do Until baseCell.Row lastUsedRow
Set baseCell = Range(uniqueIDColumn & BCO)
RO = 1 ' reset/initialize
Do While baseCell.Offset(RO, 0).Row = lastUsedRow And _
baseCell.Offset(RO, 0) = baseCell
'matched item, collate
For LC = LBound(dataOffsets) To UBound(dataOffsets)
If dataOffsets(LC) 0 Then
'not the unique ID column
If IsEmpty(baseCell.Offset(0, dataOffsets(LC))) And _
Not IsEmpty(baseCell.Offset(RO, dataOffsets(LC))) Then
'copy data
baseCell.Offset(0, dataOffsets(LC)) = _
baseCell.Offset(RO, dataOffsets(LC))
End If
End If
Next ' end column content loop
RO = RO + 1 ' ready to look at next row
Loop ' end of matched loop
'move the base cell
BCO = baseCell.Row + RO
Loop ' end of comparisons & collating
'now on to remove the extra entries
'again assumes that row 1 has labels
'work from bottom up to 2nd row with data in it
For RO = lastUsedRow To 2 Step -1
If Range(uniqueIDColumn & RO) = _
Range(uniqueIDColumn & RO - 1) Then
Range(uniqueIDColumn & RO).EntireRow.Delete
End If
Next
Application.ScreenUpdating = True
End Sub


"Tamara" wrote:

My boss had me merge together a ton of data.
Group Name, Renewal Month, Broker Name, Broker Phone, Address,
Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact,
EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status,
RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns.

Due to merging data from 10 different spreadsheets into one big spreadsheet
(40k records), I have been trying to find a way to consolidate all
information for each unique record onto one line. Some are entirely
duplicate and some just need to consolidate the information. (i.e. Group name
is the same for three records but one record has the address and another has
the SIC code and the third is blank)

Is there a simple way to do this? I tried running the EasyFilter Add-In
that was mentioned on another post and it took all of my memory and timed out
after an hour. I looked at the pearson guy's site but it wouldn't
consolidate AND delete duplicates. Any ideas?

  #8  
Old December 26th, 2009, 07:23 PM posted to microsoft.public.excel.misc
Herbert Seidenberg
external usenet poster
 
Posts: 1,113
Default Deleting duplicates - kind of

Excel 2007 Tables
For non-VBA users.
Consolidate and remove duplicates.
Just formulas.
http://c0444202.cdn.cloudfiles.racks.../12_06_09.xlsx
 




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Forum Jump


All times are GMT +1. The time now is 07:06 PM.


Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 OfficeFrustration.
The comments are property of their posters.