If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
Thread Tools | Display Modes |
#1
|
|||
|
|||
Deleting duplicates - kind of
My boss had me merge together a ton of data.
Group Name, Renewal Month, Broker Name, Broker Phone, Address, Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact, EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status, RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns. Due to merging data from 10 different spreadsheets into one big spreadsheet (40k records), I have been trying to find a way to consolidate all information for each unique record onto one line. Some are entirely duplicate and some just need to consolidate the information. (i.e. Group name is the same for three records but one record has the address and another has the SIC code and the third is blank) Is there a simple way to do this? I tried running the EasyFilter Add-In that was mentioned on another post and it took all of my memory and timed out after an hour. I looked at the pearson guy's site but it wouldn't consolidate AND delete duplicates. Any ideas? |
#2
|
|||
|
|||
Deleting duplicates - kind of
In these 40K rows of data, what field (column) or combination of
fields/columns would constitute a unique identification. I mean that like this: You mentioned Group Name - would the Group Name be considered unique; all entries with the same Group Name would be considered as different parts of the same row and so could be consolidated into one single row? With that information someone here (probably end up being me) can come up with recommendations or, in my case, some VBA code to run as a macro to consolidate the data for you and eliminate the extra entries. "Tamara" wrote: My boss had me merge together a ton of data. Group Name, Renewal Month, Broker Name, Broker Phone, Address, Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact, EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status, RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns. Due to merging data from 10 different spreadsheets into one big spreadsheet (40k records), I have been trying to find a way to consolidate all information for each unique record onto one line. Some are entirely duplicate and some just need to consolidate the information. (i.e. Group name is the same for three records but one record has the address and another has the SIC code and the third is blank) Is there a simple way to do this? I tried running the EasyFilter Add-In that was mentioned on another post and it took all of my memory and timed out after an hour. I looked at the pearson guy's site but it wouldn't consolidate AND delete duplicates. Any ideas? |
#3
|
|||
|
|||
Deleting duplicates - kind of
Tamara,
Perhaps this will do what you need. I've made my best guesses based on your initial post. Start by making a copy of your workbook so that if things do go bad you'll still have the source information. Press [Alt]+[F11] to put the code below (with any changes to the 1st 2 Const values you need to make) into one of the workbooks. Once the VB Editor opens, use Insert | Module to start a new code module. Copy the code below and paste it into the code module. Select the sheet with data on it and use Tools | Macro | Macros and the [Run] button to run the code. As noted in it, it will take some time - somewhere between 9 and 20 minutes probably, depending on your setup. Sub CollateAndRemoveExtraEntries() 'sheet to be processed must be selected 'before calling this Macro 'Tested Time-To-Run: ' Excel 2003 on Win XP ' Single Core AMD 3200+: 9m 37s ' Excel 2007 on Win XP ' Dual Core AMD 4800+: 18m 07s ' Excel 2007 on Vista ' Intel CoreDuo 6600: 14m 10s ' 'There's often help to be found at ' HelpFrom @ jlathamsite. com ' 'change these Const values to 'match your worksheet layout Const uniqueIDColumn = "A" ' where to look for dupes 'assumes all columns from A to 'lastUsedColumn' are involved Const lastUsedColumn = "W" ' last column of data Dim dataOffsets() As Long ' offsets to columns of data Dim sortRange As Range Dim baseCell As Range Dim anyColOffset As Long Dim LC As Long ' loop counter Dim RO As Long ' row offset pointer Dim BCO As Long ' row offset from baseCell Dim lastUsedRow As Long Dim tempString As String ' work space 'set up to hold data offsets ReDim dataOffsets(1 To _ Range(lastUsedColumn & "1").Column) 'fill offset array, assumes all columns 'from A to 'lastUsedColumn' are involved For LC = LBound(dataOffsets) To UBound(dataOffsets) dataOffsets(LC) = Cells(1, LC).Column - _ Range(uniqueIDColumn & 1).Column Next 'begin by sorting the data 'borrow sortRange for the sorting Set sortRange = Range("A1:" & _ Range(uniqueIDColumn & Rows.Count).End(xlUp).Address) 'speed up the process Application.ScreenUpdating = False 'sort the range assuming it has a header row and 'sort by the uniqueIDColumn column. tempString = uniqueIDColumn & "1" sortRange.Sort Key1:=Range(tempString), _ Order1:=xlAscending, Header:=xlYes, _ OrderCustom:=1, MatchCase:=False, _ Orientation:=xlTopToBottom Set sortRange = Nothing ' free up resources 'now we can start collating data Set baseCell = Range(uniqueIDColumn & 1) BCO = 1 ' initialize lastUsedRow = Range(uniqueIDColumn & _ Rows.Count).End(xlUp).Row Do Until baseCell.Row lastUsedRow Set baseCell = Range(uniqueIDColumn & BCO) RO = 1 ' reset/initialize Do While baseCell.Offset(RO, 0).Row = lastUsedRow And _ baseCell.Offset(RO, 0) = baseCell 'matched item, collate For LC = LBound(dataOffsets) To UBound(dataOffsets) If dataOffsets(LC) 0 Then 'not the unique ID column If IsEmpty(baseCell.Offset(0, dataOffsets(LC))) And _ Not IsEmpty(baseCell.Offset(RO, dataOffsets(LC))) Then 'copy data baseCell.Offset(0, dataOffsets(LC)) = _ baseCell.Offset(RO, dataOffsets(LC)) End If End If Next ' end column content loop RO = RO + 1 ' ready to look at next row Loop ' end of matched loop 'move the base cell BCO = baseCell.Row + RO Loop ' end of comparisons & collating 'now on to remove the extra entries 'again assumes that row 1 has labels 'work from bottom up to 2nd row with data in it For RO = lastUsedRow To 2 Step -1 If Range(uniqueIDColumn & RO) = _ Range(uniqueIDColumn & RO - 1) Then Range(uniqueIDColumn & RO).EntireRow.Delete End If Next Application.ScreenUpdating = True End Sub "Tamara" wrote: My boss had me merge together a ton of data. Group Name, Renewal Month, Broker Name, Broker Phone, Address, Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact, EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status, RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns. Due to merging data from 10 different spreadsheets into one big spreadsheet (40k records), I have been trying to find a way to consolidate all information for each unique record onto one line. Some are entirely duplicate and some just need to consolidate the information. (i.e. Group name is the same for three records but one record has the address and another has the SIC code and the third is blank) Is there a simple way to do this? I tried running the EasyFilter Add-In that was mentioned on another post and it took all of my memory and timed out after an hour. I looked at the pearson guy's site but it wouldn't consolidate AND delete duplicates. Any ideas? |
#4
|
|||
|
|||
Deleting duplicates - kind of
I kind of gave up on having someone help me and was doing it the very long,
long, way! Thank you so much!! "JLatham" wrote: Tamara, Perhaps this will do what you need. I've made my best guesses based on your initial post. Start by making a copy of your workbook so that if things do go bad you'll still have the source information. Press [Alt]+[F11] to put the code below (with any changes to the 1st 2 Const values you need to make) into one of the workbooks. Once the VB Editor opens, use Insert | Module to start a new code module. Copy the code below and paste it into the code module. Select the sheet with data on it and use Tools | Macro | Macros and the [Run] button to run the code. As noted in it, it will take some time - somewhere between 9 and 20 minutes probably, depending on your setup. Sub CollateAndRemoveExtraEntries() 'sheet to be processed must be selected 'before calling this Macro 'Tested Time-To-Run: ' Excel 2003 on Win XP ' Single Core AMD 3200+: 9m 37s ' Excel 2007 on Win XP ' Dual Core AMD 4800+: 18m 07s ' Excel 2007 on Vista ' Intel CoreDuo 6600: 14m 10s ' 'There's often help to be found at ' HelpFrom @ jlathamsite. com ' 'change these Const values to 'match your worksheet layout Const uniqueIDColumn = "A" ' where to look for dupes 'assumes all columns from A to 'lastUsedColumn' are involved Const lastUsedColumn = "W" ' last column of data Dim dataOffsets() As Long ' offsets to columns of data Dim sortRange As Range Dim baseCell As Range Dim anyColOffset As Long Dim LC As Long ' loop counter Dim RO As Long ' row offset pointer Dim BCO As Long ' row offset from baseCell Dim lastUsedRow As Long Dim tempString As String ' work space 'set up to hold data offsets ReDim dataOffsets(1 To _ Range(lastUsedColumn & "1").Column) 'fill offset array, assumes all columns 'from A to 'lastUsedColumn' are involved For LC = LBound(dataOffsets) To UBound(dataOffsets) dataOffsets(LC) = Cells(1, LC).Column - _ Range(uniqueIDColumn & 1).Column Next 'begin by sorting the data 'borrow sortRange for the sorting Set sortRange = Range("A1:" & _ Range(uniqueIDColumn & Rows.Count).End(xlUp).Address) 'speed up the process Application.ScreenUpdating = False 'sort the range assuming it has a header row and 'sort by the uniqueIDColumn column. tempString = uniqueIDColumn & "1" sortRange.Sort Key1:=Range(tempString), _ Order1:=xlAscending, Header:=xlYes, _ OrderCustom:=1, MatchCase:=False, _ Orientation:=xlTopToBottom Set sortRange = Nothing ' free up resources 'now we can start collating data Set baseCell = Range(uniqueIDColumn & 1) BCO = 1 ' initialize lastUsedRow = Range(uniqueIDColumn & _ Rows.Count).End(xlUp).Row Do Until baseCell.Row lastUsedRow Set baseCell = Range(uniqueIDColumn & BCO) RO = 1 ' reset/initialize Do While baseCell.Offset(RO, 0).Row = lastUsedRow And _ baseCell.Offset(RO, 0) = baseCell 'matched item, collate For LC = LBound(dataOffsets) To UBound(dataOffsets) If dataOffsets(LC) 0 Then 'not the unique ID column If IsEmpty(baseCell.Offset(0, dataOffsets(LC))) And _ Not IsEmpty(baseCell.Offset(RO, dataOffsets(LC))) Then 'copy data baseCell.Offset(0, dataOffsets(LC)) = _ baseCell.Offset(RO, dataOffsets(LC)) End If End If Next ' end column content loop RO = RO + 1 ' ready to look at next row Loop ' end of matched loop 'move the base cell BCO = baseCell.Row + RO Loop ' end of comparisons & collating 'now on to remove the extra entries 'again assumes that row 1 has labels 'work from bottom up to 2nd row with data in it For RO = lastUsedRow To 2 Step -1 If Range(uniqueIDColumn & RO) = _ Range(uniqueIDColumn & RO - 1) Then Range(uniqueIDColumn & RO).EntireRow.Delete End If Next Application.ScreenUpdating = True End Sub "Tamara" wrote: My boss had me merge together a ton of data. Group Name, Renewal Month, Broker Name, Broker Phone, Address, Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact, EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status, RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns. Due to merging data from 10 different spreadsheets into one big spreadsheet (40k records), I have been trying to find a way to consolidate all information for each unique record onto one line. Some are entirely duplicate and some just need to consolidate the information. (i.e. Group name is the same for three records but one record has the address and another has the SIC code and the third is blank) Is there a simple way to do this? I tried running the EasyFilter Add-In that was mentioned on another post and it took all of my memory and timed out after an hour. I looked at the pearson guy's site but it wouldn't consolidate AND delete duplicates. Any ideas? |
#5
|
|||
|
|||
Deleting duplicates - kind of
That worked perfectly!!! There is still some clean up to do, of course, as
some of the group names had minor punctuation differences. All in all though, that knocked out over 10k records of nothing but crap!! Thank you so much!!! You just saved me HOURS!! Tamara "JLatham" wrote: Tamara, Perhaps this will do what you need. I've made my best guesses based on your initial post. Start by making a copy of your workbook so that if things do go bad you'll still have the source information. Press [Alt]+[F11] to put the code below (with any changes to the 1st 2 Const values you need to make) into one of the workbooks. Once the VB Editor opens, use Insert | Module to start a new code module. Copy the code below and paste it into the code module. Select the sheet with data on it and use Tools | Macro | Macros and the [Run] button to run the code. As noted in it, it will take some time - somewhere between 9 and 20 minutes probably, depending on your setup. Sub CollateAndRemoveExtraEntries() 'sheet to be processed must be selected 'before calling this Macro 'Tested Time-To-Run: ' Excel 2003 on Win XP ' Single Core AMD 3200+: 9m 37s ' Excel 2007 on Win XP ' Dual Core AMD 4800+: 18m 07s ' Excel 2007 on Vista ' Intel CoreDuo 6600: 14m 10s ' 'There's often help to be found at ' HelpFrom @ jlathamsite. com ' 'change these Const values to 'match your worksheet layout Const uniqueIDColumn = "A" ' where to look for dupes 'assumes all columns from A to 'lastUsedColumn' are involved Const lastUsedColumn = "W" ' last column of data Dim dataOffsets() As Long ' offsets to columns of data Dim sortRange As Range Dim baseCell As Range Dim anyColOffset As Long Dim LC As Long ' loop counter Dim RO As Long ' row offset pointer Dim BCO As Long ' row offset from baseCell Dim lastUsedRow As Long Dim tempString As String ' work space 'set up to hold data offsets ReDim dataOffsets(1 To _ Range(lastUsedColumn & "1").Column) 'fill offset array, assumes all columns 'from A to 'lastUsedColumn' are involved For LC = LBound(dataOffsets) To UBound(dataOffsets) dataOffsets(LC) = Cells(1, LC).Column - _ Range(uniqueIDColumn & 1).Column Next 'begin by sorting the data 'borrow sortRange for the sorting Set sortRange = Range("A1:" & _ Range(uniqueIDColumn & Rows.Count).End(xlUp).Address) 'speed up the process Application.ScreenUpdating = False 'sort the range assuming it has a header row and 'sort by the uniqueIDColumn column. tempString = uniqueIDColumn & "1" sortRange.Sort Key1:=Range(tempString), _ Order1:=xlAscending, Header:=xlYes, _ OrderCustom:=1, MatchCase:=False, _ Orientation:=xlTopToBottom Set sortRange = Nothing ' free up resources 'now we can start collating data Set baseCell = Range(uniqueIDColumn & 1) BCO = 1 ' initialize lastUsedRow = Range(uniqueIDColumn & _ Rows.Count).End(xlUp).Row Do Until baseCell.Row lastUsedRow Set baseCell = Range(uniqueIDColumn & BCO) RO = 1 ' reset/initialize Do While baseCell.Offset(RO, 0).Row = lastUsedRow And _ baseCell.Offset(RO, 0) = baseCell 'matched item, collate For LC = LBound(dataOffsets) To UBound(dataOffsets) If dataOffsets(LC) 0 Then 'not the unique ID column If IsEmpty(baseCell.Offset(0, dataOffsets(LC))) And _ Not IsEmpty(baseCell.Offset(RO, dataOffsets(LC))) Then 'copy data baseCell.Offset(0, dataOffsets(LC)) = _ baseCell.Offset(RO, dataOffsets(LC)) End If End If Next ' end column content loop RO = RO + 1 ' ready to look at next row Loop ' end of matched loop 'move the base cell BCO = baseCell.Row + RO Loop ' end of comparisons & collating 'now on to remove the extra entries 'again assumes that row 1 has labels 'work from bottom up to 2nd row with data in it For RO = lastUsedRow To 2 Step -1 If Range(uniqueIDColumn & RO) = _ Range(uniqueIDColumn & RO - 1) Then Range(uniqueIDColumn & RO).EntireRow.Delete End If Next Application.ScreenUpdating = True End Sub "Tamara" wrote: My boss had me merge together a ton of data. Group Name, Renewal Month, Broker Name, Broker Phone, Address, Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact, EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status, RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns. Due to merging data from 10 different spreadsheets into one big spreadsheet (40k records), I have been trying to find a way to consolidate all information for each unique record onto one line. Some are entirely duplicate and some just need to consolidate the information. (i.e. Group name is the same for three records but one record has the address and another has the SIC code and the third is blank) Is there a simple way to do this? I tried running the EasyFilter Add-In that was mentioned on another post and it took all of my memory and timed out after an hour. I looked at the pearson guy's site but it wouldn't consolidate AND delete duplicates. Any ideas? |
#6
|
|||
|
|||
Deleting duplicates - kind of
Always nice to turn hours of work into minutes, isn't it?
Glad I could help. Thanks for letting us know it worked. "Tamara" wrote: That worked perfectly!!! There is still some clean up to do, of course, as some of the group names had minor punctuation differences. All in all though, that knocked out over 10k records of nothing but crap!! Thank you so much!!! You just saved me HOURS!! Tamara "JLatham" wrote: Tamara, Perhaps this will do what you need. I've made my best guesses based on your initial post. Start by making a copy of your workbook so that if things do go bad you'll still have the source information. Press [Alt]+[F11] to put the code below (with any changes to the 1st 2 Const values you need to make) into one of the workbooks. Once the VB Editor opens, use Insert | Module to start a new code module. Copy the code below and paste it into the code module. Select the sheet with data on it and use Tools | Macro | Macros and the [Run] button to run the code. As noted in it, it will take some time - somewhere between 9 and 20 minutes probably, depending on your setup. Sub CollateAndRemoveExtraEntries() 'sheet to be processed must be selected 'before calling this Macro 'Tested Time-To-Run: ' Excel 2003 on Win XP ' Single Core AMD 3200+: 9m 37s ' Excel 2007 on Win XP ' Dual Core AMD 4800+: 18m 07s ' Excel 2007 on Vista ' Intel CoreDuo 6600: 14m 10s ' 'There's often help to be found at ' HelpFrom @ jlathamsite. com ' 'change these Const values to 'match your worksheet layout Const uniqueIDColumn = "A" ' where to look for dupes 'assumes all columns from A to 'lastUsedColumn' are involved Const lastUsedColumn = "W" ' last column of data Dim dataOffsets() As Long ' offsets to columns of data Dim sortRange As Range Dim baseCell As Range Dim anyColOffset As Long Dim LC As Long ' loop counter Dim RO As Long ' row offset pointer Dim BCO As Long ' row offset from baseCell Dim lastUsedRow As Long Dim tempString As String ' work space 'set up to hold data offsets ReDim dataOffsets(1 To _ Range(lastUsedColumn & "1").Column) 'fill offset array, assumes all columns 'from A to 'lastUsedColumn' are involved For LC = LBound(dataOffsets) To UBound(dataOffsets) dataOffsets(LC) = Cells(1, LC).Column - _ Range(uniqueIDColumn & 1).Column Next 'begin by sorting the data 'borrow sortRange for the sorting Set sortRange = Range("A1:" & _ Range(uniqueIDColumn & Rows.Count).End(xlUp).Address) 'speed up the process Application.ScreenUpdating = False 'sort the range assuming it has a header row and 'sort by the uniqueIDColumn column. tempString = uniqueIDColumn & "1" sortRange.Sort Key1:=Range(tempString), _ Order1:=xlAscending, Header:=xlYes, _ OrderCustom:=1, MatchCase:=False, _ Orientation:=xlTopToBottom Set sortRange = Nothing ' free up resources 'now we can start collating data Set baseCell = Range(uniqueIDColumn & 1) BCO = 1 ' initialize lastUsedRow = Range(uniqueIDColumn & _ Rows.Count).End(xlUp).Row Do Until baseCell.Row lastUsedRow Set baseCell = Range(uniqueIDColumn & BCO) RO = 1 ' reset/initialize Do While baseCell.Offset(RO, 0).Row = lastUsedRow And _ baseCell.Offset(RO, 0) = baseCell 'matched item, collate For LC = LBound(dataOffsets) To UBound(dataOffsets) If dataOffsets(LC) 0 Then 'not the unique ID column If IsEmpty(baseCell.Offset(0, dataOffsets(LC))) And _ Not IsEmpty(baseCell.Offset(RO, dataOffsets(LC))) Then 'copy data baseCell.Offset(0, dataOffsets(LC)) = _ baseCell.Offset(RO, dataOffsets(LC)) End If End If Next ' end column content loop RO = RO + 1 ' ready to look at next row Loop ' end of matched loop 'move the base cell BCO = baseCell.Row + RO Loop ' end of comparisons & collating 'now on to remove the extra entries 'again assumes that row 1 has labels 'work from bottom up to 2nd row with data in it For RO = lastUsedRow To 2 Step -1 If Range(uniqueIDColumn & RO) = _ Range(uniqueIDColumn & RO - 1) Then Range(uniqueIDColumn & RO).EntireRow.Delete End If Next Application.ScreenUpdating = True End Sub "Tamara" wrote: My boss had me merge together a ton of data. Group Name, Renewal Month, Broker Name, Broker Phone, Address, Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact, EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status, RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns. Due to merging data from 10 different spreadsheets into one big spreadsheet (40k records), I have been trying to find a way to consolidate all information for each unique record onto one line. Some are entirely duplicate and some just need to consolidate the information. (i.e. Group name is the same for three records but one record has the address and another has the SIC code and the third is blank) Is there a simple way to do this? I tried running the EasyFilter Add-In that was mentioned on another post and it took all of my memory and timed out after an hour. I looked at the pearson guy's site but it wouldn't consolidate AND delete duplicates. Any ideas? |
#7
|
|||
|
|||
Deleting duplicates - kind of
Excellent JLatham!!! Also worked for me. That's some know how.
"JLatham" wrote: Always nice to turn hours of work into minutes, isn't it? Glad I could help. Thanks for letting us know it worked. "Tamara" wrote: That worked perfectly!!! There is still some clean up to do, of course, as some of the group names had minor punctuation differences. All in all though, that knocked out over 10k records of nothing but crap!! Thank you so much!!! You just saved me HOURS!! Tamara "JLatham" wrote: Tamara, Perhaps this will do what you need. I've made my best guesses based on your initial post. Start by making a copy of your workbook so that if things do go bad you'll still have the source information. Press [Alt]+[F11] to put the code below (with any changes to the 1st 2 Const values you need to make) into one of the workbooks. Once the VB Editor opens, use Insert | Module to start a new code module. Copy the code below and paste it into the code module. Select the sheet with data on it and use Tools | Macro | Macros and the [Run] button to run the code. As noted in it, it will take some time - somewhere between 9 and 20 minutes probably, depending on your setup. Sub CollateAndRemoveExtraEntries() 'sheet to be processed must be selected 'before calling this Macro 'Tested Time-To-Run: ' Excel 2003 on Win XP ' Single Core AMD 3200+: 9m 37s ' Excel 2007 on Win XP ' Dual Core AMD 4800+: 18m 07s ' Excel 2007 on Vista ' Intel CoreDuo 6600: 14m 10s ' 'There's often help to be found at ' HelpFrom @ jlathamsite. com ' 'change these Const values to 'match your worksheet layout Const uniqueIDColumn = "A" ' where to look for dupes 'assumes all columns from A to 'lastUsedColumn' are involved Const lastUsedColumn = "W" ' last column of data Dim dataOffsets() As Long ' offsets to columns of data Dim sortRange As Range Dim baseCell As Range Dim anyColOffset As Long Dim LC As Long ' loop counter Dim RO As Long ' row offset pointer Dim BCO As Long ' row offset from baseCell Dim lastUsedRow As Long Dim tempString As String ' work space 'set up to hold data offsets ReDim dataOffsets(1 To _ Range(lastUsedColumn & "1").Column) 'fill offset array, assumes all columns 'from A to 'lastUsedColumn' are involved For LC = LBound(dataOffsets) To UBound(dataOffsets) dataOffsets(LC) = Cells(1, LC).Column - _ Range(uniqueIDColumn & 1).Column Next 'begin by sorting the data 'borrow sortRange for the sorting Set sortRange = Range("A1:" & _ Range(uniqueIDColumn & Rows.Count).End(xlUp).Address) 'speed up the process Application.ScreenUpdating = False 'sort the range assuming it has a header row and 'sort by the uniqueIDColumn column. tempString = uniqueIDColumn & "1" sortRange.Sort Key1:=Range(tempString), _ Order1:=xlAscending, Header:=xlYes, _ OrderCustom:=1, MatchCase:=False, _ Orientation:=xlTopToBottom Set sortRange = Nothing ' free up resources 'now we can start collating data Set baseCell = Range(uniqueIDColumn & 1) BCO = 1 ' initialize lastUsedRow = Range(uniqueIDColumn & _ Rows.Count).End(xlUp).Row Do Until baseCell.Row lastUsedRow Set baseCell = Range(uniqueIDColumn & BCO) RO = 1 ' reset/initialize Do While baseCell.Offset(RO, 0).Row = lastUsedRow And _ baseCell.Offset(RO, 0) = baseCell 'matched item, collate For LC = LBound(dataOffsets) To UBound(dataOffsets) If dataOffsets(LC) 0 Then 'not the unique ID column If IsEmpty(baseCell.Offset(0, dataOffsets(LC))) And _ Not IsEmpty(baseCell.Offset(RO, dataOffsets(LC))) Then 'copy data baseCell.Offset(0, dataOffsets(LC)) = _ baseCell.Offset(RO, dataOffsets(LC)) End If End If Next ' end column content loop RO = RO + 1 ' ready to look at next row Loop ' end of matched loop 'move the base cell BCO = baseCell.Row + RO Loop ' end of comparisons & collating 'now on to remove the extra entries 'again assumes that row 1 has labels 'work from bottom up to 2nd row with data in it For RO = lastUsedRow To 2 Step -1 If Range(uniqueIDColumn & RO) = _ Range(uniqueIDColumn & RO - 1) Then Range(uniqueIDColumn & RO).EntireRow.Delete End If Next Application.ScreenUpdating = True End Sub "Tamara" wrote: My boss had me merge together a ton of data. Group Name, Renewal Month, Broker Name, Broker Phone, Address, Address 2, City, State, County, Zip, Employer Phone, Employer Key Contact, EE's, SIC Code, Current Carrier, EE Rate, EE Rate2, Benefit Design, Status, RFP Date, Comments 1, Comments 2, Comments 3 - Are all the columns. Due to merging data from 10 different spreadsheets into one big spreadsheet (40k records), I have been trying to find a way to consolidate all information for each unique record onto one line. Some are entirely duplicate and some just need to consolidate the information. (i.e. Group name is the same for three records but one record has the address and another has the SIC code and the third is blank) Is there a simple way to do this? I tried running the EasyFilter Add-In that was mentioned on another post and it took all of my memory and timed out after an hour. I looked at the pearson guy's site but it wouldn't consolidate AND delete duplicates. Any ideas? |
#8
|
|||
|
|||
Deleting duplicates - kind of
Excel 2007 Tables
For non-VBA users. Consolidate and remove duplicates. Just formulas. http://c0444202.cdn.cloudfiles.racks.../12_06_09.xlsx |
Thread Tools | |
Display Modes | |
|
|