A Microsoft Office (Excel, Word) forum. OfficeFrustration

How to extract raw text from columns



 
 
  #1  
October 27th, 2009, 04:26 PM
Dave Miles
Member
First recorded activity by OfficeFrustration: Oct 2009
Posts: 1
How to extract raw text from columns

I have a Word doc in which the author created columns, and I need to get the raw text. If I save it as .txt, the formatting gets messed up.

When I look at the page (or print it) I see something like:


Date: xx/xx/xx Name: Fred
Time: xx:xx Occupation: Tech Support

When I save as text, or select, copy and paste into Notepad, I see something like:

Date: xx/xx/xx Name:
Time: xx:xx Occupation:
Fred
Tech Support

I have a lot more info on the page, which makes it impossible for me to parse it out. Is there a way to just remove the columns and keep the text that appears on one line together on the same line?

Thanks!
  #2  
October 27th, 2009, 05:58 PM posted to microsoft.public.word.pagelayout
macropod
external usenet poster
Posts: 2,402
How to extract raw text from columns

Hi Dave,

Have you tried Table | Convert | Table to Text?

--
Cheers
macropod
[Microsoft MVP - Word]

  #3  
October 28th, 2009, 12:24 AM posted to microsoft.public.word.pagelayout
Dave Miles
external usenet poster
Posts: 2
How to extract raw text from columns

It's not a table, so that option is not available to me.

  #4  
October 28th, 2009, 12:54 AM posted to microsoft.public.word.pagelayout
Doug Robbins - Word MVP
external usenet poster
Posts: 8,239
How to extract raw text from columns

Send me a copy of the document to look at.

--
Hope this helps

Doug Robbins - Word MVP
Please reply only to the newsgroups unless you wish to avail yourself of my
services on a paid, professional basis.

  #5  
October 28th, 2009, 06:54 AM posted to microsoft.public.word.pagelayout
macropod
external usenet poster
Posts: 2,402
How to extract raw text from columns

So what sort of column arrangement are you using? And how do you keep the items aligned?

--
Cheers
macropod
[Microsoft MVP - Word]

  #6  
October 28th, 2009, 07:15 AM posted to microsoft.public.word.pagelayout
Doug Robbins - Word MVP
external usenet poster
Posts: 8,239
How to extract raw text from columns

Hi Paul,

Dave sent me one of the documents and I believe that it may have been
produced via OCR.

I am sending him the following response:

You can clean up the document a lot by using Edit > Replace: first replace
^b with nothing to remove all of the section breaks, then replace ^n with
nothing to remove the column breaks. Next, press Ctrl+A to select everything
and use the Format > Paragraph dialog to set the paragraph indents to 0 and
the Special indent to (none). Finally, use Edit > Replace again to replace
^t with ^p, which puts each tab-separated item in its own paragraph.
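
For anyone who would rather do the same cleanup outside Word, the steps
above can be sketched as plain string replacements. This is only a rough
analogue, not the macro itself: flatten_columns is a hypothetical helper,
and it assumes the text still carries Word's control characters (form feed
for a section break, character 14 for a column break), which a Save As text
export may not preserve. Paragraph indents are formatting rather than
characters, so the sketch stands in for that step by trimming whitespace
from each line, and it also drops empty lines, which the manual steps do not.

```python
# Assumed control characters (how Word represents these marks in its text
# stream); a plain-text export may encode them differently.
SECTION_BREAK = "\x0c"  # section break (^b in Find and Replace)
COLUMN_BREAK = "\x0e"   # column break (^n)
TAB = "\t"              # tab (^t)
PARAGRAPH = "\n"        # stand-in for the paragraph mark (^p)


def flatten_columns(text: str) -> str:
    """Hypothetical helper mirroring the manual Find and Replace steps."""
    # Steps 1 and 2: replace section and column breaks with nothing.
    text = text.replace(SECTION_BREAK, "")
    text = text.replace(COLUMN_BREAK, "")
    # Final step: replace each tab with a paragraph mark.
    text = text.replace(TAB, PARAGRAPH)
    # Plain-text analogue of zeroing the indents: trim each line.
    # Dropping empty lines is an extra tidy-up, not part of the manual steps.
    lines = (line.strip() for line in text.split(PARAGRAPH))
    return PARAGRAPH.join(line for line in lines if line)


if __name__ == "__main__":
    sample = (
        "Date: xx/xx/xx\tName: Fred\n\x0c"
        "  Time: xx:xx\tOccupation: Tech Support\n\x0e"
    )
    print(flatten_columns(sample))
```

Each label and value pair ends up on its own line, which is the state the
manual steps leave the document in before any further parsing.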



A macro could be written to perform all of the above. To further process
the documents (assuming that you have many to do), you could create a list
of the attributes for which you want to extract the values, and then use a
macro that iterates through that list and inserts a tab after each
attribute. If you then use Convert Text to Table, you would have most of
the information in a two-column table, with the attributes in the first
column and the values in the second. There would be a few exceptions, such
as the addresses, and a bit more attention would need to be paid to the
Loan Details section.
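
The attribute-list idea can be sketched in a few lines of Python. This is
an illustration only, not the Word macro: insert_tabs and text_to_table are
hypothetical names, the attribute labels are taken from the example in the
first post, and the sketch assumes the text has already been flattened to
one item per line. It also assumes no label is a substring of another
(for example "Name:" inside a longer label), which a real list would need
to handle.

```python
def insert_tabs(text: str, attributes: list[str]) -> str:
    """Insert a tab after every occurrence of each attribute label."""
    for attribute in attributes:
        text = text.replace(attribute, attribute + "\t")
    return text


def text_to_table(text: str) -> list[tuple[str, str]]:
    """Rough stand-in for Convert Text to Table with two columns.

    Each non-empty line becomes a row, split at its first tab. A line
    without a tab gets an empty second cell.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        label, _, value = line.partition("\t")
        rows.append((label.strip(), value.strip()))
    return rows


if __name__ == "__main__":
    # Labels taken from the example in the original question.
    attributes = ["Date:", "Time:", "Name:", "Occupation:"]
    flattened = "Date: xx/xx/xx\nName: Fred\nTime: xx:xx\nOccupation: Tech Support"
    for label, value in text_to_table(insert_tabs(flattened, attributes)):
        print(f"{label:<14}{value}")
```

Multi-line values such as addresses would not fit this one-line-per-row
scheme, which matches the exceptions mentioned above.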



With a bit of work however, and depending upon how similar the documents are
and what you want as the final result, it should be possible to create some
code that would do a fairly complete job of parsing the data from the
document.


--
Hope this helps

Doug Robbins - Word MVP
Please reply only to the newsgroups unless you wish to avail yourself of my
services on a paid, professional basis.

  #7  
October 29th, 2009, 03:21 AM posted to microsoft.public.word.pagelayout
Dave Miles
external usenet poster
Posts: 2
How to extract raw text from columns

Hey Doug & Paul,

I think the docs may be generated by Access. I understand that the source
data comes in as Excel and the reports are generated from that. Yes, the
simple answer would be to work from the Excel sheets, but they contain more
data than I license, so I have to take what I get... sad but true.
 



