Fake returns?

**WilliamWMeyer** · October 7th, 2008, 04:18 PM posted to microsoft.public.word.formatting.longdocs

Hi folks,

I often am dealing with Word files I didn't create. That is, a human being
other than myself created them (perhaps on Macs), or Save As/Export filters
created them. For example, Saving As .doc/.rtf out of Adobe Acrobat, or OCR
scanning software that saves as .doc/rtf.

Because of this I often run into fake returns, which I can manipulate to
some degree, in Find & Replace with ^p and ^13.

However, if I select a group of these paragraphs and apply a Style to them,
their fakeness is revealed and the block of paragraphs is treated as if it
were one paragraph.

I used to see and solve a problem like this, in which a para mark had a ^10
before or after it, but I haven't seen those ^10s since a couple Word or
operating system versions ago.

Any ideas?
WilliamW

**Klaus Linke** · October 8th, 2008, 07:03 PM posted to microsoft.public.word.formatting.longdocs

Hi William,

Yes, as you say that's a problem that has been around a while.
The issues with the non-working ¶ para marks should go away once you save
the file in a native Word format (doc, rtf, docx...).

Or, if that's not a good option, replace ^13 with ^p, and maybe ^10 with ^p
to be on the safe side. I do that routinely at the beginning of macros that
process text files or other non-word files.
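
A minimal sketch of such a clean-up pass, assuming it runs against the active document. The procedure names `ReplaceAllInDocument` and `NormalizeParagraphMarks` are made up for illustration, and the second pass is the optional "safe side" replacement:

```vba
' Replace every match of findText with replaceText in the active document.
Private Sub ReplaceAllInDocument(ByVal findText As String, ByVal replaceText As String)
    Dim rng As Range
    Set rng = ActiveDocument.Content   ' fresh range for each pass
    With rng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findText
        .Replacement.Text = replaceText
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = False
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub

Sub NormalizeParagraphMarks()
    ' Turn "fake" returns into real paragraph marks.
    ReplaceAllInDocument "^13", "^p"
    ' Optional second pass for stray line feeds; remove if not wanted.
    ReplaceAllInDocument "^10", "^p"
End Sub
```

Calling `NormalizeParagraphMarks` first, before any other processing, mirrors the routine described above.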

Regards,
Klaus

**WilliamWMeyer** · October 9th, 2008, 04:53 PM posted to microsoft.public.word.formatting.longdocs

Hi Klaus,

Thanks for responding. Yes, I remembered after posting that I asked this
before (!), and that the response you gave me then about changing ^13 to ^p
does solve the problem. (Replacing ^13 with ^p works the same way with
*wildcards* checked or unchecked.)

The main thing that throws me is that I use a wonderful macro I got from a
Microsoft-provided template called Macros8 that was supplied with Word
several versions back. That template has a number of useful macros, but the
one I use most is called ANSIValue, which displays the ANSI values of a
swiped group of characters.

Therefore, when I swipe these four characters surrounding a para return:

e.¶
G

I get 101 46 13 71

Regardless of whether it's fake paras or real paras I get that 13 -- and
only that 13.
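
For readers without the Macros8 template, a rough stand-in can be sketched as follows. This is not the original ANSIValue macro; the name `ShowSelectionCodes` is hypothetical, and it reports the code Word exposes for each selected character:

```vba
Sub ShowSelectionCodes()
    ' Show the character codes of the current selection, space separated.
    Dim ch As Range
    Dim result As String
    For Each ch In Selection.Characters
        ' AscW returns a signed Integer; the mask maps it to 0..65535.
        result = result & (AscW(ch.Text) And &HFFFF&) & " "
    Next ch
    MsgBox Trim$(result)
End Sub
```

Because it reads the same in-memory text that the ANSIValue macro sees, it should not be expected to distinguish fake paragraph marks from real ones either.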

However, in earlier days of Word I would see 10 13 often enough (or 13 10, I
don't remember the order), but I haven't seen a true 10 13 this way in
several years. If you look at the Word file in a text editor there's no sign
of a difference, so I figure the difference must be in the header of the
Word file that specifies that there is one type of para-break encoding, when
in fact the file contains mixed para-break encodings.

Along these same lines of getting under the hood of what's happening in the
Word file, I'd love to have a better understanding of how to determine when
files contain Unicode versus when they don't, whether files sometimes
*think* they contain Unicode but in fact they don't and vice versa, etc.

--WilliamW

**Klaus Linke** · October 9th, 2008, 05:55 PM posted to microsoft.public.word.formatting.longdocs

> [...] If you look at the Word file in a text editor there's no sign of a
> difference, so I figure the difference must be in the header of the Word
> file that specifies that there is one type of para-break encoding, when in
> fact the file contains mixed para-break encodings.

Yes, something like that is my guess too. If you could look into the binary
*.doc format (or its equivalent in memory once Word has loaded a doc),
functioning paragraph marks would likely have a pointer associated with them
that points to a data structure with the style and all the paragraph
formatting.
In the problematic cases, that pointer wasn't created. Just speculation,
though.

> Along these same lines of getting under the hood of what's happening in
> the Word file, I'd love to have a better understanding of how to determine
> when files contain Unicode versus when they don't, whether files sometimes
> *think* they contain Unicode but in fact they don't and vice versa, etc.

Interesting questions... One quick way to tell if a file has "Unicode
characters" (precisely, characters that aren't in the old Windows code page
1252) is to try to save as Plain Text (*.txt), choosing the Windows
(Standard) encoding.
If the file contains such characters, the dialog shows a yellow exclamation
mark, and the characters that can't be saved are marked red in the preview
window.

Greetings,
Klaus

**WilliamWMeyer** · October 9th, 2008, 11:20 PM posted to microsoft.public.word.formatting.longdocs

"Klaus Linke" wrote in message
...
> > Along these same lines of getting under the hood of what's happening in
> > the Word file, I'd love to have a better understanding of how to
> > determine when files contain Unicode versus when they don't, whether
> > files sometimes *think* they contain Unicode but in fact they don't and
> > vice versa, etc.
>
> Interesting questions... One quick way to tell if a file has "Unicode
> characters" (precisely, characters that aren't in the old Windows code
> page 1252) is to try to save as Plain Text (*.txt), choosing the Windows
> (Standard) encoding.

I know about this, and do it, but I'd like to be able to control these things
without a human being having to look at the file.

I've been able to use VBA to cycle through a file character by character.
Typically for files that use Unicode chars, the chars used are within a
100-200 char unicode range. Once I've identified what that range is, then I
change the values my macro searches for to the values in that range. But
going char by char *and* going unicode value by unicode value through the
whole 6000-char range of unicode values would take a verrry long time.
Already, going char by char through the file takes pretty long.
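
One way to automate the code page 1252 check described above is to read the document text into a string once and test each character code in memory, which avoids walking Word's Characters collection. The sketch below rests on a few assumptions: the function names are invented for illustration, the list of code points is the standard Windows-1252 mapping for bytes 0x80 to 0x9F, and the five bytes that code page leaves undefined are treated as "not in 1252".

```vba
' True if the code point has a character in Windows code page 1252.
Function IsInCp1252(ByVal code As Long) As Boolean
    Select Case code
        Case 0 To &H7F, &HA0 To &HFF
            IsInCp1252 = True
        ' Characters that code page 1252 places in bytes 0x80..0x9F.
        Case &H20AC&, &H201A&, &H192&, &H201E&, &H2026&, &H2020&, &H2021&, _
             &H2C6&, &H2030&, &H160&, &H2039&, &H152&, &H17D&, _
             &H2018&, &H2019&, &H201C&, &H201D&, &H2022&, &H2013&, &H2014&, _
             &H2DC&, &H2122&, &H161&, &H203A&, &H153&, &H17E&, &H178&
            IsInCp1252 = True
        Case Else
            IsInCp1252 = False
    End Select
End Function

' 1-based position in s of the first character outside code page 1252, or 0 if none.
Function FirstNonCp1252Position(ByVal s As String) As Long
    Dim i As Long
    For i = 1 To Len(s)
        If Not IsInCp1252(AscW(Mid$(s, i, 1)) And &HFFFF&) Then
            FirstNonCp1252Position = i
            Exit Function
        End If
    Next i
End Function

Sub ReportNonCp1252()
    Dim pos As Long
    pos = FirstNonCp1252Position(ActiveDocument.Content.Text)
    If pos = 0 Then
        MsgBox "No characters outside code page 1252 found."
    Else
        MsgBox "First character outside code page 1252 is at string position " & pos
    End If
End Sub
```

The reported position is an offset into the text string, which may not line up exactly with a document range position when the document contains fields or other special content.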

**Klaus Linke** · October 9th, 2008, 11:53 PM posted to microsoft.public.word.formatting.longdocs

> I know about this, and do it, but I'd like to be able to control these things
> without a human being having to look at the file.

Yes, that was the "low tech" approach <g>

> I've been able to use VBA to cycle through a file character by character.
> Typically for files that use Unicode chars, the chars used are within a
> 100-200 char unicode range. Once I've identified what that range is, then
> I change the values my macro searches for to the values in that range. But
> going char by char *and* going unicode value by unicode value through the
> whole 6000-char range of unicode values would take a verrry long time.
> Already, going char by char through the file takes pretty long.

Then maybe I have something a little more high-tech for ya (see code
below)...
If you'd rather put the results in an array and process it, instead of
printing it out at the end of the document, I'm sure you can adapt the code.

Klaus

Sub CodesFast()
    ' Lists every distinct character in the document with its decimal code,
    ' its U+hex code, the character itself, and how often it occurs.
    Dim myString As String, myStringNew As String, myChar As String
    Dim strOutput As String, HexString As String
    Dim myCode As Long, myCharCount As Long
    myString = ActiveDocument.Content.Text
    strOutput = ""
    Do
        ' Take the first remaining character and strip all of its occurrences;
        ' the drop in length is that character's count.
        myChar = Left$(myString, 1)
        myStringNew = Replace(myString, myChar, "", 1, Compare:=vbBinaryCompare)
        myCharCount = Len(myString) - Len(myStringNew)
        ' AscW returns a signed Integer; the mask maps it to 0..65535.
        myCode = AscW(myChar) And &HFFFF&
        strOutput = strOutput & (myCode) & vbTab
        StatusBar = myCode
        ' Pad the hex code to four digits.
        HexString = Hex$(myCode)
        While Len(HexString) < 4
            HexString = "0" & HexString
        Wend
        strOutput = strOutput & "U+" & HexString & vbTab
        ' Show the character itself only if it is not a control character.
        If myCode > 31 Then
            strOutput = strOutput & myChar
        End If
        strOutput = strOutput & vbTab & LTrim(Str$(myCharCount))
        strOutput = strOutput & vbCr
        myString = myStringNew
    Loop Until Len(myString) = 0
    ' Append the list at the end of the document and bookmark it.
    ActiveDocument.Content.Select
    Selection.Collapse Direction:=wdCollapseEnd
    Selection.Range.InsertParagraphBefore
    Selection.TypeText Text:=" "
    Selection.Expand Unit:=wdParagraph
    With ActiveDocument.Bookmarks
        .Add Range:=Selection.Range, Name:="Codes"
        .DefaultSorting = wdSortByName
        .ShowHidden = False
    End With
    Selection.Collapse Direction:=wdCollapseStart
    Selection.TypeText strOutput
    ' Sort the list numerically by code via a temporary table.
    Selection.GoTo What:=wdGoToBookmark, Name:="Codes"
    Selection.ConvertToTable Separator:=wdSeparateByTabs
    Selection.Sort ExcludeHeader:=False, FieldNumber:=1, _
        SortFieldType:=wdSortFieldNumeric, _
        SortOrder:=wdSortOrderAscending
    Selection.Rows.ConvertToText Separator:=wdSeparateByTabs
    ActiveDocument.Bookmarks("Codes").Delete
End Sub
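
For the "put the results in an array and process it" variant mentioned above, one possible adaptation keeps the counts in memory instead of writing them into the document. This sketch assumes Windows, since it relies on the `Scripting.Dictionary` object, and the names `CountCharCodes` and `ListCharCodes` are hypothetical:

```vba
' Returns a dictionary mapping each character code (0..65535) to its count in s.
Function CountCharCodes(ByVal s As String) As Object
    Dim counts As Object
    Dim i As Long, code As Long
    Set counts = CreateObject("Scripting.Dictionary")
    For i = 1 To Len(s)
        code = AscW(Mid$(s, i, 1)) And &HFFFF&
        ' Reading a missing key yields Empty, which counts as 0 here.
        counts(code) = counts(code) + 1
    Next i
    Set CountCharCodes = counts
End Function

Sub ListCharCodes()
    ' Print code, U+hex, and count to the Immediate window (unsorted).
    Dim counts As Object, key As Variant
    Set counts = CountCharCodes(ActiveDocument.Content.Text)
    For Each key In counts.Keys
        Debug.Print key, "U+" & Right$("0000" & Hex$(key), 4), counts(key)
    Next key
End Sub
```

A macro could then test `counts.Exists(code)` for specific values rather than scanning the document once per code.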

**WilliamWMeyer** · October 10th, 2008, 11:01 PM posted to microsoft.public.word.formatting.longdocs

Wow. Thanks, Klaus.

I tried it, and saw the results. The Hex and binary stuff in the code are
beyond my depth right now, but I think I can use this as a jumping off point
for further exploration.

-WilliamW
