Curious behaviour: ASCII File Processing

acantor · September 1, 2020

I have written a script to import five sentences from a text-only file.

ASCII File Begin Process: "C:\Test.txt" (Tab Delimited Text (.txt))
ASCII File End Process

<ASCII FILE BEGIN PROCESS Filename="C:\\Test.txt" Format="Tab" Start_Record="1" Process_All="TRUE" Records="1" Variable="%Text%" Start_Index="1" Parse_Blank_Lines="FALSE" Clear_Array="TRUE"/>
<ASCII FILE END PROCESS/>

I pressed the tab key once between each of the five sentences. My sample shows where each tab is located:

"Hello this is 1."<TAB>"And this is 2."<TAB>"Three = 3!"<TAB>"Four four four"<TAB>"The last of five!"

Then my script outputs the five results:

Variable Set Integer %x% to 1
Repeat Start (Repeat 5 times)
  Text Type (Simulate Keystrokes): %Text[%x%]%
  Variable Modify Integer %x%: Increment
End Repeat
 
<VARIABLE SET INTEGER Option="\x00" Destination="%x%" Value="1"/>
<REPEAT START Start="1" Step="1" Count="5" Save="FALSE"/>
<TEXT TYPE Action="0" Text="%Text[%x%]%"/>
<VARIABLE MODIFY INTEGER Option="\x07" Destination="%x%"/>
<END REPEAT/>

It works perfectly for %Text[2]% through %Text[5]%, but there is a wrinkle with %Text[1]%. Instead of outputting this:

Hello this is 1.

I get this:

ï»¿"Hello this is 1."

In other words, the macro is printing the quote marks that delineate %Text[1]% in the file, prefaced with three symbols:

ï»¿

I created the text file in Notepad, so it shouldn't contain any weird or invisible characters.

Any ideas about what is going on?

My kludge for getting rid of the extra characters is to do this. But what an awful workaround!

Variable Modify String: Delete part of text from %Text[1]% starting at 1 and 4 characters long
Variable Modify String: Replace """ in %Text[1]% with ""

Cory · September 1, 2020

I don't think your file is ASCII; I think it's UTF-8. Those first characters might be the BOM (Byte Order Mark). Look at your file with a hex editor. I use UltraEdit, but it costs money, so try Notepad++.

Don't forget Notepad was upgraded to support Unicode, and by default it saves to UTF-8 now, not ASCII. In Notepad, go to File > Save As, look in the lower right, and tell me which encoding you have selected. I'm guessing it's UTF-8 with BOM.
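If a hex editor isn't handy, the same check can be scripted. This is a minimal Python sketch, not Macro Express code; the helper name `has_utf8_bom` and the sample byte strings are made up for illustration. It only tests whether the data begins with the three UTF-8 BOM bytes.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

# Hypothetical samples standing in for the contents of the text file.
sample_with_bom = codecs.BOM_UTF8 + b'"Hello this is 1."\t"And this is 2."'
sample_plain = b'"Hello this is 1."\t"And this is 2."'

print(has_utf8_bom(sample_with_bom))
print(has_utf8_bom(sample_plain))
```

For a real file, the bytes would come from something like `open(path, "rb").read(3)`.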

Cory · September 1, 2020

[Attached image: 2020-09-01_13-09-12.jpg, hex view of the test file]

Duh. I should have done this first. I created a test file in Notepad and saved it as UTF-8 with BOM. See how it starts with 0xEF, 0xBB, 0xBF?

Click here for an explanation of BOM.
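To see why those three bytes show up as three odd symbols, here is a short Python sketch. It assumes the program reading the file treats each byte as one character in the Windows-1252 code page, which is my guess at what is happening here rather than something confirmed about Macro Express.

```python
# The UTF-8 BOM is the byte sequence EF BB BF.
bom = b"\xef\xbb\xbf"

# Decoding those bytes one-per-character as Windows-1252 (an assumption
# about how the reader interprets the file) yields three visible symbols.
mojibake = bom.decode("cp1252")

# Show the code points rather than the glyphs so the output is plain ASCII.
print([hex(ord(ch)) for ch in mojibake])
```

The three code points are U+00EF, U+00BB and U+00BF, which are the ï, » and ¿ characters from the first post.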

acantor · September 1, 2020

Thank you, Cory! I had no idea that Notepad now supports Unicode.

Saving the file and specifying ASCII encoding fixed the problem and solved the mystery.
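For anyone who can't re-save the file, the BOM can also be stripped when the file is read. The sketch below is Python, not Macro Express, and uses Python's `csv` module as a stand-in for a tab-delimited parser; the function name `parse_tab_fields` and the sample data are hypothetical. It also suggests one plausible reason the quote marks survived in the first field: with the BOM in front, the field no longer starts with a quote, so the parser treats the quotes as ordinary characters. Whether Macro Express works the same way internally is an assumption.

```python
import codecs
import csv
import io

# Hypothetical file contents: UTF-8 BOM followed by quoted, tab-separated fields.
raw = codecs.BOM_UTF8 + b'"Hello this is 1."\t"And this is 2."\t"Three = 3!"'

def parse_tab_fields(data: bytes, encoding: str) -> list:
    """Decode the bytes and split the first line into tab-delimited fields."""
    text = data.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t", quotechar='"')
    return next(reader)

# "utf-8" keeps the BOM as a leading U+FEFF character in the first field.
with_bom = parse_tab_fields(raw, "utf-8")

# "utf-8-sig" drops the BOM if present, so the first field parses like the rest.
without_bom = parse_tab_fields(raw, "utf-8-sig")

print(ascii(with_bom[0]))
print(without_bom)
```

With `utf-8`, the first field keeps both the BOM and its quote marks, while the other fields come out clean, which matches the symptom in the first post.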

Cory · September 1, 2020

You're welcome.

MEP doesn't support Unicode. Years ago I had to process a UTF-16 file, and what I did was delete the first three bytes and then take every other byte. Since the text was all ASCII, and Unicode essentially uses the same values for those characters, the first byte of each two-byte pair could be ignored. I learned a lot about Unicode then.
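A rough Python sketch of that byte-skipping trick is below. The post doesn't say which byte order the file used; dropping three bytes lines up if the file was big-endian UTF-16 (BOM bytes FE FF, then the zero high byte of the first character), so that is the assumption here. For a little-endian file you would drop only the two BOM bytes instead. The function name is made up for illustration.

```python
import codecs

def utf16be_ascii_by_skipping(data: bytes) -> str:
    """Recover ASCII text from BOM-prefixed big-endian UTF-16 by byte skipping.

    Assumes every character is in the ASCII range, so each high byte is 0x00.
    """
    # Drop the 2-byte BOM plus the zero high byte of the first character.
    trimmed = data[3:]
    # Keep every other byte: the low bytes, which are the ASCII values.
    return trimmed[::2].decode("ascii")

# Hypothetical sample: "Hello" as big-endian UTF-16 with a BOM.
sample = codecs.BOM_UTF16_BE + "Hello".encode("utf-16-be")

print(utf16be_ascii_by_skipping(sample))
# Cross-check against Python's own UTF-16 decoder, which reads the BOM.
print(sample.decode("utf-16"))
```

This only holds for pure ASCII text; any character outside that range has a non-zero high byte and would be mangled.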
