alexlimbert, Posted December 3, 2008

I'm not sure why, but only some CR/LFs are replaced.

// Set Environment variable "CRLF" to cr/lf characters
Variable Set %T2% to ASCII Char of 10
Variable Set %T3% to ASCII Char of 13
Variable Modify String: Append %T2% to %T3%
Variable Modify String: Save %T3% to Environment Variable
// Replace Environment variable "CRLF" in String T1 with space
Replace "%CRLF%" with " " in %T1%

I am trying to do this in an htm file. I'm not sure if that's the problem. Please help. Thanks.

kevin, Posted December 3, 2008

Short Explanation

Some files contain LF or CR characters alone. Some may also have LFCR instead of CRLF. Replacing CRLF will miss those characters. To determine what is going wrong, run your macro to replace CRLFs and then examine the results to see if there are any stray CRs or LFs. I predict you will find them. Some systems use only LFs, not CRLFs, to indicate the end of a line.

Detailed Explanation

CR is an abbreviation for Carriage Return. Think of an old typewriter or an old printer. When this character is processed, the typewriter carriage or the print head moves all the way to the left, but the paper is not moved up or down. With this equipment you can print one line on top of another by ending a line with only a CR.

LF is an abbreviation for Line Feed. When this character is processed, the paper is moved up one line without moving the print head to the left or to the right. You can continue printing on the line below while keeping the same left-to-right position by using an LF character. It is perfectly valid to have CR LF LF LF to move down 3 lines (note the 3 LFs).

With images on a computer screen and laser printers there is no physical 'print head' or 'carriage', but the functionality is emulated. On some computer systems it was decided that it didn't make sense to print two lines on top of one another. So when a CR (without an LF) is received, it moves the 'carriage' all the way to the left AND down a line. Similarly, on some systems, an LF (without a CR) 'moves' to the left AND down a line.

Some software may create files that include extra CRs or LFs like this: CRLFLF, CRCRLF or CRLFCR. Software may also reverse these characters, using LFCR. Sometimes this is due to sloppy programming, while other times it may be intentional.

When viewing these files using the program they are written for, such as a text editor, word processor or browser, you generally cannot see any difference when the extra characters are there. When I need to know what is inside a file, I use a hex editor to view its content. A hex editor displays the hexadecimal value of each character in the file instead of interpreting its meaning. A Google search for 'hex editor' turns up a number of good references.

Additional Information

Other interesting characters that may be in a file are:
HT - Horizontal Tab
BS - Backspace

Google 'ASCII chart' for more information about these control characters.
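
To make the point about stray CRs and LFs concrete, here is a minimal sketch in Python rather than Macro Express (chosen only because it runs standalone). The function names `crlf_only` and `replace_line_breaks` and the sample string are hypothetical, made up for this illustration. It compares a CRLF-only replace with one that catches every CR/LF variant.

```python
import re

# Matches one or more CR (0x0D) / LF (0x0A) characters in any order,
# so CRLF, LFCR, lone CR, lone LF, CRLFLF and so on are all caught.
_LINE_BREAKS = re.compile(r"[\r\n]+")


def crlf_only(text: str, replacement: str = " ") -> str:
    """Replace only the exact CR+LF pair; stray CRs and LFs survive."""
    return text.replace("\r\n", replacement)


def replace_line_breaks(text: str, replacement: str = " ") -> str:
    """Replace each run of CR/LF characters with one replacement string."""
    return _LINE_BREAKS.sub(replacement, text)


# Hypothetical sample mixing CRLF, LF, CR, LFCR and CRLFLF.
sample = "one\r\ntwo\nthree\rfour\n\rfive\r\n\nsix"

print(repr(crlf_only(sample)))            # stray \n and \r remain
print(repr(replace_line_breaks(sample)))  # all line breaks are gone
```

Note that this version collapses a run such as CRLFLF into a single space. If every blank line should become its own space, the pattern would need to treat each line break separately.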

alexlimbert (Author), Posted December 6, 2008

Very well written. Thank you. I downloaded a hex editor and it looks like I am getting a lot of "OD OA" combinations where the CR/LFs are. Now that I have found this out, do you know of a way to get rid of them (i.e. the OD OAs)? Thank you.

alexlimbert (Author), Posted December 6, 2008

Correction: 0D 0A, not OD OA.

kevin, Posted December 6, 2008

alexlimbert wrote:
    I downloaded a hex editor and it looks like I am getting a lot of "OD OA" combinations where the CR/LFs are.

0D is the hexadecimal value for CR. 0D hex is 13 decimal.
0A is the hexadecimal value for LF. 0A hex is 10 decimal.

So you are seeing the original CRLFs in the file, which means you need to double-check your macro.
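
The hex-to-decimal correspondence above can be checked with a short Python sketch. The helper name `hex_dump` is hypothetical; it only mimics the byte column a hex editor shows.

```python
def hex_dump(data: bytes) -> str:
    """Render bytes as space-separated two-digit hex values."""
    return " ".join(f"{byte:02X}" for byte in data)


# CR is character 13 (hex 0D); LF is character 10 (hex 0A).
print(int("0D", 16), int("0A", 16))
print(ord("\r"), ord("\n"))

# A CRLF between two letters, as a hex editor would display it.
print(hex_dump("a\r\nb".encode("ascii")))
```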

alexlimbert (Author), Posted December 6, 2008

The macro appears okay, but I am seeing something strange. When I save the CRLF-stripped text to a text file, there are no more CRLFs or 0D0As. However, when I save the text to a column in a comma-delimited CSV file, the column retains most (perhaps all) of the CRLFs/0D0As that existed.

I double-checked this by saving the stripped text to a text file. I opened the text file in the hex editor and there were no 0D0As anywhere. I copied this text to the clipboard and pasted it into an MS Excel cell, and the CRLFs returned. I copied this back to a text file, reopened that text file in the hex editor, and the 0D0As were back.

I still wonder if HTML has something to do with this, or perhaps Excel.

stan, Posted December 8, 2008

Excel does add extra characters when copying a cell to the clipboard. Most likely this is what you are seeing in your test.

Copy a cell to the clipboard and save it to a text string variable. Then use the Variable Modify String - Trim command. Save the variable to a text file with Variable Modify String - Save to Text File. Open the file in the hex editor and the 0D0As should be gone. Do the same thing without the Trim and you will see the 0D0As.
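
The effect of the Trim step can be sketched in Python. This assumes that Macro Express's Trim behaves like Python's `str.strip()`, removing whitespace (including CR and LF) from both ends only, which is an assumption and not something confirmed here. The string `clipboard_text` is a hypothetical stand-in for what Excel places on the clipboard; no real clipboard access is performed.

```python
def trim(text: str) -> str:
    """Strip leading and trailing whitespace, including CR and LF."""
    return text.strip()


# Hypothetical stand-in for a single Excel cell copied to the clipboard,
# with the trailing CRLF (0D 0A) that Excel appends.
clipboard_text = "cell value\r\n"

trimmed = trim(clipboard_text)
print(repr(clipboard_text))  # trailing CRLF present
print(repr(trimmed))         # trailing CRLF removed

# Trimming only affects the ends: a CRLF inside the text is kept.
print(repr(trim("first\r\nsecond\r\n")))
```

Under that assumption, Trim handles only the CRLF that Excel appends at the end. Line breaks embedded inside the cell text would still need a separate replace step.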

alexlimbert (Author), Posted December 10, 2008

stan wrote:
    Excel does add extra characters when copying a cell to the clipboard. Most likely this is what you are seeing in your test. Copy a cell to the clipboard and save to a text string variable. Then use the Variable Modify String - Trim command. Save the variable to a text file - Variable Modify String - Save to Text File. Open in the Hex Editor and the 0D0As should be gone. Do the same thing without the Trim and you will see the 0D0As.

Thank you for the kind recommendation. In this case, unfortunately, I am setting up a database with 30,000 - 40,000 records. Since the database is Excel, the program receiving the field we are talking about has a bunch of carriage returns. I wonder what would happen if I stripped the HTML tags out. I'll give that a try.