dgehman, posted February 25, 2021

I need to clean up a text file that is an index (like a book index: an alphabetical list of words and their page numbers). I want to strip out the single-letter alphabetical headings (A, B, C, D, etc.) and the page numbers. Each entry can have one or more page numbers and ranges (e.g., ", 21" after "abs" and ", 10, 15, 17, 19-22" after "array" in the example below).

Example:

A
AAA, 2, 6
abs, 21
Accessors, 1, 5
acos, 2, 7
Algorithm, 6
arcsegs, 15
arg, 19
args, 16
arguments, 23
Aribitrary, 6
Array, 3
array, 10, 15, 17, 19-22
arrays, 10, 19-21
asin, 2, 7
assembly, 14
atan, 2, 7-8
B
BBox, 12-13

The alphabetical heading is always a single letter on its own line. Within each index line, the entry and its page numbers are separated by commas.

The ideal result for that example would be:

AAA
abs
Accessors
acos
Algorithm
arcsegs
arg
args
arguments
Aribitrary
Array
array
arrays
asin
assembly
atan
BBox

To remove the single-letter heading: is it possible to search for a single letter [a-z] followed by a CR, then delete both the letter and the CR? Is there a better way?

To delete the page numbers, I'm stuck. I'd welcome problem-solving approaches or any other suggestions.
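Both steps can be expressed as regular-expression search-and-replace operations. Below is a minimal Python sketch, assuming one heading or entry per line and that page references are digits or hyphenated ranges at the end of each line. The function name `clean_index` is hypothetical. The heading pattern matches either case, since the example uses uppercase headings. It also treats any single-letter line as a heading, so a genuine one-letter index term (e.g. "x") would be removed too. Page references are stripped only from the end of a line, so a comma that belongs to the entry itself is kept.

```python
import re

# Assumption: text was read in Python text mode, so CR, LF and CRLF line
# endings have already been normalized to "\n".
# A heading is a line holding a single letter (either case), optionally
# followed by spaces or tabs; the whole line, including its newline, is removed.
HEADING_LINE = re.compile(r"^[A-Za-z][ \t]*(?:\n|\Z)", re.MULTILINE)

# Page references: one or more ", N" or ", N-M" groups at the end of a line.
PAGE_REFS = re.compile(r"(?:,[ \t]*\d+(?:-\d+)?)+[ \t]*$", re.MULTILINE)


def clean_index(text: str) -> str:
    """Hypothetical helper: drop letter headings, then trailing page numbers."""
    text = HEADING_LINE.sub("", text)
    return PAGE_REFS.sub("", text)
```

To run it on a file, open the file in text mode (the default, which normalizes line endings to "\n"), pass its contents to `clean_index`, and write the result back out. The same two patterns may also work in a text editor's regex search-and-replace with an empty replacement, though regex syntax and flags vary between editors.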