Saturday, April 13, 2013

Vim assistance for linking index entries

I thought the macros I'd written to assist indexing the epubs I'm creating were sufficient. But the latest book I'm converting has 2883 index entries. It took me an hour to index 100 and I'm not prepared to waste another 27 hours for the rest.

So I need more help from Vim. I purchased Steve Losh's Learn Vimscript the Hard Way (because he's been very generous in writing up what he's learned about Vim and VimScript) to learn a bit more about VimScript. I suspect what I need to write is more a case of RTFM than of needing any "hidden" or "secret" tips, but reading the epub has inspired me to proceed with some experimentation.

The macro/keymapping has to do the following:
  • Starting with two panes
    • kwiclist open in 1st
    • MD file open in 2nd
  • Keymap to select line number under cursor in kwiclist and jump to corresponding line in MD file, select keyword and wrap it with span tag then return to kwiclist pane.
    • must make wrap macro only wrap currently selected text
      • i.e. don't mess with any other wraps on line
  • Fix current tag wrap keymap to only wrap current selection

Being laid low with the 'flu, I was forced to spend a lot of time flat on my back in bed. MacBook Air to the rescue. In many ways writing a script for Vim is the ideal programming environment. I was able to add features keystroke by keystroke and test them. It allowed me to debug and clarify functions and operations of VimScript which I either didn't know or couldn't work out from the documentation. The immediately available help function is also immensely useful.

Anyway, I cheated a lot in terms of good practice. The suggested procedure is to define a Vim function then call its various methods via keymappings. Too long; couldn't wait. In the end I set up function keys F5, F6 and F8 as a chord that allows me to skip through the links quite quickly.

Vim is opened with the two panes:

5021| 156|  development social|all its forms
4886|4922|153|Device(s)
6213|6249|191|Device(s)
6284|6319|193|Device(s)
6456| 197|           Device(s)|State which l
4447|4484|140|Dialectic
4593|4629|144|Dialectic
============================================
produce a reflective activity, while the pra
automatically, as a result of established ha
<!-- 117 -->

If, now, we replace the <span id="ix6">abstr
or society of agents, this society must be c
<span id="ix540">differentiation of the Agent
They have their being, as agents, in the con

The key mappings are:

noremap <F5> :<c-u>execute "normal! \"ayw/\\a/\r:nohlsearch\rve\"by:wincmd j\r:let @\"=@a\r:@\"\r:let @/=substitute(@b, '.*','\\L\\0', '')\r-nve"<cr>

vnoremap <F6> :s/\%V.*\%V./\='<span id="ix'.(@y+setreg('y',@y+1)).'">'.submatch(0)."<\/span>"/<cr>

noremap <F8> <c-w>k0+

For future reference, let's decode some of that cryptic VimScript.

Lines in the upper pane of the screen are KWIC (Key Word In Context) references. The lower pane is the Markdown file of the epub I am indexing.

If the KWIC indexer found a keyword in the context given it outputs a line like this:
5021| 156|  development social|all its forms
The 1st field is the line number in the Markdown file open in the lower pane. The 2nd field is the 'page number'. Page numbers allow the KWIC indexer to define a start line and stop line for context but are no longer relevant for an epub, so I enclose them in HTML comment tags. The 3rd field holds the keywords that were searched for and the 4th field is the context in which the first keyword was found.
If the KWIC indexer could not find a keyword in the context it outputs a line like this:
4593|4629|144|Dialectic
The fields are: start-linenumber, end-linenumber, pagenumber, keyword. I haven't built any parsing of English into the KWIC indexer so it will not find 'dialectical' even though 'dialectic' occurs in the page.
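
As a rough sketch of how these two line formats could be picked apart in VimScript: both start with the target line number, and the keyword is the first run of letters on the line, which is the same thing F5 looks for. KwicParse is a hypothetical name used only for illustration; it is not part of the mappings in this post, and it only approximates what a word motion in Vim would select.

```vim
" Hypothetical helper (an assumption, not from the original mappings):
" return [line number, lower-cased keyword] for either KWIC line format.
function! KwicParse(kwic_line) abort
  " The first field is always the (start) line number in the Markdown file.
  let l:lnum = str2nr(matchstr(a:kwic_line, '^\s*\zs\d\+'))
  " The first run of letters is the primary keyword in both formats,
  " because every field before it contains only digits and spaces.
  let l:word = tolower(matchstr(a:kwic_line, '\a\+'))
  return [l:lnum, l:word]
endfunction

echo KwicParse('5021| 156|  development social|all its forms')
" [5021, 'development']
echo KwicParse('4593|4629|144|Dialectic')
" [4593, 'dialectic']
```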

So the procedure is, first of all, to set the 'y' register in Vim to the next ID it should insert:
:let @y=123
Then switch to the upper pane and place the cursor at the beginning of the next line to index. If the context of the keyword(s) looks correct, i.e. the keywords appear in the line of text shown, press F5.

F5 does the following:
:<c-u>execute "normal!
Clear any range or count that Vim has pre-filled on the command line and execute the following as though I were typing the keys
 \"ayw/\\a/\r
Yank the word under the cursor into register 'a'; search forward in the line to the next alpha character (i.e. skip over the numerics, whitespace and vertical bars)
:nohlsearch\r
Turn off the search highlighting (in this case practically everything in the file)
ve\"by:wincmd j\r
Visually highlight the word and yank it into register 'b'; change focus to the lower pane
:let @\"=@a\r
Copy register 'a' into the unnamed register (what I think of as the 'anonymous' register)
:@\"\r
Run the unnamed register as an Ex command. Since it holds only a number, this means 'go to that line number'
:let @/=substitute(@b, '.*','\\L\\0', '')\r
Load 'search' register with lower-cased text in register 'b'
-nve"
Skip back one line, search forward for next occurrence of search text and visually highlight the word.
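
For comparison, here is a minimal sketch of the same steps written as a function, which is the approach I skipped. It is only an approximation of the mapping above: KwicJump is a hypothetical name, it assumes the KWIC list is in the upper pane and the Markdown file in the pane below, and unlike the original it does not load the search register, so the other occurrences are not search highlighted. The last line would replace the F5 mapping.

```vim
" Sketch only: a function-based take on the F5 steps. KwicJump is a
" hypothetical name; the mapping in this post uses one :normal! string.
function! KwicJump() abort
  " Read the KWIC line under the cursor in the upper pane.
  let l:kwic = getline('.')
  let l:lnum = str2nr(matchstr(l:kwic, '^\s*\zs\d\+'))
  let l:word = tolower(matchstr(l:kwic, '\a\+'))
  if l:lnum < 1 || empty(l:word)
    echo 'No line number or keyword on this KWIC line'
    return
  endif

  " Move to the lower pane and start one line above the target,
  " as the '-' in the original mapping does.
  wincmd j
  call cursor(max([l:lnum - 1, 1]), 1)

  " Find the next occurrence without wrapping past the end of the file ('W'),
  " then visually select from there to the end of the word.
  if search('\V' . l:word, 'W') > 0
    normal! ve
  else
    echo 'Keyword not found: ' . l:word
  endif
endfunction

nnoremap <F5> :<C-u>call KwicJump()<CR>
```

Whether the match is case-sensitive depends on the 'ignorecase' and 'smartcase' settings, just as it does for the 'n' in the original mapping.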

At this point all the other occurrences of the primary keyword will be 'search highlighted' in the pane and I can check if the 'visually highlighted' one is correct. If it is, I press F6; if not, I skip to the correct occurrence, visually highlight it (i.e. 've') and then press F6.

F6 does the following:
:s/\%V.*\%V./
Substitute only the highlighted text (\%V confines the match to the visual selection, so other wraps on the line are left alone)
\='<span id="ix'.
with an opening 'span' tag whose id is 'ix' followed by
(@y+setreg('y',@y+1)).
the number in register 'y'; setreg() returns 0 on success, so the sum is still the old number while reg 'y' is incremented for next time
'">'.
Close the opening 'span' tag
submatch(0).
Insert the highlighted text
"<\/span>"/
Insert the closing 'span' tag
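
The counter trick in the middle of that expression can be tried on its own. The following is a small sketch, assuming register 'y' holds a number as set up earlier; NextIndexTag is a hypothetical name used only for illustration and is not part of the F6 mapping.

```vim
" Hypothetical helper (an assumption, not part of the F6 mapping):
" build the opening tag from register 'y' and bump the register.
function! NextIndexTag() abort
  " @y is read first; setreg() then stores @y + 1 and returns 0,
  " so the sum is the old value of the register.
  return '<span id="ix' . (@y + setreg('y', @y + 1)) . '">'
endfunction

let @y = 123
echo NextIndexTag()
" <span id="ix123">
echo @y
" 124
```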

Finally I press F8 which moves the cursor back to the upper pane and skips down one line, ready for the next F5.

So with my MacBook Air set so that function keys don't need the 'Fn' key as well (default is to require the 'Fn' key to be held), I can leave my fingers hovering over F5, F6 and F8 like a piano chord and press them accordingly. Quite often I have to also hit 'j' to skip over a 2nd or 3rd occurrence of a keyword in a page. Also often the 1st occurrence of the keyword is not the correct one and I have to press 'j' to skip down to the correct occurrence. F5 adjusts accordingly and highlights the later occurrence.

Edit: (Too close to the trees to see the forest) With only two panes, F8 does the exact equivalent of the 'j' key so in actual usage the function keys really are like playing a piano: F5, F6, F8, F8, F8, F5, F6, F8 etc. However, even with all this assistance index linking is still very time-consuming because of all the reading and thinking.

Indexing is at least twice as fast now. I did 400 last night in about an hour and that was after discovering a couple of typos in my index which required correction and re-running the KWIC indexer.