2008-03-14 21:16:17
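#!/usr/bin/python
# vim: set fileencoding=utf-8 :
import sys
import re

word = sys.argv[1]
file = sys.argv[2]
# 'f' sorts by the following word; any other value (or none) sorts by the preceding word
bf = sys.argv[3] if len(sys.argv) > 3 else 'b'

txt = open(file,'r').read()
# normalize line endings, then turn newlines, tabs and tags into spaces
txt = re.sub('\x0D\x0A|\x0A|\x0D', '\n', txt)
txt = re.sub('(\n)+', ' ', txt)
txt = re.sub('\t', ' ', txt)
txt = re.sub('<.+?>', ' ', txt)

lines = []
strt = 1
while strt > 0:
    pos = txt.find(word,strt)
    if pos == -1:
        break
    left = txt[pos-30:pos]
    right = txt[pos:(pos + len(word) + 30)]
    bwords = left.split(' ')
    bwords.reverse()
    fwords = right.split(' ')
    # the first element of each entry is the sort key
    if bf == 'f':
        lines.append([fwords[1],left,right,str(pos)])
    else:
        lines.append([bwords[1],left,right,str(pos)])
    strt = pos + 1

lines.sort()
for line in lines:
    print(line[1].rjust(25) + line[2] + ":" + line[3])

All this does is build a two-dimensional list whose entries are lists holding, in order, the word before the keyword or the word one after it, the 30 characters before the search term, the search term with the 30 characters after it, and the position of the occurrence; the list is then sorted and printed. Let's look up moonlight in the same file as last time. First, sorted by the word before the keyword: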
$ python kwic.py moonlight drodr10.txt b
 And choose, my son, rather a moonlight night when you sing under tho:18413
all down the long room across moonlight and blackness, came the lady :221914
 all alone with disaster, and moonlight pouring down, and the black g:122007
 when her lover came again by moonlight had cast them all from her fr:44481
apiers crossing each other by moonlight begun to gleam in the street :206461
panish gardens, remembered by moonlight in Spring, for the other end :20838
rily towards him in the clear moonlight with a sword. Morano was frig:210798
For some moments the spell of moonlight on sunlight hovered: the air :194333
er, lurking along the edge of moonlight and darkness, disappearing an:217555
. He saw her pass through the moonlight and grow dimmer, and glide to:218029
a sheer wall that even in the moonlight fell into blackness. Rodrigue:38713
at, if interrupters come, the moonlight is better suited to the play :19175
ch lies always so near to the moonlight, and was not in front of him :215294
elf in a long hall lit by the moonlight only, which was looking in th:216337
grow dimmer, and glide to the moonlight again that streamed through a:218073
hat had frowned at him in the moonlight when he came here before, fro:355299
hat now approached him in the moonlight round a corner of the house w:205397
he sunset grew dimmer and the moonlight stole in softly, as a cat mig:193054
ho might be familiar with the moonlight in that shadowy chamber shoul:42272
layed languorous music in the moonlight and sang soft by her low balc:44267
longer than man's. And if the moonlight streamed on untroubled, and t:204443
merry play of my sword in the moonlight was often a joy to see, it so:19303
o danced, so sparkled. In the moonlight also one makes no unworthy st:19383
oks they had given him in the moonlight, and all looked back at him w:355650
rafina was moving through the moonlight as though its rays were her s:218946
re was glowing redly into the moonlight through the wide door made fo:199239
traits that were clear in the moonlight eyed him with absolute apathy:217201
w, or whether it was only the moonlight, he never knew. Their spirits:356000
 night, all drenched in white moonlight, sheltering huge darkness in :121534
enough, it was not yet wholly moonlight when cantering hooves came do:350318
a chamber partly shining with moonlight. "In there," said the man tha:216854
n white, and all shining with moonlight, came Serafina. Rodriguez in :217931

Next, sorted by the word one after the keyword:
$ python kwic.py moonlight drodr10.txt f
a chamber partly shining with moonlight. "In there," said the man tha:216854
grow dimmer, and glide to the moonlight again that streamed through a:218073
o danced, so sparkled. In the moonlight also one makes no unworthy st:19383
. He saw her pass through the moonlight and grow dimmer, and glide to:218029
all down the long room across moonlight and blackness, came the lady :221914
ch lies always so near to the moonlight, and was not in front of him :215294
er, lurking along the edge of moonlight and darkness, disappearing an:217555
layed languorous music in the moonlight and sang soft by her low balc:44267
oks they had given him in the moonlight, and all looked back at him w:355650
rafina was moving through the moonlight as though its rays were her s:218946
apiers crossing each other by moonlight begun to gleam in the street :206461
n white, and all shining with moonlight, came Serafina. Rodriguez in :217931
traits that were clear in the moonlight eyed him with absolute apathy:217201
a sheer wall that even in the moonlight fell into blackness. Rodrigue:38713
 when her lover came again by moonlight had cast them all from her fr:44481
w, or whether it was only the moonlight, he never knew. Their spirits:356000
ho might be familiar with the moonlight in that shadowy chamber shoul:42272
panish gardens, remembered by moonlight in Spring, for the other end :20838
at, if interrupters come, the moonlight is better suited to the play :19175
 And choose, my son, rather a moonlight night when you sing under tho:18413
For some moments the spell of moonlight on sunlight hovered: the air :194333
elf in a long hall lit by the moonlight only, which was looking in th:216337
 all alone with disaster, and moonlight pouring down, and the black g:122007
hat now approached him in the moonlight round a corner of the house w:205397
 night, all drenched in white moonlight, sheltering huge darkness in :121534
he sunset grew dimmer and the moonlight stole in softly, as a cat mig:193054
longer than man's. And if the moonlight streamed on untroubled, and t:204443
re was glowing redly into the moonlight through the wide door made fo:199239
merry play of my sword in the moonlight was often a joy to see, it so:19303
enough, it was not yet wholly moonlight when cantering hooves came do:350318
hat had frowned at him in the moonlight when he came here before, fro:355299
rily towards him in the clear moonlight with a sword. Morano was frig:210798

The direction is given as b for "back" (the word before) and f for "forward" (the word after), but b is not really needed: the script only checks for f and otherwise sorts by the word before the search term. Done this way, the script cannot tell where a sentence ends or begins, so words from the previous or the next sentence turn up as sort keys. So I tried the following instead: the sort key is simply the 8 characters that follow the keyword. This version does not recognize words at all.
#!/usr/bin/python
# vim: set fileencoding=utf-8 :
import sys
import re

word = sys.argv[1]
file = sys.argv[2]
# 'f' sorts by what follows the keyword; any other value (or none) sorts by the preceding word
bf = sys.argv[3] if len(sys.argv) > 3 else 'b'
length = len(word)

txt = open(file,'r').read()
# normalize line endings, then turn newlines, tabs and tags into spaces
txt = re.sub('\x0D\x0A|\x0A|\x0D', '\n', txt)
txt = re.sub('(\n)+', ' ', txt)
txt = re.sub('\t', ' ', txt)
txt = re.sub('<.+?>', ' ', txt)

lines = []
strt = 1
while strt > 0:
    pos = txt.find(word,strt)
    if pos == -1:
        break
    left = txt[pos-30:pos]
    right = txt[pos:(pos + len(word) + 30)]
    bwords = left.split(' ')
    bwords.reverse()
    fwords = right.split(' ')
    if bf == 'f':
        # sort key: the 8 characters right after the keyword, not a whole word
        lines.append([right[length:length + 8],left,right,str(pos)])
    else:
        lines.append([bwords[1],left,right,str(pos)])
    strt = pos + 1

lines.sort()
for line in lines:
    print(line[1].rjust(25) + line[2] + ":" + line[3])

This gives the following result.
$ python kwic.py moonlight drodr10.txt f
grow dimmer, and glide to the moonlight again that streamed through a:218073
o danced, so sparkled. In the moonlight also one makes no unworthy st:19383
all down the long room across moonlight and blackness, came the lady :221914
er, lurking along the edge of moonlight and darkness, disappearing an:217555
. He saw her pass through the moonlight and grow dimmer, and glide to:218029
layed languorous music in the moonlight and sang soft by her low balc:44267
rafina was moving through the moonlight as though its rays were her s:218946
apiers crossing each other by moonlight begun to gleam in the street :206461
traits that were clear in the moonlight eyed him with absolute apathy:217201
a sheer wall that even in the moonlight fell into blackness. Rodrigue:38713
 when her lover came again by moonlight had cast them all from her fr:44481
panish gardens, remembered by moonlight in Spring, for the other end :20838
ho might be familiar with the moonlight in that shadowy chamber shoul:42272
at, if interrupters come, the moonlight is better suited to the play :19175
 And choose, my son, rather a moonlight night when you sing under tho:18413
For some moments the spell of moonlight on sunlight hovered: the air :194333
elf in a long hall lit by the moonlight only, which was looking in th:216337
 all alone with disaster, and moonlight pouring down, and the black g:122007
hat now approached him in the moonlight round a corner of the house w:205397
he sunset grew dimmer and the moonlight stole in softly, as a cat mig:193054
longer than man's. And if the moonlight streamed on untroubled, and t:204443
re was glowing redly into the moonlight through the wide door made fo:199239
merry play of my sword in the moonlight was often a joy to see, it so:19303
enough, it was not yet wholly moonlight when cantering hooves came do:350318
hat had frowned at him in the moonlight when he came here before, fro:355299
rily towards him in the clear moonlight with a sword. Morano was frig:210798
oks they had given him in the moonlight, and all looked back at him w:355650
ch lies always so near to the moonlight, and was not in front of him :215294
n white, and all shining with moonlight, came Serafina. Rodriguez in :217931
w, or whether it was only the moonlight, he never knew. Their spirits:356000
 night, all drenched in white moonlight, sheltering huge darkness in :121534
a chamber partly shining with moonlight. "In there," said the man tha:216854

Commas and periods are now gathered toward the bottom. Which ordering is better is, I think, a matter of taste.
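
A third ordering, which neither script above implements, would skip any punctuation and spaces right after the keyword before taking the 8-character key, so that "moonlight, and" sorts together with "moonlight and". The following is only a rough sketch of that idea: the function names (find_hits, following_key, kwic_lines) are made up for illustration, whitespace is collapsed with a single regular expression rather than the separate substitutions used above, and the left context is clamped at the start of the text.

```python
import re


def find_hits(text, word, width=30):
    """Return (left, right, pos) for every occurrence of word in text."""
    # Collapse runs of whitespace to one space (a simplification of the
    # newline/tab handling in the scripts above).
    text = re.sub(r'\s+', ' ', text)
    hits = []
    pos = text.find(word)
    while pos != -1:
        left = text[max(0, pos - width):pos]        # clamp near the start of the text
        right = text[pos:pos + len(word) + width]   # keyword plus what follows it
        hits.append((left, right, pos))
        pos = text.find(word, pos + 1)
    return hits


def following_key(hit, word):
    """Hypothetical sort key: the text after the keyword, punctuation skipped."""
    left, right, pos = hit
    tail = right[len(word):]
    # Drop leading non-alphanumeric characters, lowercase, keep 8 characters.
    return re.sub(r'^[^A-Za-z0-9]+', '', tail).lower()[:8]


def kwic_lines(text, word, width=30):
    """Format the hits as 'left + right:pos', sorted by following_key."""
    hits = sorted(find_hits(text, word, width),
                  key=lambda h: following_key(h, word))
    return ['%s%s:%d' % (left.rjust(width), right, pos)
            for left, right, pos in hits]


if __name__ == '__main__':
    sample = ('In the moonlight, and alone. By moonlight again. '
              'The moonlight also fell.')
    for line in kwic_lines(sample, 'moonlight'):
        print(line)
```

Because the key is lowercased and stripped of punctuation, a hit at the end of a sentence is sorted by the first word of the next sentence, so the sentence-boundary problem mentioned earlier remains; this only changes where commas and periods end up.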