Tolle et Lege Diary

2008-09-02 22:42:40

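Now that I have finally figured out a little of how to use it, here is a KWIC tool, as usual. I have made it available at gglkwic.appspot.com. The code looks like the following.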

import cgi
import wsgiref.handlers
import urllib
from google.appengine.ext import webapp
from google.appengine.api import urlfetch
from django.utils import simplejson
import re


class MainPage(webapp.RequestHandler):
  def get(self):
    self.response.out.write("""
      <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
        <head>
          <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
          <link type="text/css" rel="stylesheet" href="/css/gglkwic.css" />
          <title>GoogleKWIC</title>
        </head>
        <body>
          <h2>GoogleKWIC</h2>
          <form action="/kwic" method="post">
            <div><input type="text" name="word" /></div>
            <div>Language:
              <select name="lang">
                <option value="ja" selected="selected">Japanese</option>
                <option value="sl">Slovenian</option>
                <option value="sv">Swedish</option>
                <option value="tr">Turkish</option>
              </select>
              Number of results:
              <select name="resnum">
                <option value="2" selected="selected">16</option>
                <option value="4">32</option>
              </select>
            </div>
            <div><input type="submit" value="Search" /></div>
          </form>
          <p>Enter the word or phrase you want to look up and press the Search button.<br />
          Searches use the <a href='http://code.google.com/apis/ajaxsearch/'>
          Google AJAX Search API</a>.</p>
        </body>
      </html>""")


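def kwic(word,lang,resnum):
  # Fetch `resnum` pages of 8 results each ('rsz': 'large') and collect the
  # title, snippet and URL of every hit.
  urlbase = 'http://ajax.googleapis.com/ajax/services/search/web?'
  res = []
  for i in range(int(resnum)):
    query = {'v':'1.0','q':word.encode('utf-8'),'hl':lang,'rsz':'large',
             'start':str(i*8),'key':'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'}
    # Use a separate name so the request URL is not confused with result URLs.
    request_url = urlbase + urllib.urlencode(query)
    result = urlfetch.fetch(request_url)
    if result.status_code != 200:
      continue
    data = simplejson.loads(result.content).get('responseData')
    if not data:
      # No usable payload for this page (assumed to mean the API rejected the
      # request), so stop paging instead of raising a TypeError.
      break
    for r in data['results']:
      res.append({'title':r['titleNoFormatting'],
                  'content':r['content'],
                  'url':r['url']})
  return res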


class Result(webapp.RequestHandler):
  def post(self):
    word = self.request.get('word')
    lang = self.request.get('lang')
    # ('ja' or 'ko' or ...) evaluates to just 'ja', so test membership instead.
    if lang in ('ja', 'ko', 'zh-TW', 'zh-CN'):
      width = 20
    else:
      width = 30
    resnum = self.request.get('resnum')
    results = kwic(word,lang,resnum)
    table = "<table border='0'>"
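    for r_item in results:
      # Drop the '<b>...</b>' ellipsis markers, then split the snippet around
      # each highlighted keyword: parts[j] precedes kwords[j], parts[j+1] follows.
      content = r_item['content'].replace('<b>...</b>','')
      kwords = re.findall('<b>.+?</b>',content)
      parts = re.split('<b>.+?</b>',content)
      for j, kword in enumerate(kwords):
        # parts[j][-width:] keeps at most the last `width` characters of the
        # left context, even when that context is shorter than `width`.
        trow = "<tr><td align='right'>"+parts[j][-width:]+\
        "</td><td align='center'>"+kword+"</td><td>"+parts[j+1][0:width]+\
        "</td><td><a href='"+r_item['url']+"' target='_blank'>"+\
        r_item['title'][0:10]+"</a></td></tr>"
        table += trow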
    table += "</table>"
    # Escape `lang` because it comes straight from the request, and build the
    # header from adjacent literals so no stray indentation ends up in the page.
    self.response.out.write(
      '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="' +
      cgi.escape(lang, True) + '">\n<head>\n'
      '<meta http-equiv="content-type" content="text/html; charset=UTF-8" />\n'
      '<link type="text/css" rel="stylesheet" href="/css/gglkwic.css" />\n'
      '<title>GoogleKWIC</title>\n</head>\n'
      '<body><h2>GoogleKWIC</h2>\n<p>Keyword = <strong>')
    self.response.out.write(cgi.escape(word))
    self.response.out.write('</strong></p>\n')
    self.response.out.write(table)
    self.response.out.write('</body></html>')


def main():
  application = webapp.WSGIApplication([('/', MainPage),
                                        ('/kwic', Result)],
                                       debug=True)
  wsgiref.handlers.CGIHandler().run(application)


if __name__ == '__main__':
  main()
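
The heart of the listing above is the step that splits each snippet around the highlighted keywords. Below is a minimal standalone sketch of that step, separated from App Engine so it can be tried on its own. The function name kwic_rows and the sample snippet are made up for illustration; unlike the listing, it uses a capturing group so that a single re.split call returns both the contexts and the keywords.

```python
import re

# Matches one highlighted keyword such as "<b>corpus</b>" (non-greedy).
KEYWORD = re.compile(r'<b>(.+?)</b>')


def kwic_rows(snippet, width):
    """Return a (left, keyword, right) tuple for each highlighted keyword."""
    # With one capturing group, split() returns
    # [text0, keyword0, text1, keyword1, ..., textN].
    pieces = KEYWORD.split(snippet)
    texts, keywords = pieces[0::2], pieces[1::2]
    rows = []
    for j, keyword in enumerate(keywords):
        # Guard width == 0, because texts[j][-0:] would be the whole string.
        left = texts[j][-width:] if width > 0 else ''
        right = texts[j + 1][:width]
        rows.append((left, keyword, right))
    return rows


if __name__ == '__main__':
    sample = 'a large <b>corpus</b> of text and a small <b>corpus</b> too'
    for left, keyword, right in kwic_rows(sample, 8):
        print('%8s | %s | %s' % (left, keyword, right))
```

Each tuple corresponds to one table row in the listing: the right-aligned left context, the keyword, and the right context.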

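As I wrote yesterday, a single search returns at most a mere 8 results. I thought I could simply repeat the request several times, but repeating it five or more times produced an error. I tried waiting one second between requests, but the outcome was the same. Have I been found out? With a maximum of 32 results, I do not even feel like tallying frequencies and sorting them. This is no fun at all. Technorati returns far more results, but it is somehow hard to use. In any case, I will look for a more interesting way to use this.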
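
The paging limit described above can be made explicit in code. The following is only a sketch: the cap of four pages is an assumption drawn from the error seen on the fifth request, not a documented limit, and fetch_page is a hypothetical callable standing in for the real HTTP request.

```python
PAGE_SIZE = 8  # results per request with 'rsz': 'large', as described above
MAX_PAGES = 4  # assumption: the fifth request onward produced an error


def page_starts(pages, page_size=PAGE_SIZE, max_pages=MAX_PAGES):
    """Return the 'start' offsets to request, capped at max_pages."""
    pages = max(0, min(int(pages), max_pages))
    return [i * page_size for i in range(pages)]


def collect(fetch_page, pages):
    """Gather results page by page, stopping at the first failed page.

    fetch_page(start) is a hypothetical callable that returns a list of
    results, or None when the request for that offset fails.
    """
    results = []
    for start in page_starts(pages):
        page = fetch_page(start)
        if page is None:
            break
        results.extend(page)
    return results


if __name__ == '__main__':
    # A fake fetcher that fails from offset 16 on, to show the early stop.
    def fake(start):
        return None if start >= 16 else ['hit-%d' % (start + k) for k in range(8)]

    print(page_starts(10))
    print(len(collect(fake, 4)))
```

Capping the offsets up front avoids sending requests that are expected to fail, and stopping at the first failed page keeps whatever was already collected.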
