<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Tesseract: Google re-releases HP&#8217;s OCR tool</title>
	<atom:link href="http://blog.nemik.net/2006/09/google-re-releases-hps-ocr-tool/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.nemik.net/2006/09/google-re-releases-hps-ocr-tool/</link>
	<description></description>
	<lastBuildDate>Thu, 27 Oct 2011 01:37:59 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Steve</title>
		<link>http://blog.nemik.net/2006/09/google-re-releases-hps-ocr-tool/comment-page-1/#comment-187</link>
		<dc:creator>Steve</dc:creator>
		<pubDate>Sun, 24 Sep 2006 18:52:31 +0000</pubDate>
		<guid isPermaLink="false">http://blog.nemik.net/2006/09/05/google-re-releases-hps-ocr-tool/#comment-187</guid>
		<description>I guess that comment was too long.

Anyway, the tesseract part was:

convert $i.pgm \
  -bordercolor white \
  -border 10 \
  temp.tif
tesseract temp.tif $i.pgm batch</description>
		<content:encoded><![CDATA[<p>I guess that comment was too long.</p>
<p>Anyway, the tesseract part was:</p>
<p>convert $i.pgm \<br />
  -bordercolor white \<br />
  -border 10 \<br />
  temp.tif<br />
tesseract temp.tif $i.pgm batch</p>
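<p>A minimal sketch of how the two commands above might be wrapped in a loop over the extracted frames. The names ocr_frame, ocr_all and RUN are hypothetical and not part of the original script; RUN=echo is only a dry-run hook that prints the commands instead of running them.</p>

```shell
#!/bin/bash
# Sketch only: pad each PGM frame with a white border, then OCR it.
# RUN is a hypothetical dry-run hook: RUN=echo prints commands instead of running them.
RUN=${RUN:-}

ocr_frame() {
  # $1 is the frame name without its .pgm extension (the $i of the snippet above)
  $RUN convert "$1.pgm" -bordercolor white -border 10 temp.tif
  $RUN tesseract temp.tif "$1.pgm" batch
}

ocr_all() {
  # Loop over every .pgm file in the current directory.
  for f in *.pgm; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    ocr_frame "${f%.pgm}"
  done
}
```

<p>Calling ocr_all in the directory that holds the .pgm files would process them one by one; RUN=echo ocr_all shows what would be executed first.</p>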
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve</title>
		<link>http://blog.nemik.net/2006/09/google-re-releases-hps-ocr-tool/comment-page-1/#comment-186</link>
		<dc:creator>Steve</dc:creator>
		<pubDate>Sun, 24 Sep 2006 18:50:27 +0000</pubDate>
		<guid isPermaLink="false">http://blog.nemik.net/2006/09/05/google-re-releases-hps-ocr-tool/#comment-186</guid>
		<description>I got it to work pretty well for .pgm files for subtitle ripping from DVDs.  Seemed to slightly outperform gocr.  The files I am working with are strictly black and white, and I had to add a white border to the .tif file for it to function properly.

#!/bin/bash
 
INPUT=$1
TITLE=$2
LANG=$3
COLOR=$4
 
SUBTITLE=`mplayer -dvd-device $INPUT dvd://1 -vo null -ao null -frames 0 -v 2&gt;&amp;1 &#124; grep sid &#124; grep $LANG &#124; awk -F&#039; &#039; &#039;{print &quot;0x&quot; 20+$5}&#039;`
 
echo $SUBTITLE
 
tccat -i $INPUT -T $TITLE -L &#124; tcextract -x ps1 -t vob -a $SUBTITLE &gt; subs-$3
 
subtitle2pgm -o $3 -c $4  $LANG.srt</description>
		<content:encoded><![CDATA[<p>I got it to work pretty well for .pgm files for subtitle ripping from DVDs.  Seemed to slightly outperform gocr.  The files I am working with are strictly black and white, and I had to add a white border to the .tif file for it to function properly.</p>
<p>#!/bin/bash</p>
<p>INPUT=$1<br />
TITLE=$2<br />
LANG=$3<br />
COLOR=$4</p>
<p>SUBTITLE=`mplayer -dvd-device $INPUT dvd://1 -vo null -ao null -frames 0 -v 2&gt;&amp;1 | grep sid | grep $LANG | awk -F' ' '{print "0x" 20+$5}'`</p>
<p>echo $SUBTITLE</p>
<p>tccat -i $INPUT -T $TITLE -L | tcextract -x ps1 -t vob -a $SUBTITLE &gt; subs-$3</p>
<p>subtitle2pgm -o $3 -c $4  $LANG.srt</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: nemik</title>
		<link>http://blog.nemik.net/2006/09/google-re-releases-hps-ocr-tool/comment-page-1/#comment-128</link>
		<dc:creator>nemik</dc:creator>
		<pubDate>Fri, 15 Sep 2006 21:42:48 +0000</pubDate>
		<guid isPermaLink="false">http://blog.nemik.net/2006/09/05/google-re-releases-hps-ocr-tool/#comment-128</guid>
		<description>I eventually did get something, but the detection is still pretty bad. I used the &#039;convert&#039; command to adjust the contrast a lot to get something working, but it only recognizes a few characters.

Maybe Google will fix it for nicer OCR for its book publishing thing, but something tells me they won&#039;t release it if they do.</description>
		<content:encoded><![CDATA[<p>I eventually did get something, but the detection is still pretty bad. I used the &#8216;convert&#8217; command to adjust the contrast a lot to get something working, but it only recognizes a few characters.</p>
<p>Maybe Google will fix it for nicer OCR for its book publishing thing, but something tells me they won&#8217;t release it if they do.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: david dahl</title>
		<link>http://blog.nemik.net/2006/09/google-re-releases-hps-ocr-tool/comment-page-1/#comment-126</link>
		<dc:creator>david dahl</dc:creator>
		<pubDate>Fri, 15 Sep 2006 11:58:15 +0000</pubDate>
		<guid isPermaLink="false">http://blog.nemik.net/2006/09/05/google-re-releases-hps-ocr-tool/#comment-126</guid>
		<description>I did the same thing... same results. Did you try making the tif a bitonal image with ImageMagick? I would love to get this tool integrated with Python.</description>
		<content:encoded><![CDATA[<p>I did the same thing&#8230; same results. Did you try making the tif a bitonal image with ImageMagick? I would love to get this tool integrated with Python.</p>
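<p>For the bitonal idea, a rough sketch of what the ImageMagick step could look like. The 50% threshold is an assumed starting value, not something tested against Tesseract here, and to_bitonal and RUN are hypothetical names (RUN=echo just prints the command).</p>

```shell
#!/bin/bash
# Sketch only: reduce an image to pure black and white before OCR.
# RUN is a hypothetical dry-run hook: RUN=echo prints the command instead of running it.
RUN=${RUN:-}

to_bitonal() {
  # $1 = source image, $2 = destination image
  # -threshold 50% maps every pixel to black or white (assumed cutoff, tune per scan);
  # -type bilevel asks ImageMagick to store the result as a two-level image.
  $RUN convert "$1" -threshold 50% -type bilevel "$2"
}
```

<p>Something like to_bitonal scan.tif scan-bw.tif would then produce the file to hand to tesseract.</p>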
]]></content:encoded>
	</item>
</channel>
</rss>

