Saturday, November 05, 2011

Grabbing the vanity card of TBBT into an image

The producer of the TV show "The Big Bang Theory", Mr. Chuck Lorre, always shows a vanity card at the end of each episode. He also posts the same cards on his own website, along with those for the other shows he has produced.

Recently, for some reason, I wanted to attach the vanity card for a specific episode of the show to an e-mail as an image, taken from the website. I prefer the image to contain only the content of the card rather than the whole page. This, of course, could be done by taking a screenshot and cropping it with something like GIMP or ImageMagick. However, since I'm a lazy guy, and the chance that I will do this more than once is quite high, manually capturing and cropping is certainly not an option for me. Fortunately, I have some ideas on how to do this automatically.


There are lots of possible ways to grab a web page into an image from the command line. My weapon of choice is the still-buggy-but-quite-useful wkhtmltoimage from the wkhtmltopdf project. wkhtmltoimage uses WebKit and Qt to render a given page directly into an image. The great thing about this tool is that it honors the CSS and JavaScript of the page, while also letting you replace the CSS with your own version and run some additional JavaScript before the rendering happens.

At first, I tried to render the page into an image and then pass the image to ImageMagick's convert to cut out only the "vanity card" block of the page. However, this approach proved to be problematic, since it is hard to automatically determine the geometry needed for the "-crop" option of convert. After inspecting the HTML and CSS sources of the page, I decided to experiment with the "visibility" property in the CSS definitions. I downloaded the CSS file, set "visibility" to "hidden" for the topmost selector (the "#container" selector block in this case), turned visibility back on only for the "#content" block, and supplied the customized CSS to wkhtmltoimage. This gave me a rendered image that shows only the "card" block in the center of a white background. The white "border" can then be easily removed using the "-trim" option of convert.
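The CSS experiment can be sketched roughly as follows. This is a variation rather than exactly what I did: instead of editing the site's whole CSS file, it writes a tiny override style sheet and hands it to wkhtmltoimage through its "--user-style-sheet" option. The function names, the file names, and the use of "!important" (so that the user rules win over the page's own rules) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: write_card_css and render_with_css are hypothetical helper names.

# Write a minimal override style sheet to the file given as $1.
# Hiding #container hides everything; #content is switched back on.
write_card_css() {
    cat > "$1" <<'EOF'
#container { visibility: hidden !important; }
#content { visibility: visible !important; }
EOF
}

# Render the page at URL $1 with the override style sheet $2,
# then trim the white border and save the result to $3.
render_with_css() {
    wkhtmltoimage --user-style-sheet "$2" "$1" - \
    | convert - -trim "$3"
}
```

It would be used like this: write_card_css card.css, followed by render_with_css 'http://chucklorre.com/index-bbt.php?p=364' card.css tbbt.jpg.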

Although the downloading-and-modifying-CSS approach was a success, supplying a whole modified CSS file to wkhtmltoimage is not elegant and could have side effects. Therefore, the better approach is to take advantage of wkhtmltoimage's ability to run JavaScript, and alter the "visibility" property of the appropriate elements after the page has finished loading. Here is my final "one-liner" solution to my problem:


$ wkhtmltoimage \
--run-script "document.getElementById('container').style.visibility='hidden';" \
--run-script "document.getElementById('content').style.visibility='visible';" \
'http://chucklorre.com/index-bbt.php?p=364' - \
| convert - -trim tbbt.jpg

The generated JPEG image, "tbbt.jpg", only contains the "card" I want.

The same principle could also be applied to other pages. As usual, I wrote a script to save myself some typing; it takes an optional production number argument to grab the card for a specific episode. However, since it is a very simple script, I won't bother to post the code here...
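For the curious, a helper along those lines might look roughly like the sketch below. It is not the actual script: the function names, the default output file name, and the assumption that the page without a "?p=" parameter shows a sensible default card are all hypothetical.

```shell
#!/bin/sh
# Sketch only: card_url and grab_card are hypothetical helper names.

# Print the card URL for production number $1.
# Assumption: with no number, the bare page is good enough as a default.
card_url() {
    base='http://chucklorre.com/index-bbt.php'
    if [ -n "${1:-}" ]; then
        printf '%s?p=%s\n' "$base" "$1"
    else
        printf '%s\n' "$base"
    fi
}

# Render the card for production number $1 (optional) into file $2
# (optional, defaults to tbbt.jpg), hiding everything but #content.
grab_card() {
    url=$(card_url "${1:-}")
    out=${2:-tbbt.jpg}
    wkhtmltoimage \
        --run-script "document.getElementById('container').style.visibility='hidden';" \
        --run-script "document.getElementById('content').style.visibility='visible';" \
        "$url" - \
    | convert - -trim "$out"
}
```

Calling grab_card 364 would then reproduce the one-liner above, while grab_card 364 card364.jpg would write to a different file.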
