I’ve just moved the blog over to Amazon EC2 and so far everything seems to be going well. I’d been considering the move for a while, and a new feature (well, I’m not sure how new it is, but I only just noticed it) finally made it practical: a new, smaller instance type. The virtual servers Amazon offer used to come in three sizes, small, medium and large, starting at $0.10 per hour*. Pretty quickly they added some bigger sizes (going all the way up to $2.00 per hour for quadruple extra large) as well as some more specialized types like GPU clusters. But it still meant the minimum price for an always-on server was about $74/month, which is expensive for simple web hosting. Now, however, their new micro instances are available at a pretty cool $0.02/hour (about $15 a month). For the performance you’re likely to get it’s still probably not the most cost-effective solution for plain web hosting, but for complete access to a server with high availability (plus the extra features that hosting on Amazon’s infrastructure provides, like being able to clone a whole server with one click) it’s pretty good. One final note: these numbers are not the final costs you’ll have to pay. You still pay for storage and data transfer, which in my case look like they’ll add about an extra 10%.
* Since then the price of the small instance has come down to $0.085/hour, or about $63/month.
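For anyone who wants to check the arithmetic, here is a minimal Python sketch of how the hourly prices turn into the monthly figures quoted above. It assumes a 31-day month with the server always on; the function name and the way the roughly 10% for storage and transfer is applied are my own illustration, not anything Amazon publishes.

```python
# Assumption: an always-on server billed for every hour of a 31-day month.
HOURS_PER_MONTH = 24 * 31


def monthly_cost(hourly_rate: float, extras_fraction: float = 0.0) -> float:
    """Return the monthly cost in dollars for an always-on instance.

    extras_fraction is a rough allowance for storage and data transfer,
    e.g. 0.10 for "about an extra 10%".
    """
    return hourly_rate * HOURS_PER_MONTH * (1 + extras_fraction)


if __name__ == "__main__":
    prices = [
        ("small (original price)", 0.10),
        ("small (reduced price)", 0.085),
        ("micro", 0.02),
    ]
    for label, rate in prices:
        print(f"{label}: ${monthly_cost(rate):.2f}/month")
    print(f"micro plus ~10% extras: ${monthly_cost(0.02, 0.10):.2f}/month")
```

With a 30-day month the figures come out a little lower, so treat these as ballpark numbers.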
In an effort to get more storage to share between the three computers at home (two Windows and one MythTV) I set up yet another machine running FreeNAS. FreeNAS is a small (about 30MB) operating system based on FreeBSD, designed just to be a NAS (Network Attached Storage). You add hard drives to it and it makes them (optionally) available in several different ways, including:
After a few minor problems setting it up (like a power cable breaking and installing from an old CD-ROM drive that didn’t work) it works great. Copying a large (~40GB) chunk of files to it at once took a while but writing to and reading from it at more sensible levels isn’t noticeably slower than using local files (on a gigabit network).
Following my post about ripping DVDs, here is a method for transcoding the DVDs into something more manageable. I should point out that this is probably for the more technical amongst you - there are certainly easier ways to do it, but this has the advantage of being very automatable. Since MythTV (and Linux in general) seems to like ffmpeg for video encoding/decoding, I figured I’d use that. You can get a binary version for Windows and read the documentation. The actual command line I use to transcode is:
ffmpeg -i $in_file -vcodec xvid -qscale 5 -acodec copy $out_file
That means: use $in_file as input (a VOB file in my case), use the Xvid codec for the video, set the “quality” to 5, copy the audio straight from the original and save as $out_file. The quality in this case is just a simplification of lots of other settings that are available: 1 is the best and 31 is the worst. 5 results in files that are about 500MB per hour, with MPEG artifacts that are visible when I’m sat at my desk but not when I sit on my bed six feet away, which is where I normally watch video from. It may be worth transcoding a short clip with a few different settings to see which you’re happy with.
I made the whole process semi-automatic by writing a CLI PHP script that checks for VOB files in a specific folder and transcodes the ones it finds. That way I can have the transcoding going on in the background while I rip the DVDs (and then leave it running overnight to finish). I could make it available to anyone who wants it, but a batch file doing the same thing would probably be more useful for people…
There is one last caveat. I originally encoded the movies with MP3 audio and then halfway through decided I wanted to keep the 5.1 audio (which the above method does). However, the version of ffmpeg I used at first had a problem such that AVIs with AC3 audio played back with no sound. If you have a similar problem, make sure you have the latest version of ffmpeg you can get.
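The folder-watching idea can be sketched in a few lines. This is not my PHP script, just a rough Python equivalent under some assumptions: the output is an .avi file next to each VOB, a VOB that already has an .avi beside it is skipped, and ffmpeg is on the PATH. The function names are made up for illustration, and the ffmpeg options are copied from the command line above (other ffmpeg builds may name the codec options differently).

```python
import subprocess
from pathlib import Path


def build_command(in_file: Path, out_file: Path, quality: int = 5) -> list[str]:
    """Build the ffmpeg command line from the post for one input file."""
    return [
        "ffmpeg", "-i", str(in_file),
        "-vcodec", "xvid", "-qscale", str(quality),
        "-acodec", "copy", str(out_file),
    ]


def pending_vobs(folder: Path) -> list[Path]:
    """Return VOB files that have no matching .avi next to them yet."""
    return sorted(
        p for p in folder.iterdir()
        if p.suffix.lower() == ".vob" and not p.with_suffix(".avi").exists()
    )


def transcode_folder(folder: Path, dry_run: bool = True) -> list[list[str]]:
    """Transcode every pending VOB; with dry_run=True only return the commands."""
    commands = [build_command(v, v.with_suffix(".avi")) for v in pending_vobs(folder)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stops at the first failed transcode
    return commands
```

Calling transcode_folder(Path("rips")) with the default dry_run=True only lists what would be run, which is a cheap way to check the folder before leaving it going overnight.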
Fundamentally it’s a wiki like any other. But there is a cool layer on top of it that could be revolutionary (although, like many Web 2.0 concepts, it will probably fall short and just be “cool” - we can hope). The interface allows you to create “situational applications” that link different components together with the ease of a wiki. It doesn’t really make much sense just reading about it, so go watch the video about it.
On a related note, you can now get snapshots of PHP 6.
I’ve been sorting out exactly what needs recording for the language app (which I finally have an idea for a name for) and I was trying to decide how much extra instructor speech is needed. Situations aren’t described, for instance (no “Imagine an Englishman sitting next to a French woman”), and you aren’t asked to say things explicitly (“How do you ask someone if they speak English?”). Will this harm the process at all? The best thing to do perhaps would be to avoid trying to be Pimsleur quite so exactly.
The still unnamed language learning app is almost ready for a first public viewing. I’m just trying to get some audio of someone other than myself. Firstly because I don’t really like hearing my own voice (and for this purpose my less than perfect pronunciation is too obvious) and secondly because I need at least two people just for it not to be confusing.
In the meantime I thought I’d share an example of the script file I’m using: EntschuldigenSie.xml. It primarily contains English translations, although one phrase is done in a few more languages. It does highlight one possible issue: I had to change the German ß to ss. Although Windows seems perfectly fine with Unicode file names (internally it uses Unicode for storage, either UCS2 or UTF-16 - not sure which), PHP refuses to open them (file_exists, for instance, just doesn’t work) and Apache 2 seems to have issues as well. For German there are workarounds, but for other languages it will get fiddly. This might not even be a problem on Linux, where it will ultimately reside, and it only affects file names, which only have to give you a rough idea of what’s inside. But still, it’s annoying…
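The German workaround amounts to spelling the awkward letters out before the file is saved. Here is a minimal Python sketch of that idea; the replacement table and the function name are my own assumptions, and for anything that isn’t a Latin letter with an accent it simply drops the character, which is exactly why other languages get fiddly.

```python
import unicodedata

# Assumption: the usual German spellings for letters that have no ASCII form.
REPLACEMENTS = {
    "ß": "ss",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def ascii_filename(name: str) -> str:
    """Return an ASCII-only version of a file name.

    German letters are spelled out first; any remaining accented letters
    lose their accents, and characters with no ASCII base are dropped.
    """
    for src, dst in REPLACEMENTS.items():
        name = name.replace(src, dst)
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for original in ["Straße.xml", "Grüße.xml", "café.xml"]:
        print(original, "->", ascii_filename(original))
```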
The most important bits of my cool language learning web app are done. Here’s a quick overview of how it works.
Everything is split into modules, which are XML script files and accompanying audio files. Currently one type of script is supported, a “conversation”. This contains a short (fewer than 10 sentences) conversation with sub-elements all marked up in XML. Sub-elements are phrases, terms and notes. At the moment phrases and terms are handled almost identically. Notes are little explanations of possible stumbling points (for example, the test script I have alerts the listener to the difference in the ending between “Ich verstehe” and “Sie verstehe_n_” in German). Any element of a conversation that is to be repeated is named (literally - the XML tag is given a name attribute). The system keeps track of the number of times a named phrase/term is played to the user and when it was last played, so the automatic repetition system can work.
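To make that a bit more concrete, here is a small Python sketch of reading such a script and keeping the play history. The tag names (conversation, phrase, term, note), the example names and the PlayRecord structure are all made up for illustration; only the idea of a name attribute plus a play count and last-played time comes from the description above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical script: the real file format may use different tags.
SCRIPT = """
<conversation>
  <phrase name="excuse-me">Entschuldigen Sie</phrase>
  <phrase>Verstehen Sie Englisch?</phrase>
  <term name="i-understand">Ich verstehe</term>
  <note>Watch the ending: "Ich verstehe" but "Sie verstehen".</note>
</conversation>
"""


@dataclass
class PlayRecord:
    times_played: int = 0
    last_played: Optional[float] = None  # timestamp of the most recent play


def named_elements(xml_text: str) -> Dict[str, str]:
    """Return {name: text} for every phrase or term that has a name attribute."""
    root = ET.fromstring(xml_text.strip())
    return {
        el.get("name"): el.text
        for el in root.iter()
        if el.tag in ("phrase", "term") and el.get("name")
    }


def record_play(history: Dict[str, PlayRecord], name: str, now: float) -> None:
    """Note that a named phrase/term was just played to the user."""
    record = history.setdefault(name, PlayRecord())
    record.times_played += 1
    record.last_played = now
```

Only the named elements end up in the history, so the unnamed phrase and the note are played as part of the conversation but never scheduled for repetition.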
A lesson is currently very simple. A module is loaded and the conversation is played straight through. Then the named phrases/terms are played* with translations. Then any phrases/terms scheduled for repetition are played*. The repetitions are actually determined before the conversation is played, however, so that if too many are required then no new conversation is played.
* Played in this case means a specific format. First the native version is played, then a pause, then the translation is played twice.
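The lesson flow is easy to sketch in Python. In this version a lesson is just a list of (action, item) steps; the function names and the max_repetitions threshold are assumptions for the example, and how the due repetitions are chosen is left out entirely (they are simply passed in).

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (action, name of the phrase/term or "all")


def play_with_translation(name: str) -> List[Step]:
    """The 'played' format: native version, a pause, then the translation twice."""
    return [
        ("native", name),
        ("pause", name),
        ("translation", name),
        ("translation", name),
    ]


def plan_lesson(new_named: List[str], due: List[str], max_repetitions: int = 10) -> List[Step]:
    """Build the step list for one lesson.

    The repetitions are decided first: if too many are due, the new
    conversation (and its named phrases/terms) is skipped for this lesson.
    """
    steps: List[Step] = []
    if len(due) <= max_repetitions:
        steps.append(("conversation", "all"))  # played straight through
        for name in new_named:
            steps.extend(play_with_translation(name))
    for name in due:
        steps.extend(play_with_translation(name))
    return steps
```

Returning a plain list of steps rather than playing audio directly keeps the scheduling separate from the playback, which makes it easier to check what a lesson will contain before anything is played.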
Zend, the commercial endeavour of the people who brought you PHP, have produced a framework, cleverly called the “Zend Framework”. It’s basically a lightweight MVC framework for PHP. Lightweight in this case is good. It doesn’t do as much as Rails does for Ruby (although it is significantly younger) - the most notable hole is an object-relational-mapping system. But it does provide URL rewriting for Rails-esque view/controller access. I started writing my clever language thingy in it. The biggest problem I had was getting it to work with IIS, which I couldn’t. Since I had IIS installed, I decided I’d give it a go. Unfortunately you require mod_rewrite, which IIS doesn’t have. So I installed ISAPI_rewrite, an equivalent for IIS. After an hour of trying to get it to work I went and downloaded Apache 2.2, which was my second mistake. You see, it seems PHP doesn’t work with Apache 2.2. Not sure why, but I found a vague mention of it on a forum after trying for another hour to get it to work. So I got Apache 2.0 and everything worked. Of course there are reasons not to use PHP 5 with Apache 2, but meh. There is one little problem with the Zend Framework, I think. It seems to be printing a space somewhere before any other output. It wouldn’t be a problem except I need it to output XML, and a space at the beginning makes Firefox (and probably Internet Explorer) explode.
My main goal is a Pimsleur style system but with the repetition handled by computer - i.e. with just the individual phrases (and words and syllables for earlier lessons) as audio files, the program should generate complete conversations with sensible parts repeated and useful instructor comments in between. That sounds like it requires some sort of script in some sort of markup language. Since it needs to be highly structured I guess that only leaves XML as a sensible possibility. So I marked up a conversation from Pimsleur’s German I.
There was an unexpected result. It’s fairly straightforward to have multiple source languages in one script file. Although there are certain things that would not work best this way, a lot of things in German (for instance) would be taught the same regardless of what language you are learning from. Ultimately source-language-specific scripts would have to be supported though.
To counteract the fact that I’m using Kubrick, the single most common WordPress theme (by virtue of being the default), I’ve started to develop different themes for different categories. The first I’ve implemented is PHP, which now has a different colour scheme and a different header image.