When MoNav learned to speak

A couple of months ago I opened a branch named »ui-ng« in MoNav's repository to develop a new user interface. Since then, this branch has become a playground for several hacks, some of which are just spikes to check whether certain ideas work or not.

I declined to work on speech output this spring, as I knew it would become a rather lengthy task. But it was obvious that I wouldn't resist the temptation during the cold and dark winter months. So I recently added the first code to generate spoken turn instructions. It is an ugly hack, just to find out what is possible. The code as found in the »ui-ng« repo works, well, at least a bit. The instructions are far from useful, except at lonesome crossings or at roundabouts without other crossings nearby.

I’m using prerecorded samples for now, as the current open source speech synthesis systems did not deliver the results I desired. OTOH, this means I cannot generate instructions with dynamic content such as distances or street names. Additionally, those samples are currently compiled into the application binary, which will bloat the memory footprint of handset devices as soon as I add further localized samples (currently speech is available in German and English only).

The main issue is that MoNav recalculates the route each time a GPS position update is received – which means every second on the N900. As the GPS position is always a bit inaccurate, MoNav will say »Please turn right« when you pass a gas station, for example, and the GPS fix is a couple of meters off the road.

As a consequence, I’ve just written some code that detects whether the vehicle is still “on track”. The route only gets recalculated if the vehicle has left it by more than x meters. This makes the route much more persistent and will cause far less headache than the previous approach. As this happened just a couple of minutes ago, the code is not in the repo yet.
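
To make the idea concrete, here is a minimal sketch of such an off-route check. This is not the code from the repo; all names are invented, and it assumes positions have already been projected into a local metric coordinate system.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

struct Coordinate {
    double x;  // meters, in a local projected coordinate system
    double y;
};

// Distance from point p to the line segment a-b.
static double distanceToSegment( const Coordinate &p, const Coordinate &a, const Coordinate &b )
{
    const double dx = b.x - a.x;
    const double dy = b.y - a.y;
    const double lengthSquared = dx * dx + dy * dy;
    // Degenerate segment: just use the distance to a.
    if ( lengthSquared == 0 )
        return std::hypot( p.x - a.x, p.y - a.y );
    // Project p onto the segment and clamp the result to the segment's ends.
    double t = ( ( p.x - a.x ) * dx + ( p.y - a.y ) * dy ) / lengthSquared;
    t = std::max( 0.0, std::min( 1.0, t ) );
    return std::hypot( p.x - ( a.x + t * dx ), p.y - ( a.y + t * dy ) );
}

class OffTrackDetector {
public:
    explicit OffTrackDetector( double thresholdMeters ) : m_threshold( thresholdMeters ) {}

    void setRoute( const std::vector< Coordinate > &route ) { m_route = route; }

    // True if the position is farther than the threshold from every segment
    // of the route, i.e. the route should be recalculated.
    bool needsRecalculation( const Coordinate &position ) const
    {
        if ( m_route.size() < 2 )
            return true;
        double best = std::numeric_limits< double >::max();
        for ( size_t i = 0; i + 1 < m_route.size(); ++i )
            best = std::min( best, distanceToSegment( position, m_route[i], m_route[i + 1] ) );
        return best > m_threshold;
    }

private:
    double m_threshold;
    std::vector< Coordinate > m_route;
};
```

A real implementation would of course only check the route segments near the last matched position instead of scanning the whole polyline on every GPS update.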

There’s still a lot of work to do, and I’m curious whether I’ll stay motivated enough to do all the work ahead. On the other hand, it’s just plain fun to explore the secrets of a usable routing application.

So stay tuned. The next week(s) will show whether I’ll »head straightforward« :) .

2 thoughts on “When MoNav learned to speak”

  1. Mark

    Hi,

    I’ve found that you can get free and/or open voices of reasonable quality for the open source speech synthesis offerings, but you need to look around (I followed a guide at http://ubuntuforums.org/showthread.php?t=677277 with some success). Also, some platforms offer this built in (e.g. Android).

    The ability to speak street names and other dynamic instructions is worth the trouble of putting up with poor voices, or obtaining better ones, in my opinion. I’ve always enjoyed this feature in Navit (which relies on running a command to provide speech synthesis rather than using an API call). In terms of routing, Monav is far superior to Navit, and monav’s address search tends to be better than Navit’s as well. But sometimes the voice guidance is particularly handy.

    Is it possible to write the code so that it can generate text phrases that are also associated with a sample name? That way each phrase could be handled by either a voice synthesis plugin or a sample playback plugin. I’m thinking a phrase like “Turn right onto George Street in 200 meters” would be associated with the sample “turn_right_soon.wav”. The instruction logic would generate a structure with both the text and the sample and hand it to the voice guidance plugin. You could also include a “loose match” sample and a “strict match” sample; this would allow the engine to use a prerecorded sample when there is an exact match, or fall back to speech synthesis where no sample exists. A rough sketch of what I mean is below.
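
    Just to illustrate the idea (all names are made up, nothing monav-specific):

    ```cpp
    #include <QString>

    // One guidance phrase: a text that a synthesis plugin could speak, plus the
    // names of prerecorded samples a playback plugin could fall back to.
    struct GuidancePhrase {
        QString text;          // e.g. "Turn right onto George Street in 200 meters"
        QString strictSample;  // only used if it matches the phrase exactly
        QString looseSample;   // generic fallback, e.g. "turn_right_soon.wav"
    };

    // Implemented by either a speech synthesis plugin or a sample playback plugin.
    // The instruction logic just fills in a GuidancePhrase and hands it over; the
    // plugin decides whether to synthesize the text or play one of the samples.
    class IVoiceGuidancePlugin {
    public:
        virtual ~IVoiceGuidancePlugin() {}
        virtual void speak( const GuidancePhrase &phrase ) = 0;
    };
    ```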

    In any case, it’s good to see that the development of monav is ongoing.

    Regards,
    Mark

  2. ce (post author)

    During the weekend, I had the occasion to see three different commercial navigation devices in action. What I’ve learned is that distance and name instructions are not »nice to have« but »must have« features. OTOH, I’d be interested in a speech system that I can link statically into monav so it works “right out of the box” on all supported platforms. I will dive deeper into it as soon as I have solved the most basic issues. The use of samples as a fallback is a nice suggestion.

    Thanks for the encouraging feedback, Mark :) .
