Jo Jaquinta is a software developer at TsaTsaTzu, a company dedicated to Amazon Alexa and Google Assistant development. At VoiceCon, he talks about his experience in developing audio games. We asked him in advance what works well for voice recognition in gaming – and what doesn’t.

JAXenter: At VoiceCon, you will talk about audio games that are connected to Alexa, Siri, or the Google Assistant. I have a hard time imagining how such a game would work, so could you please elaborate on how audio games function?

Jo Jaquinta: There are many kinds of audio games. The easiest kind of audio game to imagine is similar to an old text adventure game. You are given a description of a location and you respond with some sort of action. You then get an updated description and can then take further action. Examples would include Six Swords and Star Lanes.
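The describe, act, describe-again loop of a text-adventure-style audio game can be sketched in a few lines. This is a minimal illustration, not how Six Swords or Star Lanes are built: the two-room `WORLD`, the `Location` type, and the `take_turn` function are all hypothetical names invented for the example, and speech recognition is assumed to have already turned the player's utterance into text.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    description: str
    # Maps a spoken action to the name of the location it leads to.
    exits: dict = field(default_factory=dict)


# Hypothetical two-room world, invented for this sketch.
WORLD = {
    "hall": Location(
        "You are in a stone hall. A door leads north.",
        {"go north": "armory"},
    ),
    "armory": Location(
        "You are in an armory. The hall lies to the south.",
        {"go south": "hall"},
    ),
}


def take_turn(current, utterance):
    """Apply one spoken action; return (new_location_name, text_to_speak)."""
    action = utterance.strip().lower()
    target = WORLD[current].exits.get(action)
    if target is None:
        # Unknown action: stay put and remind the player what is possible.
        options = ", ".join(sorted(WORLD[current].exits))
        return current, f"You can't do that here. Try: {options}."
    # Known action: move and read out the updated description.
    return target, WORLD[target].description
```

Each call corresponds to one voice exchange: the player speaks, the game updates its state, and the returned text is what the assistant would read back.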

Another kind would be a game where you have a certain situation that you manipulate via voice. Card games are good examples of these. We’ve done 21 Blackjack and Friday Night Poker card games, but it could also be more abstract. For example, our Mind Maze lets you solve mazes. You start with two-dimensional mazes, and when you are good at those, you can upgrade to three-dimensional mazes. And, because it is via voice and you have no visual limitation, you can even do four- or five-dimensional mazes.
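One way to see why a voice maze is not tied to two or three dimensions: if a position is just a tuple of coordinates, moving along a fourth or fifth axis is the same operation as moving along the first. The sketch below is an assumption-laden illustration and not Mind Maze's actual design; the `move` function, the cubic grid of side `size`, and the representation of walls as pairs of adjacent cells are all hypothetical.

```python
def move(position, axis, step, size, walls=frozenset()):
    """Try to move one cell along `axis` in a maze of any dimension.

    position: tuple of coordinates, one per dimension
    axis:     index of the dimension to move along
    step:     +1 or -1
    size:     side length of the (hypothetical) cubic grid
    walls:    set of frozenset({cell_a, cell_b}) pairs that are blocked
    Returns the new position, or the old one if the move is blocked.
    """
    if not 0 <= axis < len(position) or step not in (-1, 1):
        raise ValueError("invalid axis or step")
    target = list(position)
    target[axis] += step
    target = tuple(target)
    if not 0 <= target[axis] < size:
        return position  # would leave the grid
    if frozenset((position, target)) in walls:
        return position  # a wall separates the two cells
    return target
```

Nothing in the function mentions how many dimensions there are; a four-dimensional maze simply uses four-element tuples.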

But the most pervasive audio game is the quiz. It’s easy to implement and there are lots of topics. So many, many, many people have created them.

JAXenter: Which things work best with the voice aspect of games?

Jo Jaquinta: The concepts that work best with voice games have to do with the limitations of the current platforms. They all require an immediate response or they go dead. So games that are high pressure and require quick answers fit with this. Most people have trouble holding more than two or three things in their head without a visual aid. So games that involve a small number of manipulatives work well.

Lastly, most people don’t have a lot of patience for audio games. If you have a few hours to play a game, most people will choose something that is a full sensory experience. So short play, casual games have the best chance of gaining mainstream appeal.

JAXenter: Which things have proven to be too difficult to include in an audio game?

Jo Jaquinta: There is nothing that I cannot make into an audio game! We’ve done massively multiplayer games and real-time shooters. I even wrote up a high-level description of a Fortnite clone audio game. As with many things in voice, the question is not so much what could be done, but what should be done.

And that comes down to design. Most discussion focuses on technical limitations. We’ve invented some very remarkable ways to push the technology as far as it can go at TsaTsaTzu, but that’s nothing without a svelte, easy-to-learn, easy-to-remember interface. What I’ve seen fail hardest is when people try to create a “human-like” interface that you can “just talk to”. Although this is the “holy grail” among observers of the audio space, no one has succeeded at it yet. We just do not have the tools and technology to make this successful. All but the most trivial attempts fall short in embarrassing ways. When users can “say anything”, they have no idea what to say. It’s just bad design. But if you give them a simple, constrained vocabulary, then they have the tools to interact with your application effectively.
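A constrained vocabulary can be as simple as a small table of commands, each with a handful of accepted phrasings, plus a fallback that tells the player exactly what they can say. The following is a minimal sketch under stated assumptions: the `COMMANDS` table, its blackjack-style phrases, and the `interpret` and `respond` functions are hypothetical and not taken from any TsaTsaTzu game, and the utterance is assumed to arrive as already-transcribed text.

```python
# Hypothetical command table: canonical command -> accepted phrasings.
COMMANDS = {
    "hit": {"hit", "hit me", "card", "another card"},
    "stand": {"stand", "stay", "hold"},
    "help": {"help", "what can i say"},
}


def interpret(utterance):
    """Map an utterance to a canonical command, or None if not understood."""
    # Lowercase and collapse whitespace so "Hit  me" matches "hit me".
    phrase = " ".join(utterance.lower().split())
    for command, phrases in COMMANDS.items():
        if phrase in phrases:
            return command
    return None


def respond(utterance):
    """Return the text the game would speak back."""
    command = interpret(utterance)
    if command is None or command == "help":
        # Never leave the player guessing: list the valid commands.
        return "You can say: hit or stand."
    return f"OK, {command}."
```

The design choice is that anything outside the table gets the same short prompt, so the player always hears the full set of valid moves instead of an open-ended "what would you like to do?".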

JAXenter: Can you share with our readers which video game you love the most and why?

Jo Jaquinta: I’m going to have to say “nethack” (which you may have to look up). It’s been in continuous development for 30+ years. There are no graphics. There are no special effects. But it has a richness absent from almost every other game I have seen. You are given a fairly modest number of things you can do, but they can combine in a multitude of ways. And almost every combination that makes sense does something. You can engrave on the ground with a wand. You can dip a sword into a potion. You can probably even eat a scroll! Having this huge space of weird combinations that work makes it like continually discovering Easter eggs. Even after 30 years, I’m still finding new things each time I play. That’s hard to beat.

JAXenter: What was the worst game you ever had to play?

Jo Jaquinta: I’m going to have to go with the classic video game “Defender”. Not because it was a bad game. But because what you had to do is fly around and save people. But, each level got harder, and so your people would die! No matter how good I was, people would die! The people I was supposed to save! It was too much for me. I have a sensitive soul.

JAXenter: What is the key take-away from your session?

Jo Jaquinta: Our presentation is a litany of everything we’ve done that hasn’t worked. We’re not trying to discourage people from doing things. We’re just trying to discourage people from doing things that will fail.

Success in the audio space is elusive and has evaded almost everyone who has tried to be active here. Most people enter just based on hope. But hope won’t carry you through. Do the research. Look hard at what hasn’t worked, and think hard about what makes your idea different. There are lots of portfolio pieces in voice, but very few repeatable successes. The biggest game in voice is how to do voice itself. If you want to get on the leaderboard there, you are going to have to do a better job than what’s come before.

JAXenter: Thank you very much!