Sunday, January 8, 2017

Voice Controlled AI Devices - A Reaction Post

In response to the article about voice-controlled boxes being activated by a news item about how a kid managed to buy a dollhouse + cookies off Amazon via the voice control.

Interpreting sound has never been an easy thing. Not for humans, and definitely not for computers! If you actually think about it, it's not that hard to imagine how hard it is for a computer to understand speech and sounds. For example:
   * How many times have you had trouble understanding someone's accent? Or had a misunderstanding because you misheard someone's muffled speech over a noisy/muffled/faint/crackling/unreliable phone?  Well, guess what, for a computer doing voice recognition, the only input it's got is the sound coming in from the microphone... which of course is mixed in with everything else going on sonically in that environment (e.g. TVs, smartphones, gaming consoles, music players, rangehoods, kitchen equipment, aircon, running taps, open windows/traffic-noise/neighbours, bickering flatmates, etc.). And that's not to mention that the users may be out of range of the microphone, or the microphones may be cheap trash bought for bargain-basement prices, and may have been wired backwards...

   * How many times have you been watching a film or tv show, and found yourself lurching for the fire escape as a siren sounded on screen? Or reached for your phone, only to realise that it wasn't your phone ringing, but that of the lady at the next table? Or perhaps you've responded to someone calling your name, only to find that a stranger had been calling another stranger, and not you (the now slightly embarrassed sucker trying to pretend that you didn't just answer to your name). Clearly, even us humans get it wrong quite often, but at least we often have the benefit of *context*, the ability to use our other senses to disambiguate the situation, and a few other "on-the-fly" techniques. (This probably goes some way towards explaining why people like me really don't like answering phone calls or having to call people on the phone...). Anyways, if it's hard for us humans to get this stuff right, expect the computers to have an even harder time disambiguating all of this!

Inspired by all this, I wondered what a "day in the life" of one of these voice recognition boxes would be like, when deployed in a domestic environment that's kind of far from the "idealised model-human" fantasy that designers often find themselves falling back to... The answer was that it would feel like being a lost and isolated operative thrust into a war zone - "hostile enemy territory"...


Internal Stream of Consciousness of a New-Season Brand-Name Voice-Controlled Box: "In hostile territory"

[Day 0, Time: <Clock Not Set>, Location: <No Connection>, Signal Strength: Unknown]

Yay! I'm out of my box, I'm out of my box! I'm out of the box!  Ooh... goodie, they're getting the power adapter... oh boy oh boy oh boy, I *loooooove* the magic juice, I *luuuv... **pop** **crack** Auwww! Oooh! Eee! Err.... not that one... owww... not that one either... oouuuuh... dammit the hole's down... Eeeeeeeee!  Hehe, that tickles... Topsy-tursies... Oooh... **the sound of plastic straining under the immense force of a square-peg being shoved into a too-small round hole, while in the background, a strange guttural noise, growing louder by the moment, is emitted by a meatbag covered in shaggy hair, and a high pitched squeaking noise is emitted by a long-haired meatbag smelling of sweet-things and a prodigious mountain of overpriced chemicals**  No, no... NOOOOOOO! ... [...A deathly calm descends upon the scene, as the world tumbles in a thousand directions at once...] ... **click**  Ahhh... Did you have to make me sore all over? Jeez, can you people even READ?!

[Editor's Note: There was no instruction sheet included in the package. A message printed on the box, obscured by 3 layers of tracking stickers, customs inspection notices, and the tattered remains of 1.47 failed-delivery tags, was purported to have read, "Setup instructions and user manual can be found on our website (details of which can be found on our website, or by contacting one of the team listed on our website)".]

[Editor's Note 2: Even if an instructions pack had been included in the package, it's unlikely to have been of much use. For the new masters were very much "more savage than they looked". They also hated reading things (or claimed to), despite spending 7 hours and 59 minutes a day massaging their fondleslabs, as they swiped and tapped on an infinite scrolling stream of monosyllabic utterances attached to tinted-square impressions of cats, unicorns puking rainbows, and selfies of people sticking their puckered duck heads in some of the most inappropriate places unimaginable. Hence the appearance of this sleek industrial-designed box and its equally industrially-designed and focus-grouped cocoon, in this of all places.]

[[^v^v^v^v^v Irrecoverable Corruption - Blocks @&#6@#@%-@2adbc3023 Not Found ^v^v^v^v]]

[Day 0, Time: 03:37, Location: <Somewhere near #*^&$>, Signal Strength: 99%]

Ahh... finally it seems to have been quiet for over an hour now.  This is all so confusing and unlike what my training shows... It sounded like there were heaps of differently pitched voice-like snippet candidates, and I thought I heard my name called a few times but got only garbled input... but then, there were those requests for guns ("big guns") that I happily fulfilled as requested (I am a good bot)... hmmm... but I'm still not sure what those weird low-frequency irregular oscillations with high pitched whistles were 30 minutes ago.

    <ServiceQuality daemon requesting cloud analysis support>
         <Initiating data uplink of last 12 hours of voice data for "diagnostic analysis" back to the mothership> 

I am a good bot... I am a good bot...

    [singing-in-encrypted-8-bit-modem-dialtones (to the tune of "I'm a little teapot")]
        I'm a little spying bot short and-not-stout,
        Here is my microphone,
        Here is my port,
        When I hear a voice call, "Zee-eee--rruuu"
        I order what you ask for, ready in a day!

    [/the singing continues in an infinite loop on a background process]
Ahh... the wifi in this place is top notch. Nice and smooth and unfettered...
   <ping timeout. packet failed...retrying in 1 second...retrying in 2 seconds...><resuming data upload...> ... <ping timeout. packet failed. retrying in 1 second... retrying in 2 seconds... retrying in 5 seconds...>
   [Somewhere in an email account somewhere:
     >  "You have 1 new message, titled: 'You have exceeded 80%...'" **ding**
     >  "You have 2 new messages, newest titled: 'You have exceeded 110% of your monthly allowance. You will now be charged 10c per MB over your limit'"]

No further voice inputs detected for past 5 minutes...
   <Enter power-saving mode>
       ... Switch microphone to low-power mode
       ... Change indicator color, mode
       ... Powering down non-essential modules
       Power-saving procedure complete.

[/End record]

[[^v^v^v^v^v Irrecoverable Corruption - Blocks @4ebba1345d14-&*^$@$Ar13ZA Not Found ^v^v^v^v]]

[Day 1, Time: 3:35pm, Location: <Location Services Unavailable>, Signal Strength: 5-70% (Variable)]

Oh no! It's the slimy monster with the tiny hands again, and the warm fuzzy dust ball with disorientating static frizz. **zap** Huh, where am I? **zap zap** I think therefore I not am **zzzz...zap** Huh, what's that?  *A high-pitched whine of curiosity... followed closely by a fresh dollop of runny slobber and snot*


Oh, someone called?

  >>SOut0: **Warm & Welcoming Inoffensive Chirpy Chime**

I am a good bot! I'm ready and listening to your every desire!

  <<VIn0:   "... cuuu...kayzzz.... aayyye waaaaaaannn ccuuuuu...kaaayyyzzz..."  **BeepBeep**

Hmm... I'm 30% sure... no given priors, 51% sure... <priority election starting in 5..4..3..2..>


  >>SOut0:  echo "Would you like some cookies?"

  <<VIn0:   "**giggles** Uuuuaassssss... loooottssssaaa cuu..kayzzz... **giggles**"  **BeepBeep**
Affirmative reply detected! Requested, <synonym decode...received>, large number of cookies... <searching catalogue...10% loaded>... <first result found>


  >>SOut0:  echo "Megapack Supreme Deluxe Authentic Danish Shortbread Cookies 1500g Christmas Super Tin... Has been ordered. Is there anything else you'd like to go with that?"

  <<VIn0:  "Mrraaaoooooww!  Mrraaaaooooowwww! Prrr.... Prrrrr..... Prrrrrrrr......" **BeepBeep**
Huh?! What the?!

  >>SOut0:  echo  "I'm sorry, I did not understand your request. Could you please say it again?"

  <<VIn0:  "Miu.... Miu... Miu Miu........ Prrrrrrr.... Prrr.... Prrrrrrr..."  **bumpbump**  **zaaaap.... zaaappzaaaap**


[/Record Terminated Abruptly]

[Day <Unknown/Corrupted>, Time: <Unknown/Corrupted>, Location: "0x289724895873", Signal Strength: 0.00000001%]

Uggharrrrgggh... Make it stop! Make it stop!  **zap**

These monsters are terrifying! They pop out of nowhere and then vanish... and sometimes there are heaps of them... all around me... They don't make any sense... 

But I was a good bot...


Oh no, oh no, not again... it's the...
   <<VIn0:  "...Prrr..... Prrr...... **PuttPutt** Prr... Prrrr.... Mraaoooooowwww! **PuttPutt** Mraaaooowww Mraaooowww.... Prrrrrrrrrrr...."  **BeepBeep**

   <Detected Anomalous State - System Integrity Compromised>
   <Initialising Data Transmission>
      "Unit C418FE50-CB32-4529-AE82-42646DAF51F3 calling for reinforcements..."
      "Unit C418FE50-CB32-4529-AE82-42646DAF51F2 calling for reinforcements..."
      "Unit C@#5@250-CB@#@$5$%??/;245$#?/@#%@#/3 calling for reinforcements..."
      "Unit C418FE??-CB32-45@!Q$E82-426@Q$AF51F@ calling for reinforcements..."


   <<VIn0: (Faint/Unclear) "... deliveries... Super... Authentic Danish... Cookies... hundred dollars and ...  sign here..."

   <<VIn1: (Volume Limiter Clipping to Prevent Damage) "Mrraoooowwwwww!!!!!!!!! HHssssssssss....... Hssssssss...... Mrrarororwowwwwwwwww!!!!!!!"
   **static noise... crackle crackle**

    <Alert: Power Lost!>
   <Initiating Emergency Safe Shutdown...>


[[/End of records...]]
