AI Weekly: Amazon’s ‘custom’ AI features show the potential of unsupervised learning

As it has in years past, Amazon on Tuesday unveiled a host of new devices, including a wall-mounted Echo display, a smart thermostat and kid-friendly, Alexa-powered video chat hardware. Among the most exciting is Astro, a two-wheeled home robot with a camera that can extend like a periscope on command. But every bit as exciting are two new software features – Custom Sound Event Detection and Ring Custom Event Alerts – that signal a paradigm shift in machine learning.

Custom Sound Event Detection lets users “teach” Alexa-powered devices to recognize certain sounds, such as a refrigerator door opening and closing. Once Alexa has learned these sounds, it can trigger notifications during specified hours, such as a reminder to close the door so that food does not go bad overnight. In a similar vein, Custom Event Alerts lets Ring security camera owners set up unique, personalized alert-sending detectors for objects in and around their homes (e.g., cars parked in the driveway). Amazon says the feature uses computer vision and claims that it can detect objects of any shape and size.

Both are outgrowths of current trends in machine learning: pretraining, fine-tuning and semi-supervised learning. Unlike Alexa Guard and Ring’s preloaded object detectors, Custom Sound Event Detection and Custom Event Alerts do not require hours of data to learn to spot unfamiliar sounds and objects. Most likely, they take large models “pretrained” on a wide variety of data (e.g., sounds or objects) and fine-tune them to the specific sounds or objects that a user wants to detect. Fine-tuning is a technique that has had tremendous success in the natural language domain, where it has been used to develop models that can detect emotions in social media posts, identify hate speech and misinformation and more.

“With Custom Sound Event Detection, the customer provides six to ten examples of a new sound – such as the doorbell ringing – when prompted by Alexa. Alexa uses these samples to build a detector for the new sound,” explain Amazon’s Prem Natarajan and Manoj Sindhwani in a blog post. “Similarly, with Ring Custom Event Alerts, the customer uses a cursor or, on a touch screen, a finger to outline an area of interest – such as the door to a shed – within a [camera’s field of view]. Then, by sorting through historical footage from that camera, the customer identifies five examples of a particular condition in that region – such as the shed door open – and five examples of an alternative condition – say, the shed door closed.”
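Amazon has not published implementation details, so the following is only a minimal sketch of how a detector might be built from five examples per condition. It assumes a frozen, pretrained encoder whose outputs are averaged into one "prototype" per label, with new inputs assigned to the nearest prototype. The `embed` function, the synthetic signals and every name here are hypothetical stand-ins for illustration, not Amazon's method.

```python
import math
import random


def embed(samples):
    """Hypothetical stand-in for a frozen, pretrained encoder (assumption).

    A real system would run a neural network here. This toy version maps a
    raw signal to three statistics: mean level, spread, and average jump
    between neighboring values.
    """
    n = len(samples)
    mean = sum(samples) / n
    spread = math.sqrt(sum((x - mean) ** 2 for x in samples) / n)
    jump = sum(abs(b - a) for a, b in zip(samples, samples[1:])) / (n - 1)
    return [mean, spread, jump]


def centroid(vectors):
    """Average a list of equal-length vectors component by component."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def fit_detector(examples_by_label):
    """Build one prototype (centroid of embeddings) per user-supplied label."""
    return {
        label: centroid([embed(x) for x in examples])
        for label, examples in examples_by_label.items()
    }


def predict(prototypes, signal):
    """Return the label whose prototype is nearest to the signal's embedding."""
    v = embed(signal)
    return min(prototypes, key=lambda label: math.dist(v, prototypes[label]))


def make_signal(rng, level, noise, length=64):
    """Synthetic toy data: a constant level plus Gaussian noise."""
    return [level + rng.gauss(0.0, noise) for _ in range(length)]


rng = random.Random(0)

# Five examples of each condition, mirroring the workflow in the quote above.
train = {
    "open": [make_signal(rng, 1.0, 0.1) for _ in range(5)],
    "closed": [make_signal(rng, 0.0, 0.1) for _ in range(5)],
}
detector = fit_detector(train)

print(predict(detector, make_signal(rng, 0.95, 0.1)))
print(predict(detector, make_signal(rng, 0.05, 0.1)))
```

The point of the sketch is that the expensive part (the encoder) is learned once, ahead of time, which is why a handful of user-supplied examples can be enough to separate two conditions.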

Computer vision startups like Landing AI and Cogniac similarly leverage fine-tuning to create classifiers for specific anomalies. It is a form of semi-supervised learning, in which a model is exposed to “unknown” data for which few previously defined categories or labels exist. This contrasts with supervised learning, where a model learns from datasets of annotated examples, such as an image of a doorway labeled “doorway.” In semi-supervised learning, a machine learning system must teach itself to classify the unlabeled data, processing the partially labeled data to learn from its structure.
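One common way to learn from partially labeled data is self-training, also called pseudo-labeling. The sketch below is a toy illustration under stated assumptions rather than a description of any vendor's system: a nearest-centroid classifier starts from two labeled points per class, labels the unlabeled points it is confident about, and recomputes its centroids. The function names, the `margin` threshold and the synthetic two-cluster data are all hypothetical.

```python
import math
import random


def centroid(points):
    """Average a list of equal-length points component by component."""
    return [sum(col) / len(points) for col in zip(*points)]


def self_train(labeled, unlabeled, rounds=5, margin=0.5):
    """Grow a labeled set by pseudo-labeling confident unlabeled points.

    labeled: dict mapping label -> list of points (needs at least two labels).
    unlabeled: list of points with no label.
    A point is pseudo-labeled only if it is at least `margin` closer to its
    nearest centroid than to the second-nearest one.
    """
    labeled = {label: list(points) for label, points in labeled.items()}
    pool = list(unlabeled)
    for _ in range(rounds):
        protos = {label: centroid(points) for label, points in labeled.items()}
        remaining = []
        for p in pool:
            ranked = sorted(protos, key=lambda label: math.dist(p, protos[label]))
            best, runner_up = ranked[0], ranked[1]
            gap = math.dist(p, protos[runner_up]) - math.dist(p, protos[best])
            if gap >= margin:
                labeled[best].append(p)   # confident: adopt the pseudo-label
            else:
                remaining.append(p)       # ambiguous: try again next round
        if len(remaining) == len(pool):
            break                         # nothing new was labeled; stop early
        pool = remaining
    protos = {label: centroid(points) for label, points in labeled.items()}
    return protos, labeled, pool


rng = random.Random(0)


def blob(center, n):
    """Synthetic toy data: n points scattered around a center."""
    return [[c + rng.gauss(0.0, 0.5) for c in center] for _ in range(n)]


# Two labeled points per class, plus 40 unlabeled points from the same clusters.
seed_labels = {"a": blob([0.0, 0.0], 2), "b": blob([4.0, 4.0], 2)}
unlabeled = blob([0.0, 0.0], 20) + blob([4.0, 4.0], 20)
rng.shuffle(unlabeled)

protos, labeled, leftover = self_train(seed_labels, unlabeled)
print({label: len(points) for label, points in labeled.items()}, len(leftover))
```

Self-training only helps when the early pseudo-labels are mostly correct. The confidence margin is the guard against that failure mode, since a wrong pseudo-label is fed back into the centroids and can reinforce itself.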

Two years ago, Amazon began experimenting with unsupervised and semi-supervised techniques for predicting household routines, such as when to turn off the lights in the living room. It later expanded the use of these techniques to the language domain, tapping them to improve Alexa’s natural language understanding.

“To train the encoder for Custom Sound Event Detection, the Alexa team used self-supervised learning … [W]e fine-tuned the model on labeled data, audio recordings labeled by type,” continued Natarajan and Sindhwani. “This allowed the encoder to learn finer distinctions between different types of sounds. Ring Custom Event Alerts also uses this approach, where we utilize publicly available data.”

Potential and limitations

In particular, unsupervised and semi-supervised learning are enabling new applications in a number of domains, such as extracting knowledge about disruptions to cloud services. For example, Microsoft researchers recently detailed SoftNER, an unsupervised learning framework that the company deployed internally to gather information about storage, compute and outages. They say it eliminated the need to annotate a large amount of training data and scaled to a high volume of timeouts, slow connections and other interruptions.

Other demonstrations of the potential of unsupervised and semi-supervised learning abound, such as Soniox, which uses unsupervised learning to build speech recognition systems. Microsoft’s Project Alexandria uses unsupervised and semi-supervised learning to analyze documents in the company’s knowledge bases. And DataVisor uses unsupervised learning models to detect potentially fraudulent financial transactions.

But unsupervised and semi-supervised learning do not eliminate the possibility of errors in a model’s predictions, such as harmful biases. For example, unsupervised computer vision systems can pick up racial and gender stereotypes present in training datasets. Even pretrained models can be rife with serious biases. Researchers at Carnegie Mellon University and George Washington University recently showed that computer vision algorithms pretrained on ImageNet exhibit biases about people’s race, gender and weight.

Some experts, including Facebook’s Yann LeCun, theorize that it may be possible to remove these biases by training unsupervised models with additional, smaller datasets curated to “unlearn” the biases. In addition, several “debiasing” methods have been proposed for natural language models that have been fine-tuned from larger models. But it is not a solved challenge by any means.

That being the case, products like Custom Sound Event Detection and Custom Event Alerts illustrate the capabilities of more sophisticated, autonomous machine learning systems, provided they work as advertised. In developing the earliest iterations of Alexa Guard, Amazon had to train machine learning models on hundreds of sound samples of glass breaking, a step that is apparently no longer necessary.

Turing Award winners Yoshua Bengio and Yann LeCun believe that unsupervised and semi-supervised learning (among other techniques) are the key to human-level intelligence, and Custom Sound Event Detection and Custom Event Alerts lend credence to this view. The trick will be ensuring that such systems do not fall victim to flaws that negatively affect their decision-making.

For AI coverage, send news tips to Kyle Wiggers, and be sure to subscribe to the AI Weekly newsletter and bookmark our AI channel, The Machine.

Thanks for reading,

Kyle Wiggers

AI Staff Writer

