A few years ago Sony created the first image sensor with on-chip RAM, which enabled slow-motion video capture at very high frame rates. Now the company has created the first sensor with on-board AI hardware, which can do high-speed image recognition.

The Intelligent Vision Sensor, IMX500, has a stacked design, consisting of a pixel chip (1/2.3”, 12.3MP resolution, 1.55µm pixels) and a logic chip that has a Sony-developed DSP and memory for the AI models.

The sensor can capture 4K video at 60fps, but more importantly it needs only 3.1 milliseconds to analyze an image (using Google’s MobileNet V1 model). It doesn’t need to output images at all; it can send out just the metadata for further processing.
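To put the 3.1 ms figure in context, here is a rough back-of-the-envelope sketch comparing it with the time between frames at 60fps. It only relates the two numbers quoted above; it does not claim the sensor analyzes every 4K frame, and the variable names are made up for illustration.

```python
# Figures quoted in the article.
FRAME_RATE_FPS = 60      # 4K capture rate
INFERENCE_MS = 3.1       # MobileNet V1 analysis time per image

# Time between consecutive frames at the capture rate, in milliseconds.
frame_period_ms = 1000 / FRAME_RATE_FPS

# Share of one frame period that a single analysis pass would occupy.
budget_fraction = INFERENCE_MS / frame_period_ms

print(f"Frame period: {frame_period_ms:.1f} ms")
print(f"Analysis takes {budget_fraction:.0%} of one frame period")
```

At 60fps a new frame arrives roughly every 16.7 ms, so one analysis pass would take up less than a fifth of that interval.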

Sony's Intelligent Vision Sensor is the first to have AI processing hardware on board

As Sony says, this is a boon for privacy as the image data never leaves the chip (and it certainly doesn’t need to be sent over the Internet to the cloud). Also, since the AI models can be configured, the same hardware can be used for various tasks.

A simple example is putting the IMX500 in a car and pointing it at the driver. If the sensor detects the driver is distracted or asleep, it can send a warning to the car. Even though an external processor can do the same task, using just the image sensor is simpler (and thus more reliable) and cheaper.

Stores can use the Intelligent Vision Sensor for many tasks. For example, one could be at the door, counting how many people went in. Another can keep an eye on shelves, sending a notification when stock is running low. Yet another can determine the areas where most people go and the products they pick up.

The sensor has possible applications at the cash register

The IMX500 is unlikely to find its way into smartphones. But it can enter your home as part of a smart speaker, where it can see who is asking a question, which will help the system provide a more relevant answer. And, again, privacy concerns are reduced since the image sensor can output just “Tom is speaking” instead of capturing a photo and sending it out for processing.

It’s not just privacy, either: a design using this sensor will have much lower latency, need very little bandwidth and use less power to boot.

Analyzing an image in 3.1 milliseconds is much faster than traditional approaches, enabling applications that need lightning-fast reaction times (e.g. industrial robots). Also, metadata is tiny compared to the 4K/60fps footage it is based on, so hundreds of cameras can share a relatively slow data connection.
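The bandwidth point can be illustrated with a rough calculation. Everything except the 60fps frame rate is an assumption made for the sake of the example: a 3840x2160 frame, uncompressed 8-bit pixels, a hypothetical 64-byte metadata message per frame and a hypothetical 300 cameras on one link. Real video would normally be compressed, so the gap in practice would be smaller than shown here.

```python
# Assumptions for illustration only (not from Sony's specifications).
WIDTH, HEIGHT = 3840, 2160   # assumed "4K" frame size
BITS_PER_PIXEL = 8           # assumed uncompressed, 8 bits per pixel
METADATA_BYTES = 64          # hypothetical metadata payload per frame
CAMERAS = 300                # hypothetical number of cameras on one link

FPS = 60                     # frame rate quoted in the article

# Data rate of one uncompressed video stream, in bits per second.
raw_bps = WIDTH * HEIGHT * BITS_PER_PIXEL * FPS

# Data rate of one metadata-only stream, in bits per second.
meta_bps = METADATA_BYTES * 8 * FPS

# Combined metadata rate for all cameras sharing the link.
total_meta_bps = meta_bps * CAMERAS

print(f"Raw video, one camera:   {raw_bps / 1e9:.2f} Gbit/s")
print(f"Metadata, one camera:    {meta_bps / 1e3:.2f} kbit/s")
print(f"Metadata, {CAMERAS} cameras: {total_meta_bps / 1e6:.2f} Mbit/s")
```

Under these assumptions, 300 metadata-only cameras together need under 10 Mbit/s, while a single uncompressed video stream needs close to 4 Gbit/s.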

The IMX500 (just the sensor) and IMX501 (the sensor in an LGA package) will be sampled to companies soon and Sony thinks the first products to use them will come out next year. For now, these two cost JPY 10,000 and JPY 20,000, respectively (that’s $94 and $187).
