
Nvidia Jarvis—a multi-modal AI SDK—fuses speech, vision, and other sensors into one system


zanderthunder


Today, at the 5G Mobile World Conference, Nvidia co-founder and CEO Jensen Huang announced Nvidia Jarvis, a multi-modal AI software development kit that combines speech, vision, and other sensors in one AI system.

 

Quote

NVIDIA Jarvis is an SDK for building and deploying AI applications that fuse vision, speech and other sensors. It offers a complete workflow to build, train and deploy GPU-accelerated AI systems that can use visual cues such as gestures and gaze along with speech in context.

 

Here's a YouTube video of the presentation:

 

 

 

As stated before, Jarvis is the company's attempt to process multiple inputs from different sensors simultaneously. The rationale behind this approach is that fusing inputs helps build context for accurately predicting and generating responses in conversation-based AI applications. Nvidia illustrated situations where this might help in its blog post:

 

Quote

...lip movement can be fused with speech input to identify the active speaker. Gaze can be used to understand if the speaker is engaging the AI agent or other people in the scene. Such multi-modal fusion enables simultaneous multi-user, multi-context conversations with the AI agent that need deeper understanding of the context.

 

Jarvis includes modules that can be tweaked according to the user's requirements. For vision, it has modules for person detection and tracking, gesture detection, lip activity, gaze, and body pose. For speech, the system has sentiment analysis, dialog modeling, and domain, intent, and entity classification. Fusion algorithms synchronize the outputs of these models to integrate them into one system.
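Nvidia hasn't published the SDK's API details in the announcement, but as an illustration of what fusing these module outputs could look like, here is a minimal sketch of the active-speaker example from the quote above. Every name, score, and weight here is hypothetical, not Jarvis's actual interface:

```python
from dataclasses import dataclass

@dataclass
class SpeakerCues:
    """Per-person cues from the vision modules (hypothetical shapes)."""
    person_id: str
    lip_activity: float   # 0..1, e.g. from a lip-activity module
    gaze_on_agent: float  # 0..1, e.g. from a gaze module

def pick_active_speaker(cues, voice_detected, min_score=0.5):
    """Fuse visual cues with speech input: when voice is detected,
    attribute it to the person whose visual cues score highest."""
    if not voice_detected or not cues:
        return None
    # Simple weighted fusion; a real system would also align timestamps
    # across the sensor streams before combining scores.
    fused = lambda c: 0.7 * c.lip_activity + 0.3 * c.gaze_on_agent
    best = max(cues, key=fused)
    return best.person_id if fused(best) >= min_score else None

people = [
    SpeakerCues("alice", lip_activity=0.9, gaze_on_agent=0.8),
    SpeakerCues("bob",   lip_activity=0.1, gaze_on_agent=0.9),
]
print(pick_active_speaker(people, voice_detected=True))  # alice
```

This only gestures at the idea: gaze is used to check whether the speaker is engaging the agent, and lip activity disambiguates who is talking when speech is detected.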

 


 

Moreover, the firm claims that Jarvis-based applications work best when used in conjunction with Nvidia Neural Modules (NeMo), which is a framework-agnostic toolkit for creating AI applications built around neural modules. For cloud-based applications, services developed using Jarvis can be deployed using the EGX platform, which Nvidia is touting as the world's first edge supercomputer. For edge and Internet of Things use cases, Jarvis runs on the Nvidia EGX stack, which is compatible with a large swath of Kubernetes infrastructure available today.

 

Jarvis is now open for early access. If you are interested, you can log in to your Nvidia account and sign up for early access to it here.

 

Source: Nvidia Jarvis—a multi-modal AI SDK—fuses speech, vision, and other sensors into one system (via Neowin)
