Published 9/2024
MP4 | Video: H.264, 1280×720 | Audio: AAC, 44.1 kHz, 2 Ch
Language: English | Duration: 4h 10m | Size: 947 MB
While large language models are groundbreaking tools for automating everyday text-based tasks such as text summarization, translation, and generation, we’ve also seen the emergence of more complex generative AI models that can process and output different types of data, including images, audio, and even video. Multimodal AI models such as GPT-4 can work across these data formats, for example, to generate speech from text, text from images, or text from audio. By combining different modalities, multimodal AI can interact with humans in more natural, intuitive ways, mimicking how humans perceive and understand the world around them. By processing inputs more holistically and producing more intuitive outputs, these models are already nudging us closer to true artificial general intelligence.
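As a concrete illustration of the text-from-images capability mentioned above, here is a minimal sketch, assuming the OpenAI Python SDK (v1) and a vision-capable model such as gpt-4o; the prompt and image URL are placeholders and not part of the course materials.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Send a mixed text-and-image prompt and receive a text answer back
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this photo."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```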
What you’ll learn and how you can apply it
Design more natural, human-like interactions between AI systems and users by leveraging multimodal capabilities
Explore fundamental mathematical concepts such as multimodal alignment and fusion, heterogeneous representation learning, and multistream temporal modeling (a small fusion sketch follows this list)
Review practical applications such as advanced voice assistants, smart home systems, and virtual shopping experiences
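The listing itself contains no code, but to give a feel for what “alignment and fusion” can look like in practice, here is a minimal late-fusion sketch, assuming PyTorch; the LateFusion name, embedding dimensions, and class count are hypothetical and not taken from the course.

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Project pre-computed image and text embeddings into a shared space
    (alignment), then concatenate them (fusion) and classify."""

    def __init__(self, img_dim=512, txt_dim=768, shared_dim=256, n_classes=10):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, shared_dim)   # align image features
        self.txt_proj = nn.Linear(txt_dim, shared_dim)   # align text features
        self.classifier = nn.Linear(2 * shared_dim, n_classes)

    def forward(self, img_emb, txt_emb):
        z_img = torch.relu(self.img_proj(img_emb))
        z_txt = torch.relu(self.txt_proj(txt_emb))
        fused = torch.cat([z_img, z_txt], dim=-1)        # fusion by concatenation
        return self.classifier(fused)

# Toy usage: a batch of 4 image/text embedding pairs with random values
model = LateFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 10])
```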
This course is for you because…
You’re a current or future AI product owner or AI/machine learning practitioner.
You want to learn about the state of the art in artificial intelligence and how large language models can be leveraged to build new applications and solve your organizational challenges.