5. Multi-modal interfaces are the future of interaction
My final insight from the summit was the clear and growing prominence of multi-modal interfaces in AI.
The ongoing cycle of smarter, faster, cheaper AI models has enabled interfaces that integrate multiple forms of input and output, including audio, video, text, and images, creating a more seamless and efficient user experience. Handling several modalities in a single API call also reduces application latency, since there is no need to chain separate speech, vision, and text services together. OpenAI's GPT-4o is a perfect example of this, mixing text and voice inputs and outputs in near real time.
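To make the "single API call" point concrete, here is a minimal sketch of sending text and an image together in one request, assuming the OpenAI Python SDK and a GPT-4o-class model; the prompt and image URL are illustrative placeholders, not anything from the summit itself.

```python
# A minimal sketch of one multi-modal request, assuming the OpenAI Python SDK
# and a GPT-4o-class model. The prompt and image URL are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                # Text and image parts travel in the same request,
                # so there is no separate vision pipeline to call.
                {"type": "text", "text": "Describe what is happening in this photo."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Because the model consumes both modalities in one call, the round trip is a single request-response rather than an image-captioning step followed by a text step.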
Imagine an assistant capable of understanding and responding to voice queries, analyzing images or video for context, and providing textual summaries - all within a single interaction. This level of sophistication not only improves user engagement but also broadens the applicability of AI across diverse sectors, from healthcare to customer service and beyond.
We were fortunate enough to see a fantastic demonstration of this multi-modal capability when record producer Fernando Garibay and singer-songwriter Daniel Bedingfield created several original pieces of music live on stage, based purely on stories and inputs from the audience!
This capability will continue to improve rapidly and will significantly disrupt the music industry, which has already seen machine learning drive an explosion in personalization over the past decade.
Which other industries will see a similar impact, I wonder?