Munich Regional Group

Event Title
Next Generation Multimedia Fingerprints: Better Recognition and Lower Computational Cost through Artificial Intelligence
Venue and Time of the Event
Venue:

Institut für Rundfunktechnik

Floriansmühlstraße 60

80939 München

23 September 2019, 15:30
Dr. Kunio Kashino, Senior Distinguished Researcher at NTT’s Communication Science Laboratories, Japan
About the Event

Presentation language: English

AI applications for the media industry have been introduced by Internet giants, niche players, and open-source projects. In truth, few of these applications provide complete solutions out of the box: they typically require extensive training for each human-defined target or object, and when definitions or tasks change, the AI technology has to be retrained and re-applied to the entire audio/video library before it can support effective media search.

Dr. Kunio Kashino, a Senior Distinguished Researcher at NTT’s Communication Science Laboratories, will present a radically different approach. For over 20 years, he and his colleagues in Japan have developed successive generations of digital fingerprinting technologies, to the point where NTT’s Robust Media Search (RMS) has been tested and shown to outperform competing products. Now, Dr. Kashino and his colleagues have developed RMS+, a new generation of robust media search technology based on their own AI algorithms. Unlike traditional digital fingerprints or existing AI technologies, RMS+ combines two different types of media content descriptors: media-dependent and media-independent representations. This novel architecture allows owners of large media libraries to greatly reduce computationally expensive AI re-analysis of audio/video archives. RMS+ also enables unsupervised learning of search targets or objects, further reducing labor-intensive human metadata tagging.
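To illustrate that two-layer idea in miniature (a sketch with invented names and toy data, not NTT's actual RMS+ code): the media-dependent fingerprint is extracted once per asset and cached, while the media-independent mapping from fingerprints to task-specific concepts stays lightweight and can be swapped or retrained without re-analyzing the archive.

```python
def extract_fingerprint(samples):
    """Media-dependent step: expensive, run once per asset and cached.
    Here just a toy fingerprint of coarse per-window signal energies."""
    window = 4
    return [
        round(sum(abs(s) for s in samples[i:i + window]) / window, 3)
        for i in range(0, len(samples), window)
    ]

def to_concept_space(fingerprint, mapping):
    """Media-independent step: a cheap, swappable mapping from cached
    fingerprints to concept scores for the current search task."""
    return {concept: sum(w * f for w, f in zip(weights, fingerprint))
            for concept, weights in mapping.items()}

# Fingerprints are computed once and stored with the archive...
library = {"clip_a": extract_fingerprint([0.1, 0.2, 0.9, 0.8,
                                          0.1, 0.0, 0.1, 0.2])}

# ...so a new search task only needs a new mapping (hypothetical weights),
# not a re-analysis of every asset in the library.
loudness_task = {"loud": [1.0, 0.2]}
scores = to_concept_space(library["clip_a"], loudness_task)
```

The point of the split is in the last two lines: changing the task replaces only the small mapping, while the expensive per-asset extraction is never repeated.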

Dr. Kashino’s presentation will review the evolution of NTT’s digital fingerprinting research and introduce the new RMS+ technology, including laboratory demonstrations of its core functions. For example, the media-independent representations of RMS+ enable audio-to-description conversion, which supports media search using natural-language queries such as “find a loud crash that sounds like a car accident.” Alternatively, the unsupervised concept-learning function of RMS+ can find images containing “Christmas scenes with people skating” without relying on human-generated metadata. While these early RMS+ demonstrations can only hint at its full capabilities, NTT’s new approach could well revolutionize AI strategies for the description, search, and management of ever-expanding audio/video archives.
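The natural-language search above can be sketched as follows (all clip names and descriptions invented here; this is not NTT's implementation): the query and machine-generated clip descriptions are placed in one shared text space, so a simple similarity ranking stands in for hand-written metadata.

```python
def bag_of_words(text):
    """Count words, giving a sparse vector in a shared text space."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = lambda v: sum(x * x for x in v.values()) ** 0.5
    return dot / (norm(a) * norm(b) or 1.0)

# Descriptions as an audio-to-description model might emit them (made up).
clips = {
    "clip_1": "a loud crash like a car accident",
    "clip_2": "soft piano music in a quiet room",
}

def search(query):
    q = bag_of_words(query)
    return max(clips, key=lambda c: similarity(q, bag_of_words(clips[c])))

best = search("find a loud crash that sounds like a car accident")
```

A production system would of course use learned embeddings rather than word counts, but the structure is the same: no human tagging is needed once descriptions can be generated from the media itself.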