SORA IS HERE: It is an A.I. Rich Media Game Changer

Now the playbook is we build AI tools to go find these fake accounts, find coordinated networks of inauthentic activity, and take them down; we make it much harder for anyone to advertise in ways that they shouldn’t be.
Mark Zuckerberg

On this date (February 16, 2024), OpenAI unveiled Sora, its latest artificial intelligence tool. Sora is a new tool for creating rich media elements through text-to-video and text-to-animation. In simple terms, Sora is an AI model that can create realistic and imaginative scenes from text instructions.

Sora is capable of generating up to a minute of high-fidelity video. The results suggest that scaling video generation models is a promising path towards building general-purpose simulators of the physical world. Time will tell, but this initial effort is definitely impressive to see.

OpenAI is teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction.

Sora is the latest and greatest text-to-video model. It can generate videos up to a minute long while maintaining visual quality and adherence to the user’s text prompt. The model can also generate a video from a still image, fill in missing frames in an existing video, or extend an existing video.

Sora is currently available only to “red teamers” who are assessing the model for potential harms and risks. OpenAI is also offering access to some visual artists, designers, and filmmakers to get feedback. It notes that the existing model might not accurately simulate the physics of a complex scene and may not properly interpret certain instances of cause and effect. Sora is not publicly available yet, but the implications of technology this powerful will be felt well ahead of any public launch at some future date. OpenAI stated that it had no plans to release Sora to the public. Concerned about Sora’s potential for misuse, the company provided limited access to a small red team that included academics and researchers. Sora-generated videos are tagged with C2PA metadata to indicate that they were AI-generated.

Sora can generate videos from short descriptive prompts and can extend existing videos forwards or backwards in time. It can produce video at resolutions up to 1920×1080 or 1080×1920. The maximum length of generated videos is unknown, although one minute is the figure cited in today’s media reports. Much longer and more complex clips are expected to become possible eventually.

The team that developed Sora named it after the Japanese word for sky to signify its “limitless creative potential”.

Source: Wall Street Journal. February 16, 2024

The “technical report” contains essentially no technical detail beyond stating that Sora is a denoising diffusion model operating in latent space, with (at least) one Transformer as the denoiser. This design is standard for diffusion image generators like Stable Diffusion (except that Stable Diffusion uses a U-Net instead of a Transformer). A video is generated in latent space by denoising 3D “patches” (two dimensions of space and one of time), and the result is then transformed to standard space by a video decompressor. Re-captioning is used during training to create good captions for videos that lack them.
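To make the 3D “patch” idea a little more concrete, here is a minimal NumPy sketch of how a latent video could be cut into spacetime patches and reassembled. This is only an illustration under assumptions: the function names `patchify` and `unpatchify`, the (time, height, width, channels) layout, and the patch sizes are all hypothetical, and OpenAI has not published how Sora actually does this.

```python
import numpy as np


def patchify(latent, pt, ph, pw):
    """Split a latent video of shape (T, H, W, C) into flattened spacetime patches.

    Hypothetical helper for illustration only; not Sora's actual code.
    Returns an array of shape (num_patches, pt * ph * pw * C).
    """
    T, H, W, C = latent.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    # Break each axis into (number of patches, patch size).
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the three patch-index axes to the front, patch contents to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch: 1D of time and 2D of space, flattened with the channels.
    return x.reshape(-1, pt * ph * pw * C)


def unpatchify(tokens, shape, pt, ph, pw):
    """Inverse of patchify: rebuild the (T, H, W, C) latent from patch rows."""
    T, H, W, C = shape
    x = tokens.reshape(T // pt, H // ph, W // pw, pt, ph, pw, C)
    # Interleave patch indices with patch contents again.
    x = x.transpose(0, 3, 1, 4, 2, 5, 6)
    return x.reshape(T, H, W, C)


if __name__ == "__main__":
    # Assumed toy latent: 8 frames, 16x16 spatial grid, 4 channels.
    latent = np.random.rand(8, 16, 16, 4).astype(np.float32)
    tokens = patchify(latent, pt=2, ph=4, pw=4)
    print(tokens.shape)
```

With these assumed sizes, the 8×16×16×4 latent becomes 64 patches of 128 values each. In a design like the one the report describes, a Transformer would denoise rows like these before a decoder maps the latent back to pixels.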

As of today’s release, the video clips look very impressive, but they may be heavily curated to show the tool in the best light. How easily will anyone else be able to generate similar results?

And, if and when it is in general release, what are the inherent risks of abuse? The potential to create disinformation online is great. Will the metadata tags flagging the video clips as AI-generated help to ensure that fake propaganda is not distributed as truth? Already, fake AI photos exploiting celebrities such as Taylor Swift and Emilia Clarke are everywhere. Will these fake videos further amplify that misinformation bandwagon? And what about election fraud? Can these fake videos be used against political opponents too? Yes, there is a lot to worry about with these new AI tools. So, OpenAI must get it right before it releases Sora to the wilds of the internet.

About the Author:

Michael Martin is the Vice President of Technology with Metercor Inc., a Smart Meter, IoT, and Smart City systems integrator based in Canada. He has more than 40 years of experience in systems design for applications that use broadband networks, optical fibre, wireless, and digital communications technologies. He is a business and technology consultant. He was a senior executive consultant for 15 years with IBM, where he worked in the GBS Global Center of Competency for Energy and Utilities and the GTS Global Center of Excellence for Energy and Utilities. He is a founding partner and President of MICAN Communications and before that was President of Comlink Systems Limited and Ensat Broadcast Services, Inc., both divisions of Cygnal Technologies Corporation (CYN: TSX). Martin served on the Board of Directors for TeraGo Inc (TGO: TSX) and on the Board of Directors for Avante Logixx Inc. (XX: TSX.V). He has served as a Member, SCC ISO-IEC JTC 1/SC-41 – Internet of Things and related technologies, ISO – International Organization for Standardization, and as a member of the NIST SP 500-325 Fog Computing Conceptual Model, National Institute of Standards and Technology. He served on the Board of Governors of the University of Ontario Institute of Technology (UOIT) [now OntarioTech University] and on the Board of Advisers of five different colleges in Ontario. For 16 years he served on the Board of the Society of Motion Picture and Television Engineers (SMPTE), Toronto Section. He holds three master’s degrees, in business (MBA), communication (MA), and education (MEd). As well, he has three undergraduate diplomas and five certifications in business, computer programming, internetworking, project management, media, photography, and communication technology. He has completed over 30 next-generation MOOC continuing education courses in IoT, Cloud, AI and Cognitive systems, Blockchain, Agile, Big Data, Design Thinking, Security, Indigenous Canada awareness, and more.

Vividcomm

Advanced Technology in Action
