I tried out OpenAI’s Sora, but I’m struggling with its learning curve


OpenAI Sora video interface

Ryan Haines / Android Authority

After months of waiting, it finally happened — OpenAI launched its video generator, Sora. Or, at least, it opened up access to the tool, only for the entire internet to jump on board simultaneously, forcing OpenAI to pump the brakes on account creation. Thanks to a little bit of patience and determination, I’ve made my way through the waitlist, and now I have the power to generate just about anything I can think up — within some well-defined limits.

With that great power and responsibility has come something else, though: a great learning curve. Even though I’m enjoying Sora and am impressed by its capabilities, I’m having trouble nailing down the perfect prompts to get videos I’m pleased with. I’m sure it’s just a matter of practice, but here’s how my first few days with Sora have gone.


Video creation at your fingertips?

OpenAI Sora zebra with a human hand

Ryan Haines / Android Authority

First, let’s talk about how Sora works — or at least how to access the powerful video generation tool. Although it comes from OpenAI, and you need to be a ChatGPT Plus or Pro member to start creating, you can’t get to Sora through the main ChatGPT interface. Instead, you have to head directly to the Sora website (sora.com), where you’re met with a gallery of Featured clips that set the bar incredibly high.

At least, they set the bar high in my own head. I scrolled through a few of them, looked at their prompts, watched them run smoothly, and figured I could do the same. After all, my prompts would be run through the same model that theirs had been, so my results should look just as good, right? It's not quite that simple. Sure, typing in a prompt is easy, but figuring out what Sora responds best to is a bit harder.

Creating videos is as easy as typing out what you want to see… or at least it seems that way on paper.

Before we get to the challenges, though, I should probably clarify some of Sora's current limitations. Unlike Google's Pixel Studio or other basic image generators, you can't simply run Sora to your heart's content, at least not as a ChatGPT Plus member paying $20 per month.

Instead, you're given a bank of 1,000 credits, which you can spend on video generation as you see fit. Everything you change within your prompt, from the aspect ratio to the duration to the resolution, affects how many of those credits each generation costs, and once they're gone, they're gone for the month. Brand-new videos cost anywhere from 20 credits to 2,000 credits, and OpenAI publishes a helpful table of costs, which I wish I'd found before I spent 260 credits in about 20 minutes. You're also limited to one video generation at a time and a maximum resolution of 720p as a Plus member.

If you spring for a ChatGPT Pro membership, the limits are much looser but the price is much, much higher at $200 per month. Instead of 1,000 credits, you get 10,000 credits for priority videos, after which you get unlimited video generations; they just take a bit longer — OpenAI calls them “relaxed videos.” Pro members can also generate five videos at a time, bump them up to 1080p, and let them run as long as 20 seconds.

Unfortunately, though, no matter which tier of ChatGPT you pay for, none of Sora's videos include audio, so you'll have to download your clips and sync music or sound effects after you've nailed down the visuals. OpenAI has suggested that audio support will come to Sora eventually; it's just not there yet.

How hard could it be?

With that basic introduction out of the way, the rest of using Sora to generate videos should be easy, right? Well, yes and no. Although typing in your prompt, choosing your settings from the menu at the bottom, and waiting for your video to generate is that easy, it’s much harder to come up with something worthy of Sora’s ever-changing Featured feed.

In an attempt to share my limited cache of credits for the month, as soon as I got access to Sora, I reached out to my colleague Mitja. He and I had been discussing how quickly we might get access to the platform, so I figured he might have some good ideas for generations right off the bat. As it turned out, his first thought was something I never could have imagined: ten zebras in suits dancing to a Michael Jackson song in front of the Sydney Opera House while eating pesto ravioli. It may seem like a weird video to make, but if Sora can handle that amount of detail, then it's definitely the real deal.

Sora will take a stab at almost anything you ask for, but you have to describe it just right.

Once I finished laughing at the idea, I ran it through Sora and waited for the result. Technically, the final product got most things right. It put a group of zebras in suits in front of the Sydney Opera House, and they all had green plates in their hands. However, the number fluctuated between eight and about 12 zebras, there was no indication that a Michael Jackson song was playing, and the pesto ravioli was definitely just a green plate — close, but not quite right. More worryingly, I had bumped the cost of the video up to 100 credits because I hoped a ten-second clip would show more dancing. It did not.

I've since learned, however, that Sora's Storyboard tool is a must-have for pretty much anything involving complex motion. It allows you to drag and drop clips along your five- or ten-second timeline, helping Sora break up the action and flow from one beat to the next. So, in an attempt to draw a little bit more action out of my zebra friends, I jumped into the Storyboard and split the dancing and the pesto ravioli into two separate actions spaced out across the five-second clip, then I used ChatGPT to punch up my description — yet another built-in feature of the Storyboard.

Once again… it kind of worked, but it kind of didn't. Yes, I got the zebras, and they were in front of the Sydney Opera House, but they had given up on dancing, and when asked to eat some of their ravioli, they suddenly grew human hands to hold their forks. Sorry, Featured feed, but I think I'm a long way off.

I’ve also tried more natural prompts, like macaroni penguins sliding down icebergs into the sea, and more fantastical prompts, like a piece of toast with a Pixar-like face leaping out of a toaster, and the story has mostly been the same. Sora handles some pieces of each prompt incredibly well, but you have to describe your scene with just the right amount of detail. Too much, and Sora begins to merge different elements. Too little, and you get a relatively boring finished product.

And yet, somehow, there's even more to Sora than I've touched on, especially when it comes to editing. The video generator also packs the ability to re-cut, remix, and blend clips to expand on an idea, join one video to another, or clip out elements that don't work well. But again, I'd still like to nail down a video that looks good the first time.

Challenges aside, I’m excited for the future

OpenAI Sora homepage

Ryan Haines / Android Authority

Overall, it's fair to call my first few days using Sora a mixed bag. Has the video generator been perfect? No, but I can't entirely blame OpenAI for that. This is my first shot at generating videos based purely on text, so I'm not surprised that I've struggled to nail down the right level of detail. So far, I've given Sora too much information, and I've given it too little, which means that nailing just the right prompt has to be just around the corner.

More importantly, though, I’ve been thoroughly impressed by what Sora promises to do. The videos I can create as a ChatGPT Plus member take mere moments to conjure, and I imagine they’ll get faster as the model continues its training. I’m not entirely sure that I’d use any of the speedy clips that Sora has cooked up just yet — many of them still suffer from weird artifacts like the human arms appearing on my zebras — but the clips that do make it to Sora’s Featured collection give me hope that it’s just a matter of learning how to ask for the right elements.

I’m impressed by Sora, but I have a lot of learning to do.

Along with that, I won’t be surprised if the way OpenAI handles prompts and creations opens up, too. Right now, when you burn through your 1,000 credits as a ChatGPT Plus member, that’s it — there’s no way to buy a few more until your billing period rolls over. Likewise, there’s no way to roll unused credits from one month to the next, so you have to find the right balance of spending and saving to make it through the month.

If it were up to me, I'd love to reclaim a few of the credits I've spent on sillier experiments, but that's not an option. Instead, I'll call it the cost of learning, and I'll just have to take a little bit more time to fine-tune my prompts before I send them off to Sora. Maybe one day, I'll come up with something worth featuring.


