Science

WellSaid Labs research takes synthetic speech from seconds-long clips to hours

Millions of homes have voice-enabled devices, but when was the last time you heard a piece of synthesized speech longer than a handful of seconds? WellSaid Labs has pushed the field ahead with a voice engine that can easily and quickly generate hours of voice content that sounds just as good or better than the snippets we hear every day.

The company has been working since its public debut last year to advance its tech from impressive demo to commercial product, and in the process found a lucrative niche that it can build from.

CTO Michael Petrochuk explained that early on, the company had essentially based its technology on prior research — Google’s Tacotron project, which established a new standard for realism in artificial speech.

“Despite being released two years ago, Tacotron 2 is still state of the art. But it has a couple issues,” explained Petrochuk. “One, it’s not fast — it takes 3 minutes to produce 1 second of audio. And it’s built to model 15 seconds of audio. Imagine that in a workflow where you’re generating 10 minutes of content — it’s orders of magnitude off where we want to be.”

WellSaid completely rebuilt their model with a focus on speed, quality, and length, which sounds like “focusing” on everything at once, but there are always plenty more parameters to optimize for. The result is a model that can generate extremely high quality speech with any of 15 voices (and several languages) at about half real time — so a minute-long clip would take about 36 seconds to generate instead of a couple hours.
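
The speed claims above can be framed as a real-time factor (RTF): the ratio of synthesis time to audio duration, where an RTF below 1 means faster than real time. A quick sketch using only the figures quoted in the article:

```python
def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of speech.

    rtf (real-time factor) = synthesis time / audio duration;
    rtf < 1 means faster than real time.
    """
    return audio_seconds * rtf

# Tacotron 2 (per Petrochuk): 3 minutes of compute per 1 second of audio.
tacotron_rtf = 180.0
# WellSaid: a 60-second clip takes ~36 seconds, i.e. roughly half real time.
wellsaid_rtf = 36.0 / 60.0

ten_minutes = 10 * 60  # a 10-minute script, in seconds of audio

print(synthesis_time(ten_minutes, tacotron_rtf) / 3600)  # -> 30.0 (hours)
print(synthesis_time(ten_minutes, wellsaid_rtf) / 60)    # -> 6.0 (minutes)
```

By the same arithmetic, the 8-hour Frankenstein reading mentioned below would take roughly two months of compute at Tacotron 2's rate, versus under five hours at WellSaid's.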

This seemingly basic capability has plenty of benefits. Not only is it faster, but it makes working with the results simpler and easier. As a producer of audio content, you can just drop in a script hundreds of words long, listen to what it puts out, then tweak its pronunciation or cadence with a few keystrokes. Tacotron changed the synthetic speech space, but it has never really been a product. WellSaid builds on its advances with its own to create both a usable piece of software, and arguably a better speech system overall.

As evidence, clips generated by the model — 15-second ones, so they can compete with Tacotron and others — reached a milestone of being equally well rated as human voices in tests organized by WellSaid. There’s no objective measure for this kind of thing, but asking lots of humans to weigh in on how human something sounds is a good place to start.
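
The standard instrument for this kind of subjective evaluation is a mean opinion score: listeners rate each clip's naturalness on a 1-to-5 scale, and the per-clip averages are compared. A minimal sketch with illustrative ratings (not WellSaid's actual study data):

```python
import statistics

def mean_opinion_score(ratings: list[int]) -> float:
    """Average of 1-5 naturalness ratings from a listening test."""
    return statistics.mean(ratings)

# Illustrative ratings only -- not WellSaid's actual data.
human_clip = [5, 4, 5, 4, 4, 5]
synthetic_clip = [4, 5, 4, 5, 4, 5]

print(mean_opinion_score(human_clip))      # -> 4.5
print(mean_opinion_score(synthetic_clip))  # -> 4.5 ("parity": the means match)
```

"Equally well rated" in practice means the synthetic clips' mean score is statistically indistinguishable from the human clips' mean.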

As part of the team’s work to achieve “human parity” under these conditions, they also released a number of audio clips demonstrating how the model can produce much more demanding content.


It generated plausible-sounding speech in Spanish, French, and German (I’m not a native speaker of any of them, so can’t say more than that), showed off its facility with complex and linguistically difficult words (like stoichiometry and halogenation), words that differ depending on context (buffet, desert), and so on. The crowning achievement must be a continuous 8-hour reading of the entirety of Mary Shelley’s Frankenstein.

But audiobooks aren’t the industry that WellSaid is using as a stepladder to further advances. Instead, they’re making a bundle working in the tremendously boring but necessary field of corporate training. You know, the sorts of videos that explain policies, document the use of internal tools, and explain best practices for sales, management, development tools, and so on.

Corporate learning stuff is generally unique or at least tailored to each company, and can involve hours of audio — an alternative to saying “here, read this packet” or gathering everyone in a room to watch a decades-old DVD on office conduct. Not the most exciting place to put such a powerful technology to work, but the truth with startups is that no matter how transformative you think your tech is, if you don’t make any money, you’re sunk.

A screenshot of WellSaid Labs' synthetic speech interface.

Image Credits: WellSaid Labs

“We found a sweet spot in the corporate training field, but for product development it has helped us build these foundational elements for a bigger and greater space,” explained head of growth Martin Ramirez. “Voice is everywhere, but we have to be pragmatic about who we build for today. Eventually we’ll deliver the infrastructure where any voice can be created and distributed.”

At first that may look like expanding the corporate offerings slowly, in directions like other languages — WellSaid’s system doesn’t have English “baked in,” and given training data in other languages should perform equally well in them. So that’s an easy way forward. But other industries could use improved voice capability as well: podcasting, games, radio shows, advertising, governance.

One significant limitation to the company’s approach is that the system is meant to be operated by a person and used for, essentially, recording a virtual voice actor. This means it’s not useful to the groups for whom an improved synthetic voice is desirable — many people with disabilities that affect their own voice, blind people who use voice-based interfaces all day long, or even people traveling in a foreign country and using real-time translation tools.

“I see WellSaid servicing that use case in the near future,” said Ramirez, though he and the others were careful not to make any promises. “But today, the way it’s built, we truly believe a human producer should be interacting with the engine, to render it at a natural, a human parity level. The dynamic rendering scenario is approaching quite fast, and we want to be prepared for it, but we’re not ready to do it today.”

The company has “plenty of runway and customers” and is growing fast — so no need for funding just now, thank you, venture capital firms.


Science

Too bright to breed


Night light from coastal cities overpowers natural signals for coral spawning from neighboring reefs.

PHOTO: NOKURO/ALAMY STOCK PHOTO

Most coral species reproduce through broadcast spawning. For such a strategy to be successful, coordination has had to evolve such that gametes across clones are released simultaneously. Over millennia, lunar cycles have facilitated this coordination, but the recent development of bright artificial light has led to an overpowering of these natural signals. Ayalon et al. tested for the direct impact of different kinds of artificial light on different species of corals. The authors found that multiple lighting types, including cold and warm light-emitting diode (LED) lamps, led to loss of synchrony and spawning failure. Further, coastal maps of artificial lighting globally suggest that it threatens to interfere with coral reproduction worldwide and that the deployment of LED lights, the blue light of which penetrates deeper into the water column, is likely to make the situation even worse.

Curr. Biol. 10.1016/j.cub.2020.10.039 (2020).


Science

SpaceX launches Starlink app and provides pricing and service info to early beta testers


SpaceX has debuted an official app for its Starlink satellite broadband internet service, for both iOS and Android devices. The Starlink app allows users to manage their connection – but to take part you’ll have to be part of the official beta program, and the initial public rollout of that is only just about to begin, according to emails SpaceX sent to potential beta testers this week.

The Starlink app provides guidance on how to install the Starlink receiver dish, as well as connection status (including signal quality), a device overview for seeing what’s connected to your network, and a speed test tool. It’s similar to other mobile apps for managing home wifi connections and routers. Meanwhile, the emails to potential testers that CNBC obtained detail what users can expect in terms of pricing, speeds and latency.

The initial Starlink public beta test is called the “Better than Nothing Beta Program,” SpaceX confirms in their app description, and will be rolled out across the U.S. and Canada before the end of the year – which matches up with earlier stated timelines. As per the name, SpaceX is hoping to set expectations for early customers, with speeds ranging from 50Mb/s to 150Mb/s and latency of 20ms to 40ms according to the customer emails, with some periods including no connectivity at all. Even with expectations set low, if those values prove accurate, it should be a big improvement for users in some hard-to-reach areas where service is currently costly, unreliable and operating at roughly dial-up equivalent speeds.
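
To put those beta numbers in perspective, here is a rough back-of-the-envelope comparison against a 56k dial-up modem. The Starlink speed range comes from SpaceX's emails; the file size and dial-up baseline are illustrative assumptions of my own:

```python
def download_seconds(size_megabytes: float, speed_mbps: float) -> float:
    """Time to download a file of `size_megabytes` at `speed_mbps` (megabits/s)."""
    return size_megabytes * 8 / speed_mbps

movie = 1500  # a ~1.5 GB video file (illustrative)

# Beta speed range quoted in SpaceX's emails to testers.
for mbps in (50, 150):
    print(f"Starlink @ {mbps} Mb/s: {download_seconds(movie, mbps) / 60:.1f} min")

# A classic 56k modem, roughly 0.056 Mb/s.
print(f"Dial-up: {download_seconds(movie, 0.056) / 3600:.0f} hours")
```

Even at the bottom of the quoted range, the download drops from days to minutes — which is the whole pitch for those hard-to-reach areas.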

Image Credits: SpaceX

In terms of pricing, SpaceX says in the emails that the cost for participants in this beta program will be $99 per month, plus a one-time upfront cost of $499 for the hardware, which includes the mounting kit and receiver dish, as well as a router with wifi networking capabilities.

The eventual goal is to offer reliable, low-latency broadband with a consistent connection by handing off connectivity between a large constellation of small satellites circling the globe in low Earth orbit. SpaceX has already launched nearly 1,000 of those satellites, but it hopes to launch many thousands more before it reaches global coverage and offers general availability of its services.

SpaceX has already announced some initial commercial partnerships and pilot programs for Starlink, too, including a team-up with Microsoft to connect that company’s mobile Azure data centers, and a project with an East Texas school board to connect the local community.


Science

Erratum for the Report “Meta-analysis reveals declines in terrestrial but increases in freshwater insect abundances” by R. Van Klink, D. E. Bowler, K. B. Gongalsky, A. B. Swengel, A. Gentile, J. M. Chase


S. Rennie, J. Adamson, R. Anderson, C. Andrews, J. Bater, N. Bayfield, K. Beaton, D. Beaumont, S. Benham, V. Bowmaker, C. Britt, R. Brooker, D. Brooks, J. Brunt, G. Common, R. Cooper, S. Corbett, N. Critchley, P. Dennis, J. Dick, B. Dodd, N. Dodd, N. Donovan, J. Easter, M. Flexen, A. Gardiner, D. Hamilton, P. Hargreaves, M. Hatton-Ellis, M. Howe, J. Kahl, M. Lane, S. Langan, D. Lloyd, B. McCarney, Y. McElarney, C. McKenna, S. McMillan, F. Milne, L. Milne, M. Morecroft, M. Murphy, A. Nelson, H. Nicholson, D. Pallett, D. Parry, I. Pearce, G. Pozsgai, A. Riley, R. Rose, S. Schafer, T. Scott, L. Sherrin, C. Shortall, R. Smith, P. Smith, R. Tait, C. Taylor, M. Taylor, M. Thurlow, A. Turner, K. Tyson, H. Watson, M. Whittaker, I. Woiwod, C. Wood, UK Environmental Change Network (ECN) Moth Data: 1992-2015, NERC Environmental Information Data Centre (2018).

