
Nov. 29 (Fri.)


Open source AI platforms gain attention as AI voice market grows

Maeil Business Newspaper

(Yonhap)



Open source artificial intelligence (AI) platforms that distribute software for free are expanding their presence, with OpenAI responding by updating its voice AI features.

Voice AI is not yet as highly regarded a market as large language models (LLMs). But it is considered essential for the upcoming multimodal era, in which AI models handling text, images, and voice are integrated.

According to multiple sources from the information technology (IT) industry, Kyutai, a non-profit AI research lab based in France, recently unveiled Moshi, its self-developed voice AI model. Moshi is available for free, along with its code. The model is based on a language model called Helium, which has 7 billion parameters, the model's rough counterpart to the synapses of a human brain.

Moshi can be stored and run on smartphones or tablets, so it can be used even without an internet connection, in contrast with OpenAI’s voice AI, which is cloud-based. Moshi’s voice generation time is only 0.2 seconds, faster than the 0.23 to 0.32 seconds of OpenAI’s GPT-4o.

Kyutai Chief Executive Officer Patrick Perez emphasized in a recent interview with Maeil Business Newspaper that his company will make AI easily accessible for everyone, while noting that research on Moshi and other multimodal foundation models will continue.

Kyutai is currently viewed as the French counterpart to OpenAI. It was co-founded in November 2023 by the iliad Group, CMA CGM Group, and Schmidt Futures, led by former Google CEO Eric Schmidt, with a total investment of 300 million euros. Within six months, a core team of eight developed a voice AI that rivals OpenAI’s. It is capable of very natural conversations and is available for online trials.

Others have also released voice AI as open source, with notable examples including Meta, Coqui, Mozilla, and the Kaldi project.

Meta earlier unveiled MMS, which it says can recognize and generate speech in over 4,000 languages. A significant advantage of MMS is its ability to learn from data without needing labeled training data. For their part, Mozilla’s DeepSpeech has improved GPU efficiency, and Coqui has launched fast real-time voice recognition and text-to-speech conversion.

Both DeepSpeech and Coqui are open source, and the rationale for distributing AI in this format is to gain a first-mover advantage. Unlike closed models like OpenAI’s GPT or Anthropic’s Claude, open source allows anyone to access and use the source code for free. This increases technological accessibility for a broader user base and helps avoid dependence on certain closed models. Companies that develop the models can build ecosystems around open source, encouraging many developers to adopt the technology and taking the lead in standardizing it.

“The AI market is not solely driven by closed models like OpenAI or Anthropic,” an industry insider said. “Open source models are also demonstrating sufficiently good performance.”

The closed sector is also actively developing voice AI. OpenAI recently launched an updated voice mode for ChatGPT that improves usage in 50 languages, including Korean and Japanese, and is currently available to paid users in Korea.

OpenAI’s voice mode allows users to adjust the AI’s speech speed and can recognize the speaker’s emotions. The company has refined the Korean voice output to sound more natural, and the mode supports nine different voices. Google also unveiled its AI voice assistant, Gemini, in August 2024. The assistant has been optimized for mobile environments and offers a choice of ten voices that differ in tone and style.

According to market research firm Mordor Intelligence, the voice recognition market is projected to grow to $42.08 billion in 2029 from $14.95 billion in 2024. With the advancement of AI, it is expected to be widely adopted across various sectors, including smart homes and IoT, customer service and call centers, healthcare, automotive and navigation, educational tools, gaming and entertainment, banking and finance, legal and administrative services, accessibility support, and translation services.
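
For readers who want to put the projection in annual terms, the two figures above imply a compound annual growth rate. The short Python sketch below derives that rate from the reported start and end values only. It is an illustration, not a figure quoted from Mordor Intelligence, and the function name `cagr` is a hypothetical helper introduced here.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.10 means 10% a year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    # Solve start_value * (1 + r) ** years == end_value for r.
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # Figures reported in the article, in billions of US dollars.
    rate = cagr(14.95, 42.08, 2029 - 2024)
    print(f"Implied annual growth: {rate:.1%}")
```

Under these assumptions the market would need to grow by roughly 23 percent a year to move from $14.95 billion in 2024 to $42.08 billion in 2029.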