Undertone - Offline Whisper AI Voice Recognition

Undertone is an offline voice recognition asset for UE5. Enhance your game with 99 languages, translation, efficient performance, and cross-platform compatibility for immersive player experiences.

Getting Started

Welcome to the Undertone documentation! In this section, we’ll walk you through the initial steps to get started with the plugin. We’ll explain Undertone’s main features, how to set it up, and how to use the different models for voice recognition.

Models

Undertone offers both English-only and multilingual models. The plugin comes with a default English-only model, tiny.en. Available model types include tiny, base, small, medium, and large. Smaller models are more suitable for devices with limited resources, like phones, while larger models can be used on computers with more processing power.

Model comparison

The following table provides a comparison of the various models in terms of disk space and memory usage:

Model     Disk      Memory
tiny      75 MB     ~125 MB
base      142 MB    ~210 MB
small     466 MB    ~600 MB
medium    1.5 GB    ~1.7 GB

The tiny model is the fastest and least accurate, while the medium model is the slowest and most accurate.

How to download models

Undertone provides a convenient interface for downloading the models with just a click.

UUndertoneRuntimeSubsystem

The UUndertoneRuntimeSubsystem serves as the central point for managing speech-to-text operations, including generating text from speech and managing models.

Reference
UFUNCTION(BlueprintCallable, Category = "Undertone")
UNeuralModelWrapper* LoadModelFor(const FModelMetadata& ModelIdentity) const;

UFUNCTION(BlueprintCallable, Category = "Undertone")
TArray<FModelMetadata> GetAllModelMetadatas() const;

UFUNCTION(BlueprintCallable, Category = "Undertone")
UNeuralModelWrapper* LoadModelByName(const FString& Name) const;

UFUNCTION(BlueprintCallable, Category = "Undertone")
FUndertoneResult UndertoneSpeechToText(const UNeuralModelWrapper* Model, const TArray<float>& Samples, const FString& Language, bool TranslateToEnglish, const FString& InitialPrompt);
Example usage: getting the metadata of all available models and loading one
#include "Subsystems/UndertoneRuntimeSubsystem.h"
#include "Structs/ModelInformation.h"

UUndertoneRuntimeSubsystem* UndertoneRuntimeSubsystem = GetWorld()->GetGameInstance()->GetSubsystem<UUndertoneRuntimeSubsystem>();
/* Array of model metadatas */
TArray<FModelMetadata> AvailableModels = UndertoneRuntimeSubsystem->GetAllModelMetadatas();
/* Load the first model */
auto Model = UndertoneRuntimeSubsystem->LoadModelFor(AvailableModels[0]);

URealtimeSpeechTranscriberComponent

The URealtimeSpeechTranscriberComponent is a component that can be added to any actor to enable realtime transcription functionality. It provides a simple interface for generating text from speech.

Reference
UFUNCTION(BlueprintCallable, Category = "Undertone")
void SetModel(UNeuralModelWrapper* Model);

UFUNCTION(BlueprintCallable, Category = "Undertone")
UNeuralModelWrapper* GetModel() const;

UPROPERTY(BlueprintReadWrite, EditAnywhere, Category = "Undertone")
FString Language;

UPROPERTY(BlueprintReadWrite, EditAnywhere, Category = "Undertone")
bool TranslateToEnglish;

UFUNCTION(BlueprintCallable, Category = "Undertone")
void StartListening();

UFUNCTION(BlueprintCallable, Category = "Undertone")
void StopListening();

UPROPERTY(BlueprintAssignable, Category = "Undertone")
FOnTextTranscribed OnTextTranscribed;

UPROPERTY(BlueprintReadOnly, Category = "Undertone")
bool IsListening;
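
Example usage: starting realtime transcription

A minimal sketch of driving the component from C++. The include path, the owning actor (AMyActor), and the exact parameter signature of FOnTextTranscribed (a single transcribed-text parameter is assumed here) are assumptions; check the plugin headers for the definitive declarations.

#include "Components/RealtimeSpeechTranscriberComponent.h" // header path is an assumption
#include "Subsystems/UndertoneRuntimeSubsystem.h"

void AMyActor::BeginRealtimeTranscription()
{
    // Assumes a URealtimeSpeechTranscriberComponent was already added to this actor
    URealtimeSpeechTranscriberComponent* Transcriber = FindComponentByClass<URealtimeSpeechTranscriberComponent>();

    UUndertoneRuntimeSubsystem* UndertoneRuntimeSubsystem = GetWorld()->GetGameInstance()->GetSubsystem<UUndertoneRuntimeSubsystem>();

    // Load the default English-only model and assign it to the component
    Transcriber->SetModel(UndertoneRuntimeSubsystem->LoadModelByName("tiny.en"));
    Transcriber->Language = TEXT("en");
    Transcriber->TranslateToEnglish = false;

    // Bind the delegate before starting to listen (the handler must be a UFUNCTION)
    Transcriber->OnTextTranscribed.AddDynamic(this, &AMyActor::HandleTextTranscribed);
    Transcriber->StartListening();
}

void AMyActor::HandleTextTranscribed(const FText& Text) // parameter type is an assumption
{
    UE_LOG(LogTemp, Log, TEXT("Transcribed: %s"), *Text.ToString());
}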

URecordedSpeechTranscriberComponent

The URecordedSpeechTranscriberComponent can be added to any actor to enable push-to-record transcription: recording is started and stopped on demand, and the transcribed text is returned when recording stops.

Reference
UFUNCTION(BlueprintCallable, Category = "Undertone")
void SetModel(UNeuralModelWrapper* Model);

UFUNCTION(BlueprintCallable, Category = "Undertone")
UNeuralModelWrapper* GetModel() const;

UPROPERTY(BlueprintReadWrite, EditAnywhere, Category = "Undertone")
FString Language;

UPROPERTY(BlueprintReadWrite, EditAnywhere, Category = "Undertone")
bool TranslateToEnglish;

UPROPERTY(BlueprintReadOnly, EditAnywhere, Category = "Undertone")
bool IsRecording;
	
UFUNCTION(BlueprintCallable, Category = "Undertone")
void StartRecording();

UFUNCTION(BlueprintCallable, Category = "Undertone")
FText StopRecording();
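
Example usage: push-to-record transcription

A minimal sketch of using the recorded-speech component from C++, assuming the component was already added to the actor this code runs in. The component include path is an assumption; check the plugin source for the exact path.

#include "Components/RecordedSpeechTranscriberComponent.h" // header path is an assumption
#include "Subsystems/UndertoneRuntimeSubsystem.h"

// Assumes a URecordedSpeechTranscriberComponent was already added to this actor
URecordedSpeechTranscriberComponent* Recorder = FindComponentByClass<URecordedSpeechTranscriberComponent>();

UUndertoneRuntimeSubsystem* UndertoneRuntimeSubsystem = GetWorld()->GetGameInstance()->GetSubsystem<UUndertoneRuntimeSubsystem>();
Recorder->SetModel(UndertoneRuntimeSubsystem->LoadModelByName("tiny.en"));
Recorder->Language = TEXT("en");

// Start recording when the record button is pressed...
Recorder->StartRecording();

// ...and stop when it is pressed again; StopRecording returns the transcription
FText Transcription = Recorder->StopRecording();
UE_LOG(LogTemp, Log, TEXT("Transcribed: %s"), *Transcription.ToString());
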
Example usage: Transcribing speech from a file

Include the following headers

#include "Subsystems/UndertoneRuntimeSubsystem.h"

Load the smallest model

UUndertoneRuntimeSubsystem* UndertoneRuntimeSubsystem = GetWorld()->GetGameInstance()->GetSubsystem<UUndertoneRuntimeSubsystem>();
auto Model = UndertoneRuntimeSubsystem->LoadModelByName("tiny.en");

Load some audio from a wav file

TArray<float> Buffer;
int32 SampleRate = 0;

TArray <uint8> rawFile;
const FString filePath = "This is a test recording.wav";// Path to your wav file here
FFileHelper::LoadFileToArray(rawFile, filePath.GetCharArray().GetData());
FWaveModInfo WaveInfo;

if (WaveInfo.ReadWaveInfo(rawFile.GetData(), rawFile.Num()))
{
    SampleRate = *WaveInfo.pSamplesPerSec;
    const uint8* PCMData = WaveInfo.SampleDataStart;
    int32 NumSamples = WaveInfo.SampleDataSize / (*WaveInfo.pBitsPerSample / 8);

    // Resize the Samples array to hold the PCM16 float data
    Buffer.SetNumUninitialized(NumSamples);

    // Convert PCM16 data to float
    for (int32 i = 0; i < NumSamples; ++i)
    {
        int16 SampleValue = *((int16*)(PCMData + i * 2));
        Buffer[i] = SampleValue / 32768.0f; // Normalize to the range [-1.0, 1.0]
    }
}
else
{
    UE_LOG(LogUndertoneRuntime, Error, TEXT("Failed to read wave info."));
}

Transcribe it using Undertone

TArray<float> FinalSamples;
ResampleIfNecessary(Buffer, SampleRate, *WaveInfo.pChannels, FinalSamples);

// The "tiny.en" model was already loaded above via LoadModelByName
FUndertoneResult Transcription = UndertoneRuntimeSubsystem->UndertoneSpeechToText(Model, FinalSamples, "en", false, "");
UE_LOG(LogUndertoneRuntime, Log, TEXT("Transcription: %s"), *Transcription.text.ToString());

Output

LogUndertoneRuntime: NeuralModelConfigAsset.IsNull() = 0
LogUndertoneRuntime: Loading model from memory
LogUndertoneRuntime: Loading names from memory
LogUndertoneRuntime: Finished loading model
LogUndertoneRuntime: Loaded tokenizer from memory
LogUndertoneRuntime: decoder_input_ids: 50257
LogUndertoneRuntime: decoder_input_ids: 50362
LogUndertoneRuntime: Inference took 190 ms
LogUndertoneRuntime: Output count: 1
LogUndertoneRuntime: Transcription:  Hello World, this is a test recording.

Demos

The plugin contains two demos to demonstrate transcription functionality: realtime transcription and push-to-record transcription.

You can switch between demos using the button in the bottom-right corner.

Realtime transcriber

This demo showcases the realtime transcription functionality. The user can speak into the microphone, and the transcribed text is displayed on screen in real time, in one-second windows. Press the stop button to stop the transcription.

Push to record

This demo showcases the push-to-record functionality. The user can press the record button to start recording, and press again to stop recording. The transcribed text will be displayed on the screen.

Troubleshooting

Common issues

Transcription quality is poor

There could be several factors contributing to this issue:

  • Background noise: The model might struggle with accurate transcription when there is substantial background noise or music. Try reducing the noise for better results.
  • Small model: While small models offer portability and speed, their transcription quality may not be as high. Consider using base or larger models for improved accuracy.
  • Multilingual model for English: If your application primarily targets English, use an English-specific model. These typically perform better on English speech than their multilingual counterparts; see the sketch after this list.
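
Switching between an English-only and a multilingual model is just a matter of which model name you load. A minimal sketch, reusing the UndertoneRuntimeSubsystem pointer from the earlier examples; the "base.en" / "base" names follow the tiny/base/small/medium naming used above, but the exact names available depend on which models you have downloaded:

// English-only model: usually more accurate for English speech
auto EnglishModel = UndertoneRuntimeSubsystem->LoadModelByName("base.en");

// Multilingual model: needed when transcribing or translating other languages
auto MultilingualModel = UndertoneRuntimeSubsystem->LoadModelByName("base");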

Other

For any questions, issues, or feature requests, don’t hesitate to email us at help@leastsquares.io or join the Discord. We are happy to help and have very fast response times :)

Appendix

Supported Platforms

Undertone supports the following platforms:

Platform   Supported
Windows    ✓
Android    ✓
iOS        ✓
MacOS      ✓
Linux      ✓

If interested in any other platforms, please reach out.

Supported languages

Undertone multilingual models support the following languages:

  • english
  • chinese
  • german
  • spanish
  • russian
  • korean
  • french
  • japanese
  • portuguese
  • turkish
  • polish
  • catalan
  • dutch
  • arabic
  • swedish
  • italian
  • indonesian
  • hindi
  • finnish
  • vietnamese
  • hebrew
  • ukrainian
  • greek
  • malay
  • czech
  • romanian
  • danish
  • hungarian
  • tamil
  • norwegian
  • thai
  • urdu
  • croatian
  • bulgarian
  • lithuanian
  • latin
  • maori
  • malayalam
  • welsh
  • slovak
  • telugu
  • persian
  • latvian
  • bengali
  • serbian
  • azerbaijani
  • slovenian
  • kannada
  • estonian
  • macedonian
  • breton
  • basque
  • icelandic
  • armenian
  • nepali
  • mongolian
  • bosnian
  • kazakh
  • albanian
  • swahili
  • galician
  • marathi
  • punjabi
  • sinhala
  • khmer
  • shona
  • yoruba
  • somali
  • afrikaans
  • occitan
  • georgian
  • belarusian
  • tajik
  • sindhi
  • gujarati
  • amharic
  • yiddish
  • lao
  • uzbek
  • faroese
  • haitian creole
  • pashto
  • turkmen
  • nynorsk
  • maltese
  • sanskrit
  • luxembourgish
  • myanmar
  • tibetan
  • tagalog
  • malagasy
  • assamese
  • tatar
  • hawaiian
  • lingala
  • hausa
  • bashkir
  • javanese
  • sundanese


About us

We are a small company focused on building tools for game developers. Send us an email at careers@leastsquares.io if you are interested in working with us. For any other inquiries, feel free to contact us at hello@leastsquares.io or on the Discord.




