SpeechToText

Speech to text on Aurora OS

SpeechToText is an example of a local mobile speech-to-text application for Aurora OS that uses bundled GigaAM RNNT ONNX models. It demonstrates a minimal offline recognition flow: model loading, microphone recording, background transcription, and result display on a single screen.

Compatibility

Conan is required to resolve and install project dependencies during build. The application is intended for Aurora OS 5.x and uses the Sailfish-based Qt 5 stack.

Build features

The project is built with CMake and uses the Conan package manager to obtain dependencies, including ONNX Runtime and its runtime libraries.

For x86_64, rpm/ru.auroraos.SpeechToText.spec contains a temporary workaround for Conan 2.7: it creates an ldd wrapper that runs /lib64/ld-linux-x86-64.so.2 --list so that conan-deploy-libraries can correctly detect and copy the shared libraries for the target executable. Newer Conan versions such as 2.9 do not need this workaround; a plain conan-deploy-libraries "$EXECUTABLE" "$CONAN_LIB_DIR" "$SHARED_LIBRARIES" call is enough.

The application also requires Qt5Multimedia and the Microphone permission to record audio before offline recognition.
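
The ldd wrapper used by the x86_64 workaround can be sketched roughly as follows. This is a minimal illustration, not the actual contents of the spec file: the temporary directory, the WRAPPER_DIR variable, and the PATH handling are assumptions made for this example. Only the loader path and the --list flag come from the description of the workaround.

```shell
#!/bin/sh
# Sketch of a temporary ldd wrapper (assumed layout, not the real spec contents).
set -eu

# Hypothetical location for the wrapper; the spec file may use another path.
WRAPPER_DIR="$(mktemp -d)"

# Write a stand-in ldd that asks the x86_64 dynamic loader to list dependencies.
cat > "$WRAPPER_DIR/ldd" <<'EOF'
#!/bin/sh
exec /lib64/ld-linux-x86-64.so.2 --list "$@"
EOF
chmod +x "$WRAPPER_DIR/ldd"

# Put the wrapper first on PATH so tools that invoke ldd pick it up.
PATH="$WRAPPER_DIR:$PATH"
export PATH

# With the wrapper in place, the deploy step named in the README would follow:
# conan-deploy-libraries "$EXECUTABLE" "$CONAN_LIB_DIR" "$SHARED_LIBRARIES"
```

Placing the wrapper in a separate directory at the front of PATH keeps the system ldd untouched, so the workaround can be dropped by removing these lines once a newer Conan version is in use.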

Branch info

Application versions conform to the branch naming convention.

Installation and running

Build and install the application by following the Build example instructions.

Screenshots

Use cases

The application demonstrates local speech-to-text functionality:

Audio Capture

  • Recording speech from the device microphone with one button

Offline Recognition

  • Running local GigaAM RNNT inference through ONNX Runtime without network access

Result Display

  • Showing recognition status, errors, and final transcript on a single screen

Bundled Models

  • Shipping encoder, decoder, joint, and vocabulary files inside the application package

Project Structure

The project follows the standard structure of a C++ and QML application for Aurora OS, extended with offline speech recognition.

  • CMakeLists.txt file describes the project structure for the CMake build system and ONNX Runtime integration.
  • conanfile.py file declares the Conan dependency on onnxruntime.
  • icons directory contains the application icons for different screen resolutions.
  • qml directory contains the QML source code and the UI resources.
    • cover directory contains the application cover implementations.
    • icons directory contains the additional custom UI icons.
    • pages directory contains the application pages.
    • SpeechToText.qml file provides the application window implementation.
  • rpm directory contains the RPM package build settings.
  • src directory contains the C++ source code with audio capture and ONNX-based recognition.
  • translations directory contains the UI translation files.
  • models directory contains bundled GigaAM RNNT model files and the accompanying LICENSE.
  • ru.auroraos.SpeechToText.desktop file defines the display and parameters for launching the application.

Terms of Use and Participation in Development

The source code of the project is provided under a license that allows its use in third-party applications.

To participate in the development of the project, please read the member agreement. If you plan to submit your own source code for inclusion in the project, you will need to accept the CLA terms and conditions.

Participant information is listed in the AUTHORS file.

The Code of Conduct is the current set of rules of the Open Mobile Platform Company. It sets out the expectations for how community members interact when communicating and working on projects.