SpeechToText
SpeechToText provides an example of a local mobile speech-to-text application for Aurora OS using bundled GigaAM RNNT ONNX models. It demonstrates a minimal offline recognition flow: model loading, microphone recording, background transcription, and result display on a single screen.
Table of Contents
- Compatibility
- Build features
- Branch info
- Install and running
- Screenshots
- Use cases
- Project Structure
- Terms of Use and Participation in Development
Compatibility
Conan is required to resolve and install project dependencies during build. The application is intended for Aurora OS 5.x and uses the Sailfish-based Qt 5 stack.
Build features
The project is built with the CMake build system and uses the Conan package manager to obtain dependencies, including ONNX Runtime and its runtime libraries.
For x86_64, rpm/ru.auroraos.SpeechToText.spec contains a temporary workaround for Conan 2.7: it creates an ldd wrapper that runs /lib64/ld-linux-x86-64.so.2 --list so conan-deploy-libraries can correctly detect and copy shared libraries for the target executable. Newer Conan versions such as 2.9 do not need this workaround, and a plain conan-deploy-libraries "$EXECUTABLE" "$CONAN_LIB_DIR" "$SHARED_LIBRARIES" call is enough.
The application also requires Qt5Multimedia and the Microphone permission to record audio before offline recognition.
Branch info
Application versions conform to the branch naming convention.
Install and running
To build and install the application, follow the Build example instructions.
Screenshots
Use cases
The application demonstrates local speech-to-text functionality:
Audio Capture
- Recording speech from the device microphone with one button
Offline Recognition
- Running local GigaAM RNNT inference through ONNX Runtime without network access
Result Display
- Showing recognition status, errors, and final transcript on a single screen
Bundled Models
- Shipping encoder, decoder, joint, and vocabulary files inside the application package
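The offline recognition step combines the bundled encoder, decoder, and joint models. A common way to turn RNNT outputs into tokens is a greedy loop, and the sketch below illustrates that idea in plain C++. It is not taken from the project sources: the names Frame, Logits, JointFn, and greedyDecode, as well as the maxSymbolsPerFrame limit, are hypothetical, the ONNX Runtime calls are replaced by a callback, and whether the application decodes exactly this way is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical types for illustration; not the project's actual API.
using Frame = std::vector<float>;   // one encoder output frame
using Logits = std::vector<float>;  // scores over the vocabulary plus blank

// Joint step: given an encoder frame and the last emitted token, return logits.
// A real RNNT prediction network also carries recurrent state; here it is
// reduced to the last token to keep the sketch short (an assumption).
using JointFn = std::function<Logits(const Frame&, int lastToken)>;

// Index of the highest-scoring entry.
inline int argmax(const Logits& logits) {
    assert(!logits.empty());
    return static_cast<int>(std::distance(
        logits.begin(), std::max_element(logits.begin(), logits.end())));
}

// Greedy RNNT decoding: for every encoder frame, keep emitting the best token
// until the joint step predicts blank or the per-frame limit is reached.
inline std::vector<int> greedyDecode(const std::vector<Frame>& frames,
                                     const JointFn& joint,
                                     int blankId,
                                     int maxSymbolsPerFrame = 10) {
    std::vector<int> tokens;
    int lastToken = blankId;  // the start of the sequence is represented by blank
    for (const Frame& frame : frames) {
        for (int n = 0; n < maxSymbolsPerFrame; ++n) {
            const int best = argmax(joint(frame, lastToken));
            if (best == blankId) {
                break;  // blank: advance to the next encoder frame
            }
            tokens.push_back(best);
            lastToken = best;
        }
    }
    return tokens;
}
```

The returned token indices would then be mapped to text through the vocabulary file. The per-frame limit guards against a joint step that never predicts blank.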
Project Structure
The project has a standard structure of an application based on C++ and QML for Aurora OS with offline speech recognition capabilities.
- CMakeLists.txt file describes the project structure for the CMake build system and ONNX Runtime integration.
- conanfile.py file declares the Conan dependency on onnxruntime.
- icons directory contains the application icons for different screen resolutions.
- qml directory contains the QML source code and the UI resources.
  - cover directory contains the application cover implementations.
  - icons directory contains the additional custom UI icons.
  - pages directory contains the application pages.
  - SpeechToText.qml file provides the application window implementation.
- rpm directory contains the rpm-package build settings.
  - ru.auroraos.SpeechToText.spec file is used by rpmbuild tool.
- src directory contains the C++ source code with audio capture and ONNX-based recognition.
  - speechtotextrunner.h and speechtotextrunner.cpp expose the main QML-facing API and coordinate model loading, audio capture, and recognition state.
  - asrworker.h and asrworker.cpp load the bundled model and run transcription in a worker thread.
  - rnntrecognizer.h and rnntrecognizer.cpp implement GigaAM RNNT preprocessing and ONNX Runtime inference.
  - audiobuffer.h contains the in-memory audio buffer structure for preprocessing.
  - main.cpp is the application entry point and registers SpeechToTextRunner in the QML context.
- translations directory contains the UI translation files.
- models directory contains bundled GigaAM RNNT model files and the accompanying LICENSE.
- ru.auroraos.SpeechToText.desktop file defines the display and parameters for launching the application.
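To give an idea of what an in-memory audio buffer for preprocessing might look like, here is a minimal sketch in plain C++. It is not the contents of audiobuffer.h: the AudioBuffer fields, the fromPcm16 helper, and the 16 kHz default sample rate are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical buffer: mono samples as floats plus the sample rate.
struct AudioBuffer {
    std::vector<float> samples;
    int sampleRate = 16000;  // assumed default, not confirmed by the project

    // Length of the recording in seconds.
    double durationSeconds() const {
        return sampleRate > 0
                   ? static_cast<double>(samples.size()) / sampleRate
                   : 0.0;
    }
};

// Convert signed 16-bit PCM samples to floats in the range [-1, 1).
inline AudioBuffer fromPcm16(const std::vector<std::int16_t>& pcm,
                             int sampleRate) {
    assert(sampleRate > 0);
    AudioBuffer buffer;
    buffer.sampleRate = sampleRate;
    buffer.samples.reserve(pcm.size());
    for (std::int16_t sample : pcm) {
        buffer.samples.push_back(static_cast<float>(sample) / 32768.0f);
    }
    return buffer;
}
```

Normalizing to floats up front keeps later feature extraction independent of the capture format.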
Terms of Use and Participation in Development
The source code of the project is provided under a license that allows its use in third-party applications.
To participate in the development of the project, please read the member agreement. If you plan to submit your own source code for inclusion in the project, you will need to accept the CLA terms and conditions.
Participant information is listed in the AUTHORS file.
The Code of Conduct is the current set of rules of the Open Mobile Platform Company. It sets out the expectations for how community members interact when communicating and working on projects.
