Relevant links for the Brille project
Conceptual
- Sensory substitution on wikipedia
- Acoustic Models description from VoxForge
- Speech Terms glossary
People
- David Huggins-Daines - Sphinx developer
Libraries and Tools
Sphinx-based systems
- CMUSphinx - for more information see CMUSphinx wiki page
- PocketSphinx
- Julius Large Vocabulary CSR Engine based on Sphinx (and source browser with diagrams)
- Sphinx Open Discussion forum
- Sphinx Help forum
Docs
- documentation of Sphinx-2 codebooks, senones and HMM formats and processes
- Sphinx2 phoneset
- Sphinx2 allphone API
- PocketSphinx and SphinxBase doxygen docs
C/C++ based feature analysis
Snack
A multi-platform real-time sound acquisition package (C++) that can perform formant frequency and pitch analysis.
Snack Docs
CLAM
This seems like quite an advanced package.
It includes Python bindings:
Somebody has recently added a formant frequency analysis module:
SPRACH
Neural network (connectionist) speech recognition tools and speech feature extraction tools. Source code is available, with the core and GUI code kept separate. It covers neural networks, feature calculation, sound/audio interfaces, conversion, etc.
- The SPRACH project summary - hybrid speech recognition in 3 languages, including phoneme probability classification
- The SPRACH core library - separate GUI and internal code.
- FAQ on feature recognition tools and descriptions of features that are extracted.
- An overview of the tools, FAQ, and description of methodology
The structure of the neural net system is rather simple. Whereas Gaussian mixture systems typically rely on a rather fine sub-phonetic state division, and in consequence have complicated state-tying infrastructure to maximise training efficiency, a connectionist acoustic model can be a single neural net with a few tens of outputs, each of which has a direct interpretation as a particular phone.
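The quoted description (one net, a few tens of outputs, each read directly as a phone) can be made concrete with a minimal NumPy sketch. This is not SPRACH code: the phone set, layer sizes, feature dimension and the untrained random weights are all hypothetical, chosen only to show the shape of such a model.

```python
import numpy as np

# Hypothetical, tiny phone set; a real system would have a few tens of phones.
PHONES = ["sil", "aa", "iy", "s", "t"]


def init_net(n_inputs, n_hidden, n_phones, seed=0):
    """Create random (untrained) weights for a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_phones)),
        "b2": np.zeros(n_phones),
    }


def phone_posteriors(net, frames):
    """Map feature frames (n_frames x n_inputs) to per-frame phone posteriors."""
    hidden = np.tanh(frames @ net["W1"] + net["b1"])
    logits = hidden @ net["W2"] + net["b2"]
    # Softmax over the phone outputs, shifted for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Assumed input: 13 features per frame (for example cepstral coefficients).
    net = init_net(n_inputs=13, n_hidden=32, n_phones=len(PHONES))
    frames = np.random.default_rng(1).normal(size=(3, 13))
    posteriors = phone_posteriors(net, frames)
    for row in posteriors:
        print(PHONES[int(np.argmax(row))], row.round(3))
```

Each output row is a probability distribution over the phone set, which is what gives every output its direct interpretation as a particular phone. With untrained weights the printed labels are meaningless.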
SFS
Speech Filing System (SFS) - unfortunately not open source, although the source code is available. The key winner:
- formant estimates from speech waveforms in a simple tool - man page
- download page and FTP site - the key source code is in win32
ISIP Automatic Speech Recognition library
This toolkit (http://www.isip.piconepress.com/projects/speech/index.html) is a freely available, modular, state-of-the-art speech recognition system that can be easily modified to suit your research needs. It is built on top of a vast hierarchy of general purpose C++ classes that implement generic math, data structure, and signal processing concepts.
CSLU toolkit
The CSLU Toolkit was created to provide the basic framework and tools for people to build, investigate and use interactive language systems. These systems incorporate leading-edge speech recognition, natural language understanding, speech synthesis and facial animation technologies. The toolkit provides a comprehensive, powerful and flexible environment for building interactive language systems that use these technologies, and for conducting research to improve them.
Python based formant frequency analysis
ESPS
Kyle Gorman has developed a Python interface to Praat. Features include:
- F_0 analysis
- Signal intensity
- Spectral slices
- Formant analysis
Links:
ESPS requirements
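For orientation, formant analysis of the kind listed above is commonly done with linear prediction (LPC): fit an all-pole filter to a short frame and read resonance frequencies from the angles of its poles. The following is a minimal NumPy sketch of that general technique, not the method used by Praat, ESPS or the interface above. All function names are hypothetical, and the test signal is synthetic.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC; returns [1, a1, ..., ap]."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations: R a = -r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))


def lpc_formants(frame, sample_rate, order, apply_window=True):
    """Estimate resonance frequencies (Hz) from the LPC pole angles."""
    frame = np.asarray(frame, dtype=float)
    if apply_window:
        frame = frame * np.hamming(len(frame))
    roots = np.roots(lpc_coefficients(frame, order))
    roots = roots[np.imag(roots) > 0]  # keep one pole of each conjugate pair
    return np.sort(np.angle(roots) * sample_rate / (2 * np.pi))


def synth_impulse_response(freqs_hz, radius, sample_rate, n):
    """Impulse response of an all-pole filter with the given resonances."""
    poles = [radius * np.exp(2j * np.pi * f / sample_rate) for f in freqs_hz]
    a = np.real(np.poly(poles + [np.conj(p) for p in poles]))
    h = np.zeros(n)
    for i in range(n):
        acc = 1.0 if i == 0 else 0.0
        for k in range(1, len(a)):
            if i - k >= 0:
                acc -= a[k] * h[i - k]
        h[i] = acc
    return h


if __name__ == "__main__":
    # Synthetic signal with resonances at 700 Hz and 1200 Hz (assumed values).
    signal = synth_impulse_response([700.0, 1200.0], 0.97, 8000, 2000)
    print(lpc_formants(signal, 8000, order=4, apply_window=False))
```

On real speech a frame would normally be pre-emphasised and windowed, the order chosen relative to the sample rate, and poles with very wide bandwidth discarded. Those refinements are left out here to keep the sketch short.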
Other systems
- Edinburgh speech tools and manual
- EMU Speech Database System and manual including plots of formant analysis of vowels
- davidf's Brille links
Papers
Many of these papers are directly available here
Phoneme Extraction
- Wavelet Based Feature Extraction for Phoneme Recognition
- Speech Perception Using Real-Time Phoneme Detection: The BeBe System (unfortunately no code seems available)
- An approach to obtain weighted graphs of words based on phoneme detection
Formant Frequency Estimation
- Formants - good article describing what formants are and the different concepts involved
- Helpful paper on some of the background maths
- Robust Formant Tracking for Continuous Speech with Speaker Variability
- Notes on acoustics of vowel production and formant frequencies
- Adaptive control of vowel formant frequency - Evidence from real-time formant manipulation (changing what people hear of their own speech and how it affects their speech)
- YIN, a fundamental frequency estimator for speech and music
- Speech formant frequency and pitch estimation using instantaneous complex frequency
Related projects
- VoxForge - freely licensed audio of speech with transcriptions
