Apple's tight integration of POSIX-compliant file paths with a command-line interface, along with its historically strong hardware and manufacturing standards, has kept me on the Mac platform for years. However, Apple's recent lackluster focus on macOS and its hardware has disappointed me, and a Lenovo X1 Yoga (2nd generation) has caught my attention.
Switching operating systems, however, raises two concerns that seem insurmountable. This post focuses on one of them: text-to-speech (TTS) integration in the OS.
I've been through the Microsoft Narrator documentation and found it unhelpful. Granted, my use case isn't related to being visually impaired. One of my use cases is having Narrator read only selected text, as I outline below. For example, the questioner in this 2012 SuperUser post has the same issue, and no satisfactory answer was provided.
I also want to emphasize that 'copy and paste into a third-party TTS application' is unsatisfactory. On my Mac, for my #1 scenario below, I can provide an input and get an MP3 TTS file with no user intervention in between. I do this using only open-source tools, except for the 'say' command.
I've long taken advantage of Mac's Text-to-Speech integration. I use it in three specific ways, and a combination of them covers 90% of my use cases.
1. Converting reformatted text from emails that I wish to have read to me at a later time.
   - My current Mac workflow: I copy the source from my email and run a vim script that removes the HTML, leaving only the text I wish to have read. For example, this script inserts a 'silence' command, [slnc 2000], that helps me identify paragraph breaks when I listen to the read text.
   - After the text markup is complete, I pass the formatted text through the 'say' command, which creates an AIFF file of the text-to-speech output.
   - Using lame, I then convert this to an MP3, and using dropcaster, I push the MP3s to a static public location where my podcast client can retrieve them.
   - Thanks to bash scripts, the above takes 5 seconds of my time. The last time I switched from Mac to Windows, I dearly missed having this. I used ReadAloud's TTS software in the past, but it was always more kludgy than the above.
2. Live proofreading of emails or documents I'm creating. I find errors more easily when I have my Mac read my written text back to me.
   - Yes, I can copy and paste into Notepad, but that's clumsy. Looking at Narrator's interface, I found it very difficult to figure out how to get Narrator to read selected text across applications, e.g., Outlook, Firefox, Word, and so forth.
3. Using TTS to read selected browser text from long articles I wish to hear while I perform tasks that don't demand my attention.
   - This is similar to #2; however, if the text captures my attention, I might decide it's worth creating a podcast file and shift to the #1 process.
   - Firefox has a 'reader' mode, which largely helps and works well under Windows.
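The #1 workflow above (strip HTML, mark paragraph pauses, render with 'say', encode with lame) can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's bash or vim scripts: it uses the standard library's `html.parser` to strip tags, writes the silence marker in the double-bracket form `[[slnc 2000]]` that macOS uses for embedded speech commands, and builds the `say -f <text> -o <aiff>` and `lame <aiff> <mp3>` command lines. The function and file names are hypothetical, and the dropcaster publishing step is left out.

```python
from __future__ import annotations

import re
import subprocess
from html.parser import HTMLParser
from pathlib import Path

# Paragraph separator for `say`. The post shows [slnc 2000]; macOS embedded
# speech commands are written in double brackets, so that form is assumed here.
SILENCE = "[[slnc 2000]]"


class _TextExtractor(HTMLParser):
    """Collect visible text; block-level tags become paragraph breaks."""

    BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "tr"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &nbsp; etc.
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if not self._skip_depth:  # ignore CSS and JavaScript bodies
            self.parts.append(data)


def html_to_speech_text(source: str, silence: str = SILENCE) -> str:
    """Strip HTML and join the remaining paragraphs with a silence marker."""
    parser = _TextExtractor()
    parser.feed(source)
    parser.close()
    raw = "".join(parser.parts)
    # Split on blank lines and collapse whitespace inside each paragraph.
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", raw)]
    return f"\n{silence}\n".join(p for p in paragraphs if p)


def build_commands(text_file: Path, aiff_file: Path, mp3_file: Path) -> list[list[str]]:
    """`say` renders the text file to AIFF; `lame` encodes the AIFF to MP3."""
    return [
        ["say", "-f", str(text_file), "-o", str(aiff_file)],
        ["lame", str(aiff_file), str(mp3_file)],
    ]


def run_pipeline(html_source: str, workdir: Path, name: str = "email") -> Path:
    """Run the whole chain (macOS with lame installed); returns the MP3 path."""
    workdir.mkdir(parents=True, exist_ok=True)
    text_file = workdir / f"{name}.txt"
    text_file.write_text(html_to_speech_text(html_source), encoding="utf-8")
    aiff_file, mp3_file = workdir / f"{name}.aiff", workdir / f"{name}.mp3"
    for cmd in build_commands(text_file, aiff_file, mp3_file):
        subprocess.run(cmd, check=True)
    return mp3_file
```

Keeping the text cleanup and the command construction as pure functions means only `run_pipeline` depends on macOS; the same cleanup step could feed a Windows TTS command instead of `say`.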
My questions are:
- Is there an equivalent way on Windows 10 to pass a formatted text file to an MS binary for processing, similar to the 'say' command on Mac? I see Docker images that are TTS-specific, though that seems even more kludgy.
- What is the native way to have Windows 10 Narrator read selected text as straightforwardly as on a Mac: select text in any application, press a keyboard shortcut, and have Windows 10 speak it?
I'm open to the possibility that there are different but similar ways to do the above. 'Copy and paste into Notepad', however, is a kludge as well. I'm hoping Microsoft did its accessibility homework and deployment as well as Apple has.
Some notes to self as I continue to explore this question:
- There are several Python packages that enable TTS within a Python script. At first this looked promising, but the Python methods outlined here have several fatal issues: https://pythonprogramminglanguage.com/text-to-speech/
- I had problems installing pyttsx. I have Homebrew-installed Python 2.7.13 and 3.6.1, and using either pip3 or pip, I was unable to install either version successfully. The original pyttsx is Python 2 only, with a fork for Python 3. This is too bad, as the module is designed to use the native TTS engine. If pyttsx worked on Python 3 and the project were more active, I'd be more willing to troubleshoot the module's failure. You can read my comments on a proposed answer here.
- pyTTS uses Google TTS. This sounds good, but it necessarily requires an internet connection. Since I want to match native TTS capability, this rules the option out.
- There is a Docker option, https://github.com/parente/espeakbox, which works great, but its voice is where TTS was six or more years ago. While I respect the author's goal of creating a performant TTS engine, I love Mac's native TTS and would like to be on par with it.
- Playing with other non-native TTS options, such as Merlin or Festival, the TTS quality is not on par with Mac or Windows native TTS.
- As per Lưu Vĩnh Phúc's suggestion, it does appear easy to automate native Windows TTS, as shown on this page: https://www.pdq.com/blog/powershell-text-to-speech-examples/. I'm a step closer to a solution.
Answer
MS Office supported text-to-speech long before it was integrated into Windows (since Vista). As a result, you can always open MS Word and have it read the document for you. Just add the Speak button to the ribbon or Quick Access Toolbar, then select the text and click it, or assign a keyboard shortcut to the Speak feature.
Narrator also supports this feature; you just have to check its keyboard shortcut list.
Windows 10 also supports Scan Mode to help you navigate faster. It can be toggled with Caps Lock + Spacebar.
However, Narrator doesn't work well with MS Office, so you need to copy the text to an external application. This can be done with AutoHotkey: it needs to copy the selected text and feed it to a VBS script.
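A hypothetical stand-in for such a VBS script, generated from Python, is sketched below. It assumes the AutoHotkey hotkey (not shown) copies the selection and hands the text to `prepare_speech`; the script then speaks it through the `SAPI.SpVoice` COM object via `cscript`. The file names and helper name are illustrative only. The text is written as UTF-16 because `OpenTextFile(..., 1, False, -1)` opens the file in Unicode mode.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical VBS helper: read a UTF-16 text file (format -1 = TristateTrue)
# named on the command line and speak it with the default SAPI voice.
SPEAK_VBS = """\
Set fso = CreateObject("Scripting.FileSystemObject")
Set f = fso.OpenTextFile(WScript.Arguments(0), 1, False, -1)
text = f.ReadAll
f.Close
CreateObject("SAPI.SpVoice").Speak text
"""


def prepare_speech(selected_text: str, workdir: Path) -> list[str]:
    """Write the VBS helper and the selected text; return the cscript command.

    Running the returned command (Windows only) speaks the text aloud.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    script = workdir / "speak.vbs"
    text_file = workdir / "selection.txt"
    script.write_text(SPEAK_VBS, encoding="ascii")
    # UTF-16 with a BOM matches the Unicode mode requested in OpenTextFile.
    text_file.write_text(selected_text, encoding="utf-16")
    return ["cscript", "//nologo", str(script), str(text_file)]
```

Passing the text through a file rather than a command-line argument avoids quoting problems with long or punctuation-heavy selections.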
I don't think reading a web page is any different from reading plain text, but check "How to use Narrator for reading the content of web pages?"
Some other TTS applications for Windows can be found here.
The spoken output can be recorded with plenty of existing software. If you don't want to hear it and just need to save the output to a file, use stream-mixing software such as GraphStudioNext (included in the K-Lite Codec Pack) to redirect the output stream to a file, converting it to MP3 before that step if needed.
All of the above can be automated with a script. Forget batch files: PowerShell is very powerful and can do anything Bash can. It can strip formatting from text and edit it, so there's no need for the vim script, though there's also vim for Windows. If needed, you can always install Bash on Windows (WSL) or Cygwin. GUI automation can also be done with AutoHotkey.
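For the paragraph pauses the vim script inserts on the Mac, the SSML `<break>` element plays the same role on Windows, and `SpeechSynthesizer.SpeakSsml` accepts SSML instead of plain text. A minimal sketch, assuming the text has already been stripped and split into paragraphs; the function name and default values are hypothetical.

```python
from __future__ import annotations

from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def paragraphs_to_ssml(paragraphs: list[str], pause_ms: int = 2000,
                       lang: str = "en-US") -> str:
    """Wrap each non-empty paragraph in <p> with a <break> between them."""
    pause = f'<break time="{pause_ms}ms"/>'
    # escape() turns &, < and > into entities so the result stays valid XML.
    body = pause.join(
        f"<p>{escape(p.strip())}</p>" for p in paragraphs if p.strip()
    )
    return f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="{lang}">{body}</speak>'
```

The resulting string can be saved to a file and passed to `SpeakSsml` in place of `Speak`, giving a two-second pause between paragraphs much like `[slnc 2000]` does with 'say'.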
phuclv