Translation services/strategies/costs...

On 23/12/2021 00:21, Don Y wrote:
On 12/22/2021 10:15 AM, Jan Panteltje wrote:
On a sunny day (Wed, 22 Dec 2021 09:30:43 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <spvjrv$tj1$1@dont-email.me>:

It would be a tough call to determine if American English had evolved
more OR LESS than the original British.  I've read that American English
is, in many ways, truer to its British roots than modern British English.

Pronunciations also evolve, over time.  As well as speech patterns.

E.g., I was taught "the" should be pronounced as "thee" when preceding
a word beginning with a vowel sound:  "Thee English", "Thee other guy"
but with a schwa ahead of a consonant:  "The next one", "the Frenchman".
This seems to no longer be the norm.
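That taught rule can be sketched as a toy filter.  This is my illustration, not anyone's production code, and it keys on vowel *letters* where a real synthesizer needs vowel *sounds* (so "hour" and "university" would both be misjudged here):

```shell
#!/bin/sh
# Toy sketch of the taught rule: "thee" before a vowel sound, schwa-"the"
# before a consonant.  "Vowel sound" is approximated by "vowel letter",
# which is exactly where words like "hour" or "university" go wrong.
for next in English other next Frenchman; do
    case $next in
        [AEIOUaeiou]*) echo "thee $next" ;;
        *)             echo "the $next" ;;
    esac
done
```

The example phrases above come out as "thee English", "thee other", "the next", "the Frenchman".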

[You're interested in these sorts of things when you design a
speech synthesizer; the different "wh" sounds, etc.]

[snip]
I find the quality better than other things I have tried.

All Linux of course

There are lots of synthesizers out there -- FOSS as well as commercial.
But, those that run on a PC tend to be bloated implementations -- large
dictionaries, unit databases, etc.  And, require a fair bit of CPU
to deliver speech in real-time.  If you're trying to run in a small
footprint consuming very little "energy" (think tiny battery), there
really isn't much choice -- esp if you want to be able to tweak the voice
to suit the listeners' preferences (with unconstrained vocabulary).

And, all suffer from requiring some level of smarts at the application
level.  Feed it "Blue orange dog cat run" or "Mr Mxyzptlk" or even
something as bland as "abcdefghijklmnopqrstuvwxyz" and they yield
results that are unfathomable -- without *looking* at the source text
to try to suss-out what they are *trying* to say.

Place names with irregular pronunciation, or containing words that
synthesizers think they know, tend to catch out even the most
sophisticated voice synthesizers.

Alexa can't manage, for example, Tyne & Wear (tine and weir); dialect
Chop Gate (chop yat) and Cholmondeley (Chumlee) catch out most
non-native English speakers, in fact most non-locals. For that reason the
latter was a location for sensitive military intelligence during WWII.

https://en.wikipedia.org/wiki/Cholmondeley,_Cheshire#Cholmondeley_Castle_and_Park

--
Regards,
Martin Brown
 
On 23/12/2021 01:27, Don Y wrote:
On 12/22/2021 1:20 PM, Joe Gwinn wrote:
On Wed, 22 Dec 2021 01:02:44 -0700, Don Y
blockedofcourse@foo.invalid> wrote:

I'm looking for folks who've first-hand experience having
documents translated into foreign languages.  Said documents
to include diagrams (think: callouts, legends), #included
text, etc.

I've a fair bit of experience with I18N/L10N for software
but the extent of the effort, there, is usually fairly limited.
And, there's less of a need for a cohesive approach as the
interactions are "punctuated" (no pun intended).

Recommendations for firms to do this?  (no, finding multilingual
"friends" to do same is far too unprofessional -- though they may
have value in proofing the results)  I suspect there is some
value in having a single firm handle all of the translations
(in the hope that they will create a consistent SET of
translations, even if different individuals are involved for
each)

Relative effort?  (i.e., closer to reading speed or writing speed?)

Writing speed. Fluency in the technical domain, plus native fluency in
the target language, are both necessary.

So, you are assuming there is no learning curve for the material?
Or, that the original author is conveniently available (and
communicative with translator) to resolve those issues as
they manifest?

Time frame? (is this effort-bound or business-bound)

Cost?  (and, "unit of measure"?)

Slow and expensive.

But what is the unit of measure?  Page?  Job?  How does it
scale?  (e.g., if you bundle two 50 page documents together,
do you see a better price than if kept separate?  Or, vs.
a 100pp document?)

There is a fixed cost per job plus a rate per page or per thousand
words, depending on who you go to.
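That pricing model makes the bundling arithmetic easy to sketch; the figures below are invented, not a real quote:

```shell
#!/bin/sh
# Hypothetical numbers: a fixed per-job fee plus a per-page rate.
# Two 50pp documents as separate jobs vs. bundled into one 100pp job:
# the per-page part is identical, so bundling saves one per-job fee.
awk 'BEGIN {
    fee = 100; rate = 20                 # invented figures
    separate = 2*fee + rate*(50 + 50)    # two jobs, 100 pages total
    bundled  = 1*fee + rate*100          # one job, same 100 pages
    printf "separate: %d  bundled: %d  saving: %d\n",
           separate, bundled, separate - bundled
}'
```

With these made-up figures the saving is exactly the second job's fixed fee.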
Finally, how to check the translation for accuracy and "feel"
(i.e., ensuring it is true to the original intent)?

Always need a proof reader and a tech editor in the target language;
need not be capable of translation.

So, you have to ensure both the translator and the proofreader
comprehend the material (and presentation).

The translator doesn't necessarily need to fully understand the
technical stuff provided they can interact with someone who does.

With translations in hand, do you (thereafter) maintain
individual documents?  Or, merge them into a conditional
document?
  Same as for the original document, but in versions.  With luck, the
drawings are in common.

So, you're suggesting *different* documents (for each translation)?

Absolutely. There are horror stories of incompetent global edits being
made to documents containing hybrid mixed languages. Word collisions in
different languages are rare but not rare enough.

Have a script to merge them prior to publication/typesetting.
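Such a merge step can be sketched in a few lines of shell; the directory layout and file names here are hypothetical, and the fixture files exist only so the sketch runs:

```shell
#!/bin/sh
# Sketch only: assemble one per-language document from shared and
# translated fragments just before typesetting.  Hypothetical layout:
#   common/<section>.head     -- shared material (drawings, callouts)
#   text/<lang>/<section>.txt -- the translated body text
set -e
mkdir -p common text/en                               # tiny fixture
printf 'Figure 1: wiring diagram\n' > common/intro.head
printf 'Connect the red lead first.\n' > text/en/intro.txt

lang=en
out="manual.$lang.txt"
: > "$out"
for section in intro; do
    cat "common/$section.head" "text/$lang/$section.txt" >> "$out"
done
cat "$out"
```

The same loop run with lang=sv, lang=da, etc. produces each language's document from the one set of shared drawings.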

Horror stories of attempts gone horribly wrong (i.e., what to
avoid)?

Well, you guessed it -- what had happened is that Swedish pronouns
were all directly 2:1 mapped to the corresponding English pronoun,
without recasting the sentences to remove the now massive ambiguities.

So, this is a failure on the part of the translator(s).
And, likely, an "amateurish" one.

Some languages have a lot more ambiguity than English and some are more
precise with specific orders for words in a sentence.

My advice to the President of the Danish firm was to have his
engineers write the first draft in Danish, and hire a tech editor
whose native language is English to make the translation and perform
the cleanup.  The tech writer was allowed to question the engineers
until the editor understood, so the editor in effect stood in for the
English-speaking customer audience.  This was done.  I did a full
tech-edit scan of the result, and it read very well, and was perfectly
clear.  Only needed to fix one usage problem.  It still was not large
enough to fully describe that product, but still this was great
progress.

I had an experience with a Japanese firm where the Japanese (vendor)
would simply (apparently!) update their existing documentation to reflect
my needs.  This didn't instill confidence -- are they really changing
the product to meet those tighter specs?  Or, just *claiming* to?

The Japanese vendor may well have tightened the specification to meet
what you had asked for or not. Hard to tell from your description. My
boss could never get his head round the fact that in Japanese
negotiations "yes" means little more than "I hear what you say".

And if you were unable to measure the difference between the product
before and after they "Improved" it then I think they have a point.

--
Regards,
Martin Brown
 
On 22/12/2021 21:30, Don Y wrote:
On 12/22/2021 1:57 PM, Martin Brown wrote:
On 22/12/2021 20:21, Don Y wrote:

My wife's name contains phonemes that are all but impossible in
Japanese, and her transliterated name overflowed the field the
bank card allowed.

Kalahari?  :

Wi and Fo, neither of which exists among the Japanese phonemes.
Closest, and not very close, are Ui and Fuo.

You don't have to veer far from the European languages to find
gotchas in translations, odd (mis?)spellings, etc.

"Preservative" will raise eyebrows in French culture ("préservatif"
means condom).

Much like asking your American secretary if she has a rubber
(pencil eraser in British English).

--
Regards,
Martin Brown
 
On 12/23/2021 2:39 AM, Martin Brown wrote:
There are lots of synthesizers out there -- FOSS as well as commercial.
But, those that run on a PC tend to be bloated implementations -- large
dictionaries, unit databases, etc. And, require a fair bit of CPU
to deliver speech in real-time. If you're trying to run in a small
footprint consuming very little "energy" (think tiny battery), there
really isn't much choice -- esp if you want to be able to tweak the voice
to suit the listeners' preferences (with unconstrained vocabulary).

And, all suffer from requiring some level of smarts at the application
level. Feed it "Blue orange dog cat run" or "Mr Mxyzptlk" or even
something as bland as "abcdefghijklmnopqrstuvwxyz" and they yield
results that are unfathomable -- without *looking* at the source text
to try to suss-out what they are *trying* to say.

Place names with irregular pronunciation, or containing words that synthesizers
think they know, tend to catch out even the most sophisticated voice synthesizers.

Yes, but all bets are off with place names, proper nouns, etc. as
they have too many cross-language/cultural issues. I have relatives whose
*first* names I couldn't begin to spell!

The bigger problem is that "text" tends to have lots of assumptions
as to the intended reader. You'd likely have no problem sussing out:
Service temporarily suspended 12Dec2021. Please use bkupsrvr.dom.org.
Contact Dr Frank N. Stein at x3-2001 or his sec'y at galfriday@here.com
And, this is a relatively trivial "message".

But, a synthesizer needs lots more information to convey this in a
manner that doesn't have you pressing the "please-spell-for-me-that-which
-you-are-trying-to-pronounce" button.

And, trying to convey punctuation is a major chore (beyond *speaking*
each symbol).

As a result, the application has to integrate with the synthesizer
instead of treating it as a \"bolt on\" output modality.

[And, let's not even try to address spelling errors! Should the synthesizer
(or, some middleware?) try to determine the *intended* word and speak
that, instead?]
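A cheap form of that middleware is a vocabulary pre-pass that flags anything likely to be mangled before it reaches the synthesizer; the word list and the misspelled input below are made up for illustration:

```shell
#!/bin/sh
# Sketch only: flag out-of-vocabulary words so middleware (or a human)
# can decide what was *intended* before the synthesizer mangles them.
# The vocabulary and the input sentence are invented.
printf '%s\n' blue orange dog cat run > vocab.txt
for w in Blue orange dgo cat run; do
    lw=$(printf '%s' "$w" | tr 'A-Z' 'a-z')      # case-fold before lookup
    grep -qx "$lw" vocab.txt || echo "spell out or guess: $w"
done
```

Here only "dgo" gets flagged; a smarter pass might then pick the closest dictionary word ("dog") as the guess.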

Alexa can't manage, for example, Tyne & Wear (tine and weir); dialect
Chop Gate (chop yat) and Cholmondeley (Chumlee) catch out most non-native
English speakers, in fact most non-locals. For that reason the latter was a
location for sensitive military intelligence during WWII.

Worcester (WUSS-ter), Billerica (bill-RICK-a), Berlin (BURR-lin, not burr-LIN),
etc. Or, words that folks often mispronounce (almond, salmon).

I can identify folks who are from my home *town* (not \"state\"!) by their
speech habits -- highly localized.

A neighbor claimed her first name to be "Lara" -- though she spelled it
L-A-U-R-A ("Isn't that Laura??").

[BTW, I'm still waiting for a pointer to the code you want compiled...]
 
On 2021-12-23 10:39, Martin Brown wrote:
[...]
Cholmondeley (Chumlee) catch out most
non-native English speakers in fact most non-locals. [...]

English is well known for its complete disconnect between
pronunciation and spelling, but this is ridiculous.

Jeroen Belleman
 
On a sunny day (Wed, 22 Dec 2021 17:21:06 -0700) it happened Don Y
<blockedofcourse@foo.invalid> wrote in <sq0fdu$gjc$1@dont-email.me>:

On 12/22/2021 10:15 AM, Jan Panteltje wrote:
On a sunny day (Wed, 22 Dec 2021 09:30:43 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <spvjrv$tj1$1@dont-email.me>:

It would be a tough call to determine if American English had evolved more
OR LESS than the original British. I've read that American English is, in
many ways, truer to its British roots than modern British English.

Pronunciations also evolve, over time. As well as speech patterns.

E.g., I was taught "the" should be pronounced as "thee" when preceding
a word beginning with a vowel sound: "Thee English", "Thee other guy"
but with a schwa ahead of a consonant: "The next one", "the Frenchman".
This seems to no longer be the norm.

[You're interested in these sorts of things when you design a
speech synthesizer; the different "wh" sounds, etc.]

A pretty decent text-to-speech is Google Translate.

This script, called gst2_en on my system, has a female voice speak in English:

#!/bin/bash
say() { local IFS=+; /usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=$*&tl=en"; }
say $*



You call it like this (with your text as example):
gst2_en ">E.g., I was taught "the" should be pronounced as "thee" when preceding"

In the script the &tl=en can be changed for the language you want, so &tl=nl for Dutch and &tl=de for German.

If you want the output to go to an mp3 file then use mplayer -dumpstream in that script.

I find the quality better than other things I have tried.

All Linux of course

There are lots of synthesizers out there -- FOSS as well as commercial.
But, those that run on a PC tend to be bloated implementations -- large
dictionaries, unit databases, etc. And, require a fair bit of CPU
to deliver speech in real-time. If you're trying to run in a small
footprint consuming very little "energy" (think tiny battery), there
really isn't much choice -- esp if you want to be able to tweak the voice
to suit the listeners' preferences (with unconstrained vocabulary).

And, all suffer from requiring some level of smarts at the application
level. Feed it "Blue orange dog cat run" or "Mr Mxyzptlk" or even
something as bland as "abcdefghijklmnopqrstuvwxyz" and they yield
results that are unfathomable -- without *looking* at the source text
to try to suss-out what they are *trying* to say.

Sure
But the advantage of this script is that it uses NO resources on the PC / raspi or whatever
but it does need a net connection, but mp3s are small.

Here is another one using google translate:

#!/bin/bash
echo "english text document to audio or to mp3"
echo "Usage: gst6_en filename.txt [1]"
echo "if second argument present output to mp3 file, one mp3 file per line, else to audio"
input=$1
lines=1
while IFS= read -r line
do
    echo "line $lines"
    if [ "$2" == "" ]
    then
        /usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=$line&tl=en"
    else
        wget -O "${input}_${lines}.mp3" "http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=$line&tl=en"
    fi
    let lines=lines+1
done < "$input"


So this will speak a whole English text file line by line or, if you call it with an extra argument,
make numbered mp3 files from a text file, one per line.
You can then play the numbered mp3 files in [any] sequence with a similar script,
and even edit and add comments by adding extra lines or deleting lines.
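The playback side can be sketched the same way; the file names below follow gst6_en's wget naming but are made up, and echo stands in for mplayer so the sketch runs anywhere:

```shell
#!/bin/sh
# Sketch only: play back gst6_en's numbered files in numeric order.
# A plain ls would sort _10 before _2; sort -t_ -k2 -n fixes that.
touch story.txt_1.mp3 story.txt_2.mp3 story.txt_10.mp3   # fake output files
for f in $(ls story.txt_*.mp3 | sort -t_ -k2 -n); do
    echo "would play: $f"    # substitute mplayer for echo to really play
done
```

This prints the files in 1, 2, 10 order, which is also the order a real playback loop would want.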

Was just a quick hack....

OTOH I have had the 'festival' speech synthesizer on the PC for 20 years or so, not that bad either.
 
On 23/12/21 11:23, Jeroen Belleman wrote:
On 2021-12-23 10:39, Martin Brown wrote:
[...]
Cholmondeley (Chumlee) catch out most
non-native English speakers in fact most non-locals. [...]

English is well known for its complete disconnect between
pronunciation and spelling, but this is ridiculous.

How do you pronounce "invalid"? There are two pronunciations in common
everyday speech!

Then there\'s Featherstonehaugh (Fanshaw), Alnwick (Annick),
Cirencester (Sissiter), Almondsbury (Almsbury),
Leicester (Lester), Congresbury (Coonsbury), and many more.

I suppose I ought to trot out this old chestnut again (complete with
a bang address path!)...


From raymond@pepto-bismol.berkeley.edu Wed Dec 19 09:50:07 1990
Date: Wed, 19 Dec 1990 09:50:07 GMT
Date-Received: Wed, 19 Dec 1990 13:17:32 GMT
Subject: Pronunciation in the English language (How do you pronounce kilometer?)
Message-ID: <1990Dec19.095007.11611@agate.berkeley.edu>
Organization: U.C. Berkeley
Path: otter!hpltoad!hpopd!hplabs!ucbvax!agate!pepto-bismol.berkeley.edu!raymond
Newsgroups: alt.folklore.urban

Subject: English poem

Multi-national personnel at North Atlantic Treaty Organization headquarters
near Paris found English to be an easy language ... until they tried to
pronounce it. To help them discard an array of accents, the verses below
were devised. After trying them, a Frenchman said he'd prefer six months at
hard labor to reading six lines aloud. Try them yourself.

ENGLISH IS TOUGH STUFF
======================

Dearest creature in creation,
Study English pronunciation.
I will teach you in my verse
Sounds like corpse, corps, horse, and worse.
I will keep you, Suzy, busy,
Make your head with heat grow dizzy.
Tear in eye, your dress will tear.
So shall I! Oh hear my prayer.

Just compare heart, beard, and heard,
Dies and diet, lord and word,
Sword and sward, retain and Britain.
(Mind the latter, how it's written.)
Now I surely will not plague you
With such words as plaque and ague.
But be careful how you speak:
Say break and steak, but bleak and streak;
Cloven, oven, how and low,
Script, receipt, show, poem, and toe.

Hear me say, devoid of trickery,
Daughter, laughter, and Terpsichore,
Typhoid, measles, topsails, aisles,
Exiles, similes, and reviles;
Scholar, vicar, and cigar,
Solar, mica, war and far;
One, anemone, Balmoral,
Kitchen, lichen, laundry, laurel;
Gertrude, German, wind and mind,
Scene, Melpomene, mankind.

Billet does not rhyme with ballet,
Bouquet, wallet, mallet, chalet.
Blood and flood are not like food,
Nor is mould like should and would.
Viscous, viscount, load and broad,
Toward, to forward, to reward.
And your pronunciation's OK
When you correctly say croquet,
Rounded, wounded, grieve and sieve,
Friend and fiend, alive and live.

Ivy, privy, famous; clamour
And enamour rhyme with hammer.
River, rival, tomb, bomb, comb,
Doll and roll and some and home.
Stranger does not rhyme with anger,
Neither does devour with clangour.
Souls but foul, haunt but aunt,
Font, front, wont, want, grand, and grant,
Shoes, goes, does. Now first say finger,
And then singer, ginger, linger,
Real, zeal, mauve, gauze, gouge and gauge,
Marriage, foliage, mirage, and age.

Query does not rhyme with very,
Nor does fury sound like bury.
Dost, lost, post and doth, cloth, loth.
Job, nob, bosom, transom, oath.
Though the differences seem little,
We say actual but victual.
Refer does not rhyme with deafer.
Foeffer does, and zephyr, heifer.
Mint, pint, senate and sedate;
Dull, bull, and George ate late.
Scenic, Arabic, Pacific,
Science, conscience, scientific.

Liberty, library, heave and heaven,
Rachel, ache, moustache, eleven.
We say hallowed, but allowed,
People, leopard, towed, but vowed.
Mark the differences, moreover,
Between mover, cover, clover;
Leeches, breeches, wise, precise,
Chalice, but police and lice;
Camel, constable, unstable,
Principle, disciple, label.

Petal, panel, and canal,
Wait, surprise, plait, promise, pal.
Worm and storm, chaise, chaos, chair,
Senator, spectator, mayor.
Tour, but our and succour, four.
Gas, alas, and Arkansas.
Sea, idea, Korea, area,
Psalm, Maria, but malaria.
Youth, south, southern, cleanse and clean.
Doctrine, turpentine, marine.

Compare alien with Italian,
Dandelion and battalion.
Sally with ally, yea, ye,
Eye, I, ay, aye, whey, and key.
Say aver, but ever, fever,
Neither, leisure, skein, deceiver.
Heron, granary, canary.
Crevice and device and aerie.

Face, but preface, not efface.
Phlegm, phlegmatic, ass, glass, bass.
Large, but target, gin, give, verging,
Ought, out, joust and scour, scourging.
Ear, but earn and wear and tear
Do not rhyme with here but ere.
Seven is right, but so is even,
Hyphen, roughen, nephew Stephen,
Monkey, donkey, Turk and jerk,
Ask, grasp, wasp, and cork and work.

Pronunciation -- think of Psyche!
Is a paling stout and spikey?
Won't it make you lose your wits,
Writing groats and saying grits?
It's a dark abyss or tunnel:
Strewn with stones, stowed, solace, gunwale,
Islington and Isle of Wight,
Housewife, verdict and indict.

Finally, which rhymes with enough --
Though, through, plough, or dough, or cough?
Hiccough has the sound of cup.
My advice is to give up!!!


-- Author Unknown
 
On 23/12/2021 12:23, Jeroen Belleman wrote:
On 2021-12-23 10:39, Martin Brown wrote:
[...]
Cholmondeley (Chumlee) catch out most
non-native English speakers in fact most non-locals. [...]

English is well known for its complete disconnect between
pronunciation and spelling, but this is ridiculous.

It is not a "complete disconnect" - not by a long way. Despite some of
the common oddities of spelling in English, and some particularly
unusual cases, there are far worse languages. Look at verb endings in
French - many different spellings have different meanings, but are
pronounced the same. Mongolian and Gaelic have a very much bigger
separation between the phonetic values of the written spellings and the
actual pronunciation.

And of course, there are regional dialects and local words and names
that are often very different. These cause plenty of trouble for speech
recognition systems:

<https://www.youtube.com/watch?v=NMS2VnDveP8>


I love watching Kevin Bridges on YouTube and watching the auto-generated
captions completely fail to interpret his perfectly clear English with a
Scottish accent.
 
On 12/23/2021 2:52 AM, Martin Brown wrote:
I had an experience with a Japanese firm where the Japanese (vendor)
would simply (apparently!) update their existing documentation to reflect
my needs. This didn't instill confidence -- are they really changing
the product to meet those tighter specs? Or, just *claiming* to?

The Japanese vendor may well have tightened the specification to meet what you
had asked for or not. Hard to tell from your description. My boss could never
get his head round the fact that in Japanese negotiations "yes" means little
more than "I hear what you say".

And if you were unable to measure the difference between the product before and
after they "Improved" it then I think they have a point.

It's a cultural difference. They were oriented towards giving us what we
*wanted*...

We would try to determine their *existing* "sweet spot" for production
and use that to adjust our requirements. If it was acceptable to our
design, then life was good.

If not, we would see what THEIR cost would be of improving the
specification/performance from that EXISTING sweet spot to a
spot more in line with our needs. If the cost was prohibitive,
then we have a problem and need to see how much we can fudge
our requirements to meet their capabilities.

They, OTOH, would aim to deliver a product that fit our stated
goal and adjust their pricing to reflect the cost of doing so.
Wonderful! But, we may decide our product isn't marketable
at that new price point.

You can get damn near *anything* you want, when you're buying
in big quantities. But, that doesn't mean that there aren't some
select things that are less expensive than others!

"Tell us what you've *got*" vs. "Tell me what you *want*"
 
On 12/23/2021 6:16 AM, Jan Panteltje wrote:
On a sunny day (Wed, 22 Dec 2021 17:21:06 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <sq0fdu$gjc$1@dont-email.me>:

On 12/22/2021 10:15 AM, Jan Panteltje wrote:
On a sunny day (Wed, 22 Dec 2021 09:30:43 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <spvjrv$tj1$1@dont-email.me>:

It would be a tough call to determine if American English had evolved more
OR LESS than the original British. I've read that American English is, in
many ways, truer to its British roots than modern British English.

Pronunciations also evolve, over time. As well as speech patterns.

E.g., I was taught "the" should be pronounced as "thee" when preceding
a word beginning with a vowel sound: "Thee English", "Thee other guy"
but with a schwa ahead of a consonant: "The next one", "the Frenchman".
This seems to no longer be the norm.

[You're interested in these sorts of things when you design a
speech synthesizer; the different "wh" sounds, etc.]

A pretty decent text-to-speech is Google Translate.

This script, called gst2_en on my system, has a female voice speak in English:

#!/bin/bash
say() { local IFS=+; /usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=$*&tl=en"; }
say $*



You call it like this (with your text as example):
gst2_en ">E.g., I was taught "the" should be pronounced as "thee" when preceding"

In the script the &tl=en can be changed for the language you want, so &tl=nl for Dutch and &tl=de for German.

If you want the output to go to an mp3 file then use mplayer -dumpstream in that script.

I find the quality better than other things I have tried.

All Linux of course

There are lots of synthesizers out there -- FOSS as well as commercial.
But, those that run on a PC tend to be bloated implementations -- large
dictionaries, unit databases, etc. And, require a fair bit of CPU
to deliver speech in real-time. If you're trying to run in a small
footprint consuming very little "energy" (think tiny battery), there
really isn't much choice -- esp if you want to be able to tweak the voice
to suit the listeners' preferences (with unconstrained vocabulary).

Sure
But the advantage of this script is that it uses NO resources on the PC / raspi or whatever

Of course it uses resources! You need a network stack, the memory to
handle the packets delivered across that connection, the memory to support
the shell, the filesystem from which to load the script and other binaries,
the kernel, etc.

You just assume they cost nothing because they are already present
in your implementation. Take a *bare* rPi and see how much you have to
add to it to make it speak. *That* is the resource requirement.

[No, I don't care if it can also serve up web pages or log errors
to remote hosts or handle TELNET connections... I just want it to
*speak*! You'll get no "credit" for supporting those other things.]

but it does need a net connection, but mp3s are small.

Here is another one using google translate:

#!/bin/bash
echo "english text document to audio or to mp3"
echo "Usage: gst6_en filename.txt [1]"
echo "if second argument present output to mp3 file, one mp3 file per line, else to audio"
input=$1
lines=1
while IFS= read -r line
do
    echo "line $lines"
    if [ "$2" == "" ]
    then
        /usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=$line&tl=en"
    else
        wget -O "${input}_${lines}.mp3" "http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=$line&tl=en"
    fi
    let lines=lines+1
done < "$input"

So this will speak a whole English text file line by line or, if you call it with an extra argument,
make numbered mp3 files from a text file, one per line.
You can then play the numbered mp3 files in [any] sequence with a similar script,
and even edit and add comments by adding extra lines or deleting lines.

Was just a quick hack....

OTOH I have had the 'festival' speech synthesizer on the PC for 20 years or so, not that bad either.

Festival is a prime example of that bloat.
 
On a sunny day (Thu, 23 Dec 2021 08:43:24 -0700) it happened Don Y
<blockedofcourse@foo.invalid> wrote in <sq25f9$br9$1@dont-email.me>:

On 12/23/2021 6:16 AM, Jan Panteltje wrote:
On a sunny day (Wed, 22 Dec 2021 17:21:06 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <sq0fdu$gjc$1@dont-email.me>:

On 12/22/2021 10:15 AM, Jan Panteltje wrote:
On a sunny day (Wed, 22 Dec 2021 09:30:43 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <spvjrv$tj1$1@dont-email.me>:

It would be a tough call to determine if American English had evolved more
OR LESS than the original British. I've read that American English is, in
many ways, truer to its British roots than modern British English.

Pronunciations also evolve, over time. As well as speech patterns.

E.g., I was taught "the" should be pronounced as "thee" when preceding
a word beginning with a vowel sound: "Thee English", "Thee other guy"
but with a schwa ahead of a consonant: "The next one", "the Frenchman".
This seems to no longer be the norm.

[You're interested in these sorts of things when you design a
speech synthesizer; the different "wh" sounds, etc.]

A pretty decent text-to-speech is Google Translate.

This script, called gst2_en on my system, has a female voice speak in English:

#!/bin/bash
say() { local IFS=+; /usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=$*&tl=en"; }
say $*



You call it like this (with your text as example):
gst2_en ">E.g., I was taught "the" should be pronounced as "thee" when preceding"

In the script the &tl=en can be changed for the language you want, so &tl=nl for Dutch and &tl=de for German.

If you want the output to go to an mp3 file then use mplayer -dumpstream in that script.

I find the quality better than other things I have tried.

All Linux of course

There are lots of synthesizers out there -- FOSS as well as commercial.
But, those that run on a PC tend to be bloated implementations -- large
dictionaries, unit databases, etc. And, require a fair bit of CPU
to deliver speech in real-time. If you're trying to run in a small
footprint consuming very little "energy" (think tiny battery), there
really isn't much choice -- esp if you want to be able to tweak the voice
to suit the listeners' preferences (with unconstrained vocabulary).

Sure
But the advantage of this script is that it uses NO resources on the PC / raspi or whatever

Of course it uses resources! You need a network stack, the memory to
handle the packets delivered across that connection, the memory to support
the shell, the filesystem from which to load the script and other binaries,
the kernel, etc.

Sure

You just assume they cost nothing because they are already present
in your implementation. Take a *bare* rPi and see how much you have to
add to it to make it speak. *That* is the resource requirement.

Not sure what you mean by a 'bare rPi', but even my old raspi one has all that.
All extra it needs is an audio amp and speaker.
The rest you can do via ssh; I was just making an animated christmas tree for my laser video projector
on it via ssh. That old pi1 has analog video out that then goes to the analog video in of the i-connect picop
laser projector; works.
Of course it does all that while running my xgpspc server navigation program and a GPS processing and....
panteltje20: ~ # ssh -Y 192.168.178.73
root@192.168.178.73's password:
Linux raspi73 3.6.11+ #371 PREEMPT Thu Feb 7 16:31:35 GMT 2013 armv6l
Last login: Thu Dec 23 17:20:08 2021 from panteltje10
root@raspi73:~# uname -a
Linux raspi73 3.6.11+ #371 PREEMPT Thu Feb 7 16:31:35 GMT 2013 armv6l GNU/Linux
root@raspi73:~#

just keeps working and working and working 24/7
root@raspi73:~# cat /dev/ttyAMA0
$GPRMC,164612.00,V,,,,,,,231221,,,N*7A
$GPVTG,,,,,,,,,N*30
$GPGGA,164612.00,,,,,0,00,99.99,,,,,,*60
$GPGSA,A,1,,,,,,,,,,,,,99.99,99.99,99.99*30
$GPGSV,3,1,12,02,22,113,,03,04,001,,06,24,066,,11,17,106,*7E
$GPGSV,3,2,12,12,81,067,,19,16,041,,22,04,339,,24,41,140,*71
$GPGSV,3,3,12,25,59,259,,29,20,195,,31,10,302,,32,41,281,*76
$GPGL.......

root@raspi73:~# top
top - 17:49:22 up 66 days, 5:21, 11 users, load average: 1.66, 1.69, 1.86
Tasks: 88 total, 2 running, 86 sleeping, 0 stopped, 0 zombie
%Cpu(s): 61.6 us, 37.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.6 si, 0.0 st
KiB Mem: 448776 total, 283472 used, 165304 free, 40276 buffers
KiB Swap: 102396 total, 0 used, 102396 free, 47700 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2625 root 20 0 128m 40m 1568 R 74.3 9.3 67539:59 xgpspc
2420 root 20 0 5904 1728 880 S 3.1 0.4 2744:00 rxvt
2423 root 20 0 5904 1732 880 S 3.1 0.4 2744:07 rxvt

Nice stuff, raspberries, IF you know how to use those.
Uptime 66 days; the last record was 256 days or so. Moved house,
needed to rewire some power here in the new place a few times; now it is on UPS.

Don't complain, write code.

[No, I don't care if it can also serve up web pages or log errors
to remote hosts or handle TELNET connections... I just want it to
*speak*! You'll get no "credit" for supporting those other things.]

but it does need a net connection, but mp3s are small.

Here is another one using google translate:

#!/bin/bash
echo "english text document to audio or to mp3"
echo "Usage: gst6_en filename.txt [1]"
echo "if second argument present output to mp3 file, one mp3 file per line, else to audio"
input=$1
lines=1
while IFS= read -r line
do
    echo "line $lines"
    if [ "$2" == "" ]
    then
        /usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=$line&tl=en"
    else
        wget -O "${input}_${lines}.mp3" "http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=$line&tl=en"
    fi
    let lines=lines+1
done < "$input"

So this will speak a whole English text file line by line or, if you call it with an extra argument,
make numbered mp3 files from a text file, one per line.
You can then play the numbered mp3 files in [any] sequence with a similar script,
and even edit and add comments by adding extra lines or deleting lines.
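Such a playback script could look roughly like this; a sketch under the assumption that gst6_en named the files filename.txt_1.mp3, filename.txt_2.mp3, and so on (the player command is a parameter so the loop can be dry-run without audio):

```shell
#!/bin/sh
# Play the numbered mp3 files produced by gst6_en, in order (hypothetical helper).
# $1 = base name (the original text file name), $2 = player command (default mplayer).
play_numbered() {
    base="$1"
    player="${2:-mplayer}"
    n=1
    while [ -f "${base}_${n}.mp3" ]
    do
        "$player" "${base}_${n}.mp3"
        n=$((n + 1))
    done
}

# e.g.  play_numbered story.txt    # plays story.txt_1.mp3, story.txt_2.mp3, ...
```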

Was just a quick hack....

OTOH I have had the 'festival' speech synthesizer on the PC for 20 years or so; it is not that bad either.

Festival is a prime example of that bloat.

I thought it was rather small...
:)
 
In article <sq1srd$147f$1@gioia.aioe.org>, pNaOnStPeAlMtje@yahoo.com
says...
E.g., I was taught \"the\" should be pronounced as \"thee\" when preceding
a word beginning with a vowel sound: \"Thee English\", \"Thee other guy\"
but with a schwa ahead of a consonant: \"The next one\", \"the Frenchman\".
This seems to no longer be the norm.

I\'m not aware of any such rule or even pattern. On the other hand if
emphasising that there is a particular other guy that you are referring
to, the \"thee\" emphasises his singularity...
 
On 12/23/2021 12:12 PM, Mike Coon wrote:
In article <sq1srd$147f$1@gioia.aioe.org>, pNaOnStPeAlMtje@yahoo.com
says...
E.g., I was taught \"the\" should be pronounced as \"thee\" when preceding
a word beginning with a vowel sound: \"Thee English\", \"Thee other guy\"
but with a schwa ahead of a consonant: \"The next one\", \"the Frenchman\".
This seems to no longer be the norm.

I\'m not aware of any such rule or even pattern. On the other hand if
emphasising that there is a particular other guy that you are referring
to, the \"thee\" emphasises his singularity...

<https://www.merriam-webster.com/words-at-play/how-do-you-pronounce-the-let-us-count-the-ways>
 
Jeroen Belleman wrote:
On 2021-12-23 10:39, Martin Brown wrote:
[...]
Cholmondeley (Chumlee) catch out most
non-native English speakers in fact most non-locals. [...]

English is well known for its complete disconnect between
pronunciation and spelling, but this is ridiculous.

Jeroen Belleman

English family names and place names in England can be confusing. It's
not a language issue; it's mostly a legacy of the Norman Conquest.

For instance

Pontefract (castle) = Pumfrey
Featherstonehaugh (family) = Fanshaw

Over here it\'s mostly Americanized pronunciations by American families
descended from immigrants

e.g.

Dubois = De Boyce
Daubert = Dowburt

Not very different from "Parris" or "The Hague" or "Pekin" or "Moscow"
(the Idaho one is pronounced "moscoe", but neither sounds like the
Russian pronunciation).

The French call London \"Londres\". The current fashion for aping native
pronunciations of place names that have been well known for ages is
pretty silly actually.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510

http://electrooptical.net
http://hobbs-eo.com
 
On 12/23/2021 9:53 AM, Jan Panteltje wrote:
On a sunny day (Thu, 23 Dec 2021 08:43:24 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <sq25f9$br9$1@dont-email.me>:

On 12/23/2021 6:16 AM, Jan Panteltje wrote:
On a sunny day (Wed, 22 Dec 2021 17:21:06 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <sq0fdu$gjc$1@dont-email.me>:

On 12/22/2021 10:15 AM, Jan Panteltje wrote:
On a sunny day (Wed, 22 Dec 2021 09:30:43 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <spvjrv$tj1$1@dont-email.me>:

It would be a tough call to determine if American English had evolved more
OR LESS than the original British. I\'ve read that American English is, in
many ways, truer to its British roots than modern British English.

Pronunciations also evolve, over time. As well as speech patterns.

E.g., I was taught \"the\" should be pronounced as \"thee\" when preceding
a word beginning with a vowel sound: \"Thee English\", \"Thee other guy\"
but with a schwa ahead of a consonant: \"The next one\", \"the Frenchman\".
This seems to no longer be the norm.

[You\'re interested in these sorts of things when you design a
speech synthesizer; the different \"wh\" sounds, etc.]

A pretty decent text-to-speech is Google Translate.

This script, called gst2_en on my system, has a female voice speak in English:

#!/bin/bash
say() { local IFS=+; /usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols \
"http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=$*&tl=en"; }
say $*



You call it like this (with your text as example):
gst2_en \">E.g., I was taught \"the\" should be pronounced as \"thee\" when preceding\"

In the script the &tl=en can be changed for the language you want, so &tl=nl for Dutch and &tl=de for German.

If you want the output to go to an mp3 file then use mplayer -dumpstream in that script.
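The say() one-liner can be split so the request URL is built separately from playing or saving it. A sketch only; the tts_url name and the TL variable are my additions, not part of the original script:

```shell
#!/bin/sh
# Build the translate_tts request URL; the words are joined with '+' via IFS,
# and TL selects the language (en, nl, de, ...), defaulting to en.
tts_url() {
    local IFS=+
    printf 'http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=%s&tl=%s\n' \
        "$*" "${TL:-en}"
}

tts_url Hello world
# -> http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=Hello+world&tl=en
```

To save instead of play, per the note above, the URL can be handed to mplayer -dumpstream -dumpfile out.mp3 rather than played directly.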

I find the quality better than other things I have tried.

All Linux of course

There are lots of synthesizers out there -- FOSS as well as commercial.
But, those that run on a PC tend to be bloated implementations -- large
dictionaries, unit databases, etc. And, they require a fair bit of CPU
to deliver speech in real-time. If you're trying to run in a small
footprint consuming very little "energy" (think tiny battery), there
really isn't much choice -- especially if you want to be able to tweak the
voice to suit the listeners' preferences (with unconstrained vocabulary).

Sure
But the advantage of this script is that it uses NO resources on the PC / raspi or whatever.

Of course it uses resources! You need a network stack, the memory to
handle the packets delivered across that connection, the memory to support
the shell, the filesystem from which to load the script and other binaries,
the kernel, etc.

Sure

You just assume they cost nothing because they are already present
in your implementation. Take a *bare* rPi and see how much you have to
add to it to make it speak. *That* is the resource requirement.

Not sure what you mean by a 'bare rPi', but even my old raspi 1 has all that.

Strip all of the code off of it so you are starting with *hardware*.
Then, add back what you need to make it speak.

Otherwise, you\'re comparing apples to oranges.

I can type \"Hello, World!\" on a sheet of paper. Then, put it on the scanner
glass of my Reading Machine and hear it speak those words. Does that
mean there is *no* resource usage associated with those utterances? :>

All extra it needs is an audio amp and speaker.
The rest you can do via ssh. I was just making an animated Christmas tree for my laser video projector
on it via ssh; that old Pi 1 has analog video out that goes to the analog video in of the i-connect PicoP
laser projector. Works.
Of course it does all that while running my xgpspc server navigation program and GPS processing and....
panteltje20: ~ # ssh -Y 192.168.178.73
root@192.168.178.73\'s password:
Linux raspi73 3.6.11+ #371 PREEMPT Thu Feb 7 16:31:35 GMT 2013 armv6l
Last login: Thu Dec 23 17:20:08 2021 from panteltje10
root@raspi73:~# uname -a
Linux raspi73 3.6.11+ #371 PREEMPT Thu Feb 7 16:31:35 GMT 2013 armv6l GNU/Linux
root@raspi73:~#

just keeps working and working and working 24/7
root@raspi73:~# cat /dev/ttyAMA0
$GPRMC,164612.00,V,,,,,,,231221,,,N*7A
$GPVTG,,,,,,,,,N*30
$GPGGA,164612.00,,,,,0,00,99.99,,,,,,*60
$GPGSA,A,1,,,,,,,,,,,,,99.99,99.99,99.99*30
$GPGSV,3,1,12,02,22,113,,03,04,001,,06,24,066,,11,17,106,*7E
$GPGSV,3,2,12,12,81,067,,19,16,041,,22,04,339,,24,41,140,*71
$GPGSV,3,3,12,25,59,259,,29,20,195,,31,10,302,,32,41,281,*76
$GPGL.......

root@raspi73:~# top
top - 17:49:22 up 66 days, 5:21, 11 users, load average: 1.66, 1.69, 1.86
Tasks: 88 total, 2 running, 86 sleeping, 0 stopped, 0 zombie
%Cpu(s): 61.6 us, 37.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.6 si, 0.0 st
KiB Mem: 448776 total, 283472 used, 165304 free, 40276 buffers
KiB Swap: 102396 total, 0 used, 102396 free, 47700 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2625 root 20 0 128m 40m 1568 R 74.3 9.3 67539:59 xgpspc
2420 root 20 0 5904 1728 880 S 3.1 0.4 2744:00 rxvt
2423 root 20 0 5904 1732 880 S 3.1 0.4 2744:07 rxvt

Nice stuff, raspberries, IF you know how to use them.
Uptime 66 days; the last record was 256 days or so, but I moved house and
needed to rewire some power here in the new place a few times. Now it is on a UPS.

Don\'t complain, write code.
 
On a sunny day (Thu, 23 Dec 2021 12:57:23 -0700) it happened Don Y
<blockedofcourse@foo.invalid> wrote in <sq2kbf$s5s$1@dont-email.me>:

[snip]

You just assume they cost nothing because they are already present
in your implementation. Take a *bare* rPi and see how much you have to
add to it to make it speak. *That* is the resource requirement.

Not sure what you mean by a \'bare rPi\', but even my old raspi one has all that.

Strip all of the code off of it so you are starting with *hardware*.
Then, add back what you need to make it speak.

OK, let me give you an example of this, and why it is a choice between apples and oranges.
Let's say we have nothing but a PIC 18F14K22 (because I have those).

To do the internet thing you need a TCP stack, and add a Microchip ENC28J60 ethernet chip.
Microchip _has_ a TCP stack, but then what fun is it? I wrote one back in the 5 1/4 inch floppy days but
those files got lost; here is a project with a UDP stack I wrote in PIC asm:
http://panteltje.com/panteltje/pic/ethernet_color_pic/
It controls room lighting from anywhere and has been working fine 24/7 since 2013.
You will need:
1 PIC18F14K22
1 ENC28J60

Now let's see if we can do audio out with that.
Sure, I have done audio with the same PIC:
http://panteltje.com/panteltje/pic/audio_pic/
That used PWM; here I would perhaps use an R-2R DAC on 8 PIC output pins.
The B I G question now is: "Can I decode the mp3 stream from google translate with this chip?"
Perhaps; I would have to look at the source of mpg123 (an open source C mp2/mp3 decoder) to see if I can do it all with 256 bytes of RAM.
Maybe the buffer size is too small; maybe a bigger PIC or some external memory is needed.
I never wrote an mp3 decoder, so a question mark here.
Configure the system via RS232, with settings in EEPROM (as in the projects above) for IP, gateway, MAC, etc.
So 2 chips (or 3 if memory), some caps, a power regulator, a wall wart, and you need to make a board if it is for production.
And you need to write the asm.
And test and debug it.
Estimated time: some days.
Cost per hour of a qualified person?
A Raspberry Pi that has all connectors plus some costs $47 (it went up a while ago because of chip shortages, I have read),
plus a small SD card.
It runs Linux, is easy to program in minutes, and is proven reliable; no board layouts needed, ready in an hour or so.
And it can fall back on whatever speech synthesizer you installed on it if there is no internet connection for any reason.
The advantage of using google for speech is that THEY will do their best to make the audio as good as possible,
and they support several languages.
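That fallback could be wired up roughly like this. A hedged sketch, not the actual setup: fetch_google and the FETCH/PLAY/SYNTH overrides (handy for dry-running the logic without network or audio) are all my assumptions, and festival --tts is just one possible local fallback.

```shell
#!/bin/sh
# Speak via the Google TTS mp3 when the net is up, else via a local synthesizer.
fetch_google() {
    local IFS=+
    wget -q -O /tmp/say.mp3 \
        "http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=$*&tl=en"
}

speak() {
    if ${FETCH:-fetch_google} "$@"     # did the download succeed?
    then
        ${PLAY:-mplayer -really-quiet} /tmp/say.mp3
    else
        echo "$*" | ${SYNTH:-festival --tts}   # offline fallback
    fi
}
```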

So, in short, the Raspberry way is faster, cheaper, better, proven reliable, available everywhere.
Now show us what YOU did.
 
On 2021-12-23 15:37, David Brown wrote:
On 23/12/2021 12:23, Jeroen Belleman wrote:
On 2021-12-23 10:39, Martin Brown wrote:
[...]
Cholmondeley (Chumlee) catch out most
non-native English speakers in fact most non-locals. [...]

English is well known for its complete disconnect between
pronunciation and spelling, but this is ridiculous.


It is not a \"complete disconnect\" - not by a long way. Despite some of
the common oddities of spelling in English, and some particularly
unusual cases, there are far worse languages. Look at verb endings in
French - many different spellings have different meanings, but are
pronounced the same. Mongolian and Gaelic have a very much bigger
separation between the phonetic values of the written spellings and the
actual pronunciation. [...]

French spelling is pretty regular, in the sense that spelling
usually unambiguously specifies the pronunciation. The reverse
is far from true though. I should know, I live there.

Jeroen Belleman
 
On 12/24/2021 1:55 AM, Jan Panteltje wrote:
On a sunny day (Thu, 23 Dec 2021 12:57:23 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <sq2kbf$s5s$1@dont-email.me>:

[snip]

So, in short, the Raspberry way is faster, cheaper, better, proven reliable, available everywhere.
Now show us what YOU did.

Put it *in* a bluetooth earpiece and have it run off the battery that\'s
in that earpiece. Make sure that earpiece is paired with a BT host that
ultimately has internet access -- to get to your google service. And,
maintain this connectivity while I walk, drive, ride a bicycle or
any other activity -- above or below ground.

You\'re solving the wrong problem with a sledgehammer.
 
On 12/24/2021 3:36 AM, Don Y wrote:
On 12/24/2021 1:55 AM, Jan Panteltje wrote:
On a sunny day (Thu, 23 Dec 2021 12:57:23 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <sq2kbf$s5s$1@dont-email.me>:

[snip]

Put it *in* a bluetooth earpiece and have it run off the battery that\'s
in that earpiece. Make sure that earpiece is paired with a BT host that
ultimately has internet access -- to get to your google service. And,
maintain this connectivity while I walk, drive, ride a bicycle or
any other activity -- above or below ground.

You\'re solving the wrong problem with a sledgehammer.

Ask yourself how your device is going to TELL the user (who lacks
eyesight) that "I'm sorry, I can't contact google.com at the moment".

Or, how you\'re going to tell your device (or google) to use a
voice that is richer in low frequency components (larger head
size). Or, perhaps a smaller head size that is more friendly
to a young child user.

Or, ask it how much battery time is remaining (remember,
you have to be able to do all of these things while NOT
in contact with google).

Or, tell it to speak more rapidly (without altering the pitch of
the speech). Or, slowly. Or, spell that last word because you
couldn\'t quite sort out what it was saying.

Or, tell it to try contacting a different BT host if it can\'t
establish a connection with the nominal BT host. Or, if that
host can\'t get out to the internet. Or...

You haven\'t thought out the problem space to see why your
reliance on an external service is flawed.
 
On a sunny day (Fri, 24 Dec 2021 03:36:53 -0700) it happened Don Y
<blockedofcourse@foo.invalid> wrote in <sq47sd$qn2$1@dont-email.me>:

On 12/24/2021 1:55 AM, Jan Panteltje wrote:
On a sunny day (Thu, 23 Dec 2021 12:57:23 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <sq2kbf$s5s$1@dont-email.me>:

On 12/23/2021 9:53 AM, Jan Panteltje wrote:
On a sunny day (Thu, 23 Dec 2021 08:43:24 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <sq25f9$br9$1@dont-email.me>:

On 12/23/2021 6:16 AM, Jan Panteltje wrote:
On a sunny day (Wed, 22 Dec 2021 17:21:06 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <sq0fdu$gjc$1@dont-email.me>:

On 12/22/2021 10:15 AM, Jan Panteltje wrote:
On a sunny day (Wed, 22 Dec 2021 09:30:43 -0700) it happened Don Y
blockedofcourse@foo.invalid> wrote in <spvjrv$tj1$1@dont-email.me>:

It would be a tough call to determine if American English had evolved more
OR LESS than the original British. I\'ve read that American English is, in
many ways, truer to its British roots than modern British English.

Pronunciations also evolve, over time. As well as speech patterns.

E.g., I was taught \"the\" should be pronounced as \"thee\" when preceding
a word beginning with a vowel sound: \"Thee English\", \"Thee other guy\"
but with a schwa ahead of a consonant: \"The next one\", \"the Frenchman\".
This seems to no longer be the norm.

[You\'re interested in these sorts of things when you design a
speech synthesizer; the different \"wh\" sounds, etc.]

A pretty decent text to speech is google translate.

This script, called gst2_en on my system, has a female talk in english:

#!/bin/bash
say() { local IFS=+;/usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols
\"http://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&q=$*&tl=en\"; }
say $*



You call it like this (with your text as example):
gst2_en \">E.g., I was taught \"the\" should be pronounced as \"thee\" when preceding\"

In the script the &tl=en can be changed for the language you want, so &tl=nl for Dutch and &tl=de for German.

If you want the output to go to a mp3 file then use mplayer -dumpstream in that script.

I find the quality better than other things I have tried.

All Linux of course

There are lots of synthesizers out there -- FOSS as well as commercial.
But, those that run on a PC tend to be bloated implementations -- large
dictionaries, unit databases, etc. And, require a fair bit of CPU
to deliver speech in real-time. If you\'re trying to run in a small
footprint consuming very little \"energy\" (think tiny battery), there
really isn\'t much choice -- esp if you want to be able to tweek the voice
to suit the listeners\' preferences (with unconstrained vocabulary)

Sure
But the advantge of this script is that it uses NO resources on the PC / raspi or whatever

Of course it uses resources! You need a network stack, the memory to
handle the packets delivered across that connection, the memory to support
the shell, the filesystem from which to load the script and other binaries,
the kernel, etc.

Sure

You just assume they cost nothing because they are already present
in your implementation. Take a *bare* rPi and see how much you have to
add to it to make it speak. *That* is the resource requirement.

Not sure what you mean by a \'bare rPi\', but even my old raspi one has all that.

Strip all of the code off of it so you are starting with *hardware*.
Then, add back what you need to make it speak.

OK, let me give you some example in this, and why the choice between apples and oranges .
Let\'s say we have nothing but a PIC 18F14k22 (because I have those).

To do the internet thing you need a TCP stack, plus a Microchip ENC28J60 Ethernet chip.
Microchip _has_ a TCP stack, but then what fun is that? I wrote one back in the 5 1/4 inch floppy days,
but those files got lost. Here is a project with a UDP stack I wrote in PIC asm:
http://panteltje.com/panteltje/pic/ethernet_color_pic/
It controls room lighting from anywhere, and has been working fine 24/7 since 2013.
You will need:
1 PIC18F14K22
1 ENC28J60

Now let's see if we can do audio out with that.
Sure, I have done audio with the same PIC:
http://panteltje.com/panteltje/pic/audio_pic/
That used PWM somehow, but here I would perhaps use an R-2R DAC on 8 PIC output pins.
The B I G question now is: "Can this chip decode the mp3 stream from google translate?"
Perhaps; I would have to look at the source of mpg123 (open source C mp2/mp3 decoder) to see if I can do it all with 256 bytes of RAM.
Maybe the buffer size is too small; maybe a bigger PIC or some external memory is needed.
Never wrote an mp3 decoder, so question mark here.
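The doubt about 256 bytes of RAM can be made concrete with a back-of-envelope frame-size calculation. Assuming an MPEG-1 Layer III stream (the exact format google serves is an assumption here), one frame is floor(144 * bitrate / sample_rate) bytes, plus an optional padding byte:

```shell
# MPEG-1 Layer III frame size in bytes (ignoring the padding bit):
#   frame_bytes = floor(144 * bitrate / sample_rate)
# Even one *undecoded* low-bitrate frame nearly fills 256 bytes, before
# counting the decoder's working buffers (the bit reservoir alone can
# span several frames), so an external RAM or a bigger PIC looks likely.
frame_bytes() { echo $(( 144 * $1 / $2 )); }

frame_bytes 32000 24000    # low-bitrate mono frame: 192 bytes
frame_bytes 128000 44100   # typical stereo frame: 417 bytes
```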
Config the system via RS232, settings in EEPROM (as in the projects above) for IP, gateway, MAC, etc.
So 2 chips (or 3 if external memory), some caps, a power regulator, a wall wart, and you need to make a board if it is for production.
And you need to write the asm.
And test and debug it.
Estimated time: some days.
Cost per hour of a qualified person?
A Raspberry Pi that has all connectors plus some costs $47 (went up a while ago because of chip shortages, I have read),
plus a small SD card.
It runs Linux, is easy to program in minutes, and is proven reliable; no board layout needed, ready in an hour or so.
And it can fall back on whatever speech synth you installed on it if there is no internet connection for any reason.
The advantage of using google for speech is that THEY will do their best to make the audio as good as possible,
and they support several languages.

So, in short, the Raspberry way is faster, cheaper, better, proven reliable, available everywhere.
Now show us what YOU did.

Put it *in* a bluetooth earpiece and have it run off the battery that's
in that earpiece. Make sure that earpiece is paired with a BT host that
ultimately has internet access -- to get to your google service. And,
maintain this connectivity while I walk, drive, ride a bicycle or
any other activity -- above or below ground.

You\'re solving the wrong problem with a sledgehammer.

You should have specified that right away.
So again, PIC, bluetooth chip, asm nothing new.
Some people here can even design it all in one chip.
But we are talking text to speech, no (or did you change the requirement again)?
WTF would you get the text from?
Much simpler to use a normal bluetooth earpiece and a Raspberry Pi talking to it from a fixed place.
https://pimylifeup.com/raspberry-pi-bluetooth/
For other platforms / systems there are plenty of bluetooth USB adaptors; I have some for the PC, also bluetooth earpieces.
Like I said, the raspi can fall back on any other synth if there is no internet connection.
AGAIN where is your text coming from?
You did not show any design or code
Just babbling?