What Is It?
VirSyn CANTOR is a vocal synthesis instrument for Macintosh OS X and
Windows XP; it can operate in stand-alone mode or as a plug-in (VST 2.0
and Audio Unit formats, with RTAS on the way). It is eight-voice polyphonic/multitimbral,
with a built-in effects section. The voice parts are created and edited
with a piano-roll style score editor and text can be entered either in
plain English or in a type of phonetic notation. In addition to a nice
assortment of factory Voice presets, the user can create and save a custom
library. There are also 16 User phoneme sets that can be edited to supplement
the Factory set.
CANTOR is described by VirSyn as "The Vocal Machine" and I
think this is apt. As I hope to make clear, synthesis of realistic human
voices is an extremely difficult task and it is perhaps unrealistic to
expect a real-time computer program to sound like a human singer. Better
to treat a musical voice synthesizer similarly to a vocoder and to orient
one's expectations toward the creation of intriguing, disturbing, or
amusing voice-like sounds. I think software designer Harry Gohs and his
VirSyn team have accomplished this quite well and I hope that CANTOR
is well enough received that they will be encouraged to take it to the
next level.
Basics Of Vocal Synthesis
Voice synthesis has been a topic of research since the 19th century,
when Alexander Graham Bell and Hermann von Helmholtz used physical resonators
to simulate vowel sounds, but the production of intelligible "running
speech" required performable electronic instruments. In 1939 Homer
Dudley, a researcher at Bell Telephone Laboratories, unveiled the "voder" (a
near-cousin to Dudley's vocoder), which could generate speech in the
hands of a trained operator. The control system consisted of a few keys
and pedals, while the inner workings were based on what has become known
as the "source-filter" model of sound synthesis. The source
comes from a pair of signal generators representing the pitched vibrations
produced by the vocal folds (or cords) of the larynx and the unpitched
fricative noise produced in the mouth and throat. These signals are then
passed through a set of electronic filters that represent the resonances
(called "formants") of the vocal tract.
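The source-filter idea can be sketched in a few lines of Python. This is only an illustration of the general model, not a reconstruction of the voder or of CANTOR: the sample rate, the two-pole resonator design, the parallel filter arrangement, and the formant values for "aa" are all assumptions chosen for the example.

```python
import math
import random

SAMPLE_RATE = 16000  # Hz; assumed for this sketch


def resonator(signal, freq_hz, bandwidth_hz):
    """Two-pole resonant filter standing in for a single formant."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / SAMPLE_RATE)
    a2 = -r * r
    gain = 1.0 - r  # rough level compensation
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def source(pitch_hz, noise_level, n_samples, seed=0):
    """Pulse train (vocal folds) mixed with white noise (frication)."""
    rng = random.Random(seed)
    period = int(round(SAMPLE_RATE / pitch_hz))
    return [
        (1.0 if i % period == 0 else 0.0) + noise_level * rng.uniform(-1.0, 1.0)
        for i in range(n_samples)
    ]


def synthesize(pitch_hz, formants, noise_level=0.02, n_samples=1600):
    """Feed the source to each formant resonator in parallel and sum the results."""
    src = source(pitch_hz, noise_level, n_samples)
    bands = [resonator(src, f, bw) for f, bw in formants]
    return [sum(samples) for samples in zip(*bands)]


# Approximate (frequency, bandwidth) pairs for an "aa" vowel; illustrative only.
AA = [(730, 90), (1090, 110), (2440, 170)]
wave = synthesize(110.0, AA)
```

Changing the formant list changes the vowel color while the pitch stays put, which is the whole point of separating source from filter.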
The timbre of most musical instruments and voices is shaped by the formants
produced by resonant chambers such as the body of a guitar or the mouth
and nasal cavities of a human or other animal. The principal difference
is that living creatures can quickly change these resonances by altering
the size and shape of their vocal cavities (this is evident if you raise
and lower your jaw while sustaining a tone to produce "ee-aa-ee-aa" or
if you purse and relax your lips to produce "oo-ee-oo-ee").
By adding to this the noise component of fricative ("sss," "fff," etc.)
or stop consonants ("t," "p," etc.) you have a rich
and varied assortment of sound elements, called "phonemes" (there
are more than 40 of these in English).
You can think of phonemes as the building blocks of spoken language,
though the "blocks" paradigm begins to seem simplistic once
you listen carefully to the way phonemes morph rapidly from one to the
next in actual speech. This fluid flow of natural speech (and song) is
in fact the key challenge in voice synthesis. Our ears are simply too
well attuned to real voices for the artificial variety to be at all convincing
without an immense amount of tweaking. A simple succession of static
phonemes must be given a more organic shape as it evolves in time --
phonemes can't simply succeed each other like beads on a string. The
trick is at least threefold: First, the fundamental pitch of the voice must glide
naturally from pitch to pitch (this is one aspect of "prosody"). Second,
the formant frequencies must also glide from one to the next (and how
this occurs depends on the specific order of phonemes -- "oo-ee-aa" has
a different set of transitions from "aa-ee-oo"). Third, the
quality of noise in consonant sounds is also affected by the formants
of their accompanying vowels.
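To make the second point concrete, here is a small Python sketch of formant gliding. The three-formant values are approximate textbook figures used purely for illustration, and the straight-line interpolation is an assumption; real speech (and presumably CANTOR) follows more complicated trajectories.

```python
# Approximate first three formant frequencies in Hz (illustrative values).
FORMANTS_HZ = {
    "oo": (300.0, 870.0, 2240.0),
    "ee": (270.0, 2290.0, 3010.0),
    "aa": (730.0, 1090.0, 2440.0),
}


def glide(start, end, steps):
    """Return `steps` formant frames (steps >= 2) moving linearly from start to end."""
    a, b = FORMANTS_HZ[start], FORMANTS_HZ[end]
    frames = []
    for i in range(steps):
        t = i / (steps - 1)
        frames.append(tuple(x + (y - x) * t for x, y in zip(a, b)))
    return frames


def phrase(vowels, steps_per_transition=5):
    """Chain the glides between each pair of neighboring vowels."""
    frames = []
    for start, end in zip(vowels, vowels[1:]):
        frames.extend(glide(start, end, steps_per_transition))
    return frames


forward = phrase(["oo", "ee", "aa"])
backward = phrase(["aa", "ee", "oo"])
```

The same three vowels produce different frame sequences depending on their order, which is why the transitions cannot be stored once per phoneme.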
Given this subtlety and complexity it should be clear that the best
way to simulate a real voice is to analyze extended phrases and derive
a set of continuous pitch, noise, and formant parameter functions that
can then be used to control the voice synthesizer (this is how a vocoder
works). But this goes against the need for absolute flexibility in a
voice synthesizer; you want to be able to directly enter pitches and
accompanying text and have the software figure out "the fiddly stuff." This
is generally done by compiling a "pronouncing dictionary" of
words from which the synthesizer can convert normal English text into
a phonetic equivalent. CANTOR uses the Carnegie Mellon University Pronouncing
Dictionary, which contains over 125,000 words and uses 39 phonemes
plus their "stress factor" variations (for accented syllables).
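The dictionary approach amounts to a table lookup. The sketch below uses a three-word toy table in the CMU notation; the function name and the handling of unknown words are hypothetical and say nothing about how CANTOR itself is implemented.

```python
# A toy stand-in for a pronouncing dictionary (three entries only).
TOY_DICTIONARY = {
    "MY": "M AY1",
    "NAME": "N EY1 M",
    "IS": "IH1 Z",
}


def to_phonemes(text):
    """Look up each word; words missing from the table come back as None."""
    words = [word.strip(".,!?").upper() for word in text.split()]
    return [TOY_DICTIONARY.get(word) for word in words]


result = to_phonemes("My name is")
```

A real dictionary of over 125,000 entries works the same way in principle; the hard part is everything that happens after the lookup.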
CANTOR Voice Architecture
CANTOR's voices are constructed according to the classic "source-filter" model
described above. The pitched signal source is provided by an additive
(Fourier) waveform generator with 256 programmable partials (similar
to that found in VirSyn CUBE). The noise source features a transfer function
(filter curve) that can be hand-drawn to produce different "colors" of
noise. These source signals then pass through a formant filter with six
formants. Each phoneme uses two formant filter curves. In many cases,
such as a steady-state vowel, both curves are the same, but in the case
of a consonant (such as "ch") or a diphthong (such as "ay" in "hide")
they are different because these phonemes evolve over their duration.
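Additive synthesis of the pitched source can be sketched as a sum of sine partials. The 256-partial count comes from the description above; the 1/n amplitude recipe (a sawtooth-like spectrum), the table size, and the function name are assumptions made for this example.

```python
import math

NUM_PARTIALS = 256  # partial count quoted for CANTOR's generator


def additive_cycle(amplitudes, n_samples=1024):
    """Build one waveform cycle by summing sine partials 1, 2, 3, ..."""
    cycle = []
    for i in range(n_samples):
        phase = 2.0 * math.pi * i / n_samples
        cycle.append(
            sum(a * math.sin(k * phase) for k, a in enumerate(amplitudes, start=1))
        )
    return cycle


# Amplitude 1/n for partial n gives a bright, sawtooth-like starting point.
sawtooth_like = [1.0 / n for n in range(1, NUM_PARTIALS + 1)]
cycle = additive_cycle(sawtooth_like)
```

Editing individual entries in the amplitude list is the code equivalent of redrawing the spectrum by hand.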
CANTOR is eight-voice polyphonic/multitimbral. Each vocal line can
be created and edited in its own Score page and the phonetic qualities
of each can be modified in a Voice page (voiced and noise sources) and
Phoneme page (two formant filter curves). There is an FX page with Distortion,
Echo/Delay, Chorus, and (global) Reverb effects and there is a Mix page
for combining the eight voices in a stereo mix and also for defining
MIDI channels, keyboard splits, and voice presets.
CANTOR has two operating modes: manual and automatic. In manual mode
the pitch of the sung note is controlled by incoming MIDI notes and the
syllable being articulated is set by the current step in the score editor
sequence (the note sequence is treated like an ordered list of syllables).
In automatic mode the sequence playback is controlled by CANTOR's transport
(standalone version) or by the host application (plug-in version).
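Manual mode, as described, pairs each incoming note with the next syllable in the list. A minimal Python sketch of that behavior follows; the class name is hypothetical, and wrapping back to the first syllable at the end of the list is an assumption.

```python
class ManualModeLine:
    """Steps through an ordered syllable list, one syllable per incoming note."""

    def __init__(self, syllables):
        self.syllables = list(syllables)
        self.step = 0

    def note_on(self, midi_pitch):
        """Return (pitch, syllable) for this note, then advance one step."""
        syllable = self.syllables[self.step]
        # Assumption: wrap around when the list runs out.
        self.step = (self.step + 1) % len(self.syllables)
        return midi_pitch, syllable


line = ManualModeLine(["my", "name", "is"])
sung = [line.note_on(pitch) for pitch in (60, 62, 64, 65)]
```

The player supplies the pitches and rhythm; the score supplies only the order of the words.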
User Interface
CANTOR's user interface is organized as a collection of tabbed pages
in a single window. Tabs for Score, Voice, Phoneme, FX, and Mix editors
are used to select particular elements of the control and voice editing
hierarchy, while tabs for Part 1 through Part 8 select the specific voice
part to be edited. Because the Mix window is common to all eight Parts,
there are 33 windows total (4 x 8 + 1).
Individual parameters are edited either with a graphic knob (which displays
its current numerical value while active) or with a number box. Most
individual parameters (except those in the Mix page) can also be controlled
via MIDI Control Change messages and can be assigned through a MIDI Learn
function (just Control-click on the knob or number box to activate or
defeat this).
The spectra of the voiced and noise signals can be edited graphically,
as can the formant curves (note that the Factory formant set cannot be
edited but there are 16 User sets that can be). Control-clicking in any
of these windows pops up a menu of editing commands and other shortcuts.
Score Editor
This is a version of the familiar piano-roll editing window found in
most sequencers. There are eight Score editing windows, one for each
Part, and they are selected by clicking tabs along the right side of
the window. As you'd expect, each voice part is monophonic, but a visual
reference to all the hidden parts is made available as a set of faint "ghost" images
in the current window. The left margin of the note window holds a graphic
keyboard for reference, and the bottom pane of the window holds a strip
chart with a choice of parameters to be displayed and edited: Velocity,
Gender, Breath, Balance, Pitch, Level, Brightness, Pan, Vibrato Rate,
and Vibrato Depth.
While most of the other editing windows are straightforward, the main
control strip and the Score editor take a little getting used to. As
a long-time Macintosh user I expect certain interface guidelines to be
adhered to, so that I really don't need to think too much about how things
are supposed to work. Not so with CANTOR, which gives the appearance
of having been ported from Windows (or perhaps DOS). For instance, loading
a file should be accessed with Command-O or Alt-O and Save should be
Command-S or Alt-S (or for the memory-challenged by selecting from the
File menu). Not in CANTOR -- you need to click on a little disk icon
in the control strip to pop up a menu with these and other options. Similarly,
there are no key equivalents for Transport functions (not even Space
Bar for Play and Pause) -- you have to click on the graphic transport.
This defiance of standard conventions is continued in the score editing
window itself. Rather than using familiar commands (Cmd-C for "Copy," Cmd-V
for "Paste," Cmd-Z for "Undo," and even "Delete" to
delete a selected note) CANTOR requires that you click on toolbar icons
to switch between these modes. This certainly slows down the editing
process and leads to a lot of mistakes. I won't belabor this point further,
but do want to put "normalization" of the interface high on
my wish list for a future update!
Once you've familiarized yourself with CANTOR's drawing tools, you can
create and edit melodic passages in its "matrix editor" window.
This step should be familiar to anyone who has used a piano-roll editor.
The next step is to add the text for CANTOR to sing. New notes are created
with a default text of "La," but this can be replaced by typing
in your own text, typically with one syllable per note. Your original
text will be displayed above the notes and the phonetic translation will
appear below. In one of the example files "my name is Cantor" this
translation appears as "M AY1 - N EY1 M - IH1 Z - K AE1 N - TER" (this
is the phonetic notation used by the Carnegie Mellon dictionary). The
numerical values are used to indicate vowel stress (none, 1, 2, and 3
appear to be valid, though differences are subtle). It is permissible,
within limits, to edit the phonetic translation directly, but if you
try to enter something that is not in the CMU lexicon your text will
be replaced with a blank.
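The phonetic line quoted above is easy to take apart programmatically. This Python sketch splits it into syllables and separates each trailing stress digit from its phoneme symbol; the function name and the returned structure are hypothetical.

```python
def parse_line(notation):
    """Split 'M AY1 - N EY1 M' style text into syllables of (symbol, stress)."""
    syllables = []
    for chunk in notation.split(" - "):
        phones = []
        for token in chunk.split():
            if token[-1].isdigit():
                phones.append((token[:-1], int(token[-1])))
            else:
                phones.append((token, None))  # no stress marker on this token
        syllables.append(phones)
    return syllables


parsed = parse_line("M AY1 - N EY1 M - IH1 Z - K AE1 N - TER")
```

Here the final "TER" carries no digit, so its stress comes back as None, matching the "none" case mentioned above.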
Voice Parameters
The timbral quality of each of CANTOR's eight voices can be modified
individually using graphic knobs at the left of the Score editor, and
the altered version can be saved as a user preset. Available parameters
include Semitone transposition and Fine tuning, Legato, Bright, Ensemble,
Metallic, Humanize, Balance, Vibrato rate, Vibrato depth, MWheel, Gender,
Breath, Glide, Pan, and Volume. These settings will affect the vocal
timbre in its entirety (time-varying equivalents of many of them are
available in the Automation editor). Comments on some of these parameters
may be helpful: "Metallic" stretches the spectrum to create
an inharmonic, metallic quality. "Humanize" introduces some
randomization of pitch, level, vibrato rate and depth. "Gender" shifts
the formant frequencies to simulate the differing vocal tract sizes
of men, women, and children. "Breath" adds noise to the voice.
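Two of these controls lend themselves to a quick numerical sketch. The formulas below are guesses at the general idea, not VirSyn's actual mappings: "Metallic" is modeled as a power-law stretch of the partial frequencies, and "Gender" as a uniform scaling of the formant frequencies (a smaller vocal tract pushes formants upward).

```python
def stretched_partials(fundamental_hz, count, stretch):
    """Partial n sits at f0 * n**(1 + stretch); stretch = 0 is a harmonic series."""
    return [fundamental_hz * n ** (1.0 + stretch) for n in range(1, count + 1)]


def shift_formants(formants_hz, factor):
    """Scale every formant; factor > 1 suggests a smaller vocal tract."""
    return [f * factor for f in formants_hz]


harmonic = stretched_partials(100.0, 3, 0.0)
metallic = stretched_partials(100.0, 3, 0.1)
childlike = shift_formants([500.0, 1500.0], 1.25)
```

With any nonzero stretch the upper partials stop being whole-number multiples of the fundamental, which is what the ear hears as "metallic."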
To access the more fundamental elements of CANTOR's synthesis architecture
you must tab to the Voice window (pitched and noise sources) or the Phoneme
page (formant curves). The voiced signal displays the spectrum produced
by a set of simulated vocal cords and allows you to edit the amplitudes
of the individual partials (or groups of higher-order partials). In a
real voice the vocal tone is produced by a series of "puffs" of
air being released by the vocal cords, somewhat like the double reed
of an oboe or bassoon. This results in a buzzy sound, a pulse wave. CANTOR
allows you to choose a different starting point, such as a sawtooth,
and then to modify the spectrum at will. Similarly the noise source can
be shaped by drawing a custom noise transfer function.
The phoneme page allows you to view any of the 39 Factory phonemes and
to edit copies of them in any of 16 User sets. This can be useful in
fine-tuning vocal sounds for a more natural quality (assuming you have
developed the phonetic "chops") but for most of us techno-geeks
it is a way to do some serious quasi-vocal damage. I personally have
a long history in vocal processing and extended vocal techniques, so
this is where my interest lies. Indeed, when working with CANTOR I believe
the biggest return on your time investment will be in exploring the weird
stuff rather than in trying to replace human backup singers.
Automation Editor
The strip chart along the bottom of the Score editor window gives access
to a variety of time-varying parameters. These are divided into two
classes: Single-value parameters that affect an entire note over its
duration (Velocity, Gender, Breath, Balance) and parameters that change
over the course of a note according to an envelope function (Pitch,
Level, Brightness, Pan, Vibrato rate, Vibrato depth). This is where
a great deal of the "naturalness" of a singing voice can
be approximated. Sad to say, there are no simple rules governing how
this should be accomplished. You will have to listen closely to real
voices, think hard about what you hear, and then tweak CANTOR's parameters
until you like the result. Sorry about that, but that's the way the
Original Gangsters at Bell Labs and IRCAM did it (Joseph Olive,
Charles Dodge, Xavier Rodet, and Yves Potard to name a few) and there
really is no substitute for a musical ear and hard work!
The envelope functions are a bit unusual in that they are divided into
two sections, an Attack and a Release envelope. Each of these can have
an unlimited number of segments and the slope of each segment can be
shaped with either positive or negative curvatures.
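A multi-segment envelope with per-segment curvature can be sketched as follows. The power-law shaping and the breakpoint format are assumptions made for illustration; the review does not document how CANTOR computes its curves.

```python
def segment_value(t, t0, v0, t1, v1, curvature=0.0):
    """Value at time t on the segment from (t0, v0) to (t1, v1).

    curvature 0 gives a straight line; positive and negative values bend
    the segment in opposite directions.
    """
    x = (t - t0) / (t1 - t0)
    shaped = x ** (2.0 ** curvature)
    return v0 + (v1 - v0) * shaped


def envelope_value(t, points, curvatures):
    """points: (time, value) breakpoints in time order; one curvature per segment."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1), c in zip(points, points[1:], curvatures):
        if t <= t1:
            return segment_value(t, t0, v0, t1, v1, c)
    return points[-1][1]  # hold the final value


attack = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.5)]
curves = [0.0, 1.0]  # straight rise, then a curved fall
```

Adding more breakpoints to the list adds more segments, which is all "unlimited segments" means in practice.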
Effects
Each of the eight voices has its own set of three effects -- Overdrive
(hard, soft, tape, tube), Delay/Echo (two delays up to 728 msec, or with MIDI
sync as 1/32 triplet up to 1/4 note), Modulation effect (chorus, flanger,
phaser) -- plus a global Reverb with 24 presets. These effects are
fairly basic with minimal parameter control, but they sound good.
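The relationship between tempo-synced note values and delay time is simple arithmetic. In this Python sketch a quarter note is taken to equal one beat; whether CANTOR applies the 728 msec ceiling in sync mode is not something I can confirm, so the `fits` check is only an assumption.

```python
MAX_DELAY_MS = 728.0  # free-running maximum quoted above

# Note lengths in quarter-note beats.
NOTE_LENGTHS_IN_BEATS = {
    "1/32T": (1.0 / 8.0) * (2.0 / 3.0),  # triplet = two thirds of a 1/32 note
    "1/16": 0.25,
    "1/8": 0.5,
    "1/4": 1.0,
}


def synced_delay_ms(bpm, note):
    """Delay time in milliseconds for a note value at the given tempo."""
    return 60000.0 / bpm * NOTE_LENGTHS_IN_BEATS[note]


def fits(bpm, note):
    """True if the synced delay stays within the 728 msec ceiling (assumed)."""
    return synced_delay_ms(bpm, note) <= MAX_DELAY_MS
```

At 120 BPM a quarter-note delay is 500 msec, well inside the limit; at 60 BPM it would be a full second.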
Mixer
The Mixer page is where CANTOR's eight voices come together for MIDI
control and final audio mix. Each voice can be assigned its own MIDI
channel (or several can have the same channel for chorus effects) and
each can have its own key limits for keyboard splits. This is also
where you can select a different preset for each voice.
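The routing logic described here can be modeled in a few lines of Python. The class and field names are hypothetical; the sketch only shows how a MIDI channel plus key limits decides which parts respond to a note.

```python
from dataclasses import dataclass


@dataclass
class PartRouting:
    name: str
    channel: int   # MIDI channel, 1-16
    low_key: int   # lowest MIDI note accepted
    high_key: int  # highest MIDI note accepted

    def accepts(self, channel, key):
        return channel == self.channel and self.low_key <= key <= self.high_key


def parts_for_note(parts, channel, key):
    """Names of every part that should sound for this incoming note."""
    return [p.name for p in parts if p.accepts(channel, key)]


parts = [
    PartRouting("Part 1", 1, 0, 59),     # lower half of the keyboard
    PartRouting("Part 2", 1, 60, 127),   # upper half
    PartRouting("Part 3", 1, 60, 127),   # doubles Part 2 for a chorus effect
]
```

Notes at middle C and above trigger Parts 2 and 3 together, while anything below goes to Part 1 alone.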
Sound
I'm going far out on a limb to say that CANTOR sounds "interesting." This
is in part my way of being nice and sidestepping the issue of it not
producing a very realistic singing voice simulation, but it is also a
very heartfelt endorsement of CANTOR as a weird vocalic sound producer
par excellence. You can create some serious quirkiness by venturing outside
the bounds of normal parameter settings. Remember, all those real-time
controls and tweaky parameters are there for reasons beyond trying to
simulate the George O'Hara-Smith Singers.
CANTOR is at its best for creating robotic techno voices. The demo project "Wir
Sind Die Roboter" demonstrates this perfectly with its cold and
mechanical Kraftwerk quality and sing-song melody. I found this a suitable
test bed for auditioning the assortment of factory Voice presets, which
display a remarkable range of twisted personalities. Among my favorites
are "Crunch" (which derives its roughness from a 36 Hz vibrato
rate), "Deep Throat" (which is pitched 12 semitones down and
has a 46 Hz vibrato), "MetalJoe" (which has a ring modulator quality
due to a high Metallic parameter setting), "Spectral II" (which
is reminiscent of the Ernie Kovacs character Percy Prunetonsils, courtesy
of a customized source spectrum with a big dip in its fourth octave),
and "SynthChild" (which sounds like a harmonica, being pitched
two octaves up and having a high Gender setting).
Installation, Documentation, and Support
Installation of CANTOR is simple: Insert the CD, double-click on the
installer icon, and follow the instructions. The first time you launch
the application you'll be asked to enter your registration code; after
that you'll be good to go.
The documentation is clearly written and well organized; it comes in
both printed and PDF versions. The manual contains just enough technical
background on phonetics and voice synthesis to orient the user, but I'd
have liked to see some tutorial material as well. The included demo files
serve a tutorial function to some extent, but regrettably those demos
that try to sound "natural" just don't make it (the "wacky" stuff
is much more effective). Some of the demos include parameter tweaks to
display strategies for improving on the default quality, but I think
this could be improved and extended. Also instructive are the many Voice
presets, which range in character from relatively natural to extremely "alienated."
Because VirSyn is in Germany, e-mail and their Web site are the only
support options. This has never been an issue for me; I've always received
prompt assistance via e-mail, and the company is good about alerting
registered users when updates become available.
Summary
VirSyn CANTOR is a software voice synthesizer based on the classic "source-filter" model
whereby an oscillator and a noise source are resonated by a complex time-varying
formant filter bank. Control parameters are derived from a large phonetic
dictionary which permits the user to enter song lyrics in plain English
-- the translation into phonetic control parameters is done automatically. A
large number of voice synthesis parameters are available for programming
and real-time control. The sonic result is more robotic or mechanical
sounding than natural; it would be unrealistic to expect this instrument
to replace human singers. However, as an electronic instrument with vocalic
qualities and passable verbal intelligibility CANTOR is quite serviceable
and a good bit of fun.