audioMIDI.com
audioMIDI.com Review
by Richard Zvonar|November 5th, 2004
Review at a Glance
What is it? Voice synthesizer software for Mac and PC, with stand-alone and plug-in versions (VST and Audio Units).
What does it do? Synthesizes humanoid singing voice using the classic 'source-filter' model.
Who would use it? Composers of electronic and electroacoustic music, sci-fi and game sound designers, anyone who likes weird vocal effects.
How does it sound? Robotic, but in a nice way.
What is so great about it? Lots of tweakable parameters. Has both sequenced and keyboard performance modes, with real-time parameter control.
What is not so great about it? Difficult to create natural-sounding voice quality. Non-standard user interface.
Review Summary? CANTOR is a synthesizer that can produce voice-like sounds with a great deal of timbral variation and real-time control. It can serve as a plug-in or as a standalone instrument with live performance capabilities. Its sound quality is similar to a vocoder, but rather than deriving its parameter control from analysis of a live voice, it uses a large phonetic dictionary to convert plain English text into phoneme control parameters. It offers good sound quality but is better for robotic singing voices than for a natural-sounding simulation of normal backup singers.

What Is It?

VirSyn CANTOR is a vocal synthesis instrument for Macintosh OS X and Windows XP; it can operate in stand-alone mode or as a plug-in (VST 2.0 and Audio Unit formats, with RTAS on the way). It is eight-voice polyphonic/multitimbral, with a built-in effects section. The voice parts are created and edited with a piano-roll style score editor and text can be entered either in plain English or in a type of phonetic notation. In addition to a nice assortment of factory Voice presets, the user can create and save a custom library. There are also 16 User phoneme sets that can be edited to supplement the Factory set.

CANTOR is described by VirSyn as "The Vocal Machine" and I think this is apt. As I hope to make clear, synthesis of realistic human voices is an extremely difficult task and it is perhaps unrealistic to expect a real-time computer program to sound like a human singer. Better to treat a musical voice synthesizer similarly to a vocoder and to orient one's expectations toward the creation of intriguing, disturbing, or amusing voice-like sounds. I think software designer Harry Gohs and his VirSyn team have accomplished this quite well and I hope that CANTOR is well enough received that they will be encouraged to take it to the next level.


Basics Of Vocal Synthesis

Voice synthesis has been a topic of research since the 19th century, when Alexander Graham Bell and Hermann Helmholtz used physical resonators to simulate vowel sounds, but the production of intelligible "running speech" required performable electronic instruments. In 1939 Homer Dudley, a researcher at Bell Telephone Laboratories, unveiled the "voder" (a near-cousin to Dudley's vocoder), which could generate speech in the hands of a trained operator. The control system consisted of a few keys and pedals, while the inner workings were based on what has become known as the "source-filter" model of sound synthesis. The source comes from a pair of signal generators representing the pitched vibrations produced by the vocal folds (or cords) of the larynx and the unpitched fricative noise produced in the mouth and throat. These signals are then passed through a set of electronic filters that represent the resonances (called "formants") of the vocal tract.
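
To make the source-filter idea concrete, here is a minimal Python sketch. It is not the voder's circuitry or CANTOR's code; the sample rate, the two-pole resonator design, the parallel filter bank, and every function name are assumptions chosen only for illustration.

```python
import math
import random

SAMPLE_RATE = 16000  # Hz; an assumed value for this sketch


def pulse_train(f0, n_samples, sample_rate=SAMPLE_RATE):
    """Voiced source: one unit impulse per pitch period (a crude vocal-fold buzz)."""
    period = max(1, round(sample_rate / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def noise(n_samples, seed=0):
    """Unvoiced source: uniform white noise standing in for fricative hiss."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def resonator(signal, freq, bandwidth, sample_rate=SAMPLE_RATE):
    """Two-pole resonant filter standing in for a single formant."""
    r = math.exp(-math.pi * bandwidth / sample_rate)  # pole radius, always < 1
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / sample_rate)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(f0, formants, n_samples, breath=0.0):
    """Mix the two sources, then sum the outputs of a parallel formant bank."""
    voiced = pulse_train(f0, n_samples)
    hiss = noise(n_samples)
    source = [v + breath * h for v, h in zip(voiced, hiss)]
    bank = [resonator(source, freq, bw) for freq, bw in formants]
    return [sum(values) for values in zip(*bank)]


# Two formants as (center frequency in Hz, bandwidth in Hz); values are illustrative.
vowel = synthesize(110, [(700, 80), (1200, 90)], 800)
```

Raising `breath` mixes more of the noise source into the signal before filtering, which is the same division of labor the paragraph above describes.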

The timbre of most musical instruments and voices is shaped by the formants produced by resonant chambers such as the body of a guitar or the mouth and nasal cavities of a human or other animal. The principal difference is that living creatures can quickly change these resonances by altering the size and shape of their vocal cavities (this is evident if you raise and lower your jaw while sustaining a tone to produce "ee-aa-ee-aa" or if you purse and relax your lips to produce "oo-ee-oo-ee"). By adding to this the noise component of fricative ("sss," "fff," etc.) or stop consonants ("t," "p," etc.) you have a rich and varied assortment of sound elements, called "phonemes" (there are more than 40 of these in English).

You can think of phonemes as the building blocks of spoken language, though the "blocks" paradigm begins to seem simplistic once you listen carefully to the way phonemes morph rapidly from one to the next in actual speech. This fluid flow of natural speech (and song) is in fact the key challenge in voice synthesis. Our ears are simply too well attuned to real voices for the artificial variety to be at all convincing without an immense amount of tweaking. A simple succession of static phonemes must be given a more organic shape as it evolves in time -- phonemes can't simply succeed each other like beads on a string. The trick is at least threefold: First, the fundamental pitch of the voice must glide naturally from pitch to pitch (this is called "prosody"). Second, the formant frequencies must also glide from one to the next (and how this occurs depends on the specific order of phonemes -- "oo-ee-aa" has a different set of transitions from "aa-ee-oo"). Third, the quality of noise in consonant sounds is also affected by the formants of their accompanying vowels.
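
The second point, that formant transitions depend on phoneme order, can be sketched in Python with plain linear interpolation. The formant targets below are rough, illustrative first- and second-formant values for an adult male voice, and the linear glide is an assumption; real speech (and presumably any real synthesizer) uses more elaborate trajectories.

```python
# Approximate (F1, F2) targets in Hz; illustrative values only.
VOWELS = {
    "oo": (300.0, 870.0),
    "ee": (270.0, 2290.0),
    "aa": (730.0, 1090.0),
}


def glide(start, end, steps):
    """Linearly interpolate every formant from start to end, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [
        tuple(s + (e - s) * i / (steps - 1) for s, e in zip(start, end))
        for i in range(steps)
    ]


def trajectory(sequence, steps_per_transition=5):
    """Chain glides through a vowel sequence without repeating shared endpoints."""
    frames = []
    for current, following in zip(sequence, sequence[1:]):
        segment = glide(VOWELS[current], VOWELS[following], steps_per_transition)
        frames.extend(segment if not frames else segment[1:])
    return frames


forward = trajectory(["oo", "ee", "aa"])
backward = trajectory(["aa", "ee", "oo"])
```

The two trajectories pass through the same three targets but visit different intermediate frames in a different order, which is why a synthesizer cannot treat the transitions as interchangeable.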

Given this subtlety and complexity it should be clear that the best way to simulate a real voice is to analyze extended phrases and derive a set of continuous pitch, noise, and formant parameter functions that can then be used to control the voice synthesizer (this is how a vocoder works). But this goes against the need for absolute flexibility in a voice synthesizer; you want to be able to directly enter pitches and accompanying text and have the software figure out "the fiddly stuff." This is generally done by compiling a "pronouncing dictionary" of words from which the synthesizer can convert normal English text into a phonetic equivalent. CANTOR uses the Carnegie Mellon University Pronouncing Dictionary, which contains over 125,000 words and uses 39 phonemes plus their "stress factor" variations (for accented syllables).
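
The dictionary step can be pictured as a simple lookup. The Python sketch below uses a three-word toy lexicon written in the same notation the review quotes later; the function name and the choice to return `None` for unknown words are assumptions, and the real dictionary is of course far larger.

```python
# Toy lexicon in CMU-style notation; a stand-in for the full dictionary.
TOY_LEXICON = {
    "MY": ["M", "AY1"],
    "NAME": ["N", "EY1", "M"],
    "IS": ["IH1", "Z"],
}


def to_phonemes(text, lexicon=TOY_LEXICON):
    """Return (word, phonemes) pairs; phonemes is None when a word is unknown."""
    pairs = []
    for raw in text.upper().split():
        word = raw.strip(".,!?;:\"'")
        if word:
            pairs.append((word, lexicon.get(word)))
    return pairs


lookup = to_phonemes("My name is Zork.")
```

Words missing from the lexicon come back paired with `None`, so a caller can decide how to handle them.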

CANTOR Voice Architecture

CANTOR's voices are constructed according to the classic "source-filter" model described above. The pitched signal source is provided by an additive (Fourier) waveform generator with 256 programmable partials (similar to that found in VirSyn CUBE). The noise source features a transfer function (filter curve) that can be hand-drawn to produce different "colors" of noise. These source signals then pass through a formant filter with six formants. Each phoneme uses two formant filter curves. In many cases, such as a steady-state vowel, both curves are the same, but in the case of a consonant (such as "ch") or a diphthong (such as "ay" in "hide") they are different because these phonemes evolve over their duration.
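
For readers unfamiliar with additive synthesis, the following Python sketch shows the general technique of summing sine partials. The 256-partial count comes from the description above; the 1/n sawtooth amplitudes, the Nyquist guard, and the function names are assumptions, not a description of CANTOR's generator.

```python
import math


def sawtooth_partials(count=256):
    """Amplitude 1/n for harmonic n, the classic sawtooth recipe."""
    return [1.0 / n for n in range(1, count + 1)]


def additive_wave(amplitudes, f0, sample_rate, n_samples):
    """Sum harmonic sine partials, skipping any at or above the Nyquist frequency."""
    nyquist = sample_rate / 2.0
    active = [
        (n, a) for n, a in enumerate(amplitudes, start=1) if n * f0 < nyquist
    ]
    wave = []
    for i in range(n_samples):
        t = i / sample_rate
        wave.append(sum(a * math.sin(2.0 * math.pi * n * f0 * t) for n, a in active))
    return wave


amps = sawtooth_partials()
wave = additive_wave(amps, 110.0, 44100, 64)
```

Editing the source spectrum then amounts to changing entries of `amps` before the partials are summed.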

CANTOR is eight-voice polyphonic/multitimbral. Each vocal line can be created and edited in its own Score page and the phonetic qualities of each can be modified in a Voice page (voiced and noise sources) and Phoneme page (two formant filter curves). There is an FX page with Distortion, Echo/Delay, Chorus, and (global) Reverb effects and there is a Mix page for combining the eight voices in a stereo mix and also for defining MIDI channels, keyboard splits, and voice presets.

CANTOR has two operating modes: manual and automatic. In manual mode the pitch of the sung note is controlled by incoming MIDI notes and the syllable being articulated is set by the current step in the score editor sequence (the note sequence is treated like an ordered list of syllables). In automatic mode the sequence playback is controlled by CANTOR's transport (standalone version) or by the host application (plug-in version).
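
Manual mode, as described, pairs each incoming note with the next syllable in an ordered list. A minimal Python sketch of that idea follows; the class name is hypothetical, and wrapping back to the first syllable at the end of the list is an assumption, since the behavior at the end of the sequence is not stated here.

```python
class ManualModeStepper:
    """Pairs each incoming MIDI note with the next syllable in an ordered list."""

    def __init__(self, syllables):
        if not syllables:
            raise ValueError("need at least one syllable")
        self.syllables = list(syllables)
        self.index = 0

    def note_on(self, midi_note):
        """Return (pitch, syllable) for this note and advance to the next step."""
        syllable = self.syllables[self.index]
        # Assumption: wrap around after the last syllable.
        self.index = (self.index + 1) % len(self.syllables)
        return midi_note, syllable


stepper = ManualModeStepper(["my", "name", "is", "Can", "tor"])
sung = [stepper.note_on(note) for note in (60, 62, 64, 65, 67, 69)]
```

The player supplies pitch and timing from the keyboard while the text simply advances one step per note.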


User Interface

CANTOR's user interface is organized as a collection of tabbed pages in a single window. Tabs for Score, Voice, Phoneme, FX, and Mix editors are used to select particular elements of the control and voice editing hierarchy, while tabs for Part 1 through Part 8 select the specific voice part to be edited. Because the Mix window is common to all eight Parts, there are 33 windows total (4 x 8 + 1).

Individual parameters are edited either with a graphic knob (which displays its current numerical value while active) or with a number box. Most individual parameters (except those in the Mix page) can also be controlled via MIDI Control Change messages and can be assigned through a MIDI Learn function (just Control-click on the knob or number box to activate or defeat this).

The spectra of the voiced and noise signals can be edited graphically, as can the formant curves (note that the Factory formant set cannot be edited but there are 16 User sets that can be). Control-clicking in any of these windows pops up a menu of editing commands and other shortcuts.

Score Editor

This is a version of the familiar piano-roll editing window found in most sequencers. There are eight Score editing windows, one for each Part, and they are selected by clicking tabs along the right side of the window. As you'd expect, each voice part is monophonic, but a visual reference to all the hidden parts is made available as a set of faint "ghost" images in the current window. The left margin of the note window holds a graphic keyboard for reference, and the bottom pane of the window holds a strip chart with a choice of parameters to be displayed and edited: Velocity, Gender, Breath, Balance, Pitch, Level, Brightness, Pan, Vibrato Rate, and Vibrato Depth.

While most of the other editing windows are straightforward, the main control strip and the Score editor take a little getting used to. As a long-time Macintosh user I expect certain interface guidelines to be adhered to, so that I really don't need to think too much about how things are supposed to work. Not so with CANTOR, which gives the appearance of having been ported from Windows (or perhaps DOS). For instance, loading a file should be accessed with Command-O or Alt-O and Save should be Command-S or Alt-S (or for the memory-challenged by selecting from the File menu). Not in CANTOR -- you need to click on a little disk icon in the control strip to pop up a menu with these and other options. Similarly, there are no key equivalents for Transport functions (not even Space Bar for Play and Pause) -- you have to click on the graphic transport. This defiance of standard conventions is continued in the score editing window itself. Rather than using familiar commands (Cmd-C for "Copy," Cmd-V for "Paste," Cmd-Z for "Undo," and even "Delete" to delete a selected note) CANTOR requires that you click on toolbar icons to switch between these modes. This certainly slows down the editing process and leads to a lot of mistakes. I won't belabor this point further, but do want to put "normalization" of the interface high on my wish list for a future update!

Once you've familiarized yourself with CANTOR's drawing tools, you can create and edit melodic passages in its "matrix editor" window. This step should be familiar to anyone who has used a piano-roll editor. The next step is to add the text for CANTOR to sing. New notes are created with a default text of "La," but this can be replaced by typing in your own text, typically with one syllable per note. Your original text will be displayed above the notes and the phonetic translation will appear below. In one of the example files "my name is Cantor" this translation appears as "M AY1 - N EY1 M - IH1 Z - K AE1 N - TER" (this is the phonetic notation used by the Carnegie Mellon dictionary). The numerical values are used to indicate vowel stress (none, 1, 2, and 3 appear to be valid, though differences are subtle). It is permissible, within limits, to edit the phonetic translation directly, but if you try to enter something that is not in the CMU lexicon your text will be replaced with a blank.
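
The notation quoted above is easy to take apart programmatically. This Python sketch assumes, based only on the example string, that " - " separates syllables, spaces separate phonemes, and a trailing digit marks stress; the function name is hypothetical.

```python
def parse_phonetic(line):
    """Split a phonetic line into syllables of (phoneme, stress) pairs."""
    syllables = []
    for chunk in line.split(" - "):
        phones = []
        for token in chunk.split():
            if token[-1].isdigit():
                phones.append((token[:-1], int(token[-1])))
            else:
                phones.append((token, None))  # no stress marker on this phoneme
        syllables.append(phones)
    return syllables


parsed = parse_phonetic("M AY1 - N EY1 M - IH1 Z - K AE1 N - TER")
```

Each inner list corresponds to one note in the score, matching the one-syllable-per-note convention described above.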

Voice Parameters

The timbral quality of each of CANTOR's eight voices can be modified individually using graphic knobs at the left of the Score editor, and the altered version can be saved as a user preset. Available parameters include Semitone transposition and Fine tuning, Legato, Bright, Ensemble, Metallic, Humanize, Balance, Vibrato rate, Vibrato depth, MWheel, Gender, Breath, Glide, Pan, and Volume. These settings will affect the vocal timbre in its entirety (time-varying equivalents of many of them are available in the Automation editor). Comments on some of these parameters may be helpful: "Metallic" stretches the spectrum to create an inharmonic, metallic quality. "Humanize" introduces some randomization of pitch, level, vibrato rate and depth. "Gender" shifts the formant frequencies to simulate the differing vocal tract sizes of men, women, and children. "Breath" adds noise to the voice.
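
Two of these parameters lend themselves to a small numerical illustration. The Python sketch below is a guess at the general idea only: the power-law stretch used for "Metallic" and the uniform scaling used for "Gender" are assumed formulas, and the function names are hypothetical, not CANTOR's actual mappings.

```python
def stretched_partials(f0, count, stretch=0.0):
    """Partial n sits at f0 * n**(1 + stretch); stretch = 0 gives pure harmonics."""
    return [f0 * n ** (1.0 + stretch) for n in range(1, count + 1)]


def shift_formants(formants, factor):
    """Scale every formant frequency; factor > 1 mimics a shorter vocal tract."""
    return [freq * factor for freq in formants]


harmonic = stretched_partials(100.0, 4)
metallic = stretched_partials(100.0, 4, stretch=0.05)
smaller_tract = shift_formants([500.0, 1500.0, 2500.0], 1.2)
```

With any positive stretch the upper partials drift sharp of whole-number multiples of the fundamental, which is what gives a spectrum its bell-like, inharmonic color.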

To access the more fundamental elements of CANTOR's synthesis architecture you must tab to the Voice window (pitched and noise sources) or the Phoneme page (formant curves). The voiced signal displays the spectrum produced by a set of simulated vocal cords and allows you to edit the amplitudes of the individual partials (or groups of higher-order partials). In a real voice the vocal tone is produced by a series of "puffs" of air being released by the vocal cords, somewhat like the double reed of an oboe or bassoon. This results in a buzzy sound, a pulse wave. CANTOR allows you to choose a different starting point, such as a sawtooth, and then to modify the spectrum at will. Similarly the noise source can be shaped by drawing a custom noise transfer function.

The phoneme page allows you to view any of the 39 Factory phonemes and to edit copies of them in any of 16 User sets. This can be useful in fine-tuning vocal sounds for a more natural quality (assuming you have developed the phonetic "chops") but for most of us techno-geeks it is a way to do some serious quasi-vocal damage. I personally have a long history in vocal processing and extended vocal techniques, so this is where my interest lies. Indeed, when working with CANTOR I believe the biggest return on your time investment will be in exploring the weird stuff rather than in trying to replace human backup singers.

Automation Editor

The strip chart along the bottom of the Score editor window gives access to a variety of time-varying parameters. These are divided into two classes: Single-value parameters that affect an entire note over its duration (Velocity, Gender, Breath, Balance) and parameters that change over the course of a note according to an envelope function (Pitch, Level, Brightness, Pan, Vibrato rate, Vibrato depth). This is where a great deal of the "naturalness" of a singing voice can be approximated. Sad to say, there are no simple rules governing how this should be accomplished. You will have to listen closely to real voices, think hard about what you hear, and then tweak CANTOR's parameters until you like the result. Sorry about that, but that's the way the Original Gangsters at Bell Labs and IRCAM did it (Joseph Olive, Charles Dodge, Xavier Rodet, and Yves Potard to name a few) and there really is no substitute for a musical ear and hard work!

The envelope functions are a bit unusual in that they are divided into two sections, an Attack and a Release envelope. Each of these can have an unlimited number of segments and the slope of each segment can be shaped with either positive or negative curvatures.
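
A multi-segment envelope with per-segment curvature can be sketched in a few lines of Python. The power-law shaping and the convention that curvature 0 means a straight line are assumptions made for illustration; the actual curve shapes CANTOR uses are not documented here.

```python
def segment_value(start, end, position, curvature=0.0):
    """Value along one segment; position runs 0..1, curvature 0 is a straight line."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be within 0..1")
    exponent = 2.0 ** curvature  # assumed mapping: +1 squares, -1 takes the root
    return start + (end - start) * position ** exponent


def envelope_at(breakpoints, curvatures, t):
    """Evaluate a breakpoint envelope [(time, value), ...] at time t."""
    if len(curvatures) != len(breakpoints) - 1:
        raise ValueError("need one curvature per segment")
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]
    pairs = zip(breakpoints, breakpoints[1:], curvatures)
    for (t0, v0), (t1, v1), curve in pairs:
        if t <= t1:
            return segment_value(v0, v1, (t - t0) / (t1 - t0), curve)
    return breakpoints[-1][1]  # hold the final value after the last breakpoint


attack = [(0.0, 0.0), (0.1, 1.0), (0.3, 0.6)]
level = envelope_at(attack, [0.0, 1.0], 0.05)
```

A Release envelope would be a second breakpoint list evaluated from the moment the note ends.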

Effects

Each of the eight voices has its own set of three effects -- Overdrive (hard, soft, tape, tube), Delay/Echo (two delays up to 728 msec, or with MIDI sync as 1/32 triplet up to 1/4 note), Modulation effect (chorus, flanger, phaser) -- plus a global Reverb with 24 presets. These effects are fairly basic with minimal parameter control, but they sound good.
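
To relate the synced note values to the millisecond range, here is a short Python helper based on standard tempo arithmetic (a quarter note lasts 60000 / BPM milliseconds, and a triplet lasts two-thirds of its plain note value). The function name and the constant are hypothetical; only the 728 msec figure comes from the text above.

```python
MAX_FREE_DELAY_MS = 728.0  # upper limit of the unsynced delay, as quoted above


def note_ms(bpm, fraction_of_whole, triplet=False):
    """Duration in milliseconds of a note value such as 1/4 or 1/32 at a given tempo."""
    whole_ms = 4 * 60000.0 / bpm
    ms = whole_ms * fraction_of_whole
    return ms * 2 / 3 if triplet else ms


quarter = note_ms(120, 1 / 4)                  # 500.0 ms at 120 BPM
shortest = note_ms(120, 1 / 32, triplet=True)  # roughly 41.7 ms at 120 BPM
```

At 120 BPM both ends of the synced range fall inside the 728 msec window of the free-running delay.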

Mixer

The Mixer page is where CANTOR's eight voices come together for MIDI control and final audio mix. Each voice can be assigned its own MIDI channel (or several can have the same channel for chorus effects) and each can have its own key limits for keyboard splits. This is also where you can select a different preset for each voice.
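
The routing logic described here (a MIDI channel plus key limits per voice) can be modeled as a simple filter. The Python sketch below is an illustration only; the `Part` fields, the part names, and the `route` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    channel: int        # MIDI channel, 1-16
    low_key: int = 0    # lowest MIDI note this part answers to
    high_key: int = 127 # highest MIDI note this part answers to


def route(parts, channel, note):
    """Names of all parts that should sing an incoming note."""
    return [
        p.name
        for p in parts
        if p.channel == channel and p.low_key <= note <= p.high_key
    ]


parts = [
    Part("Bass", channel=1, high_key=59),    # keyboard split below middle C
    Part("Lead", channel=1, low_key=60),
    Part("Double", channel=1, low_key=60),   # same channel and range: layered
    Part("Robot", channel=2),
]
layered = route(parts, 1, 64)
```

Parts that share a channel and key range sound together, which is how the layered chorus effect mentioned above would arise.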

Sound

I'm going far out on a limb to say that CANTOR sounds "interesting". This is in part my way of being nice and sidestepping the issue of it not producing a very realistic singing voice simulation, but it is also a very heartfelt endorsement of CANTOR as a weird vocalic sound producer par excellence. You can create some serious quirkiness by venturing outside the bounds of normal parameter settings. Remember, all those real-time controls and tweaky parameters are there for reasons beyond trying to simulate the George O'Hara-Smith Singers.

CANTOR is at its best for creating robotic techno voices. The demo project "Wir Sind Die Roboter" demonstrates this perfectly with its cold and mechanical Kraftwerk quality and sing-song melody. I found this a suitable test bed for auditioning the assortment of factory Voice presets, which display a remarkable range of twisted personalities. Among my favorites are "Crunch" (which derives its roughness from a 36Hz vibrato rate), "Deep Throat" (which is pitched 12 semitones down and has a 46Hz vibrato), "MetalJoe" (which has a ring modulator quality due to a high Metallic parameter setting), "Spectral II" (which is reminiscent of the Ernie Kovacs character Percy Dovetonsils, courtesy of a customized source spectrum with a big dip in its fourth octave), and "SynthChild" (which sounds like a harmonica, being pitched two octaves up and having a high Gender setting).


Installation, Documentation, and Support

Installation of CANTOR is simple: Insert the CD, double-click on the installer icon, and follow the instructions. The first time you launch the application you'll be asked to enter your registration code; after that you'll be good to go.

The documentation is clearly written and well organized; it comes in both printed and PDF versions. The manual contains just enough technical background on phonetics and voice synthesis to orient the user, but I'd have liked to see some tutorial material as well. The included demo files serve a tutorial function to some extent, but regrettably those demos that try to sound "natural" just don't make it (the "wacky" stuff is much more effective). Some of the demos include parameter tweaks to display strategies for improving on the default quality, but I think this could be improved and extended. Also instructive are the many Voice presets, which range in character from relatively natural to extremely "alienated."

Because VirSyn is in Germany, e-mail and their Web site are the only support options. This has never been an issue for me; I've always received prompt assistance via e-mail, and the company is good about alerting registered users when updates become available.

Summary

VirSyn CANTOR is a software voice synthesizer based on the classic "source-filter" model whereby an oscillator and a noise source are resonated by a complex time-varying formant filter bank. Control parameters are derived from a large phonetic dictionary which permits the user to enter song lyrics in plain English -- the translation into phonetic control parameters is done automatically. A large number of voice synthesis parameters are available for programming and real-time control. The sonic result is more robotic or mechanical sounding than natural; it would be unrealistic to expect this instrument to replace human singers. However, as an electronic instrument with vocalic qualities and passable verbal intelligibility, CANTOR is quite serviceable and a good bit of fun.


© 2008 audioMIDI.com. All Rights Reserved.
Publisher does not accept liability for incorrect spelling, printing errors (including prices), incorrect manufacturer's specifications or changes, or grammatical inaccuracies in any product included in the audioMIDI.com Website.
Prices subject to change without notice.