Ambisonic Studio


BASIC AMBISONICS

Copyright D.G. Malham, University of York, England 1995

This file may be reproduced for private study or teaching purposes or placed on a Web Server, provided that its origin is acknowledged and that this notice and the above copyright notice always accompanies it.


Ambisonics is a method of recording information about a soundfield and reproducing it over some form of loudspeaker array so as to produce the illusion of hearing a true three dimensional sound image. I deliberately say illusion to stress the fact that if you truly wished to reproduce the soundfield present in a two metre sphere up to say 20 kHz then from information theory it is possible to show that you would need several hundred thousand channels and loudspeakers. All that it is practical to do is to determine how much information we can capture with some sensible combination of microphones and then to find some way of using that information to fool the ear into hearing a full soundfield.

Attempts to provide directional information in artificially reproduced sound images started in the late nineteenth century when a "broadcast" of a concert was made in France using multiple telephones spaced along the front of the stage, transmitting over wires to a similar number of telephone receivers. Quality was, of course, rather poor but an impression of direction was undoubtedly gained.

In the late 1920s and early 1930s a more formal basis for directional reproduction was laid down by Alan Blumlein in Britain and the RCA company in the States. The techniques they developed were for systems using only a small number of channels of information for reproduction over a pair of loudspeakers.

The technique developed by Alan Blumlein consisted of a pair of microphones with figure of eight characteristics, mounted as close together as possible and with the front lobe of one mic pointing 45° to the left of the front-back line and the front lobe of the other pointing 45° to the right. Although this does provide excellent stereo imaging it does have a problem. Because of the figure of eight characteristics, sounds coming from the rear are also picked up, and when reproduced over a pair of loudspeakers these sounds are folded over and mapped onto the front soundstage. This results in a sound which is too reverberant for many ears.
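
The fold-over follows from the figure of eight response itself, which varies as the cosine of the angle between the source and the microphone's front axis, so the rear lobe picks up sound with inverted polarity. The sketch below is illustrative only: it assumes ideal cosine responses, and the function name `blumlein_gains` and the convention of measuring angles anticlockwise from centre front are choices made for the example.

```python
import math

def blumlein_gains(source_deg):
    """Return (left, right) gains of an ideal crossed figure-of-eight pair.

    Assumes ideal cosine responses, with the left microphone aimed at +45
    degrees and the right at -45 degrees (angles anticlockwise from front).
    """
    a = math.radians(source_deg)
    left = math.cos(a - math.radians(45.0))
    right = math.cos(a + math.radians(45.0))
    return left, right

front_right = blumlein_gains(-45.0)   # source on the right microphone's axis
rear_left = blumlein_gains(135.0)     # source directly behind that microphone
```

Under these assumptions a source at the rear left produces the same pair of gains as one at the front right, apart from a polarity inversion, which is why it is reproduced within the front soundstage.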

"Purist" recording engineers who like the simplicity and accuracy of the Blumlein technique have modified it in order to remove this perceived problem. By replacing the figure of eight microphones with ones with cardioid characteristics and changing the angle between them so that it just includes the desired soundstage, it is possible to use the cardioid mic's lack of response to rearward sounds to reduce the mapping of rear reverberant sounds onto the front reproduced soundstage. This results in a much more acceptable if less accurate sound image. (As a matter of practice the angle between the mics should not be more than about 120° or less than 90°).

It does, however, seem a pity to throw away this information when we already have insufficient. The dummy head technique can be employed to utilise this lost information, although only for headphone listening. (Work has been done and is still being done to get better results over loudspeakers for dummy head recordings but problems still remain to be solved). By using some form of analogue of the human head, with microphones picking up sound where the ears should effectively be, and then reproducing these signals over headphones, very good results can be obtained with sounds appearing to come from all directions, not just the front. Unfortunately, the best results with the most stable images come from a dummy head which closely matches the listener's. However, the more closely the head matches that of any one listener, the worse the results may get with other listeners. Even if you try to generate some kind of average head you can come unstuck. One set of recordings I heard a few years ago, which were made using a head based on several years of painstaking measurements of all the colleagues and students of a Continental European researcher, gave absolutely stable and very precise results except for the fact that to me and all the other British people who listened to it, the front and back directions were transposed. The BBC approach, where the head is just a disk of Perspex with microphones placed a few centimetres either side of it, gives a more universally acceptable result at the expense of true precision.

Ambisonics, on the other hand, goes back to the original ideas of Alan Blumlein and builds on them. By just adding an omnidirectional microphone to the pair of figure eight units it can be shown that you can capture ALL the information that it is possible, with such simple low order microphones, to capture about the horizontal soundfield at that point. It is, of course, assumed that you have arranged to have the capsules TRULY coincident, that is, all three capsules are acoustically at exactly the same place in the soundfield. This impossibility becomes even more difficult when you add an up-down oriented figure eight capsule in order to record height information as well. This problem has been overcome in the Soundfield microphone, which uses four small capsules situated on the surface of a notional sphere to sample the incoming sounds. By some clever mathematics it is possible to generate the signals which would have been given by our four truly coincident capsules, at least up to some reasonably high frequency. (It should be noted that in Ambisonics the horizontal figure eight units are mounted front-back and side-to-side rather than at 45°).

Having got the information recorded in this form, the task of producing the illusion has to be accomplished. This is completely separate from the task of capturing the information in the first place and is based on an amalgam of various theories of hearing covering both low (below 700 Hz) and high frequency mechanisms. The decoder must be adjustable for different speaker layouts.

The question must be posed "How does this approach differ from the Quadraphonic systems?". Quadraphonic (or more properly Quadrifontal) systems were based on a very simple theory. If mono sound systems can be regarded as a hole in a concert hall wall and stereo systems as two holes, AND two holes are better, then four holes MUST be better still. Unfortunately this is simply untrue, since the extra information carried is partially redundant and causes considerable confusion and instability in the perceived images, particularly along the sides.

Extensive listening tests over many years show Ambisonic recordings to be at least as good as any other form of recording at capturing sound images and far better than most, but what are its applications in electro-acoustic music? To understand these we need to look at some basic theory on Ambisonics.


BASIC AMBISONIC TECHNOLOGY


The Ambisonic surround sound system is essentially a two part technological solution to the problems of encoding sound directions (and amplitudes) and reproducing them over practical loudspeaker systems in such a way as to fool the ears of listeners into thinking that they are hearing the original sounds correctly located. This can take place over a 360° horizontal only soundstage (pantophonic systems) or over the full sphere (periphonic systems). Systems using the so-called 'B' format signals to carry the recorded information require three and four channels respectively for full encoding of sounds to the kind of accuracy achievable with first order microphones (cardioid, figure eight etc.). Reproduction requires four or more loudspeakers depending on whether it is pantophonic or periphonic, size of area etc. Practical minimums are four for horizontal only, eight if you require height as well. The important thing to note is that there is no need to consider the actual details of the reproduction system when doing the original recording or synthesis, since if the B-Format specifications are followed and suitable loudspeaker/decoder setups are used then all will be well. In all other respects the two parts of the system, encoding and decoding, are completely separate.

ENCODING EQUATIONS

The position of a sound within a three dimensional soundfield is encoded in the four signals which make up the B format thus;

X = SigIn * cos A * cos B
Y = SigIn * sin A * cos B
Z = SigIn * sin B
W = SigIn * 0.707

where A is the anti-clockwise angle from centre front, B is the elevation and SigIn is the input (monophonic) signal. These signals are equivalent to three figure-of-eight microphones at right angles to each other, together with an omnidirectional unit, all of which have to be effectively coincident over the frequency range of interest. If you limit the positions of sounds to within the unit sphere by ensuring that the square root of

(x*x + y*y + z*z)

is always less than or equal to one then the equations can be more simply written as;

X = SigIn * x
Y = SigIn * y
Z = SigIn * z
W = SigIn * 0.707

where x,y,z are the coordinates of the sound source. The value of W is given as 0.707 rather than 1 since this allows for a more even distribution of levels within the four channels. This convention should be adhered to as the decoder designs are predicated on this. There is a catch in this simplicity, however, since if you attempt to move off the surface of the notional unit sphere and in towards the centre, the dropping levels in the X, Y, Z channels will reduce the overall sound level, rather than there being the expected increase as the apparent position of the sound source moves nearer the centre.
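
As a rough illustration, the two forms of the encoding can be written as small Python functions. This is a sketch only: it assumes the usual first order relations X = SigIn cos A cos B, Y = SigIn sin A cos B, Z = SigIn sin B and W = 0.707 SigIn, and the function names and the (W, X, Y, Z) return order are choices made for the example rather than part of any specification.

```python
import math

W_GAIN = 0.707  # level of the omnidirectional W channel, as given in the text

def encode_polar(sig_in, azimuth_deg, elevation_deg):
    """Encode one mono sample at angle A (anticlockwise from front), elevation B."""
    a = math.radians(azimuth_deg)
    b = math.radians(elevation_deg)
    x = sig_in * math.cos(a) * math.cos(b)   # front-back figure-of-eight
    y = sig_in * math.sin(a) * math.cos(b)   # left-right figure-of-eight
    z = sig_in * math.sin(b)                 # up-down figure-of-eight
    w = sig_in * W_GAIN                      # omnidirectional component
    return w, x, y, z

def encode_cartesian(sig_in, x, y, z):
    """Encode one mono sample at coordinates (x, y, z) inside the unit sphere."""
    if x * x + y * y + z * z > 1.0 + 1e-12:
        raise ValueError("source must lie within the unit sphere")
    return sig_in * W_GAIN, sig_in * x, sig_in * y, sig_in * z

# A source at hard left on the horizontal plane, encoded both ways.
polar = encode_polar(1.0, 90.0, 0.0)
cartesian = encode_cartesian(1.0, 0.0, 1.0, 0.0)
```

On the surface of the unit sphere the two functions agree to within rounding error. Inside it, only the coordinate form applies, and the X, Y and Z levels fall as the source approaches the centre.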

One fix that will keep the overall level pretty well constant is to make the W signal vary thus;

W = SigIn(1 - 0.293(x*x + y*y + z*z))

Further modifications can be made to allow for an overall increase as sounds move to the centre position, which is a closer approach to the natural behaviour of sounds.
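
A minimal sketch of the W correction given above, assuming the same coordinate convention (x, y, z within the unit sphere). The function name `w_signal` is hypothetical.

```python
def w_signal(sig_in, x, y, z):
    """W channel with the correction W = SigIn(1 - 0.293(x*x + y*y + z*z))."""
    r_squared = x * x + y * y + z * z
    return sig_in * (1.0 - 0.293 * r_squared)

w_at_centre = w_signal(1.0, 0.0, 0.0, 0.0)   # source at the centre
w_on_sphere = w_signal(1.0, 1.0, 0.0, 0.0)   # source on the unit sphere
```

On the surface of the sphere this reduces to the standard 0.707, and it rises to the full input level at the centre, offsetting the loss of level in X, Y and Z.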

As well as just positioning one or more sounds within a soundfield, the whole soundfield can be modified as in the following example:

To rotate a complete soundfield, with any number of sources in it, around the vertical axis by an arbitrary angle A and tilt it about the y axis by an angle B simply (!) apply the following transform to the B-Format signals representing the soundfield;
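
One way to realise a rotation about the vertical axis followed by a tilt about the y axis is sketched below. The order of the two operations, the sign conventions (anticlockwise rotation seen from above, positive tilt raising the front) and the function name `rotate_tilt` are assumptions made for this example, so the signs may differ from other published forms of the transform.

```python
import math

def rotate_tilt(w, x, y, z, a_deg, b_deg):
    """Rotate a B-format sample about the vertical axis by A, then tilt about y by B.

    Assumed conventions (not taken from the text): the rotation is applied
    first and is anticlockwise seen from above; a positive tilt raises the front.
    """
    a = math.radians(a_deg)
    b = math.radians(b_deg)
    # Rotation about the vertical (z) axis mixes X and Y only.
    x1 = x * math.cos(a) - y * math.sin(a)
    y1 = x * math.sin(a) + y * math.cos(a)
    # Tilt about the y axis mixes X and Z only.
    x2 = x1 * math.cos(b) - z * math.sin(b)
    z2 = x1 * math.sin(b) + z * math.cos(b)
    # W is omnidirectional, so it is unchanged by any rotation.
    return w, x2, y1, z2

# A front source (X = 1) rotated by 90 degrees ends up at the left (Y = 1).
rotated = rotate_tilt(0.707, 1.0, 0.0, 0.0, 90.0, 0.0)
# The same source tilted by 90 degrees ends up overhead (Z = 1).
tilted = rotate_tilt(0.707, 1.0, 0.0, 0.0, 0.0, 90.0)
```

Because the transform acts on the B-Format signals rather than on individual sources, every source in the soundfield is moved together.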

Many other effects are available, such as mirror imaging, distortion, spread sound sources etc. These may all be contained within one matrix thus;

where k1 - k16 are coefficients formed by the matrices of the various different multipliers which appear in all the different modification equations which you wish to apply to the soundfield.
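
The idea of folding several modifications into one set of sixteen coefficients can be sketched as ordinary 4 x 4 matrix multiplication. The channel order (W, X, Y, Z), the helper names and the two example matrices are assumptions chosen for illustration, and the mapping of k1 - k16 onto rows and columns is not taken from the text.

```python
def apply_matrix(k, signals):
    """Multiply a 4x4 coefficient matrix k (four rows) by a 4-channel sample."""
    return [sum(k[row][col] * signals[col] for col in range(4)) for row in range(4)]

def combine(second, first):
    """Return the single matrix equivalent to applying `first`, then `second`."""
    return [[sum(second[r][i] * first[i][c] for i in range(4)) for c in range(4)]
            for r in range(4)]

# Channel order assumed here: (W, X, Y, Z).
MIRROR_LEFT_RIGHT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]]
ROTATE_90 = [[1, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

both = combine(ROTATE_90, MIRROR_LEFT_RIGHT)    # mirror first, then rotate
sample = [0.707, 0.0, 1.0, 0.0]                 # source at hard left
one_step = apply_matrix(both, sample)
two_steps = apply_matrix(ROTATE_90, apply_matrix(MIRROR_LEFT_RIGHT, sample))
```

Applying the combined matrix once gives the same result as applying the two modifications in turn, so any chain of such effects costs no more per sample than a single one.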

Other possibilities which open up once we move to fully digital systems include adding variable amounts of reverberation and appropriately proportioned simulations of early reflections to give a better set of distance cues. This would remove one of the limitations of soundfields synthesised in the analog domain, since in the analog domain it is almost impossible to give any indication of absolute sound source distance, only a relative position with respect to the notional unit sphere radius.


AMBISONICS AND STEREO


The B-Format signals are not, of course, in any sense stereo compatible. It is, however, possible to combine the three (X,W,Y) components required for horizontal work in such a way that not only is a good stereo compatible two channel system produced but with a suitable decoder much of the original surround sound image can be recovered. The resulting (horizontal) soundfield is not perfect but by careful design of the encoding equations it is possible to place the defects in areas such as the rear image where the ear is less susceptible.

This encoding method, which is called UHJ coding, is used to produce stereo compatible Ambisonic records, tapes and broadcasts. The X,Y and W signals are matrixed into two channels using the following transform;

This would all seem relatively easy if it were not for the 'j' in the equation. What this indicates is that that particular signal is phase shifted by 90° with respect to the 'normal' version of that signal, over the full audio band. In order to do that, each of these three signals must be passed through its own pair of wide-band phase shift (or all pass) networks. Within each pair, the output of one must be set up so that it has a phase shift that differs by 90° from the output of the other member of the pair at all audio frequencies. This will give the required effect of a 90° phase shift. Earlier encoders did this with analog circuitry but it is entirely possible to write a computer program to do this, and newer decoders, such as the Meridian unit, implement the required filter equations in a digital signal processor.
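
The role of 'j' can be illustrated with complex arithmetic on a single steady sine component, where multiplying by j is exactly a 90° phase shift. The coefficients below are the values commonly published for two channel UHJ, written in sum (S) and difference (D) form; they are supplied for this example and are not reproduced from this text. The function names are hypothetical, and a real encoder would need wide-band all pass networks rather than this single-frequency shortcut.

```python
import math

def uhj_encode_phasor(w, x, y):
    """Return complex (left, right) amplitudes for one steady sine component.

    Multiplying by 1j stands for the 90 degree phase shift that a real
    encoder applies with all-pass networks across the whole audio band.
    Coefficients are the commonly published two-channel UHJ values (assumed).
    """
    s = 0.9396926 * w + 0.1855740 * x
    d = 1j * (-0.3420201 * w + 0.5098604 * x) + 0.6554516 * y
    return (s + d) / 2, (s - d) / 2

def source(azimuth_deg):
    """Horizontal B-format gains (W, X, Y) for a unit source at the given angle."""
    a = math.radians(azimuth_deg)
    return 0.707, math.cos(a), math.sin(a)

front_l, front_r = uhj_encode_phasor(*source(0.0))
left_l, left_r = uhj_encode_phasor(*source(90.0))
```

For a centre front source the two channels have equal magnitude and differ only in phase, while a source at the left is much stronger in the left channel, which is what makes the result usable as ordinary stereo.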

The two channel member of the UHJ family of codings can be supplemented with a third channel to remove the remaining anomalies for horizontal reproduction. This can be of reduced bandwidth without degrading things very far if it is necessary for operational reasons - for instance if transmitting it using subcarrier modulation on an FM transmitter. A fourth channel can be added to give height information. The decoding equations are such that a decoder for any of the orders will always extract the correct information from higher order inputs - in other words the system is upward compatible.

There's my list of all the main Ambisonic references here, and you will find a list of my Ambisonic papers in my area in the Department of Music's Web pages. Also, most of the recordings given there with dates of 1979 or later were done in UHJ using our Soundfield microphone. An even larger list of Ambisonic references has been provided by Michael Gerzon.


Dave Malham. 4th. April 1995
If you have any suggestions, comments or requests you can reach me at
dgm2@unix.york.ac.uk

Dave Malham is not associated with Ambisonic Studio and Daniel Courville. This page appears on Ambisonic Studio by courtesy of Dave Malham.