Whether you are setting up a bedroom studio or the next Abbey Road, choosing the right audio interface for your needs can be brain‑meltingly difficult.
In his alphabet of helpful advice, Sound On Sound's Editor In Chief Sam Inglis breaks down everything you need to know about audio interfaces into 26 bite-sized chunks!
- For further reading, check out Sam's related feature article 'How To Choose An Audio Interface' on the cover of our September 2020 editions.
Analogue
A is for analogue. Sound is changes in air pressure. Microphones convert sound into a changing electrical voltage. That’s called an analogue signal because the voltage changes are analogous to the pressure changes. Getting sound into a computer involves converting that analogue electrical signal into a stream of numbers, and that analogue-to-digital (ADC) conversion is one of the most important things that an audio interface does.
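To make the "stream of numbers" idea concrete, here is a minimal sketch of what a converter does in principle: measure the signal at regular intervals and round each measurement to the nearest whole number. The function name, the 1 kHz test tone and the 16-bit/48 kHz settings are illustrative assumptions, not a description of any particular interface.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, bit_depth, num_samples):
    """Sample a full-scale sine wave and round each sample to an integer code."""
    max_code = 2 ** (bit_depth - 1) - 1  # largest positive code for signed audio
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz                          # time of the nth measurement
        voltage = math.sin(2 * math.pi * freq_hz * t)   # stand-in analogue value, -1.0 to 1.0
        codes.append(round(voltage * max_code))         # nearest integer code
    return codes

# Assumed example: one cycle of a 1 kHz tone at 48 kHz, 16-bit.
codes = sample_and_quantize(1000, 48000, 16, 48)
print(codes[:5])
```

Real converters are far more sophisticated than this, but the output is the same kind of thing: a list of integers, one per sample.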
Bus power
B is for bus power. Some interfaces need a mains power supply. Some can get enough power from the USB or Thunderbolt socket. Bus power is more convenient, but usually there’s only enough power available to drive a small interface.
Clocking
C is for clocking. Two hundred years ago every part of the country set their clocks locally. And then the railways came along, and suddenly it was quite important that 10.30 was 10.30 everywhere, not 10.15 in Oxford and 11 o’clock in Norwich. Digital audio devices need a timing reference, and if you’re just running one of them, they can set that locally. But if you want to send audio digitally between devices, they all need a common timing reference.
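As a rough illustration of why a common reference matters, this sketch works out how far two free-running clocks drift apart. The 50 parts-per-million mismatch is an assumed figure chosen for the example, and `drift_in_samples` is a hypothetical helper.

```python
def drift_in_samples(sample_rate_hz, ppm_difference, seconds):
    """Samples by which two unsynchronised clocks disagree after a given time."""
    return sample_rate_hz * (ppm_difference / 1_000_000) * seconds

# Two devices both nominally at 48 kHz, assumed to differ by 50 ppm.
drift = drift_in_samples(48000, 50, 60)
print(f"After one minute the devices disagree by about {drift:.0f} samples")
```

Even a tiny mismatch adds up to whole samples within a fraction of a second, which is why digitally connected devices have to share one clock rather than each keeping their own time.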
Driver
D is for many, many things, including digital, D-Sub, Dante, direct monitoring, daisy-chaining, disco, drainage and DC coupling, but perhaps most importantly, D is for driver. An audio interface is only as good as the software that allows the computer to talk to it, and that software is called the driver. With good driver software you’ll get reliable and stable performance, you’ll be able to operate at low latency without overloading your computer’s CPU, and you’ll get support that’ll keep your interface working on tomorrow’s computers as well as today’s.
Ethernet
E is for Ethernet. Computers have several types of expansion port: USB, Thunderbolt, PCI and so on. What these all have in common is that they’re not specifically designed for audio recording, so they all have their own issues that have to be worked around. One compromise is that most of them only allow interfaces to talk to one computer at once. The big plus of Ethernet is that it allows a whole load of interfaces and computers and mixers to talk to each other in a big network.
FireWire
F is for FireWire, which used to be a popular way of connecting audio interfaces to computers until Apple made it obsolete and replaced it with Thunderbolt.
Gain
G is for gain. When you plug a mic into your interface, you need to amplify the signal to get it up to a level where it can be converted to a digital signal. The gain is the amount of amplification that can be applied. Some interfaces don’t give you enough gain for quiet signals. Others don’t let you turn loud signals down enough.
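Gain is normally quoted in decibels, and a quick calculation shows what those figures mean as plain multiplication. This is a small sketch; the -50 dBu microphone level and +4 dBu target are assumed values for illustration only.

```python
def db_to_voltage_ratio(gain_db):
    """Convert a gain in decibels to a voltage multiplication factor."""
    return 10 ** (gain_db / 20)

def gain_needed_db(source_level_dbu, target_level_dbu):
    """Gain required to lift a source level up to a target level."""
    return target_level_dbu - source_level_dbu

# Assumed example: a quiet mic signal at -50 dBu raised to +4 dBu.
needed = gain_needed_db(-50, 4)
ratio = db_to_voltage_ratio(needed)
print(f"{needed} dB of gain multiplies the voltage by about {ratio:.0f}")
```

Run the same sum with a quieter source and you can see how an interface with a limited gain range might run out of amplification.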
Headphone socket
H is for headphone socket. Which is a socket where you plug headphones in. Almost all interfaces have one. Many interfaces have two. No interfaces have more than two, which is a bit of a shame. Things to watch out for… sometimes the headphone sockets can be addressed separately from other outputs in your DAW, sometimes they can’t. Sometimes they can drive headphones loud, sometimes they can’t.
Insert
I is for Insert. When you’re recording, sometimes you want to use a piece of studio outboard like a compressor or EQ after the mic preamp. To do that you plug it into something called an insert point. These are available on some audio interfaces but not most.
Jitter
J is for Jitter, which is a new dance craze that’s sweeping the nation. When we convert an analogue signal to digital, we measure its amplitude many thousands of times a second. The more precisely we can time those measurements the more accurately we can recreate the signal. Any variation in the timing of those measurements is called jitter, and it generates a form of distortion. Probably you don’t need to worry about it. If it hadn’t begun with J there’s no way I’d have put it in this list.
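For the curious, here is a toy simulation of the effect: it samples a sine wave at slightly wrong moments and measures how far the results stray from the ideal values. The Gaussian timing error, the 10 kHz test tone and the jitter amounts are all assumptions made for the sketch, not measurements of real hardware.

```python
import math
import random

def rms_error_from_jitter(freq_hz, sample_rate_hz, jitter_rms_s,
                          num_samples=10000, seed=0):
    """RMS difference between ideally timed samples and jittered ones."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    total = 0.0
    for n in range(num_samples):
        ideal_t = n / sample_rate_hz
        actual_t = ideal_t + rng.gauss(0.0, jitter_rms_s)  # random timing error
        ideal = math.sin(2 * math.pi * freq_hz * ideal_t)
        actual = math.sin(2 * math.pi * freq_hz * actual_t)
        total += (actual - ideal) ** 2
    return math.sqrt(total / num_samples)

# Assumed example: 10 kHz tone at 48 kHz with 1 ns and 10 ns of timing error.
small = rms_error_from_jitter(10000, 48000, 1e-9)
large = rms_error_from_jitter(10000, 48000, 10e-9)
print(f"1 ns jitter: {small:.2e}   10 ns jitter: {large:.2e}")
```

The error scales with the amount of timing wobble and stays very small at these assumed values, which fits the advice above that you probably don't need to lose sleep over it.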
Knitting (cable management)
K is for knitting, which is a contrived way of talking about cable management. Not really a problem with small interfaces, but something to think about if you have a lot of things to plug in. Do you want all your sockets at the back of a rack unit, or would they be more useful on the front? Are you OK with having mic and line-level sources sharing a socket, or would you be better off with separate jacks and XLRs? If you need a lot of inputs and outputs, would it be neater to have them on a D-Sub connector?
Latency
L is for latency. Which ought to begin with A, because it’s super important. When we record sound it has to go into the interface, through your computer and back out the other end, and that takes time. That delay is called latency and if it’s long enough that you can hear it, it’s a problem. There are two basic ways to deal with latency. One is to write really good driver software, which is difficult. The other is to create a separate monitor path that doesn’t go through the computer, which is complicated. And doesn’t work for soft synths. So do some research and find out in advance what the low-latency performance of your chosen interface is like.
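A large part of that delay comes from the buffer size you choose in your DAW, and the arithmetic is simple enough to sketch. The buffer sizes and the 48 kHz sample rate below are just common example settings, and `buffer_latency_ms` is a hypothetical helper.

```python
def buffer_latency_ms(buffer_size_samples, sample_rate_hz):
    """Time taken to fill one buffer, in milliseconds."""
    return buffer_size_samples / sample_rate_hz * 1000

# Assumed example settings at a 48 kHz sample rate.
for size in (32, 64, 128, 256, 512):
    one_way = buffer_latency_ms(size, 48000)
    print(f"{size:>4} samples: {one_way:.2f} ms per buffer")
```

This is only the delay from a single buffer. A full round trip involves at least an input and an output buffer, plus the converters and whatever the driver adds on top, so measured figures will be higher than these.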
Monitor control / Metering
M is for monitor control, or what used to be called the master section. Most people who have an audio interface don’t also have a separate hardware mixer, because the audio interface has all the features they need. Except, does it allow you to switch speakers? Does it have mono and dim buttons? Does it have talkback? Can it handle surround sound? Sometimes yes, sometimes no, always worth thinking about.
Noise
N is for noise, which can be an important consideration if you’re recording quiet things with microphones. In that case you’re looking for plenty of input gain on your mic preamps and a low Equivalent Input Noise level.
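To put Equivalent Input Noise figures in context, it helps to know that the source itself generates thermal noise, and no preamp can do better than that. This sketch calculates that floor using the standard thermal noise formula; the 150 ohm source, 20 kHz bandwidth and 20 degrees Celsius temperature are assumed conditions, so check which ones a given spec sheet actually uses.

```python
import math

BOLTZMANN = 1.380649e-23       # Boltzmann constant in joules per kelvin
DBU_REFERENCE_VOLTS = 0.7746   # 0 dBu expressed in volts RMS

def thermal_noise_dbu(resistance_ohms, bandwidth_hz=20000, temperature_k=293.15):
    """Thermal noise of a resistor, expressed in dBu."""
    volts = math.sqrt(4 * BOLTZMANN * temperature_k * resistance_ohms * bandwidth_hz)
    return 20 * math.log10(volts / DBU_REFERENCE_VOLTS)

# Assumed example: a 150 ohm source over a 20 kHz bandwidth.
floor = thermal_noise_dbu(150)
print(f"Thermal noise floor: {floor:.1f} dBu")
```

The closer a preamp's quoted EIN gets to this figure under the same conditions, the less noise the preamp itself is adding.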
Optical
O is for optical. Digital audio is a big stream of numbers, and if we want to transmit those numbers from one device to another, we can do that in lots of different ways. One common way is to encode them as pulses of light on what’s called a TOSlink connector. This can either carry a stereo audio signal in the S/PDIF format or eight channels in the ADAT Lightpipe format.
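As a rough sense of scale, the sketch below compares how much audio data the two formats carry. It assumes 24-bit samples at 48 kHz and counts only the audio itself; the real formats add framing and other housekeeping bits, so the rates on the cable are higher.

```python
def payload_mbit_per_s(channels, bit_depth, sample_rate_hz):
    """Raw audio data rate in megabits per second, ignoring framing overhead."""
    return channels * bit_depth * sample_rate_hz / 1_000_000

# Assumed settings: 24-bit audio at 48 kHz.
spdif = payload_mbit_per_s(2, 24, 48000)   # stereo S/PDIF
adat = payload_mbit_per_s(8, 24, 48000)    # eight-channel ADAT Lightpipe
print(f"S/PDIF: {spdif:.3f} Mbit/s   ADAT: {adat:.3f} Mbit/s")
```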
PCIe
P is for PCIe, which is another way of expanding a computer. It’s good for things that need to shovel lots of data in and out very fast and with super low latency, like graphics cards and audio interfaces. So why don’t all audio interfaces use it? Two reasons. One, only tower and desktop computers actually have PCIe slots, and laptops don’t. Two, there’s not really space on a PCIe card for audio connectors, so you have to have a separate breakout box as well, and that makes it awkward and expensive.
Quarter-inch jack
Q is for quarter-inch jack. Pretty much any interface with analogue inputs and outputs will have some quarter-inch sockets. They’re used for connecting line-level sources such as synths. They’re used for plugging in guitars. And they’re used to connect loudspeakers and other gear to the outputs. And they’re a quarter of an inch wide.
Rackmount
R is for rackmount. For a long time professional audio equipment has used a standard format which is 19 inches wide and multiples of 1.75 inches high. Many audio interfaces still come in this format, but if you don’t have any other rackmount equipment you might wonder whether a different form factor is more convenient. Do you really want all the controls crammed onto a tiny rectangle at the front when they could be spread out over the top panel?
Specifications
S is for specifications, which is a big list of numbers that manufacturers will try to dazzle you with in the hope that you’ll buy their product. Some of these are important. Some of them are practically the same for all interfaces. Some really important things are almost never mentioned in the specifications. Ones that might be important include dynamic range, maximum input and output levels, preamp gain range, equivalent input noise and headphone amp level.
Thunderbolt
T is for Thunderbolt, which was created when Apple and Intel got together and said, hey, wouldn’t it be great if we could have a connector that was as fast and as powerful as PCIe but didn’t require you to open up your computer and fit a hulking great card. And then we could charge 50 quid for a cable. So they did, and especially on Macs, Thunderbolt is probably now the preferred way of hooking up an audio interface. And now the cables only cost 40 quid, so that’s progress.
USB
U is for Universal Serial Bus or USB, the most widely used and the most confusing connector on the planet. It’s widely used because it’s genuinely universal and that means it’s cheap, but compared with PCIe or Thunderbolt there are some significant shortcomings. It’s harder to get good low-latency performance from a USB interface and you can’t usually connect more than one of them to a single computer. The cables are low-cost, though.
Voltage control
V is for voltage, and specifically, control voltage (CV), which is relevant if you’re one of the millions of people who’ve got into Eurorack synths as a substitute for making music. Some audio interfaces can be used to control modular systems, and the key here is that they need to be able to generate a stable voltage that doesn’t change over time. That’s usually referred to as a DC-coupled output.
Word clock
W is for word clock. As I mentioned earlier, if you connect two devices digitally they need to have a common timing reference. Sometimes that can be embedded in the signal, but in more complicated systems we need a single clock source that can be distributed to multiple devices. That’s what word clock is, and it’s usually supplied on something called a BNC connector.
XLR
X is for XLR connector. Universally used for microphone inputs. Sometimes also used for line-level audio. Also used for a digital format called AES3 and occasionally for supplying power. The L stands for latching, which means you can’t pull it out accidentally, and the R stands for rubber, which makes it flexible. The X doesn’t stand for anything at all but it’s quite useful when you’re compiling an alphabet.
You
Y is for you and what you want. Unless you’re buying something totally bargain-basement, modern audio interfaces are generally all pretty good. From a sonic point of view they are unlikely to be the weak link in the chain. But if you buy one that doesn’t have the features you need, or doesn’t have very good driver support, or it has complicated control panel software that you can’t get your head round, those things are going to be major obstacles to your music-making. So think carefully about what’s actually important to you, and be aware that it’s often the things that aren’t listed in the specs that make the most difference.
Z impedance
Z is for impedance. That’s technically true actually and not at all a contrived way of getting something beginning with Z into this alphabet. Many interfaces offer high-impedance or high-Z inputs, which are designed so you can directly plug in an electric guitar and have it not sound like crap. Except sometimes it still sounds like crap. Which may be because the person playing the guitar is me, but it’s also the case that not all of these high-impedance inputs are very well implemented. Ideally the impedance should be at least one megohm, which is one million ohms. That compares with a mic input, which would typically be two or three thousand ohms.
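Why does the input impedance matter so much? The source and the input form a voltage divider, and a simple calculation shows how much level is lost. This is a simplified sketch: the 10 kilohm source impedance is an assumed round number, and a real guitar pickup's impedance varies with frequency, so the tonal effect is more complicated than a single figure suggests.

```python
import math

def loading_loss_db(source_impedance_ohms, input_impedance_ohms):
    """Level lost when a source drives an input, treated as a voltage divider."""
    ratio = input_impedance_ohms / (source_impedance_ohms + input_impedance_ohms)
    return 20 * math.log10(ratio)

# Assumed example: a 10 kilohm source into a high-Z input and a mic-style input.
into_high_z = loading_loss_db(10_000, 1_000_000)
into_mic_input = loading_loss_db(10_000, 2_500)
print(f"Into 1 megohm: {into_high_z:.2f} dB   Into 2.5 kilohms: {into_mic_input:.2f} dB")
```

With these assumed numbers the high-Z input loses a fraction of a decibel, while the mic-style input throws away most of the signal.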
Conclusion
So that’s the A-Z of Audio Interfaces. I hope you found it helpful, and if you want to learn more, browse our Glossary of technical terms and check out our How To Choose An Audio Interface guide and Choosing A Budget Audio Interface. It's also a good idea to browse through Sound On Sound's past reviews of audio interfaces.