The Intona Asynchronous Sample Rate Converter solution provides rate conversion for any number of uncompressed PCM audio channels as an Intellectual Property (IP) core. The design is FPGA-verified and provided in human-readable Verilog HDL. The solution combines low latency and low logic resource usage with professional-grade audio quality.
The design consists of a polyphase FIR filter that feeds the subsequent convolution process, where the actual resampling happens, with the desired intermediate values. The polyphase is selected out of 4096 predefined coefficients using cubic interpolation with 28 bits of fractional precision. The number of effectively applied taps ranges from 8 to 66, depending on ratio and output sample rate. A latency counter is provided within the simulation. Because of the uniform FIFO and interpolator interface, there is no conceptual restriction on channel count.
Any arbitrary, synchronous or asynchronous upsampling or downsampling of 24-bit data in the range of 30 to 230 kHz is supported. The resulting THD+N within the audio band is typically better than -135 dB. The channel count is always static and is not reduced in double or quad mode. A high-precision ratio detector for asynchronous deployment is included.
The core is fully pipelined and designed to be as economical as possible regarding logic, RAM and multiplier use. Resource usage can be further optimized by using a fixed system sample rate and by omitting the quad speed mode.
Any FPGA that is capable of running the core logic at the desired clock frequency.
Although this core does not require dedicated hardware building blocks such as multipliers or RAMs, it is strongly recommended to make use of pipelined multipliers to achieve optimum speed vs. area.
For the simulation, Verilator and a C++ compiler on macOS or Linux are required. WAV files or VCD logic files can be generated for inspection and verification.
The core uses 32x32-to-36 bit multipliers as defined in asrc_mult32.v. This original Verilog version is primarily used for simulation. In hardware, it is recommended to use one of the pre-generated cores. Cores are provided for Spartan 6 and 7-series (all models). For other FPGA models, the designer may generate them using the FPGA vendor tools.
Usually, the coefficient core will utilize 4xMULT18/DSP48. The convolution kernel uses 4xMULT18/DSP48 per 32 channels.
The coefficient data ROM and the FIFO buffers are described in pure Verilog HDL and will be instantiated automatically as dedicated RAM blocks by the synthesis process.
This IP core solution is provided under the terms of the Intona IP Core License Agreement. For full access to all HDL sources for core functionalities in simulation and in hardware you must purchase a license for the core. Evaluation licenses are available in the form of binary modules with hardware timeout. Contact Intona for information about pricing and availability.
Ordering code of this core is IN8083IP.
In a typical application, the sample rate converter consists of three parts.
asrc_fifo_parallel accepts pipelined parallel PCM input
asrc_fifo_tdm_4x8 accepts serial TDM data with four strides, up to eight channels each
asrc_core computes the right tap value for the convolution at the right time. The core also determines the ratio between input and output when set to asynchronous mode. One core is needed per direction. One core serves any number of I/O channels.
asrc_convolute is the convolution kernel. Each kernel applies the actual resampling to the PCM data held in the respective input FIFO for up to 32 channels. The output is always random access parallel PCM and may be double buffered. Several examples are provided for converting the results to TDM or massively parallel data.
For bidirectional resampling, the dedicated core asrc_core_bidir is provided. It instantiates two cores that share one set of coefficients as dual-ported ROM. Compared to using two unidirectional cores in parallel, this saves LUT resources and 8 to 12 kBytes of block RAM.
It is recommended to use the sources directly by copying or sym-linking the contents of the rtl directory to your FPGA hdl sources directory. The provided examples demonstrate how to use the core with a variety of application scenarios.
If you prefer using pre-synthesized netlists, some helper scripts are provided (ISE only). Select a target in the target.cfg file and run ./make_netlist.sh in the top directory of the package. An NGC file will be generated.
If the parameter MANUAL_RATIO is set to 0, the ratio between input and output word clocks is determined by the core. The internal moving average detector measures the time between two word clock events with 26 bits of precision. Its time constant is about one second. The detector also calculates the reciprocal that is needed to scale the output amplitude during downsampling. The direction signal swaps the averaged value with its reciprocal, which allows resampling in the upstream direction.
Connect the source of the foreign word clock to the
Clock Domain Crossing
Asynchronous inputs need special attention. To avoid metastability, you should always re-register them to the local high speed clock with a pipeline of at least two registers.
The asrc_core features the 8-bit output iodiv_out[7:0], which is a copy of iodiv[30:23]. You may use this as an indicator in your application if you need to know at which ratio the resampler is currently working.
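The bit slice above can be turned into an approximate ratio on the host side. The following is a minimal sketch, assuming that the internal iodiv word uses the same 4.28 fixed point layout as iodiv_manual (the text does not state this explicitly). The function names are hypothetical and not part of the core.

```python
# Assumption: iodiv is an unsigned 4.28 fixed point word (28 fractional bits),
# like iodiv_manual. Bits [30:23] then carry 3 integer and 5 fractional bits.
IODIV_FRACTION_BITS = 28
IODIV_OUT_LSB = 23  # iodiv_out[7:0] is a copy of iodiv[30:23]


def iodiv_out_from_iodiv(iodiv: int) -> int:
    """Model the 8-bit indicator as the bit slice iodiv[30:23]."""
    return (iodiv >> IODIV_OUT_LSB) & 0xFF


def approx_ratio(iodiv_out: int) -> float:
    """Convert the 8-bit indicator to an approximate Fsin/Fsout ratio."""
    return iodiv_out / (1 << (IODIV_FRACTION_BITS - IODIV_OUT_LSB))


if __name__ == "__main__":
    word = 0x20000000  # 2.0 in 4.28 fixed point
    indicator = iodiv_out_from_iodiv(word)
    print(indicator, approx_ratio(indicator))
```

Under this assumption the indicator resolves the ratio in steps of 1/32, which is enough for a coarse status display but not for deriving exact sample rates.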
If MANUAL_RATIO is set to 1, the core does not derive the ratio from the input word clock. A valid ratio and its reciprocal must be given in 4.28 fixed point format to iodiv_manual and iodiv_r_manual, respectively. They may be changed arbitrarily at runtime without resetting the core.
async_lrck (the input pin for the asynchronous word clock) is ignored in synchronous mode. The core simply consumes the samples that are fed into the internal FIFO by triggering new_frame synchronously. Just like in asynchronous mode, there is no requirement on the phase relationship between input and output word clock (other than being synchronous to the high frequency core clock).
iodiv_manual must represent the 4.28 fixed point value of Fsin divided by Fsout, and iodiv_r_manual its reciprocal.
|Speed Mode In||Speed Mode Out||sys_speedmode[1:0]||iodiv_manual[31:0]||iodiv_r_manual[31:0]|
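The two manual ratio words can be computed offline from the sample rates. The sketch below assumes an unsigned 32-bit word with 4 integer and 28 fractional bits and round-to-nearest conversion; the rounding rule and the helper names are assumptions, not taken from the core sources.

```python
from fractions import Fraction

Q_FRACTION_BITS = 28  # 4.28 fixed point: 4 integer bits, 28 fractional bits


def to_q4_28(value: Fraction) -> int:
    """Convert a positive ratio to an unsigned 4.28 word (assumed round-to-nearest)."""
    fixed = round(value * (1 << Q_FRACTION_BITS))
    if not 0 < fixed < (1 << 32):
        raise ValueError("ratio does not fit into 4.28 fixed point")
    return fixed


def manual_ratio_words(fs_in: int, fs_out: int) -> tuple[int, int]:
    """Return (iodiv_manual, iodiv_r_manual) for the given sample rates in Hz."""
    ratio = Fraction(fs_in, fs_out)  # Fsin divided by Fsout
    return to_q4_28(ratio), to_q4_28(1 / ratio)


if __name__ == "__main__":
    for fs_in, fs_out in ((96000, 48000), (48000, 48000), (44100, 48000)):
        iodiv, iodiv_r = manual_ratio_words(fs_in, fs_out)
        print(f"{fs_in} -> {fs_out}: iodiv_manual=32'h{iodiv:08X} iodiv_r_manual=32'h{iodiv_r:08X}")
```

Using exact fractions avoids accumulating floating point error before the final rounding step, which matters for ratios such as 44100/48000 that are not exactly representable in binary.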
The core is partly reset while the reset input signal is high. You may tie this to your internal reset logic. This pin is optional. Tie it to 1'b0 if not used.
In general, there is no known lockup situation that would require resetting the core. If it receives invalid signals, it will output invalid values. Once the input signals become valid again, the core resumes with valid output. Use the asrc_health module to mute or re-route your outputs if invalid signals are unacceptable.
The asrc_health module checks some internal states, and its "good" output goes high if the internal FIFOs are neither full nor empty. It can be used to control other circuitry in the design, for example to trigger mute events.
The core is designed and tested to be clocked at 122.88 MHz for 48/96/192k or at 112.896 MHz for 44.1/88.2/176.4k sample rates. It expects the local frame sync to occur every fclk / fsamplerate core clock ticks, which depends on the sample speed mode, as shown in the following table:
|Sample Speed||Core Clock Ticks per Output Frame|
The core triggers at rising edge of the target frame sync signal (
In synchronous mode (MANUAL_RATIO=1) any other core clock can be accepted by the core, as long as the "ticks per output frame" value stated in the table above is attainable. For example, if your system is clocked at 130 MHz, this will work perfectly fine.
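The relation fclk / fsamplerate can be checked with a few lines of arithmetic. This sketch assumes a 48 kHz family clock of 122.88 MHz (2560 x 48 kHz, matching the 112.896 MHz = 2560 x 44.1 kHz clock of the other family). The second helper encodes one possible reading of "attainable", namely that the clock provides at least the required number of ticks per frame; that reading and both function names are assumptions.

```python
def ticks_per_output_frame(core_clock_hz: int, fs_out_hz: int) -> float:
    """Core clock ticks between two output frame syncs (fclk / fsamplerate)."""
    return core_clock_hz / fs_out_hz


def provides_enough_ticks(core_clock_hz: int, fs_out_hz: int, required_ticks: int) -> bool:
    """Assumed interpretation: the clock must offer at least required_ticks per frame."""
    return ticks_per_output_frame(core_clock_hz, fs_out_hz) >= required_ticks


if __name__ == "__main__":
    for clock, rates in ((122_880_000, (48_000, 96_000, 192_000)),
                         (112_896_000, (44_100, 88_200, 176_400))):
        for fs in rates:
            print(clock, fs, ticks_per_output_frame(clock, fs))
    # 130 MHz system clock at 48 kHz, compared against the single speed tick count
    print(provides_enough_ticks(130_000_000, 48_000, 2560))
```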
This core is designed to accept quad speed sample rates when the parameter QUAD_AVAIL is set to 1. Setting it to 0 will save 4096 bytes of occupied RAM, and the highest acceptable sample rate will drop to about 113 kHz.
There is no hysteresis in changing internal modes when changing the sample rate arbitrarily. In particular, the edges between double and quad modes should be avoided. It is recommended to use the core within the following sample rate ranges:
Fsin 30..113 kHz and 115..230 kHz.
The ratio between input and output samplerate must not be larger than 4.999.
Downsampling from e.g. 192 kHz to 48 kHz is challenging because it would require 128 taps per Fsout, which is beyond the maximum number of taps possible in this design. It is common to skip the first half of the coefficients in this case, effectively scaling it down to 64 taps. However, just skipping does not deliver sufficient alias image rejection and does not satisfy the standards of professional audio equipment. Hence, a second coefficient set is available, which was optimized for 64 taps at quad downsampling rates.
The maximum number of possible taps shrinks because the core algorithm needs 32 clock cycles to fetch and interpolate a polyphase tap. At 192 kHz, this would require a core clock of 245.76 MHz. Because this clock is not achievable with today's budget FPGAs, this mode is implemented to use the second coefficient set, which is half the size of the original.
The simulation has a single-shot peak detector implemented. The Fs time of the first positive peak of each input and output will be saved in a variable and the result is printed to the console when the simulation is done. This could also be implemented as zero-crossing detector but that technique suffers from false-positives when possible pre-ringing occurs, so peak detection is preferred.
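The idea of the single-shot peak detector can be illustrated outside of the simulation. The following is a simplified sketch and not the code used in the provided C++ test bench: it finds the first positive local maximum of an input and an output sample list and converts the two positions to time using the respective sample rate. All names are hypothetical.

```python
def first_positive_peak(samples):
    """Return the index of the first positive local maximum, or None if there is none."""
    for n in range(1, len(samples) - 1):
        if samples[n] > 0 and samples[n - 1] < samples[n] >= samples[n + 1]:
            return n
    return None


def latency_seconds(inp, fs_in, out, fs_out):
    """Time between the first positive peak of the input and that of the output."""
    n_in = first_positive_peak(inp)
    n_out = first_positive_peak(out)
    if n_in is None or n_out is None:
        return None
    return n_out / fs_out - n_in / fs_in


if __name__ == "__main__":
    stimulus = [0.0, 0.0, 1.0, 0.0, 0.0]
    # Output with small negative pre-ringing before the main peak
    response = [0.0, 0.0, -0.05, 0.0, 0.9, 0.1, 0.0]
    print(latency_seconds(stimulus, 48_000, response, 48_000))
```

Because only positive local maxima qualify, negative pre-ringing ahead of the main peak is ignored in this sketch, which mirrors the reason given above for preferring peak detection over zero-crossing detection.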
See Quad Mode.
For other individual signals, see the source files for further explanation of the individual ports.
Parallel input words are fed into the resampler at the rising edge of new_word. After 2^CH_BITS words have been fed, a new_frame pulse must follow to mark the end of the frame.
The double buffered output can be read through the RAM interface with d_out_ch as address and d_out as data.
Four lanes of eight channels, MSB first, with one-early frame sync. Also available as
asrc_system_tdm_4x2, which handles four lanes of two channels.
Maximum BCK frequency for the TDM modules is 24.576 MHz.
Same as unidirectional, but with duplicated ports for additional resampling in the upstream direction. This makes use of the bidirectional asrc_core_bidir, which shares the coefficient ROM between the two resamplers.
No less than 32 parallel inputs and outputs.
Simulation is done using the high performance open source Verilog simulator Verilator, which effectively converts Verilog to C++. The output compiles to a native binary, which can be run on a PC.
Simulation presumes a Linux (or other Unix, e.g. Mac) command line terminal. On Windows, this may work using WSL (Windows Subsystem for Linux).
Find dependencies and installation instructions of the Verilator simulation suite on this web site. It is recommended to build from Git.
On Debian-flavoured systems, the installation of Verilator including dependencies is simple:
sudo apt-get install verilator
The system top module, written in C++, generates a stimulus, which is one of static sine tone, swept sine tone or Dirac impulse, and outputs a mono WAV file with the simulated result. The static sine tone is used for THD+N calculation. The swept sine can be used to show aliasing images using sndfile-spectrogram (which is part of the libsndfile sndfile-tools). The Dirac stimulus creates an impulse response that can be used to inspect the frequency response by using deconvolution.
Verilator uses a much faster simulation technique than classical simulators, such as Icarus Verilog. You can expect to simulate five seconds of signal in ten to thirty seconds on a decent machine. Classic simulators would need several hours or even days for the same task.
For signal inspection using GTKWave or the like, Verilator can output simulation data in VCD file format. You need to set WRITE_TRACE to 1 at the top of the corresponding C++ file.
Example session to observe the signals using GTKWave:
This example takes the synthesizable asrc_system_parallel_32ch.v found in the examples directory. It creates a signal and writes the resampled PCM to standard WAV files. The simulation stimulus is created in asrc_system_parallel_32ch.cpp. The shell script asrc_system_parallel_32ch.sh helps with building and running the simulation.
Run the simulation on the console with
./asrc_system_parallel_32ch.sh <what> <inrate> <outrate>
<what> is: 0=IR (Dirac), 1=sine, 2=sweep
The example outputs 32 WAV files, following a special naming convention. Watch the console output.
Simulation output file name convention:
asrc_sim-<Fs in>-<Fs out>-<what>.wav
Inspect the WAV files with the tools of your choice.
Exemplary THD+N measured at 0 dBFS with a 1 kHz sine, bandwidth 22 Hz to 22 kHz.
Fsout and Fsin are completely separate, asynchronous systems.
|Fsout Hz||Fsin Hz||THD+N dB|
Typical frequency response is +/-0dB at 0-18kHz and +0/-1.0dB at 0-20kHz.
Gain is about -0.05dB. This has not been set to 0 dB because of potential rounding errors of the scaling gain when upsampling. The scaling gain is calculated by the reciprocal of the given iodiv value (which is Fsin divided by Fsout).
The response is optimized for the shortest possible group delay. Having -1 dB at 20 kHz might be considered a weakness by datasheet purists, but this decision was the key to reaching the shortest group delay while maintaining excellent aliasing rejection within the audio band.
Illustrated frequency response represents double to single speed conversion with ratio=0.5. The actual frequency response may vary over different ratios.
The phase response is linear. Hence, regardless of the frequency, the absolute latency always corresponds to the group delay.
The delay is subject to change by ±1 sample (Fsout) because of some residual FIFO alignment uncertainty inherent to asynchronous systems (but it won't jitter).
The latency can be further reduced by 2-3 samples by lowering FIFO_COUNT_MIN in asrc_convolute.v if limiting the maximum ratio is acceptable.
Based on the example using TDM I/O, one direction, with QUAD_AVAIL set to 0.
For reference, XC6SLX9 has 1430 Slices.
An evaluation license with hardware timeout is available for this core in form of a ready-made bitstream. The sources are included with the purchased license.
See instructions below.
Bit numbers depicted in white digits.
|Pin||Bit||Pmod JA||Pmod JB||Pmod JC||Pmod JD|
All signals are LVCMOS 3.3V.
Direction (input or output) as seen from board perspective.
|JA0||Input||Input TDM Data Stride A (Channel 0..7 or 0..1)|
|JA1||Input||Input TDM Data Stride B (Channel 8..15 or 2..3)|
|JA2||Input||Input TDM Data Stride C (Channel 16..23 or 4..5)|
|JA3||Input||Input TDM Data Stride D (Channel 24..31 or 6..7)|
|JA6||Input||TDM Bitclock for inputs|
|JA7||Input||TDM FS for inputs|
|JC2||Output||Master Clock Derived divided by 512|
|JC3||Input||Asynchronous FS input (e.g. tie to JA7 or JD7 depending on the "direction" switch)|
|JD0||Output||Resampled TDM Data Stride A (Channel 0..7 or 0..1)|
|JD1||Output||Resampled TDM Data Stride B (Channel 8..15 or 2..3)|
|JD2||Output||Resampled TDM Data Stride C (Channel 16..23 or 4..5)|
|JD3||Output||Resampled TDM Data Stride D (Channel 24..31 or 6..7)|
|JD4||Input||24.576 MHz Master Clock 50% duty (fed to PLL for internal high speed clock)|
|JD6||Input||TDM Bitclock for outputs|
|JD7||Input||TDM FS for outputs|
|LD4||on state of SW3, to verify switch knob direction|
|LD7||7 Hz blink|
When using the 32-channel example, you may run into signal integrity issues because of the high bit clock frequency. Keep cables short and ground connections low impedance.
Pre-made binaries for evaluation are included in the src/target/Arty_xxx/build directory.
|Arty_4x8||Four TDM I/O, eight channels each|
|Arty_4x2||Four TDM I/O, two channels each|
How to flash the demo binary to the board
When using the 32-channel example, quad speed mode output does not work because it would violate the maximum bit clock frequency of 24.576 MHz.
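The bit clock limit can be checked with simple arithmetic. The sketch below assumes 32-bit TDM slots per channel, which is a common choice for 24-bit audio but is not stated in this document; the function names are hypothetical as well.

```python
MAX_BCK_HZ = 24_576_000  # maximum BCK frequency for the TDM modules


def tdm_bck_hz(channels_per_lane: int, fs_hz: int, slot_bits: int = 32) -> int:
    """Bit clock needed for one TDM lane (slot_bits = 32 is an assumption)."""
    return channels_per_lane * slot_bits * fs_hz


def bck_within_limit(channels_per_lane: int, fs_hz: int, slot_bits: int = 32) -> bool:
    """True if the required bit clock does not exceed the TDM module limit."""
    return tdm_bck_hz(channels_per_lane, fs_hz, slot_bits) <= MAX_BCK_HZ


if __name__ == "__main__":
    for channels in (8, 2):
        for fs in (48_000, 96_000, 192_000):
            print(channels, fs, tdm_bck_hz(channels, fs), bck_within_limit(channels, fs))
```

Under this assumption, eight channels per lane reach exactly 24.576 MHz at 96 kHz and exceed the limit at 192 kHz, while two channels per lane stay below it at all three rates.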
This document uses the term "speed" as a reference to the original 44.1k or 48k sample rates. The following table clarifies the relationship.
|Sample Rate||Resulting "Speed"|
|30000 to 56500||Single|
|57000 to 113000||Double|
|114000 to 230000||Quad|
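The table above maps directly to a small lookup. This is a minimal sketch with hypothetical names; rates that fall into the gaps between the listed ranges return None rather than being assigned to a neighbouring mode.

```python
# (name, lowest rate in Hz, highest rate in Hz), taken from the table above
SPEED_RANGES = (
    ("Single", 30_000, 56_500),
    ("Double", 57_000, 113_000),
    ("Quad", 114_000, 230_000),
)


def speed_mode(fs_hz: int):
    """Return the "speed" name for a sample rate, or None if it is outside all ranges."""
    for name, lowest, highest in SPEED_RANGES:
        if lowest <= fs_hz <= highest:
            return name
    return None


if __name__ == "__main__":
    for fs in (44_100, 48_000, 88_200, 96_000, 113_500, 192_000):
        print(fs, speed_mode(fs))
```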
Document version: 97 / Nov 07, 2020 15:29