Error Detection and Control in Data Transfer

22. 11. 1998

Erno Alhoniemi
Ti
Helsinki University of Technology

eal@cc.hut.fi

Abstract

Today's data is transfered over reliable communications. This is achieved by using error detection and control. There are different communications media with different quality of service (QOS). Consequently, there are several error detection and control schemes for different signal conditions. Some fundamental methods have existed for decades. In addition, there are also new methods that can take advantage of the nature of the signal type. These methods can be used to ensure reliable or adequate communications and faster data transfer rates which are used in today's innovative applications.

1. Error sources

No errors can occur in the ideal transmission medium. However, none of the transmission media is ideal. The signal representing the data is always subject to various error sources. As the signal propagates along the transmission media its amplitude decreases. This phenomenon is called as the signal attenuation. The signal cannot be detected if it is too weak. In addition, as the length of the medium increases the waveform also changes during the transmission. This phenomenon is called as the delay distortion. The signal cannot be recognized if it is too distorted. Furthermore, the transmission media can also be a subject to interference resulting from other cables or signals caused by electromagnetic radiation. The medium itself may also cause constant white noise. All transmission errors increase as the length of the transmission medium inreases. [6]

2 Error bursts

In practice, data communications systems are designed so that the transmission errors are within acceptable rate. Under normal circumstanced there are only few errors. However, it is possible that the signal conditions can be sometimes so weak that sometimes the signal cannot be received at all. It is also possible that sometimes the interference signal is stronger than the signal to be transmitted. Consequently, the data sent during the break is lost. The contiguous blocks of data corrupted by the error signal are called error bursts. The length and frequency of the error bursts depend on the quality of the data link which in turn depends on the transmission medium and the signal conditions. Therefore, the error detection and control should be able to handle as many errors that possible. However, the applications limit which error detection and control schemes are suitable. All applications benefit from the effeciency of the sheme to be used. In addition, some applications may require short codeword length or low latency whereas some other applications might prefer extremere low error rate, for example.

3 Error detection

Error detection is a method that allows some sommunications erros to be detected. The data is encoded so that the encoded data contains additional redundant information about the data. The data is decoded so that the additional redundant information must match the original information. This allows some errors to be detected. Unfortunately, some error bursts may cause incorrectly received blocks which pass the error detection test. Therefore, good error detection schemes are designed so that this should occur as infrequently as possible.

3.1 Parity checking

Parity checking is a primitive character-based error detection method. The characters are encoded so that an additional bit is added to each character. The additional bit is to 0 or 1 according to the number of bits set in the character. The resulting number is either even or odd. There extra bit is set according to this result and according to which parity setting, either even or odd, is being used. If even parity is used, the extra bit is always set so that the codeword always contains an even number of  bits set. In even parity, the number of bits set in the codeword is always odd. The decoding is done simply by checking the codeword and removing the extra bit. The parity checking will only detect one bit error bursts in each codeword. Parity checking has been used in character-based terminals but it is not useful for today's reliable communications. However, it is being used in memory chips to ensure correct operation.

3.2 Block check

Block check is a block-based error detection method. The data is divided in blocks in the encoding process. An additional block check is added to each block of data. The check is calculated from the current block. The receiver also performs the same calculation on the block and compares the calculated result with the received result. If these checks are equal the blocks are likely to be valid. Unfortunately, the problem with all block checks is that the block check is shorter than than the block. Therefore, there are several different blocks that all have the same checksum. It is possible that the date is corrupted by a random error burst that modifies the block contents so that the block check in the corrupted frame also matches the corrupted data. In this case the error is not detected. Even best block checks cannot detect all error bursts but good block checks mimimize this propability. However, the reliability increases as the length of the block check increases.

3.2.1 Block check sum

Block check sum is a primitive block check sum that is the sum of all characters in the block. The result is a character that is equally long as the characters in the block. Therefore, the result is sometimes referred as the block check character (BCC). Unfortunately, even a long BCC may allow relatively simple errors. In other words, it is easy to find different blocks that generate the same block check sum. Calculating check sum is certainly fast and easy but the reliability of the check sum is not adequate for today's reliable communications. However, due to its speed it is used in some applications which require that the calculation is done by the software. [6]

3.2.2 Cyclic redundancy check

The Cyclic redundancy check (CRC) is an intelligent alternative for block check sum. It is calculated by dividing the bit string of the block by a generator polynomial. The value of the cyclic redundancy check is the reminder of the calculation which is one bit shorter than the generator polynomial. This value is also sometimes referred as the frame check sequence (FCS). However, the generator polynomial must be chosen carefully. CRC is a stronger check than the block check sum and it is being used in today's reliable communication. Calculating the CRC requires slighty more processing than the check sum. It can be easily implemented by using shift registers and xor-operations both in hardware and software implementations. The most common CRC is the CCITT CRC-16. The CRC is able to detect all single error bursts up to the number of bits in the CRC and most random errors [6]. Unfortunately, the error bursts in many applications are random and the CRC can only find only most of them. Therefore, a random error burst on the block might sometimes match both the original and the corrupted data. This is the case especially when CRC-16 is being used. Therefore, the number of bits in the checksum should be as high as possible or at least high enough. CRC-32 is enough for many todays applications. It has been tested that as the number of bits in the CRC increases the propability that two different blocks with the same CRC are encountered during the data transmission approaches zero [4]. The test results show that the even CRC-48 can be enough to ensure than no two different blocks having the same CRC can be found in practice [4].

4 Error Control

Error control is a method that can be used to recover the corrupted data whenever possible. There are two basic types of  error control which are backward error control and forward error control. In backward error control, the data is encoded so that the encoded data contains additional redundant information which is used to detect the corrupted blocks of data that must be resent. On the contrary, in forward error control (FEQ), the data is encoded so that it contains enough redundant information to recover from some communications errors. [6]

4.1 Backward error control

Using backward error correction requires a two-way communiction channel. The sender divides the data in blocks encodes the data with redundan additional information that is used to detect communications errors. The receiver applies error detection and if the receiver detects errors in the incoming blocks it requests the sender to resend the block. This mechanism is also called as the automatic repeat request (ARQ). The ARQ can always repair any errors it can detect but it causes a variable delay on the data transfer. The basic types of ARQ are idle RQ and continuous RQ. The backward error control is used in many data tranfer protocols. In addition, in order to use data compression, the communications errors must always be corrected. [6]

4.1.1 Idle RQ

Idle RQ is a fundamental backward correction scheme used in many protocols. The data is transfered in packets by using error detection. The receiver checks the incoming packets and sends an acknowledgement (ACK) to the sender if the packet was valid. If the sender receives an acknowledgement in the specified time it sends the next packet to the receiver. Otherwise, the sender must resend the packet. Idle RQ is very simple but it is often too inefficient. It can only send data to one direction at a time. In addition, the delay on the data transfer may result in situation where only a small fraction of the capapacity of the communications link is used. [6]

4.1.2 Continuous RQ

Continuous RQ is an improvement over Idle RQ when there is a delay on the data transfer. It allows several packets to be sent continuously. Therefore, the sender must use packet numbering. The receiver also receives packets continuously and sends an acknowledgement containing a packet number after receiveing a valid packet. In case the sender cannot get an acknowledgement it starts resending packets in either of the two ways. In selective repeat, only each block that was corrupted is resent. Selective repeat is complex but it is useful when error are common. In go-back-n, once a corrupted block is detected, the transmission continues from the corrupted block and all blocks after the corrupted blocks are discarded. Go-back-n is less effective than selective repeat but it is also very simple and it is almost equally effective if the errors are infrequent. [6]

4.1 Forward error control

Using FEQ requires a one-directional channel only. The data is encoded to contain enough additional redundant information to receover from some communication errors. Unfortunately, the FEQ cannot recover from errors only when enough information has been succesfully received. There is no way to recover from errors when this is not the case. However, the FEQ operates continuously without any interrupts and it ensures constant delay on the data transfer which is useful for real-time applications. [6]

4.1.1 Hamming single-bit code

Hamming single-bit is a block code in which each block is separate from each other. The input block size can be made as small as necessary. The number of bit errors to be corrected can be specified by using a adding enough redundant information in the encoding process. The minimum number of bits that differ on all possible codewords is called the Hamming distance [8]. This yields that the error bursts shorter than the Hamming distance can be detected. Therefore, to detect communications errors, the Hamming distance of the line code must be longer than the length of the error bursts. A N-bit error requires an encoding with a Hamming distance of  N+1 for detection and 2*N+1 for recovery. [6]

4.1.2 Convolutional forward error correction

In block codes, each block is independent of other blocks. On the contrary, in the convolutional forward error correction, the encoded data depends on both the current data and the previous data. The convolutinal encoder contains a shift register that is shifted each time a new bit is added. The length of the shift register is called as the constraint length and it contains the memory of the encoder. Each new input bit is then encoded with each bit in the shift register by using modulo-2 adders. The decoding is more difficult than the encoding. The data is decoded by using the Viterbi algorithm that tries to find the best solution for the decoding. Of course, all errors cannot be corrected but the error rate can be decreased. The convolutional error correction has an advantage of using all previous correctly received bits for error correction. [1,6]

4.1.3 Golay forward error correction

Golay codes are block codes that allow short codewords. The perfect Golay code is an encoding that encodes 12 bits into 23 bits, denoted by (23, 12). It allows the correction of three or fewer single bit errors. The extended Golay code contains an additional parity bit which allows up to four errors to be detected. The resulting code is (24,12) which is also known as half-rate Golay code. The decoding can be performed by using either soft or hard decisions. The soft decisions provide better error correction but requiere more processing. Golay codes are useful in applications that require low latency and short codeword length. Therefore, Golay codes are used in real-time applications and radio communications. [2,5,7]

4.1.4 Reed-Solomon forward error correction with interleaving

The Reed-Solomong forward error correction with interleaving is a forward error correction scheme that is intended to be used with high-quality video communications. The encoding is performed by filling a two dimensional array of 128*47 octets, that is, 128 octets in each row containing 124 octets of data and 4 octets of redundant check data. The encoding is done by filling the buffer columnwise 47 octets at a time. After this has been repeated 124 times the buffer is full and it is encoded by writing each row at a time. This encoding allows two cells to be corrected or 4 cells to be reconstructed. Two interleave buffers are required because a single buffer can only be either read or written at a time. The decoder also needs two buffers for the same reason. The encoder writes a row at a time, then performs the possible recovery and reconstruction of defective cells and reads the array columnwise. Unfortunately, the encoding and decoding both cause an additional delay on the data transfer that is equal to the transmission time of a single buffer. This type of encoding does not completely repair all error but it ensures hight-quality throughput in real time. [3]

5 Conclusion

The conventional error detection and correction methods are still being used is conventional application. Today's services can be implemented by using conventional methods. However, new methods have been discovered to increase the data transfer rate on some applications. These methods can certainly take advantage of the nature of the signal in a way that could not be achieved by using conventional methods. These new techniques are necessary for tomorrow's multimedia applications especially as the amount of the IP-based and mobile network traffic increases.

6 References

[1] 4i2i Communications Ltd, 4i2i: Convolutional Coding, 15. 9. 1998 [ref. 19. 11. 1998]
    <http://www.4i2i.com/viterbi.htm>

[2] 4i2i Communications Ltd, 4i2i: Golay Code, 14. 9. 1998 [ref. 19. 11. 1998]
    <http://www.4i2i.com/golay.htm>

[3] Cellware Broadband, AAL1 Reed-Solomon FEC submodule specification, 21. 8. 1996 [ref. 19. 11. 1998]
    <http://www.cellware.de/boards/aal1fec.html>

[4] Dillon, Matt, CRC16 - CRC64 test results on 18.2M dataset, w/program source,  14. 11. 1997 [ref. 27. 9. 1998]
    <http://www.backplane.com/diablo/crc64.html>

[5] EDN Magazine, Communication-systems error simulation resolves trade-offs, 18. 6. 1998 [ref. 19. 11. 1998]
    <http://www.ednmag.com/reg/1998/061898/13df_03.cfm>

[6] Halsal, Fred, Data Communication, Computer Networks and Open Systems, Fourth Edition, Addison-Wesley Publishhing Company Inc., United States of America, 1996, 997 sivua, [ref. 27. 9. 1998]

[7] Kantronics Co., Inc., G-TOR, 2. 11. 1998, [ref. 19. 11. 1998]
    <http://kantronics.com/gtor.htm>

[8] Togneri, Roberto, Hamming Distance, 8. 5. 1995 [ref. 27. 9. 1998]
    <http://ciips.ee.uwa.edu.au/~roberto/units/itc314/help/node21.html>

7 Further information

Overview of the Global System for Mobile Communications
    Overview of GSM