Vortex86SX Fault Tolerance Features
One. About fault tolerance
Fault tolerance is an important guarantee of stable, reliable, and continuous operation for computer application systems. Many service industries must run 24 hours a day without interruption; an outage can cause serious economic and social harm to the organization concerned. Normal computer operation and data security are therefore particularly important, and the first and most fundamental concern is the security of the hardware and software themselves. Because the stability, reliability, and continuous operation of a business system depend on the PC platform it runs on, these properties have become key requirements, and fault-tolerant systems on the x86 architecture have become increasingly popular. The fault tolerance functions of the Vortex86SX (hereinafter VSX) are briefly described below with reference to the products of our company (Zhaoying Technology).
Two. Fault tolerance schematic
The picture above shows our company's fault tolerance schematic. For explanation we divide it into four areas: the arbitration area, the information exchange area, the data exchange area, and other areas.
1. Arbitration area:
This area determines the circumstances under which the Master transfers control to the Slave.
A. WDT1: watchdog 1 triggered
B. SYSTEM RESET: system restart
C. SOFTWARE CONTROL: software control
D. WDT0: watchdog 0 triggered
E. Ext system fail in: external system-fail trigger
F. MANUAL SWITCH A: manual system switch
G. INVALID CODE: illegal instruction
The Master hands control to the Slave when any of the seven conditions above occurs. Because the switchover takes as little as 1 microsecond, the system as a whole keeps running even if the Master board hangs.
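The arbitration rule above can be sketched as a simple "any enabled trigger fires" check. This is only an illustrative model, not the chip's actual logic: the `Trigger` names mirror the seven conditions in the list, while the `enabled` mask is an assumption based on the fact that some triggers can be switched on and off in the BIOS.

```python
from enum import Flag, auto
from functools import reduce
from operator import or_


class Trigger(Flag):
    """The seven switchover conditions from the list above (illustrative names)."""
    NONE = 0
    WDT1 = auto()
    SYSTEM_RESET = auto()
    SOFTWARE_CONTROL = auto()
    WDT0 = auto()
    EXT_SYSTEM_FAIL_IN = auto()
    MANUAL_SWITCH = auto()
    INVALID_CODE = auto()


# Union of every trigger; used when all conditions are enabled.
ALL_TRIGGERS = reduce(or_, Trigger, Trigger.NONE)


def should_switch(active: Trigger, enabled: Trigger = ALL_TRIGGERS) -> bool:
    """Return True if at least one enabled trigger is currently active."""
    return bool(active & enabled)


# Hypothetical setup: invalid-opcode fault tolerance turned off.
without_invalid_code = ALL_TRIGGERS ^ Trigger.INVALID_CODE
print(should_switch(Trigger.WDT0))                                # True
print(should_switch(Trigger.INVALID_CODE, without_invalid_code))  # False
```

Treating the conditions as a bit mask keeps the decision a single AND operation, which is the kind of check that hardware can evaluate quickly.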
2. Information exchange area:
The SRAM serves two purposes: temporary storage and a shared space for data exchange. Under control of the GPCS decoder, the Master can write data to its own 4K SRAM, and through GPCS0 it can also write to the Slave's 4K SRAM. This doubles the storage available to the Master from 4K to 8K, and it also makes it easier to compare the two boards and keep their information synchronized.
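The shared-SRAM arrangement can be modelled in a few lines. The sketch below is a software analogy under stated assumptions, not driver code: `Board`, `link`, and `in_sync` are hypothetical names, and the rule that only the Master may touch the other board's SRAM follows the test section of this document, which states that the Slave cannot access the Master's SRAM.

```python
SRAM_SIZE = 4 * 1024  # each board has a 4K SRAM, per the text


def _store(sram, offset, data):
    """Copy data into sram at offset, refusing writes outside the 4K space."""
    if offset < 0 or offset + len(data) > len(sram):
        raise ValueError("write falls outside the 4K SRAM")
    sram[offset:offset + len(data)] = data


class Board:
    def __init__(self, name, is_master):
        self.name = name
        self.is_master = is_master
        self.sram = bytearray(SRAM_SIZE)
        self.peer = None  # set by link()

    def write_own(self, offset, data):
        """Write to this board's own SRAM."""
        _store(self.sram, offset, data)

    def write_peer(self, offset, data):
        """Write to the other board's SRAM; only the Master is allowed to."""
        if not self.is_master:
            raise PermissionError("only the Master may access the peer's SRAM")
        _store(self.peer.sram, offset, data)


def link(a, b):
    """Connect two boards so each knows its peer."""
    a.peer, b.peer = b, a


def in_sync(a, b):
    """Compare the two SRAM images byte for byte."""
    return a.sram == b.sram


master, slave = Board("A", True), Board("B", False)
link(master, slave)
master.write_own(0, b"status")
master.write_peer(0, b"status")
print(in_sync(master, slave))  # True
```

Writing the same record to both SRAMs and then comparing the images is one simple way to express the synchronization check the paragraph describes.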
3. Data exchange area:
Through the high-speed serial port COM9, the SRAM data of the Master and Slave can be compared to keep the two boards synchronized, and general data exchange between the two systems is also possible.
4. Other areas:
UART1 to UART4 provide fault tolerance through serial port suppression. The same serial port on the two motherboards (Master and Slave) is connected to a single serial port of the device at the same time. The Master's serial port suppresses the Slave's, so the device receives only the Master's data. The serial port is therefore a controllable I/O.
GPIO PORT0 to GPIO PORT2 provide fault tolerance for the GPIO ports.
KBD/MS provides fault tolerance for the keyboard and mouse.
Three. Fault-tolerant hardware setup
1. Two boards with a VSX CPU at their core are required. They must be of the same type and specification, and each must have a PC104 bus.
2. Stack the two identical VSX boards through the PC104 connector.
3. Use the BoR bus to connect the fault-tolerant (Redundancy) interfaces of the two boards, as shown below.
This completes the hardware setup.
Four. Software settings
Press DEL to enter BIOS setup.
Select Chipset → SouthBridge Configuration, as shown.
The SouthBridge Configuration menu contains two groups of settings: GPCS configuration and Redundancy Control configuration.
First: GPCS settings
1. GPCS function: set to [Enabled].
2. GPCS0 Command: controls the data mapping type (memory mapping or I/O mapping). The default is memory mapping, 8-bit read/write.
3. GPCS0 start address: sets the start address of the data mapping. The default is [000c8000].
4. GPCS0 MASK: the compare bits, which determine how many address bits are significant.
Second: Redundancy Control Configuration
1. SRAM settings
a. Dual port 4k SRAM: set to [Enabled] to open the 4K SRAM space.
b. SRAM Command: controls the data mapping type (memory mapping or I/O mapping). The default is [Mem r/w 8 bit].
c. SRAM start Address: sets the start address of the data mapping. The default is [000D0000].
d. SRAM Compare Bit: determines how many address bits are significant. The default is [FFFFF000].
2. COM9 settings
a. SB serial port 9: selects the serial port address.
b. IRQ9: selects the COM9 interrupt number.
3. Watchdog 0 and 1: turn watchdog fault tolerance on or off.
4. Invalid opcode condition: turns the illegal-instruction fault tolerance function on or off.
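The start address and compare-bit settings above look like a conventional masked address decode. The sketch below works through the SRAM defaults under that assumption: an address selects the SRAM when it matches the start address on every bit set in the mask. The function names are hypothetical, and the masked-compare interpretation is inferred from the default values rather than stated in the text.

```python
SRAM_START = 0x000D0000  # default SRAM start Address from the BIOS
SRAM_MASK = 0xFFFFF000   # default SRAM Compare Bit from the BIOS


def in_window(address: int, start: int, mask: int) -> bool:
    """True if address matches start on every bit that is set in mask."""
    return (address & mask) == (start & mask)


def window_size(mask: int, address_bits: int = 32) -> int:
    """Number of addresses selected, assuming the mask covers only high bits."""
    full = (1 << address_bits) - 1
    return (~mask & full) + 1


print(window_size(SRAM_MASK))                        # 4096
print(in_window(0x000D0FFF, SRAM_START, SRAM_MASK))  # True
print(in_window(0x000D1000, SRAM_START, SRAM_MASK))  # False
```

Under this reading the default mask leaves the low 12 bits free, which gives exactly the 4K space the SRAM provides, from 000D0000 to 000D0FFF.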
Five. Explanation of fault tolerance concepts
1. Stacking: joining two VSX boards together through the PC104 interface.
2. BoR: BoR (bridge of redundancy) is a bus designed by our company for fault tolerance. Its two ends connect to the fault-tolerant (Redundancy) interfaces of the two motherboards (A and B).
A SYS                    | B SYS
1 (GND)                  | 1 (GND)
2 (SYSTEM-A Fail out)    | 3 (Ext-SYSTEM Fail in)
3 (Ext-SYSTEM Fail in)   | 2 (SYSTEM-A Fail out)
4 (GPCS0)                | 5 (SYS-GPCS-in)
5 (SYS-GPCS-in)          | 4 (GPCS0)
6 (TxD9)                 | 7 (RxD9)
7 (RxD9)                 | 6 (TxD9)
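The BoR pin table describes a crossover cable: each output on one board lands on the matching input of the other. As a rough illustration, the wiring can be written as a lookup table and checked for symmetry. The pin numbers and signal names come from the table; the helper names are hypothetical, and the sketch assumes both boards use the same pin numbering.

```python
# Pin on board A -> pin on board B, taken from the BoR table.
BOR_WIRING = {1: 1, 2: 3, 3: 2, 4: 5, 5: 4, 6: 7, 7: 6}

PIN_NAMES = {
    1: "GND",
    2: "SYSTEM-A Fail out",
    3: "Ext-SYSTEM Fail in",
    4: "GPCS0",
    5: "SYS-GPCS-in",
    6: "TxD9",
    7: "RxD9",
}


def is_symmetric(wiring):
    """True if following the cable from B back to A gives the same pairs."""
    return all(wiring.get(b) == a for a, b in wiring.items())


def describe(pin_a):
    """Human-readable description of one wire in the cable."""
    pin_b = BOR_WIRING[pin_a]
    return f"A pin {pin_a} ({PIN_NAMES[pin_a]}) -> B pin {pin_b} ({PIN_NAMES[pin_b]})"


print(is_symmetric(BOR_WIRING))  # True
print(describe(6))               # A pin 6 (TxD9) -> B pin 7 (RxD9)
```

The symmetry means the cable behaves the same whichever board is treated as A, which fits a design where either board may end up as the Master.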
3. COM9: Our company's boards also provide a COM9 port. Set its port address and interrupt number in the BIOS under Chipset → Southbridge Configuration → Redundancy Control Configuration. COM9 can be used to compare the information of the two systems and keep the Master and Slave boards synchronized. It can also serve as a high-speed serial port for transferring any information between the two boards.
As shown in the figure:
4. Controllable and uncontrollable I/O
Controllable I/O is I/O that is shared by both boards. The Master holds control of the I/O; the Slave only follows the Master and does not take part in control.
Commonly used controllable I/O: serial port, parallel port, GPIO, keyboard, and mouse.
Uncontrollable I/O: USB, Ethernet, LCD ...
Six. Fault Tolerance Test
1. Register 6DH test: reading the data at register address 6DH tells whether the system is the Master or the Slave. Software can read the 6DH value (00000010 indicates the Master) to tell the Master motherboard from the Slave.
As shown in the figure:
2. SRAM test: determine which board is Master and which is Slave by reading and writing SRAM addresses in debug mode. The Master system can access the SRAM space of the Slave system, and the access mode (read/write) can be set in the BIOS. The Slave system cannot access the Master system's SRAM, because the Master holds control and the Slave does not.
As shown in the figure:
Seven. Typical case analysis
[Figure: System 1 and System 2 linked by a BoR cable; labels: RS-232 (both systems), GPIO (both systems), CRT, ISA]
The figure above shows a ground station of an aviation department. When the host is powered on, two VSX-6154 motherboards start at the same time. Which board becomes Master and which becomes Slave is effectively random, decided by the order in which they come up. The Master holds control and suppresses the Slave's functions, so the display shows the Master's content while the Slave simply follows. The Master receives the information and sends it to the display. At the same time, its information is matched against the Slave's through the BoR bus to keep the two synchronized. If the Master restarts because of a fault, it hands control to the Slave at the moment of the crash. The Slave becomes the Master on receiving this signal and continues running. Having lost control, the original Master comes back after a restart or repair as the new Slave, and remains so until it has to take over the Master's work again.
Suppose the control room is monitoring aircraft and the commander is directing them to fly, take off, and land in an orderly way, when the Master suddenly restarts. The Slave immediately takes over the Master's work, collecting the radar information and showing it on the display. After its restart, the original Master becomes the Slave, reads the new Master's information through COM9, synchronizes with it, and stands ready to take over at any time. All of this happens without the commander at the console noticing. With only one motherboard, the consequences are easy to imagine.
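The role swap in this case can be traced with a small state model. It is a sketch of the sequence described above, not of the real firmware: `RedundantPair` and its methods are hypothetical, and choosing System 1 as the initial Master is arbitrary, since the text says the initial roles depend on startup order.

```python
class RedundantPair:
    """Tracks which of the two systems is Master and whether the Slave is in sync."""

    def __init__(self):
        self.master = "System 1"  # arbitrary choice for illustration
        self.slave = "System 2"
        self.slave_synced = True
        self.switch_count = 0

    def master_fails(self):
        """The Master hands over control; the failed board returns as the Slave."""
        self.master, self.slave = self.slave, self.master
        self.slave_synced = False  # the restarted board holds stale information
        self.switch_count += 1

    def resync(self):
        """The new Slave reads the Master's information (over COM9 in the text)."""
        self.slave_synced = True

    def display_source(self):
        """Only the Master drives the display."""
        return self.master


pair = RedundantPair()
pair.master_fails()
print(pair.display_source())  # System 2
pair.resync()
print(pair.slave, pair.slave_synced)  # System 1 True
```

After `resync()` the pair is back in its starting condition with the roles exchanged, so a second failure is handled in the same way.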
Eight. Summary
Vortex86SX fault tolerance is simple to set up and powerful. Its main features are:
Coordinated operation of the Master and Slave motherboards
Diagnoses six kinds of unpredictable system crash
Master/Slave switchover within 1 µs (10^-6 second)
Can suppress the ISA bus on the Slave motherboard
Programmable Slave motherboard I/O ports (set in the BIOS)
Dedicated high-speed serial port for data transfer between the Master and Slave motherboards
Master and Slave motherboards each provide 4KB of SRAM for data exchange and backup
Supports a system crash count