What:		/sys/devices/system/machinecheck/machinecheckX/tolerant
Contact:	Borislav Petkov <bp@suse.de>
Date:		Dec, 2021
Description:
		Unused and obsolete since the advent of recoverable machine
		checks (see the last sentence below), which have been present
		since 2010 (Nehalem).

		Original description:

		The entries appear for each CPU, but they are truly shared
		between all CPUs.

		Tolerance level. When a machine check exception occurs for an
		uncorrected machine check, the kernel can take different
		actions.

		Since machine check exceptions can happen at any time, it is
		sometimes risky for the kernel to kill a process because doing
		so defies normal kernel locking rules. The tolerance level
		configures how hard the kernel tries to recover, even at some
		risk of deadlock. Higher tolerant values offer potentially
		better uptime at the risk of a crash or even corruption
		(for tolerant >= 3).

		==  ===========================================================
		 0  always panic on uncorrected errors, log corrected errors
		 1  panic or SIGBUS on uncorrected errors, log corrected errors
		 2  SIGBUS or log uncorrected errors, log corrected errors
		 3  never panic or SIGBUS, log all errors (for testing only)
		==  ===========================================================

		Default: 1

		Note that this only makes a difference if the CPU allows
		recovery from a machine check exception. Current x86 CPUs
		generally do not.