| ... | @@ -27,44 +27,8 @@ Fault tolerance —both hardware and software— is achieved through some kind o |
... | @@ -27,44 +27,8 @@ Fault tolerance —both hardware and software— is achieved through some kind o |
|
|
|
|
|
|
|
The goal of the Reliability building block is to improve the reliability of the universAAL platform. It is therefore a vertical layer that cuts across all layers of universAAL, especially the Middleware. It addresses two major reliability challenges and aims to enhance system efficiency. The first action point is the creation of a framework that diagnoses system behaviour by detecting faults that occur during operation and deciding how to overcome them. The Diagnosis Framework will reuse the following existing Middleware components: Context Events, the Context Bus and the Situation Reasoner [https://github.com/universAAL/context/wiki| (see Context Group wikipages for more details)]. The Diagnosis Framework should not add to the operational load of the platform or interrupt other services. Because the Middleware uses message-based communication, the fault detection mechanisms also use message classification algorithms to categorize messages and distinguish all message types exchanged in the platform. The Diagnosis Framework uses a knowledge base of rules that describe the behaviour of the system and define possible solutions. This knowledge base has to be fed continuously with new knowledge and cases so that it can decide in a growing number of use cases. A Fault Injection Framework has been implemented to create demanding test scenarios for a number of nodes in an AAL space. After a test run ends, the resulting feedback file can be added to the knowledge base used by the Diagnosis Framework. In its final version, the Fault Injection Framework will be a bundle that is fully independent of the Middleware. This will also give universAAL administrators the ability to test the functionality of any uAAL space remotely.
The third bundle in the Reliability building block is the Time Triggered patch. It lets users of the universAAL platform use time-triggered communication in their uAAL spaces, where the underlying infrastructure already addresses many reliability aspects (e.g. global time synchronization, reliable communication of critical events in the system).
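
The rule-based knowledge base described above can be pictured with a minimal Python sketch. Everything in it is hypothetical and not part of the universAAL API: the `Symptom` fields, the `Rule` and `KnowledgeBase` classes, the symptom kinds and the action names are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Symptom:
    """Observed evidence of a failure (all field names are hypothetical)."""
    source: str        # node or component the symptom refers to
    kind: str          # e.g. "missing_message", "late_message"
    occurrences: int   # how often the symptom has been seen so far


@dataclass(frozen=True)
class Rule:
    """One knowledge-base entry: a condition and the action it recommends."""
    name: str
    condition: Callable[[Symptom], bool]
    action: str


class KnowledgeBase:
    def __init__(self) -> None:
        self._rules: List[Rule] = []

    def add_rule(self, rule: Rule) -> None:
        # Rules can be appended at run time, e.g. from fault injection feedback.
        self._rules.append(rule)

    def diagnose(self, symptom: Symptom) -> Optional[str]:
        # Return the action of the first matching rule, or None if none applies.
        for rule in self._rules:
            if rule.condition(symptom):
                return rule.action
        return None


# Example rules (assumed, not taken from the platform).
kb = KnowledgeBase()
kb.add_rule(Rule("persistent-loss",
                 lambda s: s.kind == "missing_message" and s.occurrences >= 3,
                 "restart_component"))
kb.add_rule(Rule("single-loss",
                 lambda s: s.kind == "missing_message",
                 "request_retransmission"))
```

Rules are checked in insertion order, so more specific rules should be added before general ones. A symptom that matches no rule returns `None`, which is the case where the knowledge base would need to be fed with new knowledge.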
|
|
|
|
|
|
|
|
|
|
== Requirements ==
|
|
|
|
|
|
|
|
|
|
''High-level requirements''
|
|
|
|
|
|
|
|
|
|
* '''RC9_R1''' ''Dependability:'' The universAAL architecture shall support the delivery of services that can justifiably be trusted, where the service is the intended behavior of the system. The system must be resilient with respect to unanticipated behavior from the environment or of subsystems (e.g., transient and permanent hardware faults, design faults).
|
|
|
|
|
|
|
|
|
|
''Technical requirements''
|
|
|
|
|
|
|
|
|
|
* '''RC9_TR1''' ''Modular Certification of Subsystems:'' It must be possible to certify different subsystems individually.
|
|
|
|
|
* '''RC9_TR2''' ''Design for Testability:'' Testability shall be supported by the architecture (design testing, system-integration testing, manufacturing testing and assembly testing).
|
|
|
|
|
* '''RC9_TR3''' ''Correctness-by-Construction:'' Provably correct design methods shall be supported by the architecture with which a specification is transformed step by step into a correct design.
|
|
|
|
|
* '''RC9_TR4''' ''Delay/Disruption-tolerant Networking:'' Communication services that tolerate delays/disruption shall be provided by the architecture.
|
|
|
|
|
* '''RC9_TR5''' ''Communication Resource Guarantees:'' For messages that are exchanged within a certain subsystem, guarantees of the lower bound on the communication bandwidth, upper bounds on the latency and jitter shall be determinable.
|
|
|
|
|
* '''RC9_TR6''' ''Unreliable Components:'' The architecture must be capable of tolerating the failure of individual devices and interconnects.
|
|
|
|
|
* '''RC9_TR7''' ''Fault Hypothesis:'' Assumptions shall be identified that define the type and frequency of faults that the system has to be able to tolerate.
|
|
|
|
|
* '''RC9_TR8''' ''Error-Containment:'' The architecture must support the establishment of error containment regions, where errors can be detected with defined error-containment coverage.
|
|
|
|
|
* '''RC9_TR9''' ''Minimum of two Fault-Containment Regions:'' In case the occurrence of arbitrary (byzantine) failures within one fault containment region cannot be eliminated, an error containment region must be built of at least two fault containment regions.
|
|
|
|
|
* '''RC9_TR10''' ''Consistent membership Service:'' A membership service shall exist within the architecture that consistently provides sub-systems with the health state of other subsystems.
|
|
|
|
|
* '''RC9_TR11''' ''Generic Fault-tolerance Layer:'' A common API shall transparently mask fault-tolerance mechanisms of the environment to the application.
|
|
|
|
|
* '''RC9_TR12''' ''Tolerance of Software Errors:'' Protection mechanisms within the architecture shall be able to handle software errors.
|
|
|
|
|
* '''RC9_TR13''' ''Bounded Start-up and Restart Time:'' A known, bounded and minimal start-up time of system components has to be assured by the architecture.
|
|
|
|
|
* '''RC9_TR14''' ''Fault Classification:'' Error-detection mechanisms provided by the architecture have to distinguish between transient and permanent faults.
|
|
|
|
|
* '''RC9_TR15''' ''Pre-emptive Resource Allocation:'' The architecture must ensure that individual subsystems cannot dominate/block shared communication resources.
|
|
|
|
|
* '''RC9_TR16''' ''Worst Case Execution Time Analysis:'' The calculation of the worst-case execution time (WCET) of software modules with feasible effort shall be supported by the architecture.
|
|
|
|
|
* '''RC9_TR17''' ''Mixed-Criticality Subsystems:'' It shall be possible to use subsystems with different levels of criticality within one system.
|
|
|
|
|
* '''RC9_TR18''' ''Diagnostic Service:'' Identification of faulty subsystems for maintenance should be supported by the architecture. The diagnostic service needs a holistic view on the system, so that correlated failures and anomalies can be detected.
|
|
|
|
|
* '''RC9_TR19''' ''No Probe Effect:'' There must be no interference from the diagnostic service on the subsystems that are diagnosed.
|
|
|
|
|
* '''RC9_TR20''' ''Systematic Diagnostic Methods:'' The detection of application-independent failure modes (e.g., communication errors) should be supported by providing systematic diagnostic methods.
|
|
|
|
|
* '''RC9_TR21''' ''Application-specific Diagnostic Methods:'' Diagnostic services should be configurable to enable the detection of application-specific failures.
|
|
|
|
|
* '''RC9_TR22''' ''State Enforcement:'' It shall be possible to set the history state of a subsystem.
|
|
|
|
|
* '''RC9_TR23''' ''Different Levels of Reliability:'' The architecture shall provide different levels of reliability of the communication service.
|
|
|
|
|
* '''RC9_TR24''' ''Handling of Changing Reliability:'' Fault tolerance mechanisms shall be capable of adapting to changed reliability of subsystems over lifetime.
|
|
|
|
|
* '''RC9_TR25''' ''Replication:'' Replicas and voting mechanisms (e.g., triple-modular redundancy) shall be provided for error detection and error masking.
|
|
|
|
|
* '''RC9_TR26''' ''Replica Determinism:'' For replicated components, replica determinism has to be assured (i.e., replicated components are in the same state and produce the same output within a defined interval of time).
|
|
|
|
|
|
|
|
|
|
==Artefact #1 : Failure Diagnosis Module in universAAL==
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
==== Blackbox Description ====
|
|
|
|
|
|
|
|
|
|
Fault diagnosis is the process of determining the type, size and location of the most probable fault, together with its temporal specification. Diagnosis is the reasoning process for the detection, isolation, analysis and recovery of occurring faults. A symptom is the subjective evidence of a failure that indicates the existence of a fault.
|
|
|
| ... | @@ -94,24 +58,6 @@ The main components for the diagnosis infrastructure for universAAL are as follo |
... | @@ -94,24 +58,6 @@ The main components for the diagnosis infrastructure for universAAL are as follo |
|
|
|-
|
|
|-
|
|
|
|}
|
|
|}
|
|
|
|
|
|
|
|
==== Requirements ====
|
|
|
|
|
* '''RC9_TR1''' ''Modular Certification of Subsystems''
|
|
|
|
|
* '''RC9_TR6''' ''Unreliable Components''
|
|
|
|
|
* '''RC9_TR7''' ''Fault Hypothesis''
|
|
|
|
|
* '''RC9_TR8''' ''Error-Containment''
|
|
|
|
|
* '''RC9_TR9''' ''Minimum of two Fault-Containment Regions''
|
|
|
|
|
* '''RC9_TR12''' ''Tolerance of Software Errors''
|
|
|
|
|
* '''RC9_TR14''' ''Fault Classification''
|
|
|
|
|
* '''RC9_TR17''' ''Mixed-Criticality Subsystems''
|
|
|
|
|
* '''RC9_TR18''' ''Diagnostic Service''
|
|
|
|
|
* '''RC9_TR19''' ''No Probe Effect''
|
|
|
|
|
* '''RC9_TR20''' ''Systematic Diagnostic Methods''
|
|
|
|
|
* '''RC9_TR21''' ''Application-specific Diagnostic Methods''
|
|
|
|
|
* '''RC9_TR22''' ''State Enforcement''
|
|
|
|
|
* '''RC9_TR23''' ''Different Levels of Reliability''
|
|
|
|
|
* '''RC9_TR24''' ''Handling of Changing Reliability''
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
=== Features ===
|
|
|
|
|
This artefact offers the following features.
|
|
|
|
|
|
|
|
| ... | @@ -489,16 +435,6 @@ Because of its importance in fault tolerance operation, an Error detection frame |
... | @@ -489,16 +435,6 @@ Because of its importance in fault tolerance operation, an Error detection frame |
|
|
|-
|
|
|-
|
|
|
|}
|
|
|}
|
|
|
|
|
|
|
|
=== Requirements ===
|
|
|
|
|
|
|
|
|
|
* '''RC9_TR1''' ''Modular Certification of Subsystems.''
|
|
|
|
|
* '''RC9_TR11''' ''Generic Fault-tolerance Layer.''
|
|
|
|
|
* '''RC9_TR12''' ''Tolerance of Software Errors.''
|
|
|
|
|
* '''RC9_TR13''' ''Bounded Start-up and Restart Time.''
|
|
|
|
|
* '''RC9_TR14''' ''Fault Classification.''
|
|
|
|
|
* '''RC9_TR21''' ''Application-specific Diagnostic Methods.''
|
|
|
|
|
* '''RC9_TR22''' ''State Enforcement.''
|
|
|
|
|
|
|
|
|
|
=== Features ===
|
|
|
|
|
The EDU enhances the reliability of the universAAL platform by discovering faults in the exchanged messages in different domains. Discovered faults can then be forwarded to the diagnostic unit, which takes a suitable action. Several fault detection methods have been implemented to cover a wide range of faults. These methods can be classified as follows:
|
|
|
|
|
*Detecting faults in the time domain for both periodic and sporadic messages. These methods can detect both temporary and permanent faults in the time domain.
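
Time-domain detection for periodic and sporadic messages can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the EDU implementation: the function names, the tolerance parameter and the rule that `permanent_after` consecutive violations indicate a permanent fault are all hypothetical.

```python
from typing import List


def periodic_violations(arrivals: List[float], period: float,
                        tolerance: float) -> List[bool]:
    """Flag each inter-arrival gap of a periodic message that deviates from
    the nominal period by more than the tolerance (same time unit throughout)."""
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    return [abs(gap - period) > tolerance for gap in gaps]


def sporadic_violations(arrivals: List[float],
                        min_interarrival: float) -> List[bool]:
    """Flag each gap of a sporadic message that is shorter than the minimum
    inter-arrival time the sender is allowed to use."""
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    return [gap < min_interarrival for gap in gaps]


def classify(violations: List[bool], permanent_after: int = 3) -> str:
    """Return "none", "temporary" or "permanent".

    Assumption: a run of at least `permanent_after` consecutive violations
    is treated as a permanent fault; shorter runs count as temporary.
    """
    longest = run = 0
    for violated in violations:
        run = run + 1 if violated else 0
        longest = max(longest, run)
    if longest == 0:
        return "none"
    return "permanent" if longest >= permanent_after else "temporary"
```

A periodic message is checked against its nominal period, while a sporadic message only has a lower bound on the gap between two arrivals. The threshold that separates temporary from permanent faults would have to come from the fault hypothesis of the concrete system.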
|
|
|
| ... | @@ -647,16 +583,6 @@ The framework includes the following: |
... | @@ -647,16 +583,6 @@ The framework includes the following: |
|
|
|-
|
|
|-
|
|
|
|}
|
|
|}
|
|
|
|
|
|
|
|
=== Requirements ===
|
|
|
|
|
The related requirements and their status are listed below:
|
|
|
|
|
* '''RC9_TR2''' ''Design for Testability''
|
|
|
|
|
* '''RC9_TR3''' ''Correctness-by-Construction''
|
|
|
|
|
* '''RC9_TR7''' ''Fault Hypothesis''
|
|
|
|
|
* '''RC9_TR12''' ''Tolerance of Software Errors''
|
|
|
|
|
* '''RC9_TR14''' ''Fault Classification''
|
|
|
|
|
* '''RC9_TR16''' ''Worst Case Execution Time Analysis''
|
|
|
|
|
* '''RC9_TR18''' ''Diagnostic Service''
|
|
|
|
|
|
|
|
|
|
=== Features ===
|
|
|
|
|
|
|
|
|
|
Enhances the system's testability by using a fault injection framework to ensure that the platform is reliable and tolerant of faults that may occur at run time.
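
The idea of injecting faults into message delivery can be illustrated with a small Python sketch. It is not the Fault Injection Framework itself: the `FaultInjector` class, its fault model (drop and corruption probabilities) and the log format are assumptions made for this example.

```python
import random
from typing import List, Optional


class FaultInjector:
    """Wraps message delivery and injects omission or corruption faults.

    The fault model (drop/corrupt probabilities) is a hypothetical choice.
    """

    def __init__(self, drop_rate: float, corrupt_rate: float, seed: int = 0) -> None:
        self.drop_rate = drop_rate
        self.corrupt_rate = corrupt_rate
        self._rng = random.Random(seed)   # seeded so that a test run is repeatable
        self.log: List[str] = []          # feedback that could feed a knowledge base

    def deliver(self, message: str) -> Optional[str]:
        """Return the message as the receiver would see it, or None if dropped."""
        roll = self._rng.random()
        if roll < self.drop_rate:
            self.log.append(f"dropped:{message}")
            return None
        if roll < self.drop_rate + self.corrupt_rate:
            self.log.append(f"corrupted:{message}")
            return message[::-1]          # stand-in for payload corruption
        return message
```

Seeding the random generator makes a test scenario reproducible, so a failure observed in one run can be replayed. The log plays the role of the feedback file mentioned for the knowledge base.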
|
|
|
| ... | @@ -928,16 +854,6 @@ According to the mentioned facts above , dealing with AAL environments mean deal |
... | @@ -928,16 +854,6 @@ According to the mentioned facts above , dealing with AAL environments mean deal |
|
|
|-
|
|
|-
|
|
|
|}
|
|
|}
|
|
|
|
|
|
|
|
=== Requirements ===
|
|
|
|
|
* '''RC9_TR3''' ''Correctness-by-Construction''
|
|
|
|
|
* '''RC9_TR4''' ''Delay/Disruption-tolerant Networking''
|
|
|
|
|
* '''RC9_TR5''' ''Communication Resource Guarantees''
|
|
|
|
|
* '''RC9_TR6''' ''Unreliable Components''
|
|
|
|
|
* '''RC9_TR10''' ''Consistent membership Service''
|
|
|
|
|
* '''RC9_TR11''' ''Generic Fault-tolerance Layer''
|
|
|
|
|
* '''RC9_TR13''' ''Bounded Start-up and Restart Time''
|
|
|
|
|
* '''RC9_TR15''' ''Pre-emptive Resource Allocation''
|
|
|
|
|
|
|
|
|
|
=== Features ===
|
|
|
|
|
This uAAL-based extension module provides time-triggered temporal guarantees and facilitates the use of fault tolerance specifications to support the reliability of the uAAL system.
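
The temporal guarantee of time-triggered communication comes from a static schedule that all nodes follow on a shared global time base. The Python sketch below shows the principle only. The schedule layout, the microsecond unit and both function names are assumptions and do not describe the actual extension module.

```python
from typing import List


def slot_owner(now_us: int, schedule: List[str], slot_us: int) -> str:
    """Return the node that owns the transmission slot at global time `now_us`.

    `schedule` lists one owner per slot; the cycle repeats after the last slot.
    All nodes are assumed to share a synchronized global time base.
    """
    cycle_us = slot_us * len(schedule)
    return schedule[(now_us % cycle_us) // slot_us]


def next_send_time(node: str, now_us: int, schedule: List[str], slot_us: int) -> int:
    """Return the start of the next slot owned by `node` at or after `now_us`."""
    if node not in schedule:
        raise ValueError(f"{node} has no slot in the schedule")
    cycle_us = slot_us * len(schedule)
    slot_start = now_us - now_us % slot_us
    if slot_start < now_us:
        slot_start += slot_us       # a slot already in progress cannot be used
    while schedule[(slot_start % cycle_us) // slot_us] != node:
        slot_start += slot_us
    return slot_start
```

Because every node with a slot gets its turn once per cycle, the wait before it may send is bounded by one cycle length, and no node can block the shared medium outside its own slot.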
|
|
|
|
|
|
|
|
| ... | @@ -1132,17 +1048,6 @@ There are two functions of redundancy to prevent performance failure from exceed |
... | @@ -1132,17 +1048,6 @@ There are two functions of redundancy to prevent performance failure from exceed |
|
|
|-
|
|
|-
|
|
|
|}
|
|
|}
|
|
|
|
|
|
|
|
=== Requirements ===
|
|
|
|
|
|
|
|
|
|
* '''RC9_TR2''' ''Design for Testability.''
|
|
|
|
|
* '''RC9_TR3''' ''Correctness-by-Construction.''
|
|
|
|
|
* '''RC9_TR5''' ''Communication Resource Guarantees.''
|
|
|
|
|
* '''RC9_TR12''' ''Tolerance of Software Errors.''
|
|
|
|
|
* '''RC9_TR23''' ''Different Levels of Reliability.''
|
|
|
|
|
* '''RC9_TR24''' ''Handling of Changing Reliability.''
|
|
|
|
|
* '''RC9_TR25''' ''Replication.''
|
|
|
|
|
* '''RC9_TR26''' ''Replica Determinism.''
|
|
|
|
|
|
|
|
|
|
=== Features ===
|
|
|
|
|
Provides replication and voting mechanisms for uAAL applications and for messages transmitted in the uAAL space, for error detection and error masking.
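
A voter over replicated outputs, as used in triple-modular redundancy, can be sketched in a few lines of Python. The function name and return convention are hypothetical. The sketch assumes that replica outputs are directly comparable values and that a strict majority is required for masking.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def majority_vote(replica_outputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], bool]:
    """Vote over replica outputs.

    Returns (value, error_detected). `value` is the output shared by a strict
    majority of replicas, or None if no such majority exists.
    """
    if not replica_outputs:
        raise ValueError("at least one replica output is required")
    value, count = Counter(replica_outputs).most_common(1)[0]
    error_detected = count < len(replica_outputs)   # any disagreement is an error
    if count * 2 > len(replica_outputs):
        return value, error_detected                # error masked by the majority
    return None, True                               # detected but not maskable
```

With three replicas, one deviating output is both detected and masked. If all three disagree, the error is detected but no value can be delivered. Exact comparison of outputs only works if the replicas are deterministic, which is what replica determinism is meant to ensure.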
|
|
|
|
|
|
|
|
| ... | |
... | |
| ... | | ... | |