|
|
|
|
|
|
|
The fault types covered by the EDU are classified as general faults that are not tied to a specific application. Developers of universAAL components can add many other application-specific fault detection methods to the EDU by utilizing its data structure.
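As a purely illustrative sketch of what "adding a detector via the EDU's data structure" could look like: the EDU's real data structure is not shown in this document, so the registry class and method names below are assumptions, not the actual EDU API.

```python
# Hypothetical sketch: an EDU-like registry into which a component developer
# plugs an application-specific fault detector. All names are assumptions.

class EDU:
    def __init__(self):
        self.detectors = {}            # fault name -> detection predicate

    def register(self, fault, predicate):
        """Add an application-specific detector to the registry."""
        self.detectors[fault] = predicate

    def check(self, signal):
        """Return the names of all faults whose predicate fires on the signal."""
        return [f for f, p in self.detectors.items() if p(signal)]

edu = EDU()
# Example detector: a sensor reading that has not changed for 5 samples.
edu.register("sensor_stuck", lambda s: len(s) >= 5 and len(set(s[-5:])) == 1)
faults = edu.check([3, 3, 3, 3, 3])
```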
|
|
|
|
|
|
|
|
==Artefact #3: Testing Module for Fault-Tolerant Systems Using a Fault Injection Framework ==
|
|
|
|
|
|
|
|
|
|
=== Blackbox Description ===
|
|
|
|
|
The Fault Injection Framework is used on a distributed universAAL platform to test the system's reliability and safety. The test case [https://raw.githubusercontent.com/wiki/universAAL/middleware/1.png Cluster] consists of 5 nodes that run AAL applications on the universAAL platform. The nodes are connected by two main communication channels, Ethernet and Time-Triggered Ethernet (TTEthernet), and are controlled by a server.
|
|
|
|
|
|
|
|
|
|
The framework includes the following:
|
|
|
|
|
* Development of experimental framework with the following specifications:
|
|
|
|
|
** Linux-based execution environment capable of real-time communication. The nodes must run a real-time Linux operating system; this type of operating system, with special kernel modifications, facilitates real-time communication between a node and its environment.
|
|
|
|
|
** Ethernet communication between server and nodes: the system must provide Ethernet communication between the nodes and the server. Since the server needs to control the nodes within the network, this channel allows the server to send instructions to the nodes and receive results from them.
|
|
|
|
|
* Execution environment on nodes for experiments: the framework must be equipped with a special configuration that provides a suitable environment for experiments on the universAAL platform. This includes:
|
|
|
|
|
** TTEthernet configuration on both nodes and switch
|
|
|
|
|
** Ethernet Configuration on both server and nodes
|
|
|
|
|
* Experimental process on server and nodes: the system construction must facilitate the server's control over the nodes using special tools. The server should be able to:
|
|
|
|
|
** Assign tasks and transfer them to the nodes
|
|
|
|
|
** Run the tasks on the nodes
|
|
|
|
|
** Receive the result logs
|
|
|
|
|
* Real-time experimental test application: the application must provide an example of real-time communication between nodes. It should be able to perform the following:
|
|
|
|
|
** Task Assignment from the server to the nodes.
|
|
|
|
|
** Real time communication between the nodes during the task execution.
|
|
|
|
|
** Collecting and sending of results from the nodes to the server.
|
|
|
|
|
* AAL application: The nodes must run AAL components under the universAAL platform.
|
|
|
|
|
* Fault injection: the system must be tested under software fault injection to address safety and reliability issues. Several experiments must be run under different fault injection scenarios.
|
|
|
|
|
|
|
|
|
|
=== Bundles ===
|
|
|
|
|
{| border="1" style="cellspacing=0; bordercolor=gray; align=left; valign=top;"
|
|
|
|
|
! align="left" bgcolor="#DDDDDD" colspan="2" | Artifact: '' Testing Module for Fault-Tolerant Systems Using a Fault Injection Framework ''
|
|
|
|
|
|-
|
|
|
|
|
| GIT Address
|
|
|
|
|
| http://github.com/universAAL/support/tree/master/reliability/Fault%20Injection%20Framework
|
|
|
|
|
|-
|
|
|
|
|
| Javadoc
|
|
|
|
|
|
|
|
|
|
|
|
|-
|
|
|
|
|
| Design Diagrams
|
|
|
|
|
| [https://raw.githubusercontent.com/wiki/universAAL/middleware/8.png Sending Algorithm], [https://raw.githubusercontent.com/wiki/universAAL/middleware/9.png Receiving Algorithm], [https://raw.githubusercontent.com/wiki/universAAL/middleware/11.png Framework Launch script], [https://raw.githubusercontent.com/wiki/universAAL/middleware/12.png Server Script], [https://raw.githubusercontent.com/wiki/universAAL/middleware/13.png Node Side Script]
|
|
|
|
|
|-
|
|
|
|
|
| Reference Documentation
|
|
|
|
|
| https://github.com/universAAL/support/wiki/RD-Fault-Injection
|
|
|
|
|
|-
|
|
|
|
|
|}
|
|
|
|
|
|
|
|
|
|
=== Features ===
|
|
|
|
|
|
|
|
|
|
Enhances the system's testability by using a fault injection framework to ensure the platform's reliability and its tolerance to faults that may occur at run time.
|
|
|
|
|
|
|
|
|
|
=== Design Decisions ===
|
|
|
|
|
|
|
|
|
|
===== Model Concept =====
|
|
|
|
|
In this section the model concept is discussed together with its main construction and functionality. The following figure illustrates the concept:
|
|
|
|
|
|
|
|
|
|
[[https://raw.githubusercontent.com/wiki/universAAL/middleware/2.png|400px|center]]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
* Controller: the controller's role is to instruct and direct the fault injection tool. The controller initiates the fault injection process and receives the results of each experiment implemented.
|
|
|
|
|
* Fault injection tool: this part is responsible for generating the faults and injecting them into the AAL components. It collects the results and sends them back to the controller.
|
|
|
|
|
* Communication environment: this part is responsible for the communication between the AAL components. It can be an Ethernet connection or a real-time communication system such as Time-Triggered Ethernet.
|
|
|
|
|
* AAL components: these are the components of the system. A node can run one or more components on the universAAL platform.
|
|
|
|
|
|
|
|
|
|
==== Model specifications ====
|
|
|
|
|
In this section the model used in our framework is presented. The framework builds on the universAAL platform as its AAL platform; it is therefore designed to serve AAL applications within the universAAL platform, which differentiates it from other existing frameworks. It has its own specifications that facilitate the integration of the universAAL platform.
|
|
|
|
|
# Distributed system: AAL applications can run on a distributed system in which each part runs a different AAL application and communicates with the others. The framework therefore consists of several nodes running AAL applications that communicate with each other.
|
|
|
|
|
# Ethernet communication: the universAAL platform components have their own communication environment, which uses the Ethernet protocol. The nodes within the model are able to communicate with each other over Ethernet.
|
|
|
|
|
# Real-time communication: the development of the universAAL communication system and applications introduces the use of real-time communication. The model serves this functionality by allowing real-time communication between its components.
|
|
|
|
|
# Fault injection: the model must be able to run fault injection applications within the universAAL platform in support of fault tolerance.
|
|
|
|
|
|
|
|
|
|
==== The Model Node Construction ====
|
|
|
|
|
Each node in the model runs a single component. The AAL components are nodes configured to run the universAAL platform; these nodes must contain several important tools in order to provide the specified services.
|
|
|
|
|
* Real-time operating system: the model was constructed with real-time communication in mind. To serve this functionality, the nodes must run a real-time operating system, which provides the operating-system requirements of the real-time communication system.
|
|
|
|
|
* Real-time configuration tool: this tool switches the communication mode between the nodes to real-time whenever needed; it loads the real-time configuration file into the operating system kernel and configures the network interfaces.
|
|
|
|
|
* Ethernet controlling tool: this tool facilitates communication between the nodes and the controller, and allows the server to control the operations and instructions running inside the nodes.
|
|
|
|
|
* Real-time communication tool: used to facilitate real-time communication between the nodes and to meet the real-time specifications.
|
|
|
|
|
* AAL platform: the universAAL platform is used as the standard AAL platform on which the nodes run their AAL applications.
|
|
|
|
|
|
|
|
|
|
==== Sequence of Actions of the universAAL-Based Fault Injection Framework ====
|
|
|
|
|
In this section the proposed and implemented universAAL-based Fault Injection Framework is illustrated, together with its concept and sequence of phases. Programmed software is used to inject faults into the system. The framework runs within the universAAL platform: it injects faults, collects the results and sends them to the server. All fault injection phases are detailed in the following figure:
|
|
|
|
|
|
|
|
|
|
[[https://raw.githubusercontent.com/wiki/universAAL/middleware/f3.png|600px|center]]
|
|
|
|
|
|
|
|
|
|
* Start up: the server (controller) sends a startup command to the nodes synchronously, in order to initiate the fault injection run inside the nodes.
|
|
|
|
|
* Update the assigned task and clean the last results files: each node has a list of tasks assigned by the server and a set of log files, created after task execution, that contain the results. After receiving the startup command from the server, the nodes prepare for the run: they fetch the updated task list from the server and clean all log files to make room for the new results.
|
|
|
|
|
* Initiate the universAAL platform and join the bus: the model runs within the universAAL platform and uses its bus model for communication. At this stage each node initiates the universAAL platform required to run the AAL application: it starts the universAAL layers and bundles, joins the platform bus and discovers the other nodes connected to the bus.
|
|
|
|
|
* Run the application and save the results on the local nodes: in this phase the nodes start the updated task. While executing it they communicate with each other and are disturbed by the fault injection process, which is performed programmatically inside the task itself. During this period the nodes run several experiments and produce several results, which are saved locally on the nodes as the run proceeds.
|
|
|
|
|
* Exit the bus and shut down the universAAL platform: after the run finishes, each node leaves the bus and closes its connection. Finally the universAAL platform layers and bundles are shut down.
|
|
|
|
|
* Send the results to the server: each node sends its result files to the server.
|
|
|
|
|
* Collect and arrange all the results in one folder: the server collects the results received from the nodes, then renames, arranges and processes them according to the node numbers.
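The phases above can be sketched as a simple controller loop. This is an illustrative Python simulation, not the actual framework code: the `Node` class and the phase names are assumptions chosen to mirror the list above.

```python
# Illustrative simulation of the fault injection experiment sequence.
# The Node class and phase names are assumptions, not the real framework API.

class Node:
    def __init__(self, name):
        self.name = name
        self.log = []                  # phases executed, in order

    def run_phase(self, phase):
        self.log.append(phase)

PHASES = [
    "startup",
    "update_tasks_and_clean_logs",
    "init_universaal_and_join_bus",
    "run_task_with_fault_injection",
    "leave_bus_and_shutdown",
    "send_results_to_server",
]

def run_experiment(nodes):
    # The server drives every node through the same sequence of phases...
    for phase in PHASES:
        for node in nodes:
            node.run_phase(phase)
    # ...and finally collects one record per node, keyed by node name.
    return {node.name: list(node.log) for node in nodes}

results = run_experiment([Node("node1"), Node("node2")])
```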
|
|
|
|
|
|
|
|
|
|
==== Usage of the Fault Injection Framework for Diagnosis ====
|
|
|
|
|
This fault injection framework provides the starting point for the ongoing development of the diagnosis framework for universAAL. Based on the components' behavior and the fault statistics provided by the fault injection framework, the symptoms that lead to a fault can be formulated, and based on the behavior of these symptoms the fault can be classified by the diagnosis system. The relationship between the outcome of the fault injection framework and the diagnosis framework is depicted in the following figure.
|
|
|
|
|
|
|
|
|
|
[[https://raw.githubusercontent.com/wiki/universAAL/middleware/14.png|400px|center]]
|
|
|
|
|
|
|
|
|
|
The results obtained from fault injection provide the observations that build up the symptoms for diagnosis. The intermediate events between fault and symptom are not always visible from the symptom behavior alone, so this symptom-event-fault chain provides the path to diagnosis: the diagnosis framework handles the symptoms in such a way that the fault can be correctly classified, and the diagnostic measure for that specific fault can then be realized in universAAL. This will also facilitate the use of the situation reasoner to create, publish and consume context events related to different faults. Future work includes identifying the different fault scenarios, implementing different error detectors, and implementing the diagnostics framework for universAAL.
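The symptom-to-fault classification idea can be sketched as a lookup from observed symptoms to a fault class. This is an illustrative assumption: the actual diagnosis framework, its symptom names and its fault taxonomy are not specified in this document.

```python
# Sketch of the symptom-event-fault chain; the mapping, symptom names and
# fault names are illustrative assumptions, not the real diagnosis framework.

SYMPTOM_TO_FAULT = {
    # (observed symptoms)                  -> classified fault
    ("events_lost", "node_rejoining"): "flaky_bus_connection",
    ("events_lost", "rate_too_high"):  "sender_overload",
    ("no_events",   "node_silent"):    "node_crash",
}

def classify(symptoms):
    """Classify a fault from a sequence of observed symptoms, or report unknown."""
    return SYMPTOM_TO_FAULT.get(tuple(symptoms), "unknown_fault")

fault = classify(["events_lost", "rate_too_high"])
```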
|
|
|
|
|
|
|
|
|
|
=== Implementation ===
|
|
|
|
|
In order to build the model presented above, a certain procedure must be followed. This chapter presents the framework's implementation procedure, including all the information about the required hardware and software construction. To test and use the framework, a universAAL-based cluster is built.
|
|
|
|
|
|
|
|
|
|
==== Initial implementation from selected input projects ====
|
|
|
|
|
There was no initial implementation from the input projects.
|
|
|
|
|
|
|
|
|
|
==== Implementation Plan ====
|
|
|
|
|
'''Setup an Emulation Environment on a Distributed Cluster:'''
|
|
|
|
|
|
|
|
|
|
This sub-section provides all the required information about the distributed system that is used, including the hardware and software description and implementation.
|
|
|
|
|
|
|
|
|
|
===== Hardware =====
|
|
|
|
|
Please refer to the Reference Documentation for the Hardware set up of the test case.
|
|
|
|
|
|
|
|
|
|
===== Software Construction =====
|
|
|
|
|
In our system, the nodes must be configured with special software utilities so that the system can carry out the specified tasks (i.e. run the universAAL platform and AAL applications).
|
|
|
|
|
The real-time operating system is one of the most important parts. For our system, real-time Linux (RT-Linux) was chosen, even though other efficient real-time operating systems such as RTAI exist, for three reasons: the drivers and configuration instructions of the Time-Triggered switch are compatible with RT-Linux; the installation and configuration of RT-Linux are much easier; and RT-Linux is more widely used and receives frequent updates and maintenance.
|
|
|
|
|
|
|
|
|
|
The Secure Shell (SSH) server utility provides direct and secure access to the nodes, which makes it easy to log in to, control and modify them from the server. The Network File System (NFS) utility is used to mount shared folders between the nodes and the server; this mount facilitates transferring files between them. Real-time communication requires special configuration files to be loaded into the Linux kernel on the nodes, plus a special configuration on the TTEthernet switch.
|
|
|
|
|
Finally, the nodes run AAL components under the universAAL platform, which must therefore be installed on the nodes.
|
|
|
|
|
|
|
|
|
|
[[https://raw.githubusercontent.com/wiki/universAAL/middleware/5.png|300px|center]]
|
|
|
|
|
|
|
|
|
|
==== Fault Injection Mechanism ====
|
|
|
|
|
The software fault injection application is used to exercise and improve fault-tolerance mechanisms. The application is designed to run on the universAAL platform, using it for operation and communication. It is designed in a way that allows the user to implement the fault injection process; for this purpose, the program must make it easy for the developer to inject faults into the application and test the nodes' behavior.
|
|
|
|
|
|
|
|
|
|
===== Track the Application Behavior =====
|
|
|
|
|
The application execution is initiated from the server: the server sends the instructions synchronously to the nodes, which then start running the application. At the end, the results are collected and sent to a designated results folder on the server as log files.
|
|
|
|
|
|
|
|
|
|
The application timeline, as illustrated in section 3.2.4, shows the main steps:
|
|
|
|
|
# The developer starts the application through a dedicated command on the server; the server connects to the nodes using the Secure Shell (SSH) utility and runs certain scripts inside the nodes.
|
|
|
|
|
# The nodes update the task assigned by the server using the Network File System (NFS) utility.
|
|
|
|
|
# The universAAL platform is initiated; the nodes join the bus and start recognizing the other nodes on the bus.
|
|
|
|
|
# Once the bus service has stabilized, the send-receive application starts up and the results of each iteration are saved locally by the nodes in a log file.
|
|
|
|
|
# When execution finishes, each node exits the bus and shuts down the platform.
|
|
|
|
|
# The nodes will send the results to the server using NFS server utility.
|
|
|
|
|
# The server will collect the results in one folder.
|
|
|
|
|
|
|
|
|
|
===== Application Specifications =====
|
|
|
|
|
The application has special specifications that allow different testing scenarios for the fault injection process and facilitate its execution.
|
|
|
|
|
|
|
|
|
|
''Message Contents''
|
|
|
|
|
|
|
|
|
|
In the universAAL platform, data transfer between nodes is based on events; an event carries information about a device, such as its status or changes in its properties. For this reason a virtual device is created to act as the source of the events. For simplicity the device was chosen to be a gauge, and the property whose status is reported by the events is the change of battery level.
|
|
|
|
|
Each device should have a unique property that differentiates it from the others. This property can be used on the receiving nodes to express interest in specific event types sent on the bus: if an event matches the receiver's interest, the node receives it.
|
|
|
|
|
|
|
|
|
|
Multiple devices must be used to create more than one type of event. These events are:
|
|
|
|
|
* Startup event: notifies the receiving nodes that the process of sending events, or a new loop of it, is starting up.
|
|
|
|
|
* Actual (status) event: the event that is counted and considered in the fault injection experiment. It is sent according to the time and rate specification.
|
|
|
|
|
* End event: notifies the receivers that the event-sending loop is finished; accordingly, the receivers initiate a new receiving session.
|
|
|
|
|
* Exit event: this event is used to end the testing operation.
|
|
|
|
|
Each device has a unique source label. This label is used to identify the source of an event on the receiving side.
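The event categories and source labels above can be sketched as a small data type. This is only an illustrative model: the field names are assumptions, and real universAAL context events carry far richer (ontology-based) information than shown here.

```python
# Sketch of the four event categories and per-device source labels.
# Field names are assumptions; real universAAL context events are richer.

from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    category: str              # "startup", "status", "end_loop" or "exit"
    source: str                # unique label of the virtual device that issued it
    battery_level: int = 100   # the property whose change the event reports

def is_counted(event):
    # Only "status" (actual) events are counted in the experiment results.
    return event.category == "status"

e = Event(category="status", source="gauge-node1", battery_level=87)
```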
|
|
|
|
|
|
|
|
|
|
''Send-Receive methodology''
|
|
|
|
|
|
|
|
|
|
In the universAAL platform, events are sent to the bus using a Context Publisher. The application must create this object and connect it to a specific provider, which represents the source of the publisher. Once created, the publisher is connected to the bus and ready to send events. After finishing its task, the context publisher must be closed and disconnected from the bus.
|
|
|
|
|
Events are received using a Context Subscriber object. The application must create the context subscriber and define its restrictions; these restrictions define the subscriber's interest, so if an event sent to the bus matches that interest it is received by the node, otherwise nothing is received.
|
|
|
|
|
To make the application more efficient, sending and receiving are performed on different threads, which allows the developer to control each thread separately.
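The two-thread structure can be sketched as follows. This is a minimal, self-contained simulation under stated assumptions: an in-process queue stands in for the universAAL context bus, and the sender/receiver functions are illustrative, not the application's actual threads.

```python
# Minimal sketch of running the sender and receiver on separate threads,
# as the application does; a queue stands in for the universAAL context bus.

import threading
import queue

bus = queue.Queue()
received = []

def sender(n_events):
    for i in range(n_events):
        bus.put(("status", i))     # actual (counted) events
    bus.put(("end", None))         # tell the receiver the loop is finished

def receiver():
    while True:
        category, payload = bus.get()
        if category == "end":
            break
        received.append(payload)

t_recv = threading.Thread(target=receiver)
t_send = threading.Thread(target=sender, args=(5,))
t_recv.start()
t_send.start()
t_send.join()
t_recv.join()
```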
|
|
|
|
|
|
|
|
|
|
''Timing''
|
|
|
|
|
|
|
|
|
|
In the application there are different timing aspects depending on the sending and receiving procedure:
|
|
|
|
|
* Event Sending Rate:
|
|
|
|
|
The rate at which a node sends events has a major effect on the reliability and efficiency of the send-receive process, and it is one of the fault injection scenarios that can be used. This rate must be defined as a variable so the developer can control it. In our application, which has two sending nodes, each node's sending rate can be controlled and adjusted separately according to the test specification.
|
|
|
|
|
* Minimum Event Sending Rate:
|
|
|
|
|
Sending an event to the bus depends not only on the execution time of the sending instruction but also on the delay of the bus: once the send command is executed, the total time consists of the actual execution time plus the delay caused by the bus.
|
|
|
|
|
In order to choose an effective and suitable event-sending rate, several experiments were executed on the nodes. In these experiments the number of nodes connected to the bus was changed with every trial, from 1 node up to 5 nodes, with node 1 as the sender and the other nodes as receivers. The average event-sending rate was calculated, and the results are shown in the following graph.
|
|
|
|
|
|
|
|
|
|
[[https://raw.githubusercontent.com/wiki/universAAL/middleware/7.png|500px|center]]
|
|
|
|
|
|
|
|
|
|
According to these results, the minimum event sending rate must be 4.5 events/ms.
|
|
|
|
|
* Inter-burst time:
|
|
|
|
|
Between each loop of sending events there must be a pause. This is a safety period used to flush the buses, ensuring that all events still waiting in the bus are sent and received. Every event spends some waiting time in the bus queue; if the sending publisher is terminated before these events are received, some of them are lost. This period is very important in the fault injection procedure and can be varied to test the nodes' behavior.
|
|
|
|
|
* Iteration Time:
|
|
|
|
|
The event-sending loop is controlled by a predefined running time. In the application this time is a variable that can be adjusted according to the test specification.
|
|
|
|
|
|
|
|
|
|
''Results Output''
|
|
|
|
|
|
|
|
|
|
To make it easier to collect and process the results of the send-receive operation, the results must be saved in a separate file containing the results of each loop. Each node has its own results file, whose contents depend on its role (sending or receiving):
|
|
|
|
|
* Sending results: the results file must include the following information:
|
|
|
|
|
** Iteration number: specifies the sending loop number; it is incremented with each new loop. The start and end of the iteration are determined by the sender node.
|
|
|
|
|
** Start Sending time (only for the event sending nodes): this time is used to stamp the loop starting time in the sender node.
|
|
|
|
|
** End Sending Time (only for the event sending nodes): this time is used to stamp the loop ending time.
|
|
|
|
|
** Number of events: the number of events sent by the node.
|
|
|
|
|
* Receiving results: the results file must include the following information:
|
|
|
|
|
** Iteration number: specifies the receiving iteration number; it is incremented with each new iteration. The start and end of the iteration are determined by the sender node.
|
|
|
|
|
** Start Receiving time: this time is used to stamp the receiving loop starting time in the receiving node.
|
|
|
|
|
** End Receiving time: This time is used to stamp the receiving loop end time in the receiving node.
|
|
|
|
|
** Number of events: the number of events received by the node.
|
|
|
|
|
To make the results more efficient to read and handle, the information must be saved in a log file delimited with tabs.
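A tab-delimited log of this shape can be written and parsed with standard tooling. The sketch below is illustrative: the column order follows the sending-results list above, but the exact file layout used by the framework is an assumption.

```python
# Sketch of writing and reading the tab-delimited results log described above;
# the column order follows the sending-results list and is otherwise assumed.

import csv
import io

FIELDS = ["iteration", "start_time", "end_time", "n_events"]

def write_log(rows):
    """Serialize result rows as tab-delimited text (one loop per line)."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t").writerows(rows)
    return buf.getvalue()

def read_log(text):
    """Parse tab-delimited result text back into rows of strings."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))

log = write_log([[1, 1000, 3000, 9000], [2, 3100, 5100, 8950]])
rows = read_log(log)
```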
|
|
|
|
|
|
|
|
|
|
''Using the runners for execution''
|
|
|
|
|
|
|
|
|
|
The application must initiate the universAAL platform and run without a graphical user interface. This makes it easier for the developer to run the application headlessly (e.g. over SSH) and allows the server to control and run the application using automated scripts.
|
|
|
|
|
The Pax Runner utility can be used to create a Felix framework in which the universAAL platform bundles can be started.
|
|
|
|
|
|
|
|
|
|
''Scripting''
|
|
|
|
|
|
|
|
|
|
To make it more efficient, the whole fault injection application must be driven by scripts, which automates the whole procedure and allows complex fault injection tests to be created:
|
|
|
|
|
|
|
|
|
|
* Server script: it is necessary to start the application simultaneously on all the nodes so that all nodes have the same running time. It is also more efficient to run the application with a single command, rather than logging in to each node and starting the application from its local drive.
|
|
|
|
|
The server script must contain all the commands required to start the application on every node at the same time. It must not wait for the results, otherwise the execution would not be simultaneous; likewise, the output must not interrupt the procedure, so the application runs on the nodes in the background.
|
|
|
|
|
* Node script: this script resides on the nodes and aims to:
|
|
|
|
|
** Get the latest version of the application from the server using the NFS server
|
|
|
|
|
** Start Pax Runner, which initiates the universAAL platform and runs the application.
|
|
|
|
|
** Send the results file to the server and clean it for the next test.
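The server-side fan-out described above could look like the sketch below. Everything here is an assumption for illustration: the hostnames, script path and exact ssh invocation are hypothetical, and the commands are only constructed, not executed.

```python
# Sketch of how the server script could fan out the start command to all
# nodes over ssh in the background. Hostnames and the script path are
# hypothetical; the commands are only built here, never executed.

NODES = ["node1", "node2", "node3", "node4", "node5"]

def start_command(node, script="/opt/fi/node_script.sh"):
    # nohup + background '&' means the server does not wait for results,
    # matching the simultaneity requirement described above.
    return f"ssh {node} 'nohup {script} > /dev/null 2>&1 &'"

commands = [start_command(n) for n in NODES]
```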
|
|
|
|
|
|
|
|
|
|
''Fault Injection Scenarios''
|
|
|
|
|
|
|
|
|
|
There are several scenarios for fault injection that can be implemented:
|
|
|
|
|
* Changing the pause time, which affects the events' waiting time in the event queue inside the bus.
|
|
|
|
|
* Disturbing the bus by creating faulty behaviour in one node, causing it to connect to and disconnect from the bus repeatedly and arbitrarily.
|
|
|
|
|
* Changing the number of sending nodes and their event-sending rates.
|
|
|
|
|
* Stopping a node during the receiving process.
|
|
|
|
|
* Initiating multiple context subscribers or context publishers.
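The scenarios above are naturally expressed as overrides of a baseline configuration. This is a sketch under stated assumptions: the parameter names and baseline values are illustrative, not the framework's actual configuration keys.

```python
# Sketch of parameterizing the fault injection scenarios listed above;
# the parameter names and baseline values are illustrative assumptions.

BASELINE = {
    "pause_time_ms": 500,          # inter-burst flushing time
    "senders": 2,                  # number of sending nodes
    "send_rate_ev_per_ms": 4.5,    # per-node event sending rate
    "kill_receiver_at_s": None,    # stop a node mid-reception (None = never)
    "bus_flapping": False,         # node repeatedly joins/leaves the bus
}

def scenario(**overrides):
    """Derive a fault injection scenario from the baseline configuration."""
    cfg = dict(BASELINE)
    cfg.update(overrides)
    return cfg

# e.g. remove the safety pause and add a flapping node:
stress = scenario(pause_time_ms=0, bus_flapping=True)
```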
|
|
|
|
|
|
|
|
|
|
===== Fault Injection Process Model =====
|
|
|
|
|
This section presents the construction steps of the fault injection application. The first step is to define the main programs used for sending and receiving the events; they are separated into two programs: the Send program (which sends events) and the Receive program (which receives them). A Pax Runner script is then used to initiate the Felix framework that launches the universAAL platform. At the same time, scripts must be written on both the server and the nodes to make the process automated and controlled by the server.
|
|
|
|
|
|
|
|
|
|
'''Programs and scripts description'''
|
|
|
|
|
* Send Program:
|
|
|
|
|
This program sends events from a specific node to the other nodes. It is designed to let the user implement different fault injection scenarios by changing the program parameters; the send program framework is described in the following [https://raw.githubusercontent.com/wiki/universAAL/middleware/8.png diagram].
|
|
|
|
|
|
|
|
|
|
The Sending program phases are:
|
|
|
|
|
|
|
|
|
|
# Initiation process: at this stage the program initiates some essential objects, including the context publisher, which is responsible for sending events to the bus, and the context provider, which represents the source of the context publisher.
|
|
|
|
|
# Creating the virtual devices: here the virtual devices whose status the events report are created.
|
|
|
|
|
# Generating the events: the devices are now ready to issue their status; the events tied to the device status are defined and ready to be sent.
|
|
|
|
|
# Checking the loop number: according to the program scenario, the user provides the number of experiments to perform in each execution; this number also represents the number of iterations the program performs. If all loops have completed, the program proceeds to the exit process; otherwise it continues with the next step.
|
|
|
|
|
# Send start event: after entering the loop, and before sending the status messages, the program sends the start event to inform the other nodes that a new sending loop is beginning, so that they can get ready for it.
|
|
|
|
|
# Save time stamp: the start time of the new loop is saved in a predefined variable, to be added to the results file in the save-results step and used in results processing.
|
|
|
|
|
# Delay: this parameter controls the delay between two sequential events; by manipulating it, the user can change the event-sending rate, which can be used in the fault injection process.
|
|
|
|
|
# Send status event: at this stage the status events will be sent.
|
|
|
|
|
# Check the loop time: each loop of sending status events is controlled by the loop period, which is specified by the user. If this period has elapsed, the program exits the loop and saves the results; otherwise it goes back to the send-status-events phase.
|
|
|
|
|
# Wait after checking: when the send-status-events phase finishes, the program gives the events still queued in the bus time to be sent and clears the bus of the queued events. This time parameter is a possible fault injection factor.
|
|
|
|
|
# Save time stamp: the loop end time is saved in a particular variable for use in the save-results phase.
|
|
|
|
|
# Send end-loop event: the sending node informs the other nodes that its sending loop is finished.
|
|
|
|
|
# Save results: the results of the sending process are saved, including the loop number, the time stamps of starting and ending each loop, and the number of events sent.
|
|
|
|
|
# Wait after save: this wait provides a time gap between sending loops, giving the other nodes time to save the results of the last loop and get ready for the new one.
|
|
|
|
|
# Send end event: when the whole sending operation and the experiment are finished, the node sends the end-process event to inform the other nodes.
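The phases above can be condensed into a single loop structure. This is an illustrative sketch: event emission is stubbed out with a callback, all names are assumptions, and the per-loop event counts depend on timing, so they are not fixed.

```python
# Condensed sketch of the sending program's loop structure described above;
# event emission is stubbed out via `emit` and all names are assumptions.

import time

def send_loop(n_loops, loop_time_s, delay_s, emit):
    results = []
    for loop in range(n_loops):
        emit("start")                      # announce a new sending loop
        t0 = time.monotonic()
        sent = 0
        while time.monotonic() - t0 < loop_time_s:
            time.sleep(delay_s)            # controls the event-sending rate
            emit("status")                 # the counted (actual) event
            sent += 1
        emit("end_loop")                   # loop finished, receivers may save
        results.append((loop, sent))       # loop number + events sent
    emit("end")                            # whole experiment finished
    return results

events = []
res = send_loop(n_loops=2, loop_time_s=0.02, delay_s=0.001,
                emit=events.append)
```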
|
|
|
|
|
|
|
|
|
|
* Receiving Program
|
|
|
|
|
The [https://raw.githubusercontent.com/wiki/universAAL/middleware/9.png Receive Program] is used by the nodes to receive the events sent by the other nodes. It consists of two parts: the main part, which initiates all the objects needed for the receiving task, and the Csubscriber, which is the part actually responsible for receiving the events.
|
|
|
|
|
|
|
|
|
|
The receiving program description is as follow:
|
|
|
|
|
|
|
|
|
|

# Start phase: the program initiates.
# Wait: at the starting phase, the receive function waits for a specific period of time. This gives the framework enough time to register on the bus and to recognize the other nodes connected to it.
# Initiation process: at this stage the program initiates the objects required for the receiving process, including the context event pattern, which specifies the restriction on the types of events received, as defined by the context subscriber.
# Create Csubscriber: this step creates the context subscriber object, which is responsible for receiving the events.

The context subscriber object uses a separate thread to receive the events. It has several functions, each responsible for a specific task; the most critical part of our implementation is the [https://raw.githubusercontent.com/wiki/universAAL/middleware/10.png “handle context event”] function, which has been modified to work according to the framework requirements.

# Receiving events: this function is executed whenever an event within the Csubscriber's interest is sent to the bus. The first step is to receive the event and identify its properties.
# Assign the event source: in this phase the source of the event is determined, i.e. which node it came from, so that the event is counted in its proper counter category.
# Check the event category: the received event is checked for its type, whether it is a status, start, end-loop or end-process event; the next step is chosen accordingly.
# Increase status counter: if the received event is a status event, the counter defined for the specific sending node is increased and the function returns to the receiving events phase.
# Save the time stamp (start event): if the received event is a start event, the time stamp of the beginning of reception from the specific node is stored, to be saved in the result-saving phase.
# Increase loop counter (start event): the receiving-loop counter is increased once a start event is received.
# Reset the status counter: the status event counter is reset in order to start a new receiving loop.
# Check end event: if the received event is marked as an end event, it is checked whether it is an end-process or an end-loop event; the next step is chosen according to the result.
# Save the time stamp (end-loop event): if the received event is an end-loop event, the time stamp of the end of the receiving loop is recorded, to be saved with the results in the result-saving phase for processing.
# Save results (end-loop event): if the received event is marked as an end-loop event, the results of the receiving loop are saved in a log file and the function returns to the receiving events phase.
# Check if all sending nodes have finished: if the received event is marked as an end-process event, the function checks whether all sending nodes have finished their process; if not, it returns to the receiving events phase, otherwise it continues to the next step.
# Wait (finished): when all sending nodes have finished their task, the function waits for a predefined period in order to clean the bus of events.
# Exit all: in this phase the function ends all processes running within the program and shuts down the framework.
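
The dispatch performed by the modified “handle context event” function can be sketched as a small state holder. The <code>"source|category"</code> string encoding is an assumption for illustration; the real code inspects the fields of a uAAL context event, and time stamps and log-file writing are omitted here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of the receiver's event dispatch (not the actual Csubscriber code).
 * Events are encoded "source|category" for illustration only.
 */
public class ReceiverSketch {
    final Map<String, Integer> statusCount = new HashMap<>(); // per sending node
    final Map<String, Integer> loopCount = new HashMap<>();   // receiving loops per node
    final Set<String> finished = new HashSet<>();             // nodes that sent endProcess
    final int sendingNodes;

    public ReceiverSketch(int sendingNodes) { this.sendingNodes = sendingNodes; }

    /** Handles one event; returns true once every sending node has finished. */
    public boolean handle(String event) {
        String[] p = event.split("\\|");
        String source = p[0], category = p[1];
        switch (category) {
            case "status":      // count status events per source node
                statusCount.merge(source, 1, Integer::sum);
                break;
            case "start":       // new loop: count it and reset the status counter
                loopCount.merge(source, 1, Integer::sum);
                statusCount.put(source, 0);
                break;
            case "endLoop":     // save this loop's results (log writing omitted)
                break;
            case "endProcess":  // this sending node finished the experiment
                finished.add(source);
                break;
        }
        return finished.size() == sendingNodes; // the "exit all" condition
    }
}
```

Only when the last sending node reports <code>endProcess</code> does <code>handle</code> return true, mirroring the “check if all sending nodes finished” step.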

* [https://raw.githubusercontent.com/wiki/universAAL/middleware/11.png Framework Launch script]

In order to run the OSGi framework and the Felix framework without having to run the Eclipse SDK, a modified Pax Runner that can be executed from a terminal command is used. An executable script calls a Felix file to launch the Pax Runner; this starts the frameworks required for the universAAL platform, starts all the bundles and applications related to the platform, and finally starts the Fault Injection Framework.

# Initiate the OSGi framework: the script starts by defining the OSGi start level at which all the required bundles will be activated and ready. The execution environment is also defined, including all the required applications.
# Initiate the Felix framework: at this stage the start level of the Felix framework and the Felix settings are defined.
# Initiation process: the applications required for running the Felix framework are initiated and activated.
# Join the bus: this function activates the UPnP driver and joins the system bus.
# Activate the middleware: all the middleware bundles required for the universAAL platform are installed and activated.
# Activate the ontologies: the universAAL platform ontology bundles are installed and activated.
# Activate the application: the Fault Injection Framework is installed and activated.
# Start the framework: at this point all the requirements of the Felix framework are activated and ready, and it starts.
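
The staged activation described above can be pictured as a Felix launch configuration. The property keys (<code>org.osgi.framework.startlevel.beginning</code>, <code>felix.auto.start.&lt;level&gt;</code>) are standard Felix launcher properties, but the bundle file names and level assignments below are hypothetical, not taken from the actual script.

```properties
# Hypothetical excerpt of a Felix launch configuration for this setup:
# start levels stage the middleware, the ontologies, then the application.
org.osgi.framework.startlevel.beginning=5
felix.auto.start.2=file:bundles/mw.container.osgi.jar     # middleware bundles
felix.auto.start.3=file:bundles/ont.phworld.jar           # ontology bundles
felix.auto.start.4=file:bundles/fault.injection.fw.jar    # Fault Injection Framework
```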

* [https://raw.githubusercontent.com/wiki/universAAL/middleware/12.png/400px-12.png Server Script]:

The server controls the Fault Injection Framework; it starts the script and triggers the application tasks synchronously on the nodes, and at the end of the script it collects the results.

# Send the task: the server's script starts by sending the assigned task to the nodes.
# Run the task in the node: the server-side script starts the node's script inside the nodes themselves.
# Collect the results: after the task has finished, the server collects the results sent from the nodes and organizes them in one results folder.

* [https://raw.githubusercontent.com/wiki/universAAL/middleware/13.png Node Side Script]

On the node side, a script is required to execute some essential functions to start the framework.

# Update the task: the script starts by updating the task, sent from the server, from the node's specific folder.
# Execute the task: at this stage the script executes the assigned task by triggering the application.
# Send results: during task execution the results are saved on the local node; after the task has finished, the script sends these results to the server.
# Clean the results files: once the results have been sent to the server, the node deletes the old results to be ready for the next task.

==Artefact #4 : Time Triggered Ethernet Extension Module ==

=== Blackbox Description ===

The uAAL environment contains a large variety of embedded devices that need to communicate with each other to achieve a specific AAL service. The delivered AAL services differ in their degree of impact on the end user: safety-critical services that deal with the user's health, and emergency systems in a uSpace, must be delivered correctly to achieve their targets of high reliability. Consequently, the different devices that cooperate to deliver such services should support real-time communication and fault-tolerance mechanisms to provide a reliable service. Real-time communication means that it is not enough for the network artifact to receive a correct response in the value domain; the response must also be correct in the time domain.

Given the facts mentioned above, dealing with AAL environments means dealing with different scenarios with different technical requirements regarding real-time communication. Consequently, different communication networks, protocols, and services could be used to cover all of these requirements. Here, TTEthernet comes in to bridge the gap and combine networks of different criticality into one network. Time-triggered services inspired by the Time-Triggered Protocol <ref> The time-triggered architecture. Kopetz, H., and Bauer, G. s.l. : IEEE Special Issue on Modeling and Design of Embedded., 2003.</ref>, with an Ethernet flavor, and the standard IEEE 802.3 Ethernet protocol are combined in one Ethernet network. Time-triggered services provide a temporal firewall, i.e. no message will be transmitted or received at a wrong time; this facilitates the use of fault tolerance techniques to add more reliability to the services.

=== Bundles ===

{| border="1" style="cellspacing=0; bordercolor=gray; align=left; valign=top;"
! align="left" bgcolor="#DDDDDD" colspan="2" | Artifact: '' Time Triggered Ethernet Extension Module ''
|-
| GIT Address
| http://github.com/universAAL/support/tree/master/reliability/TTE%20Extension
|-
| Javadoc
|
|-
| Design Diagrams
| [https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt7.png TTEMessageAction()], [https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt8.png processBusMessage()], [https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt9.png TT-Messages transmission native method], [https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt10.png TTEListener package], [https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt11.png nativeTTEMsgListening], [https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt13.png nativeTTEMsgFetching]
|-
| Reference Documentation
|
|-
|}

=== Features ===

This uAAL-based extension module provides time-triggered temporal guarantees and facilitates the use of fault tolerance mechanisms to support the reliability of the uAAL system.

=== Design Decision ===

One of the main targets of the universAAL project is to make AAL services more reliable and safe by adding real-time communication capabilities to the nodes. Adding a real-time communication infrastructure like TTEthernet will therefore motivate AAL service developers to innovate reliable AAL services on top of a reliable platform. The implementation of the Time Triggered Connector is separated into two main phases:

# The implementation of the Time Triggered Ethernet patch
# Creation of an independent Time Triggered Ethernet connector

In this version of the deliverable we present the first phase; the second phase will be included in future versions.

==== What is Time Triggered Ethernet (TTEthernet) ====

===== Overview =====

Over the past decades, Ethernet has become the most successful local area network technology in the world. Because of the event-triggered communication associated with Ethernet and its open nature, it is difficult to guarantee strict temporal properties within Ethernet. Nevertheless, many projects have tried to adapt Ethernet to time-critical applications (e.g. ARINC, ProfiNet). Due to the demand for a unified communication architecture covering both real-time and non-real-time applications, TTEthernet first appeared as an academic project at the Vienna University of Technology. TTEthernet can be considered a unification of the best properties of standard Ethernet and TTP/C.

TTEthernet provides seamless communication for a wide range of networks by using Ethernet. Different applications with different degrees of safety criticality can be combined in one network, with full compatibility with the IEEE 802.3 Ethernet standard. Since TTEthernet uses time-triggered services, features such as temporal partitioning, precise diagnosis, efficient resource utilization, and composability can be added to the system. Since several applications with different requirements can be deployed in an AAL environment, it is very powerful to unify these applications under a single network. Moreover, features like temporal partitioning support the reliability of such systems.

===== Basic concept of TTEthernet protocol =====

As shown in the next figure, a TTEthernet network consists of end systems, which contain the host applications, and switches, which organize the different traffic classes available in TTEthernet. Synchronization between nodes is essential for exchanging messages in the time-triggered traffic class. Regarding synchronization, different roles are assigned to the TTEthernet components (switches and end systems). The Synchronization Master (SM) role is assigned only to nodes (e.g. SM1, SM2 and SM3) to whose clocks the other nodes should adhere. The main role of a switch in a TTEthernet network is Compression Master: the switch collects the local clocks of the SMs, combines them, and retransmits the combined clock to the SMs and SCs. Both end systems and switches can act as Synchronization Clients (SC).

[[https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt1.png| Typical TTEthernet network |500px|center]]

The TTEthernet communication protocol provides three communication modes (traffic classes). The frames of all these modes are exchanged over one integrated physical network. The main traffic class in the TTEthernet protocol is Time-Triggered (TT) traffic, which best serves time-critical applications. Messages in this traffic class must be transported at restricted points in time according to a static schedule. If an end system decides not to use its dedicated time slot, the switch senses the inactivity and frees the bandwidth for use by the other traffic classes.

The second traffic class used by the TTEthernet protocol is Rate-Constrained (RC) traffic. This class is specified when the application has lower demands on real time and determinism. For transporting messages in this class, sufficient bandwidth should be allocated so that delays and temporal deviations are bounded. Since message exchange in RC traffic is not subject to the synchronization between nodes, several nodes may send their RC messages simultaneously. To bound the delays resulting from this scenario, the transmission rate of RC messages must be known, so that the upper bound of the transmission latency can be calculated off-line. When neither TT nor RC messages reserve the bandwidth, Best-Effort (BE) messages can take their way through the TTEthernet protocol.

===== Selection of TTEthernet artifacts =====

To set up a new TTEthernet network, two types of artifacts are needed: switches and end systems. A range of FPGA-based switches developed by TTTech Computertechnik AG is available; these FPGA-based TTEthernet switches differ in the communication speed they support. A 100 Mbit/s FPGA-based switch has been selected for the development model. Regarding end systems, TTTech provides two types:

# FPGA-based TTEthernet end systems, which are characterized by high speed and high capabilities regarding real-time communication and fault tolerance.
# Software-based TTEthernet end systems, which use a software stack called the TTE Protocol Layer. The software-based end system is based on COTS hardware and is suitable for a broad range of Ethernet applications, such as real-time control, data acquisition or multimedia applications. The software stack is also supported to work under an operating system.

To facilitate the development work, and to set up both the uAAL platform and the network configuration under a single operating system, the software-based TTEthernet end system has been selected.

==== TTE Protocol Layer ====

Any hardware platform that has a timer interrupt mechanism and an Ethernet controller can host the TTE Protocol Layer, which in turn can run on different operating systems. The protocol layer uses the dedicated hardware platform and the operating system to execute the TTEthernet protocol, which involves:

*Transmission and reception of synchronization frames.
*Transmission of Time-Triggered and Best-Effort messages.
*Time-triggered reception of both TT and BE messages.
*Time-triggered execution of application tasks.

===== Construction of TTE Protocol Layer =====

The TTE Protocol Layer consists of three basic elements, as shown in the next figure:

*The TTEthernet core, which plays the coordinator's role between the hardware drivers (Network Interface Card (NIC) and timer) on one side and the application-specific configuration files on the other, to handle the execution of the TTE protocol.
*The NIC driver, which provides low-level access to a network card. Depending on the application-specific configuration requirements, the NIC driver adapts the network card to enable the communication nodes to communicate with each other using the TTEthernet protocol.
*The timer driver, which provides a free-running timer with a programmable interrupt to the TTEthernet core. It takes its orders from the timed actions predefined in each end system's configuration file. The configuration files contain timed actions related to communication, such as clock synchronization actions and the sending and receiving of TT and BE messages.

[[https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt2.png| Simple TTEthernet End-System construction|300px|center]]

The TTEthernet core, together with the hardware drivers, an application-specific configuration and optionally a set of application tasks, is compiled into a single kernel module (.ko file). The resulting module provides a suitable interface to user-space applications through the network devices, and another interface, using the TTEthernet API, for tasks implemented in kernel space.

=== Implementation ===

==== Initial implementation from selected input projects ====

There was no initial implementation from the input projects; the need for such an extension was driven by the lack of reliability and fault tolerance support in those projects.

==== Implementation Model ====

As depicted in the following figure, the model used in this work consists of 5 communication nodes, a TTEthernet switch, an Ethernet switch and a server PC. The server is used for generating the configuration files for the network and transferring these files to the corresponding communication nodes through the Ethernet network, using the Ethernet switch. Each communication node has been configured with RT-Linux as a real-time operating system. On top of the operating system, two basic elements have been installed:

# The universAAL platform.
# The TTE Protocol Layer.

Since the schedule of time-triggered events must be fixed during operation, each node needs a static configuration. In our case each node has been configured to send one TT message and to receive 4 TT messages (one from each other node) during one cluster cycle, in addition to the synchronization messages and the BE messages. The next figure shows the timeline of node 1, which sends one and receives 4 TT messages.

[[https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt3.png| Time schedule for node 1 through one cluster cycle |500px|center]]

===== Compilation of the used Configuration =====

The elements of the TTE Protocol Layer are then compiled with the static configuration file of each node to form, in the end, a single kernel module. As shown in the next figure, a typical node has two NICs (eth0, eth1); the eth0 channel has been chosen for exchanging messages over the TTEthernet network. Before inserting the new kernel module into the Linux kernel, eth0 must be disabled. By inserting the new kernel module, the physical network interface (eth0) is replaced by six logical network interfaces: five of them are dedicated to exchanging TT messages, while the last one is used for exchanging messages as BE traffic.

One point should be mentioned: each configured TT message must have a unique name, called the Virtual Link Identification (VLID). For example, the TT message of node 1 has been named 101, that of node 2, 102, and so on. It can be noticed in the previously shown figure that the names of the logical network interfaces are derived from the corresponding VLID.

==== Putting into practice ====

The P2PConnector is an essential part of the universAAL platform: it provides seamless connectivity between different middleware instances in such a way that each peer can dynamically discover the services offered by other remote peers. The P2PConnector is implemented under the ACL layer, which is responsible for creating the peering functionality among the distributed nodes. Two discovery protocols are already implemented under the ACL, i.e. two types of P2PConnector are available in the universAAL platform:

*UPnP P2PConnector
*R-OSGi P2PConnector

Both of these technologies have discovery capabilities, i.e. the discovery protocol is already provided by the technology itself. Therefore, in order to create a new P2PConnector that uses the TTEthernet service, it would be necessary to provide the TTEthernet service with a discovery protocol (e.g. SSDP as in UPnP).

Because of this difficulty, it was decided to implement the TTEthernet patch under the UPnP P2PConnector; in other words, all the discovery functions are left to the UPnP connector, while the exchanged messages are sent using the TTEthernet patch. The figure shows the available P2PConnectors of the middleware and how the TTE patch is connected to the UPnP connector.

[[https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt5.png| Inserting TTE Patch within uAAL middleware |400px|center]]

The UPnP-Connector and the other P2PConnectors in the middleware can access the SodaPop layer and create a new local instance through two interface declarations: PeerDiscoveryListener and SodaPopPeer.

The PeerDiscoveryListener interface has two methods:

*noticeNewPeer
*noticeLostPeer

The UPnP-Connector invokes these methods to notify all locally registered PeerDiscoveryListeners about the existence of a new peer or the loss of an existing peer.

The SodaPopPeer interface has the following methods:

*joinBus
*leaveBus
*noticePeerBuses
*replyPeerBuses
*processBusMessage

The first four methods are used by the local UPnP connector to create the SodaPopPeer local instance. This local instance is also used by the last method (processBusMessage) to transfer a message from the remote SodaPop peer to the local peer. As a first step in developing the TTE patch, only the processBusMessage method has been selected to be invoked over the TTE patch, while the first four methods are left to the UPnP-Connector.

In order to understand how processBusMessage will be invoked over the TTE patch, it is important to understand how the other methods are invoked over the UPnP-Connectors. Suppose two uAAL nodes have each created a UPnP-Connector. Each connector will discover its partner and create a proxy object for the remote instance. If node 1 wants to invoke a method of the remote peer, the steps shown in the next figure are followed:

[[https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt6.png| Method invocation using UPnP-Connectors |500px|center]]

#The SodaPop instance of node 1 calls the required method on Proxy 2.
#Proxy 2 forwards the message in serialized form to the UPnP-Connector of node 1.
#The message is then transferred from the UPnP-Connector of node 1 to its partner in node 2.
#The UPnP-Connector in turn de-serializes the message, derives the intended method with its parameters, and then calls the same method, with those parameters, on the SodaPop instance that is locally registered with it.
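
The proxy round trip (steps 2-4) can be reduced to the following sketch. The pipe-separated wire format, the class names and the one-method dispatch are illustrative assumptions; the actual UPnP serialization is different.

```java
/**
 * Sketch of the proxy round trip: the proxy serializes the call, the remote
 * connector de-serializes it and drives the same method on its locally
 * registered SodaPop instance. Wire format is illustrative only.
 */
public class ProxySketch {
    public interface SodaPopPeer { void processBusMessage(String busName, String msg); }

    /** Proxy side (step 2): turn the method call into a wire string. */
    public static String serialize(String busName, String msg) {
        return "processBusMessage|" + busName + "|" + msg;
    }

    /** Remote connector side (step 4): decode and invoke the local peer. */
    public static void dispatch(String wire, SodaPopPeer localPeer) {
        String[] p = wire.split("\\|", 3);       // [method, busName, msg]
        if ("processBusMessage".equals(p[0])) {
            localPeer.processBusMessage(p[1], p[2]);
        }
    }
}
```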

In the same way, step 2 is changed by forwarding the message of the processBusMessage() method to the TTE patch instead of the UPnP-Connector. To achieve this, several points should be taken into consideration:

#Since a TT message is broadcast on the network, the destination ID should be included within the message.
#The TTE network does not recognise the SodaPopPeer ID, i.e. only the Virtual Link Identification (VLID) is recognised.
#An updated image of the SodaPop instance must be kept, enabling a newly received message to access the SodaPop layer.
#An algorithm must be created to send the messages on the TTE network, making the required changes to the message arguments.
#An algorithm must be built to receive the TT messages, doing all the required processing in order to submit the messages in the same way as the UPnP connector does.

===== Coupled ID Protocol =====

To manage the first two aspects, a new class named TTEMsgHandling has been created under the upnp.importer package to execute a new protocol (let us call it the Coupled ID protocol), which works as follows:

When the UPnP bundle starts, the Activator class registers this new connector within the OSGi service registry and opens a listener for the other UPnP-Connectors. Before the registration function, a new job has been added to the Activator class. This job consists of sending a new TT message from the hosting node to all other nodes in the network, if any exist. The new TT message consists of two strings separated by a special sign “|”: the first part is the VL-ID of the transmitting node, while the last part is the middleware instance ID of the same node. This message is saved as a static string in the TTEMsgHandling class, then sent by calling TTEMsgHandling.sendIdMsg(String msgMod).

In the sendIdMsg(String msgMod) method, another string part is added so that the message has the following form:

 msgMod|VL-ID|middleware_instance_ID

This type of message can be sent in two different modes, nul/one, depending on whether the message is transmitted as a request or as a response:

*If sendIdMsg() has been invoked by the UPnP Activator class, the new node sends the message with mode nul, i.e. it sends a request to all activated nodes saying: this is my coupled ID, please send me back your coupled ID.
*If at least one node receives the previous message, it recognises from the mode that the message is a request for its coupled ID. The receiver node splits off the mode part and saves the remote coupled ID in a string matrix called remotePeersId, defined in the TTEMsgHandling class. After splitting off the mode part and saving the remote coupled ID, the mode is tested; in this case the carried mode is nul, so the message is interpreted as a request, and the host node sends back its coupled ID with mode one as a response, to be saved by the remote partner.

Consequently, the two nodes introduce themselves to each other and can then exchange the messages carried on the buses of their middleware instances.
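
The request/response exchange of the Coupled ID protocol can be sketched as follows. The class is a simplified stand-in for TTEMsgHandling (a plain map replaces the string matrix, and the method names are illustrative); the message format and the nul/one modes follow the description above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the Coupled ID protocol handshake (simplified stand-in for
 * TTEMsgHandling; remotePeersId is modelled as a map, not a string matrix).
 */
public class CoupledIdSketch {
    final Map<String, String> remotePeersId = new HashMap<>(); // VL-ID -> middleware ID
    final String localVlId, localMwId;

    public CoupledIdSketch(String vlId, String mwId) { localVlId = vlId; localMwId = mwId; }

    /** Builds "msgMod|VL-ID|middleware_instance_ID" ("nul" = request, "one" = response). */
    public String buildIdMsg(String msgMod) {
        return msgMod + "|" + localVlId + "|" + localMwId;
    }

    /** Handles a received coupled-ID message; returns the response to send, or null. */
    public String onIdMsg(String msg) {
        String[] p = msg.split("\\|");           // [mode, remote VL-ID, remote MW-ID]
        remotePeersId.put(p[1], p[2]);           // save the remote coupled ID
        if ("nul".equals(p[0])) {                // a request: answer with our own ID
            return buildIdMsg("one");
        }
        return null;                             // a response: nothing more to send
    }
}
```

After one request/response round trip, both sides hold each other's coupled ID in <code>remotePeersId</code>.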

===== Saving an updated Image of middleware local instance =====

In order to receive a message in the same way as the processBusMessageAction() class does, an identical class named TTEMessageAction() has been created. This class can indeed receive an identical message with the same input arguments, but the question is how this class can access the SodaPop layer: a SodaPopPeer local instance is needed.

Based on the assumption that each OSGi framework hosts only one SodaPop instance, one instance could be saved somewhere and reused by this class whenever a new message is received. However, the SodaPopPeer instance is not a static object, i.e. it may change dynamically depending on the whole uSpace. For example, at a certain instant a communication node may be added or removed; also, within one node, one or more uAAL-aware components may join or leave a certain bus. Because of that, the image of the local instance should be re-saved dynamically.

Three classes from the UPnP package are used to keep one updated image of the local instance:

*NoticePeerBusesAction()
*JoinBusAction()
*leaveBusAction()

A new instance of [https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt7.png TTEMessageAction()], with a middleware local instance as an input argument, is created whenever one of the above classes is invoked by the UPnP connector. The linked class diagram describes this operation.
===== TTE Transmitting Algorithm =====
|
|
|
|
|
When a specific middleware instance has a message in one of its buses, and want to send this message to the identical bus of a remote instance, than it calls [https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt8.png processBusMessage()] function from sodaPopProxy class of the related remote instance. Exactly at processBusMessage() method will be crossroads, i.e. the message tried to be transmitted through TTEthernet network interface by calling TTEMsgHandling.main(String [] sentMsg), if the sending process is done successfully, the calling of method above return 1, otherwise it returns 0. Then, the returned value is tested, if the returned value is 1 then the message will not be transmitted another time by UPnPConnector, otherwise the sending through TTEthernet service is failed and the message should be transmitted by UPnPConnector.
|
|
|
|
|
|
|
|
|
|
In addition to the original input arguments (busName, msg), two other arguments will be added by main (String [] sentMsg) function, first of all the address (VLID) of the destination node, and the mode of the message. All of the input arguments have been concatenated together in one string to be as shown below:
|
|
|
|
|
msgMod|TTEId|busName|msg
|
|
|
|
|
Since the message modes “nul” and “one” have been reserved for exchange messages within Coupled ID-protocol, the mode of this type of messages is “two”.
|
|
|
|
|
Since the source code of universAAL platform has been done in Java and TTE protocol is done in C language, Java Native Interface (JNI) has been used in order to exchanging data between the native method which is written in C and the Java code. Because of that, the processed message will be delivered though JNI to a native method which is responsible of broadcasting the message on the TTE network. The first diagram in the next figure describes all the processes that happen to the message within main () function until the message is delivered to the native method.
|
|
|
|
|
The native method algorithm, as shown in next figure (second diagram), will receive the input argument of type jstring from Java class. This type cannot be recognised by C, in other word it should be converted to a recognisable form. Ethernet header will be added to the converted message to distinguish it as TT message not as BE message. In total the message size is 1514 bytes, which is the maximum message size can be transmitted on TTE protocol layer , it is also identical to the message size as set in the configuration. When the message size from java class is more than 1500 byte, then the message is transmitted in two bunches or more. After preparing the message, a ROW-SOCKET will be opened to send finally the message on TTE network. The implemented [https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt9.png TT-Messages transmutation native method] is a java method responsible for transmitting TT-messages to the native JNI class.
===== TTEListener package =====

A separate Maven Java project named TTEListener has been created to contain all the classes assigned to the listening job on the TTEthernet channels. The project has been built as a universAAL application within Eclipse. Creating a universAAL application means creating a new OSGi bundle, so an Activator class is generated automatically. In addition to the Activator class, the package includes the TTEMsgListening and TTEMsgFetching Java classes.

Two native methods are invoked from these classes. The first, invoked from TTEMsgListening, initiates the listening process on the TTEthernet channels and saves each received message in a FIFO queue; the second, invoked by TTEMsgFetching, fetches the saved messages from the queue. After a message has been pulled, it is returned to the TTEMsgFetching class for further processing. The flow diagram describes how these classes interact in the [https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt10.png TTEListener package].

The main job of the Activator class in this design is to trigger the listening processes in both TTEMsgListening and TTEMsgFetching without waiting for the return value of either process. When TTEMsgListening is initiated, an individual thread is created to invoke nativeTTEMsgListening, which in turn opens a socket for each TT-channel and listens on it. Additionally, TTEMsgListening creates a new instance of TTEMsgFetching, which also creates a separate thread for the fetching job. This thread enters an infinite loop: in each iteration it invokes its native method (nativeTTEMsgFetching()) and waits to fetch a new message; after a message is obtained from the native method, it is submitted to another class for further processing while the loop invokes the native method again in the next iteration.

Both native methods (nativeTTEMsgListening and nativeTTEMsgFetching) are contained in a single dynamic library named libJniListener.so.
''nativeTTEMsgListening''

This native method is invoked by the TTEMsgListening class. Neither input nor output arguments are needed, since the sole purpose of invoking it is to create a RAW socket for each TT-channel and listen on that socket. In our cluster model use case, each node opens four listening channels, one for each remote node. Thus, [https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt11.png nativeTTEMsgListening] triggers four threads and returns nothing. All of these threads perform the same tasks; the flow chart in the next figure shows the main tasks of a listening thread.

The listening process begins by opening a RAW socket and listening on it. The same data structure used on the sending side is used on the listening side. When a message is received, several filtering steps are applied to it. The first step checks the size of the received message: if it does not equal the size of the sent message, an error message is printed and the algorithm goes back to receive another message. If the size matches, the message is considered correct. The next step checks whether the message is complete: the marker “||” is appended to the end of each transmitted message; if this marker is not found at the end of the received message, the received part of the message is concatenated into a string buffer, otherwise the message is complete.
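The completeness check with the “||” marker can be sketched as follows (an illustrative Java rendering of logic the text describes in C; names are assumptions):

```java
// Illustrative sketch of reassembling a message from received parts using
// the "||" end marker described above.
public class TTEReassembler {
    private final StringBuilder buffer = new StringBuilder();

    // Returns the complete message (without the marker) once the end
    // marker has been seen, or null while parts are still missing.
    String accept(String part) {
        buffer.append(part);
        int end = buffer.indexOf("||");
        if (end < 0) {
            return null; // incomplete: keep concatenating parts
        }
        String complete = buffer.substring(0, end);
        buffer.setLength(0); // reset for the next message
        return complete;
    }
}
```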

Once the received message is complete, its mode must be detected. As mentioned before, three modes are used for sending messages on a TT-channel: modes “nul” and “one” are used for messages that carry ID information, while mode “two” is used for exchanging messages on the buses. The first two message types are transmitted without a destination address, so there is no need to check which node a message was sent to. Since all nodes receive every TT-message transmitted by any node (as set in the cluster model configuration), a message with mode “two” must be classified according to the destination address it carries: if that address matches the address of the TT listening channel, the message is processed further, otherwise it is ignored.
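The mode and destination check can be sketched like this (method and parameter names are illustrative):

```java
// Illustrative sketch: decide whether a received message of the form
// msgMod|TTEId|busName|msg should be processed by this node.
public class TTEFilter {
    static boolean shouldProcess(String message, String localTteId) {
        String[] parts = message.split("\\|", 4);
        String mode = parts[0];
        // ID messages ("nul", "one") carry no destination address and
        // are always processed.
        if (mode.equals("nul") || mode.equals("one")) {
            return true;
        }
        // Mode "two": process only if the destination matches this node.
        return mode.equals("two") && parts.length > 1 && parts[1].equals(localTteId);
    }
}
```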

Finally, the message takes its final form and is saved in a circular FIFO queue, as shown in the next figure. The queue is declared as a global variable so that it can also be accessed by the fetching native method.
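The circular FIFO queue can be sketched as follows (a Java rendering of the C data structure; the capacity and names are assumptions):

```java
// Illustrative sketch of the circular FIFO queue shared between the
// listening side (offer) and the fetching side (poll).
public class RingQueue {
    private final String[] slots;
    private int head = 0, size = 0;

    RingQueue(int capacity) {
        slots = new String[capacity];
    }

    synchronized boolean offer(String msg) {
        if (size == slots.length) {
            return false; // queue full: drop the message
        }
        slots[(head + size) % slots.length] = msg;
        size++;
        return true;
    }

    synchronized String poll() {
        if (size == 0) {
            return null; // queue empty
        }
        String msg = slots[head];
        head = (head + 1) % slots.length;
        size--;
        return msg;
    }
}
```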
''nativeTTEMsgFetching''

This native method is invoked in an individual thread from the TTEMsgFetching Java class. Its main task is to fetch the messages already received through the TT-channels. The fetching algorithm ([https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt13.png nativeTTEMsgFetching]) is clarified in this flow diagram.

The process begins with an infinite loop. At the top of the loop, the queue is checked for whether it is empty or not. If the queue is empty, the method waits for a certain time and attempts a new pull in the next iteration of the loop. The algorithm exits the loop when a pull succeeds.
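The polling loop can be sketched as follows (the queue interface and wait time are illustrative assumptions):

```java
// Illustrative sketch of the fetching loop: poll the shared queue, wait
// briefly when it is empty, and exit as soon as a pull succeeds.
public class Fetcher {

    // Minimal queue abstraction so the sketch is self-contained.
    interface MessageQueue {
        String poll();
    }

    static String fetch(MessageQueue queue, long waitMillis) throws InterruptedException {
        while (true) {
            String msg = queue.poll();
            if (msg != null) {
                return msg; // a successful pull exits the loop
            }
            Thread.sleep(waitMillis); // queue empty: wait and retry
        }
    }
}
```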

In order to have a holistic view of the whole project, a class diagram covering all classes developed during this thesis, together with their relationships to the other classes of the UPnP connector package, has been created. The class diagram shows the relationships among these classes in a logical sequence describing the three main functions of this development work:

*Exchanging coupled IDs.
*Transmitting TT-messages.
*Receiving TT-messages.

The Activator class of the UPnP connector begins the process by invoking the sendIdMsg() method of the TTEMsgHandling class, which forwards the message to the TTEthernet network by invoking the native method. On the receiving side, the TTEMsgFetching class invokes its native method to receive the three types of messages. Two of them carry ID information and are forwarded to TTEMsgHandling, which stores the information and replies to the ID request in the case of mode “nul”. The third type, which represents a uAAL message, is forwarded to the TTEMsgAction class, where an updated image of the local instance resides; using the local instance, the message can then be forwarded to SodaPop.
[[https://raw.githubusercontent.com/wiki/universAAL/middleware/Tt14.png| Class diagram describing the uAAL-TTEthernet Interface implementation |600px|center]]
==Artefact #5 : Fault Tolerance (Replication) Module ==
=== Blackbox Description ===

Replication or redundancy of components is the creation of replicas of system components with the aim of increasing the reliability of the system. This can be achieved with the fault-tolerance form of N-modular redundancy. In the case of Triple Modular Redundancy (TMR), a faulty component can be out-voted by the two remaining components. Furthermore, the reliability of a system can be kept within the minimum allowed failure rate by arranging and implementing redundancy across the system’s instances and by replicating the hardware; in this way, a failure of a single replica of the system does not impact the reliability of the overall system. Another method of redundancy is Double Modular Redundancy (DMR), where, in the case of node duplication, the second working node covers the failure of the first node. To decide on our redundancy-management approach, a “Root Cause Analysis” has to be conducted to find the reason for each failure occurrence and the use cases that will be covered. This analysis has been done as an extension of the one carried out for the diagnosis framework and the fault hypothesis (see Diagnosis Framework).
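The benefit of TMR can be quantified with the standard reliability formula (a sketch assuming a perfect voter and three identical, independent components, each with reliability R):

<math>R_{TMR} = R^3 + 3R^2(1-R) = 3R^2 - 2R^3</math>

The system works when either all three replicas work or exactly two of the three do; for R > 0.5 this exceeds the single-component reliability R.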
There are two forms of redundancy that prevent performance failures from exceeding acceptable limits:

*Active redundancy: ensures performance by monitoring each component individually and uses this monitoring to drive voting logic. The voting mechanism switches between components and reconfigures them accordingly. Examples of voting in redundancy-management logic are the following:
**Error detection and correction.
**Data radio selection in aircraft.
**Global Positioning System (GPS).

*Passive redundancy: a decline in performance is commonly associated with this form; it maintains the basic functionality by using excess capacity to reduce the impact of component failures.
=== Bundles ===

{| border="1" style="cellspacing=0; bordercolor=gray; align=left; valign=top;"
! align="left" bgcolor="#DDDDDD" colspan="2" | Artifact: '' Fault Tolerance (Replication) Module ''
|-
| GIT Address
| [https://github.com/universAAL/middleware/tree/master/middleware.core/mw.reliability.redunduncy HW Redundancy (TMR)], [https://github.com/universAAL/middleware/tree/master/middleware.core/mw.reliability.EventDuplication Event Duplication]
|-
| Javadoc
|
|-
| Design Diagrams
| [https://raw.githubusercontent.com/wiki/universAAL/middleware/Redundancy.png HW Redundancy (TMR)], [https://raw.githubusercontent.com/wiki/universAAL/middleware/EventDuplication.png Event Duplication]
|-
| Reference Documentation
|
|-
|}
=== Features ===

Provides replication and voting mechanisms for uAAL applications and for messages transmitted in the uSpace, enabling error detection and error masking.

=== Design Decisions ===
For the further enhancement of the fault tolerance of the universAAL communication platform, nodes and messages are replicated for error detection, and voting mechanisms based on Triple Modular Redundancy (TMR) are used for error masking. Redundancy with TMR provides fault tolerance against component failures: it can cover and hide faulty nodes, and it can overcome faults created between Fault Containment Regions, for example a single point of failure, where the failure of one part of the system would otherwise stop the entire system from working. Furthermore, detected errors are also hidden by Event Duplication Redundancy. In this way all of the following cases can be handled: operational faults of the communication system, value faults, and transient and temporal faults, e.g. late timing faults.
=== Implementation ===
==== Initial implementation from selected input projects ====

There was no initial implementation from the input projects.
==== Implementation Plan ====
*'''Redundancy with Event Duplication''': In the case of event duplication, two new interfaces are added to the nodes. The first is responsible for duplicating each event leaving the node: a new event-duplication Publisher, inherited from the initial platform publisher, takes over the event, creates the replicas and sends them to the context bus. The most important part of event-duplication redundancy is the duplication Subscriber implementation: the duplication voter uses the predefined Result class to determine the status of the received events by checking their contents, within a predefined timeout, for transient and operational faults; checks on the temporal behaviour of the received events are also deployed.
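The duplication voter’s check can be sketched as follows (class and method names are illustrative assumptions, not the platform’s actual Result/Subscriber API):

```java
import java.util.List;

// Illustrative sketch of a duplication voter: the replicas of one event
// collected within the timeout window are accepted only if enough copies
// arrived and all of them agree.
public class DuplicationVoter {

    // Returns the agreed event content, or null if a replica is missing
    // (operational fault) or the replicas disagree (value fault).
    static String vote(List<String> replicas, int expectedCopies) {
        if (replicas.size() < expectedCopies) {
            return null; // missing replica within the timeout
        }
        String first = replicas.get(0);
        for (String r : replicas) {
            if (!first.equals(r)) {
                return null; // value fault detected
            }
        }
        return first;
    }
}
```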
[[https://raw.githubusercontent.com/wiki/universAAL/middleware/EventDuplication.png|500px|center]]
*'''Hardware redundancy (Triple Modular Redundancy)''': As described earlier in this section, Triple Modular Redundancy is performed at the hardware level. Three replicas of the same node, running identical functions, run the TMR_Publisher; these nodes send the same copy of each event on the context bus. The TMR voter collects the duplicated messages and makes a decision regarding the accuracy of the events and the desired operation of the redundant nodes. The so-called RedSubscriber in the implemented TMR performs the replica voting. The voter logic is simple: it monitors the replicas to determine how to reconfigure the components’ outputs so that the system continues to operate without violating the operational and functional limits of the overall system. In other words, the voter establishes a majority choice between the available replicas whenever at least two replicas are identical; in case of disagreement it drops the faulty choices, so that a single fault does not interrupt the whole system operation. The TMR also performs a continuous timeout check on the replicas and the voter in order to control temporal violations of the TMR itself, so that it can stop the voter at any time in case of no decision or unexplained delays.
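The majority logic of the voter can be sketched as follows (an illustrative stand-in, not the actual RedSubscriber implementation):

```java
// Illustrative sketch of TMR majority voting: any two identical replica
// values out-vote a single faulty one; with no majority, no decision is made.
public class TmrVoter {

    // Returns the majority value among three replicas, or null when all
    // three disagree (or too few arrived) and the voter must stop.
    static String vote(String a, String b, String c) {
        if (a != null && (a.equals(b) || a.equals(c))) {
            return a;
        }
        if (b != null && b.equals(c)) {
            return b;
        }
        return null; // no majority: drop the faulty results
    }
}
```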
[[https://raw.githubusercontent.com/wiki/universAAL/middleware/Redundancy.png|500px|center]]
== References ==

<references/>