
Artificial Intelligence in Your Cockpit

1.    Artificial Intelligence in Your Cockpit

Artificial Intelligence is being adopted widely in many areas, including aviation. Artificial intelligence in the cockpit, whether cooperating with a pilot or even taking autonomous decisions, opens new opportunities for functions that increase safety, reduce crew workload and fatigue, and optimize flight trajectories for fuel efficiency in crowded airspace. It also opens completely new business opportunities.

Besides the benefits that AI will bring to aviation, there are also challenges to be addressed. AI based functions need to be evaluated not just for their benefits, but also from the following perspectives:

  • Safety perspective – “How can we know that the decisions of the AI based function are always safe?”
  • Ethical perspective – “How do we know that the AI based function is free of cultural/racial/gender… biases?”
  • Transparency perspective – “Can we determine what the AI function based a particular decision on?”
  • Human perception perspective – “How will pilots accept AI functionality, and what will pilots expect from it?”
  • Certification perspective – “How can we certify complex SW functionality that is machine learning based?”

All these questions must be asked when we are thinking about bringing AI based functionality into the cockpit. Once we know the answers, then the technology revolution in the cockpit may begin.

1.1.     AFI-X Prototype Story

The beginning of the Artificial Instructor story (in 2015) was simple: “I wish I had someone experienced on board with me when I’m flying solo.” In other words, the idea came from the community of pilots with low flight time, either new pilots with a fresh license or pilots returning after a long break from flying.

The second step was also simple: let’s build an application, running in the cockpit, that monitors pilot performance in real time and detects pilot mistakes that may develop into a real problem if not mitigated. Once a mistake is detected, the system recommends a corrective action. The system may observe the pilot’s improvement over time and shift its focus from major errors that may lead to unsafe situations towards smaller errors, which it comments on to improve pilot skills. In other words, the function does what a flight instructor does. And how do we implement such functionality? One of the many definitions of Artificial Intelligence gives us the answer: “Artificial intelligence is the simulation of human intelligence processes by machines, especially computer systems.” (The definition is quoted from an external source.) It means that a simulation of a human instructor’s intelligence is Artificial Intelligence: an Artificial Instructor.

The idea was clear and the desired AI functionality was well defined. However, a framework for developing an AI based, real-time application that assists a pilot during flight was missing at that time, so we had to build the framework from scratch.

Note that today’s situation is better than our starting position in 2015. In 2020, EASA published the EASA AI Roadmap 1.0, and at the end of 2021 it published the First Usable Guidance for Level 1 Machine Learning Applications. Because we asked the right questions back in 2015, the framework we developed during the work on the AFI-X prototype is well aligned with the EASA guidance and the EASA roadmap.

Most importantly, the framework we developed is generic: we are able to tailor it to the development of various AI based applications, and we are also able to evolve it from its current stage towards more autonomous AI/ML frameworks. The figure below is taken from the EASA AI Roadmap 1.0. It is clear that we are just at the beginning of the technology revolution in aerospace.


Figure 1: Roadmap for AI applications published by EASA. Taken from the EASA AI Roadmap 1.0, page 13.

The AFI-X journey was successful: the project moved from the initial idea through the definition of the AI algorithm concept to simulator testing, and finally it was flown in a real airplane for many hours, assisting pilots. This journey helped us to validate the framework and to develop a good prototype of an AI based Level 1 application.


Test in the simulator…


… and real flight test.

The developed framework and processes can be re-applied and customized for the development and certification of various AI based avionic applications, regardless of the selected algorithms and target functionality.

Are you interested in tips and tricks for AI development and testing? Read more in the next article, AI Development – Tips & Tricks.

Or would you like to know more about the AI development framework? Read more in the article Framework for AI Development and Certification.

AI Development - Tips & Tricks

1.    AI Based Application Development – Tips & Tricks

At the beginning of a new AI oriented project, besides defining the required functionality, it is critical to ask the questions listed in the article Artificial Intelligence in Your Cockpit. Finding the right answers is not an easy task, and the right answers may add work to the program. The temptation to ignore at least some of the questions may therefore be high. However, if any question is missed or not addressed in the project plan, later corrections may be expensive or even impossible.

1.1.     Safety First

Obviously, your AI function needs to work. But as you will use it in the cockpit, it must also be safe. So: “How can we know that the decisions of the AI based function are always safe?”

The core is to understand the hazards related to the AI function. Take care to review the defined hazards with the pilots / users of your AI application to correctly assess the impact on the crew.

Another “catch” is situations that were not covered in the training data. If a situation is ‘far away’ from the scenarios covered in the data, the AI may have trouble providing good answers. Map the operational space and compare it with your data coverage. Understand the corner cases well. Check the system response at the operational boundaries and in corner case scenarios with robust testing. Try what happens if the inputs are completely outside the data coverage. For example, if you trained the AI to recognize animal species from a photo, use a photo of an orange to test its response. The right AI answer in this case is: “No animal recognized.”
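The “orange test” can be sketched in a few lines. The example below is only an illustration under stated assumptions: inputs are plain numeric feature vectors, and data coverage is approximated by the per-feature minimum and maximum seen in training. The names `CoverageEnvelope`, `classify_with_rejection`, `toy_model` and the sample values are hypothetical and do not come from the AFI-X project.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Plays the role of "No animal recognized" for inputs outside the data coverage.
REJECTED = "outside data coverage"


@dataclass
class CoverageEnvelope:
    """Per-feature min/max box around the training data (a crude coverage map)."""
    lows: List[float]
    highs: List[float]

    @classmethod
    def from_training_data(cls, samples: Sequence[Sequence[float]],
                           margin: float = 0.0) -> "CoverageEnvelope":
        lows, highs = [], []
        for column in zip(*samples):          # iterate feature by feature
            lo, hi = min(column), max(column)
            pad = margin * (hi - lo)          # optional tolerance around the box
            lows.append(lo - pad)
            highs.append(hi + pad)
        return cls(lows, highs)

    def covers(self, sample: Sequence[float]) -> bool:
        return all(lo <= x <= hi
                   for x, lo, hi in zip(sample, self.lows, self.highs))


def classify_with_rejection(sample: Sequence[float],
                            envelope: CoverageEnvelope,
                            model: Callable[[Sequence[float]], str]) -> str:
    """Ask the model only when the input lies inside the covered region."""
    if not envelope.covers(sample):
        return REJECTED
    return model(sample)


# Hypothetical training inputs: (airspeed in kt, altitude in ft).
training_inputs = [(60.0, 0.0), (75.0, 500.0), (90.0, 1000.0), (110.0, 1500.0)]
envelope = CoverageEnvelope.from_training_data(training_inputs, margin=0.1)


def toy_model(sample: Sequence[float]) -> str:
    # Stand-in for a trained classifier.
    return "climb" if sample[1] > 300.0 else "ground roll"


print(classify_with_rejection((80.0, 700.0), envelope, toy_model))
print(classify_with_rejection((250.0, 9000.0), envelope, toy_model))
```

A min/max box does not reveal gaps inside the covered region, so in practice it would only be a first filter next to a proper mapping of the operational space.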

Prepare a prototype/mockup and analyze user interaction with the AI application. Focus on unexpected interactions (not considered at the beginning) and check whether they introduce new hazards.

Keep in mind that “always safe” means “the probability of a failure with safety impact is low enough”. Depending on the selected AI algorithms, use proper statistical/analytical methods to determine the probability of an incorrect output. Note that this differs from regular safety analysis. Here we are not looking for the failure of some component (e.g., a CPU HW failure) and how it contributes to the analyzed hazard; we analyze the performance of a complex AI algorithm and look at the probability of a “wrong decision taken by AI”.
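As one possible illustration of such a statistical method, the sketch below computes a one-sided Clopper-Pearson upper bound on the probability of an incorrect output from a number of test cases and the number of wrong decisions observed. The article does not name a specific method, so this choice is an assumption. It also assumes that the test cases are independent and representative of the operational space. The function names are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def error_rate_upper_bound(errors: int, trials: int,
                           confidence: float = 0.95) -> float:
    """One-sided Clopper-Pearson upper bound on the true error probability."""
    if trials <= 0 or not 0 <= errors <= trials:
        raise ValueError("need trials > 0 and 0 <= errors <= trials")
    if errors == trials:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    # The CDF decreases as p grows, so bisect for the p where it equals alpha.
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if binom_cdf(errors, trials, mid) > alpha:
            lo = mid      # observing so few errors is still plausible at this p
        else:
            hi = mid
    return hi


# Hypothetical campaign: 1000 independent test cases, no wrong decision seen.
print(error_rate_upper_bound(errors=0, trials=1000))
```

Even a campaign with zero observed errors only bounds the error probability; it does not show that it is zero. The bound shrinks roughly in proportion to the number of independent test cases.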

Keep order in the data: do not lose track of what data were used for what (training, testing, …) and when (initial training phase, informal validation, re-training to improve performance, …). Data management has to be established from the early phase of the program.

And of course, the whole regular safety assessment process of the system that hosts the AI application still applies.
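A minimal sketch of what such bookkeeping could look like: every use of a data set is recorded with its purpose, project phase and date, and the ledger can report data that ended up in both training and testing. The class and field names are hypothetical, and a real program would more likely rely on an established data versioning tool.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import List, Set


@dataclass(frozen=True)
class DatasetUse:
    dataset_id: str   # fingerprint of the data content
    purpose: str      # e.g. "training", "testing"
    phase: str        # e.g. "initial training", "re-training"
    used_on: date


class DataLedger:
    """Records what data were used for what, and when."""

    def __init__(self) -> None:
        self._uses: List[DatasetUse] = []

    @staticmethod
    def fingerprint(content: bytes) -> str:
        # The same bytes always give the same id, whatever the file is called.
        return hashlib.sha256(content).hexdigest()[:12]

    def record(self, content: bytes, purpose: str, phase: str,
               used_on: date) -> DatasetUse:
        use = DatasetUse(self.fingerprint(content), purpose, phase, used_on)
        self._uses.append(use)
        return use

    def purposes_of(self, content: bytes) -> Set[str]:
        dataset_id = self.fingerprint(content)
        return {u.purpose for u in self._uses if u.dataset_id == dataset_id}

    def used_for_both(self, first: str = "training",
                      second: str = "testing") -> Set[str]:
        """Ids of data sets recorded under both purposes (possible leakage)."""
        in_first = {u.dataset_id for u in self._uses if u.purpose == first}
        in_second = {u.dataset_id for u in self._uses if u.purpose == second}
        return in_first & in_second


# Hypothetical usage with two small recordings.
ledger = DataLedger()
flight_a = b"recording of flight A"
flight_b = b"recording of flight B"
ledger.record(flight_a, "training", "initial training", date(2020, 1, 10))
ledger.record(flight_b, "testing", "informal validation", date(2020, 2, 1))
ledger.record(flight_a, "testing", "re-training", date(2020, 3, 5))
print(ledger.used_for_both())
```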

1.2.     Human Centric Design

Your AI function is supposed to assist a pilot during flight, and thus your design has to be oriented towards the pilot’s needs. At least the following areas should be considered in AI design:

  • The user interface design. This may not differ much from designing other avionics functions. But if the provided functionality is novel to the industry, there may be no well-established ways of interacting with the user yet. However, the generic rules apply here: the UI should be intuitive, easy to use, not distracting and, of course, satisfying.
  • Another aspect, more specific to AI, is human perception of the AI. Typically, the first reaction to an AI function is “wow, that’s cool” (if it’s working). Then the user gets used to the function and the ‘wow effect’ is over. Instead of the initial enchantment, sensitivity to errors may show up. For example, if you have a new voice assistant, you may initially be excited about the great functionality. Later, you get used to the fact that it works well most of the time, and you start to pay attention to the assistant’s errors instead. If the error rate is higher than ~5%, the function starts to be perceived as unreliable. Now, if a system for advertisement recommendations makes more than 5% errors, you will probably be surprised by a displayed advertisement now and then, but that’s it. However, if your cockpit assistant makes weird recommendations 5% of the time or more, you stop trusting it, and the remaining 95% of good advice is compromised. It means that deploying an AI based application into the field too early (even if there are no safety impacts) may compromise the function, even if the overall failure rate is relatively low and acceptable from a safety point of view.
  • Your function may pick up cultural/racial/gender/… biases. There are various sources of bias. For example, the Artificial Instructor’s responses to detected piloting problems are based on recommendations from skilled pilot instructors, so the system is influenced by them. In other words, when designing expert systems, the selection of the human experts who help to set the system behavior and responses may lead to bias. Another source of bias may be your training data. For example, if your AI application processes images of pilots’ faces, the selection of images for training may cause bias. If you pick only images of white bald men with glasses for your training set, the AI application may respond incorrectly to someone who does not fit well into your training set. The general advice for minimizing these biases is high variance in the inputs (experts from various countries/cultures equally represented when defining the correct responses of an expert system, …).
  • Unexpected interactions with AI – people may react to the AI function in unexpected ways that the designers did not consider. It is important to prepare a representative mockup/prototype for early evaluation to discover these interactions, assess them and, if needed, adjust the user interface/functionality.
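One simple way to look for the training data bias described in the list above is to compare error rates across groups of users in the evaluation results. The sketch below assumes that each evaluation result carries a group label and a correct/incorrect flag. The group names and numbers are invented for illustration, and `error_rate_by_group` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def error_rate_by_group(
        results: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[int, float]]:
    """Map each group to (number of samples, error rate).

    `results` holds (group label, was the AI response correct?) pairs.
    """
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for group, is_correct in results:
        totals[group] += 1
        if not is_correct:
            errors[group] += 1
    return {group: (totals[group], errors[group] / totals[group])
            for group in totals}


# Invented evaluation results: group_b is rare in the data and served worse.
results = ([("group_a", True)] * 95 + [("group_a", False)] * 5
           + [("group_b", True)] * 8 + [("group_b", False)] * 2)

for group, (count, rate) in sorted(error_rate_by_group(results).items()):
    print(f"{group}: {count} samples, error rate {rate:.0%}")
```

A large gap between groups, or a group with very few samples, is a hint that the input set needs more variety before the numbers can be trusted.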


How to visually indicate the need for quick action?


Maybe this way?


1.3.     The More Data – The Better AI?

Many people think that the more data is used for machine learning, the better the AI will be. In general: yes and no. To train a good AI, we need the right amount of data. In other words, we need good coverage of the space in which the AI will operate. If you have a huge amount of data but most of it covers just a limited subset of the operational space, then your AI may be over-trained for some scenarios and may fail to respond correctly in others. In such a case, the problem may not be solved just by adding more data. You need to add data that covers the gaps in the operational space coverage, and maybe reduce the previous data set to ensure that the various scenarios are balanced correctly in your training set. A good data management process, supported by iterative analysis of the data sets, may help to reduce the risks related to improper usage/balance of data. The right data management may also help you not to ‘over-collect’ data. If your project depends on flight data, data collection may be an expensive process, and collecting unnecessary data is primarily a waste of money and time rather than a real contribution to the quality of the AI function.
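The two steps described here, finding coverage gaps and reducing over-represented scenarios, can be sketched as follows. The scenario names, the 5% threshold and the helper names are assumptions made for the example. Equalizing every scenario to the size of the smallest one is just one simple balancing strategy.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def scenario_shares(labels: Sequence[str]) -> Dict[str, float]:
    """Fraction of the data set that belongs to each scenario."""
    counts = Counter(labels)
    return {scenario: n / len(labels) for scenario, n in counts.items()}


def find_gaps(labels: Sequence[str], expected: Sequence[str],
              min_share: float) -> List[str]:
    """Expected scenarios that are missing or below the minimum share."""
    shares = scenario_shares(labels)
    return [s for s in expected if shares.get(s, 0.0) < min_share]


def downsample_to_balance(samples: Sequence, labels: Sequence[str],
                          seed: int = 0) -> Tuple[list, List[str]]:
    """Randomly keep as many samples per scenario as the rarest one has."""
    by_scenario = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_scenario[label].append(sample)
    target = min(len(group) for group in by_scenario.values())
    rng = random.Random(seed)              # fixed seed keeps runs repeatable
    kept_samples, kept_labels = [], []
    for label, group in by_scenario.items():
        for sample in rng.sample(group, target):
            kept_samples.append(sample)
            kept_labels.append(label)
    return kept_samples, kept_labels


# Hypothetical data set dominated by one scenario.
labels = ["cruise"] * 90 + ["approach"] * 8 + ["landing"] * 2
samples = list(range(len(labels)))
expected = ["cruise", "approach", "landing", "go_around"]

print(find_gaps(labels, expected, min_share=0.05))
balanced_samples, balanced_labels = downsample_to_balance(samples, labels)
print(Counter(balanced_labels))
```

Downsampling only evens out what is already there. A scenario that is missing entirely, like the hypothetical `go_around` above, still has to be collected.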

Another way to reduce the cost of data collection (in the aerospace industry) is the use of simulation. This path may help to speed up data collection and keep the cost reasonable; however, the data must be checked for applicability. For example, during the AFI-1 program we discovered that simulator data can do a great job for the flight phase detection AI function, while for some other modules (e.g. monitoring of approach correctness) the simulator data were not representative enough. Further analysis showed that the reason was minor differences between how pilots fly in the simulator and how they fly in a real airplane. The possible solutions were to build a high fidelity simulator or to collect enough data for approaches.
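One way to check simulator data for applicability is to compare, feature by feature, the distribution of simulator recordings with a smaller set of real flight recordings. The sketch below uses the two-sample Kolmogorov-Smirnov statistic for this. That is a technique chosen for illustration, not the analysis used in the AFI program, and the feature name, sample values and 0.3 threshold are all hypothetical.

```python
from typing import Dict, List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the empirical distribution functions of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    i = j = 0
    largest = 0.0
    while i < len(a_sorted) and j < len(b_sorted):
        x = min(a_sorted[i], b_sorted[j])
        # Advance both samples past every value <= x, then compare the ECDFs.
        while i < len(a_sorted) and a_sorted[i] <= x:
            i += 1
        while j < len(b_sorted) and b_sorted[j] <= x:
            j += 1
        largest = max(largest, abs(i / len(a_sorted) - j / len(b_sorted)))
    return largest


def unrepresentative_features(sim: Dict[str, Sequence[float]],
                              real: Dict[str, Sequence[float]],
                              threshold: float = 0.3) -> List[str]:
    """Features whose simulator distribution is far from the real one."""
    return [name for name in sim
            if name in real and ks_statistic(sim[name], real[name]) > threshold]


# Invented sink rates on approach (ft/min) from simulator and real flights.
sim_data = {"sink_rate": [480, 500, 510, 520, 530, 540, 550, 560]}
real_data = {"sink_rate": [560, 600, 620, 650, 670, 700, 720, 750]}
print(unrepresentative_features(sim_data, real_data))
```

The statistic ranges from 0 (identical samples) to 1 (no overlap). With few real flights the threshold should be chosen with care, because small samples produce large values by chance.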

Another way to optimize data for AI algorithms is the number of inputs into the algorithm. Sometimes designers think that feeding many inputs into the training is a good idea: “Machine learning or the neural network will sort this out.” Well, selecting the right inputs for your application may help you to reduce the computational demand, and it also makes it easier to analyze why the algorithm took a particular decision. The goal should be to minimize the inputs for AI algorithms, but do not remove something critical. Watch the video and see how a small change in the input set may impact the results in a critical phase of the flight. The video (5x faster than real flight time) demonstrates the ability of two neural networks to detect the pattern phase the airplane is currently flying (together with the manual pattern phase tag created by an operator).

Both networks were trained on the same set of data and both have the same architecture. The only difference is that one network uses information about engine RPM while the other does not. You can see that both networks behave in a very similar way most of the time. The only noticeable difference is at the end of the flight. The network without RPM changes state from landing to take off (for about 1-2 seconds of real flight time) and then back to landing (landing is the correct state), while the network with RPM continues to detect landing. The 1-2 second error in pattern phase detection seems like a negligible performance problem (as the whole flight time is 371 seconds); however, the impact is significant. Why? The erroneous detection of take off in the landing phase (even for 1-2 seconds) may generate Artificial Instructor output that tries to correct errors of the take off phase, and such a recommendation is misleading in the landing phase.
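A 2 second error in a 371 second flight is only about 0.5% of the samples, so a sample-level accuracy figure hides it. A check at the level of phase transitions makes it visible. The sketch below scans a sequence of predicted phases for short runs that interrupt an otherwise stable phase. The 1 Hz sampling, the phase names and the function name are assumptions made for the example.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def short_flips(phases: Sequence[str], sample_period_s: float,
                max_duration_s: float) -> List[Tuple[int, str, float]]:
    """Find short runs that interrupt a phase which resumes right afterwards.

    Returns (start index, flipped-to phase, duration in seconds) per run.
    """
    runs = [(label, len(list(group))) for label, group in groupby(phases)]
    flips = []
    start = 0
    for k, (label, length) in enumerate(runs):
        duration = length * sample_period_s
        is_inner = 0 < k < len(runs) - 1
        if (is_inner and duration <= max_duration_s
                and runs[k - 1][0] == runs[k + 1][0]):
            flips.append((start, label, duration))
        start += length
    return flips


# Invented end of a flight, one prediction per second.
predicted = ["final"] * 5 + ["landing"] * 4 + ["take off"] * 2 + ["landing"] * 3

for index, phase, duration in short_flips(predicted, sample_period_s=1.0,
                                          max_duration_s=2.0):
    print(f"sample {index}: flipped to '{phase}' for {duration:.0f} s")
```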

1.4.     Performance Evaluation – Statistical vs. Subjective 

Once you are done with AI training and implementation, it is time to verify the AI functionality. This verification is done at multiple levels and by various means. The very first step is validation of the learning process results. The proper validation method is selected based on the chosen AI algorithm. Let’s take a look at the two neural networks for pattern phase detection (whose ability to detect the pattern phase was compared in the previous part; you remember the video, right?).

As discussed previously, both networks have the same architecture and the same inputs, except that the second network also uses information about engine RPM. And as we have seen in the video, both networks behave almost identically, except in the landing phase. During the landing phase, the network without information about the actual engine setting may confuse landing with take off. Now let’s see whether we can recognize in an early phase of the project that the first network is prone to this problem.

The figure below shows the confusion matrices for the two networks (captured after the initial training phase). Class 1 corresponds to take off and Class 8 corresponds to landing.

If you check the overall performance of both networks after the initial training, you will find that both deliver a similar quality of outputs: the network without the engine RPM input reached 96.5% correct classifications, while the network with the engine RPM input reached 97.2%. This difference seems minor, and the additional cost of making this input available to the AI application may not seem worth the investment.

However, if we check the confusion matrix for the first network carefully, we can see that Class 1 and Class 8 have approximately 10% erroneous detections. After adding the engine RPM information, the error ratio drops to approximately 5%. And if you check the confusion matrix really carefully, you will see that the scenario in which class 1 was classified as class 8 or vice versa is almost completely removed, while the confusion between class 1 and 9 remains almost the same.

And a bonus question: what can you say about the initial data set by looking at the confusion matrix? It can be seen that the number of data points differs significantly between pattern phases, and better balancing of the data set may improve performance as well.
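The gap between overall accuracy and per-class behavior is easy to reproduce with a small example. The three-class matrix below is invented for illustration and is not the data from the figure. It assumes that rows are true classes and columns are predicted classes, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def overall_accuracy(cm: Matrix) -> float:
    """Share of all samples on the diagonal (correct classifications)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm: Matrix) -> List[float]:
    """For each true class, the share of its samples classified correctly."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def mutual_confusion(cm: Matrix, a: int, b: int) -> float:
    """Share of samples of classes a and b that were mistaken for each other."""
    return (cm[a][b] + cm[b][a]) / (sum(cm[a]) + sum(cm[b]))


# Invented matrix; class order: take off, cruise, landing.
cm = [
    [90, 0, 10],    # true take off: 10 samples mistaken for landing
    [0, 800, 0],    # true cruise: always correct, and dominates the data
    [10, 0, 90],    # true landing: 10 samples mistaken for take off
]

print(f"overall accuracy: {overall_accuracy(cm):.1%}")
print("per-class recall:", [f"{r:.0%}" for r in per_class_recall(cm)])
print(f"take off/landing confusion: {mutual_confusion(cm, 0, 2):.0%}")
```

In this example the dominant class lifts the overall figure, while the two rare classes that matter most in the traffic pattern are confused with each other far more often than the overall number suggests.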