How We Won the 2021 AFWERX Datathon Challenge

December 15, 2021

Raft is excited to announce that our team took first place in this year’s AFWERX Datathon, sponsored by the Air Force Chief Data Office. This year’s challenge presented two subproblems: using flight simulation data together with physiological and cognitive sensor data, predict the difficulty rating of a flight simulation scenario and the pilot’s overall performance. The ability to make these predictions will enable the Air Force to train pilots more effectively: by dynamically adjusting the difficulty of training scenarios, it can train pilots at an intensity that optimizes their learning.

Our multifaceted team was composed of both data scientists and software developers. This integration ensured that the final product was a true data science pipeline rather than the typical collection of Jupyter notebooks. To facilitate modularity, we used a Python package named Kedro. The Kedro documentation describes the project as an open-source Python framework for creating reproducible, maintainable, and modular data science code. It borrows concepts from software engineering best practice and applies them to machine-learning code; applied concepts include modularity, separation of concerns, and versioning. It took us a few days to get comfortable with the framework, but once we got the hang of it, we were able to iterate incredibly quickly. Each step of the pipeline (data ingestion, preprocessing, training, validation, and so on) was abstracted as a Kedro node, which wraps a Python function. Swapping between models and adding ETL or preprocessing steps was as simple as adding an entry to the pipeline. For example, our pipeline for learning simulation difficulty using a random forest model is represented in just a few lines of code:

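from kedro.pipeline import Pipeline

# load_data, learn_labels, and write_data are our own project modules;
# each entry below is a Kedro node defined in one of them.
analysis = Pipeline(
    [
        load_data.load_multi_features_data,
        load_data.load_label_data_node,
        learn_labels.multi_learn_difficulty_rf_node,
        write_data.save_rf_model_node,
        learn_labels.print_multi_score_node,
    ]
)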

The above pipeline encapsulates all steps of model training and validation once the features have been constructed. Feature construction is handled by a separate pipeline because it takes several hours to run on a 2020 MacBook Pro. The first step of the feature creation pipeline is consolidating the thousands of CSV files we were given for this challenge. After consolidation, we dropped data with high proportions of missing values and used interpolation to fill missing values in the remainder of the time series. One data series that gave us a great deal of difficulty was the eye tracking data. The eye tracking device measured the angle of gaze as 0 degrees when the subject was staring straight ahead, so the slightest gaze to the left would register as 359.9 degrees. This wraparound discontinuity made the eye tracking time series look like white noise and resulted in low variable significance within our preliminary models.
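The missing-value handling described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 50 percent drop threshold, the evenly spaced samples, the function names, and the sample readings are all hypothetical and are not taken from our actual pipeline.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing samples."""
    return value is None or math.isnan(value)


def missing_fraction(series):
    """Fraction of samples in a series that are missing."""
    return sum(1 for value in series if is_missing(value)) / len(series)


def interpolate_gaps(series):
    """Fill gaps linearly, assuming evenly spaced samples."""
    known = [i for i, value in enumerate(series) if not is_missing(value)]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    # Gaps at either edge have only one neighbour, so copy it.
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]
    # Interior gaps: draw a straight line between the surrounding observations.
    for left, right in zip(known, known[1:]):
        slope = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + slope * (i - left)
    return filled


def clean_channels(channels, max_missing=0.5):
    """Drop channels that are mostly missing and interpolate the rest."""
    return {
        name: interpolate_gaps(values)
        for name, values in channels.items()
        if missing_fraction(values) <= max_missing
    }


# Made-up readings for illustration only, not Datathon data.
channels = {
    "heart_rate": [60.0, None, 64.0, float("nan"), float("nan"), 70.0, 71.0],
    "faulty_sensor": [None, None, None, 1.0],
}
cleaned = clean_channels(channels)
```

Here the mostly empty channel is dropped, while the gaps in the other channel are filled along straight lines between the neighbouring observations.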

We were just about ready to discard the eye tracking data as not useful when an idea came to us. After consulting a human doctor and an ornithologist, we confirmed that, unlike owls, humans cannot rotate their heads 180 degrees. Thus, by artificially rotating our reference point 180 degrees (adding 180 and taking the result modulo 360), we could reorient the measurements provided by the eye tracking device. With 180 degrees placed straight ahead and 0 degrees directly behind the pilot, our time series went from something resembling white noise to a smooth series ranging between about 110 degrees and 250 degrees with minimal discontinuities. This modified time series significantly improved the performance of our models. As an added bonus, the Air Force now has a means of testing for owls among its ranks.
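The reorientation itself is a one-line transformation. The sketch below applies it to a handful of made-up gaze angles (not real eye tracking readings) and compares the largest sample-to-sample jump before and after; the function names are ours for illustration.

```python
def recenter_gaze(angle_degrees):
    """Shift the reference point by 180 degrees so straight ahead maps to 180."""
    return (angle_degrees + 180.0) % 360.0


def largest_jump(series):
    """Largest absolute difference between consecutive samples."""
    return max(abs(b - a) for a, b in zip(series, series[1:]))


# Hypothetical gaze angles hovering around straight ahead (0 degrees).
raw = [0.5, 359.9, 0.2, 359.5, 1.0]
recentered = [recenter_gaze(angle) for angle in raw]

# The raw series jumps by nearly a full circle whenever the gaze crosses
# straight ahead; the recentered series only moves by a degree or two.
print(largest_jump(raw), largest_jump(recentered))
```

The wraparound point now sits directly behind the pilot, where a human gaze never lands, so small eye movements stay small in the data.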

A second aspect of data preprocessing that proved difficult was handling inconsistent sampling frequencies across the various sensor streams. The flight simulator produced a data stream at approximately 4 Hz, while eye tracking and several other physiological sensors produced data at more than 5,000 Hz. This mismatch made it impossible to align the original data streams for any form of multivariate time series modeling. To overcome it, we first attempted to upsample or downsample the data streams to a consistent sampling rate using splines and linear interpolation. This operation was incredibly memory intensive and didn’t provide any apparent benefit in our initial modeling efforts, although with more time we would have liked to continue this line of research. We instead elected to use wavelets to decompose the original time series. By selecting various levels of decomposition, we were able to standardize the dimensionality of the features fairly well.
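To show how the choice of decomposition level can equalize feature dimensions, here is a minimal sketch using the Haar wavelet written in plain Python. The Haar wavelet, the toy stream lengths (chosen so each is the target length times a power of two), and all function names are assumptions for illustration; this post does not specify the wavelet family or feature construction we actually used, and in practice a library such as PyWavelets would do this work.

```python
import math


def haar_step(signal):
    """One Haar level: scaled pairwise sums (approximation) and differences (detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return the final approximation and the detail coefficients of each level."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def levels_for_target(length, target_length):
    """Number of halvings needed to reduce `length` samples to `target_length`."""
    levels = 0
    while length > target_length:
        if length % 2:
            raise ValueError("length must be target_length times a power of two")
        length //= 2
        levels += 1
    if length != target_length:
        raise ValueError("length must be target_length times a power of two")
    return levels


# Toy streams standing in for a slow sensor and a fast one.
slow_stream = [float(i) for i in range(8)]
fast_stream = [math.sin(i / 4.0) for i in range(64)]

target_length = 4
features = {}
for name, stream in (("simulator", slow_stream), ("eye_tracker", fast_stream)):
    levels = levels_for_target(len(stream), target_length)
    approx, _ = haar_decompose(stream, levels)
    features[name] = approx  # both end up with target_length coefficients
```

The slow stream is decomposed once and the fast stream four times, so both yield approximation vectors of the same length that can sit side by side in a feature matrix.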

We tested several supervised learning techniques with varying levels of success. Ultimately, we arrived at an XGBoost model for difficulty classification, which achieved an AUC of just over 80 percent. This algorithm also performed well for predicting cumulative flight error. Testing various combinations of learning algorithms and processing techniques was accomplished by adding or removing steps from the above pipeline and rerunning the script. This modularity made iterating on different algorithms a trivial process.
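The idea of swapping a model by changing one pipeline entry can be illustrated without Kedro at all. The sketch below is a plain-Python stand-in, not Kedro's API and not our Datathon code: the runner, the toy data, and both "models" are hypothetical, and the score is computed on the training data purely to keep the example short.

```python
def run_pipeline(steps, data=None):
    """Pass the output of each step to the next one, in order."""
    for step in steps:
        data = step(data)
    return data


def load_features(_):
    # Hypothetical toy data: one feature per sample, binary difficulty label.
    return {"features": [[0.1], [0.9], [0.4], [0.8]], "labels": [0, 1, 0, 1]}


def train_threshold_model(data):
    # Toy model: split halfway between the two class means.
    means = []
    for label in (0, 1):
        values = [x[0] for x, y in zip(data["features"], data["labels"]) if y == label]
        means.append(sum(values) / len(values))
    threshold = sum(means) / 2
    return {**data, "model": lambda x: int(x[0] > threshold)}


def train_majority_model(data):
    # Baseline: always predict the most common label.
    majority = max(set(data["labels"]), key=data["labels"].count)
    return {**data, "model": lambda x: majority}


def accuracy(data):
    hits = sum(data["model"](x) == y for x, y in zip(data["features"], data["labels"]))
    return hits / len(data["labels"])


# Swapping algorithms means replacing a single entry in the list of steps.
threshold_score = run_pipeline([load_features, train_threshold_model, accuracy])
baseline_score = run_pipeline([load_features, train_majority_model, accuracy])
```

Only the middle entry differs between the two runs, which is the property that made comparing algorithms so quick for us.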

Managing code alongside data and compute resources was another aspect of this project that took us a while to optimize. The traditional data science workflow is often disconnected from the CI/CD pipelines that developers use. Our code was versioned in GitHub, but the data was stored both locally and in a shared network folder. This did work, but it left a lot to be desired. Computing wavelet decompositions across billions of data points took hours even on a VM with 64 cores and 128 GB of RAM. To reduce redundancy, one of us would calculate the wavelet features, push the file up to the cloud, and then notify the rest of the team to pull it down. If a change to the file was made upstream, it became the responsibility of each individual to make sure they pulled the latest copy. This quickly became a royal pain even with a small team; such a workflow would never work with a larger team.
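One lightweight way to take the guesswork out of "do I have the latest copy?" is to publish a checksum alongside each shared file and compare it with the local copy. We did not do this during the Datathon; the sketch below is a hypothetical illustration using only the Python standard library, with a throwaway file standing in for a wavelet feature file.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_stale(local_path, published_digest):
    """True when the local copy is missing or differs from the published one."""
    return not os.path.exists(local_path) or file_digest(local_path) != published_digest


# Demo with a temporary file; the file name and contents are made up.
with tempfile.TemporaryDirectory() as workdir:
    local_copy = os.path.join(workdir, "wavelet_features.csv")
    with open(local_copy, "w") as handle:
        handle.write("feature_a,feature_b\n1.0,2.0\n")

    published = file_digest(local_copy)  # digest the uploader would share
    fresh = not is_stale(local_copy, published)

    # Simulate an upstream change by publishing the digest of new contents.
    upstream = hashlib.sha256(b"feature_a,feature_b\n1.5,2.0\n").hexdigest()
    stale = is_stale(local_copy, upstream)
    missing = is_stale(os.path.join(workdir, "absent.csv"), published)
```

A check like this still depends on every teammate running it, which is part of why a shared data platform is the more durable fix.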

The problems described above are in no way unique to this particular challenge. Analysts and data scientists across both the DoD and the private sector struggle with the task of synthesizing code, infrastructure, and data. Additionally, decision makers who rely on data insights to make informed decisions are often forced to rely solely on intuition because they can’t easily access relevant information. Raft is currently working alongside the Air Force to create a unified data ecosystem for the entirety of the DoD. We call this ecosystem Data Fabric. Data Fabric enables the integration of legacy data silos with applications, edge devices, and various other data actors. It supports data consumers and producers while ensuring the data is securely delivered to the right end user at the speed of relevance. With Raft’s Data Fabric architecture, consumers can see the data available to them based on their roles and attributes. Onboarding to Data Fabric is easy: data stewards connect their data and thereby contribute to a data storefront. Through the data storefront, data consumers of all experience levels can easily access and join disparate data sets spanning various schemas and technologies into a single, unified representation that can be securely shared with relevant parties.

The Data Fabric will enable data scientists, analysts, engineers, and key stakeholders to collaborate within a single ecosystem without being burdened by the logistics of managing data and infrastructure. This platform will enable Raft and its customers to fully leverage disparate data and share insights with the click of a button. By leveraging our expertise in data science and software engineering, and aided by the Data Fabric, Raft will bring even more to the table for next year’s AFWERX Datathon.