Ariel Peterson is from Punaluu, Hawaii, on the island of Oahu, and graduated from Kahuku High School. She is currently pursuing a B.S. in Computer Science at the University of California, Los Angeles. Upon graduation in March 2019, she plans to move back to Hawaii to work. Her current interests are in computer security, machine learning, and artificial intelligence. In her free time, she likes to go to the beach, hike, and do yoga.

Home Island: Oahu

Institution when accepted: University of California, Los Angeles

Akamai Project: Using Deep Learning Algorithms to Generate Accurate Training Scenarios

Project Site: Akimeka LLC

Mentor: Rob Nelson

Co-mentors: Des Iorgova, Joey Andrews

Project Abstract:

The Theater Medical Data Store (TMDS) and Medical Situational Awareness in Theater (MSAT) systems store Electronic Medical Records (EMRs), allowing clinicians and caregivers worldwide to view individual records for patients treated in a Department of Defense (DoD) operational environment. TMDS/MSAT developers currently generate synthetic (fictional) EMRs manually, which can be time-consuming and may not accurately represent the real data in the production tier. The TMDS/MSAT team is exploring the use of deep learning algorithms to learn more about patterns within the production tier and to incorporate those patterns into the synthetic health records generated for the development tiers. We researched various deep learning algorithms used within the EMR domain. Because mapping discovered insights, such as patterns in the production tier, back onto the training and testing tiers is complex, we decided to start with a deep generative model that generates synthetic medical records and then to evaluate the validity of the generated data. We show that medGAN, a generative model, can generate synthetic medical records by learning the distribution of the diagnosis codes in the training data. Once the synthetic data is generated, we plan to validate our model using a clustering algorithm to visually compare the patterns in the real and synthetic data, which will help confirm that the model generates data resembling the real data. While these methods are useful for generating and validating data, future work will implement the clustering algorithm and incorporate a broader spectrum of features from a medical record, not only diagnosis codes, when generating synthetic data.
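To make the idea of comparing real and synthetic records concrete, here is a minimal sketch of one simple check. It is not medGAN and not the clustering validation planned above. It assumes each record is a binary vector with one entry per diagnosis code (1 if the code appears in the record), and it compares how often each code occurs in the two data sets. The function names and the toy data are hypothetical and are for illustration only.

```python
from typing import List

Record = List[int]  # assumed encoding: one 0/1 entry per diagnosis code


def code_prevalence(records: List[Record]) -> List[float]:
    """Return, for each diagnosis code, the fraction of records containing it."""
    n_records = len(records)
    n_codes = len(records[0])
    return [sum(r[j] for r in records) / n_records for j in range(n_codes)]


def mean_abs_gap(real: List[Record], synthetic: List[Record]) -> float:
    """Average absolute difference in per-code prevalence between two data sets."""
    p_real = code_prevalence(real)
    p_syn = code_prevalence(synthetic)
    return sum(abs(a - b) for a, b in zip(p_real, p_syn)) / len(p_real)


# Toy, made-up data: 4 records and 3 diagnosis codes in each set.
real = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
synthetic = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 0, 1]]

gap = mean_abs_gap(real, synthetic)
print(f"mean absolute prevalence gap: {gap:.3f}")
```

A gap near zero suggests that the synthetic records reproduce the per-code frequencies of the real data. It says nothing about relationships between codes, which is where a clustering-based comparison would add information.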