Jason was born and raised in Honolulu, Hawaii. He graduated from McKinley High School in 2014. He is currently a second-year Computer Science major at Cal Poly – San Luis Obispo. His hobbies include hiking, going to the beach, photography, and gaming. His goals are to learn more about machine learning and security.
Home Island: Honolulu, Hawaii
High School: McKinley High School
Institute when accepted: Cal Poly – San Luis Obispo
Project Title: Developing a Real-Time Display for Radio Astronomy
Project Site: Academia Sinica Institute of Astronomy and Astrophysics
Mentors: Geoffrey Bower & Ranjani Srinivasan
The Yuan Tseh Lee Array is a radio interferometer that is being redeveloped to study the evolution of galaxies and galaxy clusters in the early universe. The experiment will try to detect very weak carbon monoxide spectral-line signals from the early universe. Because the data arrive in real time, astronomers need a way to assess data quality and accuracy as observations proceed. The software receives datasets containing 21 baselines of data at a maximum rate of one every 0.226 seconds. The main real-time display code and the miscellaneous software tools are written in Python 3.5 (Anaconda distribution), with the display engine running on the PyQtGraph library. The purpose of this project was to extend and optimize the existing software by rewriting code to be more efficient and by trying alternative solutions. The previous real-time display program had several limitations, notably consuming roughly 80% of the CPU, which caused delays and the eventual failure of the display. In addition to the real-time display, software tools were developed so that astronomers can analyze specific time intervals of the collected data in a real-time setting. Profiling the code located the most CPU-intensive processes, which were then optimized. The most CPU-intensive process was file I/O through the ASCII read function of the Python library pandas. Replacing it with h5py's HDF5 read function and modifying the file retrieval method dropped CPU usage from 80% to 20%. With the implementation of multiprocessing, computation time for the software tools was reduced by over 99%, allowing over 50,000 data points to be processed in less than 10 seconds, as opposed to over 10 minutes previously. The software scales easily, with minimal changes, to accommodate the experiment as it expands in scope. It is also being adjusted to handle the possible 1% data corruption when reading from a data file that is simultaneously being written, but further testing is still necessary.
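The abstract does not show how multiprocessing was applied to the analysis tools. The following is only a minimal sketch of the general idea, assuming the collected data can be split into independent time intervals that are each reduced to a summary value. The function names, the interval size, and the use of a simple mean as the per-interval computation are all hypothetical and are not taken from the project's code.

```python
from multiprocessing import Pool


def interval_mean(samples):
    """Return the mean of one time interval (a stand-in for the real analysis)."""
    return sum(samples) / len(samples)


def split_into_intervals(samples, interval_size):
    """Split a flat list of samples into consecutive fixed-size intervals."""
    return [samples[i:i + interval_size]
            for i in range(0, len(samples), interval_size)]


def summarize_intervals(samples, interval_size, workers=4):
    """Compute one mean per interval, spreading intervals across worker processes."""
    intervals = split_into_intervals(samples, interval_size)
    with Pool(processes=workers) as pool:
        # Pool.map preserves input order, so results line up with the intervals.
        return pool.map(interval_mean, intervals)


if __name__ == "__main__":
    # Hypothetical data: 50,000 synthetic points, not real telescope output.
    points = [float(i % 100) for i in range(50000)]
    means = summarize_intervals(points, interval_size=1000)
    print(len(means), "interval means computed")
```

Because each interval is processed independently, the work divides cleanly across CPU cores. The actual speedup depends on how expensive the per-interval computation is relative to the cost of passing data between processes.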