Cami Nakagawa was born and raised in Kihei on the island of Maui, where she graduated from Kamehameha Schools Maui in 2023. She is a first-year undergraduate student studying Computer Science and Coastal Marine Sciences at Amherst College in Massachusetts. She rows for the Amherst crew team, plays club water polo, and is a member of the Hawai’i Club and Outing Club. In her free time, Cami loves to study languages, learn new skills, be in the ocean, and spend time with friends and family. She is passionate about learning and is currently looking to find a bridge between her interests to pursue as a career. In the future, she hopes to return home and use her knowledge and skills to help the people and wildlife of Hawai’i.
Home Island: Maui
High School: Kamehameha Schools Maui
Institution when accepted: Amherst College
Project Site: University of Hawaii at Hilo, Hilo, Big Island HI
Mentor: Winston Wu
Project Title: Aiding Hawaiian Language Preservation Through Machine Translation
Project Abstract: Over the past few decades, the communities of Hawaii have worked to revitalize and preserve Olelo Hawaii, a critically endangered language, largely without the use of technology. One promising technology for assisting in language revitalization and preservation is machine translation. However, current Hawaiian-English translation systems have limited accuracy and translation abilities due to the low-resource nature of Hawaiian. Because Hawaiian was initially an oral-only language and has gone through major losses of knowledge, its usage is limited and there is not enough data to train accurate translation systems. This project aims to develop and improve a machine translation system, which will ultimately benefit the local Hawaiian community. There are three main parts to this project: 1) gathering translation data from websites and online databases, 2) evaluating existing commercial machine translation systems, and 3) developing an in-house machine translation system. Using the neural machine translation package OpenNMT, we train machine translation models on tokenized bitext generated from the translation data. We experiment with variations in the models’ structure, hyperparameters, tokenization configurations, and vocabulary size. Additionally, we employ various data augmentation techniques, such as backtranslation and lexicon expansion, to address the lack of usable data. The quality of our translation systems and of commercial systems is measured using established metrics for evaluating machine translation, such as BLEU and chrF. More broadly, creating an accurate translation system will greatly benefit the Hawaiian language community, enabling Hawaiian speakers to access more resources in their native language and providing a useful tool to Hawaiian language learners and researchers.
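To illustrate the kind of character-level scoring that chrF performs, here is a minimal sketch in Python. It is a simplified, chrF-style score and not the project's actual evaluation code: the function names `char_ngrams` and `simple_chrf`, the defaults `max_n=6` and `beta=2.0`, and the sample phrases are assumptions chosen for illustration, and the result may differ from the numbers reported by standard evaluation tools.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of length n, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF-style score in [0, 1] (illustrative only).

    Averages character n-gram precision and recall over orders 1..max_n,
    then combines them with an F-beta score that favors recall.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Hypothetical system output compared against a reference translation.
print(simple_chrf("aloha kakahiaka", "aloha kakahiaka"))  # identical strings
print(simple_chrf("aloha kakahiaka", "aloha ahiahi"))     # partial overlap
```

Scoring at the character level gives partial credit when a translation gets a word only partly right, which is one reason character-based metrics are often reported alongside BLEU.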