
Jerald Cascayan is a recent graduate with a Bachelor of Science in Computer Science from the University of Hawai‘i at Mānoa. He is the founder of PLAYSET, an indie game studio focused on making multiplayer games grounded in the science of play. He previously worked in fast-paced startup environments, gaining hands-on experience in software engineering and entrepreneurship. Most recently, he contributed to programs at organizations such as AI4ALL, assisting in lectures led by AI researchers and serving as a project manager for a range of student-led projects spanning language models and core areas of machine learning, from classical methods to modern approaches.
Home Island: O‘ahu
Institution when accepted: UH Mānoa
Site: University of Hawai‘i at Hilo, Hawai‘i Island
Mentors: Winston Wu
Project title: Developing Encoder-Based Language Models for Future Technology to Support ‘Ōlelo Hawai‘i
Project Abstract:
‘Ōlelo Hawai‘i remains underrepresented in modern language technologies, lacking essential Natural Language Processing (NLP) tools such as spell-checking and semantic search that depend on understanding word meanings and sentence structure. Addressing this gap requires a system capable of generating computationally meaningful representations of ‘Ōlelo Hawai‘i words and sentences that can support core NLP tasks in low-resource settings. In response, we developed a benchmarking system designed to evaluate NLP models trained on ‘Ōlelo Hawai‘i text. This system is grounded in two key datasets: a collection of raw ‘Ōlelo Hawai‘i sentences drawn from textbooks, short stories, and other teaching materials, and a curated corpus of part-of-speech-tagged sentences created through human-in-the-loop annotation. We explored both traditional and modern approaches to word representation, including Word2Vec and transformer-based architectures such as Bidirectional Encoder Representations from Transformers (BERT). All models were trained on the raw corpus and evaluated on a downstream part-of-speech tagging task and a language modeling task. Performance was measured using F1 score and accuracy for part-of-speech tagging, and perplexity for models with language modeling objectives. These benchmarking results enable direct comparison between traditional and transformer-based approaches. By analyzing model behavior on downstream tasks, we identify which architectures show the most promise for advancing ‘Ōlelo Hawai‘i NLP capabilities. Our findings establish a performance baseline and offer guidance for selecting and developing future models that support culturally aligned language technologies, helping to bridge the technological gap and pave the way toward a comprehensive ‘Ōlelo Hawai‘i language model.
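To illustrate the perplexity metric mentioned in the abstract, here is a minimal sketch of how perplexity can be computed for a simple add-one-smoothed bigram language model over a tiny tokenized corpus. This is purely illustrative: the corpus, function names, and bigram model are assumptions for the sketch, not the project's actual Word2Vec or BERT-based models or data.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigram contexts and bigrams over tokenized sentences,
    padding each sentence with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens[:-1])           # contexts (never count </s> as a context)
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams

def perplexity(sentences, unigrams, bigrams, vocab_size):
    """Perplexity = exp(-mean log P(w_i | w_{i-1})), with add-one smoothing."""
    log_prob, n_tokens = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
            log_prob += math.log(p)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

# Toy example (hypothetical tokenized sentences, not project data):
corpus = [["aloha", "kakou"], ["aloha", "mai"]]
uni, bi = train_bigram_lm(corpus)
vocab_size = len({t for s in corpus for t in s}) + 2  # include <s>, </s>
pp = perplexity(corpus, uni, bi, vocab_size)
```

Lower perplexity means the model assigns higher probability to the evaluation text; a transformer with a masked language modeling objective would compute this from its predicted token distributions rather than smoothed counts, but the metric is interpreted the same way.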