The study of languages has long offered profound insights into communication, culture, and history. However, in the past, people seldom associated linguistics with computer science. ‘This landscape has changed with the emergence of generative AI models such as ChatGPT,’ says Yuan Yulin, chair professor and head of the Department of Chinese Language and Literature at the University of Macau (UM), who is also a computational linguist.
Revolutionising Linguistics Research with Large Language Models
‘The extraordinary advances in AI over the past decade have brought ground-breaking developments in areas such as machine translation, text analysis, and speech-to-text technology,’ says Prof Yuan. More recently, these advances have culminated in the advent of conversational AI systems underpinned by large language models (LLMs), such as OpenAI’s GPT-4 and Google’s Bard.
Some linguists believe that humans develop their language abilities and 'Theory of Mind' (the ability to comprehend and infer the mental states of others) through language use and continuous interaction. Notably, LLMs have made significant progress in text comprehension and generation along a similar trajectory. 'Research suggests that after receiving feedback provided by humans in natural language, some of these models begin to exhibit an ability similar to the "Theory of Mind",' says Prof Yuan. As a result, there is a growing effort to integrate traditional linguistics research, neuroimaging studies of the brain, and LLM research. Prof Yuan is optimistic about the progress of this interdisciplinary field, which promises to amplify the language capabilities of AI while deepening our understanding of human language use and cognition.
Cultivating Language Professionals with Emphasis on Both Theory and Practice
As linguistics and computer science are increasingly intertwined, many students see the advantages offered by the Computational Linguistics Specialisation under UM’s Master of Science in Data Science programme. This specialisation, jointly offered by the Institute of Collaborative Innovation and the Faculty of Arts and Humanities, combines theoretical research with practical applications. It equips students with foundational concepts and methodologies in language and linguistics research, as well as skills in collecting and analysing linguistic data. It also covers the application of big data techniques to areas such as machine translation, automated grading, and corpus-based analysis.
Over the past year, Prof Yuan has supervised four master's students, each working on a project related to LLMs. The first project aims to develop a more refined test set for evaluating the semantic understanding and common-sense reasoning capabilities of large language models. The second involves building a text detection system that can distinguish between AI-generated and human-authored content. The third focuses on text analysis, inferring an individual's personality traits from their written text. The fourth explores practical applications, such as crafting more precise prompts to help LLMs adapt writing styles, create tables, and write computer programs more effectively.
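To give a flavour of the kind of work the second project involves, the sketch below shows one common, deliberately simplified approach to flagging machine-generated text: scoring a passage's perplexity under a small open language model such as GPT-2, on the rough assumption that machine-written prose tends to look more 'predictable' to such a model. This is an illustrative assumption only, not a description of the student's actual system; practical detectors combine many more signals.

```python
# Illustrative sketch: perplexity of a text under GPT-2 as a crude
# machine-generated-text signal (lower perplexity = more "predictable").
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return the perplexity of `text` under GPT-2."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing the input ids as labels makes the model return the
        # average cross-entropy loss over the sequence.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

sample = "The study of languages has long offered profound insights."
print(f"Perplexity: {perplexity(sample):.1f}")
```

A real detection system would go well beyond a single threshold on this score, for example by comparing scores across several models and combining them with stylistic features.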
‘An increasing number of professionals across various linguistic disciplines are employing data science technologies for language analysis,’ Prof Yuan remarks. ‘The career prospects for graduates specialising in computational linguistics are broad. They may find opportunities in sectors such as publishing, education, media, corporate communication, translation, or even research in the humanities, social sciences, and information technology.’
Chinese & English Text / Davis Ip, Trainee UM Reporter Ason Lei
Photo / Jack Ho
Source: UMagazine ISSUE 28