NNA Corpus Video 1: Intro to Linguistic Corpus

Author(s): Hannah Brennan, Domi Hannon, Bren Yaghmour, Jaxon Myers

Mentor(s): Vincent Chanethom, Linguistics; Harim Kwon, English; Giulia Masella Soldati, Haley A. Todd, Graduate Assistants

Abstract

Ultrasound is a non-invasive method of looking inside the body, which makes it possible to observe the tongue’s movements during speech. Because we cannot see what is happening inside a person’s mouth when they talk, speech is largely studied by making judgements about what could be happening based on the acoustics and lip movements of the speaker. The Non-Native Articulatory Corpus will give researchers all over the world access to speech ultrasound data, a resource that is expensive and difficult to acquire. By creating an online database of audio and ultrasound recordings of native and non-native speakers of foreign languages, we will be able to compare and contrast the movements of the tongue when speakers attempt the same sounds. These comparisons can be used by linguists, speech therapists, and language learners to analyze and adjust pronunciation. This database provides extensive data on a subject that is understudied due to the prevalence of Anglocentrism in research, and it makes use of the diverse language population in and around George Mason University.

Video Transcript

Linguistics is, simply, the study of language. Our project focuses on a branch of linguistics called phonetics, which studies how speech sounds are produced and interpreted. The goal of this project is to create an online corpus of non-native speakers of foreign languages. A corpus is an online database that houses written and/or spoken material for the purpose of research. We have worked diligently this summer to create a publicly available source of ultrasound and audio data. The Non-Native Articulatory Corpus, or NNA for short, will be used by researchers to study the similarities and differences in the tongue movements of native and non-native speakers of foreign languages.
The best way to show this work is to introduce NNA.GMU.EDU. This is the home screen of the NNA Corpus. The website will likely change a great deal over the next year, but this is the fundamental outline of the purpose the website will serve. We intend for this website to fill a gap in research data while maintaining a functional and pleasing presentation.
There have been many studies focusing on English as a Second Language, so we wanted to use our resources at George Mason to study students of the foreign languages offered here. This is a long-term project in which we will see dozens of speakers from each of the different languages spoken on our campus. Starting with French, we used the ultrasound equipment to record these speakers and start populating the Non-Native Articulatory Corpus. Each speaker reads the same list of French sentences, which includes as many of the different sounds as possible. Our corpus covers non-native speakers at all levels, from beginner to advanced. Our number one goal this summer was to get the website up and running and to begin populating it with our non-native French speakers. You will be able to search the ultrasound data by target language, native language, speaker experience level, age, and more.
The website will be used as a tool by both researchers and learners to study and even attempt to replicate the tongue’s motor movements. As the project grows, so will our staff.
There are many steps before recording the first participant. Each language requires the Primary Investigators to create a complex list of representative stimuli for the language. Perfecting our knowledge of what to do before, during, and after participant recording has been imperative. With the help of the graduate researchers and the professors this summer, we undergraduate researchers have worked to create a set of procedures for conducting the data collection. To help both our future researchers and those who visit our website, we have created manuals that explain how to use the ultrasound and software as well as the procedures for interacting with participants.
Most corpora only have acoustic data because it is simple and economical to produce, acquire, and store online. The NNA Corpus will provide an articulatory resource that is often unavailable to researchers. We’ve laid the groundwork to give the NNA Corpus the start it needs to become the unique and essential resource it can be. This database will grow throughout the years to include all of the languages offered at George Mason.
If you think it would be fun to see your tongue as you speak, keep an eye on our website and social media for when we will start recording participants of your second language. Check out the other two videos that we made this summer for a look into the lab and analysis work that went into the making of this project. Thank you!

For more on this topic see:
NNA Corpus Video 2: Lab and Procedures
NNA Corpus Video 3: Ultrasound Analysis
