APLAWDW Database Structure

APLAWD contains 151 different utterances, each repeated ten times by each of ten
British RP (Received Pronunciation) speakers. The data was recorded in anechoic
conditions at 20 kHz, filtered with an 8-pole elliptical filter and sampled with
a 12-bit A/D converter. Simultaneous recordings were made of speech and
laryngograph (EGG) signals. The polarity of the laryngograph signal is such that
a positive value denotes glottal closure. Further details are given in aplawd.pdf.

The recordings were made at University College London between June 1987 and February 1988
to support the SPAR (Speech-pattern algorithms and representations) research project
conducted by University College London, Imperial College London, GEC Hirst Research
Centre, Plessey Research and Leeds University (Alvey project MMI-09).

A typical data file has the path name

             aplawdw/w/xx/y/awxxyz.eee

where:

  w/xx   w identifies the type of word or phrase and xx identifies a
         particular example of this type according to the following table:

         w xx    Num
         - ----- ---
         c 01      1 : calibration tone (speakers f-j only)
         d 00-09  10 : digit (with 0 pronounced as "zero")
         l 0a-0z  26 : letter of alphabet
         s 01-05   5 : sentence
         w 01-66  66 : isolated monosyllabic word (see table below for texts)
         x 01-44  44 : isolated multisyllabic word or word-pair

  y      identifies one of ten speakers: a-e are male, f-j are female.
         The speakers' initials are: a=ma, b=mb, c=dh, d=jm, e=jw, f=ea, g=gb, h=hb, i=jh, j=sr

  z      identifies the utterance repetition in the range 0-9

  eee    identifies the type of file according to:
  
         wav  is a single-channel speech recording in Microsoft WAV format. Each
              12-bit sample is left-justified in a 16-bit word, i.e. the raw data
              is multiplied by 16. The file creation date gives the date of recording.
         egg  is a single-channel laryngograph recording in Microsoft WAV format.
         txt  is a single-line ASCII text file of the form "a b text" where text is
              the English text of the utterance and a,b are the starting and ending
              sample numbers.
         phn  is a phonetic transcription in UTF-8 encoded Unicode. Each line has
              the form "a b phoneme" where phoneme is the Unicode phoneme symbol (or
              occasionally phoneme pair) and a,b are the starting and ending sample
              numbers. The first sample in the file is numbered 0 and b in one line
              is normally equal to a in the following line. Phonetic transcriptions
              are currently only available for the five sentence tokens and only for
              speakers a,e,f,g,i,j.
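
The naming scheme and file layouts above can be sketched in Python as follows.
This is a minimal illustration and not part of the database distribution: the
function names token_path, parse_labels and raw_sample are hypothetical, and the
sketch assumes that the fields of the txt and phn files are separated by whitespace.

```python
from pathlib import PurePosixPath

SPEAKERS = "abcdefghij"  # a-e are male, f-j are female


def token_path(kind, example, speaker, rep, ext, root="aplawdw"):
    """Build aplawdw/w/xx/y/awxxyz.eee from its components."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker {speaker!r}")
    if not 0 <= rep <= 9:
        raise ValueError("repetition must be in the range 0-9")
    name = f"a{kind}{example}{speaker}{rep}.{ext}"
    return PurePosixPath(root, kind, example, speaker, name)


def parse_labels(text):
    """Parse 'a b label' lines (txt or phn) into (start, end, label) tuples."""
    labels = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # assumption: whitespace-separated fields
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        label = parts[2].strip() if len(parts) > 2 else ""
        labels.append((int(parts[0]), int(parts[1]), label))
    return labels


def raw_sample(wav_value):
    """Undo the left-justification: 16-bit WAV value -> 12-bit A/D value."""
    return wav_value >> 4  # exact, since stored values are multiples of 16
```

For example, token_path("s", "01", "a", 0, "wav") gives the path
aplawdw/s/01/a/as01a0.wav, the first repetition of sentence 01 by speaker a.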

Missing files

Some files are missing from the current database. The table below lists the 152
tokens in the database (the 151 utterances plus the calibration tone) and shows,
in each case, how many repetitions are available for each of the ten speakers,
a to j. In particular, most of the w and x tokens are missing for speakers
f, g, h and i.
       
token	 a	 b	 c	 d	 e	 f	 g	 h	 i	 j	text
         
ac01--	 0	 0	 0	 0	 0	 1	 1	 1	 1	 1	
ad00--	10	10	10	10	10	10	10	10	10	10	zero
ad01--	10	10	10	10	10	10	10	10	10	10	one
ad02--	10	10	10	10	10	10	10	10	10	10	two
ad03--	10	10	10	 9	10	10	10	10	10	10	three
ad04--	10	10	10	10	10	10	10	10	10	10	four
ad05--	10	10	10	10	10	10	10	10	10	10	five
ad06--	10	10	10	10	10	10	10	10	10	10	six
ad07--	10	10	10	10	10	10	10	10	10	10	seven
ad08--	10	10	10	10	10	10	10	10	10	10	eight
ad09--	10	10	10	10	10	 9	 9	10	10	10	nine
al0a--	10	10	10	10	10	10	10	10	10	10	A
al0b--	10	10	10	10	10	10	10	10	10	10	B
al0c--	10	10	10	10	10	10	10	10	10	10	C
al0d--	10	 9	10	10	10	10	10	10	10	10	D
al0e--	10	10	10	10	10	10	10	10	10	10	E
al0f--	10	10	10	10	10	10	10	10	10	10	F
al0g--	10	10	10	10	10	10	10	10	10	10	G
al0h--	10	10	10	10	10	10	10	10	10	10	H
al0i--	10	10	10	10	10	10	10	10	10	10	I
al0j--	10	10	10	10	10	10	10	10	10	10	J
al0k--	 9	10	10	10	10	10	10	10	10	10	K
al0l--	10	10	10	10	10	10	10	10	10	10	L
al0m--	10	10	10	10	10	10	10	10	10	10	M
al0n--	10	10	10	10	10	10	10	10	10	10	N
al0o--	 9	10	10	10	10	10	10	10	10	 9	O
al0p--	10	10	 9	10	10	10	10	10	10	10	P
al0q--	10	10	10	10	10	10	10	10	10	10	Q
al0r--	10	10	10	10	10	10	10	10	10	10	R
al0s--	10	10	10	10	10	10	10	10	10	10	S
al0t--	10	10	10	10	10	10	10	10	10	10	T
al0u--	 9	10	10	10	10	10	10	 9	10	10	U
al0v--	10	10	10	10	10	10	10	 9	10	10	V
al0w--	10	10	10	10	10	10	10	10	10	10	W
al0x--	10	10	10	10	10	10	10	10	10	10	X
al0y--	10	10	10	10	10	10	10	10	10	10	Y
al0z--	10	10	10	10	10	10	10	10	10	10	Z
as01--	10	10	10	10	10	10	10	10	10	10	George made the girl measure a good blue vase
as02--	10	 9	10	10	10	10	10	10	10	10	Why are you early you owl?
as03--	10	 8	10	10	10	10	10	10	10	10	Cathy hears a voice amongst SPAR's data
as04--	10	10	10	10	10	10	10	10	 9	10	Be sure to fetch a file and send theirs off to Hove
as05--	10	10	10	10	10	10	10	10	 9	10	Six plus three equals nine
aw01--	10	10	10	10	10	 0	 0	 0	 0	10	airs
aw02--	10	10	10	10	10	 0	 0	 0	 0	10	at
aw03--	10	10	10	10	10	 0	 0	 0	 0	10	bathe
aw04--	10	10	10	10	10	 0	 0	 0	 0	10	blue
aw05--	10	10	10	10	10	10	10	10	10	10	Cath
aw06--	10	10	10	10	10	 0	 0	 0	 0	10	check
aw07--	10	10	10	10	10	 0	 0	 0	 0	 9	chow
aw08--	10	10	10	10	10	 0	 0	 0	 0	10	desk
aw09--	10	10	10	10	10	 0	 0	 0	 0	10	down
aw10--	10	10	10	10	10	 0	 0	 0	 0	10	ears
aw11--	10	10	10	10	10	 0	 0	 0	 0	10	ebb
aw12--	10	10	10	10	10	 0	 0	 0	 0	10	end
aw13--	10	10	10	10	10	 0	 0	 0	 0	10	false
aw14--	10	10	10	10	10	 0	 0	 0	 0	10	fetch
aw15--	10	10	10	10	10	 0	 0	 0	 0	10	file
aw16--	10	10	10	10	10	 0	 0	 0	 0	10	from
aw17--	10	10	10	10	10	10	10	10	10	10	George
aw18--	10	10	10	10	10	 0	 0	 0	 0	10	girl
aw19--	10	10	10	10	10	 0	 0	 0	 0	10	good
aw20--	10	 9	10	10	10	10	10	10	10	10	Hove
aw21--	10	10	10	10	10	 0	 0	 0	 0	10	hears
aw22--	10	10	10	10	10	 0	 0	 0	 0	10	irk
aw23--	10	10	10	10	10	 0	 0	 0	 0	10	itch
aw24--	10	10	10	10	10	 0	 0	 0	 0	10	left
aw25--	10	10	10	 9	10	 0	 0	 0	 0	10	load
aw26--	10	10	10	10	10	 0	 0	 0	 0	10	made
aw27--	10	10	10	10	10	 0	 0	 0	 0	10	moors
aw28--	10	10	10	10	10	 0	 0	 0	 0	10	move
aw29--	10	10	10	10	10	 0	 0	 0	 0	10	no
aw30--	10	10	10	10	10	 0	 0	 0	 0	10	nought
aw31--	10	10	10	10	10	 0	 0	 0	 0	10	of
aw32--	10	10	10	10	10	 0	 0	 0	 0	10	off
aw33--	10	10	10	10	10	 0	 0	 0	 0	10	oil
aw34--	10	10	10	10	10	 0	 0	 0	 0	10	on
aw35--	10	10	10	10	10	 0	 0	 0	 0	10	ooze
aw36--	10	10	10	10	10	 0	 0	 0	 0	10	ought
aw37--	10	10	10	10	10	 0	 0	 0	 0	10	owl
aw38--	10	 8	10	10	10	 0	 0	 0	 0	10	peer
aw39--	10	10	10	10	10	 0	 0	 0	 0	10	plus
aw40--	10	10	10	10	10	 0	 0	 0	 0	10	point
aw41--	 9	 9	10	10	10	 0	 0	 0	 0	10	purr
aw42--	10	10	10	10	10	 0	 0	 0	 0	10	push
aw43--	10	10	10	10	10	 0	 0	 0	 0	10	pig
aw44--	10	10	10	10	10	 0	 0	 0	 0	10	rare
aw45--	10	10	10	10	10	 0	 0	 0	 0	 9	read
aw46--	10	10	10	10	10	 0	 0	 0	 0	10	right
aw47--	10	10	10	10	10	 0	 0	 0	 0	10	rouge
aw48--	10	10	10	10	10	 0	 0	 0	 0	10	run
aw49--	10	10	10	10	10	 0	 0	 0	 0	10	save
aw50--	10	10	10	10	10	 0	 0	 0	 0	10	send
aw51--	10	10	10	10	10	 0	 0	 0	 0	10	sing
aw52--	10	10	10	10	10	10	10	10	10	10	Spars
aw53--	10	10	10	10	10	 0	 0	 0	 0	10	speed
aw54--	 9	10	10	10	10	 0	 0	 0	 0	10	start
aw55--	10	10	10	10	10	 0	 0	 0	 0	10	stop
aw56--	10	10	10	10	10	 0	 0	 0	 0	10	sure
aw57--	10	10	10	10	10	 0	 0	 0	 0	10	ten
aw58--	10	10	10	10	10	 0	 0	 0	 0	10	the
aw59--	10	10	10	10	10	 0	 0	 0	 0	10	theirs
aw60--	10	10	10	10	10	 0	 0	 0	 0	10	times
aw61--	10	10	 9	10	10	 0	 0	 0	 0	10	toy
aw62--	10	10	10	10	10	 0	 0	 0	 0	10	true
aw63--	10	10	10	10	10	 0	 0	 0	 0	10	up
aw64--	10	10	10	10	10	 0	 0	 0	 0	10	vase
aw65--	10	10	10	10	10	 0	 0	 0	 0	10	voice
aw66--	10	10	10	10	10	 0	 0	 0	 0	10	yes
ax01--	10	10	10	10	10	 0	 0	 0	 0	10	amongst
ax02--	10	10	10	10	10	 0	 0	 0	 0	 9	bather
ax03--	10	10	10	10	10	 0	 0	 0	 0	10	calculate
ax04--	10	10	10	10	10	10	10	10	10	10	Cathy
ax05--	10	10	 9	10	10	 0	 0	 0	 0	10	data
ax06--	10	10	10	10	10	 0	 0	 0	 0	10	divided_by
ax07--	10	 9	10	10	10	 0	 0	 0	 0	10	double
ax08--	10	10	10	10	10	 0	 0	 0	 0	10	early
ax09--	10	10	10	10	10	 0	 0	 0	 0	10	equals
ax10--	10	10	10	10	10	 0	 0	 0	 0	10	for_George
ax11--	10	10	10	10	10	 0	 0	 0	 0	10	for_theirs
ax12--	10	10	10	10	10	 0	 0	 0	 0	10	forward
ax13--	10	10	10	10	10	10	10	10	10	10	Georgie
ax14--	10	10	10	10	10	 0	 0	 0	 0	10	happy
ax15--	10	10	10	10	10	 0	 0	 0	 0	10	her_right
ax16--	10	10	10	10	10	 0	 0	 0	 0	10	her_tea
ax17--	 9	10	10	10	10	 0	 0	 0	 0	 9	her_zip
ax18--	10	10	10	10	10	 0	 0	 0	 0	10	hundred
ax19--	10	10	10	10	10	 0	 0	 0	 0	10	her_cat
ax20--	10	 9	10	10	10	 0	 0	 0	 0	10	her_check
ax21--	10	10	10	10	10	 0	 0	 0	 0	10	her_girl
ax22--	10	10	10	10	10	 0	 0	 0	 0	10	her_leg
ax23--	10	10	10	10	10	 0	 0	 0	 0	10	her_mate
ax24--	10	10	10	10	10	 0	 0	 0	 0	10	her_note
ax25--	10	10	10	10	10	 0	 0	 0	 0	10	her_pig
ax26--	10	10	10	10	10	 0	 0	 0	 0	10	letter
ax27--	10	10	10	10	10	 0	 0	 0	 0	10	measure
ax28--	10	10	10	10	10	 0	 0	 0	 0	10	message
ax29--	10	10	10	10	10	 0	 0	 0	 0	10	million
ax30--	10	10	10	 8	10	 0	 0	 0	 0	10	minus
ax31--	10	 9	10	10	10	 0	 0	 0	 0	10	piggy
ax32--	10	 9	10	10	10	 0	 0	 0	 0	10	pushy
ax33--	10	10	10	10	10	 0	 0	 0	 0	10	percent
ax34--	10	10	10	10	10	 0	 0	 0	 0	10	reverse
ax35--	10	10	10	10	10	 0	 0	 0	 0	10	searcher
ax36--	10	10	10	10	10	 0	 0	 0	 0	10	she_hears
ax37--	10	10	10	10	10	 0	 0	 0	 0	10	singer
ax38--	10	10	10	10	10	 0	 0	 0	 0	10	so_sure
ax39--	10	10	10	10	10	 0	 0	 0	 0	10	so_thick
ax40--	10	10	10	10	10	 0	 0	 0	 0	10	thousand
ax41--	10	10	10	10	10	 0	 0	 0	 0	10	toffee
ax42--	10	10	10	10	10	10	10	10	10	10	Tommy
ax43--	10	10	10	10	10	 0	 0	 0	 0	10	we_fetch
ax44--	10	10	10	10	10	10	10	10	10	10	Zhivago
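
The availability table above can be tallied mechanically. The following is a
minimal Python sketch; parse_availability and missing are hypothetical helpers,
and the sketch assumes that each row holds the token name, ten counts for
speakers a to j and an optional text field, all separated by tab characters.

```python
SPEAKERS = "abcdefghij"


def parse_availability(table_text):
    """Map each token name (e.g. 'ad03') to {speaker: repetitions available}."""
    counts = {}
    for line in table_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 11 or not fields[0].endswith("--"):
            continue  # skip the header row and blank lines
        token = fields[0][:-2]  # drop the '--' placeholder for speaker and repetition
        reps = [int(f) for f in fields[1:11]]  # int() ignores the padding spaces
        counts[token] = dict(zip(SPEAKERS, reps))
    return counts


def missing(counts, full=10):
    """List (token, speaker, available) for every entry below `full` repetitions."""
    return [(tok, spk, n)
            for tok, row in counts.items()
            for spk, n in row.items()
            if n < full]
```

Note that the calibration tone ac01 has at most one repetition per speaker, so
it will always be reported by missing() unless it is filtered out or checked
separately with full=1.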


Mike Brookes, August 2015