Break indices represent a rating for the degree of juncture perceived between each pair of words and between the final word and the silence at the end of the utterance. They are to be marked after all words that have been transcribed in the orthographic tier. All junctures -- including those after fragments and filled pauses -- must be assigned an explicit break index value; there is no default juncture type.

The orthographic tier will be used only for the transcription of orthographic words. In the waves(tm) label file, each word's orthographic form should be marked at the end of the final segment in the word, as determined by the labeller from the waveform or spectrogram record. That is, each orthographic word will be marked at its right `edge'.

  1. Break Index Values

    Values for the break index are chosen from the following set:

      0 -- for cases of clear phonetic marks of clitic groups; e.g. the medial affricate in contractions of `did you' or a flap as in `got it'.

      1 -- most phrase-medial word boundaries.

      2 -- a strong disjuncture marked by a pause or virtual pause, but with no tonal marks; i.e. a well-formed tune continues across the juncture -- OR -- a disjuncture that is weaker than expected at what is tonally a clear intermediate or full intonation phrase boundary.

      3 -- intermediate intonation phrase boundary; i.e. marked by a single phrase tone affecting the region from the last pitch accent to the boundary.

      4 -- full intonation phrase boundary; i.e. marked by a final boundary tone after the last phrase tone.

    For example, a typical fluent utterance of the following sentence:

      Did you want an example?

    might have a `0' between `Did' and `you' indicating palatalization of the /d j/ sequence across the boundary between these words. Similarly, the break index value between `want' and `an' might again be `0' indicating deletion of /t/ and subsequent flapping of /n/. The remaining break index values would probably be `1' between `you' and `want' and between `an' and `example', indicating the presence of a mere word boundary, and `4' at the end of the utterance, indicating the end of a well-formed intonation phrase.
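    The example above can be written down as one break index per word. The following Python sketch is only an illustration: the function name `pair_breaks' and the list representation are assumptions, not part of the labelling conventions. It enforces the rule that every word, including the last one, is followed by an explicit break index.

```python
# Hypothetical illustration: pair each word with the break index that follows it.
# The values are the ones suggested in the text for a typical fluent utterance.
words = ["Did", "you", "want", "an", "example"]
breaks = ["0", "1", "0", "1", "4"]


def pair_breaks(words, breaks):
    """Return (word, break_index) pairs.

    There is no default juncture type, so a missing or extra
    break index is treated as an error rather than filled in.
    """
    if len(words) != len(breaks):
        raise ValueError("every word must be followed by exactly one break index")
    return list(zip(words, breaks))


pairs = pair_breaks(words, breaks)
for word, index in pairs:
    print(f"{word}\t{index}")
```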

    In the waves(tm) break index label file, the number should be associated with a point in time at the end of each word, as indicated in the orthographic tier (Section 2). It should be located exactly at, or slightly to the right of, this word marker, so that break indices can be unambiguously associated with other tiers.
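    The placement rule can be checked mechanically once both tiers have been read into memory. The sketch below assumes each tier is already available as a list of (time in seconds, label) pairs; it does not read waves(tm) label files. The function name `attach_breaks' and the 10 ms tolerance are assumptions chosen for illustration.

```python
def attach_breaks(word_marks, break_marks, tolerance=0.01):
    """Associate each word marker with the break index placed at it.

    word_marks:  list of (time, word) pairs, time = right edge of the word
    break_marks: list of (time, label) pairs from the break index tier
    tolerance:   how far to the right of the word marker a break index
                 may sit (hypothetical value, in seconds)
    """
    result = []
    for w_time, word in word_marks:
        # A break index must be exactly at, or slightly to the right of,
        # the word marker; labels to the left are not accepted.
        matches = [label for b_time, label in break_marks
                   if 0.0 <= b_time - w_time <= tolerance]
        if len(matches) != 1:
            raise ValueError(
                f"expected exactly one break index at {word!r}, found {len(matches)}")
        result.append((word, matches[0]))
    return result


# Made-up times for illustration only.
word_marks = [(0.20, "Did"), (0.31, "you")]
break_marks = [(0.20, "0"), (0.315, "1")]
aligned = attach_breaks(word_marks, break_marks)
```

    Rejecting ambiguous or missing matches, rather than picking the nearest label, keeps the association between tiers unambiguous.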

  2. Uncertainty and Underspecification

    Transcriber uncertainty about break-index strength is to be indicated with a minus (`-') affixed directly to the right of the break index (e.g. `1-' to indicate uncertainty between `0' and `1'; `2-' to indicate uncertainty between `2' and `1'; and so on).

    The full ToBI transcription must include both break index values and tone values. However, to accommodate backward compatibility with previously labelled databases or to allow intermediate stages in the labelling process, a partial ToBI transcription may have only break index values or only tone values assigned. Underspecification of break index values may be indicated by a value of `X' at the word boundary in the break index tier.
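    The uncertainty and underspecification notation can be read as follows. This is a minimal Python sketch, not a reference implementation: the function names are hypothetical, it handles only the forms described in this section, and rejecting `0-' (which has no weaker neighbour) is an assumption of the sketch rather than a stated rule.

```python
VALID_VALUES = {"0", "1", "2", "3", "4"}


def parse_break_label(label):
    """Return (value, uncertain); value is None for the underspecified label 'X'."""
    if label == "X":
        return None, False
    uncertain = label.endswith("-")
    digit = label[:-1] if uncertain else label
    if digit not in VALID_VALUES:
        raise ValueError(f"not a break index label: {label!r}")
    if uncertain and digit == "0":
        # Assumption of this sketch: there is no value below 0 to be unsure about.
        raise ValueError("'0-' is not accepted by this sketch")
    return int(digit), uncertain


def candidate_values(label):
    """Return the break index values the label is compatible with.

    'n-' marks uncertainty between n and the next weaker value n-1;
    'X' leaves the value unspecified, so no candidates are returned.
    """
    value, uncertain = parse_break_label(label)
    if value is None:
        return ()
    return (value - 1, value) if uncertain else (value,)
```

    For instance, `candidate_values("1-")' yields the pair 0 and 1, matching the reading of `1-' given above.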

  3. Disfluencies

    The perception of an audible hesitation (for example, an abrupt cutoff or a prolongation) can be marked by the diacritic `p' immediately to the right of the break index (e.g. `3p'). This diacritic should be applied only to break indices of 1, 2, or 3. We expect that `1p' will be used for abrupt cutoffs, and `2p' and `3p' will be used to indicate prolongation, with `3p' suggesting hesitation after the onset of the tonal marks for an intermediate phrase. (See also Section 5.)
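    The restriction on the `p' diacritic lends itself to a simple check. The Python sketch below is illustrative only: the names `check_hesitation_label' and `EXPECTED_USE' are hypothetical, and the sketch looks only at a bare break index with an optional `p', ignoring any other diacritics.

```python
import re

# A single digit 0-4, optionally followed by the hesitation diacritic 'p'.
_LABEL = re.compile(r"([0-4])(p?)")

# Expected uses as described in the text; these are expectations, not rules.
EXPECTED_USE = {
    "1p": "abrupt cutoff",
    "2p": "prolongation",
    "3p": "prolongation (hesitation after onset of intermediate phrase tonal marks)",
}


def check_hesitation_label(label):
    """Return (value, has_p), rejecting 'p' on break indices other than 1, 2, 3."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a break index label: {label!r}")
    value = int(match.group(1))
    has_p = match.group(2) == "p"
    if has_p and value not in (1, 2, 3):
        raise ValueError(f"'p' may only follow 1, 2, or 3, not {value}")
    return value, has_p
```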