Package 'wisclabmisc' reference manual

Title:	Tools to Support the 'WiscLab'
Description:	A collection of 'R' functions for use (and re-use) across 'WiscLab' projects. These are analysis or presentation oriented functions--that is, they are not for data reading or data cleaning.
Authors:	Tristan Mahr [aut, cre]
Maintainer:	Tristan Mahr <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.1
Built:	2025-03-20 19:19:57 UTC
Source:	https://github.com/tjmahr/wisclabmisc

Set default arguments for brms model fitting

Description

Set default arguments for brms model fitting

Usage

brms_args_create(
  ...,
  .backend = "cmdstanr",
  .threads = 2,
  .chains = 4,
  .cores = 4,
  .iter = 2000,
  .silent = 0,
  .file_refit = "on_change",
  .refresh = 25,
  .control = list()
)
brms_args_create(
  ...,
  .backend = "cmdstanr",
  .threads = 2,
  .chains = 4,
  .cores = 4,
  .iter = 2000,
  .silent = 0,
  .file_refit = "on_change",
  .refresh = 25,
  .control = list()
)

Arguments

`...`	arguments to `brms::brm()` to use as default values.
`.backend`, `.threads`, `.chains`, `.cores`, `.iter`, `.silent`, `.file_refit`, `.refresh`, `.control`	These arguments set default default values. Overwrite these defaults by using the argument without the `.` prefix.

Details

One can set the default value for adapt_delta directly (adapt_delta = .98). This value will be propagated to control$adapt_delta.

Value

A function for generating a list of arguments to brms::brm().

Examples

brm_args <- brms_args_create()

# using package-provided defaults
brm_args()

# overwriting a default value
brm_args(iter = 500)

# adapt_delta is handled specially
brm_args(adapt_delta = .95)
# adapt_delta is handled specially
brm_args(adapt_delta = .95)

# We can overwrite the package-provided defaults
other_brm_args <- brms_args_create(iter = 4000, backend = "rstan")
other_brm_args()

# And overwrite them too
other_brm_args(backend = "cmdstanr")
brm_args <- brms_args_create()

# using package-provided defaults
brm_args()

# overwriting a default value
brm_args(iter = 500)

# adapt_delta is handled specially
brm_args(adapt_delta = .95)
# adapt_delta is handled specially
brm_args(adapt_delta = .95)

# We can overwrite the package-provided defaults
other_brm_args <- brms_args_create(iter = 4000, backend = "rstan")
other_brm_args()

# And overwrite them too
other_brm_args(backend = "cmdstanr")

Compute the percentage of points under each centile line

Description

check_model_centiles() computes centiles from a model and computes the calibration of each centile. check_computed_centiles() works on a dataframe with precomputed c[XX] columns of centiles.

Usage

check_model_centiles(
  data,
  model,
  var_x,
  var_y,
  centiles = c(5, 10, 25, 50, 75, 90, 95)
)

check_computed_centiles(data, var_y)
check_model_centiles(
  data,
  model,
  var_x,
  var_y,
  centiles = c(5, 10, 25, 50, 75, 90, 95)
)

check_computed_centiles(data, var_y)

Arguments

`data`	a dataset used to fit a model. If the dataframe is grouped with `dplyr::group_by()`, sample centiles are computed for each group.
`model`	a gamlss model prepared by `mem_gamlss()`
`var_x`, `var_y`	bare column names of the predictor and outcome variables
`centiles`	centiles to use for prediction. Defaults to `c(5, 10, 25, 50, 75, 90, 95)`.

Value

a tibble the number of points and the percentage of points less than or equal to each quantile value.

Compute chronological age in months

Description

Ages are rounded down to the nearest month. A difference of 20 months, 29 days is interpreted as 20 months.

Usage

chrono_age(t1, t2)
chrono_age(t1, t2)

Arguments

t1, t2

dates in "yyyy-mm-dd" format

Value

the chronological ages in months. NA is returned if the age cannot be computed.

Examples

# Two years exactly
chrono_age("2014-01-20", "2012-01-20")
#> 24

# Shift a year
chrono_age("2014-01-20", "2013-01-20")
#> 12
chrono_age("2014-01-20", "2011-01-20")
#> 36

# Shift a month
chrono_age("2014-01-20", "2012-02-20")
#> 23
chrono_age("2014-01-20", "2011-12-20")
#> 25

# 3 months exactly
chrono_age("2014-05-10", "2014-02-10")
#> 3

# Borrow a month when the earlier date has a later day
chrono_age("2014-05-10", "2014-02-11")
#> 2, equal to 2 months, 29 days rounded down to nearest month

# Inverted argument order
chrono_age("2012-01-20", "2014-01-20")
#> 24

# Multiple dates
t1 <- c("2012-01-20", "2014-02-10", "2010-10-10")
t2 <- c("2014-01-20", "2014-05-10", "2014-11-10")
chrono_age(t1, t2)
#> [1] 24  3 49
# Two years exactly
chrono_age("2014-01-20", "2012-01-20")
#> 24

# Shift a year
chrono_age("2014-01-20", "2013-01-20")
#> 12
chrono_age("2014-01-20", "2011-01-20")
#> 36

# Shift a month
chrono_age("2014-01-20", "2012-02-20")
#> 23
chrono_age("2014-01-20", "2011-12-20")
#> 25

# 3 months exactly
chrono_age("2014-05-10", "2014-02-10")
#> 3

# Borrow a month when the earlier date has a later day
chrono_age("2014-05-10", "2014-02-11")
#> 2, equal to 2 months, 29 days rounded down to nearest month

# Inverted argument order
chrono_age("2012-01-20", "2014-01-20")
#> 24

# Multiple dates
t1 <- c("2012-01-20", "2014-02-10", "2010-10-10")
t2 <- c("2014-01-20", "2014-05-10", "2014-11-10")
chrono_age(t1, t2)
#> [1] 24  3 49

Create an ROC curve from observed data

Description

Create an ROC curve from observed data

Usage

compute_empirical_roc(
  data,
  response,
  predictor,
  direction = "auto",
  best_weights = c(1, 0.5),
  ...
)
compute_empirical_roc(
  data,
  response,
  predictor,
  direction = "auto",
  best_weights = c(1, 0.5),
  ...
)

Arguments

`data`	a dataframe containing responses (groupings) and predictor variable
`response`	a bare column name with the group status (control vs. cases)
`predictor`	a bare column name with the predictor to use for classification
`direction`	`direction` to set for the for `pROC::roc()`. Defaults to `"auto"`.
`best_weights`	weights for computing the best ROC curve points. Defaults to `c(1, .5)`, which are the defaults used by `pROC::coords()`.
`...`	additional arguments passed to `pROC::roc()`.

Value

a new dataframe of ROC coordinates is returned with columns for the predictor variable, .sensitivities, .specificities, .auc, .direction, .controls, .cases, .n_controls, .n_cases, .is_best_youden and .is_best_closest_topleft.

Examples

set.seed(100)
x1 <- rnorm(100, 4, 1)
x2 <- rnorm(100, 2, .5)
both <- c(x1, x2)
steps <- seq(min(both), max(both), length.out = 200)
d1 <- dnorm(steps, mean(x1), sd(x1))
d2 <- dnorm(steps, mean(x2), sd(x2))
data <- tibble::tibble(
  y = steps,
  d1 = d1,
  d2 = d2,
  outcome = rbinom(200, 1, prob = 1 - (d1 / (d1 + d2))),
  group = ifelse(outcome, "case", "control")
)

# get an ROC on the fake data
compute_empirical_roc(data, outcome, y)
# this guess the cases and controls from the group name and gets it wrong
compute_empirical_roc(data, group, y)
# better
compute_empirical_roc(data, group, y, levels = c("control", "case"))
set.seed(100)
x1 <- rnorm(100, 4, 1)
x2 <- rnorm(100, 2, .5)
both <- c(x1, x2)
steps <- seq(min(both), max(both), length.out = 200)
d1 <- dnorm(steps, mean(x1), sd(x1))
d2 <- dnorm(steps, mean(x2), sd(x2))
data <- tibble::tibble(
  y = steps,
  d1 = d1,
  d2 = d2,
  outcome = rbinom(200, 1, prob = 1 - (d1 / (d1 + d2))),
  group = ifelse(outcome, "case", "control")
)

# get an ROC on the fake data
compute_empirical_roc(data, outcome, y)
# this guess the cases and controls from the group name and gets it wrong
compute_empirical_roc(data, group, y)
# better
compute_empirical_roc(data, group, y, levels = c("control", "case"))

Compute positive and negative predictive value

Description

Compute positive and negative predictive value

Usage

compute_predictive_value_from_rates(sensitivity, specificity, prevalence)
compute_predictive_value_from_rates(sensitivity, specificity, prevalence)

Arguments

sensitivity, specificity, prevalence

vectors of confusion matrix rates

Details

These vectors passed into this function should be some common length and/or length 1. For example, 4 sensitivities, 4 specificities and 1 incidence will work because the sensitivities and specificities have a common length and we can safely recycle (reuse) the incidence value. But 4 sensitivities, 2 specificities, and 1 incidence will fail because there is not a common length.

Value

a tibble with the columns sensitivity, specificity, prevalence, ppv, npv where ppv and npv are the positive predictive value and the negative predictive value.

Examples

compute_predictive_value_from_rates(
  sensitivity = .9,
  specificity = .8,
  prevalence = .05
)

compute_predictive_value_from_rates(
  sensitivity = .67,
  specificity = .53,
  prevalence = c(.15, .3)
)
compute_predictive_value_from_rates(
  sensitivity = .9,
  specificity = .8,
  prevalence = .05
)

compute_predictive_value_from_rates(
  sensitivity = .67,
  specificity = .53,
  prevalence = c(.15, .3)
)

Create an ROC curve from smoothed densities

Description

Create an ROC curve from smoothed densities

Usage

compute_smooth_density_roc(
  data,
  controls,
  cases,
  along = NULL,
  best_weights = c(1, 0.5),
  direction = "auto",
  ...
)
compute_smooth_density_roc(
  data,
  controls,
  cases,
  along = NULL,
  best_weights = c(1, 0.5),
  direction = "auto",
  ...
)

Arguments

`data`	a dataframe containing densities
`controls`, `cases`	bare column name for the densities of the control group
`along`	optional bare column name for the response values
`best_weights`	weights for computing the best ROC curve points. Defaults to `c(1, .5)`, which are the defaults used by `pROC::coords()`.
`direction`	`direction` to set for the for `pROC::roc()`. Defaults to `"auto"`.
`...`	additional arguments. Not used currently.

Value

the dataframe is updated with new columns for the .sensitivities, .specificities, .auc, .roc_row, .is_best_youden and .is_best_closest_topleft.

Examples

set.seed(100)
x1 <- rnorm(100, 4, 1)
x2 <- rnorm(100, 2, .5)
both <- c(x1, x2)
steps <- seq(min(both), max(both), length.out = 200)
d1 <- dnorm(steps, mean(x1), sd(x1))
d2 <- dnorm(steps, mean(x2), sd(x2))
data <- tibble::tibble(
  y = steps,
  d1 = d1,
  d2 = d2,
  outcome = rbinom(200, 1, prob = 1 - (d1 / (d1 + d2))),
  group = ifelse(outcome, "case", "control")
)
compute_smooth_density_roc(data, d1, d2)
compute_smooth_density_roc(data, d1, d2, along = y)

# terrible ROC because the response is not present (just the densities)
data_shuffled <- data[sample(seq_len(nrow(data))), ]
compute_smooth_density_roc(data_shuffled, d1, d2)

# sorted along response first: correct AUC
compute_smooth_density_roc(data_shuffled, d1, d2, along = y)
set.seed(100)
x1 <- rnorm(100, 4, 1)
x2 <- rnorm(100, 2, .5)
both <- c(x1, x2)
steps <- seq(min(both), max(both), length.out = 200)
d1 <- dnorm(steps, mean(x1), sd(x1))
d2 <- dnorm(steps, mean(x2), sd(x2))
data <- tibble::tibble(
  y = steps,
  d1 = d1,
  d2 = d2,
  outcome = rbinom(200, 1, prob = 1 - (d1 / (d1 + d2))),
  group = ifelse(outcome, "case", "control")
)
compute_smooth_density_roc(data, d1, d2)
compute_smooth_density_roc(data, d1, d2, along = y)

# terrible ROC because the response is not present (just the densities)
data_shuffled <- data[sample(seq_len(nrow(data))), ]
compute_smooth_density_roc(data_shuffled, d1, d2)

# sorted along response first: correct AUC
compute_smooth_density_roc(data_shuffled, d1, d2, along = y)

Acquisition and developmental descriptions of consonants and vowels

Description

This package provides dataframes of information about the consonants and vowels in American English. The following datasets collect acquisition (acq) features which (try to) characterize the expected acquisition or speech-motor difficulty of speech sounds. See also data_features_consonants.

Usage

data_acq_consonants
data_acq_consonants

Format

An object of class tbl_df (inherits from tbl, data.frame) with 24 rows and 10 columns.

Details

Crowe and McLeod (2020) norms for English consonant acquisition

Crowe and McLeod (2020, below as the cm2020_ variables) provides a systematic review and summary statistics for age of acquisition norms for English consonants. They scoured the literature of acquisition ages for individual consonants and computed summary statistics on them. They considered just accuracy of sounds when produced in single words. Their sources include a mix of a journal articles and norms for articulation assessments. They do not weight statistics from individual studies by sample size or sampling procedure.

I prepared the Crowe and McLeod (2020) data by copying the relevant numbers from their Table 2 making the following changes: 1) rounding mean and SD values to 1 decimal point (3 days for ages in months), 2) dropping /ʍ/, 3) using /r/, /g/, /tʃ/, /dʒ/ for IPA characters instead of the specialized characters used in the article.

The early 8, middle 8 and late 8 (Shriberg, 1993)

The English consonants are often broken down into three developmental classes, based on Shriberg (1993):

Early 8: m b j n w d p h
Middle 8: t ŋ k g f v tʃ dʒ,
Late 8: ʃ θ s z ð l r ʒ

This classification is included as the s93_eights column.

From these names alone, we might interpret these classes such that sounds in the Early 8 would be acquired before the ones in the Middle 8, and likewise that the Middle 8 would be acquired before the Late 8. But these classes were not created by examining patterns of typical consonant acquisition.

For some context, Shriberg (1993) introduces the Early 8, Middle 8, and Late 8 data by describing the following panel of the article's Figure 7:

About which, Shriberg (1993) says: "The values for this trend, which is a profile of consonant mastery, were taken from a group of 64 3- to 6-year-old speech-delayed children Shriberg, Kwiatkowski, & Gruber, 1992). Severity of involvement of the 24 English consonants is represented as the percentage correct for each consonant sorted in decreasing order from left to right. Notice that the most obvious breaks in this function allow for a division of the 24 consonants into three groups of eight sounds termed the Early-8, averaging over 75% correct, the Middle-8, averaging 25%-75% correct, and the Late-8, including consonants averaging less than 25% correct in continuous conversational speech (/ʒ/ is infrequently represented in young, speech-delayed children's spontaneous conversational speech)."

So, there were 64 3–6-year-old children with speech delays, and consonant sounds were divided into three classes based on how often these children produced the sounds correctly on average in a conversational speech sample. This classification is not so much a measure of the relative ordering of speech sound development as it is the relative difficulty of these sounds for children with a speech delay of unknown origin. It would be more appropriate to replace the levels of Early/Middle/Late with Easy/Medium/Hard.

Consonant acquisition features

data_acq_consonants provides the following features:

knitr::kable(data_acq_consonants)

phone	cmubet	wiscbet	cm2020_90_age_mean	cm2020_90_age_sd	cm2020_90_age_min	cm2020_90_age_max	cm2020_90_num_studies	cm2020_90_stage	s93_eights
p	P	p	33.2	6.9	24	48	12	early	early
b	B	b	31.4	7.8	24	48	13	early	early
t	T	t	38.5	9.2	24	60	13	early	middle
d	D	d	35.7	6.7	24	48	13	early	early
k	K	k	37.7	7.3	24	48	13	early	middle
g	G	g	36.8	6.6	24	48	13	early	middle
tʃ	CH	tsh	53.5	10.7	36	72	12	middle	middle
dʒ	JH	dzh	51.0	11.8	36	72	13	middle	middle
m	M	m	33.2	6.7	24	48	13	early	early
n	N	n	33.1	7.4	24	48	13	early	early
ŋ	NG	ng	40.3	10.8	24	55	10	early	middle
f	F	f	38.3	6.3	24	48	13	early	middle
v	V	v	50.8	10.8	36	66	12	middle	middle
θ	TH	th	77.0	7.4	72	96	10	late	late
ð	DH	dh	69.0	11.3	54	96	12	late	late
s	S	s	51.3	16.3	24	84	12	middle	late
z	Z	z	56.8	14.3	30	84	11	middle	late
ʃ	SH	sh	55.0	10.5	36	72	12	middle	late
ʒ	ZH	zh	70.7	12.2	60	84	3	late	late
h	HH	h	35.0	7.0	24	48	13	early	early
l	L	l	53.8	10.4	24	60	12	middle	late
r	R	r	66.6	18.6	30	96	12	late	late
w	W	w	35.2	6.8	24	48	13	early	early
j	Y	j	45.8	11.0	30	60	13	early	early

Description of each column:

phone: phone in IPA
cmubet: phone in the CMU alphabet
wiscbet: phone in an older system used by our lab
cm2020_90_age_mean, cm2020_90_age_sd, cm2020_90_age_min, cm2020_90_age_max: Age of acquisition statistics reported by Crowe & McLeod (2020). Statistics are the mean, SD, min and max age (in months) when children reached 90% accuracy on a consonant.
cm2020_90_num_studies: Number of studies used by Crowe & McLeod (2020) to compute the corresponding statistics.
cm2020_90_stage: Developmental stage assigned to the consonant by Crowe & McLeod (2020). Sounds with an age_mean before 48 months are early, before 60 months are middle, and of 60 or older are late.
s93_eights: Developmental stage of Shriberg (1993)—that is, the early 8, middle 8 and late 8 consonants.

References

Crowe, K., & McLeod, S. (2020). Children’s English Consonant Acquisition in the United States: A Review. American Journal of Speech-Language Pathology, 29(4), 2155–2169. https://doi.org/10.1044/2020_AJSLP-19-00168

Shriberg, L. D. (1993). Four New Speech and Prosody-Voice Measures for Genetics Research and Other Studies in Developmental Phonological Disorders. Journal of Speech, Language, and Hearing Research, 36(1), 105–140. https://doi.org/10.1044/jshr.3601.105

Simulated intelligibility scores by utterance length

Description

A dataset of simulated intelligibility scores for testing and demonstrating modeling functions. These were created by fitting a Bayesian model of the raw Hustad and colleagues (2020) and drawing 1 sample from the posterior distribution of expected predictions (i.e., "epreds). In other words, these values are model predictions of the original dataset. They are correlated with original dataset values at r = .86. We might think of the simulation as adding random noise to the original dataset.

Usage

data_example_intelligibility_by_length
data_example_intelligibility_by_length

Format

A data frame with 694 rows and 5 variables:

child: identifier for the child
age_months: child's age in months
length_longest: length of the child's longest utterance
tocs_level: utterance length
sim_intelligibility: child's intelligibility for the given utterance length (proportion of words said by the child that were correctly transcribed by two listeners)

References

Hustad, K. C., Mahr, T., Natzke, P. E. M., & Rathouz, P. J. (2020). Development of Speech Intelligibility Between 30 and 47 Months in Typically Developing Children: A Cross-Sectional Study of Growth. Journal of Speech, Language, and Hearing Research, 63(6), 1675–1687. https://doi.org/10.1044/2020_JSLHR-20-00008

Fake intelligibility data

Description

A dataset of fake intelligibility scores for testing and demonstrating modeling functions. These were created by randomly sampling 200 rows of an intelligibility dataset and adding random noise to the age_months and intelligibility variables. These values do not measure any real children but represent plausible age and intelligibility measurements from our kind of work.

Usage

data_fake_intelligibility
data_fake_intelligibility

Format

A data frame with 200 rows and 2 variables:

age_months: child's age in months
intelligibility: child's intelligibility (proportion of words said by the child that were correctly transcribed by two listeners)

Fake speaking rate data

Description

A dataset of fake speaking rate measures for testing and demonstrating modeling functions. These were created by randomly sampling 200 rows of a speaking rate dataset and adding random noise to the age_months and speaking_sps variables. These values do not measure any real children but represent plausible age and rate measurements from our kind of work.

Usage

data_fake_rates
data_fake_rates

Format

A data frame with 200 rows and 2 variables:

age_months: child's age in months
speaking_sps: child's speaking rate in syllables per second

Phonetic features of consonants and vowels

Description

This package provides dataframes of information about the consonants and vowels in American English. The phonetic features are conventional descriptions of how sounds are produced. See also data_acq_consonants.

Usage

data_features_consonants

data_features_vowels
data_features_consonants

data_features_vowels

Format

An object of class tbl_df (inherits from tbl, data.frame) with 24 rows and 11 columns.

An object of class tbl_df (inherits from tbl, data.frame) with 17 rows and 13 columns.

Details

Most of the phonetic features are self-evident and definitional. For example, /p/ is the bilabial voiceless stop. For fuzzier features, I consulted the general IPA chart and the Wikipedia page on English phonology. These issues included things like: what are the lax vowels again? or the last two rows of the consonant tables are approximants, so /r,l,j/ are approximants.

Some features have alternative feature sets in order to accommodate degrees of aggregation. For example, /r,l,j,w/ are approximant in manner but divided into liquid and glide in manner_alt.

Phonetic features of consonants

data_features_consonants is a dataframe with 24 rows and 10 variables.

knitr::kable(data_features_consonants)

phone	cmubet	wiscbet	voicing	voicing_alt	manner	manner_alt	place	place_fct	sonorance	sonorance_alt
p	P	p	voiceless	spread_glottis	stop	stop	labial	labial	obstruent	obstruent
b	B	b	voiced	plain	stop	stop	labial	labial	obstruent	obstruent
t	T	t	voiceless	spread_glottis	stop	stop	alveolar	alveolar	obstruent	obstruent
d	D	d	voiced	plain	stop	stop	alveolar	alveolar	obstruent	obstruent
k	K	k	voiceless	spread_glottis	stop	stop	velar	velar	obstruent	obstruent
g	G	g	voiced	plain	stop	stop	velar	velar	obstruent	obstruent
tʃ	CH	tsh	voiceless	spread_glottis	affricate	affricate	postalveolar	postalveolar	obstruent	strident
dʒ	JH	dzh	voiced	plain	affricate	affricate	postalveolar	postalveolar	obstruent	strident
m	M	m	voiced	NA	nasal	nasal	labial	labial	sonorant	sonorant
n	N	n	voiced	NA	nasal	nasal	alveolar	alveolar	sonorant	sonorant
ŋ	NG	ng	voiced	NA	nasal	nasal	velar	velar	sonorant	sonorant
f	F	f	voiceless	spread_glottis	fricative	fricative	labiodental	labiodental	obstruent	strident
v	V	v	voiced	plain	fricative	fricative	labiodental	labiodental	obstruent	strident
θ	TH	th	voiceless	spread_glottis	fricative	fricative	dental	dental	obstruent	obstruent
ð	DH	dh	voiced	plain	fricative	fricative	dental	dental	obstruent	obstruent
s	S	s	voiceless	spread_glottis	fricative	fricative	alveolar	alveolar	obstruent	strident
z	Z	z	voiced	plain	fricative	fricative	alveolar	alveolar	obstruent	strident
ʃ	SH	sh	voiceless	spread_glottis	fricative	fricative	postalveolar	postalveolar	obstruent	strident
ʒ	ZH	zh	voiced	plain	fricative	fricative	postalveolar	postalveolar	obstruent	strident
h	HH	h	voiceless	spread_glottis	fricative	fricative	glottal	glottal	obstruent	obstruent
l	L	l	voiced	NA	approximant	liquid	alveolar	alveolar	sonorant	sonorant
r	R	r	voiced	NA	approximant	liquid	postalveolar	postalveolar	sonorant	sonorant
w	W	w	voiced	NA	approximant	glide	labiovelar	NA	sonorant	sonorant
j	Y	j	voiced	NA	approximant	glide	palatal	palatal	sonorant	sonorant

Description of each column:

phone: phone in IPA
cmubet: phone in the CMU alphabet
wiscbet: phone in an older system used by our lab
voicing: voiced versus voiceless
voicing_alt: spread_glottis versus plain
manner: manner of articulation
manner_alt: alternative manner coding that separates approximants into liquids and glides
place: place of articulation
place_fct: place coded as a factor and ordered based on frontness of the articulators. labiovelar is recoded as NA.
sonorance: obstruent versus sonorant
sonorance_alt: obstruant versus sonorant versus strident.

Levels of the factor columns:

data_features_consonants |>
  lapply(levels) |>
  Filter(length, x = _)
#> $place_fct
#> [1] "labial"       "labiodental"  "dental"       "alveolar"     "postalveolar"
#> [6] "palatal"      "velar"        "glottal"

Considerations about consonant phonetic features

The CMU alphabet does not include a glottal stop.

Here /f,v/ are coded as strident following Wikipedia and Sound Pattern of English. If this feature value doesn't seem right, we should probably use an alternative feature of sibilant for the stridents minus /f,v/.

The alternative voicing scheme was suggested by a colleague because of how the voice-voiceless phonetic contrast is achieved with different articulatory strategies in different languages. Note that voicing_alt does not assign a feature to nasals or approximants.

Phonetic features of vowels

data_features_vowels is a dataframe with 17 rows and 11 variables.

knitr::kable(data_features_vowels)

phone

cmubet

wiscbet

hint

manner

manner_alt

tenseness

height

height_fct

height_alt

backness

backness_fct

rounding

beat

vowel

tense

high

front

unrounded

bit

vowel

lax

mid-high

high

front

unrounded

eɪ

bait

vowel

tense

mid-high

mid

front

unrounded

bet

vowel

lax

mid-low

mid

front

unrounded

bat

vowel

lax

low

front

unrounded

but

vowel

lax

mid-low

mid

central

unrounded

comma

vowel

lax

mid-low

mid

central

unrounded

boot

vowel

tense

high

back

rounded

book

vowel

lax

mid-high

high

back

rounded

oʊ

boat

vowel

tense

mid-high

mid

back

rounded

bought

vowel

tense

mid-low

low

back

rounded

bot

vowel

tense

low

back

unrounded

aʊ

bout

vowel

diphthong

aɪ

bite

vowel

diphthong

ɔɪ

boy

vowel

diphthong

letter

vowel

r-colored

mid-low

mid

central

r-colored

burt

vowel

r-colored

mid-low

mid

central

r-colored

Description of each column:

phone: phone in IPA
cmubet: phone in the CMU alphabet
wiscbet: phone in an older system used by our lab
hint: a word containing the selected vowel
manner: manner of articulation
manner_alt: alternative manner with vowel, diphthong and r-colored
tenseness: tense versus lax (versus diphthong and r-colored)
height: vowel height on a four-level scale
height_fct: height coded as a factor ordered high, mid-high, mid-low, low. diphthong is recoded to NA.
height_alt: vowel height on a three-level scale
backness: vowel backness
backness_fct: backness coded as a factor ordered front, central, back. diphthong is recoded to NA.
rounding: unrounded versus rounded (versus diphthong and r-colored)

Levels of the factor columns:

data_features_vowels |>
  lapply(levels) |>
  Filter(length, x = _)
#> $height_fct
#> [1] "high"     "mid-high" "mid-low"  "low"     
#> 
#> $backness_fct
#> [1] "front"   "central" "back"

Considerations about vowel features

I don't consider /eɪ/ and /oʊ/ to be diphthongs, but perhaps manner_alt could encode the difference of these vowels from the others.

In the CMU alphabet and ARPAbet, vowels can include a number to indicate vowel stress, so AH1 or AH2 is /ʌ/ but AH0 is /ə/.

The vowel features for General American English, according to Wikipedia, are as follows:

	Front		Central		Back
	lax	tense	lax	tense	lax	tense
Close	ɪ	i			ʊ	u
Mid	ɛ	eɪ	ə	(ɜ)		oʊ
Open	æ		(ʌ)	ɑ		(ɔ)
Diphthongs	aɪ ɔɪ aʊ

I adapted these features in this way:

A four-level breakdown of height (high, mid-high, mid-low, low) was used instead of a three-level one (close, mid, open).
tense and lax features were directly borrowed. Diphthongs and r-colored vowels are were not assign a tenseness.
/ɑ/ moved to back (following the general IPA)
diphthongs have no backness or height
r-colored vowels were given the backness and height of the /ʌ,ə/

Based on the assumption that /ʌ,ə/ are the same general vowel with differing stress, these vowels have the same features. This definition clashes with the general IPA chart which places /ʌ/ as a back vowel. However, /ʌ/ is a conventional notation. Quoting Wikipedia again: "Although the notation /ʌ/ is used for the vowel of STRUT in RP and General American, the actual pronunciation is closer to a near-open central vowel [ɐ] in RP and advanced back [ʌ̟] in General American." That is, /ʌ/ is fronted in American English (hence, mid) in American English.

Rename file basenames using functions

Description

file_replace_name() uses stringr::str_replace() to rename files. file_rename_with() allows you to rename files with a generic string-transforming function.

Usage

file_replace_name(
  path,
  pattern,
  replacement,
  .dry_run = FALSE,
  .overwrite = FALSE
)

file_rename_with(path, .fn, ..., .dry_run = FALSE, .overwrite = FALSE)
file_replace_name(
  path,
  pattern,
  replacement,
  .dry_run = FALSE,
  .overwrite = FALSE
)

file_rename_with(path, .fn, ..., .dry_run = FALSE, .overwrite = FALSE)

Arguments

`path`	vector of paths for files to rename
`pattern`, `replacement`	arguments forwarded to `stringr::str_replace()`
`.dry_run`	when `FALSE` (the default), files are renamed. When `TRUE`, no files are renamed but the affected files are printed out.
`.overwrite`	Whether to overwrite files. Defaults to `FALSE` so that overwriting files is opt-in.
`.fn`	function to call file paths
`...`	arguments passed onto `.fn`

Details

Only the basename of the file (returned by basename() undergoes string replacement).

Value

the contents of paths with updated file names. Duplicated elements are removed. This function throws an error if a name collision is detected (where two files are both renamed into the same target path).

Examples

# With .dry_run = TRUE, we can make up some file paths.
dir <- "//some-fake-location/"
path <- file.path(
  dir,
  c("report_1.csv", "report_2.csv", "report-1.csv", "skipped.csv")
)

updated <- file_replace_name(path, "report_", "report-", .dry_run = TRUE)

# Collisions are detected
updated <- file_replace_name(path, "report_\\d", "report-1", .dry_run = TRUE)
# With .dry_run = TRUE, we can make up some file paths.
dir <- "//some-fake-location/"
path <- file.path(
  dir,
  c("report_1.csv", "report_2.csv", "report-1.csv", "skipped.csv")
)

updated <- file_replace_name(path, "report_", "report-", .dry_run = TRUE)

# Collisions are detected
updated <- file_replace_name(path, "report_\\d", "report-1", .dry_run = TRUE)

Fit a beta regression model (for intelligibility)

Description

The function fits the same type of GAMLSS model as used in Hustad and colleagues (2021): A beta regression model (via gamlss.dist::BE()) with natural cubic splines on the mean (mu) and scale (sigma). This model is fitted using this package's mem_gamlss() wrapper function.

Usage

fit_beta_gamlss(data, var_x, var_y, df_mu = 3, df_sigma = 2, control = NULL)

fit_beta_gamlss_se(
  data,
  name_x,
  name_y,
  df_mu = 3,
  df_sigma = 2,
  control = NULL
)

predict_beta_gamlss(newdata, model, centiles = c(5, 10, 50, 90, 95))

optimize_beta_gamlss_slope(
  model,
  centiles = 50,
  interval = c(30, 119),
  maximum = TRUE
)

uniroot_beta_gamlss(model, centiles = 50, targets = 0.5, interval = c(30, 119))
fit_beta_gamlss(data, var_x, var_y, df_mu = 3, df_sigma = 2, control = NULL)

fit_beta_gamlss_se(
  data,
  name_x,
  name_y,
  df_mu = 3,
  df_sigma = 2,
  control = NULL
)

predict_beta_gamlss(newdata, model, centiles = c(5, 10, 50, 90, 95))

optimize_beta_gamlss_slope(
  model,
  centiles = 50,
  interval = c(30, 119),
  maximum = TRUE
)

uniroot_beta_gamlss(model, centiles = 50, targets = 0.5, interval = c(30, 119))

Arguments

`data`	a data frame
`var_x`, `var_y`	(unquoted) variable names giving the predictor variable (e.g., `age`) and outcome variable (.e.g, `intelligibility`).
`df_mu`, `df_sigma`	degrees of freedom. If `0` is used, the `splines::ns()` term is dropped from the model formula for the parameter.
`control`	a `gamlss::gamlss.control()` controller. Defaults to `NULL` which uses default settings, except for setting trace to `FALSE` to silence the output from gamlss.
`name_x`, `name_y`	quoted variable names giving the predictor variable (e.g., `"age"`) and outcome variable (.e.g, `"intelligibility"`). These arguments apply to `fit_beta_gamlss_se()`.
`newdata`	a one-column dataframe for predictions
`model`	a model fitted by `fit_beta_gamlss()`
`centiles`	centiles to use for prediction. Defaults to `c(5, 10, 50, 90, 95)` for `predict_beta_gamlss()`. Defaults to `50` for `optimize_beta_gamlss_slope()` and `uniroot_beta_gamlss()`, although both of these functions support multiple centile values.
`interval`	for `optimize_beta_gamlss_slope()`, the range of `x` values to optimize over. For `uniroot_beta_gamlss()`, the range of `x` values to search for roots (target y values) in.
`maximum`	for `optimize_beta_gamlss_slope()`, whether to find the maximum slope (`TRUE`) or minimum slope (`FALSE`).
`targets`	for `uniroot_beta_gamlss()`, the target y values to use as roots. By default, .5 is used, so that `uniroot_beta_gamlss()` returns the `x` value where the y value is .5. Multiple targets are supported.

Details

There are two versions of this function. The main version is fit_beta_gamlss(), and it works with unquoted column names (e.g., age). The alternative version is fit_beta_gamlss_se(); the final "se" stands for "Standard Evaluation". This designation means that the variable names must be given as strings (so, the quoted "age" instead of bare name age). This alternative version is necessary when we fit several models using parallel computing with furrr::future_map() (as when using bootstrap resampling).

predict_centiles() will work with this function, but it will likely throw a warning message. Therefore, predict_beta_gamlss() provides an alternative way to compute centiles from the model. This function manually computes the centiles instead of relying on gamlss::centiles(). The main difference is that new x values go through splines::predict.ns() and then these are multiplied by model coefficients.

optimize_beta_gamlss_slope() computes the point (i.e., age) and rate of steepest growth for different quantiles. This function wraps over the following process:

an internal prediction function computes a quantile at some x from model coefficients and spline bases.
another internal function uses numDeriv::grad() to get the gradient of this prediction function for x.
optimize_beta_gamlss_slope() uses stats::optimize() on the gradient function to find the x with the maximum or minimum slope.

uniroot_beta_gamlss() also uses this internal prediction function to find when a quantile growth curve crosses a given value. stats::uniroot() finds where a function crosses 0 (a root). If we modify our prediction function to always subtract .5 at the end, then the root for this prediction function would be the x value where the predicted value crosses .5. That's the idea behind how uniroot_beta_gamlss() works. In our work, we would use this approach to find, say, the age (root) when children in the 10th percentile (centiles) cross 50% intelligibility (targets).

GAMLSS does beta regression differently

This part is a brief note that GAMLSS uses a different parameterization of the beta distribution for its beta family than other packages.

The canonical parameterization of the beta distribution uses shape parameters $\alpha$ and $\beta$ and the probability density function:

$f(y;\alpha,\beta) = \frac{1}{B(\alpha,\beta)} y^{\alpha-1}(1-y)^{\beta-1}$

where $B$ is the beta function.

For beta regression, the distribution is reparameterized so that there is a mean probability $\mu$ and some other parameter that represents the spread around that mean. In GAMLSS (gamlss.dist::BE()), they use a scale parameter $\sigma$ (larger values mean more spread around mean). Everywhere else—betareg::betareg() and rstanarm::stan_betareg() in vignette("betareg", "betareg"), brms::Beta() in vignette("brms_families", "brms"), mgcv::betar()—it's a precision parameter $\phi$ (larger values mean more precision, less spread around mean). Here is a comparison:

$\text{betareg, brms, mgcv, etc.} \\ \mu = \alpha / (\alpha + \beta) \\ \phi = \alpha + b \\ \textsf{E}(y) = \mu \\ \textsf{VAR}(y) = \mu(1-\mu)/(1 + \phi) \\$

$\text{GAMLSS} \\ \mu = \alpha / (\alpha + \beta) \\ \sigma = (1 / (\alpha + \beta + 1))^.5 \\ \textsf{E}(y) = \mu \\ \textsf{VAR}(y) = \mu(1-\mu)\sigma^2$

Value

for fit_beta_gamlss() and fit_beta_gamlss_se(), a mem_gamlss()-fitted model. The .user data in the model includes degrees of freedom for each parameter and the splines::ns() basis for each parameter. For predict_beta_gamlss(), a dataframe containing the model predictions for mu and sigma, plus columns for each centile in centiles. For optimize_beta_gamlss_slope(), a dataframe with the optimized x values (maximum or minimum), the gradient at that x value (objective), and the quantile (quantile). For uniroot_beta_gamlss(), a dataframe one row per quantile/target combination with the results of calling stats::uniroot(). The root column is the x value where the quantile curve crosses the target value.

Source

Associated article: https://doi.org/10.1044/2021_JSLHR-21-00142

Examples

data_fake_intelligibility

m <- fit_beta_gamlss(
  data_fake_intelligibility,
  age_months,
  intelligibility
)

# using "qr" in summary() just to suppress a warning message
summary(m, type = "qr")

# Alternative interface
d <- data_fake_intelligibility
m2 <- fit_beta_gamlss_se(
  data = d,
  name_x = "age_months",
  name_y = "intelligibility"
)
coef(m2) == coef(m)

# how to use control to change gamlss() behavior
m_traced <- fit_beta_gamlss(
  data_fake_intelligibility,
  age_months,
  intelligibility,
  control = gamlss::gamlss.control(n.cyc = 15, trace = TRUE)
)

# The `.user` space includes the spline bases, so that we can make accurate
# predictions of new xs.
names(m$.user)

# predict logit(mean) at 55 months:
logit_mean_55 <- cbind(1, predict(m$.user$basis_mu, 55)) %*% coef(m)
logit_mean_55
stats::plogis(logit_mean_55)

# But predict_gen_gamma_gamlss() does this work for us and also provides
# centiles
new_ages <- data.frame(age_months = 48:71)
centiles <- predict_beta_gamlss(new_ages, m)
centiles

# Confirm that the manual prediction matches the automatic one
centiles[centiles$age_months == 55, "mu"]
stats::plogis(logit_mean_55)

if(requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  ggplot(pivot_centiles_longer(centiles)) +
    aes(x = age_months, y = .value) +
    geom_line(aes(group = .centile, color = .centile_pair)) +
    geom_point(
      aes(y = intelligibility),
      data = subset(
        data_fake_intelligibility,
        48 <= age_months & age_months <= 71
      )
    )
}

# Age of steepest growth for each centile
optimize_beta_gamlss_slope(
  model = m,
  centiles = c(5, 10, 50, 90),
  interval = range(data_fake_intelligibility$age_months)
)

# Manual approach: Make fine grid of predictions and find largest jump
centiles_grid <- predict_beta_gamlss(
  newdata = data.frame(age_months = seq(28, 95, length.out = 1000)),
  model = m
)
centiles_grid[which.max(diff(centiles_grid$c5)), "age_months"]

# When do children in different centiles reach 50%, 70% intelligibility?
uniroot_beta_gamlss(
  model = m,
  centiles = c(5, 10, 50),
  targets = c(.5, .7)
)
data_fake_intelligibility

m <- fit_beta_gamlss(
  data_fake_intelligibility,
  age_months,
  intelligibility
)

# using "qr" in summary() just to suppress a warning message
summary(m, type = "qr")

# Alternative interface
d <- data_fake_intelligibility
m2 <- fit_beta_gamlss_se(
  data = d,
  name_x = "age_months",
  name_y = "intelligibility"
)
coef(m2) == coef(m)

# how to use control to change gamlss() behavior
m_traced <- fit_beta_gamlss(
  data_fake_intelligibility,
  age_months,
  intelligibility,
  control = gamlss::gamlss.control(n.cyc = 15, trace = TRUE)
)

# The `.user` space includes the spline bases, so that we can make accurate
# predictions of new xs.
names(m$.user)

# predict logit(mean) at 55 months:
logit_mean_55 <- cbind(1, predict(m$.user$basis_mu, 55)) %*% coef(m)
logit_mean_55
stats::plogis(logit_mean_55)

# But predict_gen_gamma_gamlss() does this work for us and also provides
# centiles
new_ages <- data.frame(age_months = 48:71)
centiles <- predict_beta_gamlss(new_ages, m)
centiles

# Confirm that the manual prediction matches the automatic one
centiles[centiles$age_months == 55, "mu"]
stats::plogis(logit_mean_55)

if(requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  ggplot(pivot_centiles_longer(centiles)) +
    aes(x = age_months, y = .value) +
    geom_line(aes(group = .centile, color = .centile_pair)) +
    geom_point(
      aes(y = intelligibility),
      data = subset(
        data_fake_intelligibility,
        48 <= age_months & age_months <= 71
      )
    )
}

# Age of steepest growth for each centile
optimize_beta_gamlss_slope(
  model = m,
  centiles = c(5, 10, 50, 90),
  interval = range(data_fake_intelligibility$age_months)
)

# Manual approach: Make fine grid of predictions and find largest jump
centiles_grid <- predict_beta_gamlss(
  newdata = data.frame(age_months = seq(28, 95, length.out = 1000)),
  model = m
)
centiles_grid[which.max(diff(centiles_grid$c5)), "age_months"]

# When do children in different centiles reach 50%, 70% intelligibility?
uniroot_beta_gamlss(
  model = m,
  centiles = c(5, 10, 50),
  targets = c(.5, .7)
)

Fit a generalized gamma regression model (for speaking rate)

Description

The function fits the same type of GAMLSS model as used in Mahr and colleagues (2021): A generalized gamma regression model (via gamlss.dist::GG()) with natural cubic splines on the mean (mu), scale (sigma), and shape (nu) of the distribution. This model is fitted using this package's mem_gamlss() wrapper function.

Usage

fit_gen_gamma_gamlss(
  data,
  var_x,
  var_y,
  df_mu = 3,
  df_sigma = 2,
  df_nu = 1,
  control = NULL
)

fit_gen_gamma_gamlss_se(
  data,
  name_x,
  name_y,
  df_mu = 3,
  df_sigma = 2,
  df_nu = 1,
  control = NULL
)

predict_gen_gamma_gamlss(newdata, model, centiles = c(5, 10, 50, 90, 95))
fit_gen_gamma_gamlss(
  data,
  var_x,
  var_y,
  df_mu = 3,
  df_sigma = 2,
  df_nu = 1,
  control = NULL
)

fit_gen_gamma_gamlss_se(
  data,
  name_x,
  name_y,
  df_mu = 3,
  df_sigma = 2,
  df_nu = 1,
  control = NULL
)

predict_gen_gamma_gamlss(newdata, model, centiles = c(5, 10, 50, 90, 95))

Arguments

`data`	a data frame
`var_x`, `var_y`	(unquoted) variable names giving the predictor variable (e.g., `age`) and outcome variable (.e.g, `rate`).
`df_mu`, `df_sigma`, `df_nu`	degrees of freedom. If `0` is used, the `splines::ns()` term is dropped from the model formula for the parameter.
`control`	a `gamlss::gamlss.control()` controller. Defaults to `NULL` which uses default settings, except for setting trace to `FALSE` to silence the output from gamlss.
`name_x`, `name_y`	quoted variable names giving the predictor variable (e.g., `"age"`) and outcome variable (.e.g, `"rate"`). These arguments apply to `fit_gen_gamma_gamlss_se()`.
`newdata`	a one-column dataframe for predictions
`model`	a model fitted by `fit_gen_gamma_gamlss()`
`centiles`	centiles to use for prediction. Defaults to `c(5, 10, 50, 90, 95)`.

Details

There are two versions of this function. The main version is fit_gen_gamma_gamlss(), and it works with unquoted column names (e.g., age). The alternative version is fit_gen_gamma_gamlss_se(); the final "se" stands for "Standard Evaluation". This designation means that the variable names must be given as strings (so, the quoted "age" instead of bare name age). This alternative version is necessary when we fit several models using parallel computing with furrr::future_map() (as when using bootstrap resampling).

predict_centiles() will work with this function, but it will likely throw a warning message. Therefore, predict_gen_gamma_gamlss() provides an alternative way to compute centiles from the model. This function manually computes the centiles instead of relying on gamlss::centiles(). The main difference is that new x values go through splines::predict.ns() and then these are multiplied by model coefficients.

Value

for fit_gen_gamma_gamlss() and fit_gen_gamma_gamlss_se(), a mem_gamlss()-fitted model. The .user data in the model includes degrees of freedom for each parameter and the splines::ns() basis for each parameter. For predict_gen_gamma_gamlss(), a dataframe containing the model predictions for mu, sigma, and nu, plus columns for each centile in centiles.

Source

Associated article: https://doi.org/10.1044/2021_JSLHR-21-00206

Examples

data_fake_rates

m <- fit_gen_gamma_gamlss(data_fake_rates, age_months, speaking_sps)

# using "qr" in summary() just to suppress a warning message
summary(m, type = "qr")

# Alternative interface
d <- data_fake_rates
m2 <- fit_gen_gamma_gamlss_se(
  data = d,
  name_x = "age_months",
  name_y = "speaking_sps"
)
coef(m2) == coef(m)

# how to use control to change gamlss() behavior
m_traced <- fit_gen_gamma_gamlss(
  data_fake_rates,
  age_months,
  speaking_sps,
  control = gamlss::gamlss.control(n.cyc = 15, trace = TRUE)
)

# The `.user` space includes the spline bases, so that we can make accurate
# predictions of new xs.
names(m$.user)

# predict log(mean) at 55 months:
log_mean_55 <- cbind(1, predict(m$.user$basis_mu, 55)) %*% coef(m)
log_mean_55
exp(log_mean_55)

# But predict_gen_gamma_gamlss() does this work for us and also provides
# centiles
new_ages <- data.frame(age_months = 48:71)
centiles <- predict_gen_gamma_gamlss(new_ages, m)
centiles

# Confirm that the manual prediction matches the automatic one
centiles[centiles$age_months == 55, "mu"]
exp(log_mean_55)

if(requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  ggplot(pivot_centiles_longer(centiles)) +
    aes(x = age_months, y = .value) +
    geom_line(aes(group = .centile, color = .centile_pair)) +
    geom_point(
      aes(y = speaking_sps),
      data = subset(
        data_fake_rates,
        48 <= age_months & age_months <= 71
      )
    )
}

# Example of 0-df splines
m <- fit_gen_gamma_gamlss(
  data_fake_rates,
  age_months,
  speaking_sps,
  df_mu = 0,
  df_sigma = 2,
  df_nu = 0
)
coef(m, what = "mu")
coef(m, what = "sigma")
coef(m, what = "nu")

# mu and nu fixed, c50 mostly locked in
predict_gen_gamma_gamlss(new_ages, m)[c(1, 9, 17, 24), ]
data_fake_rates

m <- fit_gen_gamma_gamlss(data_fake_rates, age_months, speaking_sps)

# using "qr" in summary() just to suppress a warning message
summary(m, type = "qr")

# Alternative interface
d <- data_fake_rates
m2 <- fit_gen_gamma_gamlss_se(
  data = d,
  name_x = "age_months",
  name_y = "speaking_sps"
)
coef(m2) == coef(m)

# how to use control to change gamlss() behavior
m_traced <- fit_gen_gamma_gamlss(
  data_fake_rates,
  age_months,
  speaking_sps,
  control = gamlss::gamlss.control(n.cyc = 15, trace = TRUE)
)

# The `.user` space includes the spline bases, so that we can make accurate
# predictions of new xs.
names(m$.user)

# predict log(mean) at 55 months:
log_mean_55 <- cbind(1, predict(m$.user$basis_mu, 55)) %*% coef(m)
log_mean_55
exp(log_mean_55)

# But predict_gen_gamma_gamlss() does this work for us and also provides
# centiles
new_ages <- data.frame(age_months = 48:71)
centiles <- predict_gen_gamma_gamlss(new_ages, m)
centiles

# Confirm that the manual prediction matches the automatic one
centiles[centiles$age_months == 55, "mu"]
exp(log_mean_55)

if(requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  ggplot(pivot_centiles_longer(centiles)) +
    aes(x = age_months, y = .value) +
    geom_line(aes(group = .centile, color = .centile_pair)) +
    geom_point(
      aes(y = speaking_sps),
      data = subset(
        data_fake_rates,
        48 <= age_months & age_months <= 71
      )
    )
}

# Example of 0-df splines
m <- fit_gen_gamma_gamlss(
  data_fake_rates,
  age_months,
  speaking_sps,
  df_mu = 0,
  df_sigma = 2,
  df_nu = 0
)
coef(m, what = "mu")
coef(m, what = "sigma")
coef(m, what = "nu")

# mu and nu fixed, c50 mostly locked in
predict_gen_gamma_gamlss(new_ages, m)[c(1, 9, 17, 24), ]

Run (scaled) k-means on a dataset.

Description

Observations are scale()-ed before clustering.

Usage

fit_kmeans(data, k, vars, args_kmeans = list())
fit_kmeans(data, k, vars, args_kmeans = list())

Arguments

`data`	a dataframe
`k`	number of clusters to create
`vars`	variable selection for clustering. Select multiple variables with `c()`, e.g., `c(x, y)`. The selection supports tidyselect semantics tidyselect::select_helpers, e.g., `⁠c(x, starts_with("mean_")⁠`.
`args_kmeans`	additional arguments passed to `stats::kmeans()`.

Details

Note that each variable is scaled() before clustering and then cluster means are unscaled to match the original data scale.

This function provides the original kmeans labels as .kmeans_cluster but other alternative labeling based on different sortings of the data. These are provided in order to deal with label-swapping in Bayesian models. See bootstrapping example below.

Value

the original data but augmented with additional columns for clustering details. including .kmeans_cluster (cluster number of each observation, as a factor) and .kmeans_k (selected number of clusters).

Cluster-level information is also included. For example, suppose that we cluster using the variable x. Then the output will have a column .kmeans_x giving the cluster mean for x and .kmeans_rank_x giving the cluster labels reordered using the cluster means for x. The column .kmeans_sort contains the cluster sorted using the first principal component of the scaled variables. All columns of cluster indices are a factor() so that they can be plotted as discrete variables.

Examples

data_kmeans <- fit_kmeans(mtcars, 3, c(mpg, wt, hp))

library(ggplot2)
ggplot(data_kmeans) +
  aes(x = wt, y = mpg) +
  geom_point(aes(color = .kmeans_cluster))

ggplot(data_kmeans) +
  aes(x = wt, y = mpg) +
  geom_point(aes(color = .kmeans_rank_wt))

# Example of label swapping
set.seed(123)
data_boots <- lapply(
  1:10,
  function(x) {
    rows <- sample(seq_len(nrow(mtcars)), replace = TRUE)
    data <- mtcars[rows, ]
    data$.bootstrap <- x
    data
  }
) |>
  lapply(fit_kmeans, k = 3, c(mpg, wt, hp)) |>
  dplyr::bind_rows() |>
  dplyr::select(.bootstrap, dplyr::starts_with(".kmeans_")) |>
  dplyr::distinct()

# Clusters start off in random locations and move to center, so the labels
# differ between model runs and across bootstraps.
ggplot(data_boots) +
  aes(x = .kmeans_wt, y = .kmeans_mpg) +
  geom_point(aes(color = .kmeans_cluster)) +
  labs(title = "k-means centers on 10 bootstraps")

# Labels sorted using first principal component
# so the labels are more consistent.
ggplot(data_boots) +
  aes(x = .kmeans_wt, y = .kmeans_mpg) +
  geom_point(aes(color = .kmeans_sort)) +
  labs(title = "k-means centers on 10 bootstraps")
data_kmeans <- fit_kmeans(mtcars, 3, c(mpg, wt, hp))

library(ggplot2)
ggplot(data_kmeans) +
  aes(x = wt, y = mpg) +
  geom_point(aes(color = .kmeans_cluster))

ggplot(data_kmeans) +
  aes(x = wt, y = mpg) +
  geom_point(aes(color = .kmeans_rank_wt))

# Example of label swapping
set.seed(123)
data_boots <- lapply(
  1:10,
  function(x) {
    rows <- sample(seq_len(nrow(mtcars)), replace = TRUE)
    data <- mtcars[rows, ]
    data$.bootstrap <- x
    data
  }
) |>
  lapply(fit_kmeans, k = 3, c(mpg, wt, hp)) |>
  dplyr::bind_rows() |>
  dplyr::select(.bootstrap, dplyr::starts_with(".kmeans_")) |>
  dplyr::distinct()

# Clusters start off in random locations and move to center, so the labels
# differ between model runs and across bootstraps.
ggplot(data_boots) +
  aes(x = .kmeans_wt, y = .kmeans_mpg) +
  geom_point(aes(color = .kmeans_cluster)) +
  labs(title = "k-means centers on 10 bootstraps")

# Labels sorted using first principal component
# so the labels are more consistent.
ggplot(data_boots) +
  aes(x = .kmeans_wt, y = .kmeans_mpg) +
  geom_point(aes(color = .kmeans_sort)) +
  labs(title = "k-means centers on 10 bootstraps")

Convert age in months to years;months

Description

Convert age in months to years;months

Usage

format_year_month_age(x)
format_year_month_age(x)

Arguments

`x`	a vector ages in months

Details

Ages of NA return "NA;NA".

This format by default is not numerically ordered. This means that c("2;0", "10;10", "10;9") would sort as c("10;10", "10;9", "2;0"). The function stringr::str_sort(..., numeric = TRUE) will sort this vector correctly.

Value

ages in the years;months format

Examples

ages <- c(26, 58, 25, 67, 21, 59, 36, 43, 27, 49)
format_year_month_age(ages)
ages <- c(26, 58, 25, 67, 21, 59, 36, 43, 27, 49)
format_year_month_age(ages)

Staged imputation

Description

Impute missing data at different utterance lengths using successive linear models.

Usage

impute_values_by_length(
  data,
  var_y,
  var_length,
  id_cols = NULL,
  include_max_length = FALSE,
  data_train = NULL
)
impute_values_by_length(
  data,
  var_y,
  var_length,
  id_cols = NULL,
  include_max_length = FALSE,
  data_train = NULL
)

Arguments

`data`	dataframe in which to impute missing value
`var_y`	bare name of the response variable for imputation
`var_length`	bare name of the length variable
`id_cols`	a selection of variable names that uniquely identify each group of related observations. For example, `c(child_id, age_months)`.
`include_max_length`	whether to use the maximum length value as a predictor in the imputation models. Defaults to `FALSE`.
`data_train`	(optional) dataframe used to train the imputation models. For example, we might have data from a reference group of children in `data_train` but a clinical population in `data`. If omitted, the dataframe in `data` is used to train the models. `data_train` can also be a function. In this case, it is applied to the `data` argument in order to derive (filter) a subset of the data for training.

Value

a dataframe with the additional columns ⁠{var_y}_imputed⁠ (the imputed value), ⁠.max_{var_length}⁠ with the highest value of var_length with observed data, and ⁠{var_y}_imputation⁠ for labeling whether observations were "imputed" or "observed".

Background

In Hustad and colleagues (2020), we modeled intelligibility data in young children's speech. Children would hear an utterance and then they would repeat it. The utterances started at 2 words in length, then increased to 3 words in length, and so on in batches of 10 sentences, all the way to 7 words in length. There was a problem, however: Not all of the children could produce utterances at every length. Specifically, if a child could not reliably produced 5 utterances of a given length length, the task was halted. So given the nature of the task, if a child had produced 5-word utterances, they also produced 2–4-word utterances as well.

The length of the utterance probably influenced the outcome variable: Longer utterances have more words that might help a listener understand the sentence, for example. Therefore, it did not seem appropriate to ignore the missing values. We used the following two-step procedure (see the Supplemental Materials for more detail):

Other notes

Remark about data and data_train: One might ask, why shouldn't some children help train the data imputation models? Let's consider a norm-referenced standardized testing scenario: We have a new participant (observations in data), and we want to know how they compare to their age peers (participants in data_train). By separating out data_train and fixing it to a reference group, we can apply the same adjustment/imputation procedure to all new participants.

References

Hustad, K. C., Mahr, T., Natzke, P. E. M., & J. Rathouz, P. (2020). Supplemental Material S1 (Hustad et al., 2020). ASHA journals. https://doi.org/10.23641/asha.12330956.v1

Examples

set.seed(1)
fake_data <- tibble::tibble(
  child = c(
    "a", "a", "a", "a", "a",
    "b", "b", "b", "b", "b",
    "c", "c", "c", "c", "c",
    "e", "e", "e", "e", "e",
    "f", "f", "f", "f", "f",
    "g", "g", "g", "g", "g",
    "h", "h", "h", "h", "h",
    "i", "i", "i", "i", "i"
  ),
  level = c(1:5, 1:5, 1:5, 1:5, 1:5, 1:5, 1:5, 1:5),
  x = c(
    c(100, 110, 120, 130, 150) + c(-8, -5, 0, NA, NA),
    c(100, 110, 120, 130, 150) + c(6, 6, 4, NA, NA),
    c(100, 110, 120, 130, 150) + c(-5, -5, -2, 2, NA),
    c(100, 110, 120, 130, 150) + rbinom(5, 12, .5) - 6,
    c(100, 110, 120, 130, 150) + rbinom(5, 12, .5) - 6,
    c(100, 110, 120, 130, 150) + rbinom(5, 12, .5) - 6,
    c(100, 110, 120, 130, 150) + rbinom(5, 12, .5) - 6,
    c(100, 110, 120, 130, 150) + rbinom(5, 12, .5) - 6
  )
)
data_imputed <- impute_values_by_length(
  fake_data,
  x,
  level,
  id_cols = c(child),
  include_max_length = FALSE
)

if (requireNamespace("ggplot2")) {
  library(ggplot2)
  ggplot(data_imputed) +
    aes(x = level, y = x_imputed) +
    geom_line(aes(group = child)) +
    geom_point(aes(color = x_imputation))
}
set.seed(1)
fake_data <- tibble::tibble(
  child = c(
    "a", "a", "a", "a", "a",
    "b", "b", "b", "b", "b",
    "c", "c", "c", "c", "c",
    "e", "e", "e", "e", "e",
    "f", "f", "f", "f", "f",
    "g", "g", "g", "g", "g",
    "h", "h", "h", "h", "h",
    "i", "i", "i", "i", "i"
  ),
  level = c(1:5, 1:5, 1:5, 1:5, 1:5, 1:5, 1:5, 1:5),
  x = c(
    c(100, 110, 120, 130, 150) + c(-8, -5, 0, NA, NA),
    c(100, 110, 120, 130, 150) + c(6, 6, 4, NA, NA),
    c(100, 110, 120, 130, 150) + c(-5, -5, -2, 2, NA),
    c(100, 110, 120, 130, 150) + rbinom(5, 12, .5) - 6,
    c(100, 110, 120, 130, 150) + rbinom(5, 12, .5) - 6,
    c(100, 110, 120, 130, 150) + rbinom(5, 12, .5) - 6,
    c(100, 110, 120, 130, 150) + rbinom(5, 12, .5) - 6,
    c(100, 110, 120, 130, 150) + rbinom(5, 12, .5) - 6
  )
)
data_imputed <- impute_values_by_length(
  fake_data,
  x,
  level,
  id_cols = c(child),
  include_max_length = FALSE
)

if (requireNamespace("ggplot2")) {
  library(ggplot2)
  ggplot(data_imputed) +
    aes(x = level, y = x_imputed) +
    geom_line(aes(group = child)) +
    geom_point(aes(color = x_imputation))
}

Compute entropy and related measures

Description

Compute entropy and related measures

Usage

info_surprisal(x, base = NULL)

info_entropy(p, base = NULL)

info_cross_entropy(p, q, base = NULL)

info_kl_divergence(p, q, base = NULL)

info_kl_divergence_matrix(mat, base = NULL)
info_surprisal(x, base = NULL)

info_entropy(p, base = NULL)

info_cross_entropy(p, q, base = NULL)

info_kl_divergence(p, q, base = NULL)

info_kl_divergence_matrix(mat, base = NULL)

Arguments

`x`	a vector of probabilities
`base`	the base of the logarithm. Defaults to e (`exp(1)`).
`p`, `q`	a probability distribution (vector of probabilities that sum to 1)
`mat`	a matrix where each row is a probability distribution

Value

the entropy

Information theory basics

Given a probability x, the surprisal value (or information content) of x is the log of the inverse probability, log(1 / x). Rare events (smaller probabilities) have larger surprisal values than more common events. The word "surprise" here conveys how unexpected or informative an event is. (The idea that surprises are "informative" or contain information makes sense with the intuition that there isn't much to be learned from predictable events.) Because of how division works with logarithms, log(1 / x) simplifies so that info_surprisal(x) is the negative log probability,-log(x). The units of the information content depends on the base of the logarithm. Our functions by default use the natural log() which provides information in nats, but it is also common to see log2()-based surprisals which provide information in bits.

Given a probability distribution p—a vector of probabilities that sum to 1—the entropy of the distribution is the probability-weighted average of the surprisal values. A weighted average is sum(weights * values) / sum(weights), but because the weights here are probabilities that sum to 1, the info_entropy(p) is sum(p * info_surprisal(p)). Entropy can be interpreted as a measure of uncertainty in the distribution; it's the expected surprisal value in the distribution.

In info_entropy(p) the surprisal values and their weights came from the same probability distribution. But this doesn't need to be the case. Suppose that there are ground-truth probabilities p and there are also estimated probabilities q. info_entropy(q) computes a weighted average where events with surprisal info_surprisal(q) have frequencies of q. But if we knew the true frequencies, the surprisals in q would occur with frequency p, so their weighted average should be sum(p * info_surprisal(q)). This value is the cross entropy of q with respect to p, and info_cross_entropy(p, q) implements this function. Cross entropy is commonly used as a loss function in machine learning.

Gibb's inequality says that info_entropy(p) <= info_cross_entropy(p, q). Unless p and q are the same distribution, there will always be some excess uncertainty or surprise in the cross entropy compared to the entropy: ⁠info_cross_entropy(p, q) = info_entropy(p) + *excess*⁠. This excess is the Kullback-Liebler divergence (KL divergence or relative entropy). Due to the properties of logarithms, info_kl_divergence(p, q) simplifies to sum (p * log(p / q)). KL divergence is an important quantity for comparing probability distributions. For example, each row in a confusion matrix is a probability distribution, and KL divergence can identify which rows are most similar. info_kl_divergence_matrix(mat) computes a distance matrix using on each pair of rows in mat using KL divergence.

Examples

wikipedia_example <- rbind(
  p = c(9, 12, 4) / 25,
  q = c(1,  1, 1) / 3
)

info_kl_divergence_matrix(wikipedia_example)
wikipedia_example <- rbind(
  p = c(9, 12, 4) / 25,
  q = c(1,  1, 1) / 3
)

info_kl_divergence_matrix(wikipedia_example)

Join data onto resampled IDs

Description

Join data onto resampled IDs

Usage

join_to_split(x, y, by, validate = FALSE)
join_to_split(x, y, by, validate = FALSE)

Arguments

`x`	an rset object created by `rsample::bootstraps()`
`y`	y dataframe with a column of the id values which was resampled to create `x`
`by`	the name of column in y with the data
`validate`	whether to validate the join by counting the number of rows associated with each id. Defaults to `FALSE`.

Value

the original rset object with its x$data updated to join with y and with the row numbers x$in_id updated to work on the expanded dataset.

Examples

library(dplyr)
data_trees <- tibble::as_tibble(datasets::Orange)

data_tree_ids <- distinct(data_trees, Tree)

# Resample ids
data_bootstraps <- data_tree_ids %>%
  rsample::bootstraps(times = 20) %>%
  rename(splits_id = splits) %>%
  # Attach data to resampled ids
  mutate(
    data_splits = splits_id %>% purrr::map(
      join_to_split,
      data_trees,
      by = "Tree",
      validate = TRUE
    )
  )

data_bootstraps
library(dplyr)
data_trees <- tibble::as_tibble(datasets::Orange)

data_tree_ids <- distinct(data_trees, Tree)

# Resample ids
data_bootstraps <- data_tree_ids %>%
  rsample::bootstraps(times = 20) %>%
  rename(splits_id = splits) %>%
  # Attach data to resampled ids
  mutate(
    data_splits = splits_id %>% purrr::map(
      join_to_split,
      data_trees,
      by = "Tree",
      validate = TRUE
    )
  )

data_bootstraps

Compute the mean of logit-normal distribution(s)

Description

This function is a wrapper around logitnorm::momentsLogitnorm().

Usage

logitnorm_mean(mu, sigma)
logitnorm_mean(mu, sigma)

Arguments

`mu`	mean(s) on the logit scale
`sigma`	standard deviation(s) on the logit scale

Value

the means of the distributions

Examples


x <- logitnorm_mean(2, 1)
x

# compare to simulation
set.seed(100)
rnorm(1000, 2, 1) |> plogis() |> mean()
x <- logitnorm_mean(2, 1)
x

# compare to simulation
set.seed(100)
rnorm(1000, 2, 1) |> plogis() |> mean()

Fit a gamlss model but store user data

Description

Think of it as a gamlss model with memories (mem. gamlss).

Usage

mem_gamlss(...)
mem_gamlss(...)

Arguments

...

arguments passed to gamlss::gamlss()

Value

the fitted model object but updated to include user information in model$.user. Includes the dataset used to fit the model model$.user$data, the session info model$.user$session_info and the call used to fit the model model$.user$call. model$call is updated to match

Predict and tidy centiles from a GAMLSS model

Description

gamlss has trouble doing predictions without the original training data.

Usage

predict_centiles(newdata, model, centiles = c(5, 10, 50, 90, 95), ...)

pivot_centiles_longer(data)
predict_centiles(newdata, model, centiles = c(5, 10, 50, 90, 95), ...)

pivot_centiles_longer(data)

Arguments

`newdata`	a one-column dataframe for predictions
`model`	a gamlss model prepared by `mem_gamlss()`
`centiles`	centiles to use for prediction. Defaults to `c(5, 10, 50, 90, 95)`.
`...`	arguments passed to `gamlss::centiles.pred()`
`data`	centile predictions to reshape for `pivot_centiles_longer()`

Value

a tibble with fitted centiles for predict_centiles() and a long-format tibble with one centile value per row in pivot_centiles_longer()

Extract the TOCS details from a string (usually a filename)

Description

Extract the TOCS details from a string (usually a filename)

Usage

tocs_item(xs)

tocs_type(xs)

tocs_length(xs)
tocs_item(xs)

tocs_type(xs)

tocs_length(xs)

Arguments

`xs`	a character vector

Value

tocs_item() returns the substring with the TOCS item, tocs_type() returns whether the item is "single-word" or "multiword", and tocs_length() returns the length of the TOCS item (i.e., the number of words).

Examples

x <- c(
  "XXv16s7T06.lab", "XXv15s5T06.TextGrid", "XXv13s3T10.WAV",
  "XXv18wT11.wav", "non-matching", "s2T01",
  "XXv01s4B01.wav", "XXv01wB01.wav"
)
data.frame(
  x = x,
  item = tocs_item(x),
  type = tocs_type(x),
  length = tocs_length(x)
)
x <- c(
  "XXv16s7T06.lab", "XXv15s5T06.TextGrid", "XXv13s3T10.WAV",
  "XXv18wT11.wav", "non-matching", "s2T01",
  "XXv01s4B01.wav", "XXv01wB01.wav"
)
data.frame(
  x = x,
  item = tocs_item(x),
  type = tocs_type(x),
  length = tocs_length(x)
)

Compute AUCs using the trapezoid method

Description

Compute AUCs using the trapezoid method

Usage

trapezoid_auc(xs, ys)

partial_trapezoid_auc(xs, ys, xlim)
trapezoid_auc(xs, ys)

partial_trapezoid_auc(xs, ys, xlim)

Arguments

`xs`, `ys`	x and y positions
`xlim`	two-element vector (a range) of the `xs` to sum over

Value

the area under the curve computed using the trapezoid method. For partial_trapezoid_auc(), the partial area under the curve is computed.

Examples

if (requireNamespace("rstanarm", quietly = TRUE)) {
  wells <- rstanarm::wells
  r <- pROC::roc(switch ~ arsenic, wells)
  pROC::auc(r)
  trapezoid_auc(r$specificities, r$sensitivities)

  pROC::auc(r, partial.auc = c(.9, 1), partial.auc.focus = "sp")
  partial_trapezoid_auc(r$specificities, r$sensitivities, c(.9, 1))

  pROC::auc(r, partial.auc = c(.9, 1), partial.auc.focus = "se")
  partial_trapezoid_auc(r$sensitivities, r$specificities, c(.9, 1))

  pROC::auc(r, partial.auc = c(.1, .9), partial.auc.focus = "sp")
  partial_trapezoid_auc(r$specificities, r$sensitivities, c(.1, .9))

  pROC::auc(r, partial.auc = c(.1, .9), partial.auc.focus = "se")
  partial_trapezoid_auc(r$sensitivities, r$specificities, c(.1, .9))
}
if (requireNamespace("rstanarm", quietly = TRUE)) {
  wells <- rstanarm::wells
  r <- pROC::roc(switch ~ arsenic, wells)
  pROC::auc(r)
  trapezoid_auc(r$specificities, r$sensitivities)

  pROC::auc(r, partial.auc = c(.9, 1), partial.auc.focus = "sp")
  partial_trapezoid_auc(r$specificities, r$sensitivities, c(.9, 1))

  pROC::auc(r, partial.auc = c(.9, 1), partial.auc.focus = "se")
  partial_trapezoid_auc(r$sensitivities, r$specificities, c(.9, 1))

  pROC::auc(r, partial.auc = c(.1, .9), partial.auc.focus = "sp")
  partial_trapezoid_auc(r$specificities, r$sensitivities, c(.1, .9))

  pROC::auc(r, partial.auc = c(.1, .9), partial.auc.focus = "se")
  partial_trapezoid_auc(r$sensitivities, r$specificities, c(.1, .9))
}

Weight utterance lengths by using an ordinal regression model

Description

For each participant, we find their length of longest utterance. We predict this longest utterance length as a nonlinear function of some variable, and we compute the probability of reaching each utterance length at each value of the predictor variable. These probabilities are then normalized to provide weights for each utterance length.

Usage

weight_lengths_with_ordinal_model(
  data_train,
  var_length,
  var_x,
  id_cols,
  spline_df = 2,
  data_join = NULL
)
weight_lengths_with_ordinal_model(
  data_train,
  var_length,
  var_x,
  id_cols,
  spline_df = 2,
  data_join = NULL
)

Arguments

`data_train`	dataframe used to train the ordinal model. `data_train` can also be a function. In this case, it is applied to the `data_join` argument in order to derive (filter) a subset of the data for training.
`var_length`	bare name of the length variable. For example, `tocs_level`.
`var_x`	bare name of the predictor variable. For example, `age_months`.
`id_cols`	a selection of variable names that uniquely identify each group of related observations. For example, `c(child_id, age_months)`.
`spline_df`	number of degrees of freedom to use for the ordinal regression model.
`data_join`	(optional) dataset to use join the weights onto. This feature is necessary because we want to train a dataset on the observed data but supply the weights to the dataset with missing values imputed.

Value

the probability and weights of each utterance length at each observed value of var_x. These are in the added columns ⁠{var_length}_prob_reached⁠ and ⁠{var_length}_weight⁠, respectively.

References

Hustad, K. C., Mahr, T., Natzke, P. E. M., & J. Rathouz, P. (2020). Supplemental Material S1 (Hustad et al., 2020). ASHA journals. https://doi.org/10.23641/asha.12330956.v1

Examples

data_weights <- weight_lengths_with_ordinal_model(
  data_example_intelligibility_by_length,
  tocs_level,
  age_months,
  child,
  spline_df = 2
)

if (requireNamespace("ggplot2")) {
  library(ggplot2)
  p1 <- ggplot(data_weights) +
    aes(x = age_months, y = tocs_level_prob_reached) +
    geom_line(aes(color = ordered(tocs_level)), linewidth = 1) +
    scale_color_ordinal(end = .85) +
    labs(y = "Prob. of reaching length", color = "Utterance length")
  print(p1)

  p2 <- p1 +
    aes(y = tocs_level_weight) +
    labs(y = "Weight of utterance length")
  print(p2)
}
data_weights <- weight_lengths_with_ordinal_model(
  data_example_intelligibility_by_length,
  tocs_level,
  age_months,
  child,
  spline_df = 2
)

if (requireNamespace("ggplot2")) {
  library(ggplot2)
  p1 <- ggplot(data_weights) +
    aes(x = age_months, y = tocs_level_prob_reached) +
    geom_line(aes(color = ordered(tocs_level)), linewidth = 1) +
    scale_color_ordinal(end = .85) +
    labs(y = "Prob. of reaching length", color = "Utterance length")
  print(p1)

  p2 <- p1 +
    aes(y = tocs_level_weight) +
    labs(y = "Weight of utterance length")
  print(p2)
}

Package 'wisclabmisc'

Help Index

Set default arguments for brms model fitting

Description

Usage

Arguments

Details

Value

Examples

Compute the percentage of points under each centile line

Description

Usage

Arguments

Value

Compute chronological age in months

Description

Usage

Arguments

Value

Examples

Create an ROC curve from observed data

Description

Usage

Arguments

Value

Examples

Compute positive and negative predictive value

Description

Usage

Arguments

Details

Value

Examples

Create an ROC curve from smoothed densities

Description

Usage

Arguments

Value

Examples

Acquisition and developmental descriptions of consonants and vowels

Description

Usage

Format

Details

Crowe and McLeod (2020) norms for English consonant acquisition

The early 8, middle 8 and late 8 (Shriberg, 1993)

Consonant acquisition features

References

Simulated intelligibility scores by utterance length

Description

Usage

Format

References

Fake intelligibility data

Description

Usage

Format

Fake speaking rate data

Description

Usage

Format

Phonetic features of consonants and vowels

Description

Usage

Format

Details

Phonetic features of consonants

Considerations about consonant phonetic features

Phonetic features of vowels

Considerations about vowel features

Rename file basenames using functions

Description

Usage

Arguments

Details

Value

Examples

Fit a beta regression model (for intelligibility)

Description

Usage