Chapter Objectives
ㅇ To provide a more formal introduction to latent class analysis
ㅇ To review two empirical examples based on health-related data gathered on adolescents:
- Female pubertal development
- Health risk behaviors
ㅇ To introduce the notation that will be used throughout the course
ㅇ To explain the mathematical model
Example: Pubertal Development
ㅇ The data are from the public version of the Add Health
study (Udry, 2003), Wave I (1994-1995).
ㅇ The sample consists of 469 adolescents (mean age = 12.9
years) who were in seventh grade in the US.
ㅇ Only subjects who provided a response to at least one
variable were included.
ㅇ The girls were asked about changes in their bodies to
measure pubertal development.
Puberty
ㅇ Growing up can be difficult, especially with all these new
changes going on!
ㅇ So sometimes, it’s nice to have someone explain how, what
and why these things are happening.
Indicators of Pubertal Development
Contingency Table
ㅇ We might wonder whether there are any patterns of
responses that stand out as occurring more frequently than
the others.
We can create a contingency table with the four variables.
The size of the contingency table is 2 × 3 × 3 × 3 = 54 cells.
Each cell holds the number of subjects who provided a particular combination of responses to the four items (i.e., a response pattern).
ㅇ The incomplete response patterns (i.e., missing data) should be handled properly.
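The cell count and the tabulation of response patterns can be sketched in Python. This is only an illustration: the item names and the five response records below are hypothetical placeholders, not the Add Health variables or data.

```python
from collections import Counter
from itertools import product

# Hypothetical item names; one binary item and three 3-category items.
levels = {"item1": 2, "item2": 3, "item3": 3, "item4": 3}

# Every possible complete response pattern is one cell of the contingency table.
all_patterns = list(product(*(range(1, n + 1) for n in levels.values())))
n_cells = len(all_patterns)  # 2 * 3 * 3 * 3

# Made-up responses for five subjects; None marks a missing answer.
responses = [
    (1, 2, 2, 3),
    (2, 3, 3, 3),
    (1, 2, 2, 3),
    (2, 1, None, 2),
    (2, 3, 3, 3),
]

# Tabulate only the complete patterns; incomplete ones need separate handling.
complete = [r for r in responses if None not in r]
cell_counts = Counter(complete)
n_incomplete = len(responses) - len(complete)
```

Here `cell_counts` maps each observed complete pattern to its frequency, and `n_incomplete` counts the records that cannot be placed in a single cell.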
Why Conduct LCA on the Puberty Data?
To represent the complex array of the data in the
contingency table in terms of a small number of
representative response patterns (i.e., latent classes)
To provide a sense of the prevalence of each latent class
To provide a sense of the amount of error associated with
each variable in measuring the latent classes
LCA on the Puberty Data
LCA on the pubertal development data indicated that four
latent classes represented the data adequately.
1. Delayed menstrual onset (24%)
2. Biologically mature (29%)
3. Visibly mature (19%)
4. Mature (29%)
LCA for the Pubertal Development Data
Quantitative and Qualitative Differences Among Classes
The mature class was ahead of the other three classes.
The biologically mature class was ahead of the delayed
menstrual onset class.
The visibly mature class was ahead of the delayed menstrual
onset class.
However, the biologically mature girls appeared to be
developing in different ways from those in the visibly
mature class.
Latent Class Membership
The underlying idea of LCA is that everyone belongs to
one and only one latent class.
However, each individual's latent class membership is not
observed, so it is unknown to the analysis.
Instead, corresponding to each individual there is a vector
of probabilities of membership in each latent class.
Item-Response Probability
In LCA, it is the responsibility of the investigator to assign
names to the latent classes.
Interpretation of the latent classes is based on the
item-response probabilities.
A Hypothetical Example
Example: Health Risk Behaviors
The data are from the Youth Risk Behavior Survey (CDC, 2004).
The sample consists of 13,840 U.S. high school students.
The students were asked to indicate whether they had engaged in each of 12 health risk behaviors.
Indicators of Health Risk Behavior
LCA of the Health Risk Behavior Data
The Latent Class Measurement Model
There are two criteria that define a strong relation between
an observed variable and a latent variable:
1. A distribution of item-response probabilities $\rho_{mk|l}$ for
observed variable $m$ that varies across latent classes, $l = 1, \ldots, C$.
2. An array of item-response probabilities corresponding to
observed variable $m$ that are close to 1 and 0.
Distribution of $\rho$'s Across Latent Classes
Suppose there is no relation between the observed variable
m and the latent variable.
The response for the observed variable m by an individual
does not depend on class membership.
The item-response probabilities across all the latent classes
are identical:
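Written out, and assuming the notation $\rho_{mk|l}$ for the probability of response $k$ to observed variable $m$ given membership in latent class $l$, with $C$ latent classes, the no-relation case is:

```latex
\rho_{mk|1} = \rho_{mk|2} = \cdots = \rho_{mk|C}
\quad \text{for every response category } k .
```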
Distribution of $\rho$'s Across Latent Classes
Suppose observed variable m and the latent variable L are
not independent.
The item-response probabilities across all the latent classes
are not identical:
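Assuming $\rho_{mk|l}$ denotes the probability of response $k$ to observed variable $m$ given membership in latent class $l$, dependence means that the probabilities differ for at least one pair of classes:

```latex
\rho_{mk|l} \neq \rho_{mk|l'}
\quad \text{for at least one response category } k
\text{ and at least one pair of classes } l \neq l' .
```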
$\rho$'s Are Close to 0 and 1
Hypothetical Example of Independence
Homogeneity
Hypothetical Example of Strong Homogeneity
Latent Class Separation
When a set of item-response probabilities is characterized
by good latent class separation, the pattern of
item-response probabilities clearly differentiates among
the latent classes.
When there is a high degree of latent class separation, a
response pattern that has a large probability of occurrence
conditional on one latent class will have much smaller
probabilities conditional on any other latent class.
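A small numerical sketch may make this concrete. The two classes, three yes/no items, and all probabilities below are hypothetical values chosen for illustration, and the pattern probability is computed by multiplying item-response probabilities within a class (the local independence assumption).

```python
from math import prod

# Hypothetical item-response probabilities: rho[l][m][k] is the probability of
# response k to item m given latent class l (two classes, three binary items).
rho = {
    1: [{"yes": 0.9, "no": 0.1}, {"yes": 0.9, "no": 0.1}, {"yes": 0.9, "no": 0.1}],
    2: [{"yes": 0.1, "no": 0.9}, {"yes": 0.1, "no": 0.9}, {"yes": 0.1, "no": 0.9}],
}

def pattern_prob(pattern, latent_class):
    """Probability of a response pattern conditional on one latent class,
    multiplying item-response probabilities (local independence)."""
    return prod(item[resp] for item, resp in zip(rho[latent_class], pattern))

pattern = ("yes", "yes", "yes")
p_class1 = pattern_prob(pattern, 1)  # 0.9 ** 3
p_class2 = pattern_prob(pattern, 2)  # 0.1 ** 3
```

With these made-up values the all-"yes" pattern is several hundred times more probable in class 1 than in class 2, which is what a high degree of separation looks like.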
Hypothetical Example of High Degree of Class Separation
Example: Smoking Behavior
Consider a brief questionnaire to be given to college
students who smoke at least occasionally.
There are three response categories for each of the following
three items: Always / Sometimes / Never.
1. Do you smoke first thing in the morning?
2. Do you purchase packs of cigarettes exclusively for your
own use?
3. Do you go more than three consecutive days without
smoking at all?
Homogeneity and Latent Class Separation
A high degree of latent class separation implies a high
degree of homogeneity.
However, a high degree of homogeneity does not
necessarily imply a high degree of latent class separation.
LCA with perfect homogeneity and latent class separation
is not really a latent variable model.
When there is error, response patterns are observed that
would not have been observed if there were no error.
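The claim that homogeneity does not guarantee separation can be checked with a toy case. The sketch below assumes two hypothetical classes with identical item-response probabilities near 1; the class names and numbers are invented for illustration.

```python
from itertools import product
from math import prod

# Hypothetical probabilities of answering "yes" to three binary items, by class.
p_yes = {
    "class A": [0.9, 0.9, 0.9],
    "class B": [0.9, 0.9, 0.9],  # same profile as class A: no separation
}

def pattern_probs(probs):
    """Probability of each yes/no pattern within one class (local independence)."""
    return {
        pattern: prod(p if r else 1 - p for p, r in zip(probs, pattern))
        for pattern in product([True, False], repeat=len(probs))
    }

by_class = {name: pattern_probs(probs) for name, probs in p_yes.items()}

# Most likely pattern in each class and its within-class probability.
modal = {name: max(d.items(), key=lambda kv: kv[1]) for name, d in by_class.items()}
```

Each class concentrates its probability on a single pattern, so each is homogeneous, but it is the same pattern in both classes, so observing it says nothing about which class a respondent belongs to.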
Bernoulli Trial
Multinomial Distribution
Independent Multinomials
Latent Class Analysis
An LCA is made up of estimated latent class prevalences
and item-response probabilities that can be used to obtain
expected cell proportions for the contingency table.
If the model fits the data well, the expected cell
proportions closely match the observed cell proportions.
We assume that there is no missing data on the observed
indicator variables (missing data will be discussed in
Chapter 4).
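How prevalences and item-response probabilities combine into expected cell proportions can be sketched as follows. The estimates are hypothetical (two classes, three binary items), and the names `gamma` and `rho` are labels assumed here for the prevalences and item-response probabilities.

```python
from itertools import product
from math import prod

# Hypothetical estimates for two latent classes and three binary items.
gamma = [0.6, 0.4]          # latent class prevalences (sum to 1)
rho = [                     # rho[l][m] = P(item m = 1 | class l)
    [0.9, 0.8, 0.7],
    [0.2, 0.1, 0.3],
]

def expected_proportion(pattern):
    """Expected proportion of one cell: sum over classes of the prevalence
    times the product of item-response probabilities."""
    return sum(
        g * prod(p if y == 1 else 1 - p for p, y in zip(probs, pattern))
        for g, probs in zip(gamma, rho)
    )

# One expected proportion per cell of the 2 x 2 x 2 contingency table.
expected = {pat: expected_proportion(pat) for pat in product([1, 0], repeat=3)}
total = sum(expected.values())
```

The expected proportions sum to 1 across the eight cells; comparing them with the observed proportions is the basis for judging model fit.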
Notation
Complete Data Likelihood
Model Assumption
Local Independence
There is one fundamental assumption made by latent class
models, called the local independence assumption.
The assumption of local independence specifies that
conditional on the latent variable, the observed variables
are independent.
The Local Independence Assumption
The aim of an LCA is to determine whether the
dependencies between the observed variables may be
explained by a small number of latent classes.
Here, "explaining" means that the observed variables are
assumed to be conditionally independent given the latent
class membership.
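In symbols, local independence and the marginal probability of a response pattern that follows from it can be sketched as below. The notation is partly assumed: $\rho_{mk|l}$ is the probability of response $k$ to variable $m$ given class $l$, while $\gamma_l$ (prevalence of class $l$), $M$ (number of observed variables), and $\mathbf{y} = (y_1, \ldots, y_M)$ (a response pattern) are labels introduced here for illustration.

```latex
P(\mathbf{Y} = \mathbf{y} \mid L = l) = \prod_{m=1}^{M} \rho_{m y_m | l},
\qquad
P(\mathbf{Y} = \mathbf{y}) = \sum_{l=1}^{C} \gamma_l \prod_{m=1}^{M} \rho_{m y_m | l}.
```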
Likelihood Function
Expected Response Probabilities
Find the expected response probabilities under the
following LCA estimates.
Posterior Probabilities of Latent Class Membership
Posterior Probability
Three Examples of Smoking Behavior
Posterior Probability
Even in the model with high homogeneity and latent class
separation, there are individuals for whom there is much
classification uncertainty.
It is nearly inevitable that in any empirical data set at least
a few individuals will have response patterns for which
posterior classification is very uncertain.
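A toy calculation shows how classification uncertainty can arise even with strong homogeneity and separation. The sketch applies Bayes' rule to hypothetical estimates (two equally prevalent classes, four binary items); the names `gamma` and `rho` and all values are assumptions made for illustration.

```python
from math import prod

# Hypothetical estimates: two classes, four binary items.
gamma = [0.5, 0.5]          # latent class prevalences
rho = [                     # rho[l][m] = P(item m = 1 | class l)
    [0.9, 0.9, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.1],
]

def posterior(pattern):
    """Posterior class-membership probabilities for one response pattern,
    obtained with Bayes' rule under local independence."""
    joint = [
        g * prod(p if y == 1 else 1 - p for p, y in zip(probs, pattern))
        for g, probs in zip(gamma, rho)
    ]
    total = sum(joint)
    return [j / total for j in joint]

clear = posterior((1, 1, 1, 1))   # pattern typical of the first class
mixed = posterior((1, 1, 0, 0))   # pattern that fits neither class well
```

The all-ones pattern is assigned to the first class almost with certainty, while the mixed pattern is equally likely under both classes, so its posterior probabilities are split evenly.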
Mean Posterior Probabilities