#3. Parameter Estimation and Model Selection


Latent Variable Modeling

3. Parameter Estimation and Model Selection

Model estimation

Parameters can be estimated by maximum-likelihood (ML) or Bayesian methods.
In LCA, ML estimation uses the EM algorithm or the Newton-Raphson algorithm.
Upon convergence, standard errors are obtained by inverting the negative Hessian matrix,
that is, the negative second-derivative matrix of the loglikelihood function, and taking the square roots of its diagonal.
Like other finite mixture models, LCA can have unusual features in its likelihood function.
These can cause difficulties for standard ML and Bayesian methods.
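
To make the standard-error statement concrete, here is a minimal sketch on a deliberately simple one-parameter model (a Bernoulli proportion, not an LCA). The function names, the finite-difference step `h`, and the counts are assumptions chosen for illustration; an LCA would need the full Hessian matrix over all free parameters.

```python
import math

def bernoulli_loglik(p, successes, n):
    """Loglikelihood of a Bernoulli proportion p given the observed counts."""
    return successes * math.log(p) + (n - successes) * math.log(1 - p)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# Made-up data: 30 successes out of 100 trials.
successes, n = 30, 100
p_hat = successes / n  # ML estimate of the proportion

# Hessian of the loglikelihood at the ML estimate (a 1x1 "matrix" here).
hessian = second_derivative(lambda p: bernoulli_loglik(p, successes, n), p_hat)

# Invert the negative Hessian and take the square root.
standard_error = math.sqrt(1.0 / -hessian)
```

For this model the result can be checked against the textbook formula $\sqrt{\hat{p}(1-\hat{p})/n}$.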

Estimating Model Parameters

LCA has two sets of parameters:
latent class prevalences $\gamma$ and item-response probabilities $\rho$.
We would like to maximize the (log)likelihood function:

$$L = \prod_{i=1}^{n}P(\mathbf{Y}=\mathbf{y}_{i})
= \prod_{i=1}^{n}\sum_{l=1}^{C}\gamma_{l}\prod_{m=1}^{M}\prod_{k=1}^{r_{m}}\rho_{mk|l}^{I(y_{im}=k)}$$
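
The likelihood above can be evaluated directly for small problems. The sketch below assumes responses are coded `0..r_m-1` and that `gamma` and `rho` are plain nested lists; the function name `lca_loglik` and all numbers are made up for illustration.

```python
import math

def lca_loglik(Y, gamma, rho):
    """Observed-data loglikelihood of a latent class model.

    Y[i][m]      : response of individual i to item m, coded 0..r_m-1
    gamma[l]     : prevalence of class l
    rho[l][m][k] : P(item m = k | class l)
    """
    total = 0.0
    for y in Y:
        # Mixture over classes: sum_l gamma_l * prod_m rho_{m, y_m | l}
        p_y = sum(
            g * math.prod(rho_l[m][k] for m, k in enumerate(y))
            for g, rho_l in zip(gamma, rho)
        )
        total += math.log(p_y)
    return total

# Toy example: 2 classes, 2 binary items (all numbers made up).
gamma = [0.6, 0.4]
rho = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 0
    [[0.2, 0.8], [0.3, 0.7]],  # class 1
]
Y = [[0, 0], [1, 1], [0, 1]]
loglik = lca_loglik(Y, gamma, rho)
```

The indicator exponent $I(y_{im}=k)$ simply picks out the probability of the category that was actually observed, which is why the code indexes `rho_l[m][k]` with the observed response.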

Closed-form estimates of the model parameters are not available for LCA.
Parameter estimation in LCA therefore relies on some version of an iterative procedure:
- EM algorithm (Expectation-Maximization)
- Newton-Raphson algorithm
- Hybrid algorithm

EM Algorithm

Direct maximization of the loglikelihood is complicated.
If class membership were known, we could easily maximize the loglikelihood.

$$
L^{\star} = \prod_{i=1}^{n}P(\mathbf{Y}=\mathbf{y}_{i}, L=l_{i})
=\prod_{i=1}^{n}\gamma_{l_{i}}\prod_{m=1}^{M}\prod_{k=1}^{r_{m}}\rho_{mk|l_{i}}^{I(y_{im}=k)}
$$

Iterating the two steps (E-step and M-step) produces a sequence of parameter estimates that converges reliably to a local or global maximum of the loglikelihood.

E-step

We compute the $\textbf{posterior probability}$ of class membership for each individual with responses $\mathbf{y}_{i} = (y_{i1},\ldots,y_{iM})$, $i=1,\ldots,n$:
$$\theta_{il}=P(L=l \mid \mathbf{Y}=\mathbf{y}_{i})$$
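
By Bayes' rule, $\theta_{il}$ is the class-$l$ term of the mixture divided by the sum over all classes. A minimal sketch of that computation follows, assuming 0-based response coding and nested-list parameters; the name `e_step` and the numbers are illustrative, not part of the lecture.

```python
import math

def e_step(Y, gamma, rho):
    """Return theta[i][l] = P(L = l | Y = y_i) by Bayes' rule."""
    theta = []
    for y in Y:
        # Joint probability of class l and the observed responses y.
        joint = [
            g * math.prod(rho_l[m][k] for m, k in enumerate(y))
            for g, rho_l in zip(gamma, rho)
        ]
        marginal = sum(joint)  # P(Y = y), the normalizing constant
        theta.append([j / marginal for j in joint])
    return theta

# Toy example: 2 classes, 2 binary items (all numbers made up).
gamma = [0.6, 0.4]
rho = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 0
    [[0.2, 0.8], [0.3, 0.7]],  # class 1
]
theta = e_step([[0, 0], [1, 1]], gamma, rho)
```

Each row of `theta` sums to one, so it can be read as a soft class assignment for that individual.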

M-step

In the M-step, parameter estimates are updated by maximizing the expected complete-data loglikelihood, treating the class membership as if it were observed.
The expected complete-data loglikelihood with respect to the model parameters can be written as
$$E[\log L^{\star}] = \sum_{i=1}^{n}\sum_{l=1}^{C}\theta_{il}\left[\log\gamma_{l} + \sum_{m=1}^{M}\sum_{k=1}^{r_{m}}I(y_{im}=k)\log\rho_{mk|l}\right]$$
The expected complete-data loglikelihood is the sum of two multinomial loglikelihoods with fractional counts, one for $\gamma$ and one for $\rho$.
We update the parameter estimates by
$$\hat{\gamma}_{l} = \frac{\sum_{i=1}^{n}\theta_{il}}{n}$$
$$\hat{\rho}_{mk|l}=\frac{\sum_{i=1}^{n}\theta_{il}I(y_{im}=k)}{\sum_{i=1}^{n}\theta_{il}}$$
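
Putting the E-step and the two update formulas together gives a complete EM loop. The following is a sketch under stated assumptions: responses coded `0..r_m-1`, no missing data, user-supplied starting values, and a stopping rule based on the loglikelihood gain. The names `e_step`, `m_step`, `fit_em`, the tolerance, and the toy data are all hypothetical.

```python
import math

def e_step(Y, gamma, rho):
    """Posterior class probabilities theta[i][l] and the current loglikelihood."""
    theta, loglik = [], 0.0
    for y in Y:
        joint = [
            g * math.prod(rho_l[m][k] for m, k in enumerate(y))
            for g, rho_l in zip(gamma, rho)
        ]
        marginal = sum(joint)
        theta.append([j / marginal for j in joint])
        loglik += math.log(marginal)
    return theta, loglik

def m_step(Y, theta, n_cats):
    """Closed-form updates: weighted proportions with fractional counts theta."""
    n, n_classes = len(Y), len(theta[0])
    weight = [sum(t[l] for t in theta) for l in range(n_classes)]
    gamma = [w / n for w in weight]
    rho = [
        [
            [
                sum(t[l] for t, y in zip(theta, Y) if y[m] == k) / weight[l]
                for k in range(r_m)
            ]
            for m, r_m in enumerate(n_cats)
        ]
        for l in range(n_classes)
    ]
    return gamma, rho

def fit_em(Y, gamma, rho, n_cats, max_iter=500, tol=1e-8):
    """Alternate E- and M-steps until the loglikelihood gain drops below tol."""
    history = []
    for _ in range(max_iter):
        theta, loglik = e_step(Y, gamma, rho)
        history.append(loglik)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        gamma, rho = m_step(Y, theta, n_cats)
    return gamma, rho, history

# Toy data: 8 individuals, 3 binary items (made up).
Y = [[0, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 0],
     [1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 0, 1]]
start_gamma = [0.5, 0.5]
start_rho = [[[0.7, 0.3] for _ in range(3)], [[0.3, 0.7] for _ in range(3)]]
gamma_hat, rho_hat, history = fit_em(Y, start_gamma, start_rho, n_cats=[2, 2, 2])
```

Because EM only guarantees a local maximum, it is common practice to rerun such a loop from several starting values and keep the solution with the highest loglikelihood.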

Missing Data Estimation

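Missing data occur in most empirical data sets, so most LCA applications have to deal with them.
The approach here handles data that are missing completely at random (MCAR) or missing at random (MAR).
In the E-step, the conditional probability $\theta_{il}$ is calculated using only the observed responses in $\mathbf{y}_{i}$.
$$
\theta_{il}^{obs} = P(L=l \mid \mathbf{Y}_{i}^{obs}=\mathbf{y}_{i}^{obs})
$$
In the M-step, we update the parameter estimates with $\theta_{il}^{obs}$ in place of $\theta_{il}$, using only the observed responses to each item.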
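
One common way to implement this is to skip missing items in the E-step product and to restrict each item's M-step sums to the individuals who answered it. The sketch below follows that approach as an assumption, since the lecture's own formulas are not reproduced here. Missing responses are coded as `None`, and the names `e_step_obs` and `m_step_obs` are hypothetical.

```python
import math

def e_step_obs(Y, gamma, rho):
    """theta[i][l] = P(L = l | observed part of y_i); None marks a missing item."""
    theta = []
    for y in Y:
        joint = [
            g * math.prod(rho_l[m][k] for m, k in enumerate(y) if k is not None)
            for g, rho_l in zip(gamma, rho)
        ]
        marginal = sum(joint)
        theta.append([j / marginal for j in joint])
    return theta

def m_step_obs(Y, theta, n_cats):
    """Update gamma from everyone, and rho for item m from those who answered m."""
    n, n_classes = len(Y), len(theta[0])
    gamma = [sum(t[l] for t in theta) / n for l in range(n_classes)]
    rho = []
    for l in range(n_classes):
        rho_l = []
        for m, r_m in enumerate(n_cats):
            # Fractional count of class l among individuals with item m observed.
            denom = sum(t[l] for t, y in zip(theta, Y) if y[m] is not None)
            rho_l.append([
                sum(t[l] for t, y in zip(theta, Y) if y[m] == k) / denom
                for k in range(r_m)
            ])
        rho.append(rho_l)
    return gamma, rho

# Toy example: 2 classes, 2 binary items, some responses missing (made up).
gamma = [0.6, 0.4]
rho = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 0
    [[0.2, 0.8], [0.3, 0.7]],  # class 1
]
Y = [[0, None], [None, None], [1, 1], [0, 0], [None, 1]]
theta_obs = e_step_obs(Y, gamma, rho)
gamma_new, rho_new = m_step_obs(Y, theta_obs, n_cats=[2, 2])
```

An individual with no observed responses contributes an empty product, so the posterior falls back to the prevalences $\gamma$.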

Two General Ways to Assess a Model

Absolute Model fit

Relative Model fit

Many scientists assess only relative model fit. This says nothing about whether the best model in a set of competing models actually fits well.

Model Selection Criteria

Model assessment methods under consideration:
- The likelihood-ratio statistic (LRT)
- Bootstrap LRT
- Posterior predictive check distribution (PPCD)
- AIC
- BIC

The Likelihood-Ratio Statistic

The LRT is the standard statistic for assessing model fit in LCA.

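$$G^2 = 2\sum_{r=1}^{npatt}O_{r}\log\frac{O_{r}}{E_{r}} = -2\left(\log L_{LCA}-\log L_{sat}\right) \sim \chi^2_{df},$$

where $npatt$ is the number of possible response patterns, $O_{r}$ and $E_{r}$ are the observed and model-expected frequencies of response pattern $r$, and $L_{sat}$ is the likelihood of the saturated model.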

The larger $G^2$ is, the stronger the evidence against the null hypothesis that the model fits the data.
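
As an illustration, $G^2$ can be computed by tabulating the observed response patterns and comparing them with the frequencies the model expects. The sketch assumes complete data and 0-based response coding. The degrees of freedom use the usual count $npatt - params - 1$ with $(C-1) + C\sum_m (r_m - 1)$ free parameters, which is an assumption added here rather than something stated above. The helper names and the parameter values (which are made up, not ML estimates) are hypothetical.

```python
import math
from collections import Counter

def pattern_prob(pattern, gamma, rho):
    """Model probability of one complete response pattern."""
    return sum(
        g * math.prod(rho_l[m][k] for m, k in enumerate(pattern))
        for g, rho_l in zip(gamma, rho)
    )

def g_squared(Y, gamma, rho, n_cats):
    """Likelihood-ratio statistic G^2 and its degrees of freedom."""
    n = len(Y)
    observed = Counter(tuple(y) for y in Y)
    g2 = 0.0
    # Patterns with O_r = 0 contribute nothing, so only observed ones are visited.
    for pattern, o_r in observed.items():
        e_r = n * pattern_prob(pattern, gamma, rho)
        g2 += 2.0 * o_r * math.log(o_r / e_r)
    npatt = math.prod(n_cats)
    n_params = (len(gamma) - 1) + len(gamma) * sum(r - 1 for r in n_cats)
    return g2, npatt - n_params - 1

# Toy example: 2 classes, 4 binary items, 10 individuals (all numbers made up).
gamma = [0.5, 0.5]
rho = [[[0.8, 0.2] for _ in range(4)], [[0.2, 0.8] for _ in range(4)]]
n_cats = [2, 2, 2, 2]
Y = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1],
     [1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 0]]
g2, df = g_squared(Y, gamma, rho, n_cats)
```

With only 10 individuals spread over 16 possible patterns the table is sparse, which is exactly the situation in which the chi-square reference distribution becomes unreliable.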

When some parameters are fixed, we can use the LRT difference to compare the constrained model with the freely estimated one.
The difference in LRT has an asymptotic chi-square distribution with degrees of freedom equal to the difference in the number of parameters.
The standard LRT may NOT be a valid statistic for assessing the absolute fit of an LCA that involves a large contingency table with many degrees of freedom.

The Likelihood-Ratio Statistic and Missing Data
Alternatives to the Standard LRT

Information Criteria
- Akaike information criterion (AIC)
  $AIC = G^2 + 2 \times params$
- Bayesian information criterion (BIC)
  $BIC = G^2 + \log(n) \times params$
- Consistent AIC (CAIC)
  $CAIC = G^2 + params \times (\log(n)+1)$

The lower the AIC or BIC values, the better the model.
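
The three criteria differ only in how heavily they penalize the number of free parameters. A small sketch, taking $params$ as the number of freely estimated parameters and $n$ as the sample size; the function name and the $G^2$ values for the two candidate models are made up for illustration.

```python
import math

def information_criteria(g2, n_params, n):
    """AIC, BIC and CAIC computed from G^2, the parameter count and sample size."""
    return {
        "AIC": g2 + 2 * n_params,
        "BIC": g2 + math.log(n) * n_params,
        "CAIC": g2 + n_params * (math.log(n) + 1),
    }

# Hypothetical comparison of a 2-class and a 3-class model on n = 500 cases.
n = 500
two_class = information_criteria(g2=48.3, n_params=9, n=n)
three_class = information_criteria(g2=30.1, n_params=14, n=n)

# For each criterion, the preferred model is the one with the lower value.
preferred = {
    name: "2-class" if two_class[name] < three_class[name] else "3-class"
    for name in two_class
}
```

In this made-up comparison AIC favors the larger model while BIC and CAIC favor the smaller one, which shows how the stronger $\log(n)$ penalty can change the choice.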
