The Crowded-field problem

High spatial densities of activated molecules can result in a ‘‘crowded field problem,’’ in which single molecules are not adequately resolved. The reason for this is the limitation of the PSF models described above to fit only a single molecule. To help solve this problem, ThunderSTORM uses a multiple-emitter fitting analysis (MFA) approach similar to the algorithm described in [2].

Multiple-emitter fitting analysis

The multiple-emitter fitting analysis approach uses a PSF defined by a model

\mathrm{PSF}_{N}\left(x,y\mid\boldsymbol{\Theta}\right)=\sum_{i=1}^{N}\mathrm{%
PSF}\left(x,y\mid\boldsymbol{\theta}_{i}\right)\,, (1)

where N is a number of molecules allowed in the fitting region, and \boldsymbol{\Theta}=\left[\boldsymbol{\theta}_{1},\ldots,\boldsymbol{\theta}_{%
N}\right] are parameters describing position and shape of the imaged molecules modeled by the PSF.

The fitting of multiple-emitter models to the raw data proceeds according to the following algorithm. First, the algorithm fits \mathrm{PSF}_{1} (a single molecule model) with an initial molecular position. The fitted PSF is subtracted from the raw data and the position of the maximum intensity value in the residual image is taken as an approximate position of a second molecule. The fitting is now repeated on the raw data with \mathrm{PSF}_{2} (a model containing two molecules) and with the initial positions estimated in the previous steps. The result of the fit is subtracted from the raw data to find an approximate position of a third molecule in the residual image. This routine is repeated until the maximum number of molecules allowed in the fitting region is reached. Note that initial positions of the molecules are further adjusted, during the multiple-emitter fitting analysis, by a ‘‘Push&Pull" process [2]. To find the optimal number of molecules, statistical tests are required, see Section Model selection.

Users can specify the size of the fitting region, the maximum number of molecules allowed in one fitting region, the type of PSF, and the fitting method (least-squares methods or maximum likelihood estimation). Optionally, users can constrain the multiple-emitter fitting algorithm such that all fitted molecules have the same intensity or an intensity in a given range. The background offset is constrained to the same intensity for all fitted molecules.

Model selection

Because a model with more parameters will always be able to fit the data at least as well as a model with fewer parameters [3, 1], statistical tests are required to determine whether the more complex model provides a significantly better fit of the underlying data. Statistical tests are usually based on pair-wise model comparison. Here a fit by \mathrm{PSF}_{1} is compared with a fit by \mathrm{PSF}_{2}, the better of the two is compared with a fit by \mathrm{PSF}_{3}, etc. Pair-wise comparisons are based on an F-test [3, 1] or on a log-likelihood ratio test [3, 2] as described below.

F-test

An F-test [3, 1] arises in the case of fitting by least squares methods, when we need to compare significance of the fit between two models, where one model (the null model) is a special case of the other (the alternative model) for some choice of parameters. The F-test statistic computed from the data is given by the formula

F_{\chi^{2}}\left(\mathcal{D}\right)=\frac{\left(\chi^{2}\left(\boldsymbol{%
\Theta}_{0}\mid\mathcal{D}\right)-\chi^{2}\left(\boldsymbol{\Theta}_{1}\mid%
\mathcal{D}\right)\right)/\left(v_{1}-v_{0}\right)}{\chi^{2}\left(\boldsymbol{%
\Theta}_{1}\mid\mathcal{D}\right)/\left(n-v_{1}\right)}\,, (2)

where the sum of squared residuals \chi^{2}\left(\boldsymbol{\Theta}\mid\mathcal{D}\right) computed for a model with parameters \boldsymbol{\Theta} is defined here, vectors \boldsymbol{\Theta}_{0} and \boldsymbol{\Theta}_{1} are parameters of the null and alternative model, respectively, v_{0} and v_{1} (where v_{0}<v_{1}) represent the number of free parameters of the null and alternative model, respectively, and n=l^{2} is the number of data points within the fitting region \mathcal{D}.

Assuming the null hypothesis that the alternative model does not provide a significantly better fit than the null model, the F-test statistics computed in Equation (2) has an F-distribution with \left(v_{1}-v_{0},n-v_{1}\right) degrees of freedom. The null hypothesis is rejected if F_{\chi^{2}}\left(\mathcal{D}\right) computed from the data is greater than the critical value of the F_{v_{1}-v_{0},n-v_{1}} distribution for a user-specified p-value.

Log-likelihood ratio test

To compare between the fits of two models, in the case of fitting by the maximum likelihood estimation method, we use a model selection criteria based on a log-likelihood ratio test [3, 2]. Assuming that one model (the null model) is a special case of the other (the alternative model) for some choice of parameters, the log-likelihood ratio is given by the formula

\Lambda\left(\mathcal{D}\right)=-2\ln\left[\frac{L\left(\boldsymbol{\Theta}_{0%
}\mid\mathcal{D}\right)}{L\left(\boldsymbol{\Theta}_{1}\mid\mathcal{D}\right)}%
\right]\,, (3)

where the likelihood L\left(\boldsymbol{\Theta}\mid\mathcal{D}\right) of parameters \boldsymbol{\Theta} is defined here, \boldsymbol{\Theta}_{0} and \boldsymbol{\Theta}_{1} are the parameters of the null and alternative model, respectively, and \mathcal{D} is a fitting region.

The probability distribution of the log-likelihood ratio computed in Equation (3), assuming the null hypothesis that the alternative model does not provide a significantly better fit than the null model, can be approximated by the \chi^{2} distribution with v_{1}-v_{0} degrees of freedom. This approximation is usually valid even for small sample sizes [3]. Here v_{0} and v_{1} (where v_{0}<v_{1}) represent the number of free parameters of the null and alternative models, respectively. The null hypothesis is rejected if the log-likelihood ratio \Lambda\left(\mathcal{D}\right) computed from the data is greater than the critical value of the \chi_{v_{1}-v_{0}}^{2} distribution for some p-value specified by the users.

Guidelines for the choice of parameters

Multiple emitter fitting analysis (MFA) can be used in high-density data to estimate the number of molecules detected as a single blob. We recommend setting 3 to 5 molecules per fitting region. The stability of the algorithm is improved if the molecular brightness is limited to a realistic range (perhaps 300 to 5000 photons), depending on the sample. This range can be estimated by ThunderSTORM by processing a few frames of the data and evaluating the number of detected photons from each molecule using the histogram function (this relies on correct entry of the camera parameters). The algorithm can also be more stable by forcing all molecules to have the same intensity. Unfortunately, multiple emitter analysis is a computationally costly method, and may run quite slowly depending on the user-specified maximum number of molecules within the fitting area. Note that the size of the fitting radius might need to be increased slightly to accommodate larger blobs.

References