Akaike information criterion

(AIC) (particular formula for comparison of statistical models)

The Akaike information criterion (AIC) is a value (a "score") calculated from a statistical model and a sample of data. Calculating it for two or more candidate models, each with the same sample of data, allows the models to be compared on how well each balances fit to the data against complexity. Formula:

AIC = 2k - 2 ln(maxL)

k - number of estimated parameters in the model.
maxL - maximized value of the model's likelihood function, i.e., the likelihood of the sample under the best-fitting parameter values.

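A lower score is better; only the difference between scores computed on the same sample is meaningful, not a score's absolute value. An example of the type of model that might be evaluated is an initial mass function (IMF).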
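
As a minimal sketch of how the formula is applied, the Python below scores two candidate distributions against one sample. The sample values, the choice of an exponential and a lognormal model, and all function names are assumptions made for illustration; none of them come from a real IMF study. The function takes ln(maxL) directly rather than maxL, since the likelihood itself underflows for large samples.

```python
import math

def aic(k, max_log_likelihood):
    """AIC = 2k - 2 ln(maxL); the second argument is ln(maxL), not maxL."""
    return 2 * k - 2 * max_log_likelihood

def exponential_max_loglik(data):
    # One parameter: the rate, whose maximum-likelihood estimate is 1 / mean.
    n = len(data)
    rate = n / sum(data)
    return n * math.log(rate) - rate * sum(data)

def lognormal_max_loglik(data):
    # Two parameters: mean and variance of ln(x), both at their
    # maximum-likelihood estimates (variance divides by n, not n - 1).
    n = len(data)
    logs = [math.log(x) for x in data]
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * var) - 0.5 * n

# Hypothetical sample (made-up values standing in for measured masses).
masses = [0.12, 0.2, 0.25, 0.31, 0.4, 0.45, 0.6, 0.8, 1.1, 1.5, 2.3, 4.0]

scores = {
    "exponential": aic(1, exponential_max_loglik(masses)),
    "lognormal": aic(2, lognormal_max_loglik(masses)),
}
best = min(scores, key=scores.get)  # the lower score is preferred
for name, score in scores.items():
    print(f"{name}: AIC = {score:.2f}")
print("preferred:", best)
```

Both scores must come from the same sample; comparing scores computed on different data is not meaningful.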

Given some collected data, a statistical model (such as a probability density function (PDF) or probability mass function (PMF)) can be developed with the aim of matching the "real" process that generated the data. The possible forms of such models are unlimited: they can be polynomials of any degree, and can incorporate other functions such as powers, logs, roots, trigonometric functions, etc. Among functions of the same form that differ only in their constants, a criterion such as least squares is useful for choosing between them, but to choose between models of different forms, something else is needed.

A polynomial of sufficiently high degree can be constructed to exactly match the distribution of any sample data, and short of that, models with more parameters can generally come closer to matching the distribution. A criterion is needed to decide whether a model is overfitted, i.e., whether it is so specific to the sample upon which it was based that it is unlikely to fit another sample from the same source. The aim of the AIC is to comparatively evaluate the models for such overfitting: the 2k term penalizes each added parameter, so an extra parameter lowers the score only if it raises ln(maxL) by more than 1.
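
The overfitting trade-off can be illustrated with polynomial fits of increasing degree. The sketch below is only an illustration under stated assumptions: the data are synthetic (a quadratic plus Gaussian noise, generated with a fixed seed), the errors are assumed Gaussian so that -2 ln(maxL) = n ln(2 pi RSS / n) + n, where RSS is the residual sum of squares, and the helper names `polyfit_rss` and `gaussian_aic` are hypothetical.

```python
import math
import random

def polyfit_rss(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns the RSS."""
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution for the coefficients (lowest power first).
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - s) / A[i][i]
    return sum((y - sum(c * x ** p for p, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

def gaussian_aic(n, rss, degree):
    # k counts the degree + 1 coefficients plus the noise variance.
    k = degree + 2
    return 2 * k + n * math.log(2 * math.pi * rss / n) + n

random.seed(0)
xs = [i / 20 - 1 for i in range(41)]  # 41 points on [-1, 1]
ys = [1 + 2 * x - 3 * x * x + random.gauss(0, 0.3) for x in xs]  # synthetic

results = []
for degree in range(7):
    rss = polyfit_rss(xs, ys, degree)
    results.append((degree, rss, gaussian_aic(len(xs), rss, degree)))
    print(f"degree {degree}: RSS = {rss:.3f}, AIC = {results[-1][2]:.2f}")
```

The RSS can only fall as the degree rises, which is why goodness of fit alone cannot flag overfitting, whereas the AIC adds 2 for every extra coefficient and so improves only when the gain in fit outweighs that penalty.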

AIC is reasonable for certain kinds of models, and is tailored to large samples (many data points). A modified AIC (called AIC corrected or corrected AIC, abbreviated AICc) adds a second-order correction term that increases the penalty on parameters when the sample is small relative to the number of parameters. It is most useful given small samples, and it approaches the plain AIC as the sample grows.
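
A commonly quoted form of the correction is AICc = AIC + (2k^2 + 2k) / (n - k - 1), where n is the number of data points. That form is an assumption here rather than something stated above: it is derived for linear models with normally distributed errors, and other model families can call for a different correction. A minimal sketch, with a hypothetical function name:

```python
def aicc(aic_score, k, n):
    """Small-sample corrected AIC, assuming the common (2k^2 + 2k)/(n - k - 1) term."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic_score + (2 * k * k + 2 * k) / (n - k - 1)

# Size of the correction alone (AIC set to 0) for k = 3 as the sample grows.
corrections = [aicc(0.0, 3, n) for n in (10, 20, 100, 1000)]
print(corrections)
```

The correction shrinks toward zero as n increases, so for large samples the AICc and the AIC rank models the same way.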

The deviance information criterion (DIC) is a generalized version of the AIC for similarly comparing hierarchical statistical models (multilevel statistical models), which model processes that have more than one source of variation. A possible example is an IMF that also depends upon absorption redshift.

(statistics) Further reading:
https://en.wikipedia.org/wiki/Akaike_information_criterion
https://medium.com/geekculture/akaike-information-criterion-model-selection-c47df96ee9a8
https://www.scienceopen.com/document_file/0746f88a-f484-4fe5-b930-d91b0b4a850c/PubMedCentral/0746f88a-f484-4fe5-b930-d91b0b4a850c.pdf
https://www.scribbr.com/statistics/akaike-information-criterion/
https://iowabiostat.github.io/research-highlights/joe/Cavanaugh_Neath_2019.pdf
