- LEA is an R package dedicated to landscape genomics and ecological association tests. LEA can run analyses of population structure and genome scans for local adaptation. It includes statistical methods for estimating ancestry coefficients from large genotypic matrices and evaluating the number of ancestral populations (sNMF, PCA), for identifying genetic polymorphisms that are highly correlated with an environmental gradient or with variables used as proxies for ecological pressures (LFMM), and for controlling the false discovery rate. LEA is mainly based on optimized C programs that scale to very large data sets.
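To make the idea of controlling the false discovery rate concrete, here is a minimal Python sketch of the Benjamini-Hochberg procedure. The text above does not say which procedure LEA uses, so the choice of Benjamini-Hochberg is an assumption. The function name `benjamini_hochberg` and the p-values are hypothetical and serve only as illustration.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Return the indices of tests declared significant at FDR level q.

    Illustrative helper, not part of LEA.
    """
    m = len(p_values)
    # Sort test indices by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= k * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    # Every test ranked at or below the cutoff is a discovery.
    return sorted(order[:cutoff])


# Hypothetical p-values for eight loci (illustrative numbers only).
p_values = [0.001, 0.30, 0.012, 0.74, 0.004, 0.51, 0.045, 0.90]
print(benjamini_hochberg(p_values, q=0.1))  # prints [0, 2, 4, 6]
```

In a genome scan, the p-values would come from one association test per locus, and the returned indices would be the candidate loci retained at the chosen expected false discovery rate.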
sNMF: Inference of individual admixture coefficients, which is important for population genetic and association studies, is commonly performed using compute-intensive likelihood algorithms. With the availability of large population genomic data sets, fast versions of likelihood algorithms have attracted considerable attention. Reducing the computational burden of estimation algorithms, however, remains a major challenge.
sNMF is a fast and efficient program for estimating individual admixture coefficients based on sparse non-negative matrix factorization and population genetics. We have successfully applied sNMF to large human and plant genomic data sets and compared its performance with that of the likelihood algorithm implemented in the computer program ADMIXTURE. Without loss of accuracy, sNMF computed estimates of admixture coefficients approximately 10 to 30 times faster than ADMIXTURE.
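The following Python sketch illustrates the general idea of reading admixture coefficients off a non-negative matrix factorization. It is not the sNMF algorithm: it swaps in plain multiplicative updates on a raw genotype matrix, with no sparsity penalty, and it assumes that rescaling each row of the first factor to sum to one is enough to interpret it as ancestry proportions. All names (`nmf_ancestry`, `matmul`, `transpose`) and the toy genotypes are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf_ancestry(genotypes, k, iterations=200, seed=0, eps=1e-9):
    """Factor genotypes (n x L) as W (n x k) times H (k x L), all non-negative,
    then rescale each row of W to sum to one."""
    rng = random.Random(seed)
    n, n_loci = len(genotypes), len(genotypes[0])
    # Strictly positive random starting values.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(n_loci)] for _ in range(k)]
    for _ in range(iterations):
        # Multiplicative update for H: H <- H * (W^T G) / (W^T W H).
        wt = transpose(w)
        num, den = matmul(wt, genotypes), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n_loci)]
             for a in range(k)]
        # Multiplicative update for W: W <- W * (G H^T) / (W H H^T).
        ht = transpose(h)
        num, den = matmul(genotypes, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    # Normalize rows so that each individual's coefficients sum to one.
    q = []
    for row in w:
        total = sum(row)
        q.append([x / total for x in row] if total > 0 else [1.0 / k] * k)
    return q


# Toy genotypes (allele counts 0, 1, 2): five individuals, six loci.
genotypes = [
    [2, 2, 2, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 2, 2],
    [0, 0, 1, 2, 2, 2],
    [1, 1, 1, 1, 1, 1],
]
for row in nmf_ancestry(genotypes, k=2):
    print([round(x, 2) for x in row])
```

Each printed row holds one individual's estimated proportions for the two assumed ancestral populations. The sketch is cubic-time pure Python and is meant only to show the shape of the computation, not to reproduce the speed or accuracy reported for sNMF.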
LFMM: Adaptation to local environments often occurs through natural selection acting on a large number of alleles, each having a weak phenotypic effect. One way to detect those alleles is to identify genetic polymorphisms that are highly correlated with an environmental gradient or with variables used as proxies for ecological pressures.
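As a rough illustration of screening loci for correlation with an environmental gradient, the Python sketch below ranks loci by the absolute Pearson correlation between allele counts and a single environmental variable. This naive scan is an assumption made for illustration and is not the LFMM method: it ignores population structure, which can produce spurious associations. The function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable (e.g. a monomorphic locus) carries no signal
    return cov / sqrt(var_x * var_y)


def rank_loci(genotypes, environment):
    """Return (locus index, correlation) pairs sorted by decreasing |r|."""
    scores = [(j, pearson([row[j] for row in genotypes], environment))
              for j in range(len(genotypes[0]))]
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


# Toy data: five individuals (rows), three loci (columns), one gradient.
genotypes = [
    [0, 1, 2],
    [0, 1, 1],
    [1, 1, 1],
    [2, 1, 0],
    [2, 1, 1],
]
environment = [1.0, 2.0, 3.0, 4.0, 5.0]  # e.g. a hypothetical temperature gradient

for locus, r in rank_loci(genotypes, environment):
    print(locus, round(r, 3))
```

Here locus 0 tracks the gradient most closely, locus 2 shows a weaker negative association, and the monomorphic locus 1 scores zero.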
The LFMM computer program includes an integrated framework based on population genetics, ecological modeling and statistical techniques for screening genomes for signatures of local adaptation.
It implements fast algorithms using latent factor mixed models based on a variant of Bayesian principal component analysis in which residual population structure is introduced via unobserved factors.
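For readers who prefer a formula, a latent factor mixed model of this kind is commonly written as the regression below. The notation is an assumption introduced here for illustration and does not appear in the text above: $G_{i\ell}$ is the genotype of individual $i$ at locus $\ell$, $X_i$ is the vector of environmental variables for individual $i$, $\mu_\ell$ is a locus-specific mean, $\beta_\ell$ holds the environmental effects at locus $\ell$, $U_i$ and $V_\ell$ are the $K$ unobserved factors and their loadings, and $\epsilon_{i\ell}$ is a residual error.

```latex
G_{i\ell} = \mu_\ell + \beta_\ell^{\top} X_i + U_i^{\top} V_\ell + \epsilon_{i\ell}
```

Under this reading, the term $U_i^{\top} V_\ell$ absorbs residual population structure, and loci whose $\beta_\ell$ departs clearly from zero are the candidates for local adaptation.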