Frequently Asked Questions

Can I run STRUCTURE analyses from the R command line using LEA?

Answer: Yes. The "snmf" function produces results very similar to those of STRUCTURE. It is faster and lets you explore the data more efficiently. The number of clusters is chosen with a cross-validation criterion (as in ADMIXTURE), which may be more reliable than the methods used with STRUCTURE.
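As a minimal sketch of this workflow, the example below fits "snmf" for several values of K on the tutorial data shipped with LEA and picks the K with the lowest mean cross-entropy. The file name, the range of K, and the number of repetitions are illustrative assumptions, not recommendations.

```r
library(LEA)

# Example genotypes shipped with LEA, written in the geno format.
# "genotypes.geno" is an arbitrary file name chosen for this sketch.
data("tutorial")
write.geno(tutorial.R, "genotypes.geno")

# Fit K = 1..6 (assumed range); entropy = TRUE requests the
# cross-entropy criterion for each run.
k_values <- 1:6
project <- snmf("genotypes.geno", K = k_values, entropy = TRUE,
                repetitions = 3, project = "new")

# Mean cross-entropy over repetitions, for each K.
mean_ce <- sapply(k_values, function(k) mean(cross.entropy(project, K = k)))
best_K <- k_values[which.min(mean_ce)]

# Ancestry coefficients (Q matrix) from the best run at the chosen K.
best_run <- which.min(cross.entropy(project, K = best_K))
q_matrix <- Q(project, K = best_K, run = best_run)

# STRUCTURE-like bar plot: one bar per individual.
barplot(t(q_matrix), border = NA, space = 0,
        xlab = "Individuals", ylab = "Ancestry proportions")
```

In practice the cross-entropy curve is usually inspected visually as well, since it may flatten out rather than show a sharp minimum.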

How do we convert data in the STRUCTURE format (e.g., microsatellites) to the lfmm/geno format?

Answer: Use the struct2geno() function from the latest release of LEA (from GitHub). Note that the function accepts data for haploid and diploid genotypes only.

Can "lfmm" and "lfmm2" analyse several environmental variables in a single run?

Answer: Yes. lfmm2 performs multidimensional regressions and can test the set of environmental predictors simultaneously (or one after the other). lfmm can also analyse several variables, but it runs a distinct model for each variable. We recommend avoiding this option (unless you are clear about the process), and we suggest creating a separate project for each variable.
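A minimal sketch of the two lfmm2 testing modes, run on simulated data. The genotypes, the two environmental variables (named "temperature" and "precipitation" here), and the choice K = 2 are all assumptions made for illustration only.

```r
library(LEA)

set.seed(1)
n <- 100   # individuals
L <- 500   # loci

# Two hypothetical environmental variables (columns of X).
X <- cbind(temperature = rnorm(n), precipitation = rnorm(n))

# Simulated diploid genotypes coded 0/1/2, with no missing values.
Y <- matrix(rbinom(n * L, size = 2, prob = 0.3), nrow = n, ncol = L)

# Estimate the latent factor model once, using all predictors.
mod <- lfmm2(input = Y, env = X, K = 2)

# Joint test: all predictors tested simultaneously at each locus.
test_joint <- lfmm2.test(object = mod, input = Y, env = X, full = TRUE)

# Separate tests: each predictor tested one after the other.
test_each <- lfmm2.test(object = mod, input = Y, env = X, full = FALSE)

# The joint test yields one p-value per locus.
plot(-log10(test_joint$pvalues), pch = 19, cex = 0.4,
     xlab = "Locus", ylab = "-log10(p-value)")
```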

What should I do when I have many correlated environmental variables?

Answer: Since variables are used as proxies for ecological pressures, we suggest running a principal component analysis (prcomp) on those variables, and using the principal components of the environmental predictors, or of groups of predictors (e.g., temperature-related predictors, precipitation-related predictors, etc.), as the variables in the lfmm analysis.
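Both options can be sketched with base R as follows. The environmental table is simulated, and its column names and the grouping into temperature and precipitation variables are hypothetical.

```r
set.seed(42)
n <- 60

# Simulated, strongly correlated predictors: three temperature-like and
# two precipitation-like variables (hypothetical names).
temp_base <- rnorm(n)
rain_base <- rnorm(n)
env <- data.frame(
  temp_mean   = temp_base + rnorm(n, sd = 0.2),
  temp_max    = temp_base + rnorm(n, sd = 0.2),
  temp_min    = temp_base + rnorm(n, sd = 0.2),
  prec_annual = rain_base + rnorm(n, sd = 0.2),
  prec_wet    = rain_base + rnorm(n, sd = 0.2)
)

# Option 1: PCA on all predictors, scaled to unit variance.
pca_all <- prcomp(env, center = TRUE, scale. = TRUE)
var_explained <- pca_all$sdev^2 / sum(pca_all$sdev^2)
env_pcs <- pca_all$x[, 1:2]   # keep the first two components (assumed choice)

# Option 2: one PCA per group of related predictors, keeping PC1 of each.
temp_cols <- c("temp_mean", "temp_max", "temp_min")
prec_cols <- c("prec_annual", "prec_wet")
env_grouped <- cbind(
  temperature   = prcomp(env[, temp_cols], scale. = TRUE)$x[, 1],
  precipitation = prcomp(env[, prec_cols], scale. = TRUE)$x[, 1]
)
```

Either env_pcs or env_grouped could then be supplied as the environmental matrix. How many components to keep is a judgment call that var_explained can help inform.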

Can "lfmm" and "lfmm2" handle missing genotypes?

Answer: No, but LEA contains a missing data imputation function ("impute") that can preprocess the data and impute the missing genotypes. The "impute" function uses the results of the population structure analysis, and requires an "snmf" object/project for the genotype data.
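The sketch below follows this two-step process on the LEA tutorial data, after masking some genotypes artificially. The file name, the number of masked entries, K = 4, and the number of repetitions are assumptions made for the example.

```r
library(LEA)

# Tutorial genotypes with 100 entries masked at random.
# The value 9 encodes a missing genotype in the lfmm/geno formats.
data("tutorial")
set.seed(7)
dat <- tutorial.R
dat[sample(length(dat), 100)] <- 9
write.lfmm(dat, "genoM.lfmm")

# Step 1: population structure analysis on the incomplete data.
project.missing <- snmf("genoM.lfmm", K = 4, entropy = TRUE,
                        repetitions = 3, project = "new")

# Step 2: impute from the run with the lowest cross-entropy.
best <- which.min(cross.entropy(project.missing, K = 4))
impute(project.missing, "genoM.lfmm", method = "mode", K = 4, run = best)

# The imputed matrix is written next to the input file.
dat.imp <- read.lfmm("genoM.lfmm_imputed.lfmm")
```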

Do I need to filter the data before running "lfmm" and "lfmm2" tests?

Answer: Yes. We strongly recommend filtering the genomic data to keep variants with a minor allele frequency (MAF) greater than 5 to 10 percent (common variants). In lfmm2, we recommend performing LD pruning for the estimation function (lfmm2), and then using all genotypes in the test function (lfmm2.test).
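A MAF filter of this kind can be sketched in base R as below. The genotype matrix is simulated, it is assumed to hold diploid genotypes coded 0/1/2 with no missing values, and the 5 percent threshold is one end of the suggested range.

```r
set.seed(3)
n <- 80    # individuals
L <- 300   # loci

# Simulated diploid genotypes (0/1/2); each locus has its own allele frequency.
freqs <- runif(L, min = 0.01, max = 0.5)
geno <- sapply(freqs, function(p) rbinom(n, size = 2, prob = p))  # n x L matrix

# Frequency of the counted allele, then the minor allele frequency.
alt_freq <- colMeans(geno) / 2
maf <- pmin(alt_freq, 1 - alt_freq)

# Keep common variants only.
maf_threshold <- 0.05
keep <- maf > maf_threshold
geno_common <- geno[, keep, drop = FALSE]
```

With real data containing missing genotypes, the missing-value code would need to be excluded from the frequency calculation first.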

How can I check the results of "lfmm" and "lfmm2"?

Answer: Use the calibrated p-values and display a histogram of those significance values. The histogram should look flat with a peak close to zero. If the shape is incorrect, change the number of factors (K).
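To illustrate what such a histogram looks like, the base R sketch below recalibrates simulated z-scores with a genomic inflation factor, one common calibration approach, and plots the resulting p-values. The z-scores, the amount of inflation, and the number of associated loci are invented for the example and do not come from an actual lfmm run.

```r
set.seed(11)
L <- 5000

# Simulated z-scores: an inflated null distribution plus 200 associated loci.
z <- rnorm(L, sd = 1.3)
z[1:200] <- z[1:200] + 5

# Genomic inflation factor: median of the squared z-scores divided by the
# median of a chi-squared distribution with one degree of freedom.
gif <- median(z^2) / qchisq(0.5, df = 1)

# Calibrated p-values.
p_cal <- pchisq(z^2 / gif, df = 1, lower.tail = FALSE)

# Expected shape: roughly flat, with a peak in the first bin.
h <- hist(p_cal, breaks = 20, col = "grey",
          main = "Calibrated p-values", xlab = "p-value")
```

A histogram skewed toward 1 suggests overly conservative tests, and an excess of small and intermediate p-values suggests residual confounding. In both cases, trying another K is the natural next step.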

How can I accelerate the "lfmm" analysis?

Answer: lfmm2 is a fast alternative to "lfmm" that is not based on MCMC. Additionally, we suggest running distinct models for distinct chromosomes, and exploring values of K close to the number of ancestral populations found with snmf/pca (plus one or two).

LEA: an R package for Landscape and Ecological Associations studies