BCM - Computational and Mathematical Biology


Olivier François, PhD 


Computer programs

  • TESS is a computer program that implements a Bayesian clustering algorithm for spatial population genetics. It is particularly useful for seeking genetic barriers or genetic discontinuities in continuous populations. The method is based on a hierarchical mixture model where the prior distribution on cluster labels is defined as a Hidden Markov Random Field. Given individual geographical locations, the program seeks population structure from multilocus genotypes without assuming predefined populations. TESS takes input data files in a format compatible with existing non-spatial Bayesian algorithms (e.g. STRUCTURE). It returns graphical displays of cluster membership probabilities and geographical cluster assignments from its Graphical User Interface.
  • POPS: The POPS program performs inference of ancestry distribution models. It uses a TESS-like interface to compute individual cluster membership and admixture proportions based on multilocus genotype data and their correlation with environmental and geographical variables. Like species distribution models, POPS provides routines to project cluster memberships and admixture proportions under scenarios of environmental change. Typical uses of POPS are evaluating how the population genetic structure of a species could be modified by climate change, or testing hypotheses about local adaptation and ecological speciation.
  • LFMM: Adaptation to local environments often occurs through natural selection acting on a large number of alleles, each having a weak phenotypic effect. One way to detect those alleles is by identifying genetic polymorphisms that exhibit high correlation with some environmental gradient or with the variables used as proxies for ecological pressures. The LFMM computer program includes an integrated framework based on population genetics, ecological modeling and statistical techniques for screening genomes for signatures of local adaptation. It implements fast algorithms using latent factor mixed models based on a variant of Bayesian principal component analysis in which residual population structure is introduced via unobserved factors.
  • sNMF: Inference of population structure using sparse non-negative matrix factorization algorithms.
  • abc is an R contributed package. Approximate Bayesian computation (ABC) is devoted to the analysis of complex models. The R package abc implements several ABC algorithms for performing parameter estimation and model selection.
  • apTreeshape (Windows package) is an R contributed package dedicated to simulation and analysis of phylogenetic tree topologies using statistical indices. It is a companion library of the 'ape' package. It provides additional functions for reading, plotting, and manipulating phylogenetic trees. It also offers convenient web access to public databases, and enables testing null models of macroevolution using corrected test statistics. Trees of class "phylo" from the 'ape' package can be converted easily into a 'treeshape' format. It also contains new diversification rate shift tests.
  • LEA is an R package dedicated to landscape genomics and ecological association tests. LEA can run analyses of population structure and genome scans for local adaptation. It includes statistical methods for estimating ancestry coefficients from large genotypic matrices and evaluating the number of ancestral populations (snmf, pca), identifying genetic polymorphisms that exhibit high correlation with some environmental gradient or with the variables used as proxies for ecological pressures (lfmm), and controlling the false discovery rate. LEA is mainly based on optimized C programs that can scale with the dimension of very large data sets.
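
As an illustration of the idea behind approximate Bayesian computation mentioned for the abc package above, here is a minimal sketch of the basic rejection algorithm in plain Python. It is not the abc package's code or interface: the function abc_rejection, the toy model (a normal mean with known standard deviation), the uniform prior, the summary statistic and the tolerance are all assumptions chosen for the example.

```python
import random
import statistics


def abc_rejection(observed, simulate, prior_sample, distance,
                  tolerance, n_draws, rng):
    """Basic ABC rejection sampler (hypothetical helper, for illustration).

    Draw a parameter from the prior, simulate a summary statistic under it,
    and keep the parameter if the simulated summary is close to the observed one.
    """
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        simulated = simulate(theta, rng)
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)
    return accepted


# Toy problem (assumed): estimate the mean of a normal distribution with
# standard deviation 1, using the sample mean of 20 values as the summary.
rng = random.Random(42)
observed_mean = 2.0


def prior_sample(rng):
    # Uniform prior on [-5, 5] (an assumption of this example).
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng):
    # Simulate 20 observations and reduce them to their mean.
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(20))


def distance(a, b):
    return abs(a - b)


posterior = abc_rejection(observed_mean, simulate, prior_sample, distance,
                          tolerance=0.2, n_draws=5000, rng=rng)
print(len(posterior), "accepted draws; approximate posterior mean:",
      statistics.fmean(posterior))
```

The accepted values form a sample from an approximation of the posterior distribution; a smaller tolerance gives a better approximation at the cost of fewer accepted draws.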

Other interesting links

  • A very basic C code for the simple but efficient evolutionary algorithm MOSES (Mutation Or Selection Evolutionary Strategy). Just run gcc Mos_tri.c -lm and ./a.out to see the performance of the algorithm on a specific test problem, the Rastrigin function (you should be able to modify the test problem easily).
  • A more elaborate C++ program (archive) for the same algorithm.
  • GENECLUST: An R package for Bayesian clustering and MCMC using hidden Markov Random Fields. The package was developed by Sophie Ancelet (main contributor), Gilles Guillot and myself. This is a Linux version only. To install GENECLUST, log in as a superuser and type "R CMD INSTALL Geneclust_0.1.tar.gz" (the R packages deldir, fields and spatial must also be installed on your computer). This package contains the source code and the data sets used in our Genetics paper "Francois et al. (2006). Bayesian Clustering using hidden Markov Random Fields in Spatial Population Genetics". A Windows C++ program implementing a similar (but faster) algorithm is available elsewhere.
  • GenBMap is a user-friendly Microsoft Windows program for the study of spatial genetic structure. The program allows a user to analyze genotypes and spatial coordinates simultaneously through a two-dimensional graphical representation. The algorithm is described in Cercueil et al (2007, TPB).
  • FASTRUCT: For Windows OS (contains user's guide). Bayesian model-based clustering programs have gained increased popularity in studies of population structure since the publication of the software STRUCTURE. These programs are generally acknowledged as performing well, but their running time may be prohibitive. FASTRUCT is a non-Bayesian implementation of the classical no-admixture model with uncorrelated allele frequencies. This new program relies on the Expectation-Maximization principle, and produces assignments rivaling those of other model-based clustering programs. In addition, it can be several-fold faster than Bayesian implementations. The software consists of a command-line engine, which is suitable for batch analysis of data, and a MS Windows graphical interface, which is convenient for exploring data.
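
To make the Expectation-Maximization principle mentioned for FASTRUCT more concrete, here is a minimal Python sketch of EM clustering under a no-admixture model with independent allele frequencies per cluster. It is not FASTRUCT's code: the function em_no_admixture, the restriction to haploid biallelic markers coded 0/1, the pseudocounts used to keep frequencies away from 0 and 1, and the toy data are all simplifying assumptions made for this example.

```python
import math
import random


def em_no_admixture(genotypes, k, n_iter=50, seed=0):
    """EM for a k-cluster no-admixture model (hypothetical helper).

    genotypes: list of individuals, each a list of 0/1 alleles (one per locus).
    Returns (responsibilities, allele frequencies per cluster and locus).
    """
    rng = random.Random(seed)
    n, n_loci = len(genotypes), len(genotypes[0])

    # Start from random soft assignments of individuals to clusters.
    resp = []
    for _ in range(n):
        w = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(w)
        resp.append([x / total for x in w])

    for _ in range(n_iter):
        # M-step: update cluster proportions and allele frequencies.
        # Pseudocounts (an assumption of this sketch) avoid log(0).
        weights = [sum(r[c] for r in resp) for c in range(k)]
        mix = [(weights[c] + 1.0) / (n + k) for c in range(k)]
        freq = [[(sum(resp[i][c] * genotypes[i][l] for i in range(n)) + 0.5)
                 / (weights[c] + 1.0)
                 for l in range(n_loci)]
                for c in range(k)]

        # E-step: posterior probability of each cluster for each individual,
        # computed on the log scale for numerical stability.
        for i in range(n):
            logp = []
            for c in range(k):
                lp = math.log(mix[c])
                for l in range(n_loci):
                    f = freq[c][l]
                    lp += math.log(f if genotypes[i][l] == 1 else 1.0 - f)
                logp.append(lp)
            m = max(logp)
            w = [math.exp(x - m) for x in logp]
            total = sum(w)
            resp[i] = [x / total for x in w]

    return resp, freq


# Toy data (assumed): two groups of five individuals with opposite alleles.
group_a = [[1, 1, 1, 1, 0, 0, 0, 0] for _ in range(5)]
group_b = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(5)]
resp, freq = em_no_admixture(group_a + group_b, k=2)

# Hard assignment: the most probable cluster for each individual.
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
print(labels)
```

Cluster labels are arbitrary, so only the grouping of individuals is meaningful, not which cluster is numbered 0 or 1.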