TeraPCA: a fast and scalable software package to study genetic variation in tera-scale genotypes
Aritra Bose, Vassilis Kalantzis, Eugenia-Maria Kontopoulou, Mai Elkady, Peristera Paschou, Petros Drineas
Principal Component Analysis is a key tool in the study of population structure in human genetics. As modern datasets grow increasingly large, traditional approaches based on loading the entire dataset into system memory (Random Access Memory) become impractical, and out-of-core implementations are the only viable alternative. We present TeraPCA, a C++ implementation of the Randomized Subspace Iteration method to perform Principal Component Analysis of large-scale datasets. TeraPCA can be applied both in-core and out-of-core and is able to operate successfully even on commodity hardware with a system memory of just a few gigabytes. Moreover, TeraPCA has minimal dependencies on external libraries and only requires a working installation of the BLAS and LAPACK libraries. When applied to a dataset containing a million individuals genotyped on a million markers, TeraPCA requires less than 5 h (in multi-threaded mode) to accurately compute the 10 leading principal components. An extensive experimental analysis shows that TeraPCA is both fast and accurate and is competitive with current state-of-the-art software for the same task. Source code and documentation are both available at https://github.com/aritra90/TeraPCA. Supplementary data are available at Bioinformatics online.
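The abstract names the Randomized Subspace Iteration method without spelling it out. The following is a minimal in-core sketch of the textbook algorithm, not TeraPCA's own code: draw a Gaussian n × k test matrix Ω, form Q = orth(AΩ), repeat Q ← orth(A · orth(AᵀQ)) for a few iterations, then solve the small Rayleigh-Ritz problem B = QᵀA to read off the k leading singular values and left singular vectors. It assumes a small dense matrix A that has already been mean-centered (rows as individuals, columns as markers), and it swaps BLAS/LAPACK for plain loops, modified Gram-Schmidt, and a Jacobi eigensolver. The names `randomized_subspace_iteration`, `PcaResult`, `k`, `iters`, and `seed` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Dense row-major matrix; a stand-in for the BLAS/LAPACK-backed storage a real tool would use.
using Matrix = std::vector<std::vector<double>>;
struct PcaResult { std::vector<double> singular_values; Matrix U; };  // values descending, U is m x k

Matrix multiply(const Matrix& A, const Matrix& B) {
  Matrix C(A.size(), std::vector<double>(B[0].size(), 0.0));
  for (size_t i = 0; i < A.size(); ++i)
    for (size_t p = 0; p < B.size(); ++p)
      for (size_t j = 0; j < B[0].size(); ++j) C[i][j] += A[i][p] * B[p][j];
  return C;
}

Matrix transpose(const Matrix& A) {
  Matrix T(A[0].size(), std::vector<double>(A.size()));
  for (size_t i = 0; i < A.size(); ++i)
    for (size_t j = 0; j < A[0].size(); ++j) T[j][i] = A[i][j];
  return T;
}

// Modified Gram-Schmidt: orthonormalizes the columns of Y in place (assumes full column rank).
void orthonormalize(Matrix& Y) {
  const size_t m = Y.size(), k = Y[0].size();
  for (size_t j = 0; j < k; ++j) {
    for (size_t p = 0; p < j; ++p) {
      double dot = 0.0;
      for (size_t i = 0; i < m; ++i) dot += Y[i][p] * Y[i][j];
      for (size_t i = 0; i < m; ++i) Y[i][j] -= dot * Y[i][p];
    }
    double norm = 0.0;
    for (size_t i = 0; i < m; ++i) norm += Y[i][j] * Y[i][j];
    for (size_t i = 0; i < m; ++i) Y[i][j] /= std::sqrt(norm);
  }
}

// Cyclic Jacobi: diagonalizes the small symmetric matrix S in place; V's columns become eigenvectors.
void jacobi_eigen(Matrix& S, Matrix& V) {
  const size_t n = S.size();
  V.assign(n, std::vector<double>(n, 0.0));
  for (size_t i = 0; i < n; ++i) V[i][i] = 1.0;
  for (int sweep = 0; sweep < 50; ++sweep)
    for (size_t p = 0; p < n; ++p)
      for (size_t q = p + 1; q < n; ++q) {
        if (std::abs(S[p][q]) < 1e-15) continue;
        const double theta = (S[q][q] - S[p][p]) / (2.0 * S[p][q]);
        const double t = (theta >= 0 ? 1.0 : -1.0) / (std::abs(theta) + std::sqrt(theta * theta + 1.0));
        const double c = 1.0 / std::sqrt(t * t + 1.0), s = t * c;
        for (size_t r = 0; r < n; ++r) {  // column rotation: S <- S J and V <- V J
          const double a = S[r][p], b = S[r][q], va = V[r][p], vb = V[r][q];
          S[r][p] = c * a - s * b;   S[r][q] = s * a + c * b;
          V[r][p] = c * va - s * vb; V[r][q] = s * va + c * vb;
        }
        for (size_t r = 0; r < n; ++r) {  // row rotation: S <- J^T S, which zeroes S[p][q]
          const double a = S[p][r], b = S[q][r];
          S[p][r] = c * a - s * b; S[q][r] = s * a + c * b;
        }
      }
}

// Randomized subspace iteration for the k leading singular values/left vectors of A (m x n).
// Assumes A is already mean-centered (hypothetical preprocessing, not shown here).
PcaResult randomized_subspace_iteration(const Matrix& A, size_t k, int iters, unsigned seed) {
  assert(k >= 1 && k <= A.size() && k <= A[0].size());
  std::mt19937 gen(seed);
  std::normal_distribution<double> gauss(0.0, 1.0);
  Matrix Omega(A[0].size(), std::vector<double>(k));  // n x k Gaussian test matrix
  for (auto& row : Omega) for (double& x : row) x = gauss(gen);
  const Matrix At = transpose(A);
  Matrix Q = multiply(A, Omega);  // m x k random sample of range(A)
  orthonormalize(Q);
  for (int it = 0; it < iters; ++it) {  // Q <- orth(A * orth(A^T Q))
    Matrix W = multiply(At, Q);
    orthonormalize(W);
    Q = multiply(A, W);
    orthonormalize(Q);
  }
  // Rayleigh-Ritz: B = Q^T A is k x n; eigenpairs of B B^T give B's squared singular values.
  const Matrix B = multiply(transpose(Q), A);
  Matrix S = multiply(B, transpose(B));
  Matrix V;
  jacobi_eigen(S, V);
  std::vector<size_t> order(k);
  for (size_t i = 0; i < k; ++i) order[i] = i;
  std::sort(order.begin(), order.end(), [&](size_t a, size_t b) { return S[a][a] > S[b][b]; });
  const Matrix QV = multiply(Q, V);  // left singular vectors of A, before sorting
  PcaResult res{std::vector<double>(k), Matrix(A.size(), std::vector<double>(k))};
  for (size_t j = 0; j < k; ++j) {
    res.singular_values[j] = std::sqrt(std::max(0.0, S[order[j]][order[j]]));
    for (size_t i = 0; i < A.size(); ++i) res.U[i][j] = QV[i][order[j]];
  }
  return res;
}
```

For example, on a 6 × 5 matrix whose only nonzeros are 10, 5, 2, 0.5 and 0.1 on the diagonal, calling `randomized_subspace_iteration(A, 3, 8, 42u)` should recover singular values very close to 10, 5 and 2. Note that the full matrix A is only ever touched through the products AΩ, AᵀQ, AW and QᵀA; in principle each of these can be accumulated block by block from disk, which is what makes this family of methods a natural fit for out-of-core use.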
Genetics of the Peloponnesean populations and the theory of extinction of the medieval Peloponnesean Greeks
George Stamatoyannopoulos, Aritra Bose, Athanasios Teodosiadis, Fotis Tsetsos, Anna Plantinga, Nikoletta Psatha, Nikos Zogas, Evangelia Yannaki, Pierre Zalloua, Kenneth K. Kidd, Brian L. Browning, John Stamatoyannopoulos, Peristera Paschou, Petros Drineas
European Journal of Human Genetics 25(5), 637-645, 2017
Peloponnese has been one of the cradles of Classical European civilization and an important contributor to ancient European history. It has also been the subject of a controversy about the ancestry of its population. In a theory hotly debated by scholars for over 170 years, the German historian Jacob Philipp Fallmerayer proposed that the medieval Peloponneseans were totally extinguished by Slavic and Avar invaders and replaced by Slavic settlers during the 6th century CE. Here we use 2.5 million single-nucleotide polymorphisms to investigate the genetic structure of Peloponnesean populations in a sample of 241 individuals originating from all districts of the peninsula and to examine predictions of the theory of replacement of the medieval Peloponneseans by Slavs. We find considerable heterogeneity of Peloponnesean populations, exemplified by genetically distinct subpopulations and by gene flow gradients within Peloponnese. By principal component analysis (PCA) and ADMIXTURE analysis, the Peloponneseans are clearly distinguishable from the populations of the Slavic homeland and are very similar to Sicilians and Italians. Using a novel method of quantitative analysis of ADMIXTURE output, we find that the Slavic ancestry of Peloponnesean subpopulations ranges from 0.2 to 14.4%. Subpopulations considered by Fallmerayer to be Slavic tribes or to have Near Eastern origin have no significant ancestry of either. This study rejects the theory of extinction of medieval Peloponneseans and illustrates how genetics can clarify important aspects of the history of a human population.