Meta-omics - overview

Metagenomics is the study of metagenomes, which are mixtures of genetic material from several organisms. Metagenomic sequencing (genome-wide metagenomics and metatranscriptomics, or sequencing targeted to specific genes) is increasingly used in human and animal health, food safety, and environmental studies.


Accurate identification of metagenomic sample content

One of the key questions for metagenomic samples is how to identify all the organisms present in the mixture while avoiding false positive calls. We approach this question by using the mappings of all sequencing reads to the reference genomes, where a single read often maps to multiple genomes.


Our approach is based on the promiscuity of reads, i.e., reads mapping to multiple organisms, in contrast to current approaches that rely on the abundance of reads. Ranking the potential matches for each read, we demonstrate through simulations that the rank frequency distribution of true positive organisms’ reads peaks at rank 1. To further enrich the true positives, we define a normalized score per organism, based on the promiscuity. When organisms are sorted by this score, the false positives sink to the bottom. Our preliminary experiments demonstrate that false positive organisms can be substantially reduced without losing any true positives. Research on this topic was presented as a talk at the International Association for Food Protection 2016 annual meeting [1].
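The ranking idea described above can be illustrated with a small sketch. It is not the scoring used in [1]: the input layout (`read_hits`), the dense tie-handling in `rank_hits`, and the `rank1_fraction` score are all assumptions made for illustration, with a simple "share of reads at rank 1" standing in for the normalized score, which this overview does not define.

```python
from collections import Counter, defaultdict

def rank_hits(hits):
    """Return {organism: rank} for one read; rank 1 is the best alignment score.
    Organisms with tied scores share a rank (an assumption of this sketch)."""
    best = {}
    for organism, score in hits:
        best[organism] = max(score, best.get(organism, score))
    distinct = sorted(set(best.values()), reverse=True)
    rank_of = {s: i + 1 for i, s in enumerate(distinct)}
    return {org: rank_of[s] for org, s in best.items()}

def rank_frequencies(read_hits):
    """Count, per organism, how many reads place it at each rank."""
    freq = defaultdict(Counter)
    for hits in read_hits.values():
        for organism, rank in rank_hits(hits).items():
            freq[organism][rank] += 1
    return freq

def rank1_fraction(freq):
    """Hypothetical normalized score: share of an organism's reads at rank 1."""
    return {org: c[1] / sum(c.values()) for org, c in freq.items()}

# Toy mappings: read id -> list of (organism, alignment score).
read_hits = {
    "read1": [("E. coli", 98), ("S. enterica", 91)],
    "read2": [("E. coli", 97), ("S. enterica", 90)],
    "read3": [("E. coli", 99)],
    "read4": [("L. monocytogenes", 95), ("S. enterica", 80)],
}
scores = rank1_fraction(rank_frequencies(read_hits))
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy input, S. enterica only ever appears as a second-best match, so it ends up last in `ranking` even though it collects as many reads as E. coli. An abundance-based ranking would not separate the two.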

An application of Topological Data Analysis to the problem of separating truly present organisms from false positives was presented at APBC 2019 [6].

Characterization and comparison of metagenomes

We are exploring the use of RoDEO, our method for differential gene expression, for sample comparisons and OTU abundance comparisons. First results on this ongoing work were presented at the 13th International Conference on Computational Intelligence methods for Bioinformatics and Biostatistics (2016) [2].

An extended journal version, including results from using the most differentially abundant OTUs, was published in LNCS (2017) [3].


[Figure: RoDEO top differentially abundant (DA) OTUs]

We have also developed PRROMenade for efficient and accurate functional classification of metagenomic and metatranscriptomic reads in terms of a functional annotation hierarchy [8]. We take advantage of the fact that microbial sequences can be annotated relative to established tree structures, and we develop a highly scalable read classifier by enhancing the generalized Burrows-Wheeler transform with a labeling step to directly assign reads to the corresponding lowest taxonomic unit in an annotation tree. 
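The labeling idea can be sketched as follows. This is not the PRROMenade implementation: a plain dictionary of k-mers stands in for the generalized Burrows-Wheeler transform index, and the rule of assigning a read to the lowest common ancestor (LCA) of its matching labels is an assumption of this sketch. The tree, sequences, and function names are all hypothetical.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root of the annotation tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lca(a, b, parent):
    """Lowest common ancestor of two nodes in the annotation tree."""
    ancestors = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in ancestors:
            return node
    raise ValueError("nodes are not in the same tree")

def build_labeled_index(references, parent, k):
    """Labeling step: map each k-mer to the LCA of all nodes whose sequences contain it."""
    index = {}
    for seq, node in references:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            index[kmer] = lca(index[kmer], node, parent) if kmer in index else node
    return index

def classify(read, index, parent, k):
    """Assign a read to the LCA of the labels of its indexed k-mers (None if no match)."""
    label = None
    for i in range(len(read) - k + 1):
        hit = index.get(read[i:i + k])
        if hit is not None:
            label = hit if label is None else lca(label, hit, parent)
    return label

# Hypothetical functional hierarchy (child -> parent) and annotated sequences.
parent = {
    "enzyme_A": "metabolism",
    "enzyme_B": "metabolism",
    "metabolism": "root",
    "transport": "root",
}
references = [
    ("ACGTACGGA", "enzyme_A"),
    ("ACGTTTGCA", "enzyme_B"),
    ("GGCCTTAAG", "transport"),
]
index = build_labeled_index(references, parent, k=4)
specific = classify("GTACGG", index, parent, k=4)  # k-mers unique to enzyme_A
shared = classify("ACGT", index, parent, k=4)      # k-mer shared by enzyme_A and enzyme_B
```

Because the labels are computed once when the index is built, classifying a read needs only lookups plus a few walks up the tree. A read whose k-mers are unique to one annotation receives that specific label, while a read matching sequence shared by sibling annotations falls back to their common parent.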

The functional characterization work has since been extended and applied with additional publicly available microbiome annotation tools [11].

We have applied PRROMenade functional annotation with RoDEO processing on COVID-19 respiratory tract metatranscriptomes to detect functional signatures differentiating healthy and afflicted subjects' microbiomes [12].


We have also explored machine learning for phenotype prediction from human gut, oral, and skin microbiomes, specifically for the task of age prediction, to better understand age-related changes in the microbiome [9].

Sequencing the Food Supply Chain

The Consortium for Sequencing the Food Supply Chain (SFSC), founded by IBM Research and Mars, Inc., examines the global food chain - from farms, transport, processing facilities and distribution channels to restaurants and grocery stores - and applies genomics and analytics techniques to mitigate foodborne illness and other risks in food management.

Our research on meta-omics is closely linked to the consortium efforts to understand and characterize microbiomes of food samples. For more information on the consortium, see Consortium for Sequencing the Food Supply Chain.

The outcomes include the development of the MCAW compute service for processing massive amounts of metagenomic and metatranscriptomic data [4]. Results on this topic have been presented at venues including Food Micro 2018 [5].

The consortium work also resulted in a pipeline and publication on food authentication from shotgun sequencing reads [7].

The latest consortium publication [10] focuses on the microbiome community of the food sequencing data introduced in the earlier companion publication [7].


Explainable AI and the microbiome

Using AI to accurately predict host phenotypes from the microbiome can support the development of non-invasive diagnostics and condition monitoring approaches. Beyond accurate prediction, being able to explain the reasons behind the predictions is important for building trust in AI and promoting its adoption in healthcare and life sciences. We have developed an Explainable AI framework and applied it to predicting various host phenotypes from the skin microbiome [13].


AI for Healthy Living

A collaboration with the University of California, San Diego, to develop machine learning methods and software tools, and to generate novel findings, that implicate the human microbiome in health and disease.



[1] Understanding False Positives in Mapping of Microbiome Sequence Data Using In-Silico Simulations, talk by Niina Haiminen, IAFP Annual Meeting, St. Louis, Missouri, Aug 2016.

[2] Dimension reduction of metagenome data using RoDEO improves phenotype prediction, CIBB, Stirling, UK, Sept 2016.

[3] Host phenotype prediction from differentially abundant microbes using RoDEO. Lecture Notes in Computer Science, pp. 27-41, Springer, 2017.

[4] Design of the MCAW compute service for food safety bioinformatics. IBM Journal of Research and Development 60(5/6), 2016.

[5] Deep metatranscriptomic sequencing indicates stable microbial community across seasons and suppliers for protein meal factory ingredient, talk by Niina Haiminen, Food Micro Conference, Berlin, Germany, Sept 2018.

[6] Signal enrichment with strain-level resolution in metagenomes using topological data analysis. BMC Genomics 20:2, 194, 2019.

[7] Food authentication from shotgun sequencing reads with an application on high protein powders. npj Science of Food 3(24), 2019.

[8] Hierarchically labeled database indexing allows scalable characterization of microbiomes. iScience 23(4), 2020.

[9] Human Skin, Oral, and Gut Microbiomes Predict Chronological Age. mSystems 5(1), 2020.

[10] Monitoring the microbiome for food safety and quality using deep shotgun sequencing. npj Science of Food 5(3), 2021.

[11] Re-purposing software for functional characterization of the microbiome. Microbiome 9(4), 2021.

[12] Functional profiling of COVID-19 respiratory tract microbiomes. Scientific Reports 11(6433), 2021.

[13] Explainable AI reveals changes in skin microbiome composition linked to phenotypic differences. Scientific Reports 11(4565), 2021.