The Microbiome
We are still not quite finished with our contemplation of the sources of individual identity. I refer now to our microbiome. These are the microbes that share our body space and that inhabit our skin, our mucous membranes, and our gut.
Defining the Human Microbiome
Luke K Ursell,1 Jessica L Metcalf,1 Laura Wegener Parfrey,1 and Rob Knight1,2,*
The publisher's final edited version of this article is available at Nutr Rev
Abstract
Rapidly developing sequencing methods and analytical techniques are enhancing our ability to understand the human microbiome, and, indeed, how we define the microbiome and its constituents. In this review we highlight recent research that expands our ability to understand the human microbiome on different spatial and temporal scales, including daily timeseries datasets spanning months. Furthermore, we discuss emerging concepts related to defining operational taxonomic units, diversity indices, core versus transient microbiomes and the possibility of enterotypes. Additional advances in sequencing technology and in our understanding of the microbiome will provide exciting prospects for exploiting the microbiota for personalized medicine.
Introduction
The human microbiota consists of the 10-100 trillion symbiotic microbial cells harbored by each person, primarily bacteria in the gut; the human microbiome consists of the genes these cells harbor[1]. Microbiome projects worldwide have been launched with the goal of understanding the roles that these symbionts play and their impacts on human health[2, 3]. Just as the question, “what is it to be human?”, has troubled humans from the beginning of recorded history, the question, “what is the human microbiome?” has troubled researchers since the term was coined by Joshua Lederberg in 2001 [4]. Specifying the definition of the human microbiome has been complicated by confusion about terminology: for example, “microbiota” (the microbial taxa associated with humans) and “microbiome” (the catalog of these microbes and their genes) are often used interchangeably. In addition, the term “metagenomics” originally referred to shotgun characterization of total DNA, although now it is increasingly being applied to studies of marker genes such as the 16S rRNA gene. More fundamentally, however, new findings are leading us to question the concepts that are central to establishing the definition of the human microbiome, such as the stability of an individual's microbiome, the definition of the OTUs (Operational Taxonomic Units) that make up the microbiota, and whether a person has one microbiome or many. In this review, we cover progress towards defining the human microbiome in these different respects.
Studies of the diversity of the human microbiome started with Antonie van Leeuwenhoek, who, as early as the 1680s, had compared his oral and fecal microbiota. He noted the striking differences in microbes between these two habitats and also between samples from individuals in states of health and disease in both of these sites [5, 6]. Thus, studies of the profound differences in microbes at different body sites, and between health and disease, are as old as microbiology itself. What is new today is not the ability to observe these obvious differences, but rather the ability to use powerful molecular techniques to gain insight into why these differences exist, and to understand how we can affect transformations from one state to another.
Culture-independent methods for characterizing the microbiota, together with a molecular phylogenetic approach to organizing life's diversity, provided a fundamental breakthrough in allowing researchers to compare microbial communities across environments within a unified phylogenetic context (reviewed in [7]). Although host-associated microbes are presumably acquired from the environment, the composition of the mammalian microbiota, especially in the gut, is surprisingly different from free-living microbial communities [8]. In fact, an analysis of bacterial diversity from free-living communities in terrestrial, marine, and freshwater environments as well as communities associated with animals suggests that the vertebrate gut is an extreme [8]. In contrast, bacterial communities from environments typically considered extreme, such as acidic hot springs and hydrothermal vents, are similar to communities in many other environments[9]. This suggests that coevolution between vertebrates and their microbial consortia over hundreds of millions of years has selected for a specialized community of microbes that thrive in the gut's warm, eutrophic, and stable environment[8]. In the human gut and across human-associated habitats, bacteria comprise the bulk of the biomass and diversity, though archaea, eukaryotes, and viruses are also present in smaller numbers and should not be neglected[10, 11].
Interestingly, estimates of the human gene catalog and the diversity of the human genome pale in comparison to estimates of the diversity of the microbiome. For example, the Meta-HIT consortium reported a gene catalog of 3.3 million non-redundant genes in the human gut microbiome alone[3], as compared to the ∼22,000 genes present in the entire human genome[12]. Similarly, the diversity among the microbiome of individuals is immense compared to genomic variation: individual humans are about 99.9% identical to one another in terms of their host genome[13], but can be 80-90% different from one another in terms of the microbiome of their hand[14] or gut[15]. These findings suggest that employing the variation contained within the microbiome will be much more fruitful in personalized medicine, the use of an individual patient's genetic data to inform healthcare decisions, than approaches that target the relatively constant host genome.
Many fundamental questions about the human microbiome were difficult or impossible to address until recently. Some questions, such as the perennially popular “how many species live in a given body site?”, are still hard to answer, due to problems with definitions of bacterial species and with the rate of sequencing error. Other questions, such as “how does the diversity within a person over time compare to the diversity between people?”, or “how does the diversity between sites on the same person's body compare to the diversity between different people at the same site?”, or “is there a core set of microbial species that we all share?”, can now be answered conclusively. In the next section, we discuss some of the tools that have allowed these long-standing questions to be answered.
Tools for Microbial Analysis
The drastic reduction in sequencing costs over the past few years has made it possible to identify specific microbial taxa found within the human gut that are difficult or impossible to culture. Researchers are now able to generate millions of sequences per sample in order to assess differences in microbial communities between body sites and individuals. Our increased sequencing power has required the development of equally powerful computational tools to handle the burgeoning amount of sequence data produced by modern technologies[16]. There are several pipelines for analysis of microbial community data, such as mothur[17], W.A.T.E.R.S.[18], the RDP pyrosequencing tools[19], and QIIME (pronounced “chime”)[20]. QIIME is a free, open-source platform for the analysis of high-throughput sequencing data that enables users to import raw sequence data and readily produce measures of inter- and intra-sample diversity. Consistent identification of operational taxonomic units (OTUs) and agreed-upon measures of diversity within and between samples are crucial for the comparison of results across studies, although the concept of the OTU is increasingly problematic as sequence data accumulate and explicitly phylogenetic approaches gain in popularity.
Beta diversity refers to the degree of difference in community membership or structure between two samples. A recent review of taxon-based measurements of beta diversity found that some metrics, including Canberra and Gower distances, have increased power for discriminating clusters, while other metrics, such as chi-squared and Pearson correlation distances, are more appropriate for elucidating the effects of environmental gradients on communities[21]. A robust method for comparing the differences between microbial communities is UniFrac, which measures the fraction of branch length on a phylogenetic tree that is not shared between samples[22]. Highly similar microbial communities result in UniFrac scores near 0, while two completely independent communities that do not share any branch length (i.e. they have a different evolutionary history) would result in a UniFrac score of 1. Principal coordinates analysis (PCoA) can then visualize the UniFrac distances between samples in two-dimensional or three-dimensional space, so that clusters of similar communities and the separation of distinct communities are easy to see.
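The UniFrac idea described above can be illustrated with a minimal sketch. The toy tree, its branch lengths, and the function names below are hypothetical and chosen only for illustration; this is the unweighted form of the metric written from its definition, not the implementation used in QIIME or any other pipeline.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy tree: node -> (parent, length of the branch to the parent).
TREE: Dict[str, Tuple[Optional[str], float]] = {
    "root": (None, 0.0),
    "X": ("root", 1.0),
    "Y": ("root", 1.0),
    "t1": ("X", 1.0),
    "t2": ("X", 1.0),
    "t3": ("Y", 1.0),
    "t4": ("Y", 1.0),
}


def branches_covered(tips: Set[str],
                     tree: Dict[str, Tuple[Optional[str], float]]) -> Set[str]:
    """Return the nodes whose parent branch lies on a path from any tip to the root."""
    covered: Set[str] = set()
    for tip in tips:
        node: Optional[str] = tip
        # Walk upward until the root (the only node without a parent) is reached.
        while node is not None and tree[node][0] is not None:
            covered.add(node)
            node = tree[node][0]
    return covered


def unweighted_unifrac(sample_a: Set[str], sample_b: Set[str],
                       tree: Dict[str, Tuple[Optional[str], float]] = TREE) -> float:
    """Fraction of observed branch length that leads to taxa from only one sample."""
    in_a = branches_covered(sample_a, tree)
    in_b = branches_covered(sample_b, tree)
    total = sum(tree[n][1] for n in in_a | in_b)   # branch length seen in either sample
    unique = sum(tree[n][1] for n in in_a ^ in_b)  # branch length seen in exactly one
    return unique / total if total else 0.0


# Identical communities share every branch.
print(unweighted_unifrac({"t1", "t2"}, {"t1", "t2"}))  # 0.0
# Communities from different clades share no branch length.
print(unweighted_unifrac({"t1", "t2"}, {"t3", "t4"}))  # 1.0
```

Because the score depends on branch length rather than on taxon labels alone, two samples containing different but closely related taxa receive a smaller distance than two samples drawn from distant clades.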
UniFrac as a measure of beta diversity, coupled to PCoA, can distinguish differences between communities using as few as 10 sequences per sample[23]. It is important to recognize that increased sequencing depth is not always necessary to recover biologically meaningful results when those results are obvious. Thus, by choosing diversity measurements that are appropriate for a study design, researchers utilizing modern sequencing methods are able to characterize differences between samples at relatively low sequence coverage. This enables researchers to assess fine-grained spatial and temporal patterns by characterizing hundreds to thousands of samples, such as timeseries across multiple patients or environments. UniFrac and a multitude of other diversity measurements are available in QIIME and can be readily compared.
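One way to explore how results behave at low coverage is to subsample every sample to the same small depth before computing distances. The sketch below assumes that approach (often called rarefaction); the text above does not specify how the 10-sequence comparison was made, and the function name, the `seed` parameter, and the example counts are hypothetical.

```python
import random
from collections import Counter
from typing import Dict


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Draw `depth` sequences at random, without replacement, from OTU counts."""
    # Expand the count table into one entry per observed sequence.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sample has fewer sequences than the requested depth")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return dict(Counter(rng.sample(pool, depth)))


# Hypothetical gut sample with 1,000 sequences, reduced to 10.
gut = {"Bacteroides": 600, "Faecalibacterium": 300, "Escherichia": 100}
subsample = rarefy(gut, depth=10)
print(sum(subsample.values()))  # 10
```

Comparing distances computed on the full and the subsampled tables gives a rough sense of whether a pattern is strong enough to survive shallow sequencing.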
In general, pipelines for analyzing 16S rRNA and shotgun metagenomic data have separate workflows. Some initial steps, such as demultiplexing (separating pooled samples by their barcodes and removing the barcodes) and quality filtering, are common to both pipelines. However, for 16S rRNA data, sequences must be grouped into OTUs, chimeric sequences generated by incomplete template extension must be removed, and phylogenetic trees must be constructed. In contrast, in the metagenomic pipeline, sequences must be assigned to functions as well as to taxonomy (either as whole reads or after assembly). Once taxon or gene function tables are constructed, the pipelines begin to converge, at least conceptually: the interest is then in 1) describing the composition of each sample, 2) finding the taxa or functions that discriminate among groups of samples (e.g. according to clinical parameters), and 3) asking whether the samples cluster according to any measured clinical states (or according to time). One exciting emerging direction is comparing metagenomic and 16S rRNA clustering directly using a technique called Procrustes analysis that allows the PCoA plots to be combined[24]. Another powerful tool is the use of machine learning and statistical techniques to build predictive models of taxa[25] or functions[26] that discriminate between groups of samples.
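The demultiplexing step shared by both workflows can be sketched as follows. This is a simplified illustration that assumes every barcode has the same length, sits at the start of the read, and must match exactly; real pipelines typically also tolerate barcode errors and handle primers. The barcode sequences, sample names, and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str],
                barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Assign each read to a sample by its leading barcode and strip the barcode."""
    # Assumption: all barcodes have the same length.
    barcode_len = len(next(iter(barcodes)))
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:barcode_len])
        if sample is None:
            # No exact barcode match: keep the read aside, untrimmed.
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(read[barcode_len:])
    return dict(by_sample)


# Hypothetical barcode-to-sample map and pooled reads.
barcode_map = {"ACGT": "gut", "TTGA": "palm"}
pooled = ["ACGTGGGCCC", "TTGAAAATTT", "CCCCGGGG"]
print(demultiplex(pooled, barcode_map))
```

After this step each sample has its own list of barcode-free reads, which is the input that quality filtering and, for 16S data, OTU grouping would operate on.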
A unique advantage of QIIME relative to other pipelines is its ability to exploit “sample metadata”, e.g. clinical information about subjects, to produce visualizations that make the main patterns in the data immediately apparent. Of particular interest, QIIME supports the MIMARKS (Minimum Information about a MARKer Sequence) standard[27] developed by the Genomic Standards Consortium[28], which is increasingly popular with other tools for microbial and community analysis such as MG-RAST[29], and has been adopted by the INSDC (International Nucleotide Sequence Database Consortium, which includes GenBank, EBI, and DDBJ) as the standard for metadata.
With these tools in hand, characterizing basic patterns of similarities and differences in the microbiota is now routine. The key challenge now is to extend analyses to include longitudinal studies and to understand the role of specific host and environmental factors in the development and maintenance of the microbiome.
Dynamic interactions between human microbes and the environment
The gastrointestinal (GI) tract of a human infant provides a brand new environment for microbial colonization[30]. Indeed, the microbiota that an infant begins to acquire depends strongly on mode of delivery[31]. Twenty minutes after birth, the microbiota of vaginally delivered infants resembles the microbiota of their mother's vagina, while infants delivered via Cesarean section harbor microbial communities typically found on human skin[32]. The acquisition of microbiota continues over the first few years of life, and an infant's GI tract microbiome begins to resemble that of an adult as early as 1 year of age[33]. In one case study following an infant's microbiota over the first 2.5 years of life, phylogenetic diversity increased significantly and linearly with time[34]. Additionally, significant changes in gut microbiota composition were apparent at five time points: starting a diet of breast milk, development of fever at day 92, introduction of rice cereal at day 134, introduction of formula and table foods at day 161, and antibiotic treatment and adult diet at day 371[34]. Interestingly, each dietary change was accompanied by changes in gut microbiota and the enrichment of corresponding genes. For example, as the infant began to receive a full adult diet, genes in the microbiome associated with vitamin biosynthesis and polysaccharide digestion became enriched[34].
The interaction between the human microbiota and the environment is dynamic, with human microbes flowing freely onto the surfaces we interact with every day. Fierer et al. showed that human fingertips can transfer signature communities of microbes onto keyboards and that these communities strongly differentiate individuals[35]. PCoA plots showed that it was possible to determine which fingers were typing on which keys, and which individuals were using which keyboards: it was even possible to link a person's hand to the computer mouse they use with up to 95% accuracy when compared to a database of other hands[35]. Overall, this study showed that microbial communities are constantly being transferred between surfaces, and that a dynamic interaction exists between environmental microbiota and different human body sites.
Intrapersonal microbial diversity
Another interesting question that we are just beginning to answer is how stable the microbiome within an individual is over time. By defining what constitutes normal temporal variation in an individual, we will be better able to quantify and understand changes in microbial communities that result from dietary and pharmaceutical interventions. In the longest timeseries study to date, Caporaso et al. sampled two individuals' microbial communities in the gut, oral cavity, and left and right palms over 396 time points spanning 15 months[36]. Communities at different body sites were readily distinguishable from one another using 3-D PCoA plots over a one-year time span, even though the community structure within a given site was highly variable[36]. The level of diversity is also different between body sites, with the mouth and gut harboring the most diverse communities[37]. Taken together, these studies show that an individual's microbiota represents a highly variable and compartmentalized ecosystem.
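Comparing the level of diversity between body sites requires a within-sample diversity index. The sketch below uses the Shannon index as one common choice; this is an assumption for illustration, since the text above does not say which index the cited studies used, and the example counts are hypothetical.

```python
import math
from typing import Iterable


def shannon_index(counts: Iterable[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    observed = [c for c in counts if c > 0]
    total = sum(observed)
    return -sum((c / total) * math.log(c / total) for c in observed)


# Hypothetical counts per taxon at two body sites.
even_community = [25, 25, 25, 25]    # four equally abundant taxa
dominated_community = [97, 1, 1, 1]  # one taxon dominates

print(shannon_index(even_community) > shannon_index(dominated_community))  # True
```

The index rises both with the number of taxa and with how evenly sequences are spread among them, so it captures more than a simple count of taxa.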
Overall, it has yet to be conclusively proven that individuals, or even body sites, harbor a “core” set of specific bacterial taxa. For example, the Meta-HIT consortium defined a “core” set of lineages as those that were present in half of the subjects studied, although essentially no genes were present in all subjects studied[3]. Of course, it is important to recognize that sampling depth may be critical for distinguishing taxa that are absent from those that are merely very rare; the dynamic range of microbial abundance is also quite large, and even within the Meta-HIT “core” genes, 2000-fold ranges of abundance were not uncommon. Proving that a taxon is completely absent in the gut is not possible with these types of studies, so core calculations should always carry with them a caveat about sequencing depth. Another factor to consider when defining diversity and a core is that methodological artifacts can greatly increase the apparent numbers of OTUs in a sample (and hence reduce the apparent fraction that is shared). Both sequencing error[38, 39] and issues related to alignment, especially multiple sequence alignment[40-43], can inflate the number of OTUs immensely. It is important to ensure that the same methodological procedures were used when performing estimates of the core in terms of the fraction of individuals the core must be represented in, the minimum abundance, and the procedure for deciding which sequences count as “the same”. Finally, there is a key question about whether variation around a core is structured so that humans harbor only a few general types of microbiota profiles in a given body site: this is well established for the vagina[44] but more controversial in the gut[45]. In general, extreme caution must be applied when performing clustering procedures, as many will break up continuous variation into clusters where none exist[21]. 
Robust model selection procedures that allow for the possibility that only continuous variation exists, rather than discrete clusters, remain to be developed within the context of microbial community analysis.
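The dependence of a "core" on the chosen thresholds, discussed above, can be made concrete with a small sketch. The function name, the parameter names `min_prevalence` and `min_abundance`, and the example data are hypothetical; this illustrates the general calculation rather than the procedure used by any particular consortium.

```python
from typing import Dict, Set


def core_taxa(abundance: Dict[str, Dict[str, float]],
              min_prevalence: float = 0.5,
              min_abundance: float = 0.0) -> Set[str]:
    """Taxa exceeding `min_abundance` in at least `min_prevalence` of subjects.

    `abundance` maps subject -> taxon -> relative abundance.
    """
    n_subjects = len(abundance)
    all_taxa: Set[str] = set()
    for profile in abundance.values():
        all_taxa.update(profile)
    core: Set[str] = set()
    for taxon in all_taxa:
        # Count the subjects in which the taxon is detected above the threshold.
        present = sum(1 for profile in abundance.values()
                      if profile.get(taxon, 0.0) > min_abundance)
        if present / n_subjects >= min_prevalence:
            core.add(taxon)
    return core


# Hypothetical relative abundances for four subjects.
subjects = {
    "s1": {"A": 0.5, "B": 0.5},
    "s2": {"A": 0.9, "C": 0.1},
    "s3": {"A": 0.2, "B": 0.001, "C": 0.799},
    "s4": {"A": 1.0},
}
print(sorted(core_taxa(subjects, min_prevalence=0.5)))   # ['A', 'B', 'C']
print(sorted(core_taxa(subjects, min_prevalence=1.0)))   # ['A']
print(sorted(core_taxa(subjects, min_prevalence=0.5, min_abundance=0.01)))  # ['A', 'C']
```

The same data yield three different cores under three reasonable settings, which is why the prevalence cutoff, the minimum abundance, and the sequencing depth need to be reported alongside any core estimate.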
There is increasing evidence that individuals actually share a “core microbiome” rather than a “core microbiota”. In a study of monozygotic and dizygotic twin pairs concordant for obesity or leanness, a subset of identifiable microbial genes, but not species, was shared among all individuals[15]. Remarkably, vastly different sets of microbial species yielded very similar functional KEGG pathways. However, deviations from this core microbiome were apparent in obese subjects, suggesting that it will be important to utilize metagenomic data in addition to determining microbial community composition with 16S marker gene studies when assessing differences between disease states. Understanding whether this principle holds true for other body sites will be fascinating; cross-biome metagenomic comparisons have been exceedingly rare to date[46, 47].