
MetaRibo-Seq measures translation in microbiomes

by World Health Now


Mock community culturing

NR-2653 E. coli K-12 MG1655, NR-607 B. subtilis 168, and NR-45946 S. aureus RN4220 were obtained from BEI Resources. Bacteroides thetaiotaomicron VPI 5482 was obtained from ATCC (ATCC 29148). E. coli, B. subtilis, and S. aureus were grown individually in Luria-Bertani (LB) broth to an OD600 of 0.4 at 37 °C. Equal volumes of the three cultures were mixed thoroughly to create the three-member mock community, on which metagenomics, metatranscriptomics, Ribo-Seq, MetaRibo-Seq, and proteomics were performed. For a second mock community, E. coli and B. thetaiotaomicron were each grown anaerobically in Brain Heart Infusion (BHI) broth to an OD600 of 0.5 at 37 °C, and equal volumes were mixed to create a two-member mock community. Metatranscriptomics, MetaRibo-Seq, and proteomics were performed on this mixture.

Mock community metagenomics

Aliquots (25 mL) of the two mock communities were centrifuged in 50 mL tubes at 4000 × g at room temperature for 30 min. DNA was extracted from cellular pellets with DNA Stool Mini Kit (Qiagen) using the manufacturer’s protocols. Samples were then exposed to bead beating for 3 min at room temperature. One nanogram of DNA was used to create Nextera XT libraries according to the manufacturer’s instructions (Illumina).

Mock community MetaRibo-Seq

Aliquots (50 mL) of the community were centrifuged in 50 mL tubes at 4000 × g at room temperature for 30 min. Cell pellets were resuspended in 700 μL of RNAlater and stored at −80 °C for 1 week. These cells (150 mg) were suspended in 600 μL of Qiagen RLT lysis buffer supplemented with 1% beta-mercaptoethanol, 0.3 U/μL Superase-In (Invitrogen), and 1.55 mM chloramphenicol, and the mixture was incubated at room temperature for 5 min. The suspension was subjected to bead beating for 3 min with 1.0 mm Zirconia/Silica beads on a MiniBeadBeater-16, Model 607. The lysate was centrifuged at room temperature for 3 min at 21,000 × g to pellet cellular debris, and the supernatant was transferred to 2 mL tubes. The lysis supernatant was ethanol precipitated with 0.1 volume of 3 M sodium acetate and 2.5 volumes of 100% ethanol. To precipitate, samples were incubated at −80 °C for 30 min, then centrifuged at 21,000 × g for 30 min at 4 °C. The pellet of RNA and RNA-protein complexes was resuspended in MNase buffer (25 mM Tris pH 8.0, 25 mM NH4Cl, 10 mM MgOAc, and 1.55 mM chloramphenicol). One microliter of this solution was diluted 20-fold and quantified with the Qubit dsDNA HS Assay Kit (Invitrogen). The MNase reaction mix was prepared as described20, except that it was scaled down to an input of 80 μg of RNA and 1 μL of NEB MNase (500 U/μL) in a total reaction volume of 200 μL. The MNase reaction was incubated at room temperature for 2 h. All subsequent steps were performed as described20, except that the tRNA removal steps were omitted. Briefly, Sephacryl S400 MicroSpin columns (GE Healthcare Life Sciences) were washed three times with 500 μL of polysome binding buffer, spinning the column for 3 min at 4 °C at 600 r.p.m. each time. Polysome binding buffer consisted of 100 μL Igepal CA-630, 500 μL of 1 M magnesium chloride, 500 μL of 0.5 M egtazic acid (EGTA), 500 μL of 5 M NaCl, 500 μL of 1 M Tris-HCl pH 8.0, and 7.9 mL of RNase-free water.
The MNase reaction was applied to the column and centrifuged for 5 min at 4 °C. The flow-through was purified further with the miRNeasy Mini Kit (Qiagen) according to the manufacturer’s protocol, with elution in 15 μL. rRNA was depleted using the Ribo-Zero rRNA Removal Kit for Bacteria (Illumina) according to the manufacturer’s protocol, except that all reaction volumes and amounts were halved. This was purified with the RNeasy MinElute Cleanup Kit (Qiagen), eluting in 20 μL of water. The RNA, in 18 μL, was subjected to a T4 PNK reaction (NEB M0201S) with the addition of 1 μL Superase-In (Invitrogen), 2.2 μL 10× T4 PNK Buffer, and 1 μL T4 PNK (10 U/μL). This reaction was purified again with the RNeasy MinElute Cleanup Kit (Qiagen). The concentration was determined with the Qubit RNA HS Assay Kit (Invitrogen). With 100 ng of RNA as input, libraries were prepared using the NEBNext Small RNA Library Prep Set for Illumina (NEB, E7330) according to the manufacturer’s protocol. DNA was purified using the MinElute PCR Purification Kit (Qiagen). Libraries were sequenced with 1 × 75 bp reads on a NextSeq 500.
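As a sanity check on the polysome binding buffer recipe, the final working concentrations can be back-calculated with C1V1 = C2V2. The Python sketch below assumes that Igepal CA-630 is added undiluted and that the listed molarities are stock concentrations; under those assumptions the recipe yields 10 mL of buffer containing about 1% Igepal, 50 mM MgCl2, 25 mM EGTA, 250 mM NaCl, and 50 mM Tris-HCl.

```python
# Stocks from the polysome binding buffer recipe:
# name -> (volume added in uL, stock concentration, unit)
# Assumption: Igepal CA-630 is added neat (100% v/v).
RECIPE = {
    "Igepal CA-630": (100, 100.0, "% v/v"),
    "MgCl2": (500, 1000.0, "mM"),
    "EGTA": (500, 500.0, "mM"),
    "NaCl": (500, 5000.0, "mM"),
    "Tris-HCl pH 8.0": (500, 1000.0, "mM"),
}
WATER_UL = 7900


def final_concentrations(recipe, water_ul):
    """Apply C1*V1 = C2*V2 to each stock; return (total uL, {name: (conc, unit)})."""
    total_ul = water_ul + sum(vol for vol, _, _ in recipe.values())
    finals = {
        name: (stock * vol / total_ul, unit)
        for name, (vol, stock, unit) in recipe.items()
    }
    return total_ul, finals


total, finals = final_concentrations(RECIPE, WATER_UL)
print(f"total volume: {total / 1000:.1f} mL")
for name, (conc, unit) in finals.items():
    print(f"{name}: {conc:g} {unit}")
```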

Mock community Ribo-Seq

Before harvesting, mock community 1 was treated with 0.1 mg of chloramphenicol per mL of culture. After 2 min, 50 mL aliquots of the community were centrifuged in 50 mL tubes at 4000 × g at room temperature for 30 min. Cell pellets were resuspended in 500 μL of Ribo-Seq lysis buffer20 (25 mM Tris pH 8.0, 25 mM NH4Cl, 10 mM MgOAc, 0.8% Triton X-100, 100 U/mL RNase-free DNase I, 0.3 U/μL Superase-In, 1.55 mM chloramphenicol, and 17 μM 5′-guanylyl imidodiphosphate) and lysed by bead beating for 3 min. Twenty-five A260 units of RNA, measured with a Nanodrop 2000, were digested with 6000 U of MNase for 2 h at room temperature, using MNase buffer to dilute as necessary. A Sephacryl S400 MicroSpin column (GE Healthcare Life Sciences) was washed three times with 500 μL of polysome binding buffer (100 μL Igepal CA-630, 500 μL of 1 M magnesium chloride, 500 μL of 0.5 M EGTA, 500 μL of 5 M NaCl, 500 μL of 1 M Tris-HCl pH 8.0, and 7.9 mL of RNase-free water), spinning the column for 3 min at 4 °C at 600 × g each time. The MNase reaction was applied to the column and centrifuged for 5 min at 4 °C. The flow-through was collected, purified further with the miRNeasy Mini Kit (Qiagen) according to the manufacturer’s protocol, and eluted from the column in 15 μL of water. The sample was then depleted of rRNA using the MICROBExpress™ Bacterial mRNA Enrichment Kit (Invitrogen) according to the manufacturer’s protocol. This reaction was purified with the RNeasy MinElute Cleanup Kit (Qiagen) according to the manufacturer’s protocol, eluting in 20 μL of water. The RNA, in 18 μL, was subjected to a T4 PNK reaction (NEB M0201S) with the addition of 1 μL Superase-In (Invitrogen), 2.2 μL 10× T4 PNK Buffer, and 1 μL T4 PNK (10 U/μL) for 1 h at 37 °C.
This reaction was purified again with the RNeasy MinElute Cleanup Kit (Qiagen) according to the manufacturer’s protocol, and the final sample was eluted in 10 μL of water. The final RNA concentration was determined with the Qubit RNA HS Assay Kit (Invitrogen). With 100 ng of RNA as input, libraries were prepared using the NEBNext Small RNA Library Prep Set for Illumina (NEB, E7330) according to the manufacturer’s protocol. DNA libraries were purified using the MinElute PCR Purification Kit (Qiagen) according to the manufacturer’s protocol. Libraries were sequenced with 1 × 75 bp reads on a NextSeq 550.
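The pre-harvest chloramphenicol dose is given in mass units, whereas the lysis and MNase buffers are given in molar units. The Python sketch below converts between the two, assuming the standard molecular weight of chloramphenicol (C11H12Cl2N2O5, 323.13 g/mol). By this arithmetic, 0.1 mg/mL is about 0.31 mM, and the 1.55 mM buffer concentration corresponds to about 0.50 mg/mL, roughly fivefold the culture dose.

```python
# Assumed molecular weight of chloramphenicol (C11H12Cl2N2O5), in g/mol.
CHLORAMPHENICOL_MW = 323.13


def mg_per_ml_to_mM(mg_per_ml, mw):
    # mg/mL equals g/L; dividing by g/mol gives mol/L, and x1000 gives mM.
    return mg_per_ml / mw * 1000


def mM_to_mg_per_ml(mM, mw):
    # Inverse conversion: mmol/L x g/mol = mg/L, and /1000 gives mg/mL.
    return mM * mw / 1000


culture = mg_per_ml_to_mM(0.1, CHLORAMPHENICOL_MW)  # pre-harvest culture dose
lysis = mM_to_mg_per_ml(1.55, CHLORAMPHENICOL_MW)   # lysis / MNase buffer
print(f"0.1 mg/mL = {culture:.2f} mM; 1.55 mM = {lysis:.2f} mg/mL")
```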

Mock community metatranscriptomics

Aliquots (50 mL) of the community were centrifuged in 50 mL tubes at 4000 × g for 30 min at room temperature. Cell pellets were resuspended in RNA-Seq lysis buffer (25 mM Tris pH 8.0, 25 mM NH4Cl, 10 mM MgOAc, 0.8% Triton X-100, 100 U/mL RNase-free DNase I, and 0.3 U/μL Superase-In) and lysed by bead beating for 3 min. The mixture was centrifuged at 21,000 × g for 3 min at room temperature, and the supernatant was collected. An equal volume of Phenol/Chloroform/Isoamyl Alcohol 25:24:1 (pH 5.2) was added, and the sample was vortexed for 3 min. The mixture was centrifuged at 21,000 × g for 3 min at room temperature, and the aqueous phase was collected. This Phenol/Chloroform/Isoamyl Alcohol step was repeated once more. The final aqueous phase was ethanol precipitated using 2.5 volumes of ethanol and 0.1 volume of sodium acetate. The resulting pellet was resuspended in 100 μL of water. The RNA was further purified using the RNeasy Plus Mini Kit (Qiagen) according to the manufacturer’s protocol. Any remaining DNA was degraded with Baseline-ZERO-DNase (Epicentre) according to the manufacturer’s protocol. RNA was fragmented for 15 min at 70 °C using RNA Fragmentation Reagent (Ambion) according to the manufacturer’s protocol. At this point, the MetaRibo-Seq and small metatranscriptomics protocols converge completely. The fragmented RNA was purified with the miRNeasy Mini Kit (Qiagen) according to the manufacturer’s protocol, and the RNA was eluted in a final volume of 15 μL of water. The RNA was then depleted of rRNA using the MICROBExpress™ Bacterial mRNA Enrichment Kit (Invitrogen) according to the manufacturer’s protocol. The rRNA-depleted RNA was purified with the RNeasy MinElute Cleanup Kit (Qiagen), eluting in 20 μL of water.
The resulting RNA fragments, in 18 μL, were subjected to a T4 PNK reaction (NEB M0201S) with the addition of 1 μL Superase-In (Invitrogen), 2.2 μL 10× T4 PNK Buffer, and 1 μL T4 PNK (10 U/μL) for 1 h at 37 °C. This reaction was purified again with the RNeasy MinElute Cleanup Kit (Qiagen) according to the manufacturer’s protocol. The final concentration of purified RNA was determined with the Qubit RNA HS Assay Kit (Invitrogen). With 100 ng as input, libraries were prepared using the NEBNext Small RNA Library Prep Set for Illumina (NEB, E7330) according to the manufacturer’s protocol. DNA was purified using the MinElute PCR Purification Kit (Qiagen) according to the manufacturer’s protocol. Libraries were sequenced with 1 × 75 bp reads on a NextSeq 500.

Mock community metaproteomics

Aliquots of the community (50 mL) were centrifuged in 50 mL tubes at 4000 × g at room temperature for 30 min. The cell pellet was resuspended in 2% SDS, 100 mM dithiothreitol (DTT), and 20 mM Tris HCl, pH 8.8, with protease inhibitor, and the cells were subjected to bead beating for 3 min. The samples were then centrifuged for 3 min, and the clarified lysate supernatant was collected. Lysate was prepared using Filter Aided Sample Preparation (FASP)35 with the same minor modifications documented previously22. Each subsequent step included a 15 min centrifugation at 14,000 × g. Protein concentrations were measured using a Nanodrop 2000. Samples were diluted tenfold in 8 M urea and loaded into Microcon Ultracel YM-30 filtration devices (Millipore). They were washed in 8 M urea, reduced for 30 min in 10 mM DTT, and alkylated in 50 mM iodoacetamide for 20 min. Samples were washed three times in 8 M urea and twice in 50 mM ammonium bicarbonate. Trypsin (Pierce 90057) was added at a 1:100 enzyme-to-protein ratio and incubated overnight at 37 °C. Peptides were centrifuged into a new collection tube and further eluted with 50 μL of 70% acetonitrile and 1% formic acid. The eluate was dried for 1 h in a Savant SPD121P SpeedVac concentrator at 30 °C, then resuspended in 0.2% formic acid22.
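The dilution and digestion ratios above translate into simple per-sample amounts. The Python sketch below is illustrative only: it assumes that "diluted tenfold" means one part lysate plus nine parts 8 M urea, and the 30 μL lysate and 50 μg protein inputs in the example are hypothetical values, not amounts reported in the source.

```python
def fasp_setup(lysate_ul, protein_ug, dilution=10, enzyme_ratio=100, sds_pct=2.0):
    """Per-sample FASP amounts (a sketch under stated assumptions).

    Returns (urea volume to add in uL, trypsin mass in ug, diluted SDS %).
    Assumption: a tenfold dilution = 1 part lysate + (dilution - 1) parts 8 M urea.
    """
    urea_ul = lysate_ul * (dilution - 1)
    trypsin_ug = protein_ug / enzyme_ratio  # 1:100 (w/w) enzyme-to-protein
    diluted_sds = sds_pct / dilution        # the lysis buffer contains 2% SDS
    return urea_ul, trypsin_ug, diluted_sds


# Hypothetical input: 30 uL of lysate containing 50 ug of protein.
urea, trypsin, sds = fasp_setup(30, 50)
print(f"add {urea} uL 8 M urea; {trypsin} ug trypsin; SDS after dilution {sds:.1f}%")
```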

Metaproteomics

These methods apply to all metaproteomics performed in this work (including mock communities and fecal communities). LC-MS/MS analysis was performed by the Stanford University Mass Spectrometry Facility (SUMS). Mass spectra (MS) were collected on a Thermo Scientific Orbitrap Fusion Tribrid coupled to a nanoAcquity UPLC system (Waters, M Class). Samples were loaded on a 25-cm, sub-100-μm C18 reverse phase column packed in-house and separated with an 80-min gradient at a flow rate of 0.45 µL/min. The mobile phases were A (water containing 0.2% formic acid) and B (acetonitrile containing 0.2% formic acid). A linear gradient elution program was used: 0–45 min, 6–20% (B); 45–60 min, 35% (B); 60–70 min, 45% (B); 70–71 min, 70% (B); 71–77 min, 95% (B); 77–80 min, 2% (B). Ions were generated using electrospray ionization in positive mode at 1.6 kV. MS/MS spectra were obtained using collision-induced dissociation (CID) at a collision energy setting of 35 (arbitrary units). Ions were selected for MS/MS in a data-dependent, top 15 format with a 30-s exclusion time. The scan range was set to 400–1500 m/z. Typical Orbitrap mass accuracy was below 2 p.p.m. A 12-p.p.m. window was allowed for precursor ions and 0.4 Da for fragment ions from CID fragmentation detected in the ion trap. Prokka-predicted36 proteins were used as the reference database for protein detection with the Byonic proteomics search pipeline v 2.10.5 37. Byonic parameters included: spectrum-level FDR auto, digest cutter C-terminal cutter, peptide termini semi-specific, maximum number of missed cleavages 2, fragmentation type CID low energy, precursor tolerance 12.0 p.p.m., fragment tolerance 0.4 Da, and protein FDR cutoff 1%. Normalized spectral abundance factors were calculated from the spectral count output with in-house scripts.
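The in-house scripts are not described, but the normalized spectral abundance factor is conventionally defined as NSAF_i = (SpC_i / L_i) / Σ_j (SpC_j / L_j), where SpC is a protein's spectral count and L its length. The Python sketch below implements that standard definition; protein lengths in amino acids and the example counts are assumptions for illustration.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein.

    spectral_counts: dict protein -> spectral count
    lengths: dict protein -> length in amino acids (assumed unit)
    """
    # Spectral abundance factor: counts divided by protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra observed")
    # Normalize so that the NSAF values of all proteins in a sample sum to 1.
    return {p: s / total for p, s in saf.items()}


# Hypothetical counts and lengths.
counts = {"protA": 10, "protB": 30, "protC": 0}
lengths = {"protA": 100, "protB": 600, "protC": 250}
values = nsaf(counts, lengths)
print(values)
```

Dividing by length before normalizing keeps long proteins, which yield more peptides, from appearing more abundant than they are.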

Subject recruitment

MetaRibo-Seq was performed on fecal samples from individuals from a variety of health states. Informed consent was obtained for all participants. None of the participants received bacterial translation inhibitors. All subjects were recruited at Stanford University as a part of one of three IRB-approved protocols for tissue biobanking and clinical metadata collection (PIs: Dr. Ami Bhatt, Dr. Victor Henderson, Dr. David Miklos).

Fecal sample storage

Stool was immediately stored in 2 mL cryovials and frozen at −80 °C, and was not thawed until lysis. For RNA extraction applications, 1.3 g of fecal sample was preserved in 700 μL of RNAlater (Ambion) at −80 °C.

Cell lysis for Ribo-Seq, metatranscriptomics, and MetaRibo-Seq

Stool (150 mg) was suspended in 600 μL of Qiagen RLT lysis buffer supplemented with 1% beta-mercaptoethanol and 0.3 U/μL Superase-In (Invitrogen). For MetaRibo-Seq lysis, 1.55 mM chloramphenicol was also added to the lysis solution, and the suspension was incubated at room temperature for 5 min. The suspension was subjected to bead beating for 3 min with 1.0 mm Zirconia/Silica beads on a MiniBeadBeater-16, Model 607. The lysate was centrifuged at room temperature for 3 min at 21,000 × g to pellet cellular debris, and the supernatant was transferred to 2 mL tubes.

Metagenomics

DNA was extracted from fecal samples with DNA Stool Mini Kit (Qiagen) using the manufacturer’s protocols. Samples were exposed to bead beating for 3 min. One nanogram of DNA was used to create Nextera XT libraries according to the manufacturer’s instructions (Illumina). Metagenomic libraries were sequenced with 2 × 101 bp reads on an Illumina HiSeq 4000 instrument.

MetaRibo-Seq

The lysis supernatant was ethanol precipitated with 0.1 volume of 3 M sodium acetate and 2.5 volumes of 100% ethanol. To precipitate, samples were incubated at −80 °C for 30 min, then centrifuged at 21,000 × g for 30 min at 4 °C. This crude purification was implemented specifically to concentrate RNA from a reasonable input of fecal sample. The pellet of RNA and RNA-protein complexes was resuspended in MNase buffer (25 mM Tris pH 8.0, 25 mM NH4Cl, 10 mM MgOAc, and 1.55 mM chloramphenicol). To resuspend, we quickly broke the pellet apart with a pipette tip and vortexed for 15 s. One microliter of solution was diluted 20-fold and quantified with the Qubit dsDNA HS Assay Kit (Invitrogen). The MNase reaction mix was prepared as described20, except that it was scaled down to an input of 80 μg of RNA and 1 μL of NEB MNase (500 U/μL) in a total reaction volume of 200 μL. The MNase reaction was incubated at room temperature for 2 h. All subsequent steps were performed as described20, except that the tRNA removal steps were omitted. Briefly, Sephacryl S400 MicroSpin columns (GE Healthcare Life Sciences) were washed three times with 500 μL of polysome binding buffer, spinning the column for 3 min at 4 °C at 600 r.p.m. each time. Polysome binding buffer consisted of 100 μL Igepal CA-630, 500 μL of 1 M magnesium chloride, 500 μL of 0.5 M EGTA, 500 μL of 5 M NaCl, 500 μL of 1 M Tris-HCl pH 8.0, and 7.9 mL of RNase-free water. The MNase reaction was applied to the column and centrifuged for 5 min at 4 °C. The flow-through was purified further with the miRNeasy Mini Kit (Qiagen) according to the manufacturer’s protocol, with elution in 15 μL of water. rRNA was depleted using the Ribo-Zero rRNA Removal Kit for Bacteria (Illumina) according to the manufacturer’s protocol, except that all reaction volumes and amounts were halved. This was purified with the RNeasy MinElute Cleanup Kit (Qiagen), eluting in 20 μL of water.
The RNA, in 18 μL, was subjected to a T4 PNK reaction (NEB M0201S) with the addition of 1 μL Superase-In (Invitrogen), 2.2 μL 10× T4 PNK Buffer, and 1 μL T4 PNK (10 U/μL). This reaction was purified again with the RNeasy MinElute Cleanup Kit (Qiagen). The concentration was determined with the Qubit RNA HS Assay Kit (Invitrogen). With 100 ng as input, libraries were prepared using the NEBNext Small RNA Library Prep Set for Illumina (NEB, E7330) according to the manufacturer’s protocol. DNA was purified using the MinElute PCR Purification Kit (Qiagen). Libraries were sequenced with 1 × 75 bp reads on a NextSeq 500.

Small metatranscriptomics of fecal samples

We performed metatranscriptomics as follows: 15 μL of proteinase K (Ambion, 20 mg/mL) was added to 600 μL of lysate. After incubation for 10 min at room temperature, samples were centrifuged at 21,000 × g for 3 min, and the supernatant was collected. An equal volume of Phenol/Chloroform/Isoamyl Alcohol 25:24:1 (pH 5.2) was added, and the sample was vortexed for 3 min. The mixture was centrifuged at 21,000 × g for 3 min, and the aqueous phase was collected. This phenol/chloroform step was repeated once more. The final aqueous phase was ethanol precipitated with 0.1 volume of 3 M sodium acetate and 2.5 volumes of 100% ethanol. The resulting pellet was resuspended in 100 μL of water. The RNA was further purified using the RNeasy Plus Mini Kit (Qiagen) according to the manufacturer’s protocol. Any remaining DNA was degraded with Baseline-ZERO-DNase (Epicentre) according to the manufacturer’s protocol. RNA was fragmented for 15 min at 70 °C using RNA Fragmentation Reagent (Ambion) according to the manufacturer’s protocol. At this point, the MetaRibo-Seq and small metatranscriptomics protocols converge completely. The fragmented RNA was purified with the miRNeasy Mini Kit (Qiagen) according to the manufacturer’s protocol and eluted in 15 μL of water. rRNA was depleted using the Ribo-Zero rRNA Removal Kit for Bacteria (Illumina) with half-volume reactions of the manufacturer’s protocol. This was purified with the RNeasy MinElute Cleanup Kit (Qiagen), eluting in 20 μL of water. The fragments, in 18 μL, were subjected to a T4 PNK reaction (NEB M0201S) with the addition of 1 μL Superase-In (Invitrogen), 2.2 μL 10× T4 PNK Buffer, and 1 μL T4 PNK (10 U/μL). This reaction was purified again with the RNeasy MinElute Cleanup Kit (Qiagen) according to the manufacturer’s protocol. The concentration was determined with the Qubit RNA HS Assay Kit (Invitrogen).
With 100 ng as input, libraries were prepared using the NEBNext Small RNA Library Prep Set for Illumina (NEB, E7330) according to the manufacturer’s protocol. DNA was purified using the MinElute PCR Purification Kit (Qiagen) according to the manufacturer’s protocol. Libraries were sequenced with 1 × 75 bp reads on a NextSeq 500.

Differential centrifugation and FASP for metaproteomics

To remove human proteins, fecal samples were subjected to differential centrifugation. One hundred milligrams of fecal sample was suspended in 1× phosphate-buffered saline (PBS) in 1.7 mL Eppendorf tubes. The tubes were centrifuged at 600 × g for 1 min at room temperature. The supernatant was transferred to a clean Eppendorf tube and centrifuged at 10,000 × g for 1 min at room temperature. The supernatant was decanted, and the pellet was resuspended in 1 mL of PBS. This process was repeated once more. The final pellet was resuspended in 2% SDS, 100 mM DTT, and 20 mM Tris HCl, pH 8.8, with protease inhibitor. The cells were subjected to bead beating for 3 min with 1 mm Zirconia/Silica beads on a MiniBeadBeater-16, Model 607. Tubes were centrifuged for 3 min, and the clarified lysate in the supernatant was collected. Lysate was prepared using FASP35 with the same minor modifications documented previously22. Each step included a 15 min centrifugation at 14,000 × g. Samples were diluted tenfold in 8 M urea and loaded into Microcon Ultracel YM-30 filtration devices (Millipore). They were washed in 8 M urea, reduced for 30 min in 10 mM DTT, and alkylated in 50 mM iodoacetamide for 20 min. Samples were washed three times in 8 M urea and twice in 50 mM ammonium bicarbonate. Trypsin (Pierce 90057) was added at a 1:100 enzyme-to-protein ratio and incubated overnight at 37 °C. Peptides were centrifuged into a new collection tube and further eluted with 50 μL of 70% acetonitrile and 1% formic acid. The eluate was dried for 1 h in a Savant SPD121P SpeedVac concentrator at 30 °C, then resuspended in 0.2% formic acid22.

De novo assembly

Quality-trimmed metagenomic reads were assembled using metaSPAdes 3.7.0 38. For all samples, a maximum of 60 million metagenomic reads was used to generate assemblies. Samples sequenced to greater depth were randomly subsampled to 60 million reads for assembly, both to ensure relatively similar numbers of gene predictions across samples and to limit the computational requirements of assembly and downstream predictions.
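The source does not name the subsampling tool. One way to draw a uniform random subset of a fixed size from a read file of unknown length is reservoir sampling (Algorithm R), sketched below in Python. For paired-end data, the sketch assumes each record is a (read 1, read 2) tuple so that mates are kept together. Note that this approach holds k records in memory; in practice, a streaming tool such as seqtk sample run with the same seed on both mate files is a common alternative.

```python
import random


def reservoir_sample(records, k, seed=0):
    """Uniformly sample up to k items from an iterable of unknown length (Algorithm R).

    Each record (e.g. a (read1, read2) tuple, so mates stay paired) ends up in
    the sample with probability k / n, where n is the total number of records.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)     # fill the reservoir first
        else:
            j = rng.randint(0, i)     # inclusive bounds: i + 1 choices
            if j < k:
                reservoir[j] = rec    # replace with probability k / (i + 1)
    return reservoir


# Hypothetical paired reads; a real run would stream records from FASTQ files.
pairs = ((f"read{i}/1", f"read{i}/2") for i in range(1000))
subset = reservoir_sample(pairs, 100, seed=42)
print(len(subset), subset[0])
```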

Read mapping, gene prediction and annotation

Reads were trimmed with Trim Galore version 0.4.0 using cutadapt 1.8.1 39 with the flags -q 30 and --illumina. Reads were mapped to the annotated assembly using bowtie version 1.1.1 40. To avoid ambiguous assignments arising from conserved sequences in downstream differential analysis, only perfect, unique short-read alignments were considered. IGV41 was used to visualize coverage. Prokka v1.12 36 was used to predict genes from the metagenomic assemblies using the --meta option. Annotation relied on several of Prokka’s dependencies42,43,44,45. For small protein predictions, Prodigal42 was run after its minimum gene length was lowered from 90 bases to 15 bases.
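The exact bowtie options are not stated. With bowtie 1, a comparable restriction can be requested at alignment time (for example, `-v 0` for zero mismatches and `-m 1` to suppress reads with more than one reportable alignment). The Python sketch below applies an equivalent post hoc filter to parsed alignment records. The tuple layout is a hypothetical simplification of SAM output, and the filter is slightly stricter than `-v 0 -m 1` because it treats any second alignment, even an imperfect one, as non-unique.

```python
from collections import defaultdict


def perfect_unique(alignments):
    """Keep reads that align exactly once, with zero mismatches.

    alignments: iterable of (read_id, contig, position, n_mismatches);
    this record layout is a hypothetical simplification of parsed SAM lines.
    Returns {read_id: (contig, position)}.
    """
    by_read = defaultdict(list)
    for read_id, contig, pos, mismatches in alignments:
        by_read[read_id].append((contig, pos, mismatches))
    kept = {}
    for read_id, hits in by_read.items():
        # Unique: exactly one alignment. Perfect: that alignment has no mismatches.
        if len(hits) == 1 and hits[0][2] == 0:
            kept[read_id] = hits[0][:2]
    return kept


records = [
    ("r1", "contig1", 10, 0),  # perfect and unique: kept
    ("r2", "contig1", 5, 0),   # multi-mapping (conserved region): dropped
    ("r2", "contig2", 7, 0),
    ("r3", "contig3", 1, 1),   # one mismatch: dropped
]
print(perfect_unique(records))
```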

Read density as a function of position

MetaRibo-Seq reads were mapped to their metagenomic assemblies, and the assemblies and aligned reads were analyzed with RiboSeqR46. Ribosome profiling counts for predicted coding sequences (CDSs) were determined with the sliceCounts function. Only CDSs with at least ten reads were retained.
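RiboSeqR is an R package, and its internals are not reproduced here. The Python sketch below only illustrates the general idea of a position-resolved read density: CDSs with fewer than ten reads are discarded, and the 5′ ends of reads are aggregated relative to the start codons of the remaining CDSs. For brevity, it assumes plus-strand CDSs, 0-based half-open coordinates, and no P-site offset; the window bounds are hypothetical.

```python
from collections import Counter


def metagene_profile(read_5p, cds_list, window=(-20, 60), min_reads=10):
    """Aggregate read 5' ends relative to CDS start codons.

    read_5p: dict contig -> list of 0-based read 5' positions (plus strand only)
    cds_list: list of (contig, start, end), plus strand, 0-based half-open
    window: offsets (relative to the start codon) to include; hypothetical default
    Returns (number of CDSs passing the read filter, Counter offset -> reads).
    """
    lo, hi = window
    profile = Counter()
    kept = 0
    for contig, start, end in cds_list:
        positions = read_5p.get(contig, [])
        in_cds = [p for p in positions if start <= p < end]
        if len(in_cds) < min_reads:
            continue  # filter: at least ten reads in the CDS
        kept += 1
        for p in positions:
            offset = p - start
            if lo <= offset < hi:
                profile[offset] += 1
    return kept, profile


reads = {"c1": [100 + i for i in range(12)], "c2": [5, 6]}
cdss = [("c1", 100, 400), ("c2", 0, 300)]
n_kept, prof = metagene_profile(reads, cdss)
print(n_kept, sorted(prof.items())[:3])
```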

Taxonomic classification of technologies

Reads mapping specifically to Prokka-predicted36 coding regions were counted. We classified every predicted gene in these metagenomes using One Codex47, and each read was assigned the classification of the gene to which it mapped. This enabled fair comparisons between technologies, as small metatranscriptomics and MetaRibo-Seq reads can be too short to classify individually with k-mer-based approaches. Although metagenomic reads were long enough to be classified directly, they were subjected to the same analysis. Thus, all taxonomy plots represent whole-gene classifications and depend on the assembly.
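The gene-level assignment described above amounts to a lookup followed by a tally. In the Python sketch below, the dictionaries stand in for the read-to-gene mapping and the One Codex gene classifications. Those input formats, and the choice to tally reads on unclassified genes as "unclassified", are assumptions for illustration.

```python
from collections import Counter


def taxonomy_by_gene(read_to_gene, gene_to_taxon):
    """Count reads per taxon, where each read inherits its gene's classification.

    read_to_gene: dict read_id -> gene_id (reads mapped to predicted CDSs)
    gene_to_taxon: dict gene_id -> taxon label (hypothetical classifier output)
    Reads on genes without a classification are tallied as 'unclassified'.
    """
    counts = Counter()
    for _read_id, gene in read_to_gene.items():
        counts[gene_to_taxon.get(gene, "unclassified")] += 1
    return counts


read_to_gene = {"r1": "g1", "r2": "g1", "r3": "g2", "r4": "g3"}
gene_to_taxon = {"g1": "Escherichia", "g2": "Bacteroides"}
print(taxonomy_by_gene(read_to_gene, gene_to_taxon))
```

Because every technology's reads are classified through the same gene labels, differences between technologies reflect read counts rather than differences in classifier sensitivity to read length.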

Differential analysis

The number of reads mapping to a given region was calculated with BEDtools multicov version 2.27.1 48. Strandedness was enforced for metatranscriptomics and MetaRibo-Seq. All differential analyses were performed on these counts with DESeq249, with all conditions in duplicate. A gene was considered differential if its log2 fold change was above 1 or below −1 and its FDR was below 0.05. Results were reported as tables. For translational regulation (sample E compared with sample E2), the model was modified to control for RNA levels (design = ~samplegroups + samplegroups:type). Heatmaps were created using gplots50. Reads per kilobase million (RPKM) values were calculated with in-house scripts.
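The significance rule above can be applied to a DESeq2 results table exported as rows of gene, log2 fold change, and adjusted p value. The Python sketch below shows the thresholding step only; the row layout is an assumption, and missing adjusted p values (which DESeq2 reports as NA, for example after independent filtering) are treated as not significant.

```python
def differential_genes(results, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Split genes into up- and down-regulated sets.

    results: iterable of (gene_id, log2_fold_change, padj); padj may be None
    (DESeq2 reports NA for genes removed by independent filtering).
    A gene is differential if |log2FC| > lfc_cutoff and padj < fdr_cutoff.
    """
    up, down = [], []
    for gene, lfc, padj in results:
        if padj is None or padj >= fdr_cutoff:
            continue
        if lfc > lfc_cutoff:
            up.append(gene)
        elif lfc < -lfc_cutoff:
            down.append(gene)
    return up, down


rows = [
    ("a", 2.0, 0.01),    # up
    ("b", -1.5, 0.001),  # down
    ("c", 0.5, 1e-4),    # significant but below the fold-change cutoff
    ("d", 3.0, 0.2),     # large change but FDR too high
    ("e", 1.2, None),    # NA adjusted p value
]
print(differential_genes(rows))
```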

Statistical analysis

All Pearson correlations were calculated in R using the Hmisc package51. Scatterplots were created with ggplot2 52. For all scatterplots and histograms shown, reads from replicates were combined and treated as a single sample. Differences between Pearson’s r values were tested with cocor53. Significant differences between RPKM distributions were assessed with the Kolmogorov–Smirnov test. Significance was assigned as *p < 0.05 and ***p < 0.001. Zou’s54 95% confidence intervals were considered significant (assigned as ***) if the interval did not include 0.
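The "interval excludes 0" rule can be illustrated with Zou’s confidence interval for the difference between two correlations. The Python sketch below covers only the simplest case of two independent samples, built from Fisher z intervals for each r. Whether the comparisons in this work were independent or dependent (for example, overlapping correlations computed on the same genes) is not stated, and cocor uses different formulas for the dependent cases, so treat this as an illustration of the decision rule rather than a reproduction of the analysis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """95% CI for a single correlation via the Fisher z transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def zou_ci_independent(r1, n1, r2, n2):
    """Zou (2007) 95% CI for r1 - r2, assuming two independent samples."""
    l1, u1 = fisher_ci(r1, n1)
    l2, u2 = fisher_ci(r2, n2)
    d = r1 - r2
    lower = d - math.sqrt((r1 - l1) ** 2 + (u2 - r2) ** 2)
    upper = d + math.sqrt((u1 - r1) ** 2 + (r2 - l2) ** 2)
    return lower, upper


def significant(ci):
    """Significant if the confidence interval does not include 0."""
    lower, upper = ci
    return not (lower <= 0 <= upper)


print(zou_ci_independent(0.9, 100, 0.5, 100))
```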

Protein clustering analysis for MetaRibo-Seq vs. transcriptomics

For analyses independent of gene annotation, proteins whose translation levels differed from their transcription levels (identified as described under “Differential analysis”) were clustered using Cd-hit55 at 70% amino acid identity. Representative sequences were input into Blast2GO56 using the nr database.

Triplet periodicity analysis

Using the same default parameters as in “Read density as a function of position”, triplet periodicity was called using RiboSeqR46. To analyze the triplet periodicity of specific genera, assembled contigs were classified using One Codex47, and contigs classified into the same genus were binned together. Only reads mapping to these bins were considered.
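Triplet periodicity is commonly summarized as the fraction of ribosome P-site positions falling in each of the three reading frames of a CDS, with a strong excess in frame 0 indicating codon-by-codon ribosome movement. The Python sketch below computes that summary for one plus-strand CDS. It assumes that P-site offsets have already been applied to the read positions, and it does not reproduce RiboSeqR’s own method. For a genus-level analysis, positions from all CDSs on contigs in a genus bin would be pooled before computing the fractions.

```python
from collections import Counter


def frame_fractions(psite_positions, cds_start, cds_end):
    """Fraction of P-site positions in frames 0, 1, 2 of a plus-strand CDS.

    Positions are 0-based; frame 0 is in phase with the start codon.
    Assumption: P-site offsets have already been applied to read positions.
    """
    frames = Counter(
        (p - cds_start) % 3 for p in psite_positions if cds_start <= p < cds_end
    )
    total = sum(frames.values())
    return [frames[f] / total if total else 0.0 for f in range(3)]


# Hypothetical positions: six in frame 0, one each in frames 1 and 2.
print(frame_fractions([0, 3, 6, 9, 1, 2, 12, 15], 0, 30))
```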

Clustering of small proteins from HMP-I-II

Contigs from the 1773 HMPI-II metagenomes containing at least 5 Mbp of total contig sequence were downloaded from https://www.hmpdacc.org/hmasm2. MetaProdigal42, using a cutoff length of 15 bp, was used to predict genes. Small ORFs, encoding potential proteins 5−50 amino acids including start and stop codons, were considered. These small proteins were clustered using CD-Hit55 with parameters: -n 2, -p 1, -c 0.5, -d 200, -M 50000, -l 5, -s 0.95, -aL 0.95, -g 1. This resulted in 444,054 clusters, which were identical to those previously generated.
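CD-HIT performs greedy incremental clustering: sequences are sorted by decreasing length, and each one either joins an existing cluster whose representative passes the thresholds or becomes a new representative. The Python sketch below mirrors three of the parameters listed above: `identity_cutoff` corresponds to -c, `length_cutoff` to -s (the shorter sequence must be at least 95% of the representative’s length), and `best_match=True` to -g 1 (assign to the most similar qualifying cluster rather than the first). The ungapped identity function is a hypothetical stand-in. Real CD-HIT uses short-word filtering and banded alignment, and this sketch does not model -aL separately.

```python
def ungapped_identity(a, b):
    """Toy identity: matching residues over the shorter sequence, no gaps (hypothetical)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def greedy_cluster(seqs, identity_cutoff=0.5, length_cutoff=0.95, best_match=True):
    """Greedy incremental clustering in the spirit of CD-HIT.

    seqs: dict sequence id -> amino acid sequence.
    Returns dict representative id -> list of member ids.
    """
    order = sorted(seqs, key=lambda k: len(seqs[k]), reverse=True)
    clusters = {}
    for sid in order:
        s = seqs[sid]
        best_rep, best_ident = None, -1.0
        for rep in clusters:
            r = seqs[rep]
            if len(s) / len(r) < length_cutoff:
                continue  # too short relative to this representative (-s)
            ident = ungapped_identity(s, r)
            if ident >= identity_cutoff and ident > best_ident:
                best_rep, best_ident = rep, ident
                if not best_match:
                    break  # -g 0 behavior: take the first qualifying cluster
        if best_rep is None:
            clusters[sid] = [sid]  # becomes a new representative
        else:
            clusters[best_rep].append(sid)
    return clusters


toy = {"a": "MKKLLAV", "b": "MKKLLAI", "c": "WWWWWWW", "d": "MKK"}
print(greedy_cluster(toy))
```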

Identifying homologs of the ~444,054 clusters in samples A−E

For samples A−E, MetaProdigal42, using a cutoff length of 15 bp, was used to predict genes. Small ORFs, encoding potential proteins 5−50 amino acids including start and stop codons, were considered. Small proteins predicted in samples A−E were queried against representatives of each of the ~444,000 clusters, using BLASTp43 with word-size of 2. Hits were considered significant if: e value ≤ 0.05 and the length of the hit was between 90 and 110% of the length of the small protein.
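The size window and hit criteria above can be expressed as two predicates. In the Python sketch below, "length of the hit" is interpreted as the aligned length in amino acids on the subject side. That interpretation, and the function names, are assumptions for illustration.

```python
def is_small_orf(protein_len_aa, min_aa=5, max_aa=50):
    """Small ORF filter: encoded protein of 5-50 amino acids."""
    return min_aa <= protein_len_aa <= max_aa


def significant_hit(evalue, hit_len, query_len, max_evalue=0.05):
    """BLASTp hit filter: e value <= 0.05 and hit length within 90-110% of the query.

    Assumption: hit_len is the aligned length in amino acids.
    """
    return evalue <= max_evalue and 0.9 * query_len <= hit_len <= 1.1 * query_len


print(significant_hit(0.01, 21, 20), significant_hit(0.01, 17, 20))
```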

Demonstrating protein synthesis of small gene families

For each sample (A−E), we considered all predicted genes, including these small genes, and calculated the total number of MetaRibo-Seq reads mapping to all of them. As described above, bowtie 1.1.1 40 was used to map reads, and the number of reads mapping to a given region was calculated with BEDtools multicov version 2.27.1 48, with strandedness enforced. We calculated RPKM for all of these genes in each sample (A−E). If a given small protein showed evidence of translation (MetaRibo-Seq RPKM > 10) and was homologous to one of the ~444,000 potential small gene families, we considered this evidence of protein synthesis for that small gene family.
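RPKM normalizes a gene’s read count by its length in kilobases and by the total mapped reads in millions: RPKM = reads × 10^9 / (gene length in bp × total reads). Following the text, the total here is the number of MetaRibo-Seq reads mapping to all predicted genes in the sample. The Python sketch below applies the RPKM > 10 criterion together with homology to a small gene family. The dictionary inputs and family labels are hypothetical.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million reads mapped to all genes."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads mapped to any gene")
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


def translated_small_families(counts, lengths_bp, homolog_family, min_rpkm=10):
    """Families with at least one homolog showing translation (RPKM > min_rpkm).

    homolog_family: dict small-protein gene id -> family id (hypothetical input)
    """
    values = rpkm(counts, lengths_bp)
    return {
        homolog_family[g]
        for g, v in values.items()
        if v > min_rpkm and g in homolog_family
    }


counts = {"big": 999_999, "sp1": 1, "sp2": 0}
lengths = {"big": 3000, "sp1": 60, "sp2": 90}
families = {"sp1": "fam7", "sp2": "fam9"}
print(translated_small_families(counts, lengths, families))
```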

Small protein statistical analysis

To test for enrichment in proportion of predictions with protein domains across our assigned confidence levels, hypergeometric distribution tests were performed.
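A one-sided hypergeometric enrichment test asks how likely it is to observe at least k predictions with protein domains among n predictions at a given confidence level, when K of all N predictions carry domains. The Python sketch below computes that upper-tail probability exactly with the standard library. The exact framing of N, K, n, and k used in the analysis is not stated, so the mapping in the docstring is an assumption.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    Assumed mapping: N = all small protein predictions, K = predictions with a
    protein domain, n = predictions at one confidence level, k = those with a domain.
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


print(hypergeom_sf(5, 10, 5, 5))
```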

Assessment of homology between small protein families

Using BLASTp43, we blasted the 2091 small protein families containing homologs with MetaRibo-Seq signal (RPKM > 10) against the initial 4000 gene families proposed previously. We defined rapid evolution as instances in which any homolog of the 2091 small protein clusters significantly hit (e value ≤ 0.05) representative protein sequences for the initially proposed 4000 small gene families.

Taxonomic classification of small protein families

Contigs containing any homolog of the 2091 small protein families were classified using Kraken2 v2.0.8 57 with a custom database constructed from RefSeq58 and GenBank59. To visualize the classifications within each small protein family, Krona60 was utilized.

Genomic neighborhood analysis of small protein families

MetaProdigal42 was used to annotate genes on contigs containing any homologs of the 2091 small protein families. Amino acid sequences of genes located at most ten genes away from the small protein on these contigs were searched against the Conserved Domain Database (CDD)61 using RPS-BLAST43. A hit was considered significant if its e value was ≤ 0.05 and the protein aligned to at least 80% of the PSSM’s length.
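The neighborhood window and the CDD hit criteria can be written as two small helpers. The Python sketch below assumes that genes on each contig are listed in positional order and that the aligned PSSM length is available from the RPS-BLAST output. Both assumptions, and the function names, are illustrative.

```python
def neighbors(genes_on_contig, small_idx, max_dist=10):
    """Indices of genes at most max_dist genes from the small protein (itself excluded).

    Assumption: genes_on_contig is ordered by position along the contig.
    """
    lo = max(0, small_idx - max_dist)
    hi = min(len(genes_on_contig), small_idx + max_dist + 1)
    return [i for i in range(lo, hi) if i != small_idx]


def significant_cdd_hit(evalue, pssm_aligned_len, pssm_len, max_evalue=0.05, min_cov=0.8):
    """CDD hit filter: e value <= 0.05 and alignment covering >= 80% of the PSSM."""
    return evalue <= max_evalue and pssm_aligned_len / pssm_len >= min_cov


genes = [f"gene{i}" for i in range(30)]
print(neighbors(genes, 2))
```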

Cellular localization of small protein families

This was performed on all proteins within the 2091 small protein families proposed. To predict if these proteins are secreted, SignalP-5.0 62 was run with default parameters both with “gram+” and “gram−”. To predict if these proteins are transmembrane, TMHMM63 was run on the same set of proteins with default parameters. A small protein family was considered transmembrane/secreted if ≥80% of the members were predicted to be such.
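The family-level call is a majority rule over member predictions. The Python sketch below assumes that per-member SignalP and TMHMM results have already been reduced to booleans, and that a member counts as secreted if SignalP flagged it under either the gram+ or the gram− model. That reduction is an assumption, since the source does not state how the two runs were combined.

```python
def family_localization(member_predictions, min_fraction=0.8):
    """Label a small protein family from per-member predictions.

    member_predictions: dict member_id -> {"secreted": bool, "transmembrane": bool}
    Assumption: "secreted" is True if SignalP flagged the member under either
    the gram+ or the gram- model.
    Returns the set of labels supported by >= min_fraction of members.
    """
    n = len(member_predictions)
    if n == 0:
        return set()
    labels = set()
    for label in ("secreted", "transmembrane"):
        hits = sum(1 for pred in member_predictions.values() if pred[label])
        if hits / n >= min_fraction:
            labels.add(label)
    return labels


# Hypothetical family: 4 of 5 members secreted, 1 of 5 transmembrane.
preds = {f"m{i}": {"secreted": i < 4, "transmembrane": i == 0} for i in range(5)}
print(family_localization(preds))
```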

Antimicrobial peptide identification

AmPEP64 was applied (default parameters) on representatives of the 2091 small protein families.

Guidelines for extraction of all contigs associated with a specific family of interest

We provide these small protein families at a DNA and amino acid level in Supplementary Data 2. If you would like to extract the contigs these regions are predicted from, please follow instructions previously presented under “Guidelines for extraction of all contigs associated with a specific family of interest”4. Additionally, we provide krona plots for each family (2091 .html files in total) in which all contigs for the family were taxonomically classified and can be interactively viewed.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.


