Background & Summary

Forests, and especially unmanaged natural forests, accumulate and store large amounts of carbon (C)1. A substantial fraction of this C stock, 73 ± 6 Pg or 8% of the total global forest C stock, is contained within deadwood2. This C pool is transient because during its transformation by saprotrophic organisms most C is liberated as CO2 into the atmosphere3, while the rest is sequestered in soils as dissolved organic C, within microbial biomass or as a part of the soil organic matter, along with other nutrients4. Fungi appear to be the major decomposers in deadwood, using extracellular enzymes to break down recalcitrant plant biopolymers, as shown in the associated study5. Fungi also determine the bacterial community composition6,7. Bacterial fixation of atmospheric N2 was shown to contribute substantially to the increase in nitrogen (N) in deadwood during decomposition5,8,9. In addition to bacteria and fungi, deadwood hosts a suite of other organisms including archaea, viruses, protists, nematodes and insects, whose roles in deadwood are so far unknown. To understand deadwood as a dynamic habitat, it is necessary to describe the composition of the associated microorganisms with an emphasis on the major groups, fungi and bacteria, whose ecologies are often genus-specific10. Further, it is important to link deadwood-associated organisms to the processes occurring at different stages of decomposition, either by characterization of isolates7,11 or by cultivation-independent techniques.

In this Data Descriptor we present comprehensive datasets of DNA- and RNA-derived data and sample metadata that characterize deadwood organisms and their activity at various stages of decomposition (Table 1, Supplementary Table 1). The DNA-derived data, which represent community composition and genomic potential, include 16S rRNA gene and ITS2 sequences, metagenomic reads, the metagenome assembly and bacterial metagenome-assembled genomes (MAGs12). The RNA-derived data comprise total RNA reads, most of which originate from ribosomal RNA; these reads are taxonomically assignable and can thus serve as a proxy for a PCR-unbiased view of community composition. The data further contain metatranscriptome raw reads and an assembly that represent the processes occurring in deadwood. The dataset characterizes decomposing trunks of European beech (Fagus sylvatica L.) in a beech-dominated natural forest in temperate Europe (Fig. 1). The metagenome was assembled from 25 DNA samples of deadwood with decomposition time ranging from young wood (<4 years since tree death) to almost completely decomposed wood (>41 years of decomposition). We resolved 58 high-quality MAGs with a total of 19.5 × 10³ contigs spanning 10 bacterial phyla, including those that are difficult to culture, such as Acidobacteria, Patescibacteria, Verrucomicrobia and Planctomycetes (Fig. 2, Supplementary Table 2). The 16S rRNA gene and ITS2 amplicon data allow microbial diversity and occurrence patterns to be compared at the global scale using the public databases GlobalFungi13 and Earth Microbiome14 (Supplementary Fig. 1). The deadwood metatranscriptome was assembled from 10 RNA samples spanning two age classes of decomposing deadwood (between 4 and 19 years old). The amount of raw and assembled data in individual data packages is summarised in Table 2.
An overview of the data previously used to describe the complementarity of fungal and bacterial roles in deadwood5 is given in the Data Records section.

Table 1 Sample metadata for 25 samples used for sequencing.
Fig. 1 Study workflow and sequencing data sources. Available data packages are in bold. Age class 1 was <4 years since tree death, class 2 4–7 years, class 3 8–19 years, class 4 20–41 years and class 5 >41 years (n = 5 per age class).

Fig. 2 Phylogenetic tree of 58 high-quality MAGs based on a set of bacterial single-copy genes. Phyla (or classes of Proteobacteria) are color-coded; tree tips are labelled with the order-level taxonomy obtained from the GTDB database. Boxplots represent completeness and redundancy values of MAGs.

Table 2 Raw read counts in individual data packages (mean ± SE) and statistics of metagenome and metatranscriptome assembly.

Previous studies of deadwood have not provided such a comprehensive set of information about the associated biota. These data improve the breadth and resolution of our knowledge of deadwood-associated biota and their functions. Given that natural forests are essential ecosystems for C storage and nutrient cycling, the data within this Data Descriptor help to clarify the ecosystem-level roles that deadwood plays in forests.

Methods

Study area and sampling

Deadwood was sampled in the core zone of the Žofínský prales National Nature Reserve, an unmanaged forest in the south of the Czech Republic (48°39′57″N, 14°42′24″E), as described in the associated study5. The core zone has never been managed, and all human intervention stopped in 1838, when it was declared a reserve. It thus represents a rare fragment of European temperate virgin forest left to spontaneous development. The reserve is situated at 730–830 m a.s.l.; the bedrock is almost homogeneous and consists of fine- to medium-grained porphyritic and biotite granite. Annual average rainfall is 866 mm and annual average temperature is 6.2 °C15.

Previous analyses indicated that deadwood age (time of decomposition) significantly affects both wood chemistry and the composition of microbial communities16,17. We thus randomly selected dead tree trunks that represented age classes 1–5, assigned based on the length of decomposition18. Each age class was represented by five logs of 30–100 cm diameter (Table 1). Age class 1 was <4 years since tree death, class 2 4–7 years, class 3 8–19 years, class 4 20–41 years and class 5 >41 years (n = 5 per age class); only trees that were not alive and not decomposed before downing were considered. DNA was extracted from all logs. Due to sample-specific RNA extraction yields, RNA of sufficient amount and quality was obtained only from a subset of logs (age classes 2 and 3). Sampling was performed in November 2016. The length of each selected log (or the sum of the lengths of its fragments) was measured, and four samples were collected by drilling at 20%, 40%, 60% and 80% of the log length. Drilling was performed vertically from the middle of the upper surface through the whole diameter using an electric drill with an auger diameter of 10 mm. The sawdust from all four drill holes of each log was pooled, immediately frozen in liquid nitrogen, transported to the laboratory on dry ice and stored at −80 °C until further processing.
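
The age-class boundaries and drilling positions described above can be written down as a short Python sketch. It is an illustration only, not part of the original workflow: the function names are hypothetical, and it assumes that time since tree death is recorded in whole years.

```python
from bisect import bisect_right

# Exclusive upper bounds (whole years) for age classes 1-4, taken from the text:
# class 1 <4, class 2 4-7, class 3 8-19, class 4 20-41, class 5 >41.
AGE_CLASS_BOUNDS = [4, 8, 20, 42]

# Relative positions along the log where samples were drilled.
DRILL_FRACTIONS = (0.2, 0.4, 0.6, 0.8)


def age_class(years_since_death):
    """Return the age class (1-5) for whole years since tree death (hypothetical helper)."""
    if years_since_death < 0:
        raise ValueError("years_since_death must be non-negative")
    return bisect_right(AGE_CLASS_BOUNDS, years_since_death) + 1


def drill_positions(log_length_m):
    """Return distances (m) from one end of the log at 20, 40, 60 and 80% of its length."""
    return [round(fraction * log_length_m, 3) for fraction in DRILL_FRACTIONS]
```

For a hypothetical 10 m log, `drill_positions(10.0)` gives the four drilling points, and `age_class(8)` places a log that has decomposed for 8 years in class 3.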

Sample processing, DNA and RNA extraction

Sample characteristics such as pH, C, N and water content were measured as described in the associated study5. Similarly, the workflow of nucleic acid preparation, ligation and sequencing was described previously. Briefly, wood samples (approximately 10 g of material) were homogenized using a mortar and pestle under liquid nitrogen and thoroughly mixed prior to nucleic acid extraction. Total DNA was extracted in triplicate from 200 mg batches of finely ground wood powder using a NucleoSpin Soil kit (Macherey-Nagel).

Total RNA was extracted in triplicate from 200 mg batches of sample using the NucleoSpin RNA Plant kit (Macherey-Nagel) according to the manufacturer’s protocol after mixing with 900 μl of the RA1 buffer and shaking on a FastPrep-24 (MP Biomedicals) at 6.5 m s−1 twice for 20 s. Triplicates were pooled and treated with the OneStep PCR Inhibitor Removal kit (Zymo Research), and DNA was removed using the DNA-free DNA Removal Kit (Thermo Fisher Scientific). The efficiency of DNA removal was confirmed by negative PCR results with the bacterial primers 515F and 806R19. RNA quality was assessed using a 2100 Bioanalyzer (Agilent Technologies).

Analysis of deadwood-associated organisms

To estimate the relative representation of deadwood-associated organisms, total RNA was sequenced, since the majority of the RNA is either small subunit or large subunit ribosomal RNA, which allows organisms to be identified by BLAST searches against the curated SILVA databases20,21. Read abundances represent the abundance of ribosomes of each taxon and thus serve as a proxy for the abundance of that taxon. Libraries for high-throughput sequencing of total RNA were prepared using the TruSeq RNA Sample Prep Kit v2 (Illumina) according to the manufacturer’s instructions, omitting the initial capture of polyA tails to enable total RNA to be ligated. Samples were pooled in equimolar amounts and sequenced on an Illumina HiSeq 2500 (2 × 250 bases) at the Brigham Young University Sequencing Centre, USA.
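
As an illustration of how rRNA read counts translate into relative representation, the following minimal Python sketch normalizes per-taxon read counts to proportions. The function name and the taxon counts in the usage note are hypothetical and are not taken from the dataset.

```python
def relative_abundance(read_counts):
    """Convert per-taxon rRNA read counts into proportions that sum to 1.

    read_counts: dict mapping taxon name -> number of assigned reads.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any taxon")
    return {taxon: count / total for taxon, count in read_counts.items()}
```

For example, hypothetical counts of 600 fungal, 300 bacterial and 100 archaeal reads would yield proportions of 0.6, 0.3 and 0.1. Such proportions reflect ribosome abundance and are only a proxy for organism abundance.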

Metatranscriptomics and metagenomics

For metatranscriptome analysis, the content of rRNA in RNA samples was reduced as described previously5,22 using a combination of the Ribo-Zero rRNA Removal Kit Human/Mouse/Rat and the Ribo-Zero rRNA Removal Kit Bacteria (Illumina). Oligonucleotide probes from both types of Ribo-Zero kits were mixed and added to each sample, which allowed them to anneal to rRNA and the rRNA to be subsequently removed. The efficiency of the removal was checked using a 2100 Bioanalyzer and removal was repeated when necessary. Reverse transcription was performed with SuperScript III (Thermo Fisher Scientific). Libraries for high-throughput sequencing were prepared using the ScriptSeq v2 RNA-Seq Library Preparation Kit (Illumina) according to the manufacturer’s instructions with a final 14 cycles of amplification by FailSafe PCR Enzyme (Lucigen).

The NEBNext Ultra II DNA Library Prep Kit for Illumina (New England BioLabs) was used to generate metagenome libraries according to the manufacturer’s instructions. Samples of the metagenome and metatranscriptome were pooled in equimolar amounts and sequenced on an Illumina HiSeq 2500 (2 × 250 bases) at the Brigham Young University Sequencing Centre, USA.

Metagenome assembly and annotation were performed as described previously5. Briefly, Trimmomatic 0.3623 and FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) were used to remove adaptor contamination, trim low-quality ends of reads and omit reads with overall low quality (<30); sequences shorter than 50 bp were also omitted. A combined assembly of all 25 samples was performed using MEGAHIT 1.1.324. Metagenome sequencing yielded on average 22.5 ± 7.2 million reads per sample, which were assembled into 17,936,557 contigs over 200 bp in length.

Metatranscriptome (MT) assembly and annotation were performed as described previously5. Trimmomatic 0.3623 and FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) were used to remove adaptor contamination, trim low-quality ends of reads and omit reads with overall low quality (<30); sequences shorter than 50 bp were also omitted. mRNA reads were filtered from the files using the bbduk.sh 38.26 program in BBTools (https://sourceforge.net/projects/bbmap/). A combined assembly was performed using MEGAHIT 1.1.324. Metatranscriptome sequencing yielded on average 31.3 ± 9.1 million reads per sample, which were assembled into 1,332,519 contigs over 200 bp in length.
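
The read-filtering criteria used for both assemblies can be illustrated with a small Python sketch. It does not reproduce Trimmomatic or FASTX-Toolkit. It assumes that "overall low quality (<30)" refers to the mean Phred score of a read and that qualities are Phred+33 encoded; both are assumptions, and the function names are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(char) - offset for char in quality_string) / len(quality_string)


def keep_read(sequence, quality_string, min_mean_quality=30, min_length=50):
    """Return True if a read passes the length and mean-quality thresholds.

    Assumption: the <30 quality cutoff is applied to the mean Phred score.
    """
    if len(sequence) < min_length:
        return False  # sequences shorter than 50 bp are omitted
    return mean_phred(quality_string) >= min_mean_quality
```

A 60 bp read whose bases all have Phred 40 (character `I`) passes, whereas a 40 bp read or a 60 bp read with a uniform Phred of 20 (character `5`) is discarded.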

Identification and analysis of metagenome-assembled genomes

Bins that represent prokaryotic taxa present in the metagenome were constructed using MetaBAT225 as described previously5 with default settings, except that the minimal contig length was set to 2000 bp, which produced bins with overall better statistics than a minimal length of 2500 bp. CheckM 1.0.1126 with the lineage_wf pipeline was used to assign taxonomy and compute statistics for the bins. Bins with a completeness score greater than 50% were selected for quality improvement using RefineM according to the developers’ instructions27. Briefly, scaffolds whose genomic properties (GC content, coverage profiles and tetranucleotide signatures) differed from those expected for the bin were excluded; these values were calculated based on the mean absolute error and correlation criteria. Next, the refined bins were processed to identify and remove scaffolds with taxonomic assignments different from that of the bin. Lastly, scaffolds that possessed 16S rRNA genes divergent from the taxonomic affiliation of the refined bins were removed. The taxonomy of the bins was inferred by GTDB-Tk28. In total, 58 bins with quality scores >50 (CheckM completeness value − 5 × redundancy value) were considered metagenome-assembled genomes (MAGs) as defined previously27 and deposited in the NCBI database.
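
The quality criterion for MAGs stated above (completeness − 5 × redundancy > 50) is simple arithmetic and can be sketched in Python as follows. The function names and the example values in the usage note are hypothetical. Completeness and redundancy are assumed to be percentages as reported by CheckM.

```python
def quality_score(completeness, redundancy):
    """Quality score of a bin: completeness (%) minus five times redundancy (%)."""
    return completeness - 5.0 * redundancy


def is_mag(completeness, redundancy, threshold=50.0):
    """Return True if the bin's quality score exceeds the threshold (strictly greater)."""
    return quality_score(completeness, redundancy) > threshold
```

A hypothetical bin with 90% completeness and 4% redundancy scores 70 and qualifies, whereas a bin with 60% completeness and 3% redundancy scores 45 and does not.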

GToTree v1.5.3929 together with Prodigal30, HMMER331, Muscle32, trimAl33 and FastTree234 was used to infer the phylogeny of MAGs based on a set of 74 bacterial single-copy gene HMM profiles with minimal marker share >25%.

ITS2 and 16S rRNA gene amplicon sequencing and analysis

Subsamples of DNA were used to amplify the fungal ITS2 region using barcoded gITS7 and ITS4 primers35 and the hypervariable region V4 of the bacterial 16S rRNA gene using the barcoded primers 515F and 806R19 in three PCR reactions per sample. The PCR premix for ITS2 or 16S rRNA gene metabarcoding contained 5 μl of 5× Q5 Reaction Buffer, 5 μl of 5× Q5 High GC Enhancer and 0.25 μl of Q5 High-Fidelity DNA Polymerase (New England Biolabs), 1.5 μl of BSA (10 mg ml−1), 0.5 μl of dNTPs Nucleotide Mix 10 mM (Bioline), 1 μl of each primer, 9.75 μl of H2O and 1 μl of template DNA. PCR conditions for fungal amplification were 5 min at 94 °C, 30 cycles of (30 s at 94 °C, 30 s at 56 °C and 30 s at 72 °C) and 7 min at 72 °C. PCR conditions for bacterial amplification were 4 min at 94 °C, 25 cycles of (45 s at 94 °C, 60 s at 50 °C and 75 s at 72 °C) and 10 min at 72 °C.
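
As a consistency check on the recipe above, the following Python sketch sums the premix components and computes the programmed hold times of the two cycling protocols. The component labels and function names are chosen here for illustration, and ramping times between temperatures are ignored.

```python
# Premix components in microlitres, as listed in the text.
PREMIX_UL = {
    "5x Q5 Reaction Buffer": 5.0,
    "5x Q5 High GC Enhancer": 5.0,
    "Q5 High-Fidelity DNA Polymerase": 0.25,
    "BSA (10 mg/ml)": 1.5,
    "dNTP mix (10 mM)": 0.5,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "H2O": 9.75,
    "template DNA": 1.0,
}


def total_volume_ul(premix):
    """Total reaction volume in microlitres."""
    return sum(premix.values())


def cycling_seconds(initial_s, cycles, step_seconds, final_s):
    """Programmed hold time of a PCR protocol in seconds (ramping ignored)."""
    return initial_s + cycles * sum(step_seconds) + final_s


# Fungal ITS2: 5 min, 30 x (30 s, 30 s, 30 s), 7 min.
FUNGAL_S = cycling_seconds(5 * 60, 30, (30, 30, 30), 7 * 60)
# Bacterial 16S: 4 min, 25 x (45 s, 60 s, 75 s), 10 min.
BACTERIAL_S = cycling_seconds(4 * 60, 25, (45, 60, 75), 10 * 60)
```

The components sum to 25 μl, so the 5× buffer and enhancer are each at 1× final concentration. The programmed hold times are 57 min for the fungal and 89 min for the bacterial protocol.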

The three PCR reactions per sample were pooled, purified using the MinElute PCR Purification Kit (Qiagen) and mixed in equimolar amounts according to the concentration measured on the Qubit 2.0 Fluorometer (Thermo Fisher Scientific). Sequencing libraries were prepared using the TruSeq PCR-Free Kit (Illumina) according to the manufacturer’s instructions, and sequencing was performed in-house on an Illumina MiSeq (2 × 250 bases).

The amplicon sequencing data were processed using the pipeline SEED 2.1.0536. Briefly, paired-end reads were merged using fastq-join37. Sequences with ambiguous bases and those with a mean quality score below 30 were omitted. The fungal ITS2 region was extracted using ITS Extractor 1.0.1138 before processing. Chimeric sequences were detected using USEARCH 8.1.186139 and deleted, and sequences were clustered using UPARSE implemented within USEARCH40 at a 97% similarity level. The most abundant sequences were taken as representative for each OTU. The closest fungal hits at the species level were identified using BLASTn 2.5.0 against UNITE 8.141. Where the best fungal hit showed lower similarity than 97% with 95% coverage, the best genus-level hit was identified. The closest bacterial hit from SILVA SSU database r13821 was found by DECIPHER 2.18.1 package42 using IDTAXA algorithm with threshold 6043. Sequences identified as nonfungal and nonbacterial were discarded.
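
The rule for choosing between species-level and genus-level fungal hits can be sketched in Python as below. It assumes that a species-level hit requires at least 97% similarity over at least 95% coverage; this is one reading of the thresholds above, not a confirmed detail of the SEED pipeline, and the function name is hypothetical.

```python
def fungal_assignment_level(similarity_pct, coverage_pct,
                            min_similarity=97.0, min_coverage=95.0):
    """Return 'species' if the best BLASTn hit meets both thresholds, else 'genus'.

    Assumption: both similarity and coverage must reach their thresholds
    for the species-level hit to be used.
    """
    if similarity_pct >= min_similarity and coverage_pct >= min_coverage:
        return "species"
    return "genus"
```

Under this reading, a best hit with 98.5% similarity and 99% coverage is kept at species level, while a hit with 95% similarity falls back to the best genus-level hit.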

Data Records

Data described in this study are summarized in Supplementary Tables 1 and 2 together with the NCBI accession numbers. Raw sequencing reads (total RNA, metatranscriptomics and metagenomics), assembly files and resolved MAGs have been deposited under NCBI BioProject accession number PRJNA60324044. In the associated study5, the metatranscriptome assembly together with raw read mapping was used for annotation of microbial functions, total RNA raw reads were used to infer mainly fungal and bacterial taxonomic composition, and the metagenome assembly and raw read mapping served solely for the identification of MAGs. Amplicon data of the bacterial 16S rRNA gene and fungal ITS2, which were not published previously, have been deposited under NCBI BioProject accession number PRJNA67267445.

Technical Validation

Deadwood samples were taken aseptically using sterilized equipment and sterile RNase- and DNase-free tubes. RNA and DNA were extracted in an RNase-free environment. During library preparation, the quantity and quality of the nucleic acids were measured with a Qubit 2.0 Fluorometer and a 2100 Bioanalyzer, respectively. PCR with the bacterial primers 515F and 806R, with a negative control containing PCR-grade water and a positive control containing extracted bacterial DNA, was used to confirm successful removal of DNA from the RNA samples by DNase treatment. The 2100 Bioanalyzer measurement was used to confirm successful rRNA depletion. No positive or negative sequencing controls were used to obtain metagenomic and metatranscriptomic data. For 16S rRNA gene amplification, negative and positive controls in the form of PCR-grade water and bacterial DNA, respectively, were included. The concentration of the 16S rRNA gene amplicons and controls was measured with a Qubit 2.0 Fluorometer and their quality was analysed using agarose gel electrophoresis. Equimolar pooling of all barcoded sequencing libraries was based on quantification using the KAPA Library Quantification Kit (Roche).

Usage Notes

The metagenome and metatranscriptome data described in this Data Descriptor were used to demonstrate the complementarity of fungal and bacterial functions in carbon and nitrogen cycling in decomposing deadwood and to link them to the corresponding biogeochemical processes5. However, the analysis of the deposited data packages in the associated study5 focused solely on fungi and bacteria, despite the presence of other groups of organisms in the studied deadwood. The present deposition of the metagenome assembly and total RNA sequencing data gives biologists interested in virus ecology46, bacterial metagenomics47,48 and the ecology of eukaryotes49,50 the opportunity to explore the functional potential of the deadwood-associated biota through analysis of the metagenome, and to obtain a taxonomic overview of all deadwood-associated organisms using total RNA, which allows reliable taxonomic classification of taxa across the whole tree of life51,52. The amplicon data, described here for the first time, allow comparison with the metagenomes and metatranscriptomes of this dataset as well as with other deadwood studies16,53,54, and analysis of cross-domain interactions6,55. Efforts to collect data and generalize microbial diversity patterns13,14,56 profit from the fully annotated, accessible and metadata-rich sequences that we present here. The Data Descriptor further provides information for ecologists, biogeochemists and conservation biologists interested in the role of deadwood in ecosystem processes and in deadwood-associated biodiversity, an important topic of current research in forest ecology57.