Introduction

The internal transcribed spacer (ITS) rDNA has been widely used as the primary barcode marker for fungi because it is highly variable and easily amplified by PCR. Additionally, the LSU region has been used to complement ITS-based identifications (Klaubauf et al. 2010; Schoch et al. 2012; Heeger et al. 2018; Ceballos-Escalera et al. 2022). However, ITS-driven barcoding and even species delimitation have a long history of being problematic in the order Xylariales (Ascomycota). For instance, the genus Annulohypoxylon was rendered paraphyletic in older ITS-based phylogenetic analyses (Sánchez-Ballesteros et al. 2000). Proteinogenic sequences derived from TUB2 and α-actin unequivocally resolved a monophyletic clade comprising Annulohypoxylon (Hsieh et al. 2005), which is identical with Hypoxylon sect. Annulata sensu Ju and Rogers. The genus was later further subdivided, following a multi-gene genealogy and incorporating chemotaxonomic and morphological evidence, and Jackrogersella was split from Annulohypoxylon (Wendt et al. 2018). The concurrent resurrection of the Hypoxylaceae and the division of the Xylariaceae s. lat. into different families would not have been feasible based on rDNA data alone.

A lack of specificity of ITS data affects not only generic segregation, but also species recognition. Members of the genus Daldinia, as well as the Hypoxylon rubiginosum and H. fuscum complexes, are known to have nearly identical ITS sequences and can only be segregated by polyphasic approaches (Stadler et al. 2013; Kuhnert et al. 2014; Lambert et al. 2021). Besides homologous ITS sequences between well-resolved species, Stadler et al. (2020) even reported heterogeneous sequences among the multiple copies of distinct ITS loci found in high-quality genomes generated with 3rd generation sequencing technology (Oxford Nanopore) and polished with 2nd generation sequencing technology (Illumina). How these findings of the pilot study featuring 14 genome sequences extend to other members of the Hypoxylaceae has so far not been followed up. Here, we attempt to fill this gap by providing genome sequences for an additional 44 strains of Xylariales, including several ex-type cultures.

Materials and methods

A total of 44 genome sequences from Xylariales were newly generated and subsequently used for the current study. All details are provided in Table 1.

Table 1 Details of the genome sequences selected from Xylariales, including strain IDs, sequencing methods, and references where original sequence data have previously been described. O/I, Oxford Nanopore/Illumina; PB, PacBio. Type specimens are labeled with T (holotype), IT (isotype), PT (paratype), and ET (epitype)

Genomic DNA preparation and extraction

All fungal strains were grown in 500-mL Erlenmeyer flasks with 100 mL YMG medium (10 g malt extract, 4 g glucose, 4 g yeast extract ad 1 L deionized water; pH 6.3) and then placed in a shaking incubator at 220 rpm and 25 °C for 3 to 5 days, depending on the growth speed of the fungus. The mycelium was harvested by vacuum filtration using a Büchner funnel with filter paper (MN 640 w, Macherey-Nagel, Düren, Germany), then frozen with liquid nitrogen and ground to a fine powder in a mortar. DNA extraction was performed with the GenElute® Plant Genomic DNA Miniprep Kit (Sigma Aldrich, St. Louis, MO, USA), following the manufacturer’s instructions.

Instructions for Nanopore and Illumina library preparations were followed as reported in Stadler et al. (2020).

Assembly

Basecalling of Nanopore reads was done live with the Guppy algorithm embedded in the MinKNOW platform on GridION (Oxford Nanopore Technologies). Adapters were trimmed using Porechop v0.2.4 (https://github.com/rrwick/Porechop). Genome assemblies were performed using canu v2.1.1 (Koren et al. 2017). This assembly was then polished with Illumina short read data using Pilon v1.24 (Walker et al. 2014), applying four cycles with bwa-mem v0.7.17 (Li 2013) and four further rounds with Bowtie2 v2.4.4 (Langmead and Salzberg 2012). Mitochondrial contigs and assembly artifacts were removed based on coverage and a BLAST (Camacho et al. 2009) search against published mitochondrial sequences. The remaining nuclear genome contigs were further curated and sorted with the “clean” option included in funannotate v1.8.16 (Palmer and Stajich 2020).

Identification of ITS, LSU, TUB2, RPB2, TEF1, and ACT1 sequence copies

Genome sequences were used to create individual BLAST databases in Geneious Prime 2023.4 (https://www.geneious.com). Previously published sequences for ITS, LSU, TUB2, RPB2, ACT1, and TEF1 were used as homology search templates to locate each region in the genomes and were mapped to the query genome. All sequences obtained were aligned using the ClustalW algorithm with the standard settings provided by Geneious Prime 2023.4 and edited manually, if necessary. The ITS (ITS1, 5.8S, and ITS2) sequence of Hypoxylon sporistriatatunicum DSM 115550 (GenBank Acc. No. MN056426) was employed as a reference for obtaining the ITS region. The minimum and maximum percentages of identity, as well as the maximum bp deviation, were obtained after sequence alignment.
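The identity statistics described above can be illustrated with a minimal Python sketch. It is not the Geneious implementation; it assumes that the copies are already aligned to equal length, that a gap opposite a nucleotide counts as a mismatch, and that columns gapped in both sequences are skipped. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns; columns gapped in both are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def identity_summary(copies: dict[str, str]) -> tuple[float, float, int]:
    """Return (min % identity, max % identity, max number of differing columns)."""
    if len(copies) < 2:
        raise ValueError("at least two aligned copies are required")
    identities, deviations = [], []
    for a, b in combinations(copies.values(), 2):
        identities.append(percent_identity(a, b))
        # Assumption: a gap facing a base is counted as one deviating position.
        deviations.append(sum(1 for x, y in zip(a.upper(), b.upper()) if x != y))
    return min(identities), max(identities), max(deviations)


# Toy alignment of four hypothetical ITS copies from one genome.
toy_copies = {
    "copy1": "ACGTACGTAC",
    "copy2": "ACGTACGTAC",
    "copy3": "ACGTTCGTAC",
    "copy4": "ACG-TCGTAC",
}
lowest, highest, max_dev = identity_summary(toy_copies)
print(f"identity {lowest:.1f} to {highest:.1f}%, max deviation {max_dev} bp")
```

How gaps are scored changes the percentages slightly, so values from such a sketch may not match those reported by an alignment editor exactly.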

Molecular phylogenetic inference

Extracted ITS and LSU sequences were aligned using MAFFT online (http://mafft.cbrc.jp/alignment/server/, Katoh et al. 2019). A maximum-likelihood phylogenetic tree was constructed using IQ-TREE v. 2.1.3 [-b 1000 -abayes -m MFP] (Minh et al. 2020), with selection of the appropriate nucleotide substitution model by ModelFinder (Chernomor et al. 2016; Kalyaanamoorthy et al. 2017) based on the Bayesian information criterion. Branch support was calculated with non-parametric bootstrap (Felsenstein 1985) and approximate Bayes test (Anisimova et al. 2011). In total, 1000 bootstrap replicates were mapped onto the ML tree with the best (highest) ML score. Single locus trees were calculated following the same methodology and checked for congruence with the multigene phylogenetic tree.
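As an illustration of how the options listed above translate into a command line, the following Python sketch only assembles the call and does not execute it. The executable name iqtree2 and the alignment file name are assumptions; -s (input alignment) is the standard IQ-TREE input option, and the remaining options are those stated above.

```python
import shlex


def iqtree_command(alignment: str, executable: str = "iqtree2") -> list[str]:
    """Build an IQ-TREE call with the options given in the text."""
    return [
        executable,
        "-s", alignment,   # input alignment (file name is hypothetical)
        "-b", "1000",      # 1000 non-parametric bootstrap replicates
        "-abayes",         # approximate Bayes test for branch support
        "-m", "MFP",       # ModelFinder followed by tree inference
    ]


cmd = iqtree_command("its_lsu_alignment.fasta")
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run once IQ-TREE is installed; it is kept separate here so that the sketch runs without the program being present.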

GC content and ITS secondary structure prediction

Intragenomic polymorphism of two strains was remarkably high (Pa. papillatum ATCC 58729, Hyp. monticulosa MUCL 54604). We investigated whether this polymorphism actually represents pseudogenes by following the workflow of Kolařík et al. (2021) and Prahl et al. (2021). Briefly, similarity in p-distance was calculated in MEGA11 software (Kumar et al. 2018). The GC content of the ITS region and its components (ITS1, 5.8S and ITS2) was calculated in DAMBE version 6.4.42 (Xia and Xie 2001). Modeling of hybridization of a proximal stem containing the 5.8S and 28S motifs that delimit the secondary structure of the ITS2 was done in the web interface of the Internal Transcribed Spacer 2 Ribosomal RNA Database, ITS2-DB (http://its2.bioapps.biozentrum.uni-wuerzburg.de/, Ankenbrand et al. 2015), using its “Annotate” tool, which is based on hidden Markov models (HMMs) (Keller et al. 2009). In order to predict the folding of the ITS2 secondary structure, we used an expected value for detection of significant hits below 0.001 (E-value < 0.001) and HMMs for fungal organisms with a minimum size of 150 nucleotides. We selected only those secondary structures that were obtained by direct folding, which were preferred over other modeled secondary structures. Free energy structures of the ITS2 were obtained using the RNAfold online tool and were visualized using a force-directed graph layout (forna) (Kerpedjiev et al. 2015) via the web application provided by ViennaRNA Web Services (http://rna.tbi.univie.ac.at/).
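The two sequence statistics used in this workflow, GC content and p-distance, can be sketched in a few lines of Python. This is a minimal illustration and not the code of DAMBE or MEGA11; it assumes that only unambiguous bases are counted and that alignment columns with a gap or ambiguity code in either sequence are ignored (pairwise deletion). The function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """GC percentage over unambiguous bases; gaps and ambiguity codes are ignored."""
    s = seq.upper().replace("U", "T")
    counted = sum(s.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (s.count("G") + s.count("C")) / counted


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among columns where both sequences have a base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    sites = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            sites += 1
            diffs += x != y
    return diffs / sites if sites else 0.0


# Toy example with hypothetical major and minor haplotypes.
major = "GGCCGGCCAT"
minor = "GGCTAGCTAT"
print(gc_content(major), gc_content(minor), p_distance(major, minor))
```

Under these assumptions, a minor haplotype with lower GC content and a larger p-distance to the major haplotype would be flagged for closer inspection, in line with the pseudogene indicators described above.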

Results

A comprehensive analysis was conducted on the genomes of 44 strains of Xylariales. We report the generation of 30 new genome sequences derived from Hypoxylaceae and one from Lopadostomataceae, enabling taxonomic marker studies of unprecedented detail in this group. We sought to evaluate the suitability of frequently used protein-encoding genes (TUB2, RPB2), and additionally explored TEF1 and ACT1, for phylogenetic reconstruction as well as for their suitability as potential barcodes, and to assess the extent of observable intragenomic variability of rDNA. With the exception of the H. dussii genome reported here, which harbors two copies of TEF1, all proteinogenic genes were found in a single copy, while we encountered multiple copies and intragenomic polymorphisms in the ITS (ITS1, 5.8S, ITS2) and LSU regions. As previously discussed (Stadler et al. 2020), we cannot exclude that the actual number of copies in the genomes is higher, as alignment or sequencing errors remain possible even with current 3rd generation technologies. Therefore, the numbers given below represent the minimum number of copies present in the assembled genomes, and more paralogs could exist in reality.

Intragenomic variation in the ITS and LSU regions

The analysis of the strains revealed differences in both sequence identity and the number of copies found in the corresponding genome. The majority of the strains studied showed varying degrees of identity among ITS and LSU, i.e., sequences displaying 100% identity in the ITS region but not in the LSU region, or vice versa. Nevertheless, sequences of both regions (ITS and LSU) were identical (100%) in 12 strains and showed only minor deviations (98–99.9%) in 11 strains. The number of copies of ITS and LSU located in one genome diverged as well. In our data set, approximately 22% of the strains analyzed displayed different numbers of ITS and LSU copies. For instance, J. cohaerens showed the largest difference (14 ITS and 10 LSU copies), followed by Pa. papillatum with 11 ITS and nine LSU copies. The genomes of H. addis, H. dussii, H. guilanense, H. sporistriatatunicum, and Hyp. monticulosa were observed to harbor only one extra copy of ITS. For Py. hunteri and X. hypoxylon, one extra copy of LSU was found. Even though we cannot be sure whether these phenomena are due to sequencing or assembly artifacts, we point them out here. As already discussed by Stadler et al. (2020), the real numbers of these loci in the genomes could be even higher.

Notably, H. petriniae exhibited the highest number of copies in its genome, with 40 ITS and 38 LSU copies, showing identity of 99.1 to 100% and 98.5 to 100%, respectively, followed by H. guilanense with 21 ITS and 20 LSU copies with an identity of 99.3 to 100% and 99.4 to 100%, respectively. These findings add to the record of observed genetic complexity and diversity within the studied strains. The results are summarized in Table 2.

Table 2 Details of the numbers of copies found in ITS and LSU and % identity divergence for all genome sequences studied

Two different genomes of H. rubiginosum were analyzed. The strain DSM 106870 is an endophyte isolated from Fraxinus excelsior as an antagonist of Hymenoscyphus fraxineus (Halecker et al. 2020), and its genome was sequenced by Oxford Nanopore technology in the current study. The ex-epitype strain MUCL 52887 was isolated from the wood of Fagus sylvatica (Wendt et al. 2018), and its genome had been sequenced using PacBio (Stadler et al. 2020). We were unable to retrieve either ITS or LSU regions from MUCL 52887, whereas the genome sequence of strain DSM 106870 exhibited five copies of both. Nevertheless, the ITS obtained from strain MUCL 52887 through Sanger sequencing (H. rubiginosum MUCL 52887, GenBank acc. number KC477232 in Stadler et al. 2013) was found to be 100% identical to the ITS copies obtained from H. rubiginosum DSM 106870 (Supplementary Information Fig. 7B). The LSU (H. rubiginosum MUCL 52887, GenBank acc. number KY610469 in Wendt et al. 2018) sequence contains a deletion, but otherwise shares 100% sequence identity with four out of five LSU sequences retrieved from the H. rubiginosum DSM 106870 genome (Supplementary Information Fig. 7C).

Furthermore, we analyzed the four ITS and LSU copies of the H. rickii genome sequence in more detail, which displayed identity from 98.9 to 99.8%. Six additional ITS sequences were downloaded from GenBank, including two sequences acquired from the ex-type strain MUCL 53309 (acc. number KC968932 in Kuhnert et al. 2014, KY610416 in Wendt et al. 2018) from Martinique; the strain YMJ 25 (acc. number JQ009313 in Hsieh et al. 2005) from Mexico; the strain H19R (acc. number AJ390408 in Sánchez-Ballesteros et al. 2000) from Mexico; the strain “Hyporck” (acc. number MN490062, “Unpublished”) from Malaysia; and the strain CFE-152 (acc. number MN653266, “Unpublished”) from India (Supplementary Information Fig. 7A). Remarkably, five of these sequences exhibited 100% identity with the ITS-2 sequence obtained from the genome of H. rickii (Supplementary Information Table 1). However, strain CFE-152 (acc. number MN653266) from India displayed identities ranging from 93.7 to 94.3% against the sequences of H. rickii from the genome. A BLAST comparison against the ITS sequences deposited in GenBank revealed a closest hit with 95.99%, representing a Hypoxylon sp. (acc. number KR155059.1 and KR155047.1, “Unpublished”), followed by H. rickii strain MUCL 53309 with 95.13%, confirming the dubious identification of this strain. For the LSU region, the only available sequence in GenBank (acc. number KY610416 in Wendt et al. 2018) was retrieved; it shared 100% identity with the LSU 2 of the H. rickii genome sequence reported here.

Phylogenetic tree reconstruction

The analysis featured representatives for each molecularly well-established genus within the Hypoxylaceae, specifically Annulohypoxylon (2 strains), Daldinia (2 strains), Hypomontagnella (3 strains), Hypoxylon (29 strains), Jackrogersella (3 strains), Pyrenopolyporus (1 strain), and Parahypoxylon (2 strains). Additionally, one representative each of Xylariaceae and Lopadostomataceae (Xylaria hypoxylon and Creosphaeria sassafras, respectively) was included as outgroup.

In our phylogenetic analyses, 44 sequences (or 45 when considering TEF1) were employed. The final MAFFT alignment consisted of 4884 nucleotides for RPB2, 2351 nucleotides for TUB2, 2534 nucleotides for TEF1, and 1946 for ACT1. The alignment of each locus is available in the Supplementary Information Tables 4-7.

Among the individual protein-coding phylogenetic trees, the phylogram of RPB2 showed the highest support for the core clades of Hypoxylaceae, consisting of the genera Annulohypoxylon, Daldinia, Hypomontagnella, Hypoxylon, Jackrogersella, Pyrenopolyporus, and Parahypoxylon (PP/BS support of 1/100), as well as for the outgroup (Fig. 1). Similar results were obtained for the combined rooted phylogenetic tree (Fig. 2).

Fig. 1

Molecular phylogenetic maximum likelihood (lnL = −67265.272) tree inferred from the whole RPB2 sequence using IQ-TREE. Support values were calculated following Bayesian inference methodology and from 1000 bootstrap replicates. Bayesian posterior probability scores ≥ 0.95/bootstrap support values ≥ 70 are indicated along branches. Type material is highlighted in bold letters

Fig. 2

Molecular phylogenetic maximum likelihood (lnL = −148775.562) tree inferred from a multigene alignment featuring proteinogenic nucleotide sequences derived from whole ACT1, TUB2, TEF1, and RPB2 genes. Support values were calculated following Bayesian inference methodology and from 1000 bootstrap replicates. Bayesian posterior probability scores ≥ 0.95/bootstrap support values ≥ 70 are indicated along branches. Type material is highlighted in bold letters

In contrast, the trees inferred from the ACT1, TUB2, and TEF1 loci showed low to moderate support for principal clades (or groups) when compared to previously inferred phylogenies described for the family Hypoxylaceae by Wendt et al. (2018), Lambert et al. (2019), Becker et al. (2020), and Cedeño-Sanchez et al. (2023). The phylogenetic tree of each locus is available in the Supplementary Information Figs. 2-4.

Following the idea presented by Stadler et al. (2020), we inferred a phylogenetic tree for all extracted ITS rDNA sequences. The final data matrix comprised 145 sequences. Identical sequences were reduced to one unique representative per ITS copy to improve readability. The final MAFFT alignment consisted of 1482 nucleotides. The resulting phylogenetic tree displayed low to moderate support for the primary clades within Hypoxylaceae. Sequences extracted from the same strain consistently resolved in the same clade with high support (Supplementary Information Fig. 1).

Intragenomic polymorphisms—deep paralogues

The intragenomic polymorphisms recovered from two strains (Pa. papillatum ATCC 58729, Hyp. monticulosa MUCL 54604) were remarkably large; hence, we investigated whether they could be regarded as pseudogenes. In both strains, we detected a prominent haplotype (nine and three copies in the first and second strain, respectively) that fully matched the reported sequences for the particular strain obtained through Sanger sequencing. Additionally, we found minor haplotypes (always represented by a single copy) that exhibited a divergence of 89–92% compared to the major haplotype and displayed significantly lower GC content (Supplementary Information Table 2). Notably, the minor haplotypes of both strains exhibited highly divergent sequences in the conserved 5.8S rDNA when compared to all other sequences in the dataset. The alignment of 5.8S rDNA of all strains involved 165 positions and 26 unique haplotypes. Excluding the minor haplotypes of Pa. papillatum and Hyp. monticulosa, there were 20 variable positions each. When including Pa. papillatum minor haplotypes, an additional 20 variable positions were found. Similarly, incorporating the minor copy of Hyp. monticulosa resulted in an additional 31 positions, making a total of 51 variable positions. This deep divergence found in ITS is also reflected in the adjacent LSU sequences. Specifically, in Pa. papillatum, the minor ITS haplotypes (2nd, 3rd) continued with sequences that differed from the major copy by 11.5% (2nd, first 1000 bp used) and 12.6% (3rd, first 1000 bp used), and they differed from each other by 8.7%. For Hyp. monticulosa, the LSU haplotypes present in the genome differed by 1% (1000 bp used).
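The counting of variable positions and unique haplotypes in the 5.8S alignment can be sketched as follows. This is a minimal Python illustration and not the procedure used to obtain the numbers above; it assumes an alignment of equal-length sequences in which a gap is treated as a character state of its own. The function names and toy sequences are hypothetical.

```python
def variable_positions(aligned: list[str]) -> list[int]:
    """0-based indices of alignment columns that contain more than one state."""
    if len({len(s) for s in aligned}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    columns = zip(*(s.upper() for s in aligned))
    return [i for i, col in enumerate(columns) if len(set(col)) > 1]


def unique_haplotypes(aligned: list[str]) -> set[str]:
    """Distinct sequences in the alignment (case-insensitive)."""
    return {s.upper() for s in aligned}


# Toy data: three ordinary copies plus one strongly divergent minor haplotype.
core = ["ACGTAC", "ACGTAC", "ACGTAT"]
minor = "TCGAAC"

base_sites = variable_positions(core)
all_sites = variable_positions(core + [minor])
added_by_minor = sorted(set(all_sites) - set(base_sites))
print(len(base_sites), len(all_sites), added_by_minor)
print(len(unique_haplotypes(core + [minor])))
```

Comparing the site lists with and without a given haplotype shows how many variable positions that haplotype alone contributes, which is the comparison made for the minor haplotypes above.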

To investigate further deviations in rDNA, we studied hybridization of the proximal stem containing 5.8S and 28S (LSU) motifs. In the genome of Pa. papillatum ATCC 58729, the dominant (i.e., 1st) haplotype exhibited the typical hybridization of the proximal stem with a Gibbs free energy (∆G) of −17 and enthalpy (∆H) of −140.2 for the ensemble. Its structure is formed by an imperfect stem that harbors one free nucleotide on the 5.8S strand and one free nucleotide on the LSU strand, forming typical bulge loops. However, in the other two minor haplotypes, variations in ∆G and ∆H were detected when compared with the major sequence, and assembly of the typical proximal stem was highly modified (Fig. 3).

Fig. 3

A–E Hybridization model of the proximal stem region and F–I ITS2 secondary structure prediction in Hyp. monticulosa and Pa. papillatum. A Major haplotype (i.e., 1st haplotype) of Pa. papillatum, B minor haplotype (i.e., 2nd haplotype) of Pa. papillatum, C minor haplotype (i.e., 3rd haplotype) of Pa. papillatum, D major haplotype of Hyp. monticulosa, E minor haplotype of Hyp. monticulosa, F major haplotype (i.e., 1st haplotype) of Pa. papillatum, G minor haplotype (i.e., 2nd haplotype) of Pa. papillatum, H major haplotype of Hyp. monticulosa, and I minor haplotype of Hyp. monticulosa. The helices are indicated by Roman numerals (I–IV)

As an additional indicator of potential rDNA pseudogenes, we studied the secondary structure of ITS2 sequences. In almost all haplotypes, the standard pattern with four helices (I–IV) was observed, with helix III being the longest (Fig. 3). However, for the minor 3rd haplotype of Pa. papillatum ATCC 58729, the prediction failed due to the absence of similar models in the database (see http://its2.bioapps.biozentrum.uni-wuerzburg.de/). On the other hand, direct modeling of the minor 2nd Pa. papillatum haplotype resulted in a structure with a much higher free energy (−28.49 kcal/mol) compared to the major haplotype (−44.78 kcal/mol). In Hyp. monticulosa, both haplotype models exhibited specific free energy values (Supplementary Information Table 2).

Discussion

Recent genome sequencing projects showed that relying solely on rDNA markers for species identification and phylogenetic analysis can lead to problematic species delimitations and incorrect identifications, especially in species complexes, due to the presence of intragenomic polymorphisms (Stadler et al. 2020; Paloi et al. 2022; Bradshaw et al. 2023). To further explore this notion, we strove to obtain high-quality genomes for rDNA analysis, combining 2nd generation sequencing technologies such as Illumina with 3rd generation sequencing technologies like Oxford Nanopore, along with extensive use of bioinformatic tools, which significantly increased the accuracy of the genome assembly (Stadler et al. 2020; Paloi et al. 2022; Hoang et al. 2022).

The results obtained here agree well with the previous results of Stadler et al. (2020), who revealed high intragenomic polymorphisms in a pilot study featuring a smaller subset of strains accommodated within the Hypoxylaceae. Additional sequence analysis revealed the presence of deep rDNA paralogs. Intragenomic variation in the rDNA cistron can likely be traced back to nucleotide deletions, insertions, and substitutions within the genome (Bradshaw et al. 2023; Paloi et al. 2022).

Although the number of LSU copies did not differ significantly in most of the studied strains, probably due to the rDNA cistron being arranged in tandem throughout the genome (Torres-Machorro et al. 2010), considerable variations were observed in genomes derived from, e.g., J. cohaerens, Pa. papillatum, H. addis, H. dussii, H. guilanense, H. sporistriatatunicum, Hyp. monticulosa, Py. hunteri, and X. hypoxylon, where more copies of ITS were found compared to LSU, and in some cases, only a partial LSU was recovered. Additionally, we were not able to retrieve ITS sequences from H. rubiginosum MUCL 52887 and found only one copy in Hyp. submonticulosa. The quality of third-generation sequencing technologies and their implications for studying phenomena such as polymorphisms of the rDNA cistron, including their multiplicity (Bradshaw et al. 2023), are widely discussed for different groups of fungi (Paloi et al. 2022; Stadler et al. 2020). The apparent absence of ITS sequences in a genome sequence, or at least the failure to detect them for technical reasons, is not an isolated case, as was recently discussed by Bradshaw et al. (2023). There, the authors were unable to locate the ITS sequences for a quarter of all taxa evaluated and found only a single ITS copy for half of the taxa studied.

The study by Stadler et al. (2020) reported polymorphisms in the ITS region for the species H. lienhwacheense, H. rickii, Hyp. monticulosa, and Py. hunteri. In our analysis, apart from Py. hunteri, all these species also exhibited such variations in the LSU region. On the other hand, the genomes of A. truncatum, D. concentrica, H. pulicicidum, and J. multiformis were found to possess polymorphisms in the LSU region but not in ITS (Table 2). Overall, for the species studied here, the ITS rDNA showed a lower rate of polymorphisms than the LSU region. We expected the opposite, taking into consideration that the ITS region (~ 400–900 bp) contains two highly variable spacers (ITS1 and ITS2) and a well-conserved small non-coding RNA (5.8S). Usually, the LSU, with ~ 3000–5000 bp, is highly conserved within species because of the crucial function it plays (ribosome function and protein synthesis; cf. Gregory et al. 2019), and many variations in its sequence could disrupt these processes.

After in-depth analyses of sequences retrieved from the genome of H. rickii, we concluded that only one of the four sections (where the ITS and LSU can be located in the genome) was amplified using Sanger sequencing methods (Supplementary Information Fig. 6) in the past. This is interesting because the target sites of commonly used primers (ITS1, ITS4, ITS5) located in the genome did not diverge, and thus all sections should be amplified by PCR stochastically. We encountered a similar phenomenon when studying the Py. hunteri genome sequence, where only one of the five sections can be found in the literature (Supplementary Information Fig. 5). For both regions (ITS and LSU), however, one of the sequences retrieved from the genome presents mismatches with the primers ITS1 and ITS5, which offers an explanation for why this region is not amplified. A second explanation would be a highly condensed rDNA cistron, with the consequence that these sections are not actively transcribed when the cell needs to produce ribosomes for protein synthesis.

Phylogenetic analysis of the ITS showed low to medium support for the main clades of the Hypoxylaceae reported in previous studies (Wendt et al. 2018; Lambert et al. 2019; Becker et al. 2020; Cedeño-Sanchez et al. 2023). This result is congruent with older phylogenetic studies relying solely on ITS rDNA data but does not reflect taxonomic advances of the last decade (compare with Sánchez-Ballesteros et al. 2000; Hsieh et al. 2005). Strikingly, sequences from the same strain consistently formed well-supported clades, confirming the reliability of our results. This result suggests that ITS can be used to estimate taxonomic affinities towards the different species complexes described for the Hypoxylaceae in a “quick-and-dirty” fashion. As LSU shows a similar pattern of intragenomic polymorphism, we would consider it risky to apply it as a complementary barcoding marker to ITS for identification to species level. In contrast, all the protein-coding regions studied here clearly displayed better phylogenetic resolution and hence can be regarded as much better suited for barcoding of fungi, as they are highly conserved and do not display intragenomic polymorphisms, at least in the strains studied here.

Of note, we retrieved TEF1 data from all genomes investigated here, marking the first inclusion of this locus in a Xylariales phylogeny. This locus has been well-established for other taxonomic groups, such as Amphisphaeriaceae, Cainiaceae, Cladosporiaceae, Clypeosphaeriaceae, Diatrypaceae, and Hyponectriaceae (Jaklitsch and Voglmayr 2012; Dai et al. 2014; Vicente et al. 2021; Samarakoon et al. 2022). However, applying this locus alone did not show any advantage in direct comparison to the others, at least for Hypoxylaceae. Other retrieved genes (TUB2 and RPB2) have already been successfully applied to infer well-resolved phylogenies (Hsieh et al. 2005; Wendt et al. 2018; Lambert et al. 2019; Becker et al. 2020; Cedeño-Sanchez et al. 2023). Comparing the results obtained in the study of Hsieh et al. (2005) with a partial ACT1 and in this study using the whole ACT1 gene, it is clear that the ACT1 gene alone is insufficient for resolving the relationships among species in the Hypoxylaceae.

Our investigation revealed high intragenomic polymorphism of rDNA in distinct strains within the Hypoxylaceae, indicating the presence of paralogs. Specifically, we found multiplets of rDNA sequences in Hyp. monticulosa and Pa. papillatum genomes, which we propose to call deep rDNA paralogs. They uniformly exhibited a significantly lower GC content compared to the major copy and showed highly variable 5.8S rDNA sequences, which were otherwise found to be highly conserved across species. Some of the paralogs retained the necessary motifs for maintaining the required secondary structure of ITS2, while two paralogs either exhibited a thermodynamically less stable structure (i.e., with a higher free energy) or could not be modeled at all for technical reasons (Supplementary Information Table 2). Specifically, in two minor haplotypes of Pa. papillatum, conserved motifs in 5.8S and LSU sequences were disrupted, resulting in a corresponding proximal stem structure with a highly atypical form. Low GC content and mutations in otherwise well-conserved sequence segments are typical signs of a pseudogene (see Kolařík and Vohník 2018 and Stadler et al. 2020 for review). In eukaryotes, the hybridized 5.8S and LSU rRNA parts, forming so-called proximal stems, have a free nucleotide on each side with approximately six base pairs in between. The structural pattern of this proximal stem is necessary for successful detection by its associated processing machinery (see Keller et al. 2009) and has been proposed as a diagnostic character for pseudogene detection (Harpke and Peterson 2007, 2008). In general, the 3rd ITS haplotype of Hyp. monticulosa deviates the most, as it exhibits the lowest GC content, an ITS2 secondary structure that could not be predicted, and a highly disrupted proximal stem structure. Furthermore, the LSU sequence adjacent to this haplotype shows the highest observed divergence (11.5%) from the major haplotype.
To conclude, it is highly likely that the captured deep paralogs represent pseudogenes (at least in the case of Hyp. monticulosa) or sequences in an advanced stage of pseudogene formation. Variations in rDNA identity spotted for the other explored genomes can be explained with nucleotide deletions, insertions, and substitutions in the genome.

Lastly, the major conclusion of this study is that the rDNA cistron alone is insufficient as a universal barcode marker for fungi (Paloi et al. 2022; Bradshaw et al. 2023), especially in the Hypoxylaceae (Stadler et al. 2020). We propose to evaluate the TUB2 gene as a new primary barcoding marker for Hypoxylaceae as a substitute, as ample reference sequences have already been obtained for Hypoxylon and many other genera of Xylariales in the past. A phylogeny based on RPB2 sequences resolved the hitherto accepted topology for Hypoxylaceae and helped to further stabilize the phylogeny, but the number of available sequences derived from type and reliably identified vouchers is far lower than in the case of TUB2. The suitability of TUB2 for many taxa needs to be examined by the inclusion of additional vouchers belonging to the same species (complex) to assess interspecific variability. The reason is that there are no molecular data at all, or only a single sequence dataset, available for more than 50% of the described Hypoxylaceae taxa. Nevertheless, it has already been shown for the few Hypoxylon taxa of which multiple specimens were sequenced that the TUB2 locus shows little variability, e.g., in H. fragiforme, H. rubiginosum (according to data from GenBank arising from multiple independent studies), and H. fuscum (cf. Lambert et al. 2021). Our study contributes to a better understanding of the genetic diversity and evolutionary dynamics within Xylariales and emphasizes the need for holistic approaches, such as multi-gene sequencing, in fungal barcoding endeavors and phylogenetic analyses.