Skip to main content
Advertisement
  • Loading metrics

Genomic survey maps differences in the molecular complement of vesicle formation machinery between Giardia intestinalis assemblages

  • Shweta V. Pipaliya,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Division of Infectious Diseases, Department of Medicine, University of Alberta, Edmonton, Canada

  • Joel B. Dacks,

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing

    Affiliations Division of Infectious Diseases, Department of Medicine, University of Alberta, Edmonton, Canada, Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice [Budweis], Czech Republic

  • Matthew A. Croxen

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Writing – review & editing

    mcroxen@ualberta.ca

    Affiliations Division of Diagnostic and Applied Microbiology, Department of Lab Medicine and Pathology, University of Alberta, Edmonton, Canada, Li Ka Shing Institute of Virology, University of Alberta, Edmonton, Alberta, Canada, Women and Children’s Health Research Institute, University of Alberta, Edmonton, Alberta, Canada, Alberta Precision Laboratories, Alberta Public Health Laboratory, Edmonton, Alberta, Canada

Abstract

Giardia intestinalis is a globally important microbial pathogen with considerable public health, agricultural, and economic burden. Genome sequencing and comparative analyses have elucidated G. intestinalis to be a taxonomically diverse species consisting of at least eight different sub-types (assemblages A-H) that can infect a great variety of animal hosts, including humans. The best studied of these are assemblages A and B which have a broad host range and have zoonotic transmissibility towards humans where clinical Giardiasis can range from asymptomatic to diarrheal disease. Epidemiological surveys as well as previous molecular investigations have pointed towards critical genomic level differences within numerous molecular pathways and families of parasite virulence factors within assemblage A and B isolates. In this study, we explored the necessary machinery for the formation of vesicles and cargo transport in 89 Canadian isolates of assemblage A and B G. intestinalis. Considerable variability within the molecular complement of the endolysosomal ESCRT protein machinery, adaptor coat protein complexes, and ARF regulatory system have previously been reported. Here, we confirm inter-assemblage, but find no intra-assemblage variation within the trafficking systems examined. This variation includes losses of subunits belonging to the ESCRTIII as well as novel lineage specific duplications in components of the COPII machinery, ARF1, and ARFGEF families (BIG and CYTH). Since differences in disease manifestation between assemblages A and B have been controversially reported, our findings may well have clinical implications and even taxonomic, as the membrane trafficking system underpin parasite survival, pathogenesis, and propagation.

Author summary

Giardia intestinalis, a globally significant microbial pathogen, displays taxonomic diversity, with at least eight sub-types (assemblages A-H) capable of infecting various hosts, including humans. This study examined 89 Canadian genomes of assemblage A and B G. intestinalis to explore vesicle formation and cargo transport machinery using comparative genomics. We found considerable variability in the molecular components of endolysosomal ESCRT protein machinery, adaptor coat protein complexes, and the ARF regulatory system. Notably, differences existed between assemblages A and B but not within the same assemblage in the examined trafficking systems, including the loss of ESCRTIII subunits and lineage-specific duplications in COPII machinery, ARF1, and ARFGEF families. These findings could have clinical and taxonomic implications, as the membrane trafficking system is critical for parasite survival, pathogenesis, and propagation.

Introduction

Giardia intestinalis is a protist pathogen responsible for the globally occurring water and food-borne diarrheal disease, Giardiasis. The World Health Organization estimates roughly 280 million cases, with greater incidence of disease occurring in resource poor regions with limited access to clean drinking water and sanitation infrastructure [1]. Ingestion of Giardia cysts contaminating food and water leads to establishment of gut infection. Chronic and recurring infections in children have developmental consequences and can extend beyond the acute intestinal disease.

Although morphologically identical, multi-locus sequence genotyping has sub-divided G. intestinalis into eight distinct assemblages (i.e., genotypes), A through H, with broad animal host tropism, including humans [2]. Of these, A and B have the greatest host range and are the predominant assemblages that infect humans to cause human giardiasis. Despite similarities in their ability to cause human disease, phylogenetic analyses place assemblage A closely related to the cattle-infecting assemblage E and cat assemblage F. Meanwhile, assemblage B is closer in relation to assemblage G, which infects murine hosts such as mice and rats [3].

Advances in genome sequencing technologies have resulted in an abundance of new genomic data from numerous human and animal-infecting Giardia isolates for comparative analyses. Of the human-infecting isolates, assemblage A, isolate WB was the first available genome and lent tremendous insights into molecular-level losses and sequence divergence in this parasite compared to other eukaryotes [4]. Since then, other assemblage A isolates, such as AS175 and AS98, as well as assemblage B isolates, GS, GS_B, and BAH15c1, have also been sequenced and made available [58]. Recent advances in long-read sequencing technologies have also allowed cost-efficient improvements of these previous assemblies [810]. Other animal-infecting isolates from dog assemblages, C and D, and cattle assemblage E have also been sequenced and assembled, which have shed light on the molecular differences between the various Giardia assemblages [11,12]. Investigating these additional animal isolates is necessary due to their implications on potential for anthropozoonotic transmission between humans and domesticated animals [13].

Fundamental biological and genetic differences have been enumerated between the two human-infecting strains. Although structurally otherwise identical, various cytogenetic differences are also present, such as the number of chromosomes per nuclei and their sizes [14,15]. Although generally lower compared to other polyploid eukaryotes, genetic allelic heterozygosity (ASH) comparisons between assemblages A and B have shown variances in the levels between the two. Assemblage A isolate WB is markedly lower in overall ASH (<0.01%) compared to assemblage B isolate, GS, which has a much higher genomic allelic divergence (0.5%) [16].

From a clinical standpoint, differences in human Giardiasis outcomes caused due to assemblages A and B are also variable and often conflicting. These reports have been documented through several cross-sectional clinical studies conducted across numerous countries (e.g., United Kingdom, Scotland, Sweden, United States, Saudi Arabia, Egypt) to understand if specific strains are attributable to a higher probability of developing symptomatic or chronic disease outcomes [1721]. While the precise impact of molecular-level attributes and variances within Giardia assemblage biology on the manifestation of symptoms and disease remains uncertain, gaining insight into the underlying parasite biology and genomic content of these assemblages is undeniably valuable for a more comprehensive understanding of this discourse.

The increased availability of genome sequencing data from numerous assemblage A and B isolates previously enabled large-scale and multi-family comparative genomic investigation and has brought forward growing evidence that suggests considerable inter-assemblage variability at the genetic level. Much of this has been found within multiple Giardia-specific virulence gene families, such as the cysteine-rich variant-specific surface proteins (VSPs) expressed by the trophozoites during a gut infection for parasite antigenic variation in order to modulate and evade the host immune cells and other high-cysteine proteins (HC) [22]. Recent genome-wide surveys have highlighted that isolates belonging to assemblage B encode upwards of 340 different VSPs compared to their assemblage A counterparts, which have smaller VSP repertoires (i.e., 136 in WB)[9,23]. Most notably, the assemblages show high proportions of assemblage-specific VSP and HCs [9].

Differences in the other encoded protein machinery, particularly those belonging to the membrane trafficking system pathways, are also evident. Previous comparative genomics and phylogenetic investigations performed by us and others showed the complement and evolution of the ESCRT proteins, vesicle coats, and the ARF regulatory system proteins. Within each of these systems, consistent patterns of inter-assemblage variabilities were identified that would have substantial implications on the molecular intricacies underpinning the processes occurring at the parasite endomembrane organelles. Given the small number of isolates from each of the two human-infecting assemblages included in these studies, a deeper investigation at a population level is necessary to confirm the initial findings of intra-assemblage differences and to assess what variability exists, if any within assemblages, i.e. at the intra-assemblage level.

Large-scale genomic data from assemblage A and B isolates are therefore essential to evaluate these trends. Recently, the British Columbia Centre for Disease Control Public Health Laboratory (BCCDC PHL) sequenced 89 G. intestinalis isolates belonging to either assemblage A or B [24]. Archived isolates were collected from fecal samples, surface waters, and beavers during periods of sporadic and regional Giardiasis outbreaks in across the province of British Columbia (Canada) between the years 1989 and 1995 as well as some from Hamilton, Ontario (Canada) [25]. A combination of PCR and genome sequencing classified these outbreak-associated Giardia isolates within assemblage AI, AII, and B [24,25]. In this study we leveraged the short-read sequencing data generated by the BCCDC PHL by first assembling the Giardia genomes, followed by a population-scale survey of the molecular complement with three important families of vesicle formation machinery, the ESCRT complexes, vesicle coats and adaptor proteins, and the ARF regulatory system proteins in these large collection of Giardia isolates [24].

Materials and methods

Paired-end read information and data retrieval

De novo genome assembly analyses with the BCCDC PHL isolates of G. intestinalis assemblage A and B were performed using the previously sequenced and archived paired-end inserts that were generated using the Illumina MiSeq. Cysts belonging to each isolate had been previously obtained from surface water, beaver, and human fecal samples and archived by the BCCDC PHL [24,25]. Detailed geographical sources for isolate retrieval and assemblage classification were provided through Tables 1 and S1 in the previous study by Tsui et al [24].

Raw sequencing reads are available at National Centre for Biotechnology Information (NCBI) Short Read Archive under the BioProject ID PRJNA280606. We downloaded the sequencing data from the European Nucleotide Archive (The European Molecular Biology Laboratory, Heidelberg, Germany). Metadata detailing sequencing run statistics and biosample accessions for the 89 isolates is provided through the S1 Table as well as in the previous study [24].

Read quality assessment, taxonomic classification, and de-contamination

Prior to assembly, the overall quality of the paired-end inserts was assessed using FastQC [26]. All datasets were inspected for non-Giardia contaminating sequences. To do so, paired-end reads belonging to each biosample were subject to taxonomic classification using the metagenomic classification software, Kraken2 (v. 2.0.8-beta), and against a custom-built database for k-mer matching against query sequences [27]. The database was built by retrieving taxonomic information and full-length genomes corresponding to archaeal, bacterial, viral, human, fungal, protozoa, and vector contaminant (UniVec_Core) datasets from the NCBI Reference Sequence Database [RefSeq]. In addition, genomes belonging to Giardia muris (GCA_006247105.1 UU_GM_1.1), G. intestinalis EP15 (GCA_000182665.1), G. intestinalis BAH15c1 (GCA_001543975.1), G. intestinalis GS (GCA_011634595.1), G. intestinalis GS_B (GCA_000498735.1), G. intestinalis WB (GCA_011634545.1), G. intestinalis AS175 (GCA_001493575.1), G. intestinalis ADH (GCA_000498715.1), G. intestinalis assemblage C pooled cysts (GCA_902221545.1), and G. intestinalis assemblage D pooled cysts (GCA_902221535.1) were also added to the library using the kraken2-build option. Sequences that remained unclassified or were assigned to taxa other than the Giardia genus (NCBI taxid: 5741) were considered contaminants. Datasets were flagged if 20% or higher number of the total reads were not characterized as Giardia. These bio-samples were removed from analyses and not subject to downstream assembly process or comparative genomics. For the remainder datasets with low contamination levels, decontamination was performed using the Python script, extract_kraken_reads.py, available through KrakenTools (https://github.com/jenniferlu717/KrakenTools) [27]. Post-decontamination Kraken2 output was visualized using KronaTools [28]. Kraken-build shell script with all parameters has been made available as S1 File. Per-sample read taxonomic assignment is summarized in S2 Table.

de novo genome assembly

Kraken2 analyses flagged three datasets to be highly contaminated with bacterial sequences and three with mixed assemblage taxonomic assignment. These six samples were not analyzed further. The remaining 83 paired-end filtered datasets were kept for de novo genome assembly using the ploidy-aware assembler MaSuRCA v.3.3.0, using 200,000,000 hashes from Jellyfish v.3.2.4 to generate contigs and scaffolds [10-13Mb] in FASTA format. MaSuRCA configuration script has been made available as S2 File [29].

Genome assembly completeness and contiguity evaluations

The resulting nucleotide assemblies were subject to post-assembly quality assessments by computing genome contiguity and completeness statistics. Contiguity was assessed using the N50 metric with the Quality Assessment Tool for Genome Assemblies (QUAST) v. 5.0.2, which also determined other contig statistics such as total length in base pairs and the final percent GC for each assembly [30]. This detailed report is made available through S3 Table.

In addition to genome contiguity, to assess gene set completeness and report on the number of evolutionarily conserved near-universal single-copy orthologs, Benchmarking Universal Single-Copy Orthologs (BUSCO) v. 4.1.1 was used as a tool for a translated Hidden Markov Model-based searching against OrthoDB v.10 [31,32]. Run parameters for BUSCO consisted of the following: lineage dataset selection was eukaryote_odb10, analysis mode set to ’genome,’ and TBLASTN e-value cut-off was kept as the default 1x10-3. All other BUSCO parameters, including those for the in-built AUGUSTUS v. 3.2.3 gene prediction software, were kept to default. All BUSCO output scores were plotted and visualized using the ggplot2 library available through the R Tidyverse package [33]. Data visualization specifications were provided through an R script, which is available as S3 File [34].

Reference- mapped gene predictions and functional annotation

The nucleotide contigs from all isolates were used to predict gene features and annotations to screen for protein orthologs. To do so, Liftoff v.1.5.1 was used for annotation mapping from closely related reference genomes [35].

Of the currently available assemblage A and B genomes, those belonging to G. intestinalis assemblage A, isolate WB (UU_20 assembly, GCA_011634545.1) and assemblage B, isolate GS (GL50801 assembly, GCA_011634595.1) have been richly annotated and publicly available [5,8]. Therefore, UU_20 and GL50801 assemblies were used as references for gene prediction in the A and B assemblage genomes, respectively. Input and reference genomes were first indexed and aligned using Minimap2 v. 2.17 using the following parameters: -a -eqx–end-bonus 5 -N 50 -p 0.5 [36]. Genes were successfully aligned if the coverage along the coding sequence (CDS) met a threshold of 50% or higher in sequence similarity. They were used to generate output GFF files detailing coordinates, accessions of the mapped ORFs, and accompanying gene features. Unmapped gene accessions were parsed out in a separate file. Program usage parameters were specified as per Liftoff instructions to generate corresponding GFF files and the per-sample Liftoff mapped and unmapped summaries have been made available through S4 Table.

Genome output files were used for presence/absence analyses of the vesicle formation machinery in isolates of Giardia assemblage A and B, previously curated through comprehensive comparative genomics and phylogenetics in preceding investigations [3740]. Specifically, WB and GS accessions corresponding to subunits of the giardial ESCRT machinery, adaptins, retromer, COPII, COPI, clathrin, and the ARF regulatory system proteins, were searched into the newly generated GFF files. To consolidate these preliminary hits, assign orthology to any unmapped genes, and cross-identify and validate assemblage-specific orthologs/paralogs, TBLASTN (v. 2.2.29+) searches, with an e-value threshold set to 0.01, were also performed [41]. Orthologs of the identified vesicle formation machinery from G. intestinalis ADH, WB, GS, and GS_B were used as queries for homology searching into the contigs and scaffolds of all newly assembled genomes. Results of this survey are summarized as tile-plots, but specific scaffold locations from the TBLASTN hits have been made available through S5 Table.

Using these identified ORFs, we extracted the nucleotide sequences and performed their translation using Bedtools getfasta (S4 File). Both nucleotide as well as translated amino acid coding regions from each trafficking protein analyzed are summarized in S6 Table.

Results

Genome contiguity and completeness analyses yield uniform results across all isolates

Previously, Tsui and colleagues performed genome sequencing and assembly with 89 G. intestinalis assemblage A and B isolates archived at the BCCDC PHL to evaluate specific disease transmission dynamics underpinning the 1980s Giardiasis outbreak in British Columbia, with some of the isolates originating from New Zealand and Ontario to be used as reference [24]. Although the sequenced short paired-end reads from this analysis are publicly available, the assembled genomes are not. Therefore, it was necessary to perform de novo genome assembly prior to comparative genomic analyses with the vesicle formation machinery, as per the summarized workflow (Fig 1). To do so, read taxonomic classification was performed first to assess contamination and remove sequences that did not correspond to Giardia (Fig 1). Of the 89 initially retrieved datasets, six were removed from final genome analyses, either due to considerable levels of bacterial contamination or if the samples were characterized as ‘mixed’ assemblages. In the latter case, according to Tsui et al. [24], a mixed assemblage assignment was denoted if the sequencing reads mapped equally to both Giardia assemblage A and B. This does not imply existence of a genetically hybrid strain but is instead a by-product of gDNA contamination or its isolation from mixed cultures containing trophozoites belonging to both strains. Therefore, these data were treated as metagenomic. Because this investigation aimed to elucidate nuanced molecular-level differences between the two assemblages, samples with ‘mixed’ assemblage classification were also removed from analyses. This resulted in a final total of 83 datasets being used for downstream analyses, 42 of which were classified as assemblage A and 41 as assemblage B (S5 and S6 Tables).

thumbnail
Fig 1. Workflow for de novo assembly and reference-based gene predictions.

Steps included read-quality assessment and filtering using Kraken2 and FastQC, followed by de novo genome assembly using MaSuRCA. Assembled contigs and scaffolds were used to determine contiguity and completeness using metrics such as N50, the total number of contigs, and BUSCO scores. The resulting contigs were used for reference-based genome annotation using Liftoff for orthology mapping of genes and presence/absence analyses of the vesicle formation proteins. False negatives and existing orthology assignments were cross-validated using TBLASTN.

https://doi.org/10.1371/journal.pntd.0011837.g001

N50 scores and contig sizes for all assemblies were determined using QUAST. The results of this analysis calculated an average N50 of 70,543 bp and a median of 74,269 bp across all assemblies (S1 and S3 Tables). Genomes were also assembled in an average of 382 contigs that were ≥ 1kb in size (S1 and S3 Tables). In addition to genome contiguity, completeness was also assessed by determining the number of evolutionarily conserved single-copy orthologs of eukaryotic proteins by generating BUSCO scores where overall the percent BUSCO of complete plus fragmented genes was approximately 25 to 30% across all assemblies (Figs 1 and S1). While these are much lower than what is typically expected of a eukaryotic genome from animal, plant, or fungal lineages (i.e., 85–95%), low BUSCO scores in Giardia are a drawback of limited inclusion of diverse protist lineages within the eukaryote_odb10 database. In general, metamonad lineages are highly divergent in their sequences and ancestrally lack many of the proteins present in model eukaryotes, and therefore all suffer from poor BUSCO scores [4244]. Instead of using these as absolute measures, BUSCO scores were evaluated against those published for other Giardia assemblies. The short-read reference genome of assemblage A, isolate WB (GCA_011634545.1) was previously determined to have a BUSCO score of C:23.9%(S:0%, D:23.9%), F:5.9%, M:70.2%, n = 255, while the short-read reference genome from assemblage B, isolate GS has a BUSCO completeness score of 23.1% (https://www.uniprot.org/proteomes/UP000001548; https://www.uniprot.org/proteomes/UP000002488) [4,6]. Therefore, our re-assembled Canadian Giardia genomes are comparable, if not slightly more complete, in their BUSCO scores to those of the previously published short-read reference genomes belonging to assemblage A isolate WB and assemblage B isolate GS.

Overall GC content and genome sizes are comparable to other isolates of assemblages A and B

Aside from genome completeness and contiguity, the percent GC content [%GC] and genome size metrics for these new Giardia assemblies were also determined.

Of the 83 isolates assembled and kept for analyses, 42 belonged to assemblage AI or AII, and 41 to assemblage B. Previous genomic investigations with assemblage AI and AII isolates (i.e., DH, WB, and AS175) were approximated to be 48% GC rich [4,5,45]. A similar trend was observed across the 42 BC G. intestinalis assemblage A isolates, where the %GC across all isolates ranged between 47.8 and 48.9 (Fig 2B and S3 Table). A few outliers had slightly higher scores (i.e., 49 to 49.96%), but overall, mean and median %GC for assemblage A were 48.5% and 48.36%, respectively (Fig 2C and S3 Table). Similar to assemblage A, previously investigated assemblage B isolates also had %GC ranging between 47 and 49 (i.e., GS, GS_B, and BAH15c1) [57]. The results from this study are identical to those values, wherein the 41 BC B isolates had 47 to 49%GC, with a mean of 48.7% and a median of 48.9% (Fig 2A and 2C and S3 Table).

thumbnail
Fig 2. Percent GC content comparisons between assemblage A and B BCCDC PHL genome assemblies.

[A] depicts %GC for the 41 isolates belonging to assemblage B. These ranged between a minimum of 47.7% and a maximum of 49.3%. [B] depicts the %GC content for assemblage A isolates, which, like assemblage B, ranged between 47.8 and 49.7. [C] is a box-plot depiction of the percent ranges and calculated means for GC content for assemblage A and B isolates, where a similar overall % GC was present for both [i.e., ca. 48%].

https://doi.org/10.1371/journal.pntd.0011837.g002

Genome sizes were also evaluated and compared against the previously published Giardia genomes. In the literature, inter-assemblage variances within the overall genome sizes of assemblage A and assemblage B isolates have been noted, wherein assemblage AI and AII isolates (i.e., WB, DH, AS175, AS98, ISS17, and ZX15) generally ranged between 10.2 Mbp and 11.7 Mbp [46,9,45]. In contrast, assemblage B isolates (i.e., GS, GS_B, and GS/M clone H7) were comparatively larger, ranging between ca. 11 Mbp to 13 Mbp [46,9,45]. This trend was also consistent in this investigation. Apart from one outlier, the genome size range for all assemblage A isolates ranged between ca. 10.6 Mbp and 11.9 Mbp, with an average of 11.0 Mbp and a median of 10.9 Mbp (Fig 3B and 3C). Assemblage B isolates, on the other, were larger and ranged between 10.8 Mbp and 13.7 Mbp, with an average of 12 Mbp and a median of 11.9 Mbp (Fig 3A and 3C). The reference isolate GS has a genome size of 12 Mbp, whereas GS_B is 13 Mbp in size. Therefore, assemblies from this study are similar in size compared to the previously published isolates from the same assemblage (Fig 3A and 3C).

thumbnail
Fig 3. Genome size comparisons between assemblage A and B BCCDC PHL genome assemblies.

[A] depicts genome sizes for the 41 isolates belonging to assemblage B. Genome sizes are relatively uniform across all isolates and range between 10.9 Mbp and 13.7 Mbp. [B] represents genome sizes for the 42 isolates belonging to assemblage A which range between 10.6 Mbp and 12.3 Mbp. [C] is a box-plot depiction of the range and calculated mean of the genome sizes for all isolates belonging to each assemblage. Overall, we observe assemblage A genomes to be comparatively smaller in size than assemblage B genomes.

https://doi.org/10.1371/journal.pntd.0011837.g003

The GC content and genome size’s collective examination yielded identical values as the previously published pan-global isolates. These findings additionally validate assembly correctness and corroborate the previously observed trends at a greater population level. Although the two assemblages are similar in their overall %GC content, isolates belonging to assemblage B are consistently larger than A.

Gene predictions and functional annotations suggests close similarity to reference genomes

Past analyses have identified some differences in membrane-trafficking machinery complement between assemblages A and B [3941]. However, these trends were based on a limited number of samples (three to five) per assemblage, and so a much larger sampling is needed to confirm the results. Moreover, the question of whether there are intra-assemblage differences within A or B vesicle coat machinery (Adaptins, COPI, and Retromer), COPII subcomplex, ESCRT protein complexes, and the ARF regulatory protein machinery has not been assessed at a population-level and as a result, survey of these proteins was performed with these 83 Canadian genomes to comprehend the trends previously observed in the repertoire of the endo-lysosomal vesicle formation machinery.

To do so, we first used Liftoff to map gene features and annotations from closely related species of organisms by providing the genomes from assemblage A, isolate WB (UU_20 assembly; GCA_011634545.1) and assemblage B, isolate GS (GL50801 assembly; GCA_011634595.1) as references for mapping. Many of the annotations have been curated based on experimental evidence (i.e., molecular protein characterizations, transcriptomics, and proteomics) and are regularly updated by the Giardia research community. To supplement and cross-validate Liftoff annotations, TBLASTN analyses were also performed for genes of interest to eliminate false-negative absence artifacts due to improper or missed gene mapping.

As per recent estimations, the newly updated genome of G. intestinalis WB (UU_20 assembly) is predicted to encode 4,963 protein-coding genes and 85 non-coding RNA (ncRNAs) genes, yielding a total of 5,048 genes [8]. From this total, Liftoff mapped an average of 4,688 genes from the BC assemblage A genomes, with approximately 72 remaining unmapped (S4 Table). Similarly, albeit slightly lower, the total number of protein-coding and ncRNAs in the assemblage B reference GS genome (GL50801 assembly) is estimated at 4,470 and 92, respectively, resulting in a total of 4,562 predicted genes. An average of 4,481 genes were mapped across the BC assemblage B genomes, with 62 remaining unmapped (S4 Table). Unmapped genes may have resulted from sequence divergence or fragmented orthologs that did not share enough similarity with the target for a Minimap2 cut-off of 50%. The other possibility is that a subset of these unmapped genes are protein repertoires exclusive to the reference WB and GS genomes. Nonetheless, these findings suggest that for both assemblages A and B, approximately 98.2% of the genes and annotations from WB and GS genomes were successfully mapped, and therefore, encoded in the genomes of the newly assembled isolates.

ESCRT repertoire varies between assemblage A and B isolates but is consistent within assemblages

Previously, comparative genomic and phylogenetic analyses with the late-endosomal ESCRT subcomplexes were performed by us to trace their evolution across fornicates, including several human-infecting isolates of Giardia [39]. Here patterns of universal streamlining within ESCRTs were observed across the entire Giardia genus, such as the complete loss of the ESCRTI subcomplex [39]. However, aside from absences, Giardia-specific duplication events also occurred to yield several paralogs in the following subunits: ESCRTII-Vps36 (i.e., Vps36A, B, and C), ESCRTIII-Vps24 (i.e., Vps24A and Vps24B), ESCRTIIIA-Vps4 (i.e., Vps4A, B, and C), ESCRTIIIA-Vps46 (i.e., Vps46A and B), and ESCRTIIIA-Vps31 (i.e., Vps31A, B, and C) [39]. Comparisons between various pan-global isolates revealed distinct differences within the repertoire of the individual subunits and the number of paralogs that comprise the giardial ESCRTII, ESCRTIII, and ESCRTIIIA. To better elucidate these inter-assemblage differences within ESCRTs and to assess if these trends exist at a population level, the repertoire of the giardial ESCRT machinery in the newly assembled BC Giardia genomes was reconstructed. This was done by surveying the Liftoff annotations combined with TBLASTN searches into the nucleotide contigs and scaffolds. Overall, our findings confirm and extend past results of ESCRT retention and loss within and between assemblages.

All 42 assemblage A isolates are conserved in their Giardia repertoire of the ESCRT complexes without any variabilities at the individual isolate level (Fig 4A). Contrary to this, previously noted streamlining exists in several ESCRT machinery components in assemblage B isolates (Fig 4B). The most prominent of these was the loss of Vps20L. The survey of 41 new assemblage B isolates confirms that this crucial ESCRTIII component is universally absent across all sampled Giardia assemblage B genomes (Fig 4B). TBLASTN searches using Vps20L orthologs identified in pan-global assemblage A isolates (i.e., DH, WB, and AS175) were used as queries to rule out false-negative artifacts. Despite this approach, no protein hits resembling Vps20L orthology were identified. Similar trends were observed within one of the three Vps4 paralogs (i.e., Vps4C), which also remained unidentified in the pan-global isolates of assemblage B. Using both Liftoff and TBLASTN, only Vps4A and Vps4B were retrieved across all 41 isolates (Fig 4B). Therefore, the absence of Vps20L and Vps4 points towards a global loss of these components in this assemblage.

thumbnail
Fig 4. Tile-plot depictions of the ESCRT repertoire in the BCCDC PHL isolates.

[A] depicts the giardial ESCRT repertoire distribution in the newly assembled BCCDC PHL isolates compared to the two pan-global reference assemblage A isolates, WB [AI] and DH [AII]. No absences were identified within any assemblage A isolates. [B] shows the distribution of the giardial ESCRT repertoire in the newly assembled BCCDC PHL assemblage B isolates compared to the two pan-global reference assemblage B isolates, GS and GS_B. Assemblage-specific losses were identified within the ESCRTIII-Vps20L and ESCRTIII-Vps4C, and are consistent with the findings in the pan-global isolates, indicated in grey.

https://doi.org/10.1371/journal.pntd.0011837.g004

Altogether, these findings validate previous observations of molecular differences within the endo-lysosomal ESCRT subunits between the two assemblages and allow for an extension of those conclusions at a greater population level.

Vesicle coat complexes follow a pattern of inter-assemblage molecular differences

As with the ESCRTs, the evolution and the molecular complement of the heterotetrameric complexes (adaptins and COPI) has been addressed [38]. To consolidate those findings as well as to identify any other possible isolate-level differences, the Giardia complement of heterotetrameric adaptor complexes (HTACs) from 83 BCCDC assemblage A and B genomes were determined.

The giardial repertoire of adaptins, COPI, retromer, and clathrin heavy chain were conserved from an inter-assemblage standpoint (Fig 5). However, isolate-level variabilities were present within the adaptins.

thumbnail
Fig 5. Tile-plot depictions of COPII and retromer repertoire in the BCCDC PHL isolates.

[A] depicts the distribution of the giardial COPII and retromer components in the newly assembled genomes of the BCCDC PHL isolates, compared to the two pan-global reference assemblage A isolates, WB [AI] and DH [AII]. No absences were identified within any of the assemblage A genomes. [B] depicts the distribution of the giardial COPII and retromer repertoire in the newly assembled BCCDC PHL assemblage B genomes and compared with the two pan-global reference assemblage B isolates, GS and GS_B. Assemblage B-specific losses are evident within one of the paralogs of COPII-Sec24FII [Sec24C].

https://doi.org/10.1371/journal.pntd.0011837.g005

Unexpectedly, two instances of gene absences within components of the AP-1 subcomplex in genomes of assemblage B were noted. First was the lack of AP-1μ in the SRR1957167 assembly and the second absence was within the AP-1σ subunit in the VANC/96/UBC/129 assembly (Fig 6B). TBLASTN searching using AP-1μ and AP-1σ orthologs from other assemblage B genomes could not identify μ1 or σ1 subunits in these genomes other than those belonging to the AP-2 sub-complex. The absence of AP-1μ and AP-1σ, although surprising, is highly unlikely and should be ascertained as a technical assembly or sequencing-related issue. AP-1μ is absent from the SRR1957167 assembly belonging to assemblage B, which is an outlier as it has a comparatively smaller genome size (10.9 Mbp) and lower N50 (44, 795 bp) with its counterparts. Both attributes are likely a result of low starting sequencing depth compared to the other isolates and hence a cause for gene absence artifacts. In the case of AP-1σ, which is missing from the VANC/96/UBC/129 assembly, may have been due to low coverage k-mers in this region and therefore discarded during the assembly process. These are more likely scenarios than instances of actual loss, as the remaining 39 isolates from assemblage B encode both adaptin subunits.

thumbnail
Fig 6. Tile-plot depictions of heterotetrameric adaptor complexes and clathrin repertoire in the BCCDC PHL isolates.

[A] depicts the distribution of the previously identified HTACs and clathrin components in the newly assembled BCCDC PHL isolates and compared with the two pan-global reference assemblage A isolates, WB [AI] and DH [AII]. No absences were identified in any assemblage A genomes. [B] depicts the distribution of adaptin, COPI, and clathrin components in the newly assembled BCCDC PHL assemblage B genomes in comparison to the two pan-global reference assemblage B isolates, GS and GS_B. Although no large assemblage-wide losses are evident, instances of individual isolate-specific absences within AP-1 subunits [i.e., AP-1μ and AP-1σ] were present.

https://doi.org/10.1371/journal.pntd.0011837.g006

The molecular complement of COPII, retromer, and clathrin have not been examined at the same depth and as recently as the above systems and therefore were examined here. For nearly all components we found the identical repertoire across Retromer, Clathrin, and most of COPII. Unexpectedly, we find multiple paralogs of the COPII component Sec24, which we have termed Sec24A, B, and C across all 41 assemblage A genomes (Fig 5). However, COPII-Sec24C was not found in any of the assemblage B genomes, despite searching by TBLASTN analyses using assemblage A Sec24C sequences (Fig 5B).

Paralogues of the ARF regulatory system proteins continue to differ between the two assemblages

Finally, although ancestral losses shaped the fornicate ARF regulatory system, surprisingly tight conservations within the overall complement of ARF1, ARF GAPs, and ARF GEFs exists in Giardia and its free-living relatives [40]. Despite this, there remained critical differences in the encoded proteins of the two assemblages. Namely, paralogs of the fornicate-specific ARF1F proteins and Sec7 domain-containing ARF GEFs differed [40].

In this previous survey, of the three fornicate-specific ARF paralogues traced in G. intestinalis, assemblage A possessed all three ARF1 variants (i.e., ARF1FA, ARF1FB1, and ARF1B2), identified, whereas a secondary loss within one of the two ARF1FB paralogs (i.e., ARF1FB2) was identified in the 41 assemblage B examined [40]. Extending this to the 83 genomes, this was confirmed, whereby all new assemblage A genomes possessed the three paralogs, while ARF1FB1 was universally absent across all 41 assemblage B genomes (Fig 7B). Unlike the previously discussed systems where absences were strictly observed in assemblage B, that was not the case with the ARF regulatory system proteins. Assemblage A has been reported to lack one of the two BIGL proteins, while assemblage B did not possess one of the two Cytohesin identified in assemblages A and E [40]. Once again, the population-level survey confirms the absence of one of the two BIGL proteins from assemblage A and CYTHL from assemblage B (Fig 7), but no intra-assemblage variability.

thumbnail
Fig 7. Tile-plot depictions of the giardial ARF regulatory system protein repertoire in the BCCDC PHL isolates.

[A] depicts the distribution of the previously identified giardial ARF, ARF GEF, and ARF GAP repertoire in newly assembled BCCDC PHL Giardia genomes compared to the two pan-global reference assemblage A isolates, WB [AI] and DH [AII]. In the previous survey, isolate WB was shown to lack both paralogues of BIG [Pipaliya et. al., 2021]. However, loss within only one of the two BIGL paralogues is apparent in the new BCCDC assemblage A isolates. [B] depicts the distribution of the giardial ARF regulatory system proteins in the newly assembled BCCDC PHL assemblage B isolates in comparison to the two pan-global reference assemblage B isolates, GS and GS_B. Assemblage-wide losses within ARF1FB2 and one of the two CYTHL paralogs are also evident and consistent with the previous results.

https://doi.org/10.1371/journal.pntd.0011837.g007

Discussion

Past molecular evolutionary work has identified distinct differences within subunits of the trafficking complexes encoded by the two human-infecting G. intestinalis assemblages. Because the previous sampling was limited to few genomes, a population-scale investigation to better define the heterogeneity within these protein complements was necessary. The assembly of 83 full-length genomes belonging to Canadian isolates of G. intestinalis assemblage A and B permitted a large-scale comparative analysis of the vesicle formation machinery.

Genome size differences corroborates observed inter-assemblage cytogenic differences

Because it was assumed that the two human-infecting assemblages would be genetically distinct strains, an impartial examination into the differences between the trafficking complement and other genomic attributes (i.e., genome size and GC content) could have only been made appropriately through a de novo approach. Therefore, a de novo, instead of reference-aligned assembly, approach was pursued for genome assembly to limit reference bias and account for possible structural and sequence variants. Using MaSuRCA, we assembled a roughly equal number of genomes belonging to both assemblage A and B isolates (i.e., 42 assemblage A and 41 assemblage B genomes) for comparative assessment of %GC and genome sizes between the isolates analyzed in this study alongside publicly available reference genomes of WB and GS.

Although we did not observe major variability within the %GC content between assemblages or individual isolates, inter-assemblage differences exist within the overall genome size. The Canadian assemblage A isolates from this investigation ranged between 10.6 to 12.3 Mbp in size similar to other public assemblage A isolates (i.e., 10.3 to 12.08 Mbp) [4,5,46]. In contrast, Assemblage B genomes from our study (i.e., 11 to 13.7 Mbp) and those publicly available (i.e., 10.4 to 13.6 Mbp) were comparatively larger. This could be corroborated by pulsed-field gel electrophoresis studies which have allowed for the separation and comparative assessment of molecular weights. Analyses of the five Giardia chromosomes (Chr) belonging to isolate WB and GS revealed distinct differences where WB encodes Chrs that are 1.5, 2, 3, and 3.5 Mbp in size while GS has a slightly larger Chr1 (1.8 Mbp) due to a recombination event with Chr2 [47,48]. Optical mapping between these five chromosomes has also identified considerable structural rearrangements, inversions, and translocations [5]. Notably, even within the syntenic regions, only 70% sequence similarity was present between GS and WB. Comparisons between the number of protein-coding genes were also dissimilar, where ortholog overlap analyses identified 2962 heterologous open reading frames (ORFs) in assemblage B isolate GS_B. In contrast, assemblage A isolates, DH and WB, were reduced in this number by half, where only 1935 and 1067 unique ORFs were identified, respectively [5].

Potential impacts of missing vesicle formation machinery on secretory processes between the two assemblages

Previous functional work has determined the role of the vesicle formation machinery within Giardia to be critical to the parasite’s ability to facilitate secretory and uptake processes in trophozoites and encysting cells [39,4954]. Therefore, divergences in its molecular complement could implicate crucial functional variabilities within giardial trafficking processes at these organism-specific compartments.

Within ESCRTs, variations in ESCRTIII-Vps20L and ESCRTIIIA-Vps4 were identified. Cellular localization investigations by us and by previous others in the field have lent unparalleled insights into the promiscuity of ESCRTs at different giardial endomembrane compartments. Giardia ESCRTIII-Vps20L was primarily localized to the ER in conjunction with numerous ESCRT subunits [39]. While the specific dynamics of inter-ESCRT association by sequential recruitment onto various organellar membranes are currently unknown in Giardia, some conserved aspects such as cargo recognition and membrane remodelling must still exist at those different compartments. Therefore, the absence of critical subunits such as Vps20L in assemblage B isolates implies possible functional compensation mechanisms to fulfill those same roles or that there is divergence in the underlying pathway wherein Vps20L is simply not necessary in this assemblage. While the former scenario may very well be the case for the missing paralogs such as Vps4C, which may be having redundant cellular roles as other Vps4 proteins at the peripheral vacuoles in assemblage A isolates, the latter scenario could signify crucial biological differences. Differences in paralog functions and molecular association of the giardial trafficking machinery, including the ESCRTs (i.e., Vps4 and Vps46), have been hinted at previously and therefore is a plausible scenario, but one that still requires robust molecular and biochemical testing [55]. Nonetheless, these results posit a potentially differentiated underlying mechanism by which the existing ESCRT subunits function within the Giardia trophozoites of the two assemblages.

Like the ESCRTs, although most vesicle coats (i.e., adaptins, COPI, and retromer) are conserved in their complement between the two assemblages, differences were noted within the COPII components specifically in the number of lineage-specific paralogs of Sec24. Giardial COPII machinery is fundamental to ESV biogenesis and maturation to ensure the transport of cyst-wall material to the parasite surface [49,56]. Sec24, along with Sec23, forms pre-budding complexes upon cyst-wall protein recognition for a COPII-coat assembly at the ER-exit sites [49,57]. In the absence of one of the three Sec24 paralogues, it could be that the remaining two Sec24 proteins may be fulfilling this role in assemblage B isolates and that plasticity within this system exists to accommodate losses such as this one. A contrary scenario may also be possible, wherein absence in Sec24 paralogs could indicate differences in COPII-assembly dynamics. All fornicates lack Sec16, which typically stabilizes Sec23/Sec24 and Sec13/Sec31 coats during the vesicle budding process. Therefore, ancestral neofunctionalization of these lineage-specific paralogs may have occurred to compensate for missing Sec16 roles. A secondary loss of this protein in Giardia assemblage B could then have implications on the molecular dynamics of how the COPII-coat is stabilized in those parasites, which in turn could translate to altered ESV-biogenesis mechanisms or cyst-wall material trafficking. These postulations provide hypotheses for testing in encystation experiments in assemblage B models versus assemblage A.

Finally, differences in the ARF regulatory system proteins, especially the ARF1F GTPases, again may indicate functional redundancy of these paralogues in assemblage B at the PECs. Differences in the GEF complement may indicate variable regulation of the giardial ARFs. While testing these scenarios through fluorescent microscopy and proteomics experiments in the trophozoite stages of assemblage B isolates would be interesting, it is challenging to generate and establish parasite cultures of transgenic variants of assemblage B isolates, as they are slow growing and not amenable to stable or episomal integration of plasmids for recombination protein expression. Once technical advancements are made in this area of Giardia biology, these experiments should be pursued as they will open avenues to elucidate novel ARF biology and GTPase regulation mechanisms in Giardia and eukaryotes in general.

During revisions of this manuscript, Klotz and colleagues released 5 assemblage B genomes [9]. To determine whether the more recent and comprehensive iterations of the five publicly available assemblage B genomes (P387, P424, P458, P344, and P427) encode the absent paralogs of ESCRTIII (Vps20L, Vps4C), COPII (Sec24C), and ARF (ARF1B2, CYTHL) family proteins identified in assemblage A, we conducted both forward tBLASTn and reverse BLASTx searches on the nucleotide scaffolds from the Klotz et al study [9]. Our searches failed to retrieve correct orthologs of these proteins in reverse BLASTx, consistent with the findings in this study. It is important to note that only scaffolds from this study are currently available, therefore, we refrain from making definitive conclusions of absences which can only be made once these similarity searches at the protein-level are performed if protein predictions of these genomes become available in the future.

Giardia as a species-complex

Most tantalizingly, our findings support previously reported genomic and molecular complement-level differences at the population level. The definition of what constitutes ’species’ remains a hotly debated philosophical discussion within the fields of ecology, evolution, and medicine. These lines are especially blurred in microbiology. Historically, the biological species concept has defined two organisms to belong to the same species if, through sexual reproduction, they can produce viable progeny [58]. There is limited evidence for sexual reproduction or inter-assemblage genetic exchange in Giardia. Therefore, their species status under the strict umbrella of the biological species concept remains contested. On the other hand, the phylogenetic species concept defines species as a group of organisms with shared and unique evolutionary histories that possess a defined set of traits [59]. Phylogenetic placement using multi-locus sequence genotyping places G. intestinalis assemblages A and B as paraphyletic lineages nested within other animal-infecting assemblages [3,60,61]. This implies that zoonoses for human pathogenesis occurred convergently. Combined with these phylogenetic classifications, differences in genome attributes and encoded protein repertoires strengthen the notion that G. intestinalis assemblage A and B (and all other assemblages) can be defined as different species as per the phylogenetic-species concept. Additionally, previous genome comparisons estimate only 77% nucleotide-level sequence conservation and 78% amino-acid similarity in the protein-coding regions between assemblages A and B isolates [6]. Genome recombination analyses performed on assemblage A, B, and E isolates further supported this notion [62]. Therefore, from evolutionary, molecular, and genomic standpoints, the combined evidence further support the idea that G. intestinalis assemblages as separate species, as has been proposed by many other Giardia genomic and molecular biologists [9,6366].

We must acknowledge a notable limitation in our study in that our current approach, reliant on Liftoff for genome annotation using the WB and GS references, is constrained in its ability to perform de novo annotation and comprehensive pan-genome analyses, encompassing both core and accessory genes. Recent frameworks towards a pan-genome analysis have been introduced by Seabolt and colleagues which will be a valuable way forward for the Giardia genomic community in performing similar comparative analyses [65]. This alongside increased availability of contiguous and more polished genomes as reference assemblies will be valuable towards understanding molecular variances between different species and substrains of Giardia with an increased level of sensitivity and specificity [9]. Nonethless, our findings are that there exist limited but notable differences between the two human-infecting Giardia assemblages, but little to no variability within the Canadian isolates that we examined. These genomic differences, and unity respectively, point to potential cell biological explanations of membrane-trafficking and bolster the argument that these may in fact be cryptic species. Regardless, the 83 genome assemblies will provide a wealth of new genomic data for biologists to explore.

Supporting information

S1 File. Parameters for building Kraken2 database in order to perform taxonomic assessment of the sequenced Giardia paired-end reads.

https://doi.org/10.1371/journal.pntd.0011837.s001

(TXT)

S2 File. Parameters used for building contigs and scaffolds for each genome assembly using MaSuRCA and Illumina paired-end reads.

https://doi.org/10.1371/journal.pntd.0011837.s002

(TXT)

S3 File. RScript used to generate bar plots for BUSCO assessment score visualisation shown in S1 Fig.

https://doi.org/10.1371/journal.pntd.0011837.s003

(TXT)

S4 File. Bash script use to extract nucleotide and amino acid sequences detailed in S6 Table for open reading frames analysed in this study using GFF and FASTA files from each assembled Giardia genome.

https://doi.org/10.1371/journal.pntd.0011837.s004

(TXT)

S1 Table. Metadata summary of Giardia isolates collected and sequencing statistics collected by Tsui et al.

[24].

https://doi.org/10.1371/journal.pntd.0011837.s005

(CSV)

S2 Table. Kraken2 reports summarising percent of sequenced Giardia reads mapped at the species level.

Tables were used to assess which assemblies were deemed as highly contaminated as detailed in S1 Table.

https://doi.org/10.1371/journal.pntd.0011837.s006

(XLSX)

S3 Table. QUAST statistics for MaSuRCA assembly generated in this study.

https://doi.org/10.1371/journal.pntd.0011837.s007

(CSV)

S4 Table. Liftoff gene prediction statistics for MaSuRCA genome assemblies generated in this study.

https://doi.org/10.1371/journal.pntd.0011837.s008

(XLSX)

S5 Table. Analyses of the ARF GEF proteins in the Canadian Giardia genomes based on Liftoff predictions.

https://doi.org/10.1371/journal.pntd.0011837.s009

(XLSX)

S6 Table. Detailed description of scaffold loci, reference protein and assemblage IDs, gene families, and coding sequences for all trafficking proteins analysed from the MaSuRCA assemblies generated in this study.

https://doi.org/10.1371/journal.pntd.0011837.s010

(CSV)

S1 Fig. BUSCO Assessment results.

Summary plot depicting BUSCO results for the 89 Canadian assemblies generated in this study. Based on the 255 single-copy orthologs from the eukaryotic ortholog database [version 10], the percent of complete and fragmented genes [blue and yellow portions of the bars] conserved across Giardia assemblage A and B ranged from 25 to 30% across all assemblies. Red portion of the bar represents missingness.

https://doi.org/10.1371/journal.pntd.0011837.s011

(TIF)

References

  1. 1. Esch KJ, Petersen CA. Transmission and Epidemiology of Zoonotic Protozoal Diseases of Companion Animals. Clin Microbiol Rev. 2013;26(1):58–85. pmid:23297259
  2. 2. Heyworth MF. Giardia duodenalis genetic assemblages and hosts. Parasite Paris Fr. 2016/03/16 ed. 2016;23:13–13. pmid:26984116
  3. 3. Cacciò SM, Beck R, Lalle M, Marinculic A, Pozio E. Multilocus genotyping of Giardia duodenalis reveals striking differences between assemblages A and B. Int J Parasitol. 2008;38(13):1523–31. pmid:18571176
  4. 4. Morrison HG, McArthur AG, Gillin FD, Aley SB, Adam RD, Olsen GJ, et al. Genomic Minimalism in the Early Diverging Intestinal Parasite Giardia lamblia. Science. 2007;317(5846):1921. pmid:17901334
  5. 5. Adam RD, Dahlstrom EW, Martens CA, Bruno DP, Barbian KD, Ricklefs SM, et al. Genome Sequencing of Giardia lamblia Genotypes A2 and B Isolates (DH and GS) and Comparative Analysis with the Genomes of Genotypes A1 and E (WB and Pig). Genome Biol Evol. 2013;5(12):2498–511. pmid:24307482
  6. 6. Franzén O, Jerlström-Hultqvist J, Castro E, Sherwood E, Ankarklev J, Reiner DS, et al. Draft Genome Sequencing of Giardia intestinalis Assemblage B Isolate GS: Is Human Giardiasis Caused by Two Different Species? Petri W, editor. PLoS Pathog. 2009;5(8):e1000560. pmid:19696920
  7. 7. Wielinga C, Thompson RCA, Monis P, Ryan U. Identification of polymorphic genes for use in assemblage B genotyping assays through comparative genomics of multiple assemblage B Giardia duodenalis isolates. Mol Biochem Parasitol. 2015;201(1):1–4. pmid:26003141
  8. 8. Xu F, Jex A, Svärd SG. A chromosome-scale reference genome for Giardia intestinalis WB. Sci Data. 2020;7(1):38. pmid:32019935
  9. 9. Klotz C, Schmid MW, Winter K, Ignatius R, Weisz F, Saghaug CS, et al. Highly contiguous genomes of human clinical isolates of Giardia duodenalis reveal assemblage- and sub-assemblage-specific presence–absence variation in protein-coding genes. Microb Genomics. 2023 Mar 28;9(3). pmid:36976254
  10. 10. Pollo SMJ, Reiling SJ, Wit J, Workentine ML, Guy RA, Batoff GW, et al. Benchmarking hybrid assemblies of Giardia and prediction of widespread intra-isolate structural variation. Parasit Vectors. 2020;13(1):108. pmid:32111234
  11. 11. Jerlström-Hultqvist J, Franzén O, Ankarklev J, Xu F, Nohýnková E, Andersson JO, et al. Genome analysis and comparative genomics of a Giardia intestinalis assemblage E isolate. BMC Genomics. 2010;11(1):543. pmid:20929575
  12. 12. Kooyman FNJ, Wagenaar JA, Zomer A. Whole-genome sequencing of dog-specific assemblages C and D of Giardia duodenalis from single and pooled cysts indicates host-associated genes. Microb Genomics. 2019;5(12):e000302. pmid:31821130
  13. 13. Fantinatti M, Bello AR, Fernandes O, Da-Cruz AM. Identification of Giardia lamblia Assemblage E in Humans Points to a New Anthropozoonotic Cycle. J Infect Dis. 2016;214(8):1256–9. pmid:27511898
  14. 14. Adam RD. Chromosome-size variation in Giardia lamblia: the role of rDNA repeats. Nucleic Acids Res. 1992;20(12):3057–61. pmid:1620602
  15. 15. Tůmová P, Hofštetrová K, Nohýnková E, Hovorka O, Král J. Cytogenetic evidence for diversity of two nuclei within a single diplomonad cell of Giardia. Chromosoma. 2007;116(1):65–78. pmid:17086421
  16. 16. Ankarklev J, Jerlström-Hultqvist J, Ringqvist E, Troell K, Svärd SG. Behind the smile: cell biology and disease mechanisms of Giardia species. Nat Rev Microbiol. 2010;8(6):413–22. pmid:20400969
  17. 17. Cacciò SM, Ryan U. Molecular epidemiology of giardiasis. Mol Biochem Parasitol. 2008;160(2):75–80. pmid:18501440
  18. 18. Ferguson LC, Smith-Palmer A, Alexander CL. An update on the incidence of human giardiasis in Scotland, 2011–2018. Parasit Vectors. 2020;13(1):291. pmid:32513243
  19. 19. Minetti C, Lamden K, Durband C, Cheesbrough J, Fox A, Wastling JM. Determination of Giardia duodenalis assemblages and multi-locus genotypes in patients with sporadic giardiasis from England. Parasit Vectors. 2015;8:444–444. pmid:26338670
  20. 20. Minetti C, Lamden K, Durband C, Cheesbrough J, Platt K, Charlett A, et al. Case-Control Study of Risk Factors for Sporadic Giardiasis and Parasite Assemblages in North West England. J Clin Microbiol. 2015 Oct;53(10):3133–40. pmid:26157151
  21. 21. Noussa R. Basha El, Zaki Mayssa M., Hassanin Omayma M., Rehan Mohamed K., Omran Dalia. Giardia Assemblages A and B in Diarrheic Patients: A Comparative Study in Egyptian Children and Adults. J Parasitol. 2016;102(1):69–74. pmid:26509291
  22. 22. Prucca CG, Lujan HD. Antigenic variation in Giardia lamblia. Cell Microbiol. 2009;11(12):1706–15. pmid:19709056
  23. 23. Rodríguez-Walker M, Molina CR, Luján LA, Saura A, Jerlström-Hultqvist J, Svärd SG, et al. Comprehensive characterization of Cysteine-rich protein-coding genes of Giardia lamblia and their role during antigenic variation. Genomics. 2022;114(5):110462. pmid:35998788
  24. 24. Tsui CKM, Miller R, Uyaguari-Diaz M, Tang P, Chauve C, Hsiao W, et al. Beaver Fever: Whole-Genome Characterization of Waterborne Outbreak and Sporadic Isolates To Study the Zoonotic Transmission of Giardiasis. mSphere. 2018;3(2):e00090–18. pmid:29695621
  25. 25. Natalie Prystajecky, Tsui Clement K.-M., Hsiao William W. L., Uyaguari-Diaz Miguel I., Jordan Ho, Patrick Tang, et al. Giardia spp. Are Commonly Found in Mixed Assemblages in Surface Water, as Revealed by Molecular and Whole-Genome Characterization. Appl Environ Microbiol. 2015;81(14):4827–34. pmid:25956776
  26. 26. Andrews S. FastQC: a quality control tool for high throughput sequence data [Internet]. 2010. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc
  27. 27. Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20(1):257. pmid:31779668
  28. 28. Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011;12(1):385.
  29. 29. Zimin AV, Marçais G, Puiu D, Roberts M, Salzberg SL, Yorke JA. The MaSuRCA genome assembler. Bioinformatics. 2013;29(21):2669–77. pmid:23990416
  30. 30. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinforma Oxf Engl. 2013;29(8):1072–5. pmid:23422339
  31. 31. Kriventseva EV, Kuznetsov D, Tegenfeldt F, Manni M, Dias R, Simão FA, et al. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 2019;47(D1):D807–11. pmid:30395283
  32. 32. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2. pmid:26059717
  33. 33. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, et al. Welcome to the Tidyverse. Open Source Software. 2019;4(43):1686.
  34. 34. R Core Team. R: A Language and Environment for Statistical Computing [Internet]. R Foundation for Statistical Computing, Vienna, Austria; 2020. Available from: https://www.R-project.org/
  35. 35. Shumate A, Salzberg SL. Liftoff: accurate mapping of gene annotations. Bioinformatics. 2021;(btaa1016). Available from: https://doi.org/10.1093/bioinformatics/btaa1016 pmid:33320174
  36. 36. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100. pmid:29750242
  37. 37. Leung KF, Dacks JB, Field MC. Evolution of the Multivesicular Body ESCRT Machinery; Retention Across the Eukaryotic Lineage. Traffic. 2008;9(10):1698–716. pmid:18637903
  38. 38. Marti M, Regös A, Li Y, Schraner EM, Wild P, Müller N, et al. An Ancestral Secretory Apparatus in the Protozoan Parasite Giardia intestinalis. J Biol Chem. 2003;278(27):24837–48.
  39. 39. Pipaliya SV, Santos R, Salas-Leiva D, Balmer EA, Wirdnam CD, Roger AJ, et al. Unexpected organellar locations of ESCRT machinery in Giardia intestinalis and complex evolutionary dynamics spanning the transition to parasitism in the lineage Fornicata. BMC Evol Biol; 2021; 19(127) pmid:34446013
  40. 40. Pipaliya SV, Thompson LA, Dacks JB. The reduced ARF regulatory system in Giardia intestinalis pre-dates the transition to parasitism in the lineage Fornicata. Int J Parasitol. 2021; 51(10):825–839. pmid:33848497
  41. 41. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic Local Alignment Search Tool. 1990;8.
  42. 42. Karnkowska A, Treitli SC, Brzoň O, Novák L, Vacek V, Soukal P, et al. The Oxymonad Genome Displays Canonical Eukaryotic Complexity in the Absence of a Mitochondrion. Mol Biol Evol. 2019;36(10):2292–312. pmid:31387118
  43. 43. Salas-Leiva DE, Tromer EC, Curtis BA, Jerlström-Hultqvist J, Kolisko M, Yi Z, et al. A free-living protist that lacks canonical eukaryotic DNA replication and segregation systems. Nat Commun. 2021; 12(1):6003
  44. 44. Tanifuji G, Takabayashi S, Kume K, Takagi M, Nakayama T, Kamikawa R, et al. The draft genome of Kipferlia bialata reveals reductive genome evolution in fornicate parasites. PLOS ONE. 2018;13(3):e0194487. pmid:29590215
  45. 45. Ankarklev J, Franzén O, Peirasmaki D, Jerlström-Hultqvist J, Lebbad M, Andersson J, et al. Comparative genomic analyses of freshly isolated Giardia intestinalis assemblage A isolates. BMC Genomics. 2015;16(1):697. pmid:26370391
  46. 46. Weisz F, Lalle M, Nohynkova E, Sannella AR, Dluhošová J, Cacciò SM. Testing the impact of Whole Genome Amplification on genome comparison using the polyploid flagellated Giardia duodenalis as a model. Exp Parasitol. 2019;207:107776. pmid:31628895
  47. 47. Upcroft JA, Chen N, Upcroft P. Mapping variation in chromosome homologues of different Giardia strains. Mol Biochem Parasitol. 1996;76(1):135–43. pmid:8920002
  48. 48. Upcroft JA, Krauer KG, Upcroft P. Chromosome sequence maps of the Giardia lamblia assemblage A isolate WB. Trends Parasitol. 2010;26(10):484–91. pmid:20739222
  49. 49. Faso C, Konrad C, Schraner EM, Hehl AB. Export of cyst wall material and Golgi organelle neogenesis in Giardia lamblia depend on endoplasmic reticulum exit sites: ER exit sites in Giardia lamblia. Cell Microbiol. 2013;15(4):537–53.
  50. 50. Hehl AB, Marti M, Köhler P. Stage-Specific Expression and Targeting of Cyst Wall Protein–Green Fluorescent Protein Chimeras in Giardia. Mol Biol Cell. 2000;11(5):1789–800. pmid:10793152
  51. 51. Miras SL, Merino MC, Gottig N, Rópolo AS, Touz MC. The giardial VPS35 retromer subunit is necessary for multimeric complex assembly and interaction with the vacuolar protein sorting receptor. Biochim Biophys Acta BBA—Mol Cell Res. 2013;1833(12):2628–38. pmid:23810936
  52. 52. Rivero MR, Miras SL, Feliziani C, Zamponi N, Quiroga R, Hayes SF, et al. Vacuolar Protein Sorting Receptor in Giardia lamblia. PLOS ONE. 2012;7(8):e43712. pmid:22916299
  53. 53. Touz MC, Kulakova L, Nash TE. Adaptor Protein Complex 1 Mediates the Transport of Lysosomal Proteins from a Golgi-like Organelle to Peripheral Vacuoles in the Primitive Eukaryote Giardia lamblia. Mol Biol Cell. 2004;15(7):3053–60.
  54. 54. Zumthor JP, Cernikova L, Rout S, Kaech A, Faso C, Hehl AB. Static Clathrin Assemblies at the Peripheral Vacuole—Plasma Membrane Interface of the Parasitic Protozoan Giardia lamblia. Johnson PJ, editor. PLOS Pathog. 2016;12(7):e1005756. pmid:27438602
  55. 55. Saha N, Dutta S, Datta SP, Sarkar S. The minimal ESCRT machinery of Giardia lamblia has altered inter-subunit interactions within the ESCRT-II and ESCRT-III complexes. Eur J Cell Biol. 2018;97(1):44–62. pmid:29224850
  56. 56. Stefanic S, Morf L, Kulangara C, Regos A, Sonda S, Schraner E, et al. Neogenesis and maturation of transient Golgi-like cisternae in a simple eukaryote. J Cell Sci. 2009;122(16):2846–56.
  57. 57. Zamponi N, Zamponi E, Mayol GF, Lanfredi-Rangel A, Svärd SG, Touz MC. Endoplasmic reticulum is the sorting core facility in the Golgi-lacking protozoan Giardia lamblia. Traffic. 2017;18(9):604–21.
  58. 58. de Queiroz K. Ernst Mayr and the modern concept of species. Proc Natl Acad Sci. 2005;102(suppl 1):6600. pmid:15851674
  59. 59. de Queiroz K. Species Concepts and Species Delimitation. Syst Biol. 2007;56(6):879–86. pmid:18027281
  60. 60. Huey CS, Mahdy MAK, Al-Mekhlafi HM, Nasr NA, Lim YAL, Mahmud R, et al. Multilocus genotyping of Giardia duodenalis in Malaysia. Infect Genet Evol. 2013;17:269–76. pmid:23624189
  61. 61. Lee H, Jung B, Lim JS, Seo MG, Lee SH, Choi KH, et al. Multilocus genotyping of Giardia duodenalis from pigs in Korea. Parasitol Int. 2020;78:102154. pmid:32531468
  62. 62. Xu F, Jerlström-Hultqvist J, Andersson JO. Genome-Wide Analyses of Recombination Suggest That Giardia intestinalis Assemblages Represent Different Species. Mol Biol Evol. 2012;29(10):2895–8. pmid:22474166
  63. 63. Andrews RH, Adams M, Boreham PFL, Mayrhofer G, Melonis BP. Giardia intestinalis: Electrophoretic evidence fora species complex. 1989; 19(2):183–90
  64. 64. Cacciò SM, Lalle M, Svärd SG. Host specificity in the Giardia duodenalis species complex. Infect Genet Evol. 2018;66:335–45. pmid:29225147
  65. 65. Seabolt MH, Roellig DM, Konstantinidis KT. Genomic comparisons confirm Giardia duodenalis sub-assemblage AII as a unique species. Front Cell Infect Microbiol. 2022;12:1010244. pmid:36325462
  66. 66. Wielinga C, Williams A, Monis P, Thompson RCA. Proposed taxonomic revision of Giardia duodenalis. Infect Genet Evol. 2023;111:105430. pmid:36972861