====================
Data and genotypes
====================

We provide input pangenome VCFs for several datasets, built from high-quality, haplotype-resolved assemblies, that can be used as input to PanGenie. These files were used to produce genotyping results for the HGSVC and HPRC projects. Genotypes for 3,202 samples from the 1000 Genomes Project, produced from these VCFs, are also linked below.

**Note**: results produced by different versions of PanGenie are not directly comparable, since newer versions of PanGenie produce more accurate genotyping results.

----------------------
PanGenie v1.0.0
----------------------

.. csv-table::
   :header: "Dataset", "PanGenie input VCF", "Callset VCF", "1000G Genotypes (n=3,202)"
   :width: 100%
   :widths: auto
   :align: center

   "HGSVC-GRCh38 (freeze3, 64 haplotypes)", `bubble-VCF `_ , `callset-VCF `_, `1000G-VCF `_ (PanGenie v1.0.0)
   "HGSVC-GRCh38 (freeze4, 64 haplotypes)", `bubble-VCF `_ , `callset-VCF `_, `1000G-VCF `_ (PanGenie v1.0.0)
   "HPRC-GRCh38 (88 haplotypes)", `bubble-VCF `_, `callset-VCF `_ , `1000G-VCF `_ (PanGenie v1.0.0)

related publications:

| Ebert, P., Audano, P.A., Zhu, Q., Rodriguez-Martin, B., Porubsky, D., Bonder, M.J.,
| Sulovari, A., Ebler, J. et al.
| *Haplotype-resolved diverse human genomes and integrated analysis of structural variation*
| Science, 372(6537), 2021
| doi: ``_

| Liao W.-W., Asri M., Ebler J., Doerr D., Haukness M., Hickey G., Lu S., Lucas J. K.,
| Monlong J., Abel H. J., et al.
| *A draft human pangenome reference*
| Nature, 617(7960), 2023
| doi: ``_

----------------
PanGenie v2.1.1
----------------

.. csv-table::
   :header: "Dataset", "PanGenie input VCF", "Callset VCF", "1000G Genotypes (n=3,202)"
   :width: 100%
   :widths: auto
   :align: center

   "HPRC-CHM13 (88 haplotypes)", `bubble-VCF `_ , `callset-VCF `_, `1000G-VCF `_ (PanGenie v2.1.1)

---------------------
PanGenie v3.1.0
---------------------

.. csv-table::
   :header: "Dataset", "PanGenie input VCF", "Callset VCF", "1000G Genotypes (n=3,202)"
   :width: 100%
   :widths: auto
   :align: center

   "HGSVC3+HPRC-CHM13 (214 haplotypes)", `bubble-VCF `_, `callset-VCF `_, `1000G-VCF `_ (PanGenie v3.1.0)

related publication:

| Logsdon, G. A., Ebert, P., Audano, P. A., Loftus, M., Porubsky, D., Ebler, J., et al.
| *Complex genetic variation in nearly complete human genomes*
| Nature, 644(8076), 2025
| doi: ``_

-----------------
PanGenie v4.2.1
-----------------

.. csv-table::
   :header: "Dataset", "PanGenie input VCF", "Callset VCF", "1000G Genotypes (n=3,202)"
   :width: 100%
   :widths: auto
   :align: center

   "HPRC2-CHM13 (462 haplotypes)", `bubble-VCF `_, `callset-VCF `_ , `1000G-VCF `_ (PanGenie v4.2.1)
   "HPRC2-GRCh38 (462 haplotypes)", `bubble-VCF `_, `callset-VCF `_ , "not available"

In all cases, the bubble-VCFs provided in the second column were given as input to PanGenie. The callset-VCFs (third column) were used to convert the genotyped VCFs into a biallelic callset representation. The commands used are shown below::

    PanGenie-index -v -r -t -o
    PanGenie -f -i -s -j -t -o
    cat _genotyping.vcf | python3 convert-to-biallelic.py > _genotyping_biallelic.vcf

The script ``convert-to-biallelic.py`` can be found `here `_.
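
To illustrate what the conversion into a biallelic representation amounts to, the following is a minimal sketch and not the ``convert-to-biallelic.py`` script itself. It assumes that each ALT allele of a multi-allelic bubble record is annotated with a colon-separated list of variant IDs, with the lists for different ALT alleles separated by commas. The function names and the IDs ``var-A`` and ``var-B`` are hypothetical and chosen only for this example.

```python
def parse_id_field(value):
    """Split an ID annotation into one list of variant IDs per ALT allele.

    Assumed layout (an assumption of this sketch): commas separate ALT
    alleles, colons separate the variant IDs carried by one allele.
    """
    return [alt.split(":") for alt in value.split(",")]


def biallelic_genotypes(allele_ids, genotype):
    """Derive one biallelic genotype per variant ID from a bubble genotype.

    allele_ids: list of variant-ID lists, index 0 being the first ALT allele.
    genotype:   VCF-style genotype such as "1/2", "0|1" or "./.".
    """
    sep = "|" if "|" in genotype else "/"
    alleles = genotype.split(sep)
    all_ids = sorted({vid for ids in allele_ids for vid in ids})

    result = {}
    for vid in all_ids:
        calls = []
        for allele in alleles:
            if allele == ".":
                calls.append(".")  # missing call stays missing
            elif allele == "0":
                calls.append("0")  # reference allele carries no variant
            else:
                # ALT allele k carries the variant only if its ID list says so
                carried = vid in allele_ids[int(allele) - 1]
                calls.append("1" if carried else "0")
        result[vid] = sep.join(calls)
    return result


# Hypothetical bubble with two ALT alleles: the first carries var-A and
# var-B, the second carries only var-B.
ids = parse_id_field("var-A:var-B,var-B")
print(biallelic_genotypes(ids, "1/2"))
# {'var-A': '1/0', 'var-B': '1/1'}
```

In this sketch a sample genotyped ``1/2`` at the bubble is heterozygous for ``var-A`` and homozygous for ``var-B``, which is the per-variant view a biallelic callset provides.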