{"id":2564,"date":"2024-02-16T08:17:47","date_gmt":"2024-02-15T23:17:47","guid":{"rendered":"https:\/\/nagasakilab.csml.org\/en\/?page_id=2564"},"modified":"2025-03-18T13:03:39","modified_gmt":"2025-03-18T04:03:39","slug":"jogo-lilr_caller_v1","status":"publish","type":"page","link":"https:\/\/nagasakilab.csml.org\/en\/jogo-lilr_caller_v1","title":{"rendered":"JoGo-LILR Caller v1"},"content":{"rendered":"\n<p>JoGo-LILR Caller is software for calling haplotype patterns in the complex LILR region, especially the LILRB3-LILRA6 region, which shows various copy number (CN) patterns. JoGo-LILR Caller takes short-read sequencing data as input and outputs diploid CNs and probable haplotypes.<\/p>\n\n\n\n<p><strong>Download<\/strong><\/p>\n\n\n\n<p>The JoGo-LILR Caller package can be downloaded here: <strong><a href=\"https:\/\/nagasakilab.csml.org\/data\/JoGo-LILR_v1.tgz\">JoGo-LILR_v1.tgz<\/a><\/strong><\/p>\n\n\n\n<p><strong>Required Python3 Packages<\/strong><\/p>\n\n\n\n<p>numpy, pandas, scipy, plotly (for plotly.express), and kaleido<\/p>\n\n\n\n<p><strong>Supported Reference Assembly<\/strong><\/p>\n\n\n\n<p>First, align the short-read sequencing data to a GRCh38-based reference assembly. 
JoGo-LILR Caller was developed using hs38DH, the same reference assembly used in the 1000 Genomes Project (<a href=\"https:\/\/github.com\/igsr\/1000Genomes_data_indexes\/blob\/master\/data_collections\/1000_genomes_project\/README.1000genomes.GRCh38DH.alignment\">https:\/\/github.com\/igsr\/1000Genomes_data_indexes\/blob\/master\/data_collections\/1000_genomes_project\/README.1000genomes.GRCh38DH.alignment<\/a>).<\/p>\n\n\n\n<p>JoGo-LILR Caller should also work with other reference assemblies based on GRCh38 coordinates, but our team has not tested them extensively.<\/p>\n\n\n\n<p>hs38DH.fa can be downloaded from:<\/p>\n\n\n\n<p>FTP ftp:\/\/<a href=\"http:\/\/ftp.1000genomes.ebi.ac.uk\/vol1\/ftp\/technical\/reference\/GRCh38_reference_genome\/GRCh38_full_analysis_set_plus_decoy_hla.fa\">ftp.1000genomes.ebi.ac.uk\/vol1\/ftp\/technical\/reference\/GRCh38_reference_genome\/GRCh38_full_analysis_set_plus_decoy_hla.fa<\/a><\/p>\n\n\n\n<p>or<\/p>\n\n\n\n<p>HTTPS&nbsp;<\/p>\n\n\n\n<p>The same reference assembly is also included in the download package (JoGo-LILR_v1.tgz).<\/p>\n\n\n\n<p><strong>Step1 Alignment<\/strong><\/p>\n\n\n\n<p>If you already have a BAM or CRAM file aligned to GRCh38 coordinates, you can skip this step.<\/p>\n\n\n\n<p>Whole-genome short-read sequencing data are usually generated in paired-end mode and can be aligned with tools such as BWA or Bowtie2. 
JoGo-LILR Caller was developed using alignments produced by BWA, but alignments from other tools can also be used.<\/p>\n\n\n\n<p>For example, the paired-end FASTQ files test_R1.fq and test_R2.fq can be aligned with bwa mem as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code><mark style=\"background-color:#eeeeee\" class=\"has-inline-color\">bwa mem hs38DH.fa test_R1.fq test_R2.fq | samtools view -bS &gt; test.tmp.bam\n<\/mark>\n<mark style=\"background-color:#eeeeee\" class=\"has-inline-color\">samtools sort test.tmp.bam -o test.bam\n<\/mark>\n<mark style=\"background-color:#eeeeee\" class=\"has-inline-color\">samtools index test.bam<\/mark><\/code><\/pre>\n\n\n\n<p>This produces test.bam and its index, test.bam.bai.<\/p>\n\n\n\n<p>JoGo-LILR Caller also accepts CRAM files instead of BAM files.<\/p>\n\n\n\n<p><strong>Step2 JoGo-LILR Preprocessing<\/strong><\/p>\n\n\n\n<p>From the aligned CRAM or BAM file, JoGo-LILR Caller computes the coverage of four specific regions in LILRB3 and LILRA6. To correct for GC bias and other factors (e.g., the global depth of coverage), JoGo-LILR Caller uses CNVnator.<\/p>\n\n\n\n<p>The JoGo-LILR package includes a Singularity image containing a CNVnator build customized for JoGo-LILR, so you do not need to install CNVnator separately.<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>python3 .\/jogo-lilr-preprocess.py --id HG007 --bam input\/HG007.bam<\/code><\/pre>\n\n\n\n<p>For a CRAM file, you may need to specify the reference cache directory. For hs38DH.fa, the cache files are bundled in the package and can be used with the following environment settings. 
These settings are not required for BAM files.<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>export REF_CACHE=..\/input\/hs38DH\/cache\/%2s\/%2s\/%s\nexport REF_PATH=..\/input\/hs38DH\/cache\/%2s\/%2s\/%s<\/code><\/pre>\n\n\n\n<p><strong>Note<\/strong>: The message &#8216;Can&#8217;t find directory &#8216;bin_1000&#8217;\u2026&#8217; may be displayed when you run the Step2 command. It can be safely ignored.<\/p>\n\n\n\n<p><strong>Step3 JoGo-LILR Diploid Copy Number Calling<\/strong><\/p>\n\n\n\n<p>Based on the coverages of the four LILRB3 and LILRA6 regions computed in Step2, JoGo-LILR Caller jointly calls the diploid copy numbers of LILRB3 and LILRA6, using prior information from 3,204 HapMap samples by default.<\/p>\n\n\n\n<p>Running the Step2 example creates output\/result.HG007.for_caller.txt. With this preprocessing file as input, the diploid copy numbers of LILRB3 and LILRA6 can be called with the following command:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>python3 .\/jogo-lilr-caller.py --region_cnv output\/result.HG007.for_caller.txt --targetsamples HG007<\/code><\/pre>\n\n\n\n<p>This command creates the following files.<\/p>\n\n\n\n<p>test.diplod_stable.target.tsv:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>sampleid\tselected_cnv_type\tselected_cnv_type_distance\tX(LILRB3+LILRA6)\tY(LILRB3core\/LILRA6core)\tcluster_id\tgroup\tpopname\tgpopname\nHG007\tCN5_B2A3\t0.3639\t5.3328\t0.5194\t5\ttest\tglobal\tglobal<\/code><\/pre>\n\n\n\n<p>The selected_cnv_type column holds the called copy numbers of LILRB3 and LILRA6. For example, CN5_B2A3 means that the total copy number of LILRB3 and LILRA6 is 5. 
Of these, LILRB3 has 2 copies and LILRA6 has 3.<\/p>\n\n\n\n<p>test.diplod_stable.all.{pdf\/png\/html} is the clustering plot including the prior dataset (by default, 3,204 samples from HapMap).<\/p>\n\n\n\n<p>test.diplod_stable.target.{pdf\/png\/html} shows only the samples specified with --targetsamples, extracted from test.diplod_stable.all.{pdf\/png\/html}.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/hrC44yR2NsnUIjocxk6SXAiWngC0pKOl3KQ9MlKzLUziTu6YDFKVAcAB3xNoJcI3kKBEah815CnJ5pvP8mwB7xsIPSR48a-9Y_ttv95eX8Wd4isCHXcpJ98QCnG_DeCuXkyF6BSYPpZgafBbeSBmNF8\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/eMqg-2vyiS_VtTk9oYMwrjtY91ejmkSAaNW0Bo7szz-9SUkRzFKEEXhP6mbr4ER73Bu5182bmFuI7H7CkWy0Z7kKP74mlDwIMdIJNiUA4Di5G22C46lSBJUsHRGW_G6JsIqWi-RPfsFl-nHlBi6tRTE\" alt=\"\"\/><\/figure>\n\n\n\n<p><strong>Step4 JoGo-LILR Haploid Copy Number Calling<\/strong><\/p>\n\n\n\n<p>Based on the coverages of the four LILRB3 and LILRA6 regions computed in Step2, JoGo-LILR Caller can also jointly call the haploid copy numbers of LILRB3 and LILRA6, using prior information from 3,204 HapMap samples by default.<\/p>\n\n\n\n<p>Running the Step2 example creates output\/result.HG007.for_caller.txt. 
With this preprocessing file as input, the haploid copy numbers of LILRB3 and LILRA6 can be called with the following command:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>python3 .\/jogo-lilr-caller.py --region_cnv output\/result.HG007.for_caller.txt --targetsamples HG007 --distance_type haploid<\/code><\/pre>\n\n\n\n<p>This command creates the following files.<\/p>\n\n\n\n<p>output\/test.haploid.target.tsv:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>sampleid\tselected_cnv_type\tselected_cnv_type_distance\tX(LILRB3+LILRA6)\tY(LILRB3core\/LILRA6core)\tcluster_id\tgroup\tpopname\tgpopname\nHG007_0.1\tCN5_B1A1_B1A2\t0.3639\t5.3328\t0.5194\t5\ttest\tglobal\tglobal<\/code><\/pre>\n\n\n\n<p>The selected_cnv_type column holds the called copy numbers of LILRB3 and LILRA6. For example, CN5_B1A1_B1A2 means that the total copy number of LILRB3 and LILRA6 is 5. 
The two haplotypes are B1A1 (one copy of LILRB3 and one copy of LILRA6) and B1A2 (one copy of LILRB3 and two copies of LILRA6).<\/p>\n\n\n\n<p>test.haploid.all.{pdf\/png\/html} is the clustering plot including the prior dataset (by default, 3,204 samples from HapMap).<\/p>\n\n\n\n<p>test.haploid.target.{pdf\/png\/html} shows only the samples specified with --targetsamples, extracted from test.haploid.all.{pdf\/png\/html}.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/dX6uZKDhzo7ZIIbx9V4CcRRKk5GmmVOkn7ShXeoS5eWYKG6wK9744jS6FIeV0GT2GrcYejA5fM9y4LhNxPZ2lBEwPqcY_uttt5nVEOTbrrDJpswInDv1LfiPhhdpeAxjY-RYemhTXXYJ1y1dnDyt63E\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/ASAylhxPRmjxkwPD8uNuaHTikI9dLYNodKSYyOrWprQZ3QGkgpR_SDkuZbhBDpfl9vLBNCfEeSqTtgfoDzqKHDSrfrIMAPY68oO0fEHg7RldLQDxk4YOFEqobm6Xx54fQOqHL-BA43K7QujT511NTck\" alt=\"\"\/><\/figure>\n\n\n\n<p><strong>Step5 Multiple-Sample Joint Calling<\/strong><\/p>\n\n\n\n<p>In Step3 and Step4, jogo-lilr-caller.py calls only one sample. If you merge the Step2 preprocessing results of several samples into one file (keeping the header line), jogo-lilr-caller.py can call all of them at once. Specify the sample IDs, separated by commas, with --targetsamples:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>python3 .\/jogo-lilr-caller.py --region_cnv output\/result.HG007_and_HG008.for_caller.txt --targetsamples HG007,HG008 --distance_type haploid<\/code><\/pre>\n\n\n\n<p><strong>Step6 Calculate the probable haplotype pair of LILRB3 and LILRA6<\/strong><\/p>\n\n\n\n<p>Step4 lists all possible haplotype-pair patterns in output\/test.haploid.target.tsv (for the Step4 example).<\/p>\n\n\n\n<p>By default, calculate_allelic_probability.py reports the most probable haplotype pair. 
If multiple candidates exist, all possible haplotype pairs and their probabilities can be reported with the --all option.<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>python3 calculate_allelic_probability.py --input output\/test.haploid.target.tsv<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>sampleid\tselected_cnv_type\tselected_cnv_type_distance\tX(LILRB3+LILRA6)\tY(LILRB3core\/LILRA6core)\tcluster_id\tgroup\tpopname\tgpopname\tSampleCNType_bestguess\tSampleCNType_bestguess_probability\tnote\nHG007_0.1\tCN5_B1A1_B1A2\t0.3639\t5.3328\t0.5194\t5\ttest\tglobal\tglobal\tB1A1\/B1A2\t1.0<\/code><\/pre>\n\n\n\n<p>When multiple candidates exist, the output looks like this:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>NA21102\tCN4_B1A1_B1A1_CN4_B1A0_B1A2\t0.092\t3.9552\t0.9362\t3\ttest\tglobal\tglobal\tB1A1\/B1A1\t0.9868\tB1A0\/B1A2\t0.0132<\/code><\/pre>\n\n\n\n<p>In this case, the most probable pair is B1A1\/B1A1 (probability 98.68%), and the second candidate is B1A0\/B1A2 (1.32%).<\/p>\n\n\n\n<p><strong>Step7 Calculate the probable haplotype pair of LILRB3 and LILRA6 for trio datasets<\/strong><\/p>\n\n\n\n<p>For trio datasets, the most probable haplotype pairs can be selected from the Step4 haploid results so that they are consistent with Mendelian inheritance.<\/p>\n\n\n\n<p>In the following example, pack_pedigree_estimated_result.py packs the results for the father, mother, and child into one line, using the pedigree information given with --ped (same format as PLINK).<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>python3 pack_pedigree_estimated_result.py --input ..\/test\/paper.hapmap3202.allele_stable.tsv --ped ..\/test\/1kGP.3202_samples.pedigree_info.txt --output output\/hapmap3202_finalreport.estimated.by_pedigree.all.tsv<\/code><\/pre>\n\n\n\n<p>The 
pedigree_performance_check.py script checks whether the calls for the father, mother, and child are consistent with Mendelian inheritance.<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>python3 pedigree_performance_check.py --input output\/hapmap3202_finalreport.estimated.by_pedigree.all.tsv --output output\/hapmap3202_finalreport.estimated.by_pedigree.all.withQC.tsv<\/code><\/pre>\n\n\n\n<p>Lines containing NON_PEDIGREE are then removed so that only pedigree samples remain:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>grep -v NON_PEDIGREE output\/hapmap3202_finalreport.estimated.by_pedigree.all.withQC.tsv &gt; output\/hapmap3202_finalreport.estimated.by_pedigree.all.withQC.onlyped.tsv<\/code><\/pre>\n\n\n\n<p>Finally, calculate_trio_probability.py selects the most probable pair among multiple candidate pairs based on the Mendelian rule.<\/p>\n\n\n\n<pre class=\"wp-block-code has-background has-small-font-size\" style=\"background-color:#ededed\"><code>python3 calculate_trio_probability.py --input output\/hapmap3202_finalreport.estimated.by_pedigree.all.withQC.onlyped.tsv --output output\/hapmap3202_finalreport.estimated.by_pedigree.all.withQC.best_guess.onlyped.tsv<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>JoGo-LILR Caller is software for calling haplotype patterns in the complex LILR region, especially the LILRB3-LILRA6 region, which shows various copy number (CN) patterns. JoGo-LILR Caller takes short-read sequencing data as input and outputs diploid CNs and probable haplotypes. 
&hellip; <a href=\"https:\/\/nagasakilab.csml.org\/en\/jogo-lilr_caller_v1\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"passster_activate_protection":false,"passster_protect_child_pages":"","passster_protection_type":"password","passster_password":"JJathW#YQ!h4","passster_activate_overwrite_defaults":"","passster_headline":"","passster_instruction":"","passster_placeholder":"","passster_button":"","passster_id":"","passster_activate_misc_settings":"","passster_redirect_url":"","passster_hide":"no","passster_area_shortcode":"","footnotes":""},"class_list":["post-2564","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/nagasakilab.csml.org\/en\/wp-json\/wp\/v2\/pages\/2564","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nagasakilab.csml.org\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/nagasakilab.csml.org\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/nagasakilab.csml.org\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nagasakilab.csml.org\/en\/wp-json\/wp\/v2\/comments?post=2564"}],"version-history":[{"count":12,"href":"https:\/\/nagasakilab.csml.org\/en\/wp-json\/wp\/v2\/pages\/2564\/revisions"}],"predecessor-version":[{"id":2785,"href":"https:\/\/nagasakilab.csml.org\/en\/wp-json\/wp\/v2\/pages\/2564\/revisions\/2785"}],"wp:attachment":[{"href":"https:\/\/nagasakilab.csml.org\/en\/wp-json\/wp\/v2\/media?parent=2564"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}