HiC数据处理之HiCup

345

1. 前言

除了hicpro外,有时间再试试hicup,juicer和HiCpipe吧,先试试hicup

相关链接如下:

https://stevenwingett.github.io/HiCUP/
https://github.com/StevenWingett/HiCUP
https://www.bioinformatics.babraham.ac.uk/projects/hicup/

2. 安装和简单使用

# 安装依赖软件
mamba create -n hicup -y
mamba activate hicup
mamba install -y perl bowtie2 R r-tidyverse r-plotly pandoc samtools

# 安装hicup
cd ~/software
wget https://github.com/StevenWingett/HiCUP/archive/refs/tags/v0.9.2.tar.gz # 20230216更新的最新版,而hicpro这个曾经红极一时的软件已经很久没有维护了
tar -zxvf v0.9.2.tar.gz

# 运行
cd /mnt/caigui/41_sk_genome_129M/39_hicup
# 建立基因组索引
bowtie2-build xiaocuiyun_genome_20211217.fa SK.index
# 消化基因组
~/software/HiCUP-0.9.2/hicup_digester --genome sk --re1 ^GATC,mboi xiaocuiyun_genome_20211217.fa
# 创建hicup配置文件
~/software/HiCUP-0.9.2/hicup --example
# 修改配置文件
vim hicup_example.conf

# 我修改的内容如下
# Outdir: ./results
# Threads: 40
# Bowtie2: /home/caigui/mambaforge/envs/hicup/bin/bowtie2
# Index: /mnt/caigui/41_sk_genome_129M/39_hicup/SK.index
# Digest: /mnt/caigui/41_sk_genome_129M/39_hicup/Digest_sk_mboi_None_15-04-24_23-05-2023.txt
# Format: #这里我留空,让它自己检测版本
# #FASTQ files to be analysed, placing paired files on adjacent lines
# /mnt/caigui/41_sk_genome_129M/37_hicpro/fastq/sample1/reads_R1.fastq.gz
# /mnt/caigui/41_sk_genome_129M/37_hicpro/fastq/sample1/reads_R2.fastq.gz

#运行hicup
mkdir results
nohup ~/software/HiCUP-0.9.2/hicup --config hicup_example.conf &

3. 使用后

# 查看log,发现生成html失败了,手动复制命令运行了一下缺成功了
# 命令如下:
R -e "rmarkdown::render('/home/caigui/software/HiCUP-0.9.2/r_scripts/hicup_reporter.rmd', params=list(summary_file='/mnt/caigui/41_sk_genome_129M/39_hicup/reads_R1_2.hicup.bam.HiCUP_summary_report_bbedRpBcSm_15-11-34_23-05-2023.txt', ditag_lengths_file='/mnt/caigui/41_sk_genome_129M/39_hicup/reads_R1_2.ditag_size_distribution_report.txt'), intermediates_dir='/mnt/caigui/41_sk_genome_129M/39_hicup/', output_file='/mnt/caigui/41_sk_genome_129M/39_hicup/reads_R1_2.hicup.bam.HiCUP_summary_report_bbedRpBcSm_15-11-34_23-05-2023.html')" > /dev/null 2>&1

# 为什么自动生成失败了呢?继续查看log
# Found R at '/usr/bin/R'
# 难道我当初运行hicpro的时候没有进入hicpro环境?

# 重新我进入hicpro环境后再运行,Found到的R就是对的了。

4. 格式转换

ls ~/software/HiCUP-0.9.2/Conversion/
# hicup2fithic  hicup2gothic  hicup2hicpipe  hicup2homer  hicup2juicer  hicup2ncc
# hicup的一大亮点就是提供了向很多格式转换的脚本,比如homer,juicer,和hicpipe

~/software/HiCUP-0.9.2/Conversion/hicup2juicer -h
# 可以将bam/sam文件转换为.hic格式
# 然后plotgardener就可以读.hic文件,应该就能用了

# 转换成hic格式前还要预转换一次,将bam转为prejuicer
~/software/HiCUP-0.9.2/Conversion/hicup2juicer results/reads_R1_2.hicup.bam
# 会在results文件夹下生成reads_R1_2.hicup.bam.prejuicer

head -n 1 reads_R1_2.hicup.bam.prejuicer
# 1	1	Chr10	10000002	0	0	Chr10	10143035	1	42	42

# 这个文件还需要用juicer的pre命令生成.hic文件
# https://github.com/aidenlab/juicer/wiki/Pre

# 下载juicer_tools
wget https://github.com/aidenlab/Juicebox/releases/download/v2.20.00/juicer_tools.2.20.00.jar

# 运行pre工具
# 这里sk.chromosomes.size.tab是染色体大小文件,中间tab隔开
java -Xms512m -Xmx2048m -jar juicer_tools.2.20.00.jar pre results/reads_R1_2.hicup.bam.prejuicer results/reads_R1_2.hicup.bam.hic ./sk.chromosomes.size.tab

# 这样就生成了一个reads_R1_2.hicup.bam.hic的.hic文件

# 折腾这个还不如直接用juicer处理数据呢?