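Hi-C data processing with HiC-Pro
1. Introduction
For processing Hi-C data, good alternatives to HiC-Pro include HiCUP, Juicer, and HiCpipe. For visualization, consider HiCExplorer, HiCPlotter, Juicer, and HiC-Pro itself. The HiC-Pro author has not maintained it for a long time, and there are many pitfalls during installation and use, so pay close attention!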
2. Try
# Installation: download the release tarball from GitHub
tar -zxvf HiC-Pro-3.1.0.tar.gz
cd HiC-Pro-3.1.0
# In environment.yml, change "name: HiC-Pro_v3.1.0" to "name: hicpro"
# Create the conda environment
mamba env create -f environment.yml
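# Extra step: without it, the following make configure hangs 🌟🌟🌟
# (presumably run with the hicpro environment active: conda activate hicpro)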
mamba install -y R=4.2
# Install
vim ~/software/HiC-Pro-3.1.0_1/config-install.txt # set PREFIX = /home/caigui/software
make configure
make install
# Data preprocessing
cd /mnt/caigui/41_sk_genome_129M/37_hicpro
# Copy the genome file here
cp ../08_newgenome/xiaocuiyun_genome_20211217.fa .
# Step 1: generate the restriction fragment file
~/software/hicpro/bin/utils/digest_genome.py -r ^GATC -o sk.digest.genome.bed xiaocuiyun_genome_20211217.fa
# Step 2: generate the chromosome size file (columns 1 and 2 of the .fai index)
samtools faidx xiaocuiyun_genome_20211217.fa
cut -f1,2 xiaocuiyun_genome_20211217.fa.fai > sk.chromosomes.size.tab
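Since steps 1 and 2 both derive their sequence names from the same FASTA, a quick consistency check can catch a mismatched file before a long run. The following is a minimal sketch; `missing_chroms` is a hypothetical helper of my own, not part of HiC-Pro, and it assumes the first column of both files holds the sequence name.

```shell
# Hypothetical helper (not part of HiC-Pro): print every sequence name that
# occurs in the fragment BED file but is absent from the chromosome size table.
# Assumes a non-empty size table with the name in column 1 of both files.
missing_chroms() {
    sizes_file="$1"
    bed_file="$2"
    awk 'NR == FNR { known[$1] = 1; next }
         !($1 in known) && !seen[$1]++ { print $1 }' "$sizes_file" "$bed_file"
}

# Example call with the files produced above (prints nothing if all names match):
# missing_chroms sk.chromosomes.size.tab sk.digest.genome.bed
```

Empty output means every fragment maps to a known chromosome; any printed name points to a genome/digest mismatch.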
# Step 3: build the Bowtie2 index (the index prefix is the full genome file name)
mkdir -p index
cp xiaocuiyun_genome_20211217.fa index/
bowtie2-build --threads 40 -f index/xiaocuiyun_genome_20211217.fa index/xiaocuiyun_genome_20211217.fa
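# Prepare the raw data files (one subfolder per sample inside the directory passed to -i)
mkdir -p /mnt/caigui/41_sk_genome_129M/37_hicpro/fastq/sample1
cd /mnt/caigui/41_sk_genome_129M/37_hicpro/fastq/sample1
cp /mnt/caigui/33_SKDeNovo/01_data/08_allhic/rawdata/combine/*.gz . # reportedly symlinks do not work here; I have not tested this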
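# Final directory layout. Note carefully: the genome FASTA and the index files must sit in the same folder, and the index prefix must be the full genome file name; otherwise the run fails 🌟🌟🌟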
# ├── config_hicpro.txt
# ├── fastq
# │   └── sample1
# │       ├── reads_R1.fastq.gz
# │       └── reads_R2.fastq.gz
# ├── index
# │   ├── xiaocuiyun_genome_20211217.fa
# │   ├── xiaocuiyun_genome_20211217.fa.1.bt2
# │   ├── xiaocuiyun_genome_20211217.fa.2.bt2
# │   ├── xiaocuiyun_genome_20211217.fa.3.bt2
# │   ├── xiaocuiyun_genome_20211217.fa.4.bt2
# │   ├── xiaocuiyun_genome_20211217.fa.rev.1.bt2
# │   └── xiaocuiyun_genome_20211217.fa.rev.2.bt2
# ├── sk.chromosomes.size.tab
# └── sk.digest.genome.bed
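The index naming rule in the layout above can be verified before launching anything. This is a rough sketch rather than anything HiC-Pro provides; `check_bt2_index` is a hypothetical helper name, and it accepts `.bt2l` files as well because bowtie2-build writes those for very large genomes.

```shell
# Hypothetical helper (not part of HiC-Pro): confirm that all six Bowtie2 index
# files exist in IDX_DIR and use the full genome file name as their prefix.
check_bt2_index() {
    idx_dir="$1"
    genome_name="$2"
    rc=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        f="$idx_dir/$genome_name.$suffix.bt2"
        # Large indexes use the .bt2l extension, so accept either form.
        if [ ! -e "$f" ] && [ ! -e "${f}l" ]; then
            echo "missing index file: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example call for the layout above:
# check_bt2_index index xiaocuiyun_genome_20211217.fa && echo "index looks complete"
```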
# Copy the config file (renamed to match the name used in the layout and the run command)
cd /mnt/caigui/41_sk_genome_129M/37_hicpro
cp ~/software/HiC-Pro_3.1.0/config-hicpro.txt config_hicpro.txt
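# Edit it. Note that BOWTIE2_IDX_PATH must not end with a trailing "/", otherwise the run fails 🌟🌟🌟
# The parts I needed to edit:
# SORT_RAM = 100000M 🌟🌟🌟 the default 1000M caused an error during the analysis, so I raised it 100-fold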
# N_CPU = 40
# BOWTIE2_IDX_PATH = /mnt/caigui/41_sk_genome_129M/37_hicpro/index
# REFERENCE_GENOME = xiaocuiyun_genome_20211217.fa
# GENOME_SIZE = /mnt/caigui/41_sk_genome_129M/37_hicpro/sk.chromosomes.size.tab
# GENOME_FRAGMENT = /mnt/caigui/41_sk_genome_129M/37_hicpro/sk.digest.genome.bed
# LIGATION_SITE = GATCGATC
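# Note: every setting above is required, both those that need the full path (/mnt/caigui/41_sk_genome_129M/37_hicpro/) and the one that must be given without a path (REFERENCE_GENOME). Otherwise the run fails, and it only fails after running for several days!
# This one is a nasty trap, so be careful!!!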
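Because these config mistakes reportedly surface only days into a run, it may be worth checking the file up front. The sketch below encodes just the rules stated in these notes (no trailing "/" on BOWTIE2_IDX_PATH, a bare file name for REFERENCE_GENOME, full paths for GENOME_SIZE and GENOME_FRAGMENT); `config_value` and `check_config` are hypothetical helpers, not HiC-Pro tools, and they assume plain `KEY = value` lines.

```shell
# Hypothetical helper: print the value assigned to KEY in a "KEY = value" file.
config_value() {
    awk -F '=' -v key="$1" '
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); print v; exit }
    ' "$2"
}

# Hypothetical helper: apply the path rules from the notes above to a config file.
check_config() {
    cfg="$1"
    rc=0
    val=$(config_value BOWTIE2_IDX_PATH "$cfg")
    case "$val" in
        */) echo "BOWTIE2_IDX_PATH must not end with '/': $val" >&2; rc=1 ;;
    esac
    val=$(config_value REFERENCE_GENOME "$cfg")
    case "$val" in
        */*) echo "REFERENCE_GENOME must be a bare file name: $val" >&2; rc=1 ;;
    esac
    for key in GENOME_SIZE GENOME_FRAGMENT; do
        val=$(config_value "$key" "$cfg")
        case "$val" in
            /*) ;;  # absolute path, as required
            *) echo "$key should be a full path: $val" >&2; rc=1 ;;
        esac
    done
    return "$rc"
}

# Example call: check_config config_hicpro.txt && echo "config passes the basic checks"
```

This only checks the form of the values; it does not confirm that the referenced files exist or that HiC-Pro will accept them.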
# Run
nohup ~/software/hicpro/bin/HiC-Pro -i /mnt/caigui/41_sk_genome_129M/37_hicpro/fastq -o ./Resultes -c config_hicpro.txt &
# Convert to .hic format
~/software/hicpro/bin/utils/hicpro2juicebox.sh -i Resultes/hic_results/data/sample1/sample1.allValidPairs -g sk.chromosomes.size.tab -j ../38_hicup/juicer_tools.2.20.00.jar
ls sample1.allValidPairs.hic # a .hic file is generated in the current directory
3. Afterword
Installing and running this software involves a lot of pitfalls; every one I ran into is listed above ⬆️!