sudophredを使ってみた - 気の向くままに

phredPhrapしてconsedで見る時にリファレンスになるシークエンスを加える。
exonをシークエンスしてる時に、consed上で「どこからがexonなのか分からない」というのが発端。
00A18がemsemblのゲノム配列からsudophredで作ったもの。それ以外は自分がシークエンサーで実際に読んだもの。

大体の流れはこんな感じです。

# step1. 見たい遺伝子のexonのゲノム配列を取ってくる
# step1-1. ensemblからexonの領域を取ってくる（ここではABCB1という遺伝子で、ENST*で指定している）
echo "select display_id, transcript_stable_id, rank, chr_name,
exon_chrom_start, exon_chrom_end from
hsapiens_gene_ensembl_structure__structure__main WHERE
transcript_stable_id LIKE 'ENST00000265724';" | mysql -A -u anonymous
-h ensembldb.ensembl.org -D ensembl_mart_38 > ABCB1.txt

# step1-2. 取ってきた領域に対応するゲノム配列を取ってくる、getSeq4ppcp.rbのコードは後述
ruby getSeq4ppcp.rb ABCB1.txt
# 各exonごとのFASTAファイルの出力が得られる
exon1.ref exon2.ref exon3.ref .. exon15.ref exon16.ref exon17.ref

# step2. FASTAファイルを元に疑似のシークエンス波形ファイル（scfファイル、またphdファイルとpolyファイルも）を生成する。
# step2-1. exon16.refを対応するディレクトリのedit_dirに配置（./16/edit_dir/以下に置く）する。ここでは16番目のexonとした。
# step2-2. edit_dirにてsudophredを走らせる。
sudophred exon16.ref
# primer IDがないとかの警告は出るが問題ない。

# step3. 後はいつもと同じ
phredPhrap
consed

getSeq4ppcp.rbはbiorubyのBio::Ensembl::Human.exportviewを使いました。
これ便利だわー。開発者の人達ありがとう、役に立ってます！

#! /usr/bin/env ruby

# How to use
# $0.rb query.txt

begin
  require 'bio'

  io = File.open(ARGV.shift)
  io.each do |line|
    next if line =~ /display_id/
    line.chomp!
    display_id, transcription_stable_id, rank, chr_name, exon_chrom_start, exon_chrom_end = line.split(/\t/)
#    break if rank.to_i >= 3  # For debug
    out = File.open("exon#{rank}.ref", "w")
    seq = Bio::Ensembl::Human.exportview(chr_name, exon_chrom_start, exon_chrom_end)
    result = ["> ", rank, seq.sub!(">", " ")].join("")
    puts     result  # For check
    out.puts result
    out.close
  end
  io.close
rescue => ex
  p ex.message
end

参考にしたサイト（公開に感謝！）
構成的 Bio::Ensembl
http://d.hatena.ne.jp/nakao_mitsuteru/20060414/p2
ichan氏 Ensembl Gene ID からゲノムの位置を取ってくるワンライナー
 http://itoshi.tv/d/?date=20060502-5
piyo-chan氏メモ　genomic structureを知るのに便利なSQL文
http://d.hatena.ne.jp/piyo-chan/20060623/p1
transcriptのID（例：ENST00000333535)でエクソン情報を引き出すには
http://d.hatena.ne.jp/piyo-chan/20060522/p1