1) be longer than 500 nt or shorter than 20 nt;
2) contain characters other than a c g t A C G T;
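The two input restrictions above can be checked locally before submitting. The following is a minimal sketch, not part of SVM-BPfinder; the function name `is_valid_sequence` is hypothetical, and it assumes that lengths of exactly 20 nt and 500 nt are accepted.

```python
import re

# Allowed alphabet and length bounds taken from the two rules above.
# Assumption: the bounds are inclusive (20 nt and 500 nt are accepted).
VALID_SEQ = re.compile(r"[acgtACGT]+")
MIN_LEN, MAX_LEN = 20, 500


def is_valid_sequence(seq: str) -> bool:
    """Return True if seq is 20-500 nt long and contains only a/c/g/t (any case)."""
    if not (MIN_LEN <= len(seq) <= MAX_LEN):
        return False
    return VALID_SEQ.fullmatch(seq) is not None
```

Sequences that fail this check (for example, those containing N) would need to be trimmed or removed before submission.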
Example FASTA file:
Select one from the available species-specific BP models.
You can also limit the output to the AG-dinucleotide exclusion zone (AGEZ) only.
Submit and wait for the results. (Running time is directly proportional to input size.)
Results are shown in plain text, tab delimited, one line per BP candidate.
seq_id - Sequence Identifier
agez - AG dinucleotide Exclusion Zone length
ss_dist - Distance to 3' splice site
bp_seq - BP sequence (nonamer; from -5 to +3 relative to the BP adenine)
bp_scr - BP sequence score using a variable order Markov model
y_cont - Pyrimidine content between the BP adenine and the 3' splice site
ppt_off - Polypyrimidine tract offset relative to the BP adenine
ppt_len - Polypyrimidine tract length
ppt_scr - Polypyrimidine tract score
svm_scr - Final BP score using the SVM classifier
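For downstream processing, the tab-delimited output can be read into records keyed by the field names above. This is a sketch under stated assumptions: that the columns appear in the order listed, that a header line (if present) starts with `seq_id`, and that the length and distance fields are integers while the scores and pyrimidine content are decimals. The function name `parse_bpfinder_output` is hypothetical.

```python
# Assumption: columns appear in the order in which they are described above.
COLUMNS = ["seq_id", "agez", "ss_dist", "bp_seq", "bp_scr",
           "y_cont", "ppt_off", "ppt_len", "ppt_scr", "svm_scr"]
# Assumption: these fields hold integers; the remaining numeric fields hold floats.
INT_FIELDS = {"agez", "ss_dist", "ppt_off", "ppt_len"}
FLOAT_FIELDS = {"bp_scr", "y_cont", "ppt_scr", "svm_scr"}


def parse_bpfinder_output(text):
    """Parse tab-delimited output text into a list of dicts, one per BP candidate."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if fields[0] == "seq_id":
            continue  # skip a header line, if the output has one
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        record = dict(zip(COLUMNS, fields))
        for name in INT_FIELDS:
            record[name] = int(record[name])
        for name in FLOAT_FIELDS:
            record[name] = float(record[name])
        records.append(record)
    return records
```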
Generally, only consider BPs with svm_scr > 0.
Keep BPs close to the AGEZ (distance to the 3'ss approximately within AGEZ + 9 nt), with bp_scr > 0 and svm_scr > 0.
If there is only one such BP, take that one. If there is more than one and you only need a single BP, keep the one with the highest svm_scr.
If there are none, drop the bp_scr > 0 condition and consider the BPs with svm_scr > 0.
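The selection rule above can be sketched in a few lines. This is an illustration, not the logic of the Perl script mentioned below. It assumes each candidate is a dict with the keys `seq_id`, `agez`, `ss_dist`, `bp_scr` and `svm_scr`, reads "close to the AGEZ" as `ss_dist <= agez + 9`, and assumes the fallback keeps the proximity condition while dropping only the BP sequence score condition. The function name `select_best_bp` and the `margin` parameter are hypothetical.

```python
from collections import defaultdict


def select_best_bp(candidates, margin=9):
    """Return {seq_id: best candidate} following the rule of thumb above."""
    near_agez = defaultdict(list)
    for cand in candidates:
        # Keep only candidates with a positive SVM score that lie
        # within AGEZ + margin nt of the 3' splice site (assumed reading).
        if cand["svm_scr"] > 0 and cand["ss_dist"] <= cand["agez"] + margin:
            near_agez[cand["seq_id"]].append(cand)

    best = {}
    for seq_id, cands in near_agez.items():
        # Preferred pool: the BP sequence score is also positive.
        strict = [c for c in cands if c["bp_scr"] > 0]
        # Fallback: drop the BP sequence score condition if nothing passed it.
        pool = strict or cands
        best[seq_id] = max(pool, key=lambda c: c["svm_scr"])
    return best
```

Introns with no candidate passing the filters are simply absent from the returned dict.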
This script generates a single BP per intron fulfilling these conditions (if one exists) from the SVM-BPfinder output:
perl calculate_best_BP_per_intron.pl < svm_ppfinder_output.txt > best_bp_per_intron.txt
See Rodor et al. (2016) for an application of this approach.
The SVM-BPfinder software as a standalone tool is available at https://bitbucket.org/regulatorygenomicsupf/svm-bpfinder.