Modeling Dependencies in Protein-DNA Binding Sites:
Analysis of Aligned Data
To test whether dependencies can be found in DNA binding sites, we extracted aligned sites from the TRANSFAC database, version 6.2. We used TRANSFAC's original alignment and built 95 datasets for proteins with at least 20 known binding sites. For each dataset, we performed 10-fold cross-validation, learning a model on 90% of the sites and then calculating the log-likelihood of the remaining 10%.
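
As a rough illustration of this protocol, the sketch below runs 10-fold cross-validation for the PSSM baseline only. It is a minimal sketch under stated assumptions, not the code used for the analysis: the function names, the pseudocount of 1, the fold assignment by shuffled index, and the toy sites are all hypothetical choices made for the example.

```python
import math
import random

ALPHABET = "ACGT"

def learn_pssm(sites, pseudocount=1.0):
    """Estimate per-position nucleotide probabilities from aligned sites.

    The pseudocount is an assumption of this sketch; it keeps every
    probability non-zero so that test log-likelihoods stay finite.
    """
    length = len(sites[0])
    pssm = []
    for pos in range(length):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({base: counts[base] / total for base in ALPHABET})
    return pssm

def log_likelihood(pssm, site):
    """Log-likelihood of one site under a position-independent model."""
    return sum(math.log(column[base]) for column, base in zip(pssm, site))

def cross_validate(sites, k=10, seed=0):
    """Average test log-likelihood per site over k folds."""
    shuffled = sites[:]
    random.Random(seed).shuffle(shuffled)
    total = 0.0
    for fold in range(k):
        # Every k-th site is held out; the rest (about 90% for k=10) is training data.
        test = shuffled[fold::k]
        train = [s for i, s in enumerate(shuffled) if i % k != fold]
        pssm = learn_pssm(train)
        total += sum(log_likelihood(pssm, s) for s in test)
    return total / len(shuffled)

# Hypothetical toy dataset of 20 aligned sites (not TRANSFAC data).
toy_sites = ["ACGT", "ACGA", "TCGT", "ACGT"] * 5
avg_ll = cross_validate(toy_sites)
print(f"average test log-likelihood per site: {avg_ll:.3f}")
```

A dependency model would be evaluated the same way by swapping `learn_pssm` and `log_likelihood` for that model's learning and scoring routines while keeping the folds identical, so that the per-fold results are paired.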

The following images show the differences in the average log-likelihood per instance on the test data, comparing each learned model against the learned PSSM.
A one-sided paired t-test was used to determine whether each dependency model's log-loss significantly outperformed that of the PSSM (p-value < 0.05).
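
The significance test can be sketched as follows, using only the Python standard library. This is an illustrative sketch rather than the original analysis code: it assumes one paired score per cross-validation fold (the pairing unit is not specified above), the helper names and the example scores are hypothetical, and the upper-tail probability of Student's t distribution is approximated numerically with Simpson's rule.

```python
import math
import statistics

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) for Student's t with df degrees of freedom.

    Approximated by integrating the density from 0 to t with Simpson's rule
    (steps must be even) and subtracting the result from 0.5.
    """
    if t < 0:
        return 1.0 - t_sf(-t, df, steps)
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 0.5 - area * h / 3)

def paired_one_sided_t_test(model_scores, baseline_scores):
    """Test whether model_scores are greater than baseline_scores on average.

    Returns the t statistic and the one-sided p-value.
    """
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t_stat = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t_stat, t_sf(t_stat, n - 1)

# Hypothetical per-fold average test log-likelihoods (illustrative values only).
dependency_ll = [-9.0, -9.5, -8.8, -9.1, -9.3, -9.2, -8.9, -9.4, -9.0, -9.1]
pssm_ll = [-9.6, -9.9, -9.5, -9.8, -9.7, -10.0, -9.4, -9.9, -9.8, -9.6]

t_stat, p_value = paired_one_sided_t_test(dependency_ll, pssm_ll)
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.2g}, significant: {p_value < 0.05}")
```

Scores here are log-likelihoods, so higher is better; if log-loss (negative log-likelihood) is used instead, the two arguments should be swapped so that the test still asks whether the dependency model is the better one.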