python - Motif search with Gibbs sampler -


i beginner in both programming , bioinformatics. so, appreciate understanding. tried develop python script motif search using gibbs sampling explained in coursera class, "finding hidden messages in dna". pseudocode provided in course is:

gibbssampler(dna, k, t, n)     randomly select k-mers motifs = (motif1, …, motift) in each string         dna     bestmotifs ← motifs     j ← 1 n         ← random(t)         profile ← profile matrix constructed strings in motifs                    except motifi         motifi ← profile-randomly generated k-mer in i-th sequence         if score(motifs) < score(bestmotifs)             bestmotifs ← motifs     return bestmotifs 

problem description:

code challenge: implement gibbssampler.

input: integers k, t, , n, followed collection of strings dna. output: strings bestmotifs resulting running gibbssampler(dna, k, t, n) 20 random starts. remember use pseudocounts!

sample input:

 8 5 100  cgcccctctcgggggtgttcagtaaccggcca  gggcgaggtatgtgtaagtgccaaggtgccag  tagtaccgagaccgaaagaagtatacaggcgt  tagatcaagtttcaggtgcacgtcggtgaacc  aatccaccagctccacgtgcaatgttggccta 

sample output:

 tctcgggg  ccaaggtg  tacaggcg  ttcaggtg  tccacgtg 

i followed pseudocode best of knowledge. here code:

def buildprofilematrix(dnamatrix):     profilematrix = [[1 x in xrange(len(dnamatrix[0]))] x in xrange(4)]     indices = {'a':0, 'c':1, 'g': 2, 't':3}     seq in dnamatrix:     in xrange(len(dnamatrix[0])):                     profilematrix[indices[seq[i]]][i] += 1     probmatrix = [[float(x)/sum(zip(*profilematrix)[0]) x in y] y in profilematrix]     return probmatrix def profilerandomgenerator(profile, dna, k, i):     indices = {'a':0, 'c':1, 'g': 2, 't':3}     score_list = []     x in xrange(len(dna[i]) - k + 1):         probability = 1         window = dna[i][x : k + x]     y in xrange(k):         probability *= profile[indices[window[y]]][y]     score_list.append(probability)     rnd = uniform(0, sum(score_list))     current = 0     z, bias in enumerate(score_list):         current += bias         if rnd <= current:             return dna[i][z : k + z] def score(motifs):     profilematrix = [[0 x in xrange(len(motifs[0]))] x in xrange(4)]     indices = {'a':0, 'c':1, 'g': 2, 't':3}     seq in motifs:         in xrange(len(motifs[0])):                         profilematrix[indices[seq[i]]][i] += 1     score = len(motifs)*len(motifs[0]) - sum([max(x) x in zip(*profilematrix)])     return score random import randint, uniform     def gibbssampler(k, t, n):      dna = ['cgcccctctcgggggtgttcagtaaccggcca',     'gggcgaggtatgtgtaagtgccaaggtgccag',     'tagtaccgagaccgaaagaagtatacaggcgt',     'tagatcaagtttcaggtgcacgtcggtgaacc',     'aatccaccagctccacgtgcaatgttggccta']     motifs = []     in [randint(0, len(dna[0])-k) x in range(len(dna))]:         j = 0         kmer = dna[j][i : k+i]         j += 1         motifs.append(kmer)     bestmotifs = []     s_best = float('inf')     in xrange(n):         x = randint(0, t-1)     motifs.pop(x)     profile = buildprofilematrix(motifs)     motif = profilerandomgenerator(profile, dna, k, x)     motifs.append(motif)     s_motifs = score(motifs)     if s_motifs < s_best:         s_best = s_motifs         bestmotifs = motifs return [s_best, bestmotifs]  k, t, n =8, 5, 100             best_motifs = [float('inf'), none]  # repeat gibbs sampler search 20 times. repeat in xrange(20):     current_motifs = gibbssampler(k, t, n)     if current_motifs[0] < best_motifs[0]:         best_motifs = current_motifs # print , save answer. print '\n'.join(best_motifs[1])             

unfortunately, code never gives same output solved example. besides, while trying debug code found weird scores define mismatches between motifs. however, when tried run score function separately, worked perfectly.

each time run script, output changes, anyway here example of 1 of outputs input present in code:

example output of code

tatgtgta tatgtgta tatgtgta ggtgttca tatacagg 

could please me debug code?!! spent whole day trying find out what's wrong although know might silly mistake made, eye failed catch it.

thank all!!

finally, found out wrong in code! in line 54:

motifs.append(motif) 

after randomly removing 1 of motifs, followed building profile out of these motifs randomly selecting new motif based on profile, should have added selected motif in same position before removal not appended end of motif list.

now, correct code is:

motifs.insert(x, motif) 

the new code worked expected.


Comments

Popular posts from this blog

c# - How Configure Devart dotConnect for SQLite Code First? -

java - Copying object fields -

c++ - Clear the memory after returning a vector in a function -