Finding a Protein Motif

Rosalind Problem

Given: At most 15 UniProt Protein Database access IDs.

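Return: For each protein possessing the N-glycosylation motif (N{P}[ST]{P}), output its given access ID followed by a list of locations in the protein string where the motif can be found. In this notation, [XY] means "either X or Y" and {X} means "any amino acid except X". Proteins that do not contain the motif are omitted from the output.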
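
One way to search for the motif is to translate it into a regular expression. The sketch below is a minimal example, not a reference solution: the names MOTIF and motif_positions are illustrative, and it assumes locations are reported as 1-based start positions. The pattern is wrapped in a lookahead so that overlapping occurrences (such as 115 and 116 in the sample output) are all found.

```python
import re

# N{P}[ST]{P} as a regex: N, any residue except P, S or T, any residue except P.
# The zero-width lookahead (?=...) lets finditer report overlapping matches.
MOTIF = re.compile(r"(?=N[^P][ST][^P])")


def motif_positions(protein):
    """Return the 1-based start positions of the motif in a protein string."""
    return [match.start() + 1 for match in MOTIF.finditer(protein)]


# "NNTS" matches at position 1 and "NTSY" matches at position 2.
print(motif_positions("NNTSYS"))  # [1, 2]
```

A plain re.findall without the lookahead would consume the first match and miss the second, overlapping one.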

Sample Dataset

A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST

Sample Output

B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614
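
The sample output above can be produced by downloading each protein's FASTA record, joining its sequence lines, and printing the ID followed by the motif positions. The following is a rough sketch under several assumptions: the URL pattern https://rest.uniprot.org/uniprotkb/<accession>.fasta is assumed to be the current UniProt download address, the accession is assumed to be the part of the ID before the first underscore, and the helper names (fetch_fasta, parse_fasta, report) are made up for illustration. The fetch step is passed in as a parameter so the rest can be tried without network access.

```python
import re
from urllib.request import urlopen

# N{P}[ST]{P} with a lookahead so overlapping matches are reported.
MOTIF = re.compile(r"(?=N[^P][ST][^P])")


def fetch_fasta(access_id):
    """Download a FASTA record (assumed URL pattern, see the note above)."""
    accession = access_id.split("_")[0]  # "P07204_TRBM_HUMAN" -> "P07204"
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Drop the '>' header line(s) and join the sequence lines."""
    return "".join(
        line.strip() for line in text.splitlines() if not line.startswith(">")
    )


def report(ids, fetch=fetch_fasta):
    """Build the output: each ID with a motif, then its 1-based positions."""
    lines = []
    for access_id in ids.split():
        protein = parse_fasta(fetch(access_id))
        positions = [match.start() + 1 for match in MOTIF.finditer(protein)]
        if positions:  # proteins without the motif are skipped
            lines.append(access_id)
            lines.append(" ".join(map(str, positions)))
    return "\n".join(lines)
```

With network access, print(report(ids)) would print the result for a whitespace-separated string of IDs. For an offline check, a stand-in such as fetch=lambda access_id: ">header\nNNTSYS\n" can be passed instead of the real download.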

Python Playground

ids = '''A5GIU0
P29460_I12B_HUMAN
P01215_GLHA_HUMAN
P02186
P00748_FA12_HUMAN
A2A2Y4
Q5FTZ8
P08709_FA7_HUMAN
P01045_KNH2_BOVIN
Q3T0C9
P06765_PLF4_RAT
A3N0C7
P00743_FA10_BOVIN
P40225_TPO_HUMAN
'''
# ids is given as input UniProt Protein IDs

# Write your code here

ans = '''P29460_I12B_HUMAN
125 135 222 303
P01215_GLHA_HUMAN
76 102
P00748_FA12_HUMAN
249 433
A2A2Y4
90 359 407
Q5FTZ8
49 62
P08709_FA7_HUMAN
205 382
P01045_KNH2_BOVIN
47 87 168 169 197 204 280
Q3T0C9
15 38
P06765_PLF4_RAT
82
A3N0C7
59
P00743_FA10_BOVIN
218
P40225_TPO_HUMAN
197 206 234 255 340 348
'''
Ex().has_output(ans)
success_msg("Great job!")