First of all, sorry about the long title; I couldn't find a better way to explain what I want to achieve with this bash script.
I have a very large file (multifasta) that looks like this:
and so on.
I also have two patterns:
'GATA' and 'TCAT'
I already know that both patterns occur in every line that doesn't begin with '>', sometimes more than once. My goal is to print each '>' line and then, for the line that follows it, print the distance between every combination of the two patterns, like this:
29 #distance between the only 'GATA' and the first 'TCAT'
41 #distance between the only 'GATA' and the second 'TCAT'
2 #distance between the only 'GATA' and the first 'TCAT'
9 #distance between the only 'GATA' and the second 'TCAT'
4 #distance between the first 'GATA' and first 'TCAT'
23 #distance between the first 'GATA' and second 'TCAT'
6 #distance between the second 'GATA' and the second 'TCAT'
In the third block, no distance is reported between the second 'GATA' and the first 'TCAT', because that 'TCAT' appears before that 'GATA'.
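To make the goal concrete, here is a minimal sketch of the calculation I have in mind. It assumes "distance" means the number of characters between the end of 'GATA' and the start of 'TCAT', which may not be the exact definition needed. The function names `find_offsets`, `pair_distances` and `print_distances` are made up for illustration.

```bash
#!/usr/bin/env bash

# Print every 0-based start offset of pattern $2 in string $1, one per line.
find_offsets() {
    local seq=$1 pat=$2 i
    for (( i = 0; i + ${#pat} <= ${#seq}; i++ )); do
        [[ ${seq:i:${#pat}} == "$pat" ]] && echo "$i"
    done
    return 0
}

# For every (first, second) pair in which the second pattern starts after
# the first one ends, print the number of characters between them.
# ASSUMPTION: distance = start of second - end of first.
pair_distances() {
    local seq=$1 first=$2 second=$3 a b firsts seconds
    firsts=$(find_offsets "$seq" "$first")
    seconds=$(find_offsets "$seq" "$second")
    for a in $firsts; do
        for b in $seconds; do
            (( b >= a + ${#first} )) && echo $(( b - a - ${#first} ))
        done
    done
    return 0
}

# Print each '>' header unchanged, then the distances for each sequence line.
print_distances() {
    local file=$1 line
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == ">"* ]]; then
            printf '%s\n' "$line"
        else
            pair_distances "$line" GATA TCAT
        fi
    done < "$file"
}
```

For example, `pair_distances GATACCTCATGGGTCAT GATA TCAT` prints 2 and 9 on separate lines (a made-up sequence, not one from my file). A pure-bash loop like this is likely to be slow on a very large file, so it is meant only to pin down the expected result.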
I tried the following code:
while IFS= read -r line; do
  if [[ "$line" == ">"* ]]; then
    echo "$line"
  else
    count=$(sed -n '/GATA/,/TCAT/p' | wc -c)
    echo "$count"
  fi
done < "$file"
That gives me the following output:
That output contains only the first '>' line and a strange, clearly wrong distance between my two patterns, which suggests I am doing at least two things wrong: the loop itself and the sed command.
I'm sorry if this post is confusing; I will be here to clarify things if necessary. I would appreciate any help, tips, or useful links.
Thank you all,