#1
Hi,

I have a file with ids in a single column, something like:

s13096
s4246229
s11047887
s6487465
s2970532

My second file has:

1 162714453 1.025e-02 0.425510 0.998131 s1416261 3.972e-01
1 162721094 2.056e-02 0.033229 1.000000 s4246229 3.416e-01
1 162723858 1.870e-02 0.459364 0.999169 s518111 8.294e-01
1 162740882 4.166e-02 0.387120 0.998131 s11047887 5.831e-01
1 162742200 3.818e-03 0.458333 0.986916 s12089136 7.388e-01
1 162746103 9.311e-01 0.499480 0.999169 s2970532 2.320e-01

Now, I would like to compare the 2 files and, if an id in the first file matches an id (6th column) in the second file, print that row of the 2nd file. I tried with:

gawk 'NR == FNR{a[$1];next} a[$6]{print $0}' file1 file2

but it is not working. Am I doing something wrong? How can I compare the fields and print the rows in the 2nd file?

Thanks in advance.

Regards,
Ezhil
#2
On Thursday 28 August 2008 12:47, ezhil05-at-gmail.com wrote:

> Hi,
>
> I have a file with ids in a single column, something like:
> s13096
> s4246229
> s11047887
> s6487465
> s2970532
>
> My second file has:
> 1 162714453 1.025e-02 0.425510
> 0.998131 s1416261 3.972e-01
> 1 162721094 2.056e-02 0.033229
> 1.000000 s4246229 3.416e-01
> 1 162723858 1.870e-02 0.459364
> 0.999169 s518111 8.294e-01
> 1 162740882 4.166e-02 0.387120
> 0.998131 s11047887 5.831e-01
> 1 162742200 3.818e-03 0.458333
> 0.986916 s12089136 7.388e-01
> 1 162746103 9.311e-01 0.499480
> 0.999169 s2970532 2.320e-01
>
> Now, I would like to compare the 2 files and if the ids in the first
> file matches to the ids (6th column) in the second file, then print
> row of the 2nd file. I tried with:
>
> gawk 'NR == FNR{a[$1];next} a[$6]{print $0}' file1 file2

That should be

gawk 'NR == FNR{a[$1];next} $6 in a' file1 file2

Also, make sure that your fields are really separated by spaces or tabs and not by other spurious unprintable characters.

--
All the commands are tested with bash and GNU tools, so they may use nonstandard features. I try to mention when something is nonstandard (if I'm aware of that), but I may miss something. Corrections are welcome.
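To illustrate the point about unprintable characters: one common culprit is an id file saved with DOS (CRLF) line endings, so that every id carries an invisible trailing carriage return. What follows is only a minimal sketch under that assumption. The file names ids_dos.txt and rows.txt and their contents are hypothetical, and plain awk is used here, though gawk should behave the same way.

```shell
# Hypothetical sample data: the id file has DOS (CRLF) line endings.
printf 's4246229\r\ns11047887\r\n' > ids_dos.txt
printf '1 162721094 2.056e-02 0.033229 1.000000 s4246229 3.416e-01\n' > rows.txt
printf '1 162723858 1.870e-02 0.459364 0.999169 s518111 8.294e-01\n' >> rows.txt

# od -c makes otherwise invisible characters such as \r visible.
od -c ids_dos.txt | head -n 2

# The array keys end in \r, so no 6th field matches them
# and before.txt stays empty.
awk 'NR == FNR {a[$1]; next} $6 in a' ids_dos.txt rows.txt > before.txt

# Stripping a trailing \r from every record first restores the match.
awk '{sub(/\r$/, "")} NR == FNR {a[$1]; next} $6 in a' ids_dos.txt rows.txt > after.txt
cat after.txt
```

With GNU tools, cat -A ids_dos.txt is another way to look at the file; there the carriage returns show up as ^M just before the end-of-line marker.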
#3
On Aug 28, 12:04 pm, pk wrote:
> On Thursday 28 August 2008 12:47, ezhi...@gmail.com wrote:
>
> > Hi,
> >
> > I have a file with ids in a single column, something like:
> > s13096
> > s4246229
> > s11047887
> > s6487465
> > s2970532
> >
> > My second file has:
> > 1 162714453 1.025e-02 0.425510
> > 0.998131 s1416261 3.972e-01
> > 1 162721094 2.056e-02 0.033229
> > 1.000000 s4246229 3.416e-01
> > 1 162723858 1.870e-02 0.459364
> > 0.999169 s518111 8.294e-01
> > 1 162740882 4.166e-02 0.387120
> > 0.998131 s11047887 5.831e-01
> > 1 162742200 3.818e-03 0.458333
> > 0.986916 s12089136 7.388e-01
> > 1 162746103 9.311e-01 0.499480
> > 0.999169 s2970532 2.320e-01
> >
> > Now, I would like to compare the 2 files and if the ids in the first
> > file matches to the ids (6th column) in the second file, then print
> > row of the 2nd file. I tried with:
> >
> > gawk 'NR == FNR{a[$1];next} a[$6]{print $0}' file1 file2
>
> That should be
>
> gawk 'NR == FNR{a[$1];next} $6 in a' file1 file2
>
> also, make sure that your fields are really separated by spaces or tabs and
> not other spurious unprintable characters.

Hi,

Thanks a lot. It's working. Could you please explain the command to me?

Thanks again,
Ezhil
#4
On Thursday 28 August 2008 13:24, ezhil05-at-gmail.com wrote:

>> > gawk 'NR == FNR{a[$1];next} a[$6]{print $0}' file1 file2
>>
>> That should be
>>
>> gawk 'NR == FNR{a[$1];next} $6 in a' file1 file2
>>
>> also, make sure that your fields are really separated by spaces or tabs
>> and not other spurious unprintable characters.
>
> Hi,
>
> Thanks a lot. It's working. Could you please explain the command to me?

The NR==FNR part is the same. The second part (the code executed when file2 is read) could be written as

$6 in a {print $0}

that is, if $6 is used as an index in the array a (i.e., a[$6] exists, albeit empty), then print the line. Since "print" prints $0 by default, the above code can be shortened to

$6 in a {print}

and since the default action performed when a condition is true is "print", the action can be omitted altogether, yielding just

$6 in a

Your code wasn't working because you used a[$6] as the condition, and that only tests whether a[$6] has a non-null value, *not* whether the element a[$6] exists (i.e., has been referenced) in the array.
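The difference between the two conditions can be tried out directly. The following is a rough sketch rather than anything posted in the thread: the contents of file1 and file2 are a hypothetical subset of the data above, plain awk stands in for gawk, and the third variant (storing 1 in the array) is an alternative not mentioned by the posters.

```shell
# Hypothetical sample data, cut down from the files in the question.
printf 's4246229\ns11047887\n' > file1
cat > file2 <<'EOF'
1 162714453 1.025e-02 0.425510 0.998131 s1416261 3.972e-01
1 162721094 2.056e-02 0.033229 1.000000 s4246229 3.416e-01
1 162740882 4.166e-02 0.387120 0.998131 s11047887 5.831e-01
EOF

# Original attempt: a[$1] only references the element, so its value is
# the empty string. The condition a[$6] is therefore never true and
# nothing is printed.
awk 'NR == FNR {a[$1]; next} a[$6]' file1 file2

# Corrected version, spread out with comments.
awk '
    NR == FNR { a[$1]; next }  # NR equals FNR only while the first file is read: record the id as a key
    $6 in a                    # second file: print the row if its 6th field is a recorded key
' file1 file2

# Alternative (an assumption, not from the thread): store a non-empty
# value so that the a[$6] test itself becomes true for known ids.
awk 'NR == FNR {a[$1] = 1; next} a[$6]' file1 file2
```

The second and third commands both print the two rows for s4246229 and s11047887. One difference worth knowing is that a[$6] creates an (empty) element for every id it looks up, whereas $6 in a leaves the array untouched. Note also that the NR == FNR idiom assumes file1 is not empty; if it is, the condition stays true while file2 is read and no row is printed.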