Extract fields after comparing 2 files.

#1  08-28-2008, 07:47 AM
Extract fields after comparing 2 files.

Hi,

I have a file with ids in a single column, something like:
s13096
s4246229
s11047887
s6487465
s2970532

My second file has:
1 162714453 1.025e-02 0.425510 0.998131 s1416261 3.972e-01
1 162721094 2.056e-02 0.033229 1.000000 s4246229 3.416e-01
1 162723858 1.870e-02 0.459364 0.999169 s518111 8.294e-01
1 162740882 4.166e-02 0.387120 0.998131 s11047887 5.831e-01
1 162742200 3.818e-03 0.458333 0.986916 s12089136 7.388e-01
1 162746103 9.311e-01 0.499480 0.999169 s2970532 2.320e-01

Now, I would like to compare the 2 files and, if an id in the first
file matches an id (6th column) in the second file, print that
row of the 2nd file. I tried with:

gawk 'NR == FNR{a[$1];next} a[$6]{print $0}' file1 file2

but it is not working. Am I doing something wrong? How can I compare
the fields and print the rows in the 2nd file?

Thanks in advance.

Regards,
Ezhil

#2  08-28-2008, 08:04 AM
Re: Extract fields after comparing 2 files.

On Thursday 28 August 2008 12:47, ezhil05-at-gmail.com wrote:

> Hi,
>
> I have a file with ids in a single column, something like:
> s13096
> s4246229
> s11047887
> s6487465
> s2970532
>
> My second file has:
> 1 162714453 1.025e-02 0.425510 0.998131 s1416261 3.972e-01
> 1 162721094 2.056e-02 0.033229 1.000000 s4246229 3.416e-01
> 1 162723858 1.870e-02 0.459364 0.999169 s518111 8.294e-01
> 1 162740882 4.166e-02 0.387120 0.998131 s11047887 5.831e-01
> 1 162742200 3.818e-03 0.458333 0.986916 s12089136 7.388e-01
> 1 162746103 9.311e-01 0.499480 0.999169 s2970532 2.320e-01
>
> Now, I would like to compare the 2 files and if the ids in the first
> file matches to the ids (6th column) in the second file, then print
> row of the 2nd file. I tried with:
>
> gawk 'NR == FNR{a[$1];next} a[$6]{print $0}' file1 file2


That should be

gawk 'NR == FNR{a[$1];next} $6 in a' file1 file2

also, make sure that your fields are really separated by spaces or tabs and
not other spurious unprintable characters.
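
One way to check for such characters is sketched below. The thread does not say which characters (if any) were actually present, so the carriage returns here are only an assumed example (an id file saved with DOS line endings), and the scratch files are hypothetical. With a trailing carriage return, $1 from file1 would never equal $6 from file2, even though both look identical on screen.

```shell
# Hypothetical id file saved with DOS (CRLF) line endings.
tmp=$(mktemp -d)
printf 's4246229\r\ns2970532\r\n' > "$tmp/file1"

# Make invisible bytes visible: od -c shows each carriage return as \r.
od -c "$tmp/file1"

# Count the lines that contain a carriage return.
cr=$(printf '\r')
with_cr=$(grep -c "$cr" "$tmp/file1" || true)
echo "lines with CR: $with_cr"

# Strip the carriage returns before handing the file to awk.
tr -d '\r' < "$tmp/file1" > "$tmp/file1.clean"
clean_cr=$(grep -c "$cr" "$tmp/file1.clean" || true)
echo "lines with CR after cleaning: $clean_cr"

rm -rf "$tmp"
```

od, grep and tr are all standard tools, so this should not depend on GNU extensions.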

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.
#3  08-28-2008, 08:24 AM
Re: Extract fields after comparing 2 files.

On Aug 28, 12:04 pm, pk wrote:
> On Thursday 28 August 2008 12:47, ezhi...@gmail.com wrote:
>
> > Hi,
> >
> > I have a file with ids in a single column, something like:
> > s13096
> > s4246229
> > s11047887
> > s6487465
> > s2970532
> >
> > My second file has:
> > 1 162714453 1.025e-02 0.425510 0.998131 s1416261 3.972e-01
> > 1 162721094 2.056e-02 0.033229 1.000000 s4246229 3.416e-01
> > 1 162723858 1.870e-02 0.459364 0.999169 s518111 8.294e-01
> > 1 162740882 4.166e-02 0.387120 0.998131 s11047887 5.831e-01
> > 1 162742200 3.818e-03 0.458333 0.986916 s12089136 7.388e-01
> > 1 162746103 9.311e-01 0.499480 0.999169 s2970532 2.320e-01
> >
> > Now, I would like to compare the 2 files and if the ids in the first
> > file matches to the ids (6th column) in the second file, then print
> > row of the 2nd file. I tried with:
> >
> > gawk 'NR == FNR{a[$1];next} a[$6]{print $0}' file1 file2
>
> That should be
>
> gawk 'NR == FNR{a[$1];next} $6 in a' file1 file2
>
> also, make sure that your fields are really separated by spaces or tabs and
> not other spurious unprintable characters.

Hi,

Thanks a lot. It's working. Could you please explain the command to me?

Thanks again,
Ezhil
#4  08-28-2008, 08:49 AM
Re: Extract fields after comparing 2 files.

On Thursday 28 August 2008 13:24, ezhil05-at-gmail.com wrote:

>> > gawk 'NR == FNR{a[$1];next} a[$6]{print $0}' file1 file2
>>
>> That should be
>>
>> gawk 'NR == FNR{a[$1];next} $6 in a' file1 file2
>>
>> also, make sure that your fields are really separated by spaces or tabs
>> and not other spurious unprintable characters.
>
> Hi,
>
> Thanks a lot. It's working. Could you please explain the command to me?


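The NR==FNR part is the same as in your version: it is true only while
file1 is being read, so a[$1] stores each id as an index of the array a,
and "next" skips the rest of the program for those lines.
The second part (the code executed when file2 is read) could be written as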

$6 in a {print $0}

that is, if $6 is used as an index in the array a (ie, a[$6] exists - albeit
empty), then print the line. Since "print" will print $0 by default, the
above code can be shortened to

$6 in a {print}

but since the default action to be performed when a condition is true
is "print", then the action can be omitted, yielding just

$6 in a

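Your code wasn't working because you used a[$6] as the condition, and that
just tests whether a[$6] has a non-null value, *not* whether the element
a[$6] exists (ie: has been referenced) in the array. Since a[$1] never
assigns a value, every element is empty and the test is always false. Note
also that merely evaluating a[$6] creates that element as a side effect.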
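
As a quick check of the difference, here is a small sketch that runs both conditions on the sample data from the first post. The scratch directory and the variable names are made up for the demonstration, and plain awk is used on the assumption that nothing gawk-specific is needed (the "in" operator is standard awk); substitute gawk if you prefer.

```shell
# Hypothetical scratch files holding the sample data from the first post.
tmp=$(mktemp -d)
printf '%s\n' s13096 s4246229 s11047887 s6487465 s2970532 > "$tmp/file1"
cat > "$tmp/file2" <<'EOF'
1 162714453 1.025e-02 0.425510 0.998131 s1416261 3.972e-01
1 162721094 2.056e-02 0.033229 1.000000 s4246229 3.416e-01
1 162723858 1.870e-02 0.459364 0.999169 s518111 8.294e-01
1 162740882 4.166e-02 0.387120 0.998131 s11047887 5.831e-01
1 162742200 3.818e-03 0.458333 0.986916 s12089136 7.388e-01
1 162746103 9.311e-01 0.499480 0.999169 s2970532 2.320e-01
EOF

# Value test: a[$6] is always the empty string, so no line is printed.
broken=$(awk 'NR == FNR {a[$1]; next} a[$6]' "$tmp/file1" "$tmp/file2")

# Membership test: true for every $6 that was stored as an index of a.
fixed=$(awk 'NR == FNR {a[$1]; next} $6 in a' "$tmp/file1" "$tmp/file2")

echo "value test printed: [$broken]"
echo "membership test printed:"
printf '%s\n' "$fixed"

rm -rf "$tmp"
```

The membership test should print the three rows whose 6th field is s4246229, s11047887 or s2970532, while the value test prints nothing.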
