[Info-Ingres] base64(), UTF-16, UTF-8 madness

05-24-2007, 07:02 AM

Hi Everyone,

This is a little off-topic, but I'm at my wits' end on this one...

I've been asked to write an OME function that does a base64 encoding
on nvarchar and nchar types.

Now this seems simple enough...
* Allow for Ingres being little-endian when storing the Unicode (UTF-16)
characters, i.e. U+671D is stored as 1D67, so the two bytes must be swapped
back before encoding.

* Apply the standard rules for 'short strings' (a final group of fewer than
three bytes) by padding with zero bytes and overwriting the end of the output
with the requisite number of '='.

* Divide the input into 6-bit chunks and use each value as an offset into
the standard base64 alphabet, i.e. A-Z, a-z, 0-9, +, /.
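A minimal Python sketch of the three steps above, assuming the nchar/nvarchar value arrives as raw UTF-16LE bytes. The OME calling interface is not shown, and `swap_utf16_bytes` and `base64_encode` are hypothetical helper names, not part of any Ingres API:

```python
# Standard base64 alphabet: A-Z, a-z, 0-9, '+', '/'
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def swap_utf16_bytes(data: bytes) -> bytes:
    """Swap each 2-byte code unit, e.g. little-endian 1D 67 -> big-endian 67 1D."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    out = bytearray(len(data))
    out[0::2] = data[1::2]
    out[1::2] = data[0::2]
    return bytes(out)


def base64_encode(data: bytes) -> str:
    """Encode bytes as base64 by hand, three input bytes at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)          # 0, 1 or 2 bytes short
        chunk += b"\x00" * missing        # pad a short group with zero bytes
        n = int.from_bytes(chunk, "big")  # 24 bits
        # Split the 24 bits into four 6-bit offsets into ALPHABET.
        group = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Overwrite the characters produced only by padding with '='.
        if missing:
            group[-missing:] = "=" * missing
        out.extend(group)
    return "".join(out)


# Usage: U+671D as Ingres stores it (1D 67), swapped to big-endian (67 1D).
stored = "\u671d".encode("utf-16-le")
print(base64_encode(swap_utf16_bytes(stored)))  # Zx0=
```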

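So 671D, padded with one zero byte, is 0110 (6) 0111 (7) 0001 (1) 1101 (D) 0000 0000,
which in groups of 6 bits becomes:
011001 (25) == Z, 110001 (49) == x, 110100 (52) == 0
(the fourth group exists only because of the padding, so it becomes '=').

Hence we should get a result of 'Zx0='.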
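The hand calculation can be cross-checked with Python's standard `base64` module. For comparison, this sketch also shows what the little-endian byte order Ingres stores would produce if it were encoded without swapping:

```python
import base64

ch = "\u671d"

# Big-endian code unit (67 1D), matching the hand calculation.
big_endian = base64.b64encode(ch.encode("utf-16-be")).decode("ascii")
# Little-endian code unit (1D 67), the order Ingres stores, encoded unswapped.
little_endian = base64.b64encode(ch.encode("utf-16-le")).decode("ascii")

print(big_endian)     # Zx0=
print(little_endian)  # HWc=
```

Python's plain "utf-16" codec prepends a byte order mark, which would change the output again; the explicit "utf-16-be" and "utf-16-le" codecs do not.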

The trouble is that's not what MySQL gives my programmers for the same
data: it insists that the result is a string starting with 's6\'. I've
cross-checked this conversion with some web-based conversion utilities, and
they seem to agree.

So it occurred to me that the problem was that MySQL must be using UTF-8
to represent the character. That's cool, I thought: I can convert the
UTF-16 into UTF-8 and then base64-encode the result.

The trouble is that in UTF-8, U+671D becomes E6 9C 9D, which base64-encodes
to the string '5pyd'. I've confirmed this UTF-16 --> UTF-8 conversion by
using Ingres to copy the nvarchar value into a file and running 'od -ax' on
that file.
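A minimal sketch of the UTF-16 to UTF-8 step for a single character in the three-byte range (U+0800 to U+FFFF, excluding surrogates), cross-checked against Python's built-in codec. `utf8_encode_3byte` is an illustrative name, and characters outside that range (including surrogate pairs) are deliberately not handled:

```python
import base64


def utf8_encode_3byte(code_point: int) -> bytes:
    """Hand-encode a code point in U+0800..U+FFFF (not a surrogate) as UTF-8."""
    if (not 0x0800 <= code_point <= 0xFFFF) or (0xD800 <= code_point <= 0xDFFF):
        raise ValueError("this sketch only handles the 3-byte UTF-8 range")
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: low 6 bits
    ])


utf8 = utf8_encode_3byte(0x671D)
assert utf8 == "\u671d".encode("utf-8")  # agrees with Python's own codec
encoded = base64.b64encode(utf8).decode("ascii")
print(utf8.hex(" ").upper())  # E6 9C 9D
print(encoded)                # 5pyd
```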

If I decode the 's6\' string, my first UTF-8 character would have to be
B3 AF D1. But that's not well-formed UTF-8!
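A short Python sketch of the checks behind this paragraph, using only the standard library. Two facts are worth noting: '\' is not one of the 64 standard base64 characters, so a strict decoder rejects 's6\' outright, and 0xB3 has the 10xxxxxx bit pattern of a UTF-8 continuation byte, so it cannot begin a character. The sketch also re-encodes B3 AF D1 to show which four-character group those bytes actually correspond to:

```python
import base64
import binascii

reported = "s6\\"  # the prefix MySQL returned (the backslash is one character)

# Strict decoding rejects any character outside A-Z, a-z, 0-9, '+', '/' and '='.
try:
    base64.b64decode(reported, validate=True)
    strict_ok = True
except binascii.Error:
    strict_ok = False
print("strict decode succeeded:", strict_ok)  # False

candidate = bytes.fromhex("B3AFD1")
reencoded = base64.b64encode(candidate).decode("ascii")
print(reencoded)  # s6/R

# 0xB3 = 10110011 matches 10xxxxxx, the pattern of a continuation byte,
# so it cannot start a UTF-8 sequence.
is_continuation = (candidate[0] & 0xC0) == 0x80
print("first byte is a continuation byte:", is_continuation)  # True

try:
    candidate.decode("utf-8")
    utf8_valid = True
except UnicodeDecodeError as err:
    utf8_valid = False
    print("not valid UTF-8:", err.reason)  # invalid start byte
```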

Does anyone have any idea what I'm doing wrong?

Martin Bowes
--
Random Duckman Quote #114:
King Chicken: How dare you insult me in front of my wife, who's still
dangerously coherent.


