On a recent task, I had to import a simple text file for data enrichment purposes, and the "name" column when imported to SQL SERVER, got messed up, event after I changed the file encoding to UTF-8.
So basically, the data was like this:
"Jão Pedro Valênciano"
When it should be like this:
"João Pedro Valenciano"
After some googling, I found some dead end alleys and a good answer. On the SQL TEAM forum there is thread on the subject, proposing a function that solves the issue.
Bellow is the function and the link:
CREATE FUNCTION dbo.Utf8_to_utf16 (@s VARCHAR(8000))
returns NVARCHAR(4000)
BEGIN
IF @s IS NULL
RETURN NULL
DECLARE @n INT,
@r NVARCHAR(4000),
@cn INT,
@octets INT,
@ch NVARCHAR(2)
SET @r = N''
WHILE 1 = 1
BEGIN
-- dubious: unexpected octets (0x80-0xBF, 0xF8-0xFF) are treated like 0x00-0x7F
SET @n = Patindex('%[ร€-รท]%', @s COLLATE latin1_general_bin
)
IF @n = 0
BEGIN
SET @r = @r + @s
BREAK
END
ELSE
BEGIN
SET @r = @r + Substring(@s, 1, @n-1)
SET @cn = Ascii(Substring(@s, @n, 1))
IF @cn <= 0xDF
BEGIN
SET @octets = 2
SET @ch = Nchar(( @cn & 0x1F ) * 0x40 + ( Ascii(
Substring(@s, @n + 1, 1))
&
0x3F ))
END
ELSE IF @cn <= 0xEF
BEGIN
SET @octets = 3
SET @ch = Nchar(( @cn & 0x0F ) * 0x1000 + ( Ascii(
Substring(@s, @n + 1, 1)
)
& 0x3F ) *
0x40
+
(
Ascii(Substring(@s, @n + 2, 1)
)
&
0x3F
))
END
ELSE
BEGIN
-- code point in a supplementary plane: output UTF-16 surrogate pair
SET @octets = 4
SET @ch = Nchar((@cn & 0x07) * 0x100 + (Ascii(Substring(
@s
, @n
+1
, 1)
)
&
0x3F
) * 0x04 + (
Ascii(Substring(@s, @n+2, 1)) & 0x30) / 0x10 +
0xD7C0)
+ Nchar((Ascii(Substring(@s, @n+2, 1)) & 0x0F)
*
0x40
+ (
Ascii(
Substring(@s, @n+3, 1)) & 0x3F) + 0xDC00)
END
SET @r = @r + @ch
SET @s = Substring(@s, @n + @octets, 8000)
END
END
RETURN @r
END
go
Link to it: http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=62406
Use it like this:
SELECT dbo.utf8_to_utf16 ('João Pedro Valênciano')
The result is:
João Pedro Valênciano
Check out my new book about R Language http://www.amazon.com/dp/B00SX6WA06
Nenhum comentário:
Postar um comentário
Leave your comment here!