Programming Forums
May 3rd, 2006, 1:20 PM   #1
nkomp18
Unicode Problems

Hi there!

I am running into a bit of a problem with Unicode. I am trying to make a game that will work in many languages: English, Greek, Japanese, Chinese, etc.

So far so good. I have created a class that converts a String into Unicode escape sequences. For instance
String s = stringToUnicode("wat");
s=="\u0077\u0061\u0074";

I have also created a method that does the opposite. So if I say
String s = unicodeToString("\u0077\u0061\u0074");
then s == "wat";

However, when I read the string "\u0077\u0061\u0074" from an XML (UTF-8) file, I cannot find any way to convert it into characters. It comes back as plain text, and the \u escapes are not decoded. For example, \u0077 is not treated as a single character (char c = '\u0077'); instead it is returned as the string String s = "\u0077". I have tried many things, but it seems impossible to turn it back into a character!

Any help please?

Ideally I am looking to convert this string "\u0077\u0061\u0074" into a string of actual characters. It works if I do it in a Java class, but it doesn't work if this string is returned from a file.

Thank you
May 4th, 2006, 5:13 AM   #2
Jimbo
What's wrong with just using Strings normally?
Quote:
Originally Posted by Java 1.5 Documentation
A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.

The String class provides methods for dealing with Unicode code points (i.e., characters), in addition to those for dealing with Unicode code units (i.e., char values).
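To illustrate the distinction the documentation draws between code units and code points, here is a minimal sketch. The class name CodePointDemo and its two helper methods are made up for this example; length(), codePointCount() and Character.toChars() are standard Java 1.5+ methods.

```java
public class CodePointDemo {
    /** Number of UTF-16 code units, which is what String.length() reports. */
    public static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the whole string. */
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E (musical G clef) lies outside the Basic Multilingual Plane,
        // so Java stores it as a surrogate pair of two char values.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(codeUnits(clef));   // prints 2
        System.out.println(codePoints(clef));  // prints 1
    }
}
```

For text such as "wat", where every character fits in one char, both counts are the same.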
May 4th, 2006, 6:23 AM   #3
lectricpharaoh
Quote:
Originally Posted by nkomp18
Ideally I am looking to convert this string "\u0077\u0061\u0074" into a string of actual characters. It works if I do it in a Java class, but it doesn't work if this string is returned from a file.
As Jimbo points out, Java Strings are already Unicode. Actually, since the Java char type is 16-bit, all text in Java should be Unicode. So, as I see it, you have two solutions:

First, you can use Unicode text files. This should allow you to read them from your Java programs just fine.

Second, you can rewrite your function to convert a sequence of bytes (not chars) into a Java String. This way, you can read it from the file if it is ASCII (but if the file is Unicode, you've got problems unless you employ some logic to check for this somehow). Note that I say ASCII here rather than UTF-8: the latter is a more complex, variable-width format in which each character takes from one to four bytes.
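A likely reason the conversion works inside a Java class but not on file input is that the Java compiler translates \u escapes in source code before the program ever runs, whereas text read from a file arrives as six ordinary characters per escape (a backslash, a 'u' and four hex digits). Here is a rough sketch of decoding such text by hand. The class EscapeDecoder and its methods are hypothetical names, not the original poster's code, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class EscapeDecoder {
    /**
     * Replaces each literal escape (backslash, 'u', four hex digits) with the
     * UTF-16 code unit it names. All other text is copied unchanged.
     */
    public static String unescape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            if (isEscapeAt(text, i)) {
                // Parse the four hex digits that follow the backslash and 'u'.
                out.append((char) Integer.parseInt(text.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    /** True if a complete six-character escape starts at index i. */
    private static boolean isEscapeAt(String text, int i) {
        if (i + 6 > text.length() || text.charAt(i) != '\\' || text.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(text.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    /** Decodes raw file bytes as UTF-8, then expands any escapes found. */
    public static String fromUtf8Bytes(byte[] raw) {
        return unescape(new String(raw, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the compiler from translating the escapes,
        // so this string holds the same literal text a file would supply.
        String fromFile = "\\u0077\\u0061\\u0074";
        System.out.println(fromFile.length());   // prints 18
        System.out.println(unescape(fromFile));  // prints wat
    }
}
```

If the XML file is saved as UTF-8 and stores the real characters rather than escapes, decoding the bytes with the right charset should be enough, and the unescape step becomes unnecessary.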