Package emissary.util
Class CharsetUtil
java.lang.Object
    emissary.util.CharsetUtil

public class CharsetUtil extends Object
A collection of utilities for dealing with different character sets in Java, mainly with the aim of getting to UTF-8. The j* routines generally take Java charset names, while the non-j* routines take derived charset names.

This class contains an interpretation in Java of the GPL method isUTF8, available in C from http://billposer.org/Software/unidesc.html. The copied routine is called LegalUTF8P in Get_UTF32_From_UTF8i.c.

Copyright (C) 2003-2006 William J. Poser (billposer@alum.mit.edu)

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA or go to the web page: http://www.gnu.org/licenses/gpl.txt.

This class contains the Apache Licensed isUnicodeString, which is from Jakarta POI (http://jakarta.apache.org/poi).

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Method Summary
Modifier and Type, Method, Description

static char[] byteToCharArray(byte[] bArray)
    Convert bytes to chars using the platform default encoding.
static char[] getUtfCharArray(byte[] byteArray, String charSet, int start, int end)
    Get an array of UTF-8 characters from the input bytes.
static String getUtfString(byte[] data, String charSet)
    Get a string in the specified encoding.
static String getUtfString(String s, String charSet)
    Get a string in the specified encoding from the input String.
static boolean hasMultibyte(String value)
    Check whether the string has multibyte chars (no longer based on org.apache.poi.util.StringUtil). Calling this with a very large string is a bad idea.
static boolean isAscii(String s)
    Test whether the string is ASCII.
static boolean isUtf8(byte[] data)
    Do these bytes represent a valid UTF-8 string?
static boolean isUtf8(byte[] data, int offs, int dlen)
    Check for valid UTF-8 data.
static boolean isUtf8(String s)
    Do the bytes behind this string represent valid UTF-8?
static char[] jGetUtfCharArray(byte[] byteArray, String charSet, int start, int end)
    Get an array of UTF-8 characters from the input bytes.
Method Detail
-
getUtfCharArray
public static char[] getUtfCharArray(byte[] byteArray, String charSet, int start, int end)
Get an array of UTF-8 characters from the input bytes.
Parameters:
    byteArray - the input bytes
    charSet - derived charSet of the input array
    start - index into input array to start copying
    end - index into input array to stop copying
Returns:
    array of UTF8 char
-
jGetUtfCharArray
public static char[] jGetUtfCharArray(byte[] byteArray, @Nullable String charSet, int start, int end)
Get an array of UTF-8 characters from the input bytes.
Parameters:
    byteArray - the input bytes
    charSet - JAVA charSet of the input array
    start - byte index into input array to start copying
    end - byte index into input array to stop copying
Returns:
    array of UTF8 char
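
The idea of decoding a byte range with a Java charset name can be sketched with the JDK alone. This is not the CharsetUtil implementation: the class and method names are hypothetical, and treating `end` as exclusive and falling back to UTF-8 for a null charset are assumptions made only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in, not part of emissary.util.
final class DecodeRangeSketch {

    /**
     * Decode the bytes in [start, end) using a Java charset name.
     * Assumption: end is exclusive, and a null name means UTF-8.
     */
    static char[] decodeRange(byte[] bytes, String javaCharset, int start, int end) {
        Charset cs = (javaCharset == null)
                ? StandardCharsets.UTF_8
                : Charset.forName(javaCharset);
        return new String(bytes, start, end - start, cs).toCharArray();
    }

    public static void main(String[] args) {
        // "café" encoded as ISO-8859-1: the final byte 0xE9 is the accented e.
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        char[] chars = decodeRange(latin1, "ISO-8859-1", 0, latin1.length);
        System.out.println(chars.length + " chars decoded");
    }
}
```

Note that a Java `char[]` holds UTF-16 code units, so the bytes of the result depend on how the array is later encoded.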
-
getUtfString
public static String getUtfString(String s, String charSet)
Get a string in the specified encoding from the input String
-
getUtfString
@Nullable public static String getUtfString(byte[] data, String charSet)
Get a string in the specified encoding.
Parameters:
    data - input bytes
    charSet - the JAVA charset
Returns:
    JUCS2 string or null if error
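
A "decode or return null on error" contract like the one described above could look roughly like the following JDK-only sketch. The class and method names are hypothetical, and which failures count as errors (here: null arguments and unknown or illegal charset names) is an assumption rather than documented CharsetUtil behavior.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical stand-in, not part of emissary.util.
final class DecodeOrNullSketch {

    /** Decode data with the named Java charset, or return null if that is not possible. */
    static String decodeOrNull(byte[] data, String javaCharset) {
        if (data == null || javaCharset == null) {
            return null; // assumption: missing input is treated as an error
        }
        try {
            return new String(data, Charset.forName(javaCharset));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return null; // the charset name is malformed or not installed
        }
    }
}
```

Callers of a method with this contract need a null check before using the result.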
-
byteToCharArray
public static char[] byteToCharArray(byte[] bArray)
Convert bytes to chars using the platform default encoding.
Parameters:
    bArray - the input data
-
isAscii
public static boolean isAscii(String s)
Test whether the string is ASCII.
Parameters:
    s - string to test
Returns:
    true if string is ascii
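
As an illustration of what an ASCII test usually means, here is a minimal sketch that assumes "ascii" covers the code points U+0000 through U+007F. The class and method names are hypothetical and the real method may treat edge cases (such as the empty string) differently.

```java
// Hypothetical stand-in, not part of emissary.util.
final class AsciiSketch {

    /** Return true when every char of s is in the 7-bit ASCII range. */
    static boolean isAllAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false; // first non-ASCII char ends the scan
            }
        }
        return true; // also reached for the empty string
    }
}
```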
-
isUtf8
public static boolean isUtf8(String s)
Do the bytes behind this string represent valid UTF-8?
Parameters:
    s - string to test
Returns:
    true if string is utf8
-
isUtf8
public static boolean isUtf8(byte[] data)
Do these bytes represent a valid UTF-8 string?
Parameters:
    data - the bytes to check
Returns:
    true if valid utf8
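
For comparison, a byte array can also be validated with the JDK's strict UTF-8 decoder. The sketch below is a different technique from the ported C routine this class uses, so the two may disagree on edge cases; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in, not part of emissary.util.
final class Utf8CheckSketch {

    /** Return true when data decodes as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // throw instead of replacing
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Using `REPORT` matters here: the default `new String(bytes, UTF_8)` path silently substitutes U+FFFD for bad input and would never signal a failure.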
-
isUtf8
public static boolean isUtf8(byte[] data, int offs, int dlen)
Check for valid UTF-8 data. Borrowed from the unidesc package (GPL) by Bill Poser, converted from C to Java. The check covers indices offs through dlen-1, so dlen is an exclusive ending offset rather than a length.
Parameters:
    data - the bytes to check for validity
    offs - beginning offset to check
    dlen - ending offset of the range
Returns:
    true if valid utf8
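
Because the range ends at an offset rather than after a length, it is easy to cut a multi-byte sequence in half. The JDK-only sketch below shows the effect using a strict decoder over the half-open range [from, toExclusive). It is an illustration with hypothetical names, not the ported routine itself.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in, not part of emissary.util.
final class Utf8RangeSketch {

    /** Validate only the bytes at indices from .. toExclusive-1. */
    static boolean isValidUtf8(byte[] data, int from, int toExclusive) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    // ByteBuffer.wrap takes (array, offset, length), so convert the end offset
                    .decode(ByteBuffer.wrap(data, from, toExclusive - from));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xC3, (byte) 0xA9, 0x41}; // two-byte e-acute followed by 'A'
        System.out.println(isValidUtf8(data, 0, 3)); // whole array
        System.out.println(isValidUtf8(data, 1, 3)); // starts on a continuation byte
    }
}
```

In this example the whole array validates, while a range starting at index 1 begins on a continuation byte and is rejected.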
-
hasMultibyte
public static boolean hasMultibyte(@Nullable String value)
Check whether the string has multibyte chars (no longer based on org.apache.poi.util.StringUtil). Calling this with a very large string is a bad idea.
Parameters:
    value - string to test
Returns:
    true if string has at least one multibyte char