Home How do I get a consistent byte representation of strings in C# without manually specifying an encoding?
Reply: 27

How do I get a consistent byte representation of strings in C# without manually specifying an encoding?

Agnel Kurian
Agnel Kurian Published in 2009-01-23 13:39:54Z

How do I convert a string to a byte[] in .NET (C#) without manually specifying a specific encoding?

I'm going to encrypt the string. I can encrypt it without converting, but I'd still like to know why encoding comes to play here.

Also, why should encoding be taken into consideration? Can't I simply get what bytes the string has been stored in? Why is there a dependency on character encodings?

Brad Bruce
Brad Bruce Reply to 2015-07-18 21:46:49Z

Contrary to the answers here, you DON'T need to worry about encoding if the bytes don't need to be interpreted!

Like you mentioned, your goal is, simply, to "get what bytes the string has been stored in".
(And, of course, to be able to re-construct the string from the bytes.)

For those goals, I honestly do not understand why people keep telling you that you need the encodings. You certainly do NOT need to worry about encodings for this.

Just do this instead:

static byte[] GetBytes(string str)
    byte[] bytes = new byte[str.Length * sizeof(char)];
    System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;

static string GetString(byte[] bytes)
    char[] chars = new char[bytes.Length / sizeof(char)];
    System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
    return new string(chars);

As long as your program (or other programs) don't try to interpret the bytes somehow, which you obviously didn't mention you intend to do, then there is nothing wrong with this approach! Worrying about encodings just makes your life more complicated for no real reason.

Additional benefit to this approach:

It doesn't matter if the string contains invalid characters, because you can still get the data and reconstruct the original string anyway!

It will be encoded and decoded just the same, because you are just looking at the bytes.

If you used a specific encoding, though, it would've given you trouble with encoding/decoding invalid characters.

gkrogers Reply to 2009-01-23 13:43:18Z
byte[] strToByteArray(string str)
    System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
    return enc.GetBytes(str);
Peter Mortensen
Peter Mortensen Reply to 2015-04-24 09:52:05Z

It depends on the encoding of your string (ASCII, UTF-8, ...).

For example:

byte[] b1 = System.Text.Encoding.UTF8.GetBytes (myString);
byte[] b2 = System.Text.Encoding.ASCII.GetBytes (myString);

A small sample why encoding matters:

string pi = "\u03a0";
byte[] ascii = System.Text.Encoding.ASCII.GetBytes (pi);
byte[] utf8 = System.Text.Encoding.UTF8.GetBytes (pi);

Console.WriteLine (ascii.Length); //Will print 1
Console.WriteLine (utf8.Length); //Will print 2
Console.WriteLine (System.Text.Encoding.ASCII.GetString (ascii)); //Will print '?'

ASCII simply isn't equipped to deal with special characters.

Internally, the .NET framework uses UTF-16 to represent strings, so if you simply want to get the exact bytes that .NET uses, use System.Text.Encoding.Unicode.GetBytes (...).

See Character Encoding in the .NET Framework (MSDN) for more information.

cyberbobcat Reply to 2009-01-23 13:43:58Z
// C# to convert a string to a byte array.
public static byte[] StrToByteArray(string str)
    System.Text.ASCIIEncoding  encoding=new System.Text.ASCIIEncoding();
    return encoding.GetBytes(str);

// C# to convert a byte array to a string.
byte [] dBytes = ...
string str;
System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
str = enc.GetString(dBytes);
Zhaph - Ben Duguid
Zhaph - Ben Duguid Reply to 2009-01-23 14:03:30Z

You need to take the encoding into account, because 1 character could be represented by 1 or more bytes (up to about 6), and different encodings will treat these bytes differently.

Joel has a posting on this:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Hans Passant
Hans Passant Reply to 2009-01-23 14:15:26Z

The key issue is that a glyph in a string takes 32 bits (16 bits for a character code) but a byte only has 8 bits to spare. A one-to-one mapping doesn't exist unless you restrict yourself to strings that only contain ASCII characters. System.Text.Encoding has lots of ways to map a string to byte[], you need to pick one that avoids loss of information and that is easy to use by your client when she needs to map the byte[] back to a string.

Utf8 is a popular encoding, it is compact and not lossy.

iliketocode Reply to 2016-08-12 18:38:55Z

I'm not sure, but I think the string stores its info as an array of Chars, which is inefficient with bytes. Specifically, the definition of a Char is "Represents a Unicode character".

take this example sample:

String str = "asdf éß";
String str2 = "asdf gh";
EncodingInfo[] info =  Encoding.GetEncodings();
foreach (EncodingInfo enc in info)
    System.Console.WriteLine(enc.Name + " - " 
      + enc.GetEncoding().GetByteCount(str)
      + enc.GetEncoding().GetByteCount(str2));

Take note that the Unicode answer is 14 bytes in both instances, whereas the UTF-8 answer is only 9 bytes for the first, and only 7 for the second.

So if you just want the bytes used by the string, simply use Encoding.Unicode, but it will be inefficient with storage space.

Joel Coehoorn
Joel Coehoorn Reply to 2017-09-25 21:13:44Z

The first part of your question (how to get the bytes) was already answered by others: look in the System.Text.Encoding namespace.

I will address your follow-up question: why do you need to pick an encoding? Why can't you get that from the string class itself?

The answer is in two parts.

First of all, the bytes used internally by the string class don't matter, and whenever you assume they do you're likely introducing a bug.

If your program is entirely within the .Net world then you don't need to worry about getting byte arrays for strings at all, even if you're sending data across a network. Instead, use .Net Serialization to worry about transmitting the data. You don't worry about the actual bytes any more: the Serialization formatter does it for you.

On the other hand, what if you are sending these bytes somewhere that you can't guarantee will pull in data from a .Net serialized stream? In this case you definitely do need to worry about encoding, because obviously this external system cares. So again, the internal bytes used by the string don't matter: you need to pick an encoding so you can be explicit about this encoding on the receiving end, even if it's the same encoding used internally by .Net.

I understand that in this case you might prefer to use the actual bytes stored by the string variable in memory where possible, with the idea that it might save some work creating your byte stream. However, I put it to you it's just not important compared to making sure that your output is understood at the other end, and to guarantee that you must be explicit with your encoding. Additionally, if you really want to match your internal bytes, you can already just choose the Unicode encoding, and get that performance savings.

Which brings me to the second part... picking the Unicode encoding is telling .Net to use the underlying bytes. You do need to pick this encoding, because when some new-fangled Unicode-Plus comes out the .Net runtime needs to be free to use this newer, better encoding model without breaking your program. But, for the moment (and forseeable future), just choosing the Unicode encoding gives you what you want.

It's also important to understand your string has to be re-written to wire, and that involves at least some translation of the bit-pattern even when you use a matching encoding. The computer needs to account for things like Big vs Little Endian, network byte order, packetization, session information, etc.

Michael Buen
Michael Buen Reply to 2009-01-26 06:29:52Z
BinaryFormatter bf = new BinaryFormatter();
byte[] bytes;
MemoryStream ms = new MemoryStream();

string orig = "喂 Hello 谢谢 Thank You";
bf.Serialize(ms, orig);
ms.Seek(0, 0);
bytes = ms.ToArray();

MessageBox.Show("Original bytes Length: " + bytes.Length.ToString());

MessageBox.Show("Original string Length: " + orig.Length.ToString());

for (int i = 0; i < bytes.Length; ++i) bytes[i] ^= 168; // pseudo encrypt
for (int i = 0; i < bytes.Length; ++i) bytes[i] ^= 168; // pseudo decrypt

BinaryFormatter bfx = new BinaryFormatter();
MemoryStream msx = new MemoryStream();            
msx.Write(bytes, 0, bytes.Length);
msx.Seek(0, 0);
string sx = (string)bfx.Deserialize(msx);

MessageBox.Show("Still intact :" + sx);

MessageBox.Show("Deserialize string Length(still intact): " 
    + sx.Length.ToString());

BinaryFormatter bfy = new BinaryFormatter();
MemoryStream msy = new MemoryStream();
bfy.Serialize(msy, sx);
msy.Seek(0, 0);
byte[] bytesy = msy.ToArray();

MessageBox.Show("Deserialize bytes Length(still intact): " 
   + bytesy.Length.ToString());
Konamiman Reply to 2009-02-19 21:03:34Z

Two ways:

public static byte[] StrToByteArray(this string s)
    List<byte> value = new List<byte>();
    foreach (char c in s.ToCharArray())
    return value.ToArray();


public static byte[] StrToByteArray(this string s)
    s = s.Replace(" ", string.Empty);
    byte[] buffer = new byte[s.Length / 2];
    for (int i = 0; i < s.Length; i += 2)
        buffer[i / 2] = (byte)Convert.ToByte(s.Substring(i, 2), 16);
    return buffer;

I tend to use the bottom one more often than the top, haven't benchmarked them for speed.

Sunrising Reply to 2016-08-04 10:31:17Z

Fastest way

public static byte[] GetBytes(string text)
    return System.Text.ASCIIEncoding.UTF8.GetBytes(text);

EDIT as Makotosan commented this is now the best way:

Tshilidzi Mudau
Tshilidzi Mudau Reply to 2017-03-09 08:55:32Z

Well, I've read all answers and they were about using encoding or one about serialization that drops unpaired surrogates.

It's bad when the string, for example, comes from SQL Server where it was built from a byte array storing, for example, a password hash. If we drop anything from it, it'll store an invalid hash, and if we want to store it in XML, we want to leave it intact (because the XML writer drops an exception on any unpaired surrogate it finds).

So I use Base64 encoding of byte arrays in such cases, but hey, on the Internet there is only one solution to this in C#, and it has bug in it and is only one way, so I've fixed the bug and written back procedure. Here you are, future googlers:

public static byte[] StringToBytes(string str)
    byte[] data = new byte[str.Length * 2];
    for (int i = 0; i < str.Length; ++i)
        char ch = str[i];
        data[i * 2] = (byte)(ch & 0xFF);
        data[i * 2 + 1] = (byte)((ch & 0xFF00) >> 8);

    return data;

public static string StringFromBytes(byte[] arr)
    char[] ch = new char[arr.Length / 2];
    for (int i = 0; i < ch.Length; ++i)
        ch[i] = (char)((int)arr[i * 2] + (((int)arr[i * 2 + 1]) << 8));
    return new String(ch);
Peter Mortensen
Peter Mortensen Reply to 2015-04-24 09:58:10Z

Try this, a lot less code:

System.Text.Encoding.UTF8.GetBytes("TEST String");
user1120193 Reply to 2012-01-02 11:07:00Z
bytes[] buffer = UnicodeEncoding.UTF8.GetBytes(string something); //for converting to UTF then get its bytes

bytes[] buffer = ASCIIEncoding.ASCII.GetBytes(string something); //for converting to ascii then get its bytes
Vlad Reply to 2015-07-23 14:32:52Z

The accepted answer is very, very complicated. Use the included .NET classes for this:

const string data = "A string with international characters: Norwegian: ÆØÅæøå, Chinese: 喂 谢谢";
var bytes = System.Text.Encoding.UTF8.GetBytes(data);
var decoded = System.Text.Encoding.UTF8.GetString(bytes);

Don't reinvent the wheel if you don't have to...

Avlin Reply to 2017-05-23 12:18:28Z

Just to demonstrate that Mehrdrad's sound answer works, his approach can even persist the unpaired surrogate characters(of which many had leveled against my answer, but of which everyone are equally guilty of, e.g. System.Text.Encoding.UTF8.GetBytes, System.Text.Encoding.Unicode.GetBytes; those encoding methods can't persist the high surrogate characters d800 for example, and those just merely replace high surrogate characters with value fffd ) :

using System;

class Program
    static void Main(string[] args)
        string t = "爱虫";            
        string s = "Test\ud800Test"; 

        byte[] dumpToBytes = GetBytes(s);
        string getItBack = GetString(dumpToBytes);

        foreach (char item in getItBack)
            Console.WriteLine("{0} {1}", item, ((ushort)item).ToString("x"));

    static byte[] GetBytes(string str)
        byte[] bytes = new byte[str.Length * sizeof(char)];
        System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;

    static string GetString(byte[] bytes)
        char[] chars = new char[bytes.Length / sizeof(char)];
        System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        return new string(chars);


T 54
e 65
s 73
t 74
? d800
T 54
e 65
s 73
t 74

Try that with System.Text.Encoding.UTF8.GetBytes or System.Text.Encoding.Unicode.GetBytes, they will merely replace high surrogate characters with value fffd

Every time there's a movement in this question, I'm still thinking of a serializer(be it from Microsoft or from 3rd party component) that can persist strings even it contains unpaired surrogate characters; I google this every now and then: serialization unpaired surrogate character .NET. This doesn't make me lose any sleep, but it's kind of annoying when every now and then there's somebody commenting on my answer that it's flawed, yet their answers are equally flawed when it comes to unpaired surrogate characters.

Darn, Microsoft should have just used System.Buffer.BlockCopy in its BinaryFormatter


iliketocode Reply to 2016-08-12 18:38:24Z

Here is my unsafe implementation of String to Byte[] conversion:

public static unsafe Byte[] GetBytes(String s)
    Int32 length = s.Length * sizeof(Char);
    Byte[] bytes = new Byte[length];

    fixed (Char* pInput = s)
    fixed (Byte* pBytes = bytes)
        Byte* source = (Byte*)pInput;
        Byte* destination = pBytes;

        if (length >= 16)
                *((Int64*)destination) = *((Int64*)source);
                *((Int64*)(destination + 8)) = *((Int64*)(source + 8));

                source += 16;
                destination += 16;
            while ((length -= 16) >= 16);

        if (length > 0)
            if ((length & 8) != 0)
                *((Int64*)destination) = *((Int64*)source);

                source += 8;
                destination += 8;

            if ((length & 4) != 0)
                *((Int32*)destination) = *((Int32*)source);

                source += 4;
                destination += 4;

            if ((length & 2) != 0)
                *((Int16*)destination) = *((Int16*)source);

                source += 2;
                destination += 2;

            if ((length & 1) != 0)

                destination[0] = source[0];

    return bytes;

It's way faster than the accepted anwser's one, even if not as elegant as it is. Here are my Stopwatch benchmarks over 10000000 iterations:

[Second String: Length 20]
Buffer.BlockCopy: 746ms
Unsafe: 557ms

[Second String: Length 50]
Buffer.BlockCopy: 861ms
Unsafe: 753ms

[Third String: Length 100]
Buffer.BlockCopy: 1250ms
Unsafe: 1063ms

In order to use it, you have to tick "Allow Unsafe Code" in your project build properties. As per .NET Framework 3.5, this method can also be used as String extension:

public static unsafe class StringExtensions
    public static Byte[] ToByteArray(this String s)
        // Method Code
shytikov Reply to 2013-01-23 06:41:24Z

Here is the code:

// Input string.
const string input = "Dot Net Perls";

// Invoke GetBytes method.
// ... You can store this array as a field!
byte[] array = Encoding.ASCII.GetBytes(input);

// Loop through contents of the array.
foreach (byte element in array)
    Console.WriteLine("{0} = {1}", element, (char)element);
iliketocode Reply to 2016-08-12 18:39:11Z

C# to convert a string to a byte array:

public static byte[] StrToByteArray(string str)
   System.Text.UTF8Encoding  encoding=new System.Text.UTF8Encoding();
   return encoding.GetBytes(str);
İlker Elçora
İlker Elçora Reply to 2014-05-02 07:39:30Z

You can use following code to convert a string to a byte array in .NET

string s_unicode = "abcéabc";
byte[] utf8Bytes = System.Text.Encoding.UTF8.GetBytes(s_unicode);
Thomas Eding
Thomas Eding Reply to 2013-09-27 23:26:41Z

OP's question: "How do I convert a string to a byte array in .NET (C#)?" [sic]

You can use the following code:

static byte[] ConvertString (string s) {
    return new byte[0];

As a benefit, encoding does not matter! Oh wait, this is an ecoding... it's just trivial and highly lossy.

Peter Mortensen
Peter Mortensen Reply to 2017-01-09 01:22:07Z


    string text = "string";
    byte[] array = System.Text.Encoding.UTF8.GetBytes(text);

The result is:

[0] = 115
[1] = 116
[2] = 114
[3] = 105
[4] = 110
[5] = 103
Community Reply to 2017-05-23 10:31:37Z

This is a popular question. It is important to understand what the question author is asking, and that it is different from what is likely the most common need. To discourage misuse of the code where it is not needed, I've answered the later first.

Common Need

Every string has a character set and encoding. When you convert a System.String object to an array of System.Byte you still have a character set and encoding. For most usages, you'd know which character set and encoding you need and .NET makes it simple to "copy with conversion." Just choose the appropriate Encoding class.

// using System.Text;
Encoding.UTF8.GetBytes(".NET String to byte array")

The conversion may need to handle cases where the target character set or encoding doesn't support a character that's in the source. You have some choices: exception, substitution or skipping. The default policy is to substitute a '?'.

// using System.Text;
var text = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes("You win €100")); 
                                                      // -> "You win ?100"

Clearly, conversions are not necessarily lossless!

Note: For System.String the source character set is Unicode.

The only confusing thing is that .NET uses the name of a character set for the name of one particular encoding of that character set. Encoding.Unicode should be called Encoding.UTF16.

That's it for most usages. If that's what you need, stop reading here. See the fun Joel Spolsky article if you don't understand what an encoding is.

Specific Need

Now, the question author asks, "Every string is stored as an array of bytes, right? Why can't I simply have those bytes?"

He doesn't want any conversion.

From the C# spec:

Character and string processing in C# uses Unicode encoding. The char type represents a UTF-16 code unit, and the string type represents a sequence of UTF-16 code units.

So, we know that if we ask for the null conversion (i.e., from UTF-16 to UTF-16), we'll get the desired result:

Encoding.Unicode.GetBytes(".NET String to byte array")

But to avoid the mention of encodings, we must do it another way. If an intermediate data type is acceptable, there is a conceptual shortcut for this:

".NET String to byte array".ToCharArray()

That doesn't get us the desired datatype but Mehrdad's answer shows how to convert this Char array to a Byte array using BlockCopy. However, this copies the string twice! And, it too explicitly uses encoding-specific code: the datatype System.Char.

The only way to get to the actual bytes the String is stored in is to use a pointer. The fixed statement allows taking the address of values. From the C# spec:

[For] an expression of type string, ... the initializer computes the address of the first character in the string.

To do so, the compiler writes code skip over the other parts of the string object with RuntimeHelpers.OffsetToStringData. So, to get the raw bytes, just create a pointer to the string and copy the number of bytes needed.

// using System.Runtime.InteropServices
unsafe byte[] GetRawBytes(String s)
    if (s == null) return null;
    var codeunitCount = s.Length;
    /* We know that String is a sequence of UTF-16 codeunits 
       and such codeunits are 2 bytes */
    var byteCount = codeunitCount * 2; 
    var bytes = new byte[byteCount];
    fixed(void* pRaw = s)
        Marshal.Copy((IntPtr)pRaw, bytes, 0, byteCount);
    return bytes;

As @CodesInChaos pointed out, the result depends on the endianness of the machine. But the question author is not concerned with that.

Knickerless-Noggins Reply to 2014-04-14 08:32:44Z
string s = "abcdefghijklmnopqrstuvwxyz";
byte[] b = new System.Text.UTF32Encoding().GetBytes(s); 
Bharat Mane
Bharat Mane Reply to 2017-08-17 07:33:04Z

The string can be converted to byte array in few different ways, due to the following fact: .NET supports Unicode, and Unicode standardizes several difference encodings called UTFs. They have different lengths of byte representation but are equivalent in that sense that when a string is encoded, it can be coded back to the string, but if the string is encoded with one UTF and decoded in the assumption of different UTF if can be screwed up.

Also, .NET supports non-Unicode encodings, but they are not valid in general case (will be valid only if a limited sub-set of Unicode code point is used in an actual string, such as ASCII). Internally, .NET supports UTF-16, but for stream representation, UTF-8 is usually used. It is also a standard-de-facto for Internet.

Not surprisingly, serialization of string into an array of byte and deserialization is supported by the class System.Text.Encoding, which is an abstract class; its derived classes support concrete encodings: ASCIIEncoding and four UTFs (System.Text.UnicodeEncoding supports UTF-16)

Ref this link.

For serialization to an array of bytes using System.Text.Encoding.GetBytes. For the inverse operation use System.Text.Encoding.GetChars. This function returns an array of characters, so to get a string, use a string constructor System.String(char[]).
Ref this page.


string myString = //... some string

System.Text.Encoding encoding = System.Text.Encoding.UTF8; //or some other, but prefer some UTF is Unicode is used
byte[] bytes = encoding.GetBytes(myString);

//next lines are written in response to a follow-up questions:

myString = new string(encoding.GetChars(bytes));
byte[] bytes = encoding.GetBytes(myString);
myString = new string(encoding.GetChars(bytes));
byte[] bytes = encoding.GetBytes(myString);

//how many times shall I repeat it to show there is a round-trip? :-)
Jarvis Stark
Jarvis Stark Reply to 2017-01-09 01:21:19Z

A character is both a lookup key into a font table and a lexical tradition such as ordering, upper and lower case versions, etc.

Consequently, a character is not a byte (8-bits) and a byte is not a character. In particular, the 256 permutations of a byte cannot accommodate the thousands of symbols within some written languages, much less all languages. Hence, various methods for encoding characters have been devised. Some encode for a particular class of languages (ASCII encoding); multiple languages using code pages (Extended ASCII); or, ambitiously, all languages by selectively including additional bytes as needed, Unicode.

Within a system, such as the .NET framework, a String implies a particular character encoding. In .NET this encoding is Unicode. Since the framework reads and writes Unicode by default, dealing with character encoding is typically not necessary in .NET.

However, in general, to load a character string into the system from a byte stream you need to know the source encoding to therefore interpret and subsequently translate it correctly (otherwise the codes will be taken as already being in the system's default encoding and thus render gibberish). Similarly, when a string is written to an external source, it will be written in a particular encoding.

Jodrell Reply to 2014-11-25 10:29:12Z

If you really want a copy of the underlying bytes of a string, you can use a function like the one that follows. However, you shouldn't please read on to find out why.

        EntryPoint = "memcpy",
        CallingConvention = CallingConvention.Cdecl,
        SetLastError = false)]
private static extern unsafe void* UnsafeMemoryCopy(
    void* destination,
    void* source,
    uint count);

public static byte[] GetUnderlyingBytes(string source)
    var length = source.Length * sizeof(char);
    var result = new byte[length];
        fixed (char* firstSourceChar = source)
        fixed (byte* firstDestination = result)
            var firstSource = (byte*)firstSourceChar;

    return result;

This function will get you a copy of the bytes underlying your string, pretty quickly. You'll get those bytes in whatever way they are encoding on your system. This encoding is almost certainly UTF-16LE but that is an implementation detail you shouldn't have to care about.

It would be safer, simpler and more reliable to just call,


In all likelihood this will give the same result, is easier to type, and the bytes will always round-trip with a call to

You need to login account before you can post.

About| Privacy statement| Terms of Service| Advertising| Contact us| Help| Sitemap|
Processed in 0.431689 second(s) , Gzip On .

© 2016 Powered by mzan.com design MATCHINFO