.NET System Data Types, a few notes on

This post is based on Chapter 3 of Andrew Troelsen’s “Pro C# 2008 and the .NET Platform” available from Apress.

Note that C# uses a shorthand notation for most of the .NET system data types, hence int is shorthand for System.Int32 and bool is shorthand for System.Boolean.
You can use these interchangeably, as they represent the same thing.
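A minimal sketch of the alias relationship; the class and method names (AliasDemo, IntIsInt32, BoolIsBoolean) are made up for illustration only:

```csharp
using System;

public static class AliasDemo
{
    // True when the C# keyword and the System type name denote the same type.
    public static bool IntIsInt32()
    {
        return typeof(int) == typeof(System.Int32);
    }

    public static bool BoolIsBoolean()
    {
        return typeof(bool) == typeof(System.Boolean);
    }

    public static void Main()
    {
        int a = 42;
        System.Int32 b = a;   // no conversion involved: it is the same type

        Console.WriteLine(IntIsInt32());          // True
        Console.WriteLine(BoolIsBoolean());       // True
        Console.WriteLine(b.GetType().FullName);  // System.Int32
    }
}
```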

Note that for most numerical data types we have signed versions (e.g. Int32 or int), which can be negative and positive, and unsigned versions (e.g. UInt32 or uint), which only allow zero and positive values.

All the built-in value types have a default constructor which sets the variable to the data type’s default value. So this can be used to work around the “use of unassigned variable” error in a consistent way.
Note that String is a reference type, so it has no default constructor, and the default value of a String is null.
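As a rough illustration of those default values, here is a small sketch; DefaultsDemo and its helper methods are hypothetical names, not something from the book:

```csharp
using System;

public static class DefaultsDemo
{
    // Each helper returns the default value produced for the type.
    public static int DefaultInt() { return new int(); }
    public static bool DefaultBool() { return new bool(); }
    public static char DefaultChar() { return new char(); }
    public static string DefaultString() { return default(string); }

    public static void Main()
    {
        Console.WriteLine(DefaultInt());            // 0
        Console.WriteLine(DefaultBool());           // False
        Console.WriteLine(DefaultChar() == '\0');   // True
        Console.WriteLine(DefaultString() == null); // True
    }
}
```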

The Class Hierarchy of System Types

All classes, including the intrinsic data types, inherit from object. All the numerical data types are value types: they inherit from ValueType, which in turn inherits from Object. Local variables of value types normally live on the stack and are removed as soon as they go out of scope, which is much cheaper than using the garbage collected heap. (A value type that is a field of a class lives on the heap together with the object that contains it.) The call stack is also what the “StackTrace” in your exceptions shows. To visualize this, think of the stack as a “last-in-first-out” pile: as your code goes through methods calling other methods, the value types that are created are placed onto the stack, and as each “}” closes off a method, the value types declared in that method fall out of scope and are popped off the stack immediately. Note that Char is a ValueType.

All built-in data types not inheriting from ValueType (like String, Array, Exception etc.) are reference types and live in garbage collected heap memory. Think of this as just that: a heap of stuff floating around, with a lot of pointers (references) from the stack. At certain times the Garbage Collector will run, and anything that can no longer be reached from the stack (or from other live references such as static fields) will be collected.
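One visible consequence of the value type / reference type split is what happens on assignment. The sketch below uses two hypothetical types, PointValue (a struct) and PointRef (a class), purely to show the difference:

```csharp
using System;

struct PointValue { public int X; }  // value type (inherits from ValueType)
class PointRef { public int X; }     // reference type (lives on the heap)

public static class CopyDemo
{
    public static int ValueCopy()
    {
        PointValue a = new PointValue();
        a.X = 1;
        PointValue b = a;  // copies the data itself
        b.X = 99;
        return a.X;        // a is unaffected
    }

    public static int ReferenceCopy()
    {
        PointRef a = new PointRef();
        a.X = 1;
        PointRef b = a;    // copies only the reference
        b.X = 99;
        return a.X;        // a and b are the same object
    }

    public static void Main()
    {
        Console.WriteLine(ValueCopy());      // 1
        Console.WriteLine(ReferenceCopy());  // 99
    }
}
```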

See the illustration below for this relationship (image from Andrew Troelsen’s Pro C# 2008):
[Image: Class Hierarchy of System Types]

Note that the Boolean type also has static read-only FalseString and TrueString fields which give you the string representations of the two values in a consistent way.

Note that all String and Char data by default is Unicode. Also the Char data type has some useful static helper methods like IsDigit, IsLetter, IsPunctuation and IsWhiteSpace.
Note that the + operator, when used with strings, really applies the static String.Concat method under the hood.
Interestingly the \a escape sequence allows you to do a system beep from the Console.Write method.
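A quick sketch of these helpers in use; TextHelpersDemo and ConcatMatchesPlus are hypothetical names, and whether the \a actually beeps depends on your console:

```csharp
using System;

public static class TextHelpersDemo
{
    // Shows that + on strings gives the same result as String.Concat.
    public static bool ConcatMatchesPlus(string left, string right)
    {
        return (left + right) == String.Concat(left, right);
    }

    public static void Main()
    {
        Console.WriteLine(bool.TrueString);            // True
        Console.WriteLine(bool.FalseString);           // False

        Console.WriteLine(char.IsDigit('7'));          // True
        Console.WriteLine(char.IsLetter('a'));         // True
        Console.WriteLine(char.IsPunctuation('!'));    // True
        Console.WriteLine(char.IsWhiteSpace(' '));     // True

        Console.WriteLine(ConcatMatchesPlus("ab", "cd")); // True

        Console.Write("\a"); // may sound the system beep, depending on the terminal
    }
}
```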

Comparing Strings

The object.Equals(object) method, when used on reference types, will normally determine if the two references are pointing to the same instance in heap memory. So if you were to declare two instances, a and b, of some class and set all properties exactly the same, the a.Equals(b) call would return false, because your a and b objects are pointing to two different locations on the heap.
If on the other hand you created a and then assigned b = a; – the Equals method would return true because you are looking at the same instance on the heap.
For the String type this behaviour has been overridden to actually compare the value of the string object, and not the object in memory they refer to.
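The following sketch contrasts the two behaviours. Car is a hypothetical class that does not override Equals, and the second string is built from a char array at runtime so that it is a separate object from the literal:

```csharp
using System;

class Car
{
    public string Name;
}

public static class EqualsDemo
{
    public static bool TwoInstancesEqual()
    {
        Car a = new Car(); a.Name = "Zippy";
        Car b = new Car(); b.Name = "Zippy";
        return a.Equals(b);          // default Equals: reference comparison
    }

    public static bool SameInstanceEqual()
    {
        Car a = new Car(); a.Name = "Zippy";
        Car b = a;                   // b refers to the same object as a
        return a.Equals(b);
    }

    public static bool StringsEqualByValue()
    {
        string s1 = "hello";
        string s2 = new string(new char[] { 'h', 'e', 'l', 'l', 'o' });
        // Different objects on the heap, but String.Equals compares the text.
        return !object.ReferenceEquals(s1, s2) && s1.Equals(s2);
    }

    public static void Main()
    {
        Console.WriteLine(TwoInstancesEqual());    // False
        Console.WriteLine(SameInstanceEqual());    // True
        Console.WriteLine(StringsEqualByValue());  // True
    }
}
```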

Note that it is often good practice to override the object.ToString() method in your classes to return a formatted string that represents the current data or state of the object. Then you can extend this by overriding the object.GetHashCode() method to return a hash code of your ToString() value, and finally override the Equals() method to compare the state of your objects. Comparing the ToString() values is safer than comparing only the hash codes, because two different objects can occasionally end up with the same hash code.
If you are doing this you might also be interested to know that you can still use the static object helper method ReferenceEquals to perform the default memory instance comparison.
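Here is one way this could look, as a sketch only. Person and its fields are hypothetical, and Equals compares the ToString() values rather than the raw hash codes so that a hash collision cannot make two different objects look equal:

```csharp
using System;

public class Person
{
    private readonly string name;
    private readonly int age;

    public Person(string name, int age)
    {
        this.name = name;
        this.age = age;
    }

    // A formatted string representing the current state of the object.
    public override string ToString()
    {
        return string.Format("[Name: {0}; Age: {1}]", name, age);
    }

    // Hash code derived from the state string.
    public override int GetHashCode()
    {
        return ToString().GetHashCode();
    }

    // Value-based equality built on the same state string.
    public override bool Equals(object obj)
    {
        Person other = obj as Person;
        return other != null && other.ToString() == ToString();
    }
}

public static class PersonDemo
{
    public static void Main()
    {
        Person a = new Person("Ann", 30);
        Person b = new Person("Ann", 30);

        Console.WriteLine(a.Equals(b));                    // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
        Console.WriteLine(object.ReferenceEquals(a, b));   // False
    }
}
```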

Immutable Strings

So strings are immutable, what the h3ll does that mean?
Well, it means that an instance of a string cannot ever be changed(!). So, when you are messing around with strings, perhaps doing things like Substring(), ToUpper() or ToLower(), you are in fact always creating a new instance: a modified copy of the original string. The same applies to String.Concat and the shorthand + operator. Note that a plain assignment like strB = strA does not copy anything; strB simply refers to the same string object as strA, which is perfectly safe precisely because neither of them can change it.
The end result of all this is that once a string object has been created, its data cannot ever be changed. It sounds unreal, but it’s true, and it’s important to know that every time you in any way modify a string, you are in fact creating a new piece of string data on the heap and pointing your string reference to this location, leaving the previous string value waiting on the heap for the garbage collector to come and free up the used memory.
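A small sketch of what this looks like in practice (ImmutableDemo and its methods are hypothetical names). Note that plain assignment shares the same string object, while ToUpper() hands back a new one:

```csharp
using System;

public static class ImmutableDemo
{
    public static bool OriginalUnchangedAfterToUpper()
    {
        string original = "hello";
        string upper = original.ToUpper();   // a new string: "HELLO"
        return original == "hello"
            && upper == "HELLO"
            && !object.ReferenceEquals(original, upper);
    }

    public static bool AssignmentSharesInstance()
    {
        string strA = "hello";
        string strB = strA;                  // copies the reference only
        return object.ReferenceEquals(strA, strB);
    }

    public static void Main()
    {
        Console.WriteLine(OriginalUnchangedAfterToUpper()); // True
        Console.WriteLine(AssignmentSharesInstance());      // True
    }
}
```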
So why are strings immutable, what are the design reasons for this? Well the “.NET Framework Standard Library Annotated Reference” tells us that:
String being immutable is a very nice thing for the type system. It means you do not have to worry about ownership issues. You can pass a reference to a string in your internal implementation to clients without a worry that they will modify your data.
And if like me that leaves you going “what was the middle bit?”, have a look at our summary of Mr. Troelsen’s excellent treatment of Generics and Boxing/Unboxing: https://blogbustingbeats.wordpress.com/2009/06/22/net-collections-and-generics-in-c/

Generally the Garbage Collector will be able to deal with this, but if you are doing loads of string modification in a tight loop (that perhaps you do not know the length of) you should consider using the StringBuilder from the System.Text namespace, as this allows you to modify the same piece of in-memory text data (managed internally by the StringBuilder).
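For example, a loop that builds up a comma separated list could be sketched like this; JoinNumbers is a hypothetical helper, not a framework method:

```csharp
using System;
using System.Text;

public static class StringBuilderDemo
{
    // Builds "0,1,2,..." using one StringBuilder buffer instead of
    // creating a new string object on every pass through the loop.
    public static string JoinNumbers(int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0)
            {
                sb.Append(",");
            }
            sb.Append(i);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(JoinNumbers(5)); // 0,1,2,3,4
    }
}
```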

Narrowing and Widening Data Type Conversions

So the basics of this is that you can always safely do a widening conversion, say from a short to an int:

int x = short.MaxValue; // implicit widening conversion, no cast needed

There is no risk of data loss here. However, if we were to declare an int variable and assign it the value 2147483647 (Int32.MaxValue), then you would lose data by converting this to a short, which is why such a narrowing conversion requires an explicit cast.
You then have the option of creating checked{} blocks, which will force an OverflowException on any overflow, or unchecked blocks, which will suppress these. By default your application will be unchecked and allow data loss; you can turn checking on project wide using the advanced option under the project build settings (set the check for arithmetic overflow/underflow).
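A minimal sketch of both behaviours, using the hypothetical helpers NarrowUnchecked and NarrowChecked. The value is passed in as a parameter because the compiler rejects an overflowing cast of a constant outright:

```csharp
using System;

public static class NarrowingDemo
{
    // Silently keeps only the low 16 bits of the int.
    public static short NarrowUnchecked(int value)
    {
        return unchecked((short)value);
    }

    // Throws an OverflowException if the value does not fit in a short.
    public static short NarrowChecked(int value)
    {
        return checked((short)value);
    }

    public static void Main()
    {
        Console.WriteLine(NarrowUnchecked(100));           // 100
        Console.WriteLine(NarrowUnchecked(int.MaxValue));  // -1

        try
        {
            NarrowChecked(int.MaxValue);
        }
        catch (OverflowException)
        {
            Console.WriteLine("Overflow detected");
        }
    }
}
```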

Casting versus System.Convert

I’ve wondered about this before: what’s the difference between Convert.ToXXX (like e.g. Convert.ToInt32) and the explicit casting operator like e.g. (short)123?

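The answer apparently is not much for simple numeric conversions, so use whichever you prefer. There are a few differences worth knowing, though: a cast from a floating point type to an integer truncates while Convert.ToInt32 rounds to the nearest integer, the Convert methods throw an OverflowException when the value does not fit regardless of the checked/unchecked setting, and Convert can also convert from strings. Some time ago I used to prefer Convert.ToXXX as I thought it clearer and more readable, however I think the normal casting syntax is more commonly used.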
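To compare the two side by side, here is a small sketch; CastVsConvertDemo and its methods are hypothetical names, while the Convert calls are the standard System.Convert methods:

```csharp
using System;

public static class CastVsConvertDemo
{
    // A cast from double to int drops the fractional part.
    public static int CastToInt(double value)
    {
        return (int)value;
    }

    // Convert.ToInt32 rounds to the nearest integer instead.
    public static int ConvertToInt(double value)
    {
        return Convert.ToInt32(value);
    }

    public static void Main()
    {
        Console.WriteLine(CastToInt(3.7));          // 3
        Console.WriteLine(ConvertToInt(3.7));       // 4
        Console.WriteLine(Convert.ToInt32("123"));  // 123

        try
        {
            Convert.ToInt16(int.MaxValue);          // does not fit in a short
        }
        catch (OverflowException)
        {
            Console.WriteLine("Convert refused the narrowing conversion");
        }
    }
}
```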
