Puzzling Through Erasure II



Maybe a competitor has already patented all the obvious implementation ideas?


2004/09/23 14:21 EST (via web):
Please see http://www.gafter.com/~neal/Erasure.html

-Neal Gafter

"Obvious implementation ideas" are by definition not patentable.

I don't see how "migration compatibility" was compromised at all in .NET generics and C#. In .NET, a commercial assembly is strongly named, and a particular release designed for version 1.1 of the .NET Framework will definitely have a different "identity" (version number and hash) from one designed for the 2.0 framework. This is what customers want; they want to continue to use the older version of the assembly until they have had time to consider the changes in the new version. They don't want any kind of "silent upgrade", nor will they receive one. When they begin using a new version of the assembly they must recompile and relink.

All .NET assemblies compiled with the 1.1 version of the framework run fine in 2.0. So where is the migration compatibility problem? No one is forced to use generics, and no serious library provider will introduce breaking changes into their existing API; this issue has nothing to do with generics.
As for the claim that the original design premises should not be revisited so late in the implementation process: if a design is a turkey, it should be labeled one, whether during design or after implementation. I could make similar complaints about Avalon...


2004/09/23 18:56 EST (via web):
The type injection idea seems similar to virtual types in Beta, and virtual types do seem to have some advantages. But, as Neal Gafter has explained, type injection wouldn't be as backward compatible as erasure, and it also increases the size of an object (an extra field holding the type information for each generic type parameter).

Sun almost certainly are aware of the virtual types in Beta because they have already pinched inner (non-static) classes from Beta!

A point worth remembering is that just because the current version of Java uses erasure, it doesn't mean that a future version has to. This is where the NextGen people are coming from: once most libraries use generics, they can be compiled with a later compiler that uses something other than erasure, though obviously with the backward compatibility problem that Neal has described.

With regard to the C# system, it seems very similar to NextGen; e.g. they both use a where clause to specify, in addition to the interfaces implemented, the constructors and static methods available. NextGen predated C# generics considerably, and NextGen is supported to some extent by Sun (Guy Steele, who works for Sun, is one of the proposers of NextGen). So I think it is fair to say that Sun are well aware of the alternatives and that they also have the most experience with generics of anyone. If they have decided that erasure is best for the moment, I think it is probably best to trust their judgement for now and see if they are right in a couple of years' time.

Just a bit of MS paranoia :). Just because the C# people say the new version will work with the old doesn't mean it will work well with the old. Tell anyone who has tried to port a VB 6 program to VB .NET that it is just an upgrade and you will get a different (and generally strong) opinion :)


2004/09/23 22:56 EST (via web):
A previous post mentioned that in .NET an assembly (jar file) has a version number and you can specify which version you want. For Java, Sun could have achieved the same (although finer grained) by introducing new names, e.g. a new package name like java.util2. Then you could import either java.util or java.util2 and recompile. The problem, however, is that if you don't have the source then you can't recompile. The Sun solution is much more backward compatible: it can use already compiled code.

So people will say "what we need to provide is a conversion between version 1 of the library and version 2 so that both can co-exist." The problem with this, however, is performance: if it is a large structure, then you waste considerable time copying the structure each time you transition from old to new or vice versa.

If you have control over both version 1 and version 2 of a library then there is a good solution. For example, when Sun went from Vector/Hashtable to Collections they were able to change the code in Vector/Hashtable to be compatible with Collections, and because Java links at runtime even old code that wasn't recompiled would pick up the new Vector/Hashtable, and hence old and new would work seamlessly together. Sun must be aware of this solution, and therefore we can conclude that it won't work in this case.
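The Vector retrofit can be seen from client code. The following is only a minimal sketch: `legacyNames` and `countNonEmpty` are hypothetical stand-ins for "old" and "new" code, and only `Vector`, `List` and their methods are real JDK API.

```java
import java.util.List;
import java.util.Vector;

@SuppressWarnings({"rawtypes", "unchecked"})
final class VectorRetrofitDemo {
    // Hypothetical "old" code: uses only the pre-Collections Vector API.
    static Vector legacyNames() {
        Vector v = new Vector();
        v.addElement("alpha");
        v.addElement("");
        v.addElement("beta");
        return v;
    }

    // Hypothetical "new" code: written against the Collections interface.
    static int countNonEmpty(List items) {
        int n = 0;
        for (Object o : items) {
            if (!o.toString().isEmpty()) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Vector was retrofitted to implement List, so the very same object
        // is passed to the new code; nothing is copied or converted.
        System.out.println(countNonEmpty(legacyNames()));
    }
}
```

The point of the sketch is that no conversion step appears anywhere: the old structure is itself an instance of the new interface.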

Why won't the new version "work well with the old"? I see no reason why. Do you have anything to back that up? It looks like older assemblies simply won't use generics, and run exactly as before. Newer assemblies aren't forced to use generics either. There are a lot of unsubstantiated claims about C# and the CLR here.


2004/09/24 00:19 EST (via web):
Frank,

You make two points: one, that I am rubbishing C#, and two, the interaction of old and new.

  1. In the above post I made *no* "unsubstantiated claims about C# and the CLR here" and was only pointing out that version numbering an assembly is very similar to giving a package a new name. Whether it is a good idea or not in this case, who knows. My guess is that it will be OK for C# because there are so few C# programs and the ones that exist are mostly in MS. The point I was making is that versioning would have been an option for Sun, and that they almost certainly thought of it and decided it was a bad idea in this case. In other cases they have used it, e.g. AWT and Swing.
  2. Yes, you could have programs that use only the old stuff or only the new stuff. However, in a large project you probably don't have that luxury. You are going to have to move the project over to the new stuff gradually; you don't have the resources to do otherwise. I.e. you are going to evolve your program, not start from a blank sheet of paper. This inevitably means portions of your code using the old stuff and portions using the new. Neal Gafter gives the example of a program that uses two third-party libraries (i.e. ones that you cannot convert to the new stuff yourself), where one of these libraries is updated and the other isn't. If you have to use either new or old exclusively you are stuck: you can't use the updated library, and you can't use the new stuff in your code, until the other library is updated.
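The two-library situation can be sketched in Java as below. This is an illustration only: `LegacyLib` and `UpdatedLib` are hypothetical names standing in for third-party code, and the example assumes erasure-based generics as in JDK 5.

```java
import java.util.ArrayList;
import java.util.List;

final class MixedLibrariesDemo {
    // Hypothetical third-party library that has NOT been updated: raw types only.
    static final class LegacyLib {
        @SuppressWarnings({"rawtypes", "unchecked"})
        static List loadWords() {
            List words = new ArrayList();
            words.add("erasure");
            words.add("reification");
            return words;
        }
    }

    // Hypothetical third-party library that HAS been updated to use generics.
    static final class UpdatedLib {
        static int totalLength(List<String> words) {
            int total = 0;
            for (String w : words) total += w.length();
            return total;
        }
    }

    // The application glues the two together. Going from a raw List to
    // List<String> gives an unchecked warning, not a compile error.
    @SuppressWarnings("unchecked")
    static int run() {
        List<String> words = LegacyLib.loadWords();
        return UpdatedLib.totalLength(words);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The cost of this freedom is that the compiler cannot verify the element type at the boundary, so a wrong element would only show up as a ClassCastException when it is used.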


2004/09/24 10:01 EST (via web):
  1. Compile-time linking, version numbering of assemblies, and side-by-side deployment are a source of stability in the CLR, not a detriment. It is the run-time linking in Java that creates the mess that the "migration stability" requirement was supposed to solve in Java's "generics". Without this mess the CLR has no use for this requirement. "There are so few C# programs and the ones that exist are mostly in MS" -- what? There are millions of C# programs in the wild, I am sure.
  2. If library A depends on library B, application C depends on both, and library B is modified, there is no reason in the CLR why the authors of application C cannot upgrade to the new library B. Whether the upgrade is possible is entirely determined by the way in which B is modified. If the authors of B followed good software engineering practices, they introduced no breaking changes into B: members and types were added, but no members were deleted. This has absolutely nothing to do with generics; it is a general problem with a general solution. Whether B now uses generics is not important. No library developer is going to replace strongly typed collections with generic collections, deliberately breaking client code. Ideally, library A will be recompiled and tested with the new version of B, so that users of library A can be assured that it is safe to use the new version. Config files can force library A to use the new version of B, but that is not ideal. Any potential problems introduced have nothing to do with generics.

-- Frank Hileman

I haven't investigated or used JDK5 yet, but I thought erasure was needed so that JDK5-compiled code can run on older JVMs.
Malte Finsterwalder


2004/09/27 03:06 EST (via web):
More on library versioning:

The C# and NextGen systems are essentially the same, and you can hand code them in Generic Java. In C# and NextGen, when you write a generic class that calls a constructor, the class is made abstract and an abstract instance-factory method is added. Then a type-specific class is added that extends the abstract base class (which the programmer wrote in non-abstract form). When you use a generic type, the type-specific class is used instead. The problem is that the old non-generic classes do not interoperate with the new generic classes. E.g. suppose a new version 2 of ArrayList were introduced, called ArrayList2, that was coded using generics and that called an array constructor to create the generic array; then the equivalent as written by the C#/NextGen compiler would be:

package junk.generics.brucearrayexample;
import java.util.*;
import static java.lang.System.*;
public abstract class ArrayList2< E > extends AbstractList< E > { // abstract added by compiler
    private final E[] array;
    public ArrayList2( final int size ) {
        array = instance( size );  // programmer wrote new E[ size ]
    }
    public abstract E[] instance( final int size ); // added by compiler
    public E set( final int index, final E value ) {
        final E old = array[ index ];
        array[ index ] = value;
        return old;
    }
    public E get( final int index ) {
        return array[ index ];
    }
    public int size() {
        return array.length;
    }
    public static void main( final String[] notUsed ) {
        final int size = 3;
        final IntegerArrayList2 ial2 = new IntegerArrayList2( size ); // programmer wrote ArrayList2< Integer >
        for ( int i = 0; i < size; i++ ) ial2.set( i, i );
        for ( final int i : ial2 ) out.print( i + " " );
        out.println();
        // final ArrayList ial = ial2; - doesn't interoperate with ArrayList
    }
}
class IntegerArrayList2 extends ArrayList2< Integer > { // class added by compiler
    public IntegerArrayList2( final int size ) {
        super( size );
    }
    public Integer[] instance( final int size ) {
        return new Integer[ size ];
    }
}
Note that the final line of main is commented out because you cannot freely mix the old (ArrayList) and the new (ArrayList2). You can continue to use the old or you can use the new, but not both. The Sun system doesn't have this restriction.
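For contrast, here is a small sketch of the same kind of assignment under the Sun (erasure) system, using the real java.util.ArrayList. The class name `ErasureInteropDemo` is just an illustrative wrapper.

```java
import java.util.ArrayList;

final class ErasureInteropDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean interoperates() {
        ArrayList<Integer> generic = new ArrayList<Integer>();
        generic.add(1);

        // Legal under erasure: the generic and the raw type are the same class.
        ArrayList raw = generic;
        raw.add(2); // old-style code mutates the very same list

        // Every parameterization shares a single runtime class.
        boolean sameClass = raw.getClass() == new ArrayList<String>().getClass();
        return sameClass && generic.size() == 2;
    }

    public static void main(String[] args) {
        System.out.println(interoperates());
    }
}
```

The assignment that had to be commented out above compiles here, with at most an unchecked warning when the raw reference is used to add elements.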

If you don't mind this restriction of not mixing generic and non-generic code then simply hand code as above.


2004/09/27 10:36 EST (via web):
Regarding ArrayList and ArrayList2: in C#, you would use interfaces (ICollection, IList) to achieve the same effect. Assuming the old code was written to use old-style, non-strongly-typed collections through a common interface, then as long as the new collection implements the same interface (which the new collections seem to), the old code will work with the new collection, even if it is a generic collection. I appreciate your effort in trying to produce a concrete example that proves C# generics have some limitation that Java generics do not have. But so far I have not seen any code to that effect. Could someone please produce a concrete example written in C#? Not a contrived example using a collection no one uses.

One disadvantage of using an interface instead of the collection class itself is the implicit boxing that must occur when the generic collection contains a value type, such as Int32. However, since Java generics do not have any support for value types, there is no comparison with Java in that department. - Frank Hileman


2004/09/27 20:27 EST (via web):
Frank, You raise three issues, taking each one in turn:

  1. Why do you need an example? It is obvious. ArrayList contains an 'Object[]' and ArrayList2 contains a 'T[]'; a 'T[]' is not necessarily an 'Object[]' unless erasure is used, and therefore some code will break if it directly manipulates the array in a manner that relies on a particular representation. How much code this will affect is anyone's guess, but it seems that Sun think it will be significant. The problem is explained in detail by the NextGen people, particularly in section 4.2. This paper gives many examples :)
  2. Interfaces are not necessarily immune from having to be re-written by the compiler in a NextGen/C#-like system; see the paper referenced above. Therefore you cannot guarantee backward compatibility of interfaces.
  3. J5 has automatic boxing and unboxing of primitives, like C#; see the example above, where int i is put into the collection (boxed) and removed from it again (unboxed) for printing. Java doesn't have user-defined value types (struct).
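The representation point in item 1 can be checked directly. The sketch below assumes the common erasure-era idiom of backing a generic class with an Object[] and an unchecked cast; `ErasedArrayDemo` is a hypothetical class, not the actual java.util.ArrayList source.

```java
final class ErasedArrayDemo<E> {
    private final E[] array;

    @SuppressWarnings("unchecked")
    ErasedArrayDemo(int size) {
        // "new E[size]" is not allowed under erasure; the backing store
        // is an Object[] viewed through an unchecked cast.
        array = (E[]) new Object[size];
    }

    E set(int index, E value) {
        E old = array[index];
        array[index] = value;
        return old;
    }

    E get(int index) {
        return array[index];
    }

    // Reports the runtime component type of the backing array.
    Class<?> componentType() {
        return array.getClass().getComponentType();
    }

    public static void main(String[] args) {
        ErasedArrayDemo<Integer> list = new ErasedArrayDemo<Integer>(3);
        list.set(0, 42);
        System.out.println(list.get(0) + " stored in " + list.componentType());
    }
}
```

Whatever E is, the backing array here is an Object[] at run time, which is why code that depends on that representation keeps working; a reified 'T[]' would be, for example, an Integer[] instead.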


2004/09/28 01:38 EST (via web):
Regarding 3 issues:

1) In C#, in fact, any array is "generic" even in 1.1, in that it can be cast to Array and manipulated that way. So perhaps that is not a good example. I understand your point. My point is this: C# libraries which are designed to be flexible do not take ArrayList parameters. They take ICollection or IList, and will work fine with any future generic collection implementing those interfaces.

2) I have never seen a backward compatibility problem with interfaces; could you provide a code example in C#, please? I don't understand.

3) I meant that Java "generics" do not have the advantages of true generics (as in C#) with respect to value types; the boxing overhead will occur regardless.


2004/09/28 02:28 EST (via web):
With regard to interfaces, here are some operations that expose the internal structure:

  1. Serialization
  2. Reflection
  3. Remote method invocation

How common this type of thing is I don't know. But as I have said before in this forum, the solution for Sun and MS could well be different. There are many more Java programs than C# programs, and there are many more third-party libraries. Sun obviously feel that breaking people's code will be bad for them. MS clearly feel that there will only be a few problems for them. Both companies will be well aware of what is possible and have chosen different paths that reflect their market segment and culture.


2004/09/28 11:08 EST (via web):
"Breaking people's code": so far not a single example has been produced here, in C# or any other .NET language, demonstrating how anyone's code, whether custom collection code or collection user code, will be broken by the upgrade to the .NET framework version 2.0. This is not just a theoretical interest; we have written custom collections and fail to see any problems introduced by generics.

  1. Serialization. People already have to customize serialization if they want to preserve compatibility across assembly versions; otherwise the embedded assembly identity interferes. So generics have absolutely no effect on that, correct? Generic collections cannot be deserialized from non-generic serialization data, because they have an entirely different identity.
  2. Reflection. Using reflection to break encapsulation is uncommon and strongly discouraged. It is unlikely anyone will be affected in this way. Even if they are using reflection, no one is automatically upgraded to a different, generic collection type and "broken". They will continue to use the old non-generic type and work fine, unless they are broken by internal changes independent of generics, which is why encapsulation should not be broken.
  3. Remote method invocation. Please provide an example. I don't understand why this has anything to do with generics in the CLR.


2004/09/28 19:45 EST (via web):
  1. Serialization - In the Sun version of generics a generic API can serialize data and a non-generic API can deserialize it, or vice versa, because the generic version and the non-generic version are the same (the generics are erased).
  2. Reflection - This is certainly used, e.g. my IDE uses it for graphical components, and I have used it myself, for example when part of the program is loaded from a remote location or is supplied after the main program is compiled.
  3. RMI - This uses both serialization and reflection as well as custom classes, so see above for why this may fail.
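The serialization claim can be tried out with a short round trip. This is a sketch under the assumption of erasure-based generics; the class and method names (`ErasedSerializationDemo`, `writeWithGenericApi`, `readWithRawApi`) are made up for the illustration, while the stream classes are standard java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

final class ErasedSerializationDemo {
    // "New" code: serializes through a generic API.
    static byte[] writeWithGenericApi() throws IOException {
        List<String> names = new ArrayList<String>();
        names.add("Neal");
        names.add("Bruce");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(names);
        out.close();
        return bytes.toByteArray();
    }

    // "Old" code: deserializes through a raw, pre-generics API.
    @SuppressWarnings("rawtypes")
    static List readWithRawApi(byte[] data) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
        try {
            return (List) in.readObject();
        } finally {
            in.close();
        }
    }

    // Convenience wrapper so callers need not handle checked exceptions.
    static int roundTripSize() {
        try {
            return readWithRawApi(writeWithGenericApi()).size();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripSize());
    }
}
```

The stream records only the class java.util.ArrayList, with no type argument, so the reader needs no knowledge of generics at all.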

You keep asking for examples; take a look at the NextGen material, which describes the problems in depth. Also, your focus on interfaces as the reason there will be no problems is misguided. See NextGen again: the compiler might have to modify the interface!

The debate is not whether code will be broken by NextGen-style generics (it will); the debate is how much code will be broken. The NextGen people agree that code will be broken; they, however, feel that this is a price worth paying, and Sun don't. Sun may well have got this right: their system gives you nearly all the advantages of the NextGen system and doesn't break old code. What is more, if Sun have got it wrong they can go down the NextGen path in future; the current system doesn't preclude that.


2004/09/29 19:09 EST (via web):
Really, I was looking for real-world examples where C# code will be broken. Instead I have been given a bunch of theoretical examples illustrating how Java might have been broken if it had used a different system. It seems that no one answering questions here works with C# and .NET. As far as I can tell, absolutely nothing will be broken moving from the .NET framework version 1.1 to version 2.0 as a result of the introduction of generics. Nor does the migration compatibility argument make any sense with regard to .NET. If anyone has any real examples, in C# or any other .NET language, proving the contrary, please post them. Thanks.


2004/10/07 19:28 EST (via web):
In response to the above comment: this is a discussion about Java, so how about an example of something that erasure can't do :). Note: you might find this very difficult to do. Bruce is currently trying, but most of the supposed flaws are rare cases and are easily coded around in a couple of extra lines anyway!
