String encodings - another thorn in interop

By Scott Balmos
There are days when I wish I could return to the world of ASCII-only encoding. Life was simpler back then - in a naive way. Now we have ASCII, UTF-8, UTF-16, and so on. Even within them, there are different names for different locales (Sun sometimes using different names than the ISO ones comes to mind). And as far as I have seen so far (please correct me!), SOAP and friends do not really specify which encodings to use when passing data that smells like a string. This could blow a major hole in the side of the ship if you want your SOAP service to be truly international in its language support.

Think about it - you have Java on the one hand using UTF-16. Thankfully, so does the CLR (.Net), which covers a decent majority of language interop cases. But what about other languages that use UTF-8, or even pure C/C++ with Favorite Multibyte String Library Of The Week bolted on?

The default choice for multilingual text handling in Linux apps is UTF-8, which covers a lot of the C/C++ programs written there. PHP uses UTF-8. Ruby’s Unicode support is minimal, barely handling UTF-8. Python is in the UTF-8 camp as well. So, unfortunately, it looks as if we’re heading into another language interop battle - back to one of the favorite political battles Ted Neward has been writing about: the Java/.NET “enterprise/large app” languages versus the “small app” scripting languages.

Take SOAP out of the equation completely for the moment. What about the simple case of a Unicode text file? Is it UTF-16? Is it UTF-8? You can’t really tell from the file itself. You have to know the encoding beforehand.
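This ambiguity is easy to demonstrate. Here is a minimal Java sketch (the sample string is mine, not from any SOAP payload) showing that the same text produces entirely different byte streams under UTF-8 and UTF-16, and that decoding with the wrong charset silently mangles the data instead of failing:

```java
import java.nio.charset.StandardCharsets;

public class EncodingAmbiguity {
    public static void main(String[] args) {
        String text = "Übergröße";           // non-ASCII sample text

        byte[] utf8  = text.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16);

        // Same nine characters, different byte counts - and nothing in
        // either array says which encoding produced it.
        System.out.println("UTF-8 bytes:  " + utf8.length);   // 12
        System.out.println("UTF-16 bytes: " + utf16.length);  // 20 (incl. BOM)

        // Decode the UTF-8 bytes with the wrong charset: no exception,
        // just silent mojibake on the far side of the wire.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("Round-trips? " + wrong.equals(text)); // false
    }
}
```

Java’s UTF-16 encoder does prepend a byte-order mark, which helps a sniffer guess in some cases, but the BOM is a convention, not a guarantee - plenty of files and streams omit it.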

Some cursory Googling reminded me of RFC 1641, a spec for specifying the Unicode encoding in the MIME type of a datastream. It might be a good idea to look into extending the SOAP spec to specify the encoding in a string datatype.
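For reference, the MIME machinery is already there at the transport level: a SOAP 1.1 message over HTTP travels with a Content-Type header whose optional charset parameter names the encoding, along the lines of:

```
Content-Type: text/xml; charset=UTF-8
```

The trouble is that this labels the whole XML envelope, not the individual string values inside it - pushing an encoding label down to the string datatype itself is what the suggestion above amounts to.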

Of course that leaves it up to the implementing language to deal with the encoding and decoding magic. But at least it’d be a start. As it is now, we have to explicitly encode our string data into UTF-8 or elsewhere if we want to be “completely” interoperable with languages.
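In Java, that explicit encoding step mostly amounts to never touching the platform-default charset. A sketch of the discipline on both ends of the wire (the helper names here are mine, purely illustrative):

```java
import java.nio.charset.StandardCharsets;

public class WireEncoding {
    // Sender side: always serialize strings to an agreed-upon charset
    // (UTF-8 here) before the bytes hit the wire.
    static byte[] encodeForWire(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Receiver side: decode with the *same* agreed-upon charset,
    // never the platform default.
    static String decodeFromWire(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "naïve résumé";     // hypothetical payload
        byte[] wire = encodeForWire(original);
        String received = decodeFromWire(wire);
        System.out.println(original.equals(received)); // true: lossless round-trip
    }
}
```

The key is that both sides hard-code the same charset. The no-argument overloads - s.getBytes() and new String(bytes) - silently pick up the platform default, which is exactly the interop trap being described.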

It’d definitely be nice to not have to worry about what encoding my various SOAP client programs use. Just bring in the data, let the string encoding bloodhound class sniff at it a few times, and if it’s an encoding different than my language’s native encoding, it chews up the string byte array and spits out one to me that is in my native encoding, without any extra effort on my part. Ideally, this would simply be an extension to the string data marshaller/unmarshaller in the SOAP stack.

Data types *are* supposed to be vaguely universal, aren’t they?

“Remember, I’m pullin’ for ya… We’re all in this together…”
Scott’s personal blog can be found at http://members.simunex.com/sbalmos/serendipity/.

A look back at interop

We had a conversation with a long-time software hand the other day, one whose company had created a means to convert and run Java within .NET. This individual told us of his surprise at the developer community’s reaction to this software when first released. One might summarize the reaction as a good dose of vitriol, with asides to the effect that, if the Creator had meant for Java to run within .NET… well, that actually would have been a mistake on His part. In other words: let’s not mix things. Ted Neward’s posts on this Interoperability Blog have pointed to the need to look afresh and ‘leave politics at the door’ when considering apps that integrate Java and .NET functions. Looking back at what has gone before, some of Neward’s words from a 2004 whitepaper on the topic, which follow, still ring true. And they provide a good entry into what TheServerSide Interoperability Blog seeks to be about. – J. Vaughan, Site Editor, TheServerSide.NET

Making two platforms interact is at once a simple and a difficult problem. Simple, in that it’s a fairly closed-requirements solution: if I can work out a few technical details, interaction is achieved. It’s also fairly easy to judge success: if they can talk, you did it; if not, there’s still work to do. In fact, once you’ve worked out low-level issues like byte order, file/data format, and hardware compatibility, basic interaction between any two platforms is pretty straightforward. (On this basis, as a matter of fact, was the Internet built.)

* * *

Historically, we’ve preferred “n”-tier systems to client/server 2-tier ones, because of the increased scalability intrinsic to “n”-tier systems. An “n”-tier system can scale to much higher user counts, due (among other things) to the shared database connections from a central middle tier to which clients connect. We also prefer the “n”-tier approach because it tends to allow for better separation of responsibilities in code: presentation-layer code goes on the client tier, business logic goes on the middle tier, and data-access code (largely represented by SQL) goes on the resource tiers either on or behind the middle-tier machines. Three tiers, three layers, but not always mapped one-to-one.

Drawing a distinction between the tiers and the layers is necessary for good interoperability between the platforms, because interoperability across layers is going to necessitate very different decisions than interoperability within layers. For example, if you have a Windows Forms .NET application that wants to display a Swing application as a child window, very different decisions are in order than if you want that same Windows Forms application to talk to a J2EE back end. Web services might suffice for the second requirement; they’ll be an unmitigated disaster if you use them for the first.

* * *

So often, demos done on the expo show floor clearly prove that the product knows how to talk to the same vendor’s product on the other side of the wire, but rarely if ever demonstrate working with another vendor’s product. So, for example, an ASMX Web service can easily declare itself as returning a Hashtable, but once marshaled and sent across the wire, what format should it resemble in the J2EE space? While Java certainly has its own implementation of Hashtable, there’s no love lost between the two in implementation details. As a result, it’s a fair bet (barring special code to the contrary) that the .NET Hashtable will get rendered into a custom data format that has little to no bearing on the Java Hashtable in widespread use.

For these reasons, just as with data exchange using XSD, when writing WSDL-based services, always start with the “parts in the middle”: in this case, the WSDL.

What do you think? Have things changed over the years? Are there places where both of these major platforms should be used within a single application?

Interoperability White Paper - 2004

Interop Across the Wire

By Ted Neward
Welcome to the next installment of “As the Interop World Turns”. In this particular bit, we’re examining interop across the wire, but before we do, let’s acknowledge the major news in the interoperability arena, the announcement of the formation of the Interoperability Alliance, bringing together Microsoft, BEA, Sun, and another dozen or so vendors, all focused on making it easier to play nicely between the platforms.

Practically speaking, however, the Interop Alliance hasn’t yet changed the interop landscape in any significant way, so while it’s important to note that it exists, there’s nothing more to report. Whether this will turn into Something Big or just another meaningless consortium of vendors remains to be seen—for now, it stands as a “potential” industry-affecting move.

On to more practical matters.

In recent years, most of the focus on interoperability between Java and .NET has been directed at the WS-* stack, AKA “Web Services”. For almost a decade now, the various vendors involved in the various WS-* standardization efforts (and even those who don’t participate directly but graft on at the edges somehow) have promised that as soon as the standards are here, and the implementations all implement the standards, seamless and ubiquitous interoperability across all platforms will be ours.

We’re waiting….

In the meantime, however, it turns out–according to those incredibly insightful people at Gartner and other “analysis agencies”–that most of the time, the only two platforms that principally draw interop interest are the JVM and the CLR. Hardly a surprise, for those of us who actually work for a living. And, as it turns out, if you’re looking to limit your interoperability to those two platforms, numerous toolkits abound already to make this happen.

While open-source toolkits also exist, in general they aren’t quite “up to speed” with the commercial toolkits, so in this entry we’ll focus on the latter, mainly the tools offered by Intrinsyc, JNBridge, and Borland. Each effectively provides a binary RPC-based interop approach, in which you follow a development process that’s (deliberately) similar to what’s done when working with the native ORPC stack (CORBA or RMI for Java, .NET Remoting for .NET). In several cases, the toolkits use the wire syntax and format of one of the two platforms (IIOP or the .NET Remoting format), meaning that for one of the two platforms the experience is seamless. (Which platform gets the seamless experience is up to you, of course, but practical considerations—and a desire to continue to do business with your clients—generally dictate that your clients get the better experience. Choose wisely.)

In the case of Borland’s tool, called Janeva, the definitions are done in CORBA IDL, a language strikingly and deliberately similar to Java or C++ (and thus C#) interface declarations. Developers familiar with CORBA will know what to do with these definitions on the .NET platform: simply run the Janeva code-gen tool over the IDL file, which will generate stubs (client-side proxies) or skeletons (server-side proxies) as necessary. For existing CORBA systems, this is likely the easiest way for a .NET client to hook in, but remember that CORBA IDL is an entire language and type system in and of itself, and CORBA represents a fairly sizable stack to get used to - easily dwarfing the .NET Remoting stack in both size and complexity. [Quick correction — Just double-checked the Borland website, and Janeva isn’t there anymore. It’s been rolled into their VisiBroker package and is now called “VisiBroker for .NET”. (The rest is still the same, though.)]

For simpler scenarios, it’s generally easier to use something a little less intimidating (and, correspondingly, less powerful), such as the Ja.NET or JNBridgePro tools. Each is equally useful in my opinion, so I’m picking one at random here to use as a demo. JNBridge lost the toss (seriously!), so I’m going to use Intrinsyc’s J-Integra tool for this demo. This is actually taken from one of the demos shipping with their product, so if you feel like following along, grab the eval off their website, install it, and look for the HelloWorld demo in the examples directory.

J-Integra takes a “.NET-friendly” perspective, meaning that the development experience is a bit easier on the .NET developer than on the Java developer. (JNBridgePro takes the opposite tack, for what that’s worth.) Thus, for the C# developer, developing an interoperable scenario is as simple as writing a typical .NET Remoting component—build a class that extends System.MarshalByRefObject:

// Copyright 2001-2003 Intrinsyc Software Inc.
// All rights reserved.
using System;

namespace HelloWorld
{
    public class HelloWorldClass : System.MarshalByRefObject
    {
        private String name;

        public HelloWorldClass(String name) {
            this.name = name;
        }

        public String getMessage() {
            return "Hello World, from " + name;
        }
    }
}

From a .NET Remoting perspective, there’s absolutely nothing interesting about this class, which is exactly the point—any existing .NET Remoting servers can be flipped to be interoperable by doing exactly nothing. (In this particular demo, Intrinsyc has the HelloWorldClass instances being hosted by ASP.NET, but obviously we could just as easily self-host it if desired—see Ingo Rammer’s “Advanced .NET Remoting” from APress for details if you’re not “up” on your .NET Remoting.)

To get Java to call this guy, we need to run J-Integra’s “GenJava” tool to create Java client proxies and compile them. Once those proxies are generated and compiled (and unfortunately I don’t see any custom Ant tasks to do this, so you’ll likely have to write an “exec” task), drop them into your client .jar file, and call the proxies by name:

// Copyright 2001-2003 Intrinsyc Software Inc.
// All rights reserved.
import HelloWorld.*;

public class HelloWorldMain {
    public static void main(String[] args) throws Exception {
        HelloWorldClass helloWorldClass =
            new HelloWorldClass("Fred");
        System.out.println(helloWorldClass.getMessage());
    }
}

Again, nothing special, which is the point—the “magic” takes place inside the generated proxy, which (based on the settings in the GenJava tool) knows how to call over HTTP to the ASP.NET server hosting the HelloWorld instance, execute the call, and send back the returned String to the client.

(Before the JNBridgePro folks get peeved at me, let me quickly point out that the development experience there is going to be much the same: point their code-gen tool at the Java RMI server objects you want .NET to communicate to, and use those proxies as-is from C#.)

Tempting as this approach is, it carries some caveats. First, be careful when considering binary RPC-based approaches to interop, because the interface-based code-gen approach carries a nasty side effect: once published, a given interop endpoint can never be modified again without requiring all of its clients to change with it. While this isn’t a major consideration during initial development, it can be devastating when attempting to refactor code later, after the system has been released. This kind of tight coupling works against many agile projects, so choose your interfaces (whether Java- or .NET-based) with care. (And before the comments start flying, let’s be very clear about this: the tight coupling descends from the proxy-based code-generation approach, not from anything in the tools themselves. WSDL-based code-generated proxies frequently fall into the same trap.)

Second, using any of these tools assumes that you will never need to branch beyond the Java and .NET platforms; should you have to incorporate Ruby or “legacy” C++ into the mix, for example, you’re out of luck. This is where the “open-ended” interoperability of the WS-* stack (or its conceptual predecessor, CORBA) holds its own, and if there’s any reason to suspect that you’ll need to reach beyond the JVM and CLR, you should consider an IIOP-based or WS-*-based solution. (Be careful, however: as of this writing I’m not aware of any Ruby-CORBA packages, so even CORBA could be a dead end if you need to plug Ruby into the mix.)

Third, remember that out of the box these tools generally focus on cross-process communication, which means that each method call across the boundary is not only a platform shift but also a network traversal. Loosely translated, that means “hideously expensive” in performance terms. Even the toolkits that offer shared-memory channels still go through the marshaling/unmarshaling process, so it’s still not as cheap as an in-proc method call. As with most interoperability scenarios, try to minimize the amount of back-and-forth between the two platforms. (That said, the JNBridge blog at http://www.jnbridge.com/blog/ shows how to embed Swing components inside a WinForms form, which is a powerful idea and one that shouldn’t be discarded out of hand. Just be sure to perf-test.)

The tight-coupling concern is a biggie, however, so in the next installment we’ll look at ways to avoid it by using messaging tactics, instead of RPC-based ones. Until then, remember, Java and .NET are like your kids: you love them both… “the same”.  

 

A look at out-of-proc or RPC interop

By Ted Neward
For years, the concept of “Java-.NET interoperability” has been wrapped up in discussions of Web services and the like, but in truth there are a bunch of different ways to make Java and .NET code work together. One such approach is to host the JVM and the CLR inside the same process, using a variety of tools, such as the open-source project IKVM (a part of the Mono project).

IKVM isn’t a “bridge” tool, like other interop technologies—instead, IKVM takes a different path entirely, doing bytecode translation, transforming Java bytecode into CIL instructions, and feeding them through the traditional CLR as such.

This means that Java classes basically become .NET assemblies, executed by the CLR’s execution engine. The JVM itself, technically, is never loaded—instead, the CLR essentially becomes a JVM, capable of executing Java classes. (This also means that the various features that accompany the JVM, such as HotSpot execution of Java bytecode, the JVM garbage collectors, and the JMX-related monitoring tools that are part of Java 5 and later, will not be present.)

IKVM comes in two basic flavors—a runtime component that’s used to load and execute Java classes from .class binaries, and a precompiler/translator tool, ikvmc, that can be used to translate (or cross-compile, if you will) Java binaries into .NET assemblies. While the second option generally yields faster execution, the first is the more flexible of the two, as it doesn’t require any preparation on the part of the Java code itself.

Using IKVM to load arbitrary Java code and execute it via Java Reflection turns out to be fairly easy to do; so easy, in fact, that you can use it from Visual Basic code. After adding the IKVM assembly to a VB.NET project, write:

Imports IKVM.Runtime
Imports java.lang
Imports java.lang.reflect

Imports jlClass = java.lang.Class
Imports jlrMethod = java.lang.reflect.Method

The first line just brings the IKVM.Runtime namespace into use, necessary to make use of the “Startup” class without having to fully qualify it. The next two lines bring in parts of the Java runtime library that ships with IKVM (the GNU Classpath project, precompiled to CIL using ikvmc and tweaked as necessary to fit the CLR’s internals). The last two lines create aliases, so that the types “jlClass” and “jlrMethod” are synonyms for “java.lang.Class” and “java.lang.reflect.Method”, respectively—we want this because otherwise we’d run into name clashes with the CLR Reflection APIs, and because it helps cut confusion about which Reflection we’re working with.

Module Module1
    Sub Main()
        Dim properties As Hashtable = New Hashtable
        properties("java.class.path") = "."
        Startup.SetProperties(properties)

Here we create a Hashtable object to hold a set of name-value pairs that will be passed to IKVM in the same manner we pass “-D” properties to the Java Virtual Machine on the command line. In this particular case, I’m (redundantly) setting the classpath to the current directory, causing IKVM to look for code there along with the usual places (rt.jar and the Extensions directory inside the JRE). “Startup” is a static class, meaning there’s no instance of it.

Startup.EnterMainThread()

To quote the vernacular, we’re off and running. By calling “EnterMainThread”, IKVM is now up and running, ready to start taking on Java code. Our next task is to find the code we want to execute via the standard Java ClassLoader mechanism, find the “main” method exposed thereon, create the String array of parameters we want to pass, and call it, all via traditional Java Reflection APIs, but called through IKVM instead of through Java code itself.

        Dim sysClassLoader = ClassLoader.getSystemClassLoader

        Dim cl1 As jlClass = jlClass.forName("App", True, sysClassLoader)

        Dim paramTypes As jlClass() = { _
            jlClass.forName("[Ljava.lang.String;", True, sysClassLoader) _
        }
        ' java.lang.Class has an implicit conversion operator to/from Type
        'Dim paramTypes As jlClass() = { _
        '    GetType(String()) _
        '}

        Dim main As jlrMethod = cl1.getDeclaredMethod("main", paramTypes)

In the lookup for the “main” method, notice how there are two different ways to specify the method parameters: one, using the JVM syntax to specify an array of Strings (“[Ljava.lang.String;” as given in the Java Virtual Machine Specification), and the other using IKVM’s ability to translate types from .NET to Java, which allows us to specify it as a “String()” in VB (or “String[]” in C#).

        Dim parms As Object() = { _
            New String() {"From", "IKVM"} _
        }

Dim result = main.invoke(Nothing, parms)

We create the array of Strings to pass, then call invoke(), passing “Nothing” (the VB representation of null) for the object instance, as per the usual Java Reflection rules. At this point, the “App.main()” method is invoked, and when it returns, the Java code has completed execution. All that is left is to harvest the results and display them, and shut IKVM down appropriately.

        If result IsNot Nothing Then
            Console.WriteLine(result)
        Else
            Console.WriteLine("No result")
        End If
        Startup.ExitMainThread()
    End Sub
End Module

Using IKVM is not a silver bullet, but it does offer some powerful in-proc interoperability options to the development team looking to leverage both .NET and Java simultaneously, such as calling out to Java EJB servers from within Excel or Word documents, or loading Spring into Outlook in order to evaluate incoming mail messages and process them for local execution.

Interoperability: Check your politics at the door

By Ted Neward
As we prepare to enter the holiday season here in the US, I think it’s time that we called for Peace on Earth. Or, at least, Peace in Computer Science.  

In 2000, when Microsoft first announced the .NET Framework (then called by various alternative names, such as the “Universal RunTime (URT)” or “COM3” or the “Component Object Runtime (COR)”), it was immediately hailed as the formal declaration of war on Sun and Java, if not an actual pre-emptive attack.

Within the industry, a schism already present was made deeper—developers were routinely asked “which side” they were on, whether they were supporters of “open” standards and “community-driven” development, or whether they were trying to support the evil corporate conglomerates. (I’ve since lost track of who’s supposed to be good or evil—Sun because they refused to release Java to an international standards body, IBM because they are trying to subvert Sun’s control over Java, Microsoft because they routinely “embrace and extend” open standards, or Oracle, because… well, just because.) I’m personally regarded as some kind of heretic and loony not only because I routinely write code for both the Java and .NET platforms, but also because I refuse to say, when asked, which one I like “better”.

You know what? I’m damn tired of these arguments. Can’t we all just get along and write software?

It’s not like these arguments really do much for our customers and clients. Truth be told, few of the people who use our software can even tell which platform the silly thing was written in, much less how it being written in Java will somehow make the world a more free (as in speech, as in beer, as in sex, whatever) place. Or that .NET somehow allows for multiple languages—generally speaking, the only language they care about is the one they speak and read and interact in. Most of the time, they’re just happy if they can *use* the software—remember, according to statistics routinely cited at conferences and presentations, half the time our customers never see software they’ve asked for, and when they do, it’s likely to be twice the budget costs originally anticipated, with half the features they originally asked for, in a user interface they don’t quite understand, even though it’s supposed to be “the latest greatest thing”.

This is progress?

Over the last five years, there’s been a quiet revolution under way, and it’s not the dynamic language revolution, nor the REST-HTTP-SOAP revolution, nor the agile revolution, nor AJAX. It’s not about containers or dependency injection or inversion of control or mock objects or unit testing or patterns or services or objects or aspects or meta-object protocols or domain-specific languages or model-driven architecture or any other fancy acronym and accompanying hype and marketing drivel. It’s a revolution of pragmatism, of customers and clients and others turning to developers and saying, “Enough is enough. I want software that works.”

“Works” here is a nebulous term, but before the Marketing goons start spinning the term to their best advantage, let’s clarify: “Works” is a simple term, as defined by our customers, not us. “Works” means runs in a manner that’s genuinely useful to our clients and customers. “Works” means it’s delivered close to on time and preferably under budget. (Nothing will ever make that utopian dream come true completely, so let’s be more realistic about the process—besides, *close* to on time and budget is a pretty good goal to shoot for right now, anyway.) “Works” means software that attaches itself to the existing mess we’ve made over the years, without having to rip out a whole bunch of servers and replace them with a whole bunch more. “Works” means taking what a customer has, in place, that already meets that definition, and tying the new stuff we’re building into that existing mess.

“Works” means, practically speaking, that we take the languages and tools that are available to us, and use them each to their advantage, regardless of political affiliation or perceived moral stance. That means taking Microsoft’s tools and technologies and tying them into Java’s, and vice versa. That means dropping the shrill rhetoric about how each is trying to “leverage” the other out of existence, and figuring out how to use them all together in a meaningful and technologically powerful way. That means recognizing that we are all one community, not little villages out in the countryside trying to beat each other into submission even as we try to scrape a living off the land.

Abraham Lincoln, the man who had the unfortunate luck to preside over the United States during its most divisive era, once said, “A house divided against itself cannot stand.” Welcome to the Interoperability blog. Please check your politics at the door—here, we care only about how tools can be used to solve problems.