Another question: which transformation matrix should be used in this case?
From what to what? 3d to isometric?
Last but not least, how should a mouse click be handled? My idea was to transform the mouse coordinates from 2d into the 3d scene and raycast for a collision.
If you're doing everything in 2d, this math can help you: http://clintbellanger.net/articles/isometric_math/
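For the pure 2d case, a minimal sketch of the usual tile/screen conversion might look like the following. The tile size (64x32 pixels), the function names and the camera offset parameters are assumptions for illustration; adjust them to your own tileset and scrolling code.

```python
import math

# Assumed tile size: 64 px wide, 32 px tall diamonds.
TILE_WIDTH_HALF = 32
TILE_HEIGHT_HALF = 16

def map_to_screen(map_x, map_y):
    """Tile coordinates -> screen position of the tile's top corner."""
    screen_x = (map_x - map_y) * TILE_WIDTH_HALF
    screen_y = (map_x + map_y) * TILE_HEIGHT_HALF
    return screen_x, screen_y

def screen_to_map(screen_x, screen_y):
    """Inverse of map_to_screen; returns fractional tile coordinates."""
    map_x = (screen_x / TILE_WIDTH_HALF + screen_y / TILE_HEIGHT_HALF) / 2
    map_y = (screen_y / TILE_HEIGHT_HALF - screen_x / TILE_WIDTH_HALF) / 2
    return map_x, map_y

def pick_tile(mouse_x, mouse_y, camera_x=0, camera_y=0):
    """Tile under the mouse; camera_x/camera_y are a hypothetical scroll offset."""
    map_x, map_y = screen_to_map(mouse_x + camera_x, mouse_y + camera_y)
    return math.floor(map_x), math.floor(map_y)
```

With these numbers, pick_tile(64, 70) lands on tile (3, 1). This only picks flat ground tiles; anything with height (walls, tall sprites) needs an extra check.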
If you're using a 3d engine with an isometric projection matrix, the first things I can think of are: 1) raycast; 2) use MRT (multiple render targets) and render to a second texture an integer that identifies each particular object, coordinate, or anything else you want (e.g. player == 0x99), so that you can just read the value under the mouse and instantly know which object was clicked.
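To make option 2 a bit more concrete, here is a CPU-only sketch of the idea. In a real engine the id buffer would be a second render target filled by the GPU and read back at the mouse position; here a plain list of lists stands in for it, and rectangles stand in for rendered objects. All names and the NO_OBJECT value are made up for illustration.

```python
NO_OBJECT = 0x00   # assumed "nothing here" value
PLAYER_ID = 0x99   # the player id used as an example in the post

def make_id_buffer(width, height):
    """Stand-in for the extra render target: one integer id per pixel."""
    return [[NO_OBJECT] * width for _ in range(height)]

def draw_rect_id(id_buffer, x, y, w, h, object_id):
    """Pretend to render an object by writing its id into the pixels it covers."""
    height = len(id_buffer)
    width = len(id_buffer[0])
    for row in range(max(y, 0), min(y + h, height)):
        for col in range(max(x, 0), min(x + w, width)):
            id_buffer[row][col] = object_id

def pick_object(id_buffer, mouse_x, mouse_y):
    """Read the id under the mouse; outside the buffer means no object."""
    if 0 <= mouse_y < len(id_buffer) and 0 <= mouse_x < len(id_buffer[0]):
        return id_buffer[mouse_y][mouse_x]
    return NO_OBJECT
```

Objects drawn later overwrite earlier ids, which roughly mimics what you see on screen when the id target shares the depth test with the color target.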
Or maybe you can also run the 2d math with a 3d isometric projection (== simplified raycast).
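One way to read "simplified raycast": with an isometric (orthographic) projection every pick ray has the same direction, so a click only shifts the ray origin sideways, and hitting the ground is a single ray/plane intersection. A rough sketch, assuming the ground is the plane y = 0, the camera basis vectors are unit length, and the mouse position (u, v) has already been converted to world units relative to the view center; all names are hypothetical.

```python
import math

def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    length = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return scale(a, 1.0 / length)

def ortho_ray_hit_ground(cam_pos, right, up, forward, u, v):
    """Intersect the orthographic pick ray with the plane y = 0.

    u, v: mouse offset from the view center, in world units (assumed).
    Returns the hit point, or None if the view direction is parallel
    to the ground.
    """
    # Orthographic camera: the click moves the origin, not the direction.
    origin = add(cam_pos, add(scale(right, u), scale(up, v)))
    if abs(forward[1]) < 1e-9:
        return None
    t = -origin[1] / forward[1]
    return add(origin, scale(forward, t))
```

For example, with the camera at (10, 10, 10) looking along (-1, -1, -1), a click at the view center (u = 0, v = 0) hits the ground at the origin. Objects above the ground still need a proper intersection test, or the id buffer approach.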