Bug82 Editing
Server needs to cleanup afterwards.
Perhaps this is too obvious to work, but maybe we can just call orb->run() once the server is started, and set the signal-handler to call orb- >shutdown()?
OK, that didn't work ;| However, according to a thread on the omniorb mailing list (Re: [omniORB] Problem shuting down server on Aix 4.3.2), Duncan Grisby said that this shouldn't work, and that instead you should use a posix semaphore: "The only primitive that is guaranteed signal-safe is the Posix semaphore (and even then it isn't always)." So that would seem to be the way forward. Now if I just knew *anything* about semaphores...
Initial patch. Summary of changes: - main server thread was looping inside ScreenManager::run(); refactored that to run in its own thread in the same way as ServerImpl::run() - after starting the ScreenManager we wait on a semaphore; the signal-handler can unblock the semaphore, and maybe later we can let console quit-events do the same (bug81). - once the semaphore is unblocked, we unpublish the server-reference and stop the Server and ScreenManager threads. - unpublishing removes the reference from the nameserver or deletes the file storing the stringified ior; not sure of the security aspects of removing the file, since you can specify both the location and filename! maybe we should just corrupt the ior in the file, to make it obvious that server-reference is invalid... Added nicholas since he's my patch-submitting buddy ;) Not sure if we should get a thread/signal guru to review this. Would like to get this into m1 if we can... [clean_shutdown.1.diff]
I meant to mention that for some reason the orb->destroy() at the end of main() causes an orb error; we don't need this anyway afaict, but perhaps its important. Am getting the odd abort when running this too :(
I've read through the patch, and here are some comments, in no particular order: 1. Why do you use my_foo instead of the already existing _foo idiom for member variables? 2. I wonder if you should use orb->shutdown() instead of orb->destroy()? (I don't know the difference between them) 3. Did you consider having the static thread run method call into a member function of the ScreenManager class? ie: static void run(void*p) { static_cast<ScreenManager*>(p)->run(); } That way you wont have to use the obfuscating s-> syntax everywhere. 4. You are using the thread cancel() method. Generally stopping threads is bad. pthread_cancel appears to support a cancellation framework, whereby your code can define cancellation points. You don't seem to have any of this.. the simplest way is to not use cancel, but instead set a flag, ie: (not sure of exact syntax) // In main thread when semaphore is set and we // are shutting things down void ScreenManager::stop() { _flag_mutex.lock(); _stop_flag = true; _flag_mutex.unlock(); // Wait for thread to exit _thread.join(); } // In ScreenManager::run() in separate thread: bool stop = false; while (!stop) { // Do loop stuff ... // Check stop flag _flag_mutex.lock(); stop = _stop_flag; _flag_mutex.unlock(); } I didn't read the unpublish stuff.
I don't see why you want this in M1. It sounds pretty likely to cause assertions, aborts and other nasties on exit which we don't get right now. Stefan has said some things in the past about unpublishing, but I forget what they were and I don't know if they're still applicable. I'll go ask him. cc'ing chalky so neiljp's response to his review will get back to him.
Chalky: 1. the my_ prefix is in the new coding standard, so I didn't see a reason to stick with the old style; w/o the standard, yes, I would just have stuck with the present style. 2. orb->run()/orb->shutdown() are normally used to do the equivalent of the semaphore used in the patch: that is, orb->run() gives control over to the orb, which blocks until orb->shutdown() is called (eg. via an ExitCommand attached to a button). orb->destroy() just destroys the orb :) 3. yes, that's a nicer syntax :) I was just working on the principle of getting something going, and following the layout used in ServerImpl. Perhaps the ServerImpl style ought to be changed? it would make it more consistent/clean... 4. I didn't know about cancellation points - hence my request for some thread/signal guru ;) The reason I didn't just put a test into the loops of the ServerImpl/ScreenManager was that those are important loops and I didn't know if putting a check in *every* iteration might slow things down a lot. From what I've just read, the default cancellation behaviour is PTHREAD_CANCEL_DEFERRED, which only cancels the thread at a predefined cancellation-point; these already exist in certain system functions, but you can add your own. Apart from making the system 'clean', since we don't hold any resources which won't automatically be released (?), is there any real problem with just cancelling? Of course, if object references/servants are stored in objects in running threads, then the orb won't necessarily shut down cleanly...but we are shutting down, so convince me it really matters. eg. we don't deallocate dynamically-allocated servants anyway!a nicholas: Currently the only time the 'clean' shutdown occurs is in response to signals. So if the user clicks on the X11 'X' nothing will happen and we exit in the old way. The behaviour with hitting ESC is bad already, and in all my tests the 'clean' shutdown doesn't affect the generation of the server.log file. In summary, this doesn't affect the core period of server behaviour, only cleanup, and the only 'bad effect' so far is the occasional abort (which ok, is bad, but only happens on signals; and on a segv you might expect that). Reworked patch to work with (3); advice on working with (4) invited, if really necessary.
New patch [clean_shutdown.3.diff] Changes: - don't bother cancelling the active threads: rely on them being closed automatically when the server process ends; - make the server shut down smoothly when hitting ESC, hopefully fixing bug126. Please comment; so far I've had no unexpected behaviour with this patch.
Current patch produces sane results for ESC (note that the client dies before the server when you do that, but not for CTRL+C... strange) and CTRL+C and the X close button. I tried testing the signal handling by sending it signals from another terminal, and managed to kill off the X window but leave some of the threads running. I have no idea whether this is possible without the patch, I've never tried anything quite that pathological. Neil promises to add a sigterm handler. I'm still worried about the case where one thread might "just die" even though the others don't. At some point in the distant past, stefan suggested that he'd rather not have the server remove itself from the nameserver? That was before any of this stuff ever got underway. I'd like stefan to comment on that and on the use of semaphores in this patch. He should know any side-effects of neil's code.
From discussions with stefan about this on irc, the semaphore and extra-thread changes seem to be acceptable. I'll change the unpublishing to not mess with the files (for security's sake, for now), and then we can apply it, unless anyone has any further objections.
Here's the latest patch; there should be no major changes from the previous version.
Gah. Last patch was messed-up. This one should apply ;|
Change to exit after a segv, and comment as such! (undefined posix behaviour, according to sdt) Curiously lost when I was recoding that section ;)
This patch currently breaks the OpenGL console. The problem is the lack of nicholas's task38 patch, and the fact that with neiljp's patch the code now spawns an extra thread for the ScreenManager, so the GL context gets lost between the server thread and this new thread. Furthermore, may I suggest that the IOR file unpublisher simply checks that the to-be-deleted file contains the current IOR and ONLY the current IOR (ie. check for EOF after reading the IOR from the file) and then delete it? In this case I don't see any security problem since at most the user has lost a file containing the IOR of a dieing server.
sdt: - I think you mean that this breaks the openGL *DrawingKit*, but I'm sure most will understand that ;) Yet another reason for task38 :) - there is a section of code which I commented-out (for now) which does almost what you suggest for the ior (file) unpublishing (except for the eof). See the block at line 478 in http://src.fresco.org/viewcvs/Fresco/Fresco-C++/src/Resolve.cc
Yes, indeed, I did mean the drawingkit - sorry. njs made me realise that removing the file is still a race condition, since we remove the filename, not the file descriptor. I suppose the safest thing would be to seek back and overwrite the file with something indicating no IOR.
As far as I'm aware this patch works fine; rather than wait for me to get my cvs account, perhaps someone could commit it now? we can treat any small tweaks (eg. ior-file treatment) separately. If you need some extra motivation, the patch should actually solve bug126 too :)
Just committed this patch; this should now be fixed.
lBFw3r <a href="http://foipnourdgql.com/">foipnourdgql</a>, [url=http://uazhfwbijgas.com/]uazhfwbijgas[/url], [link=http://krubkwqsfaqs.com/]krubkwqsfaqs[/link], http://fertzbqmivhz.com/