bergie: BTW, the client is asking when we expect to get n.n.reg working
[10:55] bergie: Can you figure out *any* workaround?
[10:55] torben: you won't like this: i'm really out of ideas right now
[10:56] torben: this segfault has so much randomness that i can't even track it down to a specific constellation
[10:56] bergie: what if I just surround MidCOM with one mgd_auth_midgard("admin") call?
[10:56] torben: just 5 minutes ago, i registered for an event with a new account creation on the cached m-t host without the slightest problem, now i'm getting segfaults again
[10:56] bergie: would n.n.reg then skip its own auth calls?
[10:57] torben: ?
[10:57] torben: i don't think that the auth calls are the root of this
[10:57] torben: all these auth calls work
[10:57] torben: the segfaults are at the end of the request
[10:57] torben: midcom completely runs through, 100%. db-changes, everything
[10:57] torben: look at this:
[10:57] torben: Sep 01 09:56:45 [debug] midcom_helper__cache: Sent HEader: ETag: d41d8cd98f00b204e9800998ecf8427e
[10:57] torben: Sep 01 09:56:45 [debug] midcom_helper__cache: We are on no_cache, flushing output buffer and exitting
[10:57] torben: Sep 01 09:56:45 [debug] midcom_helper__cache: END OF MIDCOM REQUEST
[10:57] bergie: yeah, but in all cases of segfaults in Midgard that I've seen, mgd_auth_midgard() has been the root cause
[10:57] torben: [Wed Sep 1 09:56:45 2004] [notice] child pid 17673 exit signal Segmentation fault (11)
[10:58] torben: if you take mgd_auth calls out of this component, it won't work either
[10:58] bergie: how many times do you mgd_auth_midgard() or mgd_unsetuid() in those page views?
[10:58] torben: at most twice as far as i see it.
[10:59] torben: once for authenticating the user that might be logged in, a second time for doing the actual db changes
[10:59] bergie: that can be the problem. OpenPSA has a Midgard segfault, too
[10:59] bergie: or at least had
[10:59] torben: i can't disable these auth's
[10:59] bergie: When we update an OpenPSA user record Midgard crashes quite soon. So the page just calls flush(); exit(); after the update
[11:00] torben: well, let me put it that way
[11:00] torben: if i disable all auth calls in the component, you have to ensure that the component on-site runs with the privileges it requires to create persons and events in the system
[11:00] torben: which means sg-admin privileges essentially
[11:01] torben: i just can't exit in the midst of the request, your customer does want a user interface, does he?
[11:02] bergie: obviously. but you could try making a redirect and quitting
[11:02] torben: *phone*
[11:02] Piotras joined the chat room.
[11:02] Piotras: hi all
[11:03] bergie: hi Piotras
[11:03] Piotras: bergie: there is old copyright info for midgard php module
[11:03] bergie: Piotras: I just remembered that OpenPSA has this same crash
[11:03] bergie: Piotras: also when we edit an user
[11:04] torben: re
[11:04] torben: is just writing a mail to dev
[11:04] bergie: 36: mgd_auth_midgard( $system_user, $system_pass, 0 );
[11:04] bergie: 37: $midgard = mgd_get_midgard();
[11:04] bergie: 50: if (is_object($person)) {
[11:04] bergie: 51: $res = $person->update();
[11:04] Piotras: bergie: should I point &copy; 2004 to midgard community?
[11:04] bergie: yep
[11:05] bergie: and in the snippet that called this snippet, we just have flush();
[11:05] Piotras: ok , I will do , by now I have mess in sources
[11:06] Piotras: bergie: I am not sure if what I wrote is 100% true , but what I found with google is exactly the same as midgard segfaults
[11:07] Piotras: I am afraid it is "unsolutionable"
[11:07] bergie: arg
[11:07] bergie: so what can we do?
[11:08] torben: bergie: i'm just checking this flush() / exit() combo
[11:09] torben: bergie: that _might_ be a workaround here
[11:09] torben: bergie: i think that's it
[11:09] torben: what happens now is:
[11:09] bergie: torben: basically, on those page views where you update a person record, you should only output something very simple, like redirect and *not* make *any* Midgard queries after the update()
[11:09] torben: 1. midcom processes the request, with or without buffering should be irrelevant here
[11:10] torben: 2. midcom comes to its end, the cache is updated (or not, depending on settings)
[11:10] torben: up to this point we are in a state like this
[11:10] torben: output has been generated, and is somewhere in the "queue" between php and apache, but not yet sent to the client.
[11:11] torben: up to now, the system would simply exit and segfault there, with this code still in this "cached" state, meaning not sent to the client
[11:11] torben: if you now insert a flush() before the exit(), you get this:
[11:11] torben: 3. the flush call forces php/apache to put the component's output into the network up to the client. This is essentially a blocking call until the client has received and confirmed the data, as far as i understand flush().
[11:12] torben: So this might be a DOS condition in some cases where the client doesn't confirm received data
[11:12] torben: but
[11:12] torben: just this will give us a unique advantage
[11:12] torben: if PHP/Apache segfaults after this line, we are sure that all data that has been generated so far makes it to the client.
[11:13] Piotras: torben: look at my mail at dev
[11:13] torben: the proof i had just now: three requests, each with segfaults, where the client actually got the generated data, so that the client didn't even "see" that the server had much trouble
[11:13] Piotras: torben: about exit
[11:14] bergie: torben: ok, so flush(); solves this?
[11:14] torben: hm
[11:14] torben: i think so, yes
[11:14] torben: the trick now is, that i have to find all places where we exit()
[11:15] bergie: well, that at least buys us time to work on this segfault, as I can hopefully get the angry client off my back
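A minimal sketch of the flush()-before-exit() workaround described above, for illustration only. send_output_now() is a hypothetical helper name, not a MidCOM or Midgard function; flush(), ob_get_level() and ob_end_flush() are standard PHP. Note that flush() only asks PHP to hand its output to the web server, so whether the bytes reach the client before a later crash still depends on the server's own buffering.

```php
<?php
// Hypothetical helper (not part of MidCOM or Midgard): push everything
// generated so far towards the client before the request ends.
function send_output_now()
{
    $flushed = 0;
    // Close PHP-level output buffers first; otherwise flush() has nothing
    // to push. Stop if a buffer refuses to close, so this cannot loop forever.
    while (ob_get_level() > 0 && @ob_end_flush()) {
        $flushed++;
    }
    // Ask PHP to hand its pending output to the web server now.
    flush();
    return $flushed;
}

// Assumed usage at the end of a page view that has just updated a record:
//   $res = $person->update();
//   echo $page_html;      // keep the output simple, no further Midgard queries
//   send_output_now();
//   exit();
```

Every place that currently calls exit() would need the helper call in front of it, which is the search torben mentions.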
[11:15] torben: i fear, that it doesn't work on Location http redirects right now
[11:15] Piotras: torben: I think that we could check sources to call mysql_free_result everywhere it is needed, and not call mysql_free_result at the end of the request
[11:15] bergie: torben: then we must JS redirect, I guess
[11:16] torben: sounds sensible
[11:16] torben: bergie: can use html here, no need for jS
[11:16] bergie: ok
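A sketch of the plain-HTML redirect mentioned here, assuming a meta refresh page is what is meant. html_redirect_page() and the example URL are hypothetical; htmlspecialchars() is standard PHP. Because the redirect is part of the page body rather than a Location header, it can be sent out with flush() like any other output.

```php
<?php
// Hypothetical helper: build a tiny page that redirects via HTML only,
// so no Location header and no JavaScript are needed.
function html_redirect_page($url)
{
    // Escape the target so it is safe inside HTML attributes.
    $safe = htmlspecialchars($url, ENT_QUOTES);
    return '<html><head>'
         . '<meta http-equiv="refresh" content="0; url=' . $safe . '">'
         . '</head><body>'
         . '<a href="' . $safe . '">Continue</a>'
         . '</body></html>';
}

// Assumed usage; the URL is made up for illustration:
//   echo html_redirect_page('/registration/done.html');
//   flush();
//   exit();
```

The visible link is a fallback for browsers that ignore the refresh.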
[11:16] torben: bergie: actually, i think we have a different problem here
[11:17] torben: we do, yes
[11:17] torben: brb
[11:17] bergie: wonders whether, since this seems to be a Zend engine bug, a PHP5 upgrade would fix it
[11:18] bergie: torben: ??
[11:18] Piotras: bergie: BTW , did you noticed somehow my scandinavian languages skills? :)
[11:18] bergie: Piotras: yes, the "perkele" comment was impressive ;-)
[11:18] Piotras: :-D
[11:18] bergie: torben: I'd like to mail something about this to the client
[11:18] Piotras: bergie: look on google , the same segfault with PHP5
[11:19] bergie: ouch
[11:19] Piotras: very random ghost segfault
[11:19] bergie: Lets switch to .Net ;-)
[11:19] Piotras: ;)
[11:19] bergie: then we could at least blame Microsoft instead of these "shabby Open Source guys"
[11:20] Piotras: bergie: I can try to look for a solution this week, but that would delay the release
[11:20] Piotras: lol
[11:20] torben: bergie: .net at least has a decent debugger and safe garbage collection
[11:20] bergie: Piotras: thanks!
[11:20] torben: bergie: as for the flush solution
[11:21] torben: it works in cases where we have a segfault at the end of the request
[11:21] bergie: but?
[11:21] torben: i'm just trying to create a new account and getting a sf there
[11:21] torben: the browser apparently follows the redirect, as far as i see it, and the new request sf's after the datamanager has created a lock
[11:21] torben: i just don't know where exactly
[11:23] bergie: m-p.org is down?
[11:23] torben: ?
[11:23] bergie: hmm... cache for front page returned nothing. now it is ok
[11:23] torben: *args*
[11:23] torben: runs away screaming
[11:25] torben: i have an emergency
[11:25] torben: is off
[11:25] torben: bbl later this day
[11:25] torben left the chat room. ("Leaving")
[11:26] Kaukola: bergie: see you soon...
[11:26] Kaukola left the chat room.
<snip>
[11:40] bergie: Piotras: do you still have the link to the stuff you found on google about this segfault?
[11:41] bergie: Piotras: http://bergie.iki.fi/blog/2004/2004-09-01-000.html
[11:44] Piotras: http://aspn.activestate.com/ASPN/Mail/Message/php-Dev/2027706
[11:44] Piotras: compare backtrace to these ones reported by Torben and me
[11:45] Piotras: http://groups.google.pl/groups?q=mysql_free_result+segfault&hl=pl&lr=&ie=UTF-8&selm=c34pre%241j1o%241%40FreeBSD.csie.NCTU.edu.tw&rnum=1