Details
-
Improvement
-
Resolution: Unresolved
-
Minor
-
None
-
2.4.0
-
None
Description
discussion from Slack:
I think if you have a loader job that can be updating both factors of a composite at about the same time and either would result in a new composite membership, you can get an error like this. I think it's retried by the loader though.
Michael Gettes 20 hours ago
The group App:Two-Factor:Enrolled is a factor in about 18 other composite groups. Is it possible to do something better with this error? For example, detect whether the group is part of a composite and, if so, just emit a one-line INFO record, or suppress the error if it can be determined to be a non-error?
Chris Hyzer 18 hours ago
I have bad memberships every day at Penn. It would be nice if this were addressed better, but unless there is an easy fix we need to focus on 2.5 and pspng, I think.
Shilen Patel 17 hours ago
I think we have a jira about daemon job dependencies. I guess that's similar.
Shilen Patel 17 hours ago
btw, fwiw, a single loader job can easily cause the error that Michael reported.
Chris Hyzer 16 hours ago
@Shilen Patel, maybe we could synchronize on the composite owner so the same JVM is less likely to do this (multiple threads editing the same composite)? Something like that?
Michael Gettes 16 hours ago
At least in my case, with the one group being a composite member of 18 others, I think this would stomp on a large part of the issues I see.
Shilen Patel 16 hours ago
So the issue is mainly that these errors appear in the logs and make it harder to see real problems?
Michael Gettes 16 hours ago
Yes, I think so.
Shilen Patel 16 hours ago
it's preventing bad memberships (due to a db constraint)
Shilen Patel 16 hours ago
And yeah, synchronizing probably helps, though if it happens to be a longer-running transaction (nested composites?) I'm not sure offhand how that would impact it.
Chris Hyzer 10 hours ago
Yeah, I don't know the exact cause, but maybe there is something to do there... we can think about it later. Maybe check for bad memberships on a composite after updating? We can think of something.
Chris Hyzer 10 hours ago
As long as what we are locking doesn't wait for a lock on other things, there won't be deadlock. But yes, we will be careful there.
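The idea of synchronizing on the composite owner within one JVM could be sketched roughly as follows. This is only an illustration, not Grouper code: the class `CompositeOwnerLocks`, the method `withOwnerLock`, and the use of the owner group id as the lock key are all hypothetical names chosen for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical helper (not part of Grouper): one lock per composite owner,
// so two threads in the same JVM do not edit the same composite at once.
final class CompositeOwnerLocks {

    // Lock registry keyed by an assumed owner group id.
    private static final ConcurrentHashMap<String, ReentrantLock> LOCKS =
            new ConcurrentHashMap<>();

    private CompositeOwnerLocks() {
    }

    // Runs the work while holding the lock for this composite owner only.
    static <T> T withOwnerLock(String ownerGroupId, Supplier<T> work) {
        ReentrantLock lock =
                LOCKS.computeIfAbsent(ownerGroupId, id -> new ReentrantLock());
        lock.lock();
        try {
            return work.get();
        } finally {
            // Always release, even if the membership update throws.
            lock.unlock();
        }
    }
}
```

A caller would wrap the composite membership update in `withOwnerLock(ownerId, ...)`. To stay within the deadlock condition mentioned above, the work passed in should not try to take the lock of a different owner while holding this one; nested composites would need a fixed lock order or a different approach. This only helps threads in the same JVM, not separate loader processes.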
Shilen Patel 19 minutes ago
There might be two different things. For the stack that Michael added here, I don't think there's an actual bad membership. It's just a log entry that should be ignored or logged as info or something. But I think the opposite race condition can happen too, where a membership doesn't get added (or removed) when it is supposed to and the bad membership daemon fixes that. But maybe there can be a change log consumer that quickly checks as well.
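For the first of the two cases, a rough sketch of "log as INFO when it is a composite race" might look like the following. Everything here is an assumption for illustration: the class `CompositeRaceLogLevel` and its methods are hypothetical, `java.util.logging.Level` stands in for whatever logging framework is actually used, and the check assumes the error surfaces as a `java.sql.SQLException` whose SQLState starts with "23" (the standard integrity-constraint-violation class) somewhere in the cause chain. The caller is assumed to already know whether the group is a factor of a composite.

```java
import java.sql.SQLException;
import java.util.logging.Level;

// Hypothetical helper (not part of Grouper): pick a log level for a failed
// membership write depending on whether it looks like a harmless race.
final class CompositeRaceLogLevel {

    private CompositeRaceLogLevel() {
    }

    // True if any cause in the chain is an SQLException whose SQLState is in
    // class "23" (integrity constraint violation).
    static boolean isConstraintViolation(Throwable error) {
        int depth = 0;
        // The depth limit guards against a cyclic cause chain.
        for (Throwable c = error; c != null && depth < 50; c = c.getCause(), depth++) {
            if (c instanceof SQLException) {
                String state = ((SQLException) c).getSQLState();
                if (state != null && state.startsWith("23")) {
                    return true;
                }
            }
        }
        return false;
    }

    // INFO only when the group is a composite factor AND the failure is a
    // constraint violation; everything else stays at SEVERE.
    static Level levelFor(Throwable error, boolean groupIsCompositeFactor) {
        return (groupIsCompositeFactor && isConstraintViolation(error))
                ? Level.INFO
                : Level.SEVERE;
    }
}
```

This only covers the noisy-log case. The opposite race, where a membership is not added or removed when it should be, would still need the bad membership daemon or a follow-up check.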