Grouper / GRP-2580

Better detection of real vs. non-real issues when logging composite-related membership adds


Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • None
    • 2.4.0
    • API, grouperLoader
    • None

    Description

      Discussion from Slack:
      I think if you have a loader job that can be updating both factors of a composite at about the same time and either would result in a new composite membership, you can get an error like this. I think it's retried by the loader though.
      Michael Gettes  20 hours ago

      The group App:Two-Factor:Enrolled is a factor in about 18 other composite groups. Is it possible to do something better with this error? E.g., detect whether it is part of a composite and, if so, just put out a one-line INFO record, or suppress the error if it can be determined that it is not a real error?
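
      One way to act on this suggestion is to choose the log level from the exception itself. The Java sketch below assumes (unverified) that the failed add surfaces with a java.sql.SQLException somewhere in its cause chain, and treats SQLSTATE class 23 (the SQL-standard "integrity constraint violation" class) on a composite owner as benign. The class and method names are hypothetical, not Grouper API, and java.util.logging stands in for whatever logger the loader actually uses.

```java
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper (not Grouper API): decides how loudly to log a failed membership add.
final class CompositeAddErrorClassifier {

  private static final Logger LOG = Logger.getLogger("hypothetical.compositeAdds");

  /** True if any exception in the cause chain carries SQLSTATE class 23 (integrity constraint violation). */
  static boolean isConstraintViolation(Throwable t) {
    int depth = 0;
    // Walk the cause chain; the depth cap guards against cyclic causes.
    for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
      if (cur instanceof SQLException) {
        String state = ((SQLException) cur).getSQLState();
        if (state != null && state.startsWith("23")) {
          return true;
        }
      }
    }
    return false;
  }

  /** A constraint violation on a composite add is treated as benign (INFO); anything else stays SEVERE. */
  static Level levelFor(Throwable t, boolean targetIsComposite) {
    return targetIsComposite && isConstraintViolation(t) ? Level.INFO : Level.SEVERE;
  }

  static void logAddFailure(String groupName, Throwable t, boolean targetIsComposite) {
    if (levelFor(t, targetIsComposite) == Level.INFO) {
      // One line, no stack trace: the row most likely already exists from a concurrent update.
      LOG.info("Composite membership already present for " + groupName + " (concurrent update?): " + t);
    } else {
      LOG.log(Level.SEVERE, "Membership add failed for " + groupName, t);
    }
  }

  public static void main(String[] args) {
    Exception dup = new RuntimeException("could not insert membership",
        new SQLException("duplicate key value violates unique constraint", "23505"));
    logAddFailure("hypothetical:composite", dup, true);   // one-line INFO
    logAddFailure("hypothetical:plainGroup", dup, false); // SEVERE with stack trace
  }
}
```

      A constraint violation could occasionally point at a genuine problem, which is why the sketch only downgrades it when the target is a composite rather than suppressing it entirely.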
      Chris Hyzer  18 hours ago

      I have bad memberships every day at Penn. It would be nice if this were addressed better, but unless there is an easy fix, I think we need to focus on 2.5 and pspng.
      Shilen Patel  17 hours ago

      I think we have a Jira issue about daemon job dependencies. I guess that's similar.
      Shilen Patel  17 hours ago

      Btw, fwiw, a single loader job can easily cause the error that Michael reported.
      Chris Hyzer  16 hours ago

      @Shilen Patel, maybe we could synchronize on the composite owner so the same JVM is less likely to do this (multiple threads editing the same composite)? Something like that?
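
      A minimal sketch of synchronizing on the composite owner inside one JVM, assuming each composite owner can be identified by a string id (the ids and class name here are hypothetical). It only serializes threads in the same JVM; other loader or daemon JVMs can still race, so the DB constraint remains the real guard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch (not Grouper API): one in-JVM lock per composite owner.
final class CompositeOwnerLocks {
  // Entries are never removed, so the map grows to the number of distinct owners seen.
  private static final ConcurrentMap<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

  /** Runs work while holding the lock for compositeOwnerId; other owners are not blocked. */
  static <T> T withOwnerLock(String compositeOwnerId, Supplier<T> work) {
    ReentrantLock lock = LOCKS.computeIfAbsent(compositeOwnerId, id -> new ReentrantLock());
    lock.lock();
    try {
      return work.get();
    } finally {
      lock.unlock();
    }
  }

  /** Starts several threads on one owner and returns the peak number of threads inside the lock at once. */
  static int peakConcurrency(int threads, int iterations) {
    AtomicInteger active = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      Thread w = new Thread(() -> {
        for (int i = 0; i < iterations; i++) {
          withOwnerLock("hypothetical:composite", () -> {
            peak.accumulateAndGet(active.incrementAndGet(), Math::max);
            active.decrementAndGet();
            return null;
          });
        }
      });
      workers.add(w);
      w.start();
    }
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return peak.get();
  }

  public static void main(String[] args) {
    System.out.println("peak threads inside the owner lock: " + peakConcurrency(4, 10_000)); // 1
  }
}
```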
      Michael Gettes  16 hours ago

      At least in my case, with the one group being a composite factor in 18 others, I think this would stomp on a large part of the issues I see.
      Shilen Patel  16 hours ago

      So this issue is mainly that errors appearing in the logs are making it harder to see real problems?
      Michael Gettes  16 hours ago

      Yes, I think so.
      Shilen Patel  16 hours ago

      It's preventing bad memberships (due to a DB constraint).
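
      To illustrate why the logged error need not mean a bad membership, the toy model below lets two factor updates race to add the same composite member. An in-memory set stands in for the membership table and its unique constraint, and all names are hypothetical: exactly one insert succeeds, the other fails, and the final state holds one correct row.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model (not Grouper code): a unique key stands in for the membership table's DB constraint.
final class CompositeAddRace {

  static final class DuplicateMembershipException extends RuntimeException {
    DuplicateMembershipException(String key) {
      super("duplicate membership " + key);
    }
  }

  private final Set<String> rows = ConcurrentHashMap.newKeySet();

  /** Inserts a (composite, member) row; add() is atomic, so only one caller wins for a given key. */
  void insertMembership(String compositeOwner, String member) {
    String key = compositeOwner + "|" + member;
    if (!rows.add(key)) {
      throw new DuplicateMembershipException(key);
    }
  }

  /** Two factor updates race to add the same composite member; returns {rowCount, failedInserts}. */
  static int[] race() {
    CompositeAddRace db = new CompositeAddRace();
    AtomicInteger failures = new AtomicInteger();
    CountDownLatch start = new CountDownLatch(1);
    Runnable factorUpdate = () -> {
      try {
        start.await();
        db.insertMembership("hypothetical:composite", "subject1");
      } catch (DuplicateMembershipException e) {
        failures.incrementAndGet(); // the logged "error": the row already exists
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    Thread leftFactor = new Thread(factorUpdate);
    Thread rightFactor = new Thread(factorUpdate);
    leftFactor.start();
    rightFactor.start();
    start.countDown();
    try {
      leftFactor.join();
      rightFactor.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new int[] {db.rows.size(), failures.get()};
  }

  public static void main(String[] args) {
    int[] result = race();
    System.out.println("rows=" + result[0] + " failedInserts=" + result[1]); // rows=1 failedInserts=1
  }
}
```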
      Shilen Patel  16 hours ago

      And yeah, synchronizing probably helps, though if it happens to be a longer-running transaction (nested composites?), I'm not sure offhand how that would affect it.
      Chris Hyzer  10 hours ago

      Yeah, I don't know the exact cause, but maybe there is something to do there... we can think about it later. Maybe check for bad memberships on a composite after updating it? We can think of something.
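
      A post-update check could recompute what a composite should contain from its two factors and diff that against what it actually contains. The sketch assumes Grouper's three composite types (union, intersection, and complement, taken here as left factor minus right factor) and works on plain sets of subject ids; loading those sets is left out, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch (not Grouper API): diff a composite's actual members against its factors.
final class CompositeCheck {
  enum Type { UNION, INTERSECTION, COMPLEMENT }

  /** Members the composite should have, computed from its left and right factors. */
  static Set<String> expected(Type type, Set<String> left, Set<String> right) {
    Set<String> result = new HashSet<>(left);
    switch (type) {
      case UNION: result.addAll(right); break;
      case INTERSECTION: result.retainAll(right); break;
      case COMPLEMENT: result.removeAll(right); break; // assumed: left minus right
    }
    return result;
  }

  /** Members that should be in the composite but are not (a missed add). */
  static Set<String> missing(Set<String> expected, Set<String> actual) {
    Set<String> m = new TreeSet<>(expected);
    m.removeAll(actual);
    return m;
  }

  /** Members in the composite that should not be (a missed delete). */
  static Set<String> extra(Set<String> expected, Set<String> actual) {
    Set<String> e = new TreeSet<>(actual);
    e.removeAll(expected);
    return e;
  }

  public static void main(String[] args) {
    Set<String> enrolled = Set.of("alice", "bob", "carol");
    Set<String> staff = Set.of("bob", "carol", "dave");
    Set<String> actual = Set.of("bob", "dave"); // as if a racing update left the composite stale
    Set<String> exp = expected(Type.INTERSECTION, enrolled, staff);
    System.out.println("missing=" + missing(exp, actual) + " extra=" + extra(exp, actual)); // missing=[carol] extra=[dave]
  }
}
```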
      Chris Hyzer  10 hours ago

      As long as whatever holds the lock doesn't wait for a lock on other things, there won't be a deadlock. But yes, we will be careful there.
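
      The condition above is the usual deadlock rule: a thread holding one owner's lock never blocks on another. If an update ever had to hold several owner locks at once (for example the nested-composite case raised earlier), one common remedy, swapped in here as an assumption rather than anything the thread settled on, is lock ordering: every thread acquires locks in one global order. A sketch with hypothetical owner ids:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch (not Grouper API): lock ordering across several composite owners.
final class OrderedCompositeLocks {
  private static final ConcurrentMap<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

  /** Locks every owner in sorted id order, runs work, then unlocks in reverse order. */
  static void withOwnerLocks(Collection<String> ownerIds, Runnable work) {
    List<ReentrantLock> held = new ArrayList<>();
    try {
      // One global order (sorted, deduplicated ids) means no thread can wait on a lock
      // held by another thread that is itself waiting on a lock this thread holds.
      for (String id : new TreeSet<String>(ownerIds)) {
        ReentrantLock lock = LOCKS.computeIfAbsent(id, k -> new ReentrantLock());
        lock.lock();
        held.add(lock);
      }
      work.run();
    } finally {
      for (int i = held.size() - 1; i >= 0; i--) {
        held.get(i).unlock();
      }
    }
  }

  /** Two threads request the same owners in opposite orders; returns the total updates made. */
  static int runOpposingThreads(int iterations) {
    int[] counter = {0}; // only touched while both owner locks are held
    Thread t1 = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        withOwnerLocks(List.of("owner-a", "owner-b"), () -> counter[0]++);
      }
    });
    Thread t2 = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        withOwnerLocks(List.of("owner-b", "owner-a"), () -> counter[0]++);
      }
    });
    t1.start();
    t2.start();
    try {
      t1.join();
      t2.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return counter[0];
  }

  public static void main(String[] args) {
    System.out.println(runOpposingThreads(10_000)); // 20000, returned rather than hanging
  }
}
```

      Holding a single owner lock and never acquiring a second one, as described above, avoids the problem even more simply; ordering only matters once nested locking is unavoidable.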
      Shilen Patel  19 minutes ago

      There might be two different things. For the stack trace that Michael added here, I don't think there's an actual bad membership; it's just a log entry that should be ignored or logged at INFO or something. But I think the opposite race condition can happen too, where a membership doesn't get added (or removed) when it is supposed to, and the bad membership daemon fixes that. Maybe there could also be a change log consumer that quickly checks.
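
      Such a change log consumer could start by mapping each changed factor group to the composites that use it, so that one change to a group like App:Two-Factor:Enrolled (a factor in about 18 composites) fans out to a deduplicated recheck list. The event class and the factor-to-composite map below are hypothetical simplifications, not Grouper's change log API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not a Grouper change log API): fan a factor-group change out to composite rechecks.
final class FactorChangeRecheck {

  /** Hypothetical, simplified change-log event: membership of subjectId changed in groupName. */
  static final class MembershipChange {
    final String groupName;
    final String subjectId;

    MembershipChange(String groupName, String subjectId) {
      this.groupName = groupName;
      this.subjectId = subjectId;
    }
  }

  private final Map<String, List<String>> compositesByFactor; // factor group -> composites using it

  FactorChangeRecheck(Map<String, List<String>> compositesByFactor) {
    this.compositesByFactor = compositesByFactor;
  }

  /** Deduplicated "composite|subject" pairs to recheck for one batch of events. */
  Set<String> toRecheck(List<MembershipChange> batch) {
    Set<String> work = new LinkedHashSet<>();
    for (MembershipChange change : batch) {
      for (String composite : compositesByFactor.getOrDefault(change.groupName, List.of())) {
        work.add(composite + "|" + change.subjectId);
      }
    }
    return work;
  }

  public static void main(String[] args) {
    FactorChangeRecheck consumer = new FactorChangeRecheck(Map.of(
        "App:Two-Factor:Enrolled", List.of("hypothetical:compositeA", "hypothetical:compositeB"),
        "hypothetical:staff", List.of("hypothetical:compositeA")));
    List<MembershipChange> batch = List.of(
        new MembershipChange("App:Two-Factor:Enrolled", "subject1"),
        new MembershipChange("hypothetical:staff", "subject1"),
        new MembershipChange("App:Two-Factor:Enrolled", "subject1"));
    // Three events collapse into two rechecks.
    System.out.println(consumer.toRecheck(batch));
  }
}
```

      Each pair would then be compared against the composite's expected membership; batching keeps a burst of factor changes from triggering the same recheck repeatedly.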

          People

            Chris Hyzer (upenn.edu)
            Michael Gettes (ufl.edu)
            Votes: 0
            Watchers: 1