Exceptional Slowness adding new Devices

mmccaugh
Cacti User
Posts: 92
Joined: Mon Apr 09, 2018 1:37 pm

Exceptional Slowness adding new Devices

#1 Post by mmccaugh » Wed Jul 17, 2019 3:35 pm

I have been looking at this for an hour or so, and think this 'might' be a side effect of the fix for Issue 2632 some months back.

When adding a new device, there is a very long delay between clicking "Create" and seeing the "device added successfully" dialogue. It ranges from 5 to 8 minutes in my testing, and the symptoms vary widely. I increased the OIDs per get, which did speed up the SNMP portion (as seen in tcpdump) but did not substantively affect the overall setup time, which remained well above 6 minutes.

I do see a large number of packets exchanged for SQL during this setup, though it is impossible to know which are related to this (I plan to turn on SQL Debugging next to look at what is happening in the background).

My question, however, is whether this slowness could be related to the fix we deployed for the issue noted above. If the underlying framework expects to write this locally to the poller rather than back to the main node, that could certainly explain it. But that raises the question of whether the entire issue was resolved in 1.2.4. (In Issue #2632 I noted that other tables were not being replicated, which was breaking things; I am curious whether these were identified and sorted out, or otherwise addressed, in 1.2.4.)

Or perhaps I am connecting two entirely unrelated issues. If there are suggestions for troubleshooting this, I am open to them, but this appears to hinge on data transfer between the remote poller and the head server.

mmccaugh
Cacti User
Posts: 92
Joined: Mon Apr 09, 2018 1:37 pm

Re: Exceptional Slowness adding new Devices

#2 Post by mmccaugh » Wed Jul 17, 2019 3:54 pm

I turned on MySQL logging, and as soon as I add the new device I see:

37 Query DELETE FROM host_graph WHERE host_id = '667' AND graph_template_id = '22'
37 Query DELETE FROM host_graph WHERE host_id = '667' AND graph_template_id = '22'
37 Query DELETE FROM host_graph WHERE host_id = '667' AND graph_template_id = '22'
190717 21:49:37 37 Query DELETE FROM host_graph WHERE host_id = '667' AND graph_template_id = '22'
37 Query DELETE FROM host_graph WHERE host_id = '667' AND graph_template_id = '22'
37 Query DELETE FROM host_graph WHERE host_id = '667' AND graph_template_id = '22'
37 Query DELETE FROM host_graph WHERE host_id = '667' AND graph_template_id = '22'

I don't even know how many of these there were, but it was thousands; they scrolled for several minutes before another poller cycle started and flooded the logs. I see it run a delete for the exact same host_id and graph_template_id over and over, then the graph_template_id changes and it runs a bunch more. The poller cycle finished and it was still running these deletes.

I'm not sure what it is doing yet, but it seems like a single "DELETE FROM host_graph WHERE host_id = '667'" would suffice, or, if we are going to cycle through graph template IDs, a single delete per ID.

Can confirm the moment this finished the UI showed the host added successfully dialogue.
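
For illustration, the "one delete would suffice" idea above can be sketched as follows. This is a minimal sketch, not Cacti code: it uses Python's sqlite3 module with a toy two-column host_graph table, and the helper name delete_host_graph_rows is hypothetical. It collapses a list of repeated graph template IDs and issues one DELETE for the host instead of one per repeated ID.

```python
import sqlite3


def delete_host_graph_rows(conn, host_id, template_ids):
    """Hypothetical helper: delete each (host_id, graph_template_id) pair once.

    Repeated IDs are collapsed first, then removed with a single statement.
    """
    unique_ids = sorted(set(template_ids))
    if not unique_ids:
        return 0
    placeholders = ",".join("?" for _ in unique_ids)
    cur = conn.execute(
        "DELETE FROM host_graph WHERE host_id = ? "
        f"AND graph_template_id IN ({placeholders})",
        [host_id, *unique_ids],
    )
    return cur.rowcount


# Toy table (an assumption; only the two columns seen in the log are modelled).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE host_graph (host_id INTEGER, graph_template_id INTEGER)")
conn.executemany(
    "INSERT INTO host_graph VALUES (?, ?)",
    [(667, 22), (667, 23), (668, 22)],
)

# IDs as they might arrive from a query that returns the same ID many times.
repeated_ids = [22] * 7 + [23] * 3
deleted = delete_host_graph_rows(conn, 667, repeated_ids)
remaining = conn.execute(
    "SELECT host_id, graph_template_id FROM host_graph"
).fetchall()
print(deleted, remaining)
```

With ten repeated IDs the sketch still sends a single statement, and rows belonging to other hosts are left alone.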

mmccaugh
Cacti User
Posts: 92
Joined: Mon Apr 09, 2018 1:37 pm

Re: Exceptional Slowness adding new Devices

#3 Post by mmccaugh » Wed Jul 17, 2019 4:32 pm

https://github.com/Cacti/cacti/issues/2742

I am not sure whether this is related to the issue I am seeing; I am going to test. The delete I am seeing appears to be issued from api_device.php, as part of:


        /* remove unused graph templates not assigned to the device template */
        $unused_graph_templates = db_fetch_assoc_prepared('SELECT
                hg.graph_template_id AS id, gt.name, result.gtid
                FROM host_graph AS hg
                LEFT JOIN graph_templates AS gt
                ON gt.id=hg.graph_template_id
                LEFT JOIN (
                        SELECT DISTINCT graph_template_id AS gtid
                        FROM graph_local AS gl
                        WHERE gl.host_id = ?
                        AND snmp_query_id = 0
                        UNION
                        SELECT DISTINCT graph_template_id AS gtid
                        FROM host_template_graph AS htg
                        WHERE htg.host_template_id = ?
                ) AS result
                ON hg.graph_template_id=result.gtid
                WHERE gt.id NOT IN (SELECT graph_template_id FROM snmp_query_graph)
                HAVING gtid IS NULL
                ORDER BY gt.name',
                array($host_id, $host_template_id)
        );

        if (cacti_sizeof($unused_graph_templates)) {
                foreach ($unused_graph_templates as $unused_graph_template) {
                        db_execute_prepared('DELETE
                                FROM host_graph
                                WHERE host_id = ?
                                AND graph_template_id = ?',
                                array($host_id, $unused_graph_template['id']));

                        if (($rcnn_id = poller_push_to_remote_db_connect($host_id)) !== false) {
                                db_execute_prepared('DELETE
                                        FROM host_graph
                                        WHERE host_id = ?
                                        AND graph_template_id = ?',
                                        array($host_id, $unused_graph_template['id']), true, $rcnn_id);
                        }
                }
        }
}

But that is only at a glance. While that explains the multiple deletes by both host_id and graph_template_id, it doesn't explain why each delete runs over and over again. This array doesn't appear to be touched in the published fix, but I am curious whether the variables that were fixed also resolved this issue.

I will know soon. If it doesn't, I will keep digging.

Osiris
Cacti Pro User
Posts: 863
Joined: Mon Jan 05, 2015 10:10 am

Re: Exceptional Slowness adding new Devices

#4 Post by Osiris » Wed Jul 17, 2019 4:47 pm

That's quite odd, because it appears to be in a delete or removal routine. I know there was a big problem with automation that has since been fixed, so I would suggest you upgrade ASAP.
Before history, there was a paradise, now dust.

mmccaugh
Cacti User
Posts: 92
Joined: Mon Apr 09, 2018 1:37 pm

Re: Exceptional Slowness adding new Devices

#5 Post by mmccaugh » Wed Jul 17, 2019 4:59 pm

Here is the fix I just tested. This is the original query, with static values substituted:


SELECT hg.graph_template_id AS id, gt.name, result.gtid FROM host_graph AS hg
LEFT JOIN graph_templates AS gt ON gt.id=hg.graph_template_id
LEFT JOIN (SELECT DISTINCT graph_template_id AS gtid FROM graph_local AS gl WHERE gl.host_id = 669 AND snmp_query_id = 0 UNION SELECT DISTINCT graph_template_id AS gtid FROM host_template_graph AS htg WHERE htg.host_template_id = 1)
AS result
ON hg.graph_template_id=result.gtid WHERE gt.id NOT IN (SELECT graph_template_id FROM snmp_query_graph) HAVING gtid IS NULL
ORDER BY gt.name;

And this is the same query with DISTINCT added to the outer SELECT:

SELECT DISTINCT hg.graph_template_id AS id, gt.name, result.gtid FROM host_graph AS hg
LEFT JOIN graph_templates AS gt ON gt.id=hg.graph_template_id
LEFT JOIN (SELECT DISTINCT graph_template_id AS gtid FROM graph_local AS gl WHERE gl.host_id = 669 AND snmp_query_id = 0 UNION SELECT DISTINCT graph_template_id AS gtid FROM host_template_graph AS htg WHERE htg.host_template_id = 1)
AS result
ON hg.graph_template_id=result.gtid WHERE gt.id NOT IN (SELECT graph_template_id FROM snmp_query_graph) HAVING gtid IS NULL
ORDER BY gt.name;

Ignore the static values. The original, non-DISTINCT select returns 3161 rows on my system; the DISTINCT version returns 65.

The same host that took 6+ minutes to add before was added in ~15 seconds after this fix.

mmccaugh
Cacti User
Posts: 92
Joined: Mon Apr 09, 2018 1:37 pm

Re: Exceptional Slowness adding new Devices

#6 Post by mmccaugh » Wed Jul 17, 2019 5:12 pm

OK, I think I know what was happening here. I've had this bite me a number of times over the years with joins: rows get duplicated, multiplying with the number of rows the outer select matches. So someone with fewer rows in the outer select would probably notice this far less (the same goes for a faster SQL server or a faster link between servers). That would explain why this seemed to get worse as we grew the number of templates and devices in here.

I think the outer select should be a DISTINCT, but will leave that up to you guys.

My issue is definitely fixed, have added multiple hosts now and the responsiveness is unbelievably better.
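
The row multiplication described above can be reproduced on a toy schema. This is a hedged sketch, not the Cacti schema: it uses Python's sqlite3 module, the tables carry only the columns the quoted query touches, and the template names and host count are made up. Two simplifications are assumed: the HAVING condition is moved into WHERE to keep the query SQLite-friendly, and graph_local and snmp_query_graph are left empty. One possible reading of the quoted query is that the outer select reads host_graph without a host_id filter, so each unused template comes back once per host that has it; the sketch illustrates that reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE graph_templates (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE host_graph (host_id INTEGER, graph_template_id INTEGER);
CREATE TABLE host_template_graph (host_template_id INTEGER, graph_template_id INTEGER);
CREATE TABLE graph_local (host_id INTEGER, graph_template_id INTEGER, snmp_query_id INTEGER);
CREATE TABLE snmp_query_graph (graph_template_id INTEGER);
""")

# Made-up data: 3 graph templates, 50 hosts that each carry all 3,
# and a device template (id 1) that only includes graph template 1.
HOSTS = 50
conn.executemany(
    "INSERT INTO graph_templates VALUES (?, ?)",
    [(1, "cpu"), (2, "load"), (3, "memory")],
)
conn.execute("INSERT INTO host_template_graph VALUES (1, 1)")
conn.executemany(
    "INSERT INTO host_graph VALUES (?, ?)",
    [(h, t) for h in range(1, HOSTS + 1) for t in (1, 2, 3)],
)

QUERY = """
SELECT {distinct} hg.graph_template_id AS id, gt.name, result.gtid
FROM host_graph AS hg
LEFT JOIN graph_templates AS gt ON gt.id = hg.graph_template_id
LEFT JOIN (
    SELECT DISTINCT graph_template_id AS gtid FROM graph_local AS gl
    WHERE gl.host_id = ? AND snmp_query_id = 0
    UNION
    SELECT DISTINCT graph_template_id AS gtid FROM host_template_graph AS htg
    WHERE htg.host_template_id = ?
) AS result ON hg.graph_template_id = result.gtid
WHERE gt.id NOT IN (SELECT graph_template_id FROM snmp_query_graph)
  AND result.gtid IS NULL
ORDER BY gt.name
"""


def unused_templates(distinct):
    """Run the simplified query for host 1 / device template 1."""
    sql = QUERY.format(distinct="DISTINCT" if distinct else "")
    return conn.execute(sql, (1, 1)).fetchall()


plain = unused_templates(False)
deduped = unused_templates(True)
print(len(plain), len(deduped))
```

In this toy data the plain query returns one row per host for each of the two unused templates, while the DISTINCT version returns each unused template once. Since the PHP loop quoted earlier issues a DELETE for every returned row, the row count translates directly into statement count.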

Osiris
Cacti Pro User
Posts: 863
Joined: Mon Jan 05, 2015 10:10 am

Re: Exceptional Slowness adding new Devices

#7 Post by Osiris » Thu Jul 18, 2019 8:31 pm

Yeah, that is a big catch. I cannot believe the pace of change in the last few years, and little buggers like this are becoming harder and harder to find. Looking forward to all the cool features on the issues list being delivered in the coming years.
Before history, there was a paradise, now dust.

mmccaugh
Cacti User
Posts: 92
Joined: Mon Apr 09, 2018 1:37 pm

Re: Exceptional Slowness adding new Devices

#8 Post by mmccaugh » Mon Jul 22, 2019 8:33 am

Oh no complaints from me! I cannot believe the growth either, I have been using Cacti for a long time and the functionality now vs a decade ago is astonishing!

Assuming it hasn't been done already, I will submit this change to Git today so that it perhaps makes its way into the next build!
