[PATCH 00/20] buildman: Add distributed builds
From: Simon Glass <sjg@chromium.org>

This series adds distributed build support to buildman, allowing board
builds to be spread across a pool of remote machines over SSH.

The first few patches prepare Builder for distributed use: adding
parameters for custom thread classes, signal handling, lazy worktree
setup, and splitting build_boards() so remote workers can queue boards
incrementally.

The machine module probes remote hosts over SSH to collect their
capabilities (CPUs, memory, load, toolchains) and manages toolchain
version checking and fetching.

The worker module runs on remote machines in --worker mode, accepting
JSON commands on stdin and streaming results on stdout. It reuses
Builder and BuilderThread with a custom subclass that sends results over
the SSH pipe instead of writing to disk.

The boss module manages SSH connections to workers, handling startup,
source push via git, demand-driven board dispatch, result collection
with timeouts, and clean shutdown on Ctrl-C.

The wire-up patch connects everything through control.py with --dist,
--use-machines and --no-local flags.
Simon Glass (20):
  u_boot_pylib: Add stdin_data support to run_pipe()
  u_boot_pylib: Support passing modules to run_test_suites()
  buildman: Remove unused imports from builder.py
  buildman: Fix pylint warnings in test_builder.py
  buildman: Fix import order in control.py
  buildman: Initialise _timestamps to an empty deque
  buildman: Add thread_class param to Builder
  buildman: Add handle_signals param to Builder
  buildman: Add lazy thread setup to Builder
  buildman: Split build_boards() into two methods
  buildman: Add distributed-build attributes to Builder
  buildman: Cache the kconfig-changed check per commit
  buildman: Add dynamic job-count setting
  buildman: Pass NPROC to make for LTO parallelism
  buildman: Add remote machine probing for distributed builds
  buildman: Install toolchains on remote machines
  buildman: Add worker mode for distributed builds
  buildman: Add boss module for driving remote workers
  buildman: Wire up distributed builds with WorkerPool
  buildman: Document distributed builds in the README

 tools/buildman/boss.py          | 1507 ++++++++++++++++++
 tools/buildman/bsettings.py     |   10 +-
 tools/buildman/builder.py       |  228 ++-
 tools/buildman/builderthread.py |   94 +-
 tools/buildman/buildman.rst     |  132 ++
 tools/buildman/cmdline.py       |   30 +
 tools/buildman/control.py       |  313 +++-
 tools/buildman/func_test.py     |    9 +-
 tools/buildman/machine.py       |  923 +++++++++++
 tools/buildman/main.py          |    6 +
 tools/buildman/test.py          |    3 +
 tools/buildman/test_boss.py     | 2645 +++++++++++++++++++++++++++++++
 tools/buildman/test_builder.py  |  109 +-
 tools/buildman/test_machine.py  | 1080 +++++++++++++
 tools/buildman/test_worker.py   |  896 +++++++++++
 tools/buildman/worker.py        |  985 ++++++++++++
 tools/u_boot_pylib/command.py   |   13 +-
 tools/u_boot_pylib/test_util.py |   49 +-
 18 files changed, 8877 insertions(+), 155 deletions(-)
 create mode 100644 tools/buildman/boss.py
 create mode 100644 tools/buildman/machine.py
 create mode 100644 tools/buildman/test_boss.py
 create mode 100644 tools/buildman/test_machine.py
 create mode 100644 tools/buildman/test_worker.py
 create mode 100644 tools/buildman/worker.py

-- 
2.43.0

base-commit: cb53f60f48853713f398b86a12702304c82bdde7
branch: bmt
From: Simon Glass <sjg@chromium.org>

Add a stdin_data parameter to run_pipe() so callers can pipe string data
to the first command's stdin without needing a temporary file.

When stdin_data is provided, stdin is set to PIPE for the subprocess and
the data is passed to communicate_filter() as input_buf, which already
handles writing to stdin.

Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/u_boot_pylib/command.py | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/u_boot_pylib/command.py b/tools/u_boot_pylib/command.py
index 6b3f9fe59bf..370713bdf41 100644
--- a/tools/u_boot_pylib/command.py
+++ b/tools/u_boot_pylib/command.py
@@ -66,7 +66,7 @@ class CommandResult:
 
 def run_pipe(pipe_list, infile=None, outfile=None, capture=False,
              capture_stderr=False, oneline=False, raise_on_error=True, cwd=None,
-             binary=False, output_func=None, **kwargs):
+             binary=False, output_func=None, stdin_data=None, **kwargs):
     """
     Perform a command pipeline, with optional input/output filenames.
 
@@ -86,6 +86,7 @@ def run_pipe(pipe_list, infile=None, outfile=None, capture=False,
         binary (bool): True to report binary output, False to use strings
         output_func (function): Output function to call with each output
             fragment (if it returns True the function terminates)
+        stdin_data (str or None): Data to send to the first command's stdin
         **kwargs: Additional keyword arguments to cros_subprocess.Popen()
     Returns:
         CommandResult object
@@ -113,6 +114,8 @@ def run_pipe(pipe_list, infile=None, outfile=None, capture=False,
             kwargs['stdin'] = last_pipe.stdout
         elif infile:
             kwargs['stdin'] = open(infile, 'rb')
+        elif stdin_data:
+            kwargs['stdin'] = cros_subprocess.PIPE
         if pipeline or capture:
             kwargs['stdout'] = cros_subprocess.PIPE
         elif outfile:
@@ -131,8 +134,14 @@ def run_pipe(pipe_list, infile=None, outfile=None, capture=False,
             return result.to_output(binary)
 
     if capture:
+        if stdin_data and isinstance(stdin_data, str):
+            input_buf = stdin_data.encode('utf-8')
+        elif stdin_data:
+            input_buf = stdin_data
+        else:
+            input_buf = b''
         result.stdout, result.stderr, result.combined = (
-            last_pipe.communicate_filter(output_func))
+            last_pipe.communicate_filter(output_func, input_buf))
         if result.stdout and oneline:
             result.output = result.stdout.rstrip(b'\r\n')
         result.return_code = last_pipe.wait()
-- 
2.43.0
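The effect of stdin_data can be seen in this hedged sketch using plain subprocess, so it stands alone without u_boot_pylib; per the patch, the real call would look something like `command.run_pipe([['tr', 'a-z', 'A-Z']], capture=True, stdin_data='hi')`, with no temporary file for the input.

```python
import subprocess
import sys

# Sketch: pipe a string straight into a child's stdin, as stdin_data does.
# The child here just upper-cases its input (a stand-in for any command).
child = [sys.executable, '-c',
         'import sys; sys.stdout.write(sys.stdin.read().upper())']
proc = subprocess.Popen(child, stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)
# communicate() plays the role of communicate_filter(..., input_buf)
out, _ = proc.communicate(input='hello\n'.encode('utf-8'))
print(out.decode(), end='')  # HELLO
```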
From: Simon Glass <sjg@chromium.org>

Allow passing a module object to run_test_suites() in addition to test
classes and doctest module name strings. When a module is passed, all
TestCase subclasses are extracted automatically.

This avoids having to enumerate every test class when registering a
test module.

Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/u_boot_pylib/test_util.py | 49 ++++++++++++++++++++++-----------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/tools/u_boot_pylib/test_util.py b/tools/u_boot_pylib/test_util.py
index b1c8740d883..90ae0cee6ce 100644
--- a/tools/u_boot_pylib/test_util.py
+++ b/tools/u_boot_pylib/test_util.py
@@ -203,24 +203,41 @@ def run_test_suites(toolname, debug, verbosity, no_capture, test_preserve_dirs,
         if isinstance(module, str) and (not test_name or test_name == module):
             suite.addTests(doctest.DocTestSuite(module))
 
-    for module in class_and_module_list:
-        if isinstance(module, str):
+    for entry in class_and_module_list:
+        if isinstance(entry, str):
             continue
-        # Test the test module about our arguments, if it is interested
-        if hasattr(module, 'setup_test_args'):
-            setup_test_args = getattr(module, 'setup_test_args')
-            setup_test_args(preserve_indir=test_preserve_dirs,
-                preserve_outdirs=test_preserve_dirs and test_name is not None,
-                toolpath=toolpath, verbosity=verbosity, no_capture=no_capture)
-        if test_name:
-            # Since Python v3.5 If an ImportError or AttributeError occurs
-            # while traversing a name then a synthetic test that raises that
-            # error when run will be returned. Check that the requested test
-            # exists, otherwise these errors are included in the results.
-            if test_name in loader.getTestCaseNames(module):
-                suite.addTests(loader.loadTestsFromName(test_name, module))
+
+        # If entry is a module, extract all TestCase subclasses from it
+        if hasattr(entry, '__file__'):
+            classes = [obj for obj in vars(entry).values()
+                       if (isinstance(obj, type)
+                           and issubclass(obj, unittest.TestCase)
+                           and obj is not unittest.TestCase)]
         else:
-            suite.addTests(loader.loadTestsFromTestCase(module))
+            classes = [entry]
+
+        for module in classes:
+            # Tell the test module about our arguments, if interested
+            if hasattr(module, 'setup_test_args'):
+                setup_test_args = getattr(module, 'setup_test_args')
+                setup_test_args(
+                    preserve_indir=test_preserve_dirs,
+                    preserve_outdirs=(test_preserve_dirs
+                                      and test_name is not None),
+                    toolpath=toolpath, verbosity=verbosity,
+                    no_capture=no_capture)
+            if test_name:
+                # Since Python v3.5 If an ImportError or
+                # AttributeError occurs while traversing a name then
+                # a synthetic test that raises that error when run
+                # will be returned. Check that the requested test
+                # exists, otherwise these errors are included in the
+                # results.
+                if test_name in loader.getTestCaseNames(module):
+                    suite.addTests(
+                        loader.loadTestsFromName(test_name, module))
+            else:
+                suite.addTests(loader.loadTestsFromTestCase(module))
 
     print(f" Running {toolname} tests ".center(70, "="))
     result = runner.run(suite)
-- 
2.43.0
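The extraction step above can be sketched in isolation. This is a hedged, self-contained example: the fake module and test class are illustrative stand-ins, not part of u_boot_pylib; a module object is recognised by its `__file__` attribute, as in the patch.

```python
import types
import unittest

# A sample TestCase, stand-in for a real buildman test class
class SampleTest(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)

# Build a fake module object holding the test class
mod = types.ModuleType('fake_tests')
mod.__file__ = '<fake>'   # run_test_suites uses __file__ to spot modules
mod.SampleTest = SampleTest

# The same comprehension the patch adds: pull out every TestCase subclass
classes = [obj for obj in vars(mod).values()
           if (isinstance(obj, type)
               and issubclass(obj, unittest.TestCase)
               and obj is not unittest.TestCase)]
print([cls.__name__ for cls in classes])  # ['SampleTest']
```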
From: Simon Glass <sjg@chromium.org>

Remove BoardStatus, ErrLine and ResultHandler imports that are not used
directly in builder.py. These are imported where they are actually
needed (outcome.py and resulthandler.py).

Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/buildman/builder.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/buildman/builder.py b/tools/buildman/builder.py
index f2cbad53c30..2c8d5c5b17b 100644
--- a/tools/buildman/builder.py
+++ b/tools/buildman/builder.py
@@ -21,10 +21,9 @@ import threading
 
 from buildman import builderthread
 from buildman.cfgutil import Config, process_config
-from buildman.outcome import (BoardStatus, DisplayOptions, ErrLine, Outcome,
+from buildman.outcome import (DisplayOptions, Outcome,
                               OUTCOME_OK, OUTCOME_WARNING, OUTCOME_ERROR,
                               OUTCOME_UNKNOWN)
-from buildman.resulthandler import ResultHandler
 from u_boot_pylib import command
 from u_boot_pylib import gitutil
 from u_boot_pylib import terminal
-- 
2.43.0
From: Simon Glass <sjg@chromium.org>

Remove the unused OUTCOME_WARNING import, prefix unused mock arguments
with an underscore, and replace unnecessary lambda wrappers around
datetime with direct references.

Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/buildman/test_builder.py | 61 ++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 28 deletions(-)

diff --git a/tools/buildman/test_builder.py b/tools/buildman/test_builder.py
index ec1e0301a6e..70a8b365f2a 100644
--- a/tools/buildman/test_builder.py
+++ b/tools/buildman/test_builder.py
@@ -4,6 +4,8 @@
 
 """Unit tests for builder.py"""
 
+# pylint: disable=W0212
+
 from datetime import datetime
 import os
 import shutil
@@ -12,7 +14,7 @@ from unittest import mock
 
 from buildman import builder
 from buildman import builderthread
-from buildman.outcome import (DisplayOptions, OUTCOME_OK, OUTCOME_WARNING,
+from buildman.outcome import (DisplayOptions, OUTCOME_OK,
                               OUTCOME_ERROR, OUTCOME_UNKNOWN)
 from buildman.resulthandler import ResultHandler
 from u_boot_pylib import gitutil
@@ -183,7 +185,7 @@ class TestPrepareThread(unittest.TestCase):
     @mock.patch.object(gitutil, 'fetch')
     @mock.patch.object(os.path, 'isdir', return_value=True)
     @mock.patch.object(builderthread, 'mkdir')
-    def test_existing_clone(self, mock_mkdir, mock_isdir, mock_fetch):
+    def test_existing_clone(self, _mock_mkdir, _mock_isdir, mock_fetch):
         """Test with existing git clone (fetches updates)"""
         terminal.get_print_test_lines()  # Clear
         self.builder._prepare_thread(0, 'clone')
@@ -196,7 +198,7 @@ class TestPrepareThread(unittest.TestCase):
     @mock.patch.object(os.path, 'isfile', return_value=True)
     @mock.patch.object(os.path, 'isdir', return_value=False)
     @mock.patch.object(builderthread, 'mkdir')
-    def test_existing_worktree(self, mock_mkdir, mock_isdir, mock_isfile):
+    def test_existing_worktree(self, _mock_mkdir, _mock_isdir, _mock_isfile):
         """Test with existing worktree (no action needed)"""
         terminal.get_print_test_lines()  # Clear
         self.builder._prepare_thread(0, 'worktree')
@@ -209,8 +211,8 @@ class TestPrepareThread(unittest.TestCase):
     @mock.patch.object(os.path, 'isfile', return_value=False)
     @mock.patch.object(os.path, 'isdir', return_value=False)
     @mock.patch.object(builderthread, 'mkdir')
-    def test_invalid_git_dir(self, mock_mkdir, mock_isdir, mock_isfile,
-                             mock_exists):
+    def test_invalid_git_dir(self, _mock_mkdir, _mock_isdir, _mock_isfile,
+                             _mock_exists):
         """Test with git_dir that exists but is neither file nor directory"""
         with self.assertRaises(ValueError) as ctx:
             self.builder._prepare_thread(0, 'clone')
@@ -222,8 +224,8 @@ class TestPrepareThread(unittest.TestCase):
     @mock.patch.object(os.path, 'isfile', return_value=False)
     @mock.patch.object(os.path, 'isdir', return_value=False)
     @mock.patch.object(builderthread, 'mkdir')
-    def test_create_worktree(self, mock_mkdir, mock_isdir, mock_isfile,
-                             mock_exists, mock_add_worktree):
+    def test_create_worktree(self, _mock_mkdir, _mock_isdir, _mock_isfile,
+                             _mock_exists, mock_add_worktree):
         """Test creating a new worktree"""
         terminal.get_print_test_lines()  # Clear
         self.builder._prepare_thread(0, 'worktree')
@@ -238,8 +240,8 @@ class TestPrepareThread(unittest.TestCase):
     @mock.patch.object(os.path, 'isfile', return_value=False)
     @mock.patch.object(os.path, 'isdir', return_value=False)
     @mock.patch.object(builderthread, 'mkdir')
-    def test_create_clone(self, mock_mkdir, mock_isdir, mock_isfile,
-                          mock_exists, mock_clone):
+    def test_create_clone(self, _mock_mkdir, _mock_isdir, _mock_isfile,
+                          _mock_exists, mock_clone):
         """Test creating a new clone"""
         terminal.get_print_test_lines()  # Clear
         self.builder._prepare_thread(0, 'clone')
@@ -254,8 +256,8 @@ class TestPrepareThread(unittest.TestCase):
     @mock.patch.object(os.path, 'isfile', return_value=False)
     @mock.patch.object(os.path, 'isdir', return_value=False)
     @mock.patch.object(builderthread, 'mkdir')
-    def test_create_clone_with_true(self, mock_mkdir, mock_isdir, mock_isfile,
-                                    mock_exists, mock_clone):
+    def test_create_clone_with_true(self, _mock_mkdir, _mock_isdir,
+                                    _mock_isfile, _mock_exists, mock_clone):
         """Test creating a clone when setup_git=True"""
         terminal.get_print_test_lines()  # Clear
         self.builder._prepare_thread(0, True)
@@ -266,8 +268,8 @@ class TestPrepareThread(unittest.TestCase):
     @mock.patch.object(os.path, 'isfile', return_value=False)
     @mock.patch.object(os.path, 'isdir', return_value=False)
     @mock.patch.object(builderthread, 'mkdir')
-    def test_invalid_setup_git(self, mock_mkdir, mock_isdir, mock_isfile,
-                               mock_exists):
+    def test_invalid_setup_git(self, _mock_mkdir, _mock_isdir, _mock_isfile,
+                               _mock_exists):
         """Test with invalid setup_git value"""
         with self.assertRaises(ValueError) as ctx:
             self.builder._prepare_thread(0, 'invalid')
@@ -304,9 +306,10 @@ class TestPrepareWorkingSpace(unittest.TestCase):
 
     @mock.patch.object(builder.Builder, '_prepare_thread')
     @mock.patch.object(gitutil, 'prune_worktrees')
-    @mock.patch.object(gitutil, 'check_worktree_is_available', return_value=True)
+    @mock.patch.object(gitutil, 'check_worktree_is_available',
+                       return_value=True)
     @mock.patch.object(builderthread, 'mkdir')
-    def test_worktree_available(self, mock_mkdir, mock_check_worktree,
+    def test_worktree_available(self, _mock_mkdir, mock_check_worktree,
                                 mock_prune, mock_prepare_thread):
         """Test when worktree is available"""
         self.builder._prepare_working_space(3, True)
@@ -320,9 +323,10 @@ class TestPrepareWorkingSpace(unittest.TestCase):
         mock_prepare_thread.assert_any_call(2, 'worktree')
 
     @mock.patch.object(builder.Builder, '_prepare_thread')
-    @mock.patch.object(gitutil, 'check_worktree_is_available', return_value=False)
+    @mock.patch.object(gitutil, 'check_worktree_is_available',
+                       return_value=False)
     @mock.patch.object(builderthread, 'mkdir')
-    def test_worktree_not_available(self, mock_mkdir, mock_check_worktree,
+    def test_worktree_not_available(self, _mock_mkdir, mock_check_worktree,
                                     mock_prepare_thread):
         """Test when worktree is not available (falls back to clone)"""
         self.builder._prepare_working_space(2, True)
@@ -335,7 +339,7 @@ class TestPrepareWorkingSpace(unittest.TestCase):
 
     @mock.patch.object(builder.Builder, '_prepare_thread')
     @mock.patch.object(builderthread, 'mkdir')
-    def test_zero_threads(self, mock_mkdir, mock_prepare_thread):
+    def test_zero_threads(self, _mock_mkdir, mock_prepare_thread):
         """Test with max_threads=0 (should still prepare 1 thread)"""
         self.builder._prepare_working_space(0, False)
 
@@ -345,7 +349,7 @@ class TestPrepareWorkingSpace(unittest.TestCase):
 
     @mock.patch.object(builder.Builder, '_prepare_thread')
     @mock.patch.object(builderthread, 'mkdir')
-    def test_no_git_dir(self, mock_mkdir, mock_prepare_thread):
+    def test_no_git_dir(self, _mock_mkdir, mock_prepare_thread):
         """Test with no git_dir set"""
         self.builder.git_dir = None
         self.builder._prepare_working_space(2, True)
@@ -394,7 +398,7 @@ class TestShowNotBuilt(unittest.TestCase):
         self.assertEqual(len(lines), 0)
 
     def test_some_boards_unknown(self):
-        """Test when some boards have OUTCOME_UNKNOWN (e.g. missing toolchain)"""
+        """Test when some boards have OUTCOME_UNKNOWN"""
         board_selected = {'board1': None, 'board2': None, 'board3': None}
         board_dict = {
             'board1': self._make_outcome(OUTCOME_OK),
@@ -430,7 +434,7 @@ class TestShowNotBuilt(unittest.TestCase):
         self.assertIn('board2', lines[0].text)
 
     def test_build_error_not_counted(self):
-        """Test that build errors (not toolchain) are not counted as 'not built'"""
+        """Test that build errors are not counted as 'not built'"""
         board_selected = {'board1': None, 'board2': None}
         board_dict = {
             'board1': self._make_outcome(OUTCOME_OK),
@@ -450,8 +454,9 @@ class TestShowNotBuilt(unittest.TestCase):
         board_selected = {'board1': None, 'board2': None, 'board3': None}
         board_dict = {
             'board1': self._make_outcome(OUTCOME_OK),
-            'board2': self._make_outcome(OUTCOME_ERROR,
-                ['Tool chain error for arm: not found']),
+            'board2': self._make_outcome(
+                OUTCOME_ERROR,
+                ['Tool chain error for arm: not found']),
             'board3': self._make_outcome(OUTCOME_ERROR,
                                          ['error: some build error']),
         }
@@ -468,7 +473,7 @@ class TestShowNotBuilt(unittest.TestCase):
         self.assertNotIn('board3', lines[0].text)
 
     def test_board_not_in_dict(self):
-        """Test that boards missing from board_dict are counted as 'not built'"""
+        """Test boards missing from board_dict count as 'not built'"""
         board_selected = {'board1': None, 'board2': None, 'board3': None}
         board_dict = {
             'board1': self._make_outcome(OUTCOME_OK),
@@ -487,7 +492,7 @@ class TestShowNotBuilt(unittest.TestCase):
 
 
 class TestPrepareOutputSpace(unittest.TestCase):
-    """Tests for Builder._prepare_output_space() and _get_output_space_removals()"""
+    """Tests for _prepare_output_space() and _get_output_space_removals()"""
 
     def setUp(self):
         """Set up test fixtures"""
@@ -692,7 +697,7 @@ class TestMake(unittest.TestCase):
         mock_run_one.return_value = mock_result
 
         # Simulate loop detection by setting _terminated during the call
-        def side_effect(*args, **kwargs):
+        def side_effect(*_args, **kwargs):
             # Simulate output_func being called with loop data
             output_func = kwargs.get('output_func')
             if output_func:
@@ -822,7 +827,7 @@ class TestPrintBuildSummary(unittest.TestCase):
         start_time = datetime(2024, 1, 1, 12, 0, 0)
         end_time = datetime(2024, 1, 1, 12, 0, 10)
         mock_datetime.now.return_value = end_time
-        mock_datetime.side_effect = lambda *args, **kwargs: datetime(*args, **kwargs)
+        mock_datetime.side_effect = datetime
 
         terminal.get_print_test_lines()  # Clear
         self.handler.print_build_summary(100, 0, 0, start_time, [])
@@ -839,7 +844,7 @@ class TestPrintBuildSummary(unittest.TestCase):
         start_time = datetime(2024, 1, 1, 12, 0, 0)
         end_time = datetime(2024, 1, 1, 12, 0, 10, 600000)  # 10.6 seconds
         mock_datetime.now.return_value = end_time
-        mock_datetime.side_effect = lambda *args, **kwargs: datetime(*args, **kwargs)
+        mock_datetime.side_effect = datetime
 
         terminal.get_print_test_lines()  # Clear
         self.handler.print_build_summary(100, 0, 0, start_time, [])
-- 
2.43.0
From: Simon Glass <sjg@chromium.org>

Move the patman import after u_boot_pylib imports to satisfy pylint's
C0411 (wrong-import-order) check.

Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/buildman/control.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/buildman/control.py b/tools/buildman/control.py
index 7fc0d20574a..989057db60e 100644
--- a/tools/buildman/control.py
+++ b/tools/buildman/control.py
@@ -22,7 +22,6 @@ from buildman import toolchain
 from buildman.builder import Builder
 from buildman.outcome import DisplayOptions
 from buildman.resulthandler import ResultHandler
-from patman import patchstream
 import qconfig
 from u_boot_pylib import command
 from u_boot_pylib import gitutil
@@ -30,6 +29,8 @@ from u_boot_pylib import terminal
 from u_boot_pylib import tools
 from u_boot_pylib.terminal import print_clear, tprint
 
+from patman import patchstream
+
 TEST_BUILDER = None
 
 # Space-separated list of buildman process IDs currently running jobs
@@ -564,7 +565,8 @@ def run_builder(builder, commits, board_selected, display_options, args):
                 args.threads, args.jobs))
 
     builder.set_display_options(
-        display_options, args.filter_dtb_warnings, args.filter_migration_warnings)
+        display_options, args.filter_dtb_warnings,
+        args.filter_migration_warnings)
     if args.summary:
         builder.commits = commits
         builder.result_handler.show_summary(
-- 
2.43.0
From: Simon Glass <sjg@chromium.org>

The demand-driven worker mode (introduced later in this series) queues
boards one at a time and may call process_result() before the first
board has been set up, which crashes on None.append().

Change _timestamps from None to an empty deque. The existing
build_boards() flow always calls _setup_build() before process_result(),
so this is safe for current callers.

Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/buildman/builder.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/buildman/builder.py b/tools/buildman/builder.py
index 2c8d5c5b17b..f704cc86943 100644
--- a/tools/buildman/builder.py
+++ b/tools/buildman/builder.py
@@ -362,7 +362,7 @@ class Builder:
         self.commit_count = 0
         self.commits = None
         self.count = 0
-        self._timestamps = None
+        self._timestamps = collections.deque()
         self._verbose = False
 
         # Note: baseline state for result summaries is now in ResultHandler
-- 
2.43.0
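The rationale can be sketched in two lines: an empty deque tolerates append() in any order, whereas the old None default raised AttributeError if process_result() ran first.

```python
import collections

# Sketch: a deque default is order-independent; append() is safe even
# before any _setup_build()-style initialisation has run, where
# None.append() would raise AttributeError.
timestamps = collections.deque()
timestamps.append('t0')  # fine, no setup step needed first
print(len(timestamps))   # 1
```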
From: Simon Glass <sjg@chromium.org>

The distributed-build worker needs a custom BuilderThread subclass that
sends results over SSH instead of writing to disk.

Add a thread_class parameter to Builder.__init__() so the caller can
override the thread class used in _setup_threads(). The default is
builderthread.BuilderThread.

Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/buildman/builder.py | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/tools/buildman/builder.py b/tools/buildman/builder.py
index f704cc86943..f64331d0fb1 100644
--- a/tools/buildman/builder.py
+++ b/tools/buildman/builder.py
@@ -230,7 +230,8 @@ class Builder:
                  force_build_failures=False, kconfig_check=True,
                  force_reconfig=False, in_tree=False,
                  force_config_on_failure=False, make_func=None,
-                 dtc_skip=False, build_target=None):
+                 dtc_skip=False, build_target=None,
+                 thread_class=builderthread.BuilderThread):
         """Create a new Builder object
 
         Args:
@@ -284,6 +285,10 @@ class Builder:
             make_func (function): Function to call to run 'make'
             dtc_skip (bool): True to skip building dtc and use the system one
             build_target (str): Build target to use (None to use the default)
+            thread_class (type): BuilderThread subclass to use (default
+                builderthread.BuilderThread). This allows the caller to
+                override how results are processed, e.g. sending over SSH
+                instead of writing to disk.
         """
         self.toolchains = toolchains
         self.base_dir = base_dir
@@ -367,6 +372,7 @@ class Builder:
 
         # Note: baseline state for result summaries is now in ResultHandler
 
+        self._thread_class = thread_class
         self._setup_threads(mrproper, per_board_out_dir,
                             test_thread_exceptions)
         ignore_lines = ['(make.*Waiting for unfinished)',
@@ -392,7 +398,7 @@ class Builder:
             self.queue = queue.Queue()
             self.out_queue = queue.Queue()
             for i in range(self._num_threads):
-                t = builderthread.BuilderThread(
+                t = self._thread_class(
                     self, i, mrproper, per_board_out_dir,
                     test_exception=test_thread_exceptions)
                 t.daemon = True
@@ -404,7 +410,7 @@ class Builder:
                 t.start()
                 self._threads.append(t)
         else:
-            self._single_builder = self._thread_class(
+            self._single_builder = self._thread_class(
                 self, -1, mrproper, per_board_out_dir)
 
     def __del__(self):
-- 
2.43.0
From: Simon Glass <sjg@chromium.org>

The distributed-build worker manages its own signal handling (SIGTERM,
SIGINT, SIGHUP) to clean up child processes on exit.

Add a handle_signals parameter (default True) so the worker can skip
Builder's built-in SIGINT handler and avoid conflicting with its own.

Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/buildman/builder.py | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/buildman/builder.py b/tools/buildman/builder.py
index f64331d0fb1..53ec604f654 100644
--- a/tools/buildman/builder.py
+++ b/tools/buildman/builder.py
@@ -231,7 +231,8 @@ class Builder:
                  force_reconfig=False, in_tree=False,
                  force_config_on_failure=False, make_func=None,
                  dtc_skip=False, build_target=None,
-                 thread_class=builderthread.BuilderThread):
+                 thread_class=builderthread.BuilderThread,
+                 handle_signals=True):
         """Create a new Builder object
 
         Args:
@@ -289,6 +290,9 @@ class Builder:
                 builderthread.BuilderThread). This allows the caller to
                 override how results are processed, e.g. sending over SSH
                 instead of writing to disk.
+            handle_signals (bool): True to register SIGINT handler (default
+                True). Set to False when running inside a worker that has
+                its own signal handling.
         """
         self.toolchains = toolchains
         self.base_dir = base_dir
@@ -380,7 +384,8 @@ class Builder:
         self._re_make_err = re.compile('|'.join(ignore_lines))
 
         # Handle existing graceful with SIGINT / Ctrl-C
-        signal.signal(signal.SIGINT, self._signal_handler)
+        if handle_signals:
+            signal.signal(signal.SIGINT, self._signal_handler)
 
     def _setup_threads(self, mrproper, per_board_out_dir,
                        test_thread_exceptions):
-- 
2.43.0
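A hedged sketch of the conditional-registration idea, with illustrative names (not the real Builder code): the handler is installed only when requested, so a host process with its own signal strategy is left undisturbed.

```python
import signal

# Illustrative handler, standing in for Builder._signal_handler
def builder_handler(signum, frame):
    print(f'cleaning up after signal {signum}')

def install_handlers(handle_signals):
    # Register only when asked; a worker with its own SIGTERM/SIGINT/
    # SIGHUP handling would pass handle_signals=False here
    if handle_signals:
        signal.signal(signal.SIGINT, builder_handler)

install_handlers(True)
print(signal.getsignal(signal.SIGINT) is builder_handler)  # True
```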
From: Simon Glass <sjg@chromium.org>

Setting up worktrees or clones for every thread sequentially in
_prepare_working_space() is slow on machines with many threads (e.g. 256
on ruru), and the distributed-build worker does not need it since each
thread sets up its own worktree on first use.

Add a lazy_thread_setup parameter that skips the per-thread setup loop.
Only the working directory is created and the git setup type (worktree
vs clone) is determined via a new _detect_git_setup() method and stored
in _setup_git, so that the deferred prepare_thread() calls can use it.

Extract _detect_git_setup() from _prepare_working_space() so that both
the eager and lazy paths share the same detection logic.

Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/buildman/builder.py      | 50 ++++++++++++++++++++++++++++------
 tools/buildman/test_builder.py | 24 ++++++++++++++--
 2 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/tools/buildman/builder.py b/tools/buildman/builder.py
index 53ec604f654..8c83203cf8e 100644
--- a/tools/buildman/builder.py
+++ b/tools/buildman/builder.py
@@ -232,7 +232,7 @@ class Builder:
                  in_tree=False, force_config_on_failure=False, make_func=None,
                  dtc_skip=False, build_target=None,
                  thread_class=builderthread.BuilderThread,
-                 handle_signals=True):
+                 handle_signals=True, lazy_thread_setup=False):
         """Create a new Builder object
 
         Args:
@@ -310,6 +310,7 @@ class Builder:
         self.kconfig_reconfig = 0
         self.force_build = False
         self.git_dir = git_dir
+        self._setup_git = False
         self._timestamp_count = 10
         self._build_period_us = None
         self._complete_delay = None
@@ -377,6 +378,7 @@ class Builder:
         # Note: baseline state for result summaries is now in ResultHandler
 
         self._thread_class = thread_class
+        self._lazy_thread_setup = lazy_thread_setup
         self._setup_threads(mrproper, per_board_out_dir,
                             test_thread_exceptions)
         ignore_lines = ['(make.*Waiting for unfinished)',
@@ -1047,6 +1049,18 @@ class Builder:
             return self._working_dir
         return os.path.join(self._working_dir, f'{max(thread_num, 0):02d}')
 
+    def prepare_thread(self, thread_num):
+        """Prepare a single thread's working directory on demand
+
+        This can be called by a BuilderThread to lazily set up its
+        worktree/clone on first use, rather than doing all threads upfront.
+        Uses the git setup method determined by _detect_git_setup().
+
+        Args:
+            thread_num (int): Thread number (0, 1, ...)
+        """
+        self._prepare_thread(thread_num, self._setup_git)
+
     def _prepare_thread(self, thread_num, setup_git):
         """Prepare the working directory for a thread.
 
@@ -1103,6 +1117,11 @@ class Builder:
         Set up the git repo for each thread. Creates a linked working tree
         if git-worktree is available, or clones the repo if it isn't.
 
+        When lazy_thread_setup is True, only the working directory and git
+        setup type are determined here. Each thread sets up its own
+        worktree/clone on first use via prepare_thread(), which avoids a
+        long sequential setup phase on machines with many threads.
+
         Args:
             max_threads: Maximum number of threads we expect to need. If 0
                 then 1 is set up, since the main process still needs
                 somewhere to work
@@ -1110,20 +1129,35 @@ class Builder:
             setup_git: True to set up a git worktree or a git clone
         """
         builderthread.mkdir(self._working_dir)
+
+        self._setup_git = self._detect_git_setup(setup_git)
+
+        if self._lazy_thread_setup:
+            return
+
+        # Always do at least one thread
+        for thread in range(max(max_threads, 1)):
+            self._prepare_thread(thread, self._setup_git)
+
+    def _detect_git_setup(self, setup_git):
+        """Determine which git setup method to use
+
+        Args:
+            setup_git: True to set up git, False to skip
+
+        Returns:
+            str or False: 'worktree', 'clone', or False
+        """
         if setup_git and self.git_dir:
             src_dir = os.path.abspath(self.git_dir)
             if gitutil.check_worktree_is_available(src_dir):
-                setup_git = 'worktree'
                 # If we previously added a worktree but the directory for it
                 # got deleted, we need to prune its files from the repo so
                 # that we can check out another in its place.
                 gitutil.prune_worktrees(src_dir)
-            else:
-                setup_git = 'clone'
-
-        # Always do at least one thread
-        for thread in range(max(max_threads, 1)):
-            self._prepare_thread(thread, setup_git)
+                return 'worktree'
+            return 'clone'
+        return False
 
     def _get_output_space_removals(self):
         """Get the output directories ready to receive files.
diff --git a/tools/buildman/test_builder.py b/tools/buildman/test_builder.py
index 70a8b365f2a..48be83cf645 100644
--- a/tools/buildman/test_builder.py
+++ b/tools/buildman/test_builder.py
@@ -354,10 +354,28 @@ class TestPrepareWorkingSpace(unittest.TestCase):
         self.builder.git_dir = None
         self.builder._prepare_working_space(2, True)
 
-        # setup_git should remain True but git operations skipped
+        # _detect_git_setup returns False when git_dir is None
         self.assertEqual(mock_prepare_thread.call_count, 2)
-        mock_prepare_thread.assert_any_call(0, True)
-        mock_prepare_thread.assert_any_call(1, True)
+        mock_prepare_thread.assert_any_call(0, False)
+        mock_prepare_thread.assert_any_call(1, False)
+
+    @mock.patch.object(builder.Builder, '_prepare_thread')
+    @mock.patch.object(gitutil, 'prune_worktrees')
+    @mock.patch.object(gitutil, 'check_worktree_is_available',
+                       return_value=True)
+    @mock.patch.object(builderthread, 'mkdir')
+    def test_lazy_setup(self, _mock_mkdir, mock_check_worktree,
+                        mock_prune, mock_prepare_thread):
+        """Test lazy_thread_setup skips upfront thread preparation"""
+        self.builder._lazy_thread_setup = True
+        self.builder._prepare_working_space(4, True)
+
+        # Git setup type is detected so prepare_thread() can use it
+        # later, but no threads are prepared upfront
+        self.assertEqual(self.builder._setup_git, 'worktree')
+        mock_check_worktree.assert_called_once()
+        mock_prune.assert_called_once()
+        mock_prepare_thread.assert_not_called()
 
 
 class TestShowNotBuilt(unittest.TestCase):
-- 
2.43.0
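The eager-vs-lazy control flow can be sketched with a stand-in class (not the real Builder; `_detect_git_setup()` here is a trivial stub, where the real one probes git-worktree availability):

```python
class Builder:
    """Stand-in showing only the lazy_thread_setup control flow"""
    def __init__(self, lazy_thread_setup=False):
        self._lazy_thread_setup = lazy_thread_setup
        self._setup_git = False
        self.prepared = set()

    def _detect_git_setup(self, setup_git):
        # Stub: the real method checks whether git-worktree is available
        return 'worktree' if setup_git else False

    def prepare_thread(self, thread_num):
        # Deferred per-thread setup, called by a thread on first use
        self.prepared.add(thread_num)

    def _prepare_working_space(self, max_threads, setup_git):
        # Detect the setup type once, up front, in both modes
        self._setup_git = self._detect_git_setup(setup_git)
        if self._lazy_thread_setup:
            return  # threads prepare themselves later
        for thread in range(max(max_threads, 1)):
            self.prepare_thread(thread)

b = Builder(lazy_thread_setup=True)
b._prepare_working_space(4, True)
print(sorted(b.prepared))  # [] - nothing prepared upfront
b.prepare_thread(0)        # first use by thread 0
print(sorted(b.prepared))  # [0]
```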
From: Simon Glass <sjg@chromium.org> The distributed-build worker queues boards one at a time as they arrive from the boss, so it needs to separate job creation from the blocking wait-for-completion. Split build_boards() into two methods: - init_build(): sets up working space, output dirs and queues jobs - run_build(): blocks until all jobs finish and prints the summary Keep build_boards() as a thin wrapper that calls both, so existing callers are unchanged. Extract print_summary() so that callers which use init_build() / run_build() directly can print the summary later. Signed-off-by: Simon Glass <sjg@chromium.org> --- tools/buildman/builder.py | 67 +++++++++++++++++++++++++++++++-------- 1 file changed, 54 insertions(+), 13 deletions(-) diff --git a/tools/buildman/builder.py b/tools/buildman/builder.py index 8c83203cf8e..716bf43a66f 100644 --- a/tools/buildman/builder.py +++ b/tools/buildman/builder.py @@ -1199,9 +1199,13 @@ class Builder: shutil.rmtree(dirname) terminal.print_clear() - def build_boards(self, commits, board_selected, keep_outputs, verbose, - fragments): - """Build all commits for a list of boards + def init_build(self, commits, board_selected, keep_outputs, verbose, + fragments): + """Initialise a build: prepare working space and create jobs + + This sets up the working directory, output space and job queue + but does not start the build threads. Call run_build() after + this to start the build. 
Args: commits (list): List of commits to be build, each a Commit object @@ -1210,12 +1214,6 @@ class Builder: keep_outputs (bool): True to save build output files verbose (bool): Display build results as they are completed fragments (str): config fragments added to defconfig - - Returns: - tuple: Tuple containing: - - number of boards that failed to build - - number of boards that issued warnings - - list of thread exceptions raised """ self.commit_count = len(commits) if commits else 1 self.commits = commits @@ -1224,7 +1222,7 @@ class Builder: self._result_handler.reset_result_summary(board_selected) builderthread.mkdir(self.base_dir, parents = True) self._prepare_working_space(min(self._num_threads, len(board_selected)), - commits is not None) + board_selected and commits is not None) self._prepare_output_space() if not self._opts.ide: tprint('\rStarting build...', newline=False) @@ -1247,6 +1245,18 @@ class Builder: else: self._single_builder.run_job(job) + def run_build(self): + """Run the build to completion + + Waits for all jobs to finish and prints a summary. + Call init_build() first to set up the jobs. + + Returns: + tuple: Tuple containing: + - number of boards that failed to build + - number of boards that issued warnings + - list of thread exceptions raised + """ if self._num_threads: term = threading.Thread(target=self.queue.join) term.daemon = True @@ -1257,8 +1267,39 @@ class Builder: # Wait until we have processed all output self.out_queue.join() if not self._opts.ide: - self._result_handler.print_build_summary( - self.count, self._already_done, self.kconfig_reconfig, - self._start_time, self.thread_exceptions) + self.print_summary() return (self.fail, self._warned, self.thread_exceptions) + + def build_boards(self, commits, board_selected, keep_outputs, verbose, + fragments): + """Build all commits for a list of boards + + Convenience method that calls init_build() then run_build(). 
+ + Args: + commits (list): List of commits to be build, each a Commit object + board_selected (dict): Dict of selected boards, key is target name, + value is Board object + keep_outputs (bool): True to save build output files + verbose (bool): Display build results as they are completed + fragments (str): config fragments added to defconfig + + Returns: + tuple: Tuple containing: + - number of boards that failed to build + - number of boards that issued warnings + - list of thread exceptions raised + """ + self.init_build(commits, board_selected, keep_outputs, verbose, + fragments) + return self.run_build() + + def print_summary(self): + """Print the build summary line + + Shows total built, time taken, and any thread exceptions. + """ + self._result_handler.print_build_summary( + self.count, self._already_done, self.kconfig_reconfig, + self._start_time, self.thread_exceptions) -- 2.43.0
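[Editorial note] The split above can be sketched with a toy builder; this is illustrative only, since the real Builder and its threads carry far more state. init_build() queues jobs without blocking, run_build() joins the queue and returns the results, and build_boards() stays a thin wrapper over both:

```python
import queue
import threading

class MiniBuilder:
    """Toy illustration of the init_build()/run_build() split.

    Not the real Builder: just enough to show why job creation and the
    blocking wait-for-completion are separate steps.
    """

    def __init__(self, num_threads=2):
        self.queue = queue.Queue()
        self.results = []
        self._lock = threading.Lock()
        self._threads = [threading.Thread(target=self._worker, daemon=True)
                         for _ in range(num_threads)]

    def _worker(self):
        while True:
            job = self.queue.get()
            with self._lock:
                self.results.append(f'built {job}')
            self.queue.task_done()

    def init_build(self, boards):
        # Queue the jobs but do not wait: a distributed worker can call
        # this, then keep feeding boards in before calling run_build()
        for brd in boards:
            self.queue.put(brd)
        for thd in self._threads:
            thd.start()

    def run_build(self):
        # Block until every queued job has been processed
        self.queue.join()
        return sorted(self.results)

    def build_boards(self, boards):
        # Thin wrapper, mirroring the real Builder.build_boards()
        self.init_build(boards)
        return self.run_build()

print(MiniBuilder().build_boards(['sandbox', 'qemu_arm64']))
```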
From: Simon Glass <sjg@chromium.org> Add active_boards and max_boards counters (with lock) for use by the dynamic -j calculation in BuilderThread, so each make invocation gets an appropriate number of jobs based on how many boards are currently in flight. Add an extra_count parameter to init_build() and build_boards() so the progress total can include builds running on remote workers. Add a delay_summary parameter to run_build() and build_boards() so the boss can defer the summary until remote results have been collected. Signed-off-by: Simon Glass <sjg@chromium.org> --- tools/buildman/builder.py | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/tools/buildman/builder.py b/tools/buildman/builder.py index 716bf43a66f..de895d43cfe 100644 --- a/tools/buildman/builder.py +++ b/tools/buildman/builder.py @@ -306,6 +306,9 @@ class Builder: self.checkout = checkout self._num_threads = num_threads self.num_jobs = num_jobs + self.active_boards = 0 + self.max_boards = 0 + self._active_lock = threading.Lock() self._already_done = 0 self.kconfig_reconfig = 0 self.force_build = False @@ -1200,7 +1203,7 @@ class Builder: terminal.print_clear() def init_build(self, commits, board_selected, keep_outputs, verbose, - fragments): + fragments, extra_count=0): """Initialise a build: prepare working space and create jobs This sets up the working directory, output space and job queue @@ -1214,6 +1217,9 @@ class Builder: keep_outputs (bool): True to save build output files verbose (bool): Display build results as they are completed fragments (str): config fragments added to defconfig + extra_count (int): Additional builds expected from external + sources (e.g. 
distributed workers) to include in the + progress total """ self.commit_count = len(commits) if commits else 1 self.commits = commits @@ -1228,6 +1234,7 @@ class Builder: tprint('\rStarting build...', newline=False) self._start_time = datetime.now() self._setup_build(board_selected, commits) + self.count += extra_count self.process_result(None) self.thread_exceptions = [] # Create jobs to build all commits for each board @@ -1245,12 +1252,16 @@ class Builder: else: self._single_builder.run_job(job) - def run_build(self): + def run_build(self, delay_summary=False): """Run the build to completion - Waits for all jobs to finish and prints a summary. + Waits for all jobs to finish and optionally prints a summary. Call init_build() first to set up the jobs. + Args: + delay_summary (bool): True to skip printing the build + summary at the end (caller will print it later) + Returns: tuple: Tuple containing: - number of boards that failed to build @@ -1266,13 +1277,13 @@ class Builder: # Wait until we have processed all output self.out_queue.join() - if not self._opts.ide: + if not self._opts.ide and not delay_summary: self.print_summary() return (self.fail, self._warned, self.thread_exceptions) def build_boards(self, commits, board_selected, keep_outputs, verbose, - fragments): + fragments, extra_count=0, delay_summary=False): """Build all commits for a list of boards Convenience method that calls init_build() then run_build(). @@ -1284,6 +1295,11 @@ class Builder: keep_outputs (bool): True to save build output files verbose (bool): Display build results as they are completed fragments (str): config fragments added to defconfig + extra_count (int): Additional builds expected from external + sources (e.g. 
distributed workers) to include in the + progress total + delay_summary (bool): True to skip printing the build + summary at the end (caller will print it later) Returns: tuple: Tuple containing: @@ -1292,8 +1308,8 @@ class Builder: - list of thread exceptions raised """ self.init_build(commits, board_selected, keep_outputs, verbose, - fragments) - return self.run_build() + fragments, extra_count) + return self.run_build(delay_summary) def print_summary(self): """Print the build summary line -- 2.43.0
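[Editorial note] A minimal model of the extra_count / delay_summary flow, purely illustrative (the real Builder tracks far more state and the boss module merges results streamed back from remote workers): the boss inflates the progress total with the remote board count, defers the summary while it collects remote results, then prints the combined total.

```python
class ProgressTotal:
    """Toy model of the extra_count/delay_summary flow (illustrative only)."""

    def __init__(self):
        self.count = 0
        self.done = 0

    def init_build(self, local_jobs, extra_count=0):
        # The progress total covers local jobs plus builds expected to
        # arrive from remote workers
        self.count = local_jobs + extra_count

    def run_build(self, delay_summary=False):
        self.done = self.count   # pretend everything finished
        if not delay_summary:
            return self.print_summary()
        return None

    def print_summary(self):
        return f'{self.done}/{self.count} builds done'

prog = ProgressTotal()
prog.init_build(local_jobs=10, extra_count=30)
prog.run_build(delay_summary=True)   # boss collects remote results first...
print(prog.print_summary())          # ...then prints the combined total
```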
From: Simon Glass <sjg@chromium.org> The kconfig_changed_since() check does an os.walk() of the entire source tree which is very slow when dozens of threads do it simultaneously for the same commit. Add a per-commit cache so only the first thread to check a given commit does the walk; all other threads reuse the result. Also skip the check entirely when do_config is already True (e.g. first commit) since defconfig will run anyway. Prune dotfiles and 'build' directories from the walk to reduce the search space. Signed-off-by: Simon Glass <sjg@chromium.org> --- tools/buildman/builderthread.py | 72 +++++++++++++++++++++++++-------- tools/buildman/func_test.py | 6 ++- tools/buildman/test.py | 3 ++ 3 files changed, 63 insertions(+), 18 deletions(-) diff --git a/tools/buildman/builderthread.py b/tools/buildman/builderthread.py index dcf2d8f9ac5..13c98612c81 100644 --- a/tools/buildman/builderthread.py +++ b/tools/buildman/builderthread.py @@ -52,20 +52,19 @@ BuildSetup = namedtuple('BuildSetup', ['env', 'args', 'config_args', 'cwd', RETURN_CODE_RETRY = -1 BASE_ELF_FILENAMES = ['u-boot', 'spl/u-boot-spl', 'tpl/u-boot-tpl'] +# Per-commit cache for the kconfig_changed_since() result. The answer only +# depends on the commit (did any Kconfig file change since the previous +# checkout?), so one thread can do the walk and every other thread building +# the same commit reuses the boolean. +_kconfig_cache = {} +_kconfig_cache_lock = threading.Lock() -def kconfig_changed_since(fname, srcdir='.', target=None): - """Check if any Kconfig or defconfig files are newer than the given file. - Args: - fname (str): Path to file to compare against (typically '.config') - srcdir (str): Source directory to search for Kconfig/defconfig files - target (str): Board target name; if provided, only check that board's - defconfig file (e.g. 
'sandbox' checks 'configs/sandbox_defconfig') +def _kconfig_changed_uncached(fname, srcdir, target): + """Check if any Kconfig or defconfig files are newer than fname. - Returns: - bool: True if any Kconfig* file (or the board's defconfig) in srcdir - is newer than fname, False otherwise. Also returns False if fname - doesn't exist. + This does the real work — an os.walk() of srcdir. It should only be + called once per commit; the result is cached by kconfig_changed_since(). """ if not os.path.exists(fname): return False @@ -78,8 +77,9 @@ def kconfig_changed_since(fname, srcdir='.', target=None): if os.path.getmtime(defconfig) > ref_time: return True - # Check all Kconfig files - for dirpath, _, filenames in os.walk(srcdir): + for dirpath, dirnames, filenames in os.walk(srcdir): + # Prune in-place so os.walk() skips dotdirs and build dirs + dirnames[:] = [d for d in dirnames if d[0] != '.' and d != 'build'] for filename in filenames: if filename.startswith('Kconfig'): filepath = os.path.join(dirpath, filename) @@ -87,6 +87,42 @@ def kconfig_changed_since(fname, srcdir='.', target=None): return True return False + +def reset_kconfig_cache(): + """Reset the cached kconfig results, for testing""" + _kconfig_cache.clear() + + +def kconfig_changed_since(fname, srcdir='.', target=None, commit_upto=None): + """Check if any Kconfig or defconfig files are newer than the given file. + + Args: + fname (str): Path to file to compare against (typically '.config') + srcdir (str): Source directory to search for Kconfig/defconfig files + target (str): Board target name; if provided, only check that board's + defconfig file (e.g. 'sandbox' checks 'configs/sandbox_defconfig') + commit_upto (int or None): Commit index for caching. When set, only + the first thread to check a given commit does the walk; all + other threads reuse that result. + + Returns: + bool: True if any Kconfig* file (or the board's defconfig) in srcdir + is newer than fname, False otherwise. 
Also returns False if fname + doesn't exist. + """ + if commit_upto is None: + return _kconfig_changed_uncached(fname, srcdir, target) + + if commit_upto in _kconfig_cache: + return _kconfig_cache[commit_upto] + + with _kconfig_cache_lock: + if commit_upto in _kconfig_cache: + return _kconfig_cache[commit_upto] + result = _kconfig_changed_uncached(fname, srcdir, target) + _kconfig_cache[commit_upto] = result + return result + # Common extensions for images COMMON_EXTS = ['.bin', '.rom', '.itb', '.img'] @@ -625,11 +661,15 @@ class BuilderThread(threading.Thread): if self.toolchain: commit = self._checkout(commit_upto, req.work_dir) - # Check if Kconfig files have changed since last config - if self.builder.kconfig_check: + # Check if Kconfig files have changed since last config. Skip + # when do_config is already True (e.g. first commit) since + # defconfig will run anyway. This avoids an expensive os.walk() + # of the source tree that can be very slow when many threads + # do it simultaneously. 
+ if self.builder.kconfig_check and not do_config: config_file = os.path.join(out_dir, '.config') if kconfig_changed_since(config_file, req.work_dir, - req.brd.target): + req.brd.target, commit_upto): kconfig_reconfig = True do_config = True diff --git a/tools/buildman/func_test.py b/tools/buildman/func_test.py index 7db4c086207..aa206cf75df 100644 --- a/tools/buildman/func_test.py +++ b/tools/buildman/func_test.py @@ -1634,7 +1634,8 @@ something: me call_count = [0] config_exists = [False] - def mock_kconfig_changed(fname, _srcdir='.', _target=None): + def mock_kconfig_changed(fname, _srcdir='.', _target=None, + _commit_upto=None): """Mock for kconfig_changed_since that checks if .config exists Args: @@ -1671,7 +1672,8 @@ something: me """Test that -Z flag disables Kconfig change detection""" call_count = [0] - def mock_kconfig_changed(_fname, _srcdir='.', _target=None): + def mock_kconfig_changed(_fname, _srcdir='.', _target=None, + _commit_upto=None): """Mock for kconfig_changed_since that always returns True Returns: diff --git a/tools/buildman/test.py b/tools/buildman/test.py index 37930ad9720..998f2227281 100644 --- a/tools/buildman/test.py +++ b/tools/buildman/test.py @@ -1148,6 +1148,7 @@ class TestBuildMisc(TestBuildBase): def test_kconfig_changed_since(self): """Test the kconfig_changed_since() function""" + builderthread.reset_kconfig_cache() with tempfile.TemporaryDirectory() as tmpdir: # Create a reference file ref_file = os.path.join(tmpdir, 'done') @@ -1163,6 +1164,7 @@ class TestBuildMisc(TestBuildBase): # Create a Kconfig file newer than the reference kconfig = os.path.join(tmpdir, 'Kconfig') tools.write_file(kconfig, b'config TEST\n') + builderthread.reset_kconfig_cache() # Should now return True since Kconfig is newer self.assertTrue( @@ -1187,6 +1189,7 @@ class TestBuildMisc(TestBuildBase): time.sleep(0.1) tools.write_file(os.path.join(subdir, 'Kconfig.sub'), b'config SUBTEST\n') + builderthread.reset_kconfig_cache() # Should return True due to 
newer Kconfig.sub in subdir self.assertTrue( -- 2.43.0
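[Editorial note] The caching in this patch is the classic double-checked pattern: a lock-free fast path for hits, then a re-check under the lock so only one thread pays for the expensive walk. A self-contained sketch of the same pattern (generic names, not the buildman code itself):

```python
import threading

_cache = {}
_lock = threading.Lock()

def cached_check(key, expensive_fn):
    """Double-checked per-key cache, mirroring the _kconfig_cache pattern.

    Only the first caller for a given key runs expensive_fn(); every
    later caller reuses the stored result.
    """
    # Fast path: a hit needs no lock (dict reads are atomic in CPython)
    if key in _cache:
        return _cache[key]
    with _lock:
        # Re-check under the lock: another thread may have won the race
        if key in _cache:
            return _cache[key]
        result = expensive_fn()
        _cache[key] = result
        return result

calls = []
def slow_walk():
    calls.append(1)   # stands in for the os.walk() of the source tree
    return True

print(cached_check(5, slow_walk), cached_check(5, slow_walk), len(calls))
```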
From: Simon Glass <sjg@chromium.org> When many boards are in flight on a high-thread machine, each make gets only -j1 (nthreads / active_boards = 256/256), causing poor efficiency when the machine is mostly idle. Track the number of in-flight boards with a locked counter in BuilderThread.run(). When Builder.num_jobs is None (meaning dynamic mode), calculate -j as nthreads / active_boards. When the in-flight count drops below the baseline (max_boards or nthreads), ramp it up by a factor of two to use more of the available CPU power. Rename _num_threads and _active_lock to public since BuilderThread accesses them from a separate module. Signed-off-by: Simon Glass <sjg@chromium.org> --- tools/buildman/builder.py | 16 ++++++++-------- tools/buildman/builderthread.py | 19 +++++++++++++++++-- 2 files changed, 25 insertions(+), 10 deletions(-) diff --git a/tools/buildman/builder.py b/tools/buildman/builder.py index de895d43cfe..c4ec22dbc77 100644 --- a/tools/buildman/builder.py +++ b/tools/buildman/builder.py @@ -199,7 +199,7 @@ class Builder: _complete_delay: Expected delay until completion (timedelta) _next_delay_update: Next time we plan to display a progress update (datatime) - _num_threads: Number of builder threads to run + num_threads: Number of builder threads to run _opts: DisplayOptions for result output _re_make_err: Compiled regex for make error detection _restarting_config: True if 'Restart config' is detected in output @@ -304,11 +304,11 @@ class Builder: self.do_make = make_func or self.make self.gnu_make = gnu_make self.checkout = checkout - self._num_threads = num_threads + self.num_threads = num_threads self.num_jobs = num_jobs self.active_boards = 0 self.max_boards = 0 - self._active_lock = threading.Lock() + self.active_lock = threading.Lock() self._already_done = 0 self.kconfig_reconfig = 0 self.force_build = False @@ -403,11 +403,11 @@ class Builder: test_thread_exceptions (bool): True to make threads raise an exception instead of reporting their 
result (for tests) """ - if self._num_threads: + if self.num_threads: self._single_builder = None self.queue = queue.Queue() self.out_queue = queue.Queue() - for i in range(self._num_threads): + for i in range(self.num_threads): t = self._thread_class( self, i, mrproper, per_board_out_dir, test_exception=test_thread_exceptions) @@ -1227,7 +1227,7 @@ class Builder: self._result_handler.reset_result_summary(board_selected) builderthread.mkdir(self.base_dir, parents = True) - self._prepare_working_space(min(self._num_threads, len(board_selected)), + self._prepare_working_space(min(self.num_threads, len(board_selected)), board_selected and commits is not None) self._prepare_output_space() if not self._opts.ide: @@ -1247,7 +1247,7 @@ class Builder: job.adjust_cfg = self.adjust_cfg job.fragments = fragments job.step = self._step - if self._num_threads: + if self.num_threads: self.queue.put(job) else: self._single_builder.run_job(job) @@ -1268,7 +1268,7 @@ class Builder: - number of boards that issued warnings - list of thread exceptions raised """ - if self._num_threads: + if self.num_threads: term = threading.Thread(target=self.queue.join) term.daemon = True term.start() diff --git a/tools/buildman/builderthread.py b/tools/buildman/builderthread.py index 13c98612c81..16534196d4d 100644 --- a/tools/buildman/builderthread.py +++ b/tools/buildman/builderthread.py @@ -332,8 +332,18 @@ class BuilderThread(threading.Thread): args.append('V=1') else: args.append('-s') - if self.builder.num_jobs is not None: - args.extend(['-j', str(self.builder.num_jobs)]) + num_jobs = self.builder.num_jobs + if num_jobs is None and self.builder.active_boards: + active = self.builder.active_boards + nthreads = self.builder.num_threads + baseline = self.builder.max_boards or nthreads + if active < baseline: + # Tail: ramp up 2x to compensate for make overhead + num_jobs = max(1, nthreads * 2 // active) + else: + num_jobs = max(1, nthreads // active) + if num_jobs is not None: + 
args.extend(['-j', str(num_jobs)]) if self.builder.warnings_as_errors: args.append('KCFLAGS=-Werror') args.append('HOSTCFLAGS=-Werror') @@ -1022,10 +1032,15 @@ class BuilderThread(threading.Thread): """ while True: job = self.builder.queue.get() + with self.builder.active_lock: + self.builder.active_boards += 1 try: self.run_job(job) except Exception as exc: # pylint: disable=W0718 print('Thread exception (use -T0 to run without threads):', exc) self.builder.thread_exceptions.append(exc) + finally: + with self.builder.active_lock: + self.builder.active_boards -= 1 self.builder.queue.task_done() -- 2.43.0
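[Editorial note] The dynamic -j arithmetic above can be pulled out into a standalone helper to see the two regimes; this is an illustrative function, not the real BuilderThread code (which keeps the logic inline, with the counters guarded by active_lock):

```python
def calc_jobs(nthreads, active, max_boards=0):
    """Sketch of the dynamic -j calculation described in the patch."""
    baseline = max_boards or nthreads
    if active < baseline:
        # Tail of the build: fewer boards in flight than the baseline,
        # so ramp up 2x to soak up otherwise-idle CPUs
        return max(1, nthreads * 2 // active)
    return max(1, nthreads // active)

print(calc_jobs(256, 256))  # fully loaded: each make gets -j1
print(calc_jobs(256, 8))    # tail: each of 8 boards gets -j64
```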
From: Simon Glass <sjg@chromium.org> LTO link steps use nproc unconditionally, flooding the machine with jobs during the link phase and reducing build efficiency. Pass NPROC=<num_jobs> alongside -j so that LTO respects the dynamic parallelism setting. Signed-off-by: Simon Glass <sjg@chromium.org> --- tools/buildman/builderthread.py | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/buildman/builderthread.py b/tools/buildman/builderthread.py index 16534196d4d..6f4f257dedb 100644 --- a/tools/buildman/builderthread.py +++ b/tools/buildman/builderthread.py @@ -344,6 +344,7 @@ class BuilderThread(threading.Thread): num_jobs = max(1, nthreads // active) if num_jobs is not None: args.extend(['-j', str(num_jobs)]) + args.append(f'NPROC={num_jobs}') if self.builder.warnings_as_errors: args.append('KCFLAGS=-Werror') args.append('HOSTCFLAGS=-Werror') -- 2.43.0
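[Editorial note] This works because make command-line variables override makefile assignments, so NPROC=<n> on the invocation replaces the Makefile's nproc-based default for the LTO link. A simplified sketch of the resulting argument list (not the full set BuilderThread assembles):

```python
def make_args(num_jobs):
    """Sketch of the make arguments after this patch (simplified)."""
    args = ['-s']
    if num_jobs is not None:
        args.extend(['-j', str(num_jobs)])
        # Pass the same limit via NPROC so the LTO link phase stays
        # within the per-board job budget instead of using all CPUs
        args.append(f'NPROC={num_jobs}')
    return args

print(make_args(8))
```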
From: Simon Glass <sjg@chromium.org> Add support for probing remote build machines over SSH to prepare for distributed builds. Add a new [machines] section to the buildman config file where hostnames are listed, one per line. The --machines flag probes all configured machines in parallel over SSH, collecting their architecture, CPU count, thread count, load average, memory and disk space. Machines that are too busy, low on disk or low on memory are marked unavailable. Toolchains on each machine are checked either via 'buildman --list-tool-chains' or by testing for the boss's toolchain paths over SSH. The --machines-fetch-arch flag fetches missing toolchains, and version-mismatched toolchains are re-fetched to keep all machines in sync. Support per-machine [machine:<name>] config sections with a max_boards option to cap concurrent builds on machines with limited resources. Add gcc_version() and resolve_toolchain_aliases() helpers for toolchain version comparison and alias resolution. Signed-off-by: Simon Glass <sjg@chromium.org> --- tools/buildman/bsettings.py | 10 +- tools/buildman/cmdline.py | 7 + tools/buildman/control.py | 6 + tools/buildman/machine.py | 923 ++++++++++++++++++++++++++++ tools/buildman/main.py | 2 + tools/buildman/test_machine.py | 1046 ++++++++++++++++++++++++++++++++ 6 files changed, 1993 insertions(+), 1 deletion(-) create mode 100644 tools/buildman/machine.py create mode 100644 tools/buildman/test_machine.py diff --git a/tools/buildman/bsettings.py b/tools/buildman/bsettings.py index 1af2bc66101..52763b9b958 100644 --- a/tools/buildman/bsettings.py +++ b/tools/buildman/bsettings.py @@ -20,7 +20,7 @@ def setup(fname=''): global settings # pylint: disable=W0603 global config_fname # pylint: disable=W0603 - settings = configparser.ConfigParser() + settings = configparser.ConfigParser(allow_no_value=True) if fname is not None: config_fname = fname if config_fname == '': @@ -110,6 +110,14 @@ x86 = i386 # snapper-boards=ENABLE_AT91_TEST=1 # 
snapper9260=${snapper-boards} BUILD_TAG=442 # snapper9g45=${snapper-boards} BUILD_TAG=443 + +[machines] +# Remote build machines for distributed builds +# List hostnames, one per line (or user@hostname) +# e.g. +# ohau +# moa +# user@build-server ''', file=out) except IOError: print(f"Couldn't create buildman config file '{cfgname}'\n") diff --git a/tools/buildman/cmdline.py b/tools/buildman/cmdline.py index f4a1a3d018c..b284b2cbbfa 100644 --- a/tools/buildman/cmdline.py +++ b/tools/buildman/cmdline.py @@ -105,6 +105,13 @@ def add_upto_m(parser): parser.add_argument( '-M', '--allow-missing', action='store_true', default=False, help='Tell binman to allow missing blobs and generate fake ones as needed') + parser.add_argument('--mach', '--machines', action='store_true', + default=False, dest='machines', + help='Probe all remote machines from [machines] config and show ' + 'their status and available toolchains') + parser.add_argument('--machines-buildman-path', type=str, + default='buildman', + help='Path to buildman on remote machines (default: %(default)s)') parser.add_argument( '--maintainer-check', action='store_true', help='Check that maintainer entries exist for each board') diff --git a/tools/buildman/control.py b/tools/buildman/control.py index 989057db60e..97f6ffcbfd2 100644 --- a/tools/buildman/control.py +++ b/tools/buildman/control.py @@ -18,6 +18,7 @@ import time from buildman import boards from buildman import bsettings from buildman import cfgutil +from buildman import machine from buildman import toolchain from buildman.builder import Builder from buildman.outcome import DisplayOptions @@ -758,6 +759,11 @@ def do_buildman(args, toolchains=None, make_func=None, brds=None, gitutil.setup() col = terminal.Color() + # Handle --machines: probe remote machines and show status + if args.machines: + return machine.do_probe_machines( + col, buildman_path=args.machines_buildman_path) + git_dir = os.path.join(args.git, '.git') toolchains = 
get_toolchains(toolchains, col, args.override_toolchain, diff --git a/tools/buildman/machine.py b/tools/buildman/machine.py new file mode 100644 index 00000000000..cc4e5d9fe72 --- /dev/null +++ b/tools/buildman/machine.py @@ -0,0 +1,923 @@ +# SPDX-License-Identifier: GPL-2.0+ +# Copyright 2026 Simon Glass <sjg@chromium.org> + +"""Handles remote machine probing and pool management for distributed builds + +This module provides the Machine and MachinePool classes for managing a pool of +remote build machines. Machines are probed over SSH to determine their +capabilities (CPUs, memory, load, toolchains) and can be used to distribute +board builds across multiple hosts. +""" + +import dataclasses +import json +import os +import threading + +from buildman import bsettings +from buildman import toolchain as toolchain_mod +from u_boot_pylib import command +from u_boot_pylib import terminal +from u_boot_pylib import tout + +# Probe script run on remote machines via SSH. This is kept minimal so that +# it works on any Linux machine with Python 3. 
+PROBE_SCRIPT = r''' +import json, os, platform, subprocess + +def get_cpus(): + try: + return int(subprocess.check_output(['nproc', '--all'], text=True)) + except (subprocess.CalledProcessError, FileNotFoundError, ValueError): + return 1 + +def get_threads(): + try: + return int(subprocess.check_output(['nproc'], text=True)) + except (subprocess.CalledProcessError, FileNotFoundError, ValueError): + return get_cpus() + +def get_load(): + try: + with open('/proc/loadavg') as f: + return float(f.read().split()[0]) + except (IOError, ValueError, IndexError): + return 0.0 + +def get_mem_avail_mb(): + try: + with open('/proc/meminfo') as f: + for line in f: + if line.startswith('MemAvailable:'): + return int(line.split()[1]) // 1024 + except (IOError, ValueError, IndexError): + pass + return 0 + +def get_disk_avail_mb(path='~'): + path = os.path.expanduser(path) + try: + st = os.statvfs(path) + return (st.f_bavail * st.f_frsize) // (1024 * 1024) + except OSError: + return 0 + +def get_bogomips(): + try: + with open('/proc/cpuinfo') as f: + for line in f: + lower = line.lower() + if 'bogomips' in lower and ':' in lower: + return float(line.split(':')[1].strip()) + except (IOError, ValueError, IndexError): + pass + return 0.0 + +print(json.dumps({ + 'arch': platform.machine(), + 'cpus': get_cpus(), + 'threads': get_threads(), + 'bogomips': get_bogomips(), + 'load_1m': get_load(), + 'mem_avail_mb': get_mem_avail_mb(), + 'disk_avail_mb': get_disk_avail_mb(), +})) +''' + +# Load threshold: if load_1m / cpus exceeds this, the machine is busy +LOAD_THRESHOLD = 0.8 + +# Minimum available disk space in MB to use a machine +MIN_DISK_MB = 1000 + +# Minimum available memory in MB to use a machine +MIN_MEM_MB = 512 + +# SSH connect timeout in seconds +SSH_TIMEOUT = 10 + +# Shorter timeout for probing, since it should be fast +PROBE_TIMEOUT = 3 + + +@dataclasses.dataclass +class MachineInfo: + """Probe results for a remote machine + + Attributes: + arch (str): Machine architecture 
(e.g. 'x86_64', 'aarch64') + cpus (int): Number of physical CPU cores + threads (int): Number of hardware threads + bogomips (float): BogoMIPS from /proc/cpuinfo (single core) + load (float): 1-minute load average + mem_avail_mb (int): Available memory in MB + disk_avail_mb (int): Available disk space in MB + """ + arch: str = '' + cpus: int = 0 + threads: int = 0 + bogomips: float = 0.0 + load: float = 0.0 + mem_avail_mb: int = 0 + disk_avail_mb: int = 0 + + +class MachineError(Exception): + """Error communicating with a remote machine""" + + +def _run_ssh(hostname, cmd, timeout=SSH_TIMEOUT, stdin_data=None): + """Run a command on a remote machine via SSH + + Args: + hostname (str): SSH hostname (user@host or just host) + cmd (list of str): Command and arguments, passed after '--' to + SSH. May be a single-element list with a shell command string + timeout (int): Connection timeout in seconds + stdin_data (str or None): Data to send to the command's stdin + + Returns: + str: stdout from the command + + Raises: + MachineError: if SSH connection fails or command returns non-zero + """ + ssh_cmd = [ + 'ssh', + '-o', 'BatchMode=yes', + '-o', f'ConnectTimeout={timeout}', + '-o', 'StrictHostKeyChecking=accept-new', + hostname, + '--', + ] + cmd + try: + result = command.run_pipe( + [ssh_cmd], capture=True, capture_stderr=True, + raise_on_error=False, stdin_data=stdin_data) + except command.CommandExc as exc: + raise MachineError(str(exc)) from exc + + if result.return_code: + stderr = result.stderr.strip() + if stderr: + # Take last non-empty line as the real error + lines = [l for l in stderr.splitlines() + if l.strip()] + msg = lines[-1] if lines else stderr + raise MachineError(f'SSH to {hostname}: {msg}') + raise MachineError( + f'SSH to {hostname} failed with code ' + f'{result.return_code}') + + return result.stdout + + +def gcc_version(gcc_path): + """Extract the gcc version directory from a toolchain path + + Looks for a 'gcc-*-nolibc' component in the path, 
which is the + standard naming convention for buildman-fetched toolchains. + + Args: + gcc_path (str): Full path to gcc binary, e.g. + '~/.buildman-toolchains/gcc-13.1.0-nolibc/aarch64-linux/ + bin/aarch64-linux-gcc' + + Returns: + str or None: The version directory (e.g. 'gcc-13.1.0-nolibc'), + or None if the path does not follow this convention + """ + for part in gcc_path.split('/'): + if part.startswith('gcc-') and 'nolibc' in part: + return part + return None + + +def _parse_toolchain_list(output): + """Parse the output of 'buildman --list-tool-chains' + + Extracts architecture -> gcc path mapping from the output. + + Args: + output (str): Output from buildman --list-tool-chains + + Returns: + dict: Architecture name -> gcc path string + """ + archs = {} + in_list = False + for line in output.splitlines(): + # The list starts after "List of available toolchains" + if 'List of available toolchains' in line: + in_list = True + continue + if in_list and ':' in line: + parts = line.split(':', 1) + if len(parts) == 2: + arch = parts[0].strip() + gcc = parts[1].strip() + if arch and gcc and arch != 'None': + archs[arch] = gcc + elif in_list and not line.strip(): + # Empty line ends the list + break + return archs + + +def _toolchain_status(mach, local_archs, local_gcc=None): + """Get toolchain status text and colour for a machine + + Args: + mach (Machine): Machine to check + local_archs (set of str): Toolchain archs available on local host + local_gcc (dict or None): arch -> gcc path on the local machine + + Returns: + tuple: (str, colour) where colour is a terminal.Color constant + or None for no colour + """ + if not mach.toolchains: + err = mach.tc_error + if err: + return 'fail', terminal.Color.RED + if not mach.avail and not mach.info.arch: + return '-', None + return 'none', terminal.Color.YELLOW + if not local_archs: + return str(len(mach.toolchains)), None + missing = local_archs - set(mach.toolchains.keys()) + + # Check for version mismatches + mismatched 
= 0 + if local_gcc: + for arch, path in mach.toolchains.items(): + local_ver = gcc_version(local_gcc.get(arch, '')) + if not local_ver: + continue + remote_ver = gcc_version(path) + if remote_ver and remote_ver != local_ver: + mismatched += 1 + + parts = [] + if missing: + parts.append(f'{len(missing)} missing') + if mismatched: + parts.append(f'{mismatched} wrong ver') + if parts: + return ', '.join(parts), terminal.Color.YELLOW + return 'OK', terminal.Color.GREEN + + +def build_version_map(local_gcc): + """Build a map of architecture -> gcc version directory + + Args: + local_gcc (dict): arch -> gcc path + + Returns: + dict: arch -> version string (e.g. 'gcc-13.1.0-nolibc') + """ + versions = {} + if local_gcc: + for arch, path in local_gcc.items(): + ver = gcc_version(path) + if ver: + versions[arch] = ver + return versions + + +def resolve_toolchain_aliases(gcc_dict): + """Add toolchain-alias entries to a gcc dict + + Resolves [toolchain-alias] config entries (e.g. x86->i386, sh->sh4) + so that board architectures using alias names are recognised. + + Args: + gcc_dict (dict): arch -> gcc path, modified in place + """ + for tag, value in bsettings.get_items('toolchain-alias'): + if tag not in gcc_dict: + for alias in value.split(): + if alias in gcc_dict: + gcc_dict[tag] = gcc_dict[alias] + break + + +def get_machines_config(): + """Get the list of machine hostnames from the config + + Returns: + list of str: List of hostnames from [machines] section + """ + items = bsettings.get_items('machines') + return [value.strip() if value else name.strip() + for name, value in items] + + +def do_probe_machines(col=None, fetch=False, buildman_path='buildman'): + """Probe all configured machines and display their status + + This is the entry point for 'buildman --machines' when used without a + build command. It probes all machines, checks their toolchains and + prints a summary. 
+ + Args: + col (terminal.Color or None): Colour object for output + fetch (bool): True to fetch missing toolchains + buildman_path (str): Path to buildman on remote machines + + Returns: + int: 0 on success, non-zero on failure + """ + if not col: + col = terminal.Color() + + machines = get_machines_config() + if not machines: + print(col.build(col.RED, + 'No machines configured. Add a [machines] section ' + 'to ~/.buildman')) + return 1 + + # Get local toolchains for comparison. Only include cross- + # toolchains under ~/.buildman-toolchains/ since system compilers + # (sandbox, c89, c99) can't be probed or fetched remotely. + local_toolchains = toolchain_mod.Toolchains() + local_toolchains.get_settings(show_warning=False) + local_toolchains.scan(verbose=False) + home = os.path.expanduser('~') + local_gcc = {arch: tc.gcc + for arch, tc in local_toolchains.toolchains.items() + if tc.gcc.startswith(home)} + resolve_toolchain_aliases(local_gcc) + local_archs = set(local_gcc.keys()) + + pool = MachinePool() + pool.probe_all(col) + pool.check_toolchains(local_archs, buildman_path=buildman_path, + fetch=fetch, col=col, local_gcc=local_gcc) + pool.print_summary(col, local_archs=local_archs, + local_gcc=local_gcc) + return 0 + + +class MachinePool: + """Manages a pool of remote build machines + + Reads machine hostnames from the [machines] section of the buildman + config and provides methods to probe, check toolchains and display + the status of all machines. + + Attributes: + machines (list of Machine): All machines in the pool + """ + + def __init__(self, names=None): + """Create a MachinePool + + Args: + names (list of str or None): If provided, only include machines + whose config key matches one of these names. If None, include + all machines from the config. 
+ """ + self.machines = [] + self._load_from_config(names) + + def _load_from_config(self, names=None): + """Load machine hostnames from the [machines] config section + + Supports bare hostnames (one per line) or name=hostname pairs. + The hostname may include a username (user@host): + [machines] + ohau + moa + myserver = build1.example.com + ruru = sglass@ruru + + Args: + names (list of str or None): If provided, only include machines + whose config key matches one of these names + """ + name_set = set(names) if names else set() + items = bsettings.get_items('machines') + for name, value in items: + # With allow_no_value=True, bare hostnames have value=None + # and the hostname is the key. For key=value pairs, use value. + key = name.strip() + if name_set and key not in name_set: + continue + hostname = value.strip() if value else key + mach = Machine(hostname, name=key) + for oname, ovalue in bsettings.get_items(f'machine:{key}'): + if oname == 'max_boards': + mach.max_boards = int(ovalue) + self.machines.append(mach) + + def probe_all(self, col=None): + """Probe all machines in the pool in parallel + + All machines are probed concurrently via threads. Progress is shown + on a single line and results are printed afterwards. 
+ + Args: + col (terminal.Color or None): Colour object for output + + Returns: + list of Machine: Machines that are available + """ + if not col: + col = terminal.Color() + + names = [m.name for m in self.machines] + done = [] + lock = threading.Lock() + + def _probe(mach): + mach.probe() + with lock: + done.append(mach.name) + tout.progress(f'Probing {len(done)}/{len(names)}: ' + f'{", ".join(done)}') + + # Probe all machines in parallel + threads = [] + tout.progress(f'Probing {len(names)} machines') + for mach in self.machines: + t = threading.Thread(target=_probe, args=(mach,)) + t.start() + threads.append(t) + for t in threads: + t.join() + tout.clear_progress() + return self.get_available() + + def check_toolchains(self, needed_archs, buildman_path='buildman', + fetch=False, col=None, local_gcc=None): + """Check and optionally fetch toolchains on reachable machines + + Probes toolchains on all reachable machines in parallel. If + fetch is True, missing toolchains are fetched on each machine in + turn, with the machines themselves handled in parallel. + + Toolchains whose gcc version (e.g. gcc-13.1.0-nolibc) differs + from the local machine are treated as missing and will be + re-fetched if fetch is True. + + Args: + needed_archs (set of str): Set of architectures needed + (e.g.
{'arm', 'aarch64', 'sandbox'}) + buildman_path (str): Path to buildman on remote machines + fetch (bool): True to attempt to fetch missing toolchains + col (terminal.Color or None): Colour object for output + local_gcc (dict or None): arch -> gcc path on the local + machine, used for version comparison + + Returns: + dict: Machine -> set of missing architectures + """ + if not col: + col = terminal.Color() + + reachable = self.get_reachable() + if not reachable: + return {} + + # Probe toolchains on all reachable machines, not just available + # ones, so that busy machines still show toolchain info + done = [] + lock = threading.Lock() + + def _check(mach): + mach.probe_toolchains(buildman_path, local_gcc=local_gcc) + with lock: + done.append(mach.name) + tout.progress(f'Checking toolchains {len(done)}/' + f'{len(reachable)}: {", ".join(done)}') + + threads = [] + tout.progress(f'Checking toolchains on {len(reachable)} machines') + for mach in reachable: + t = threading.Thread(target=_check, args=(mach,)) + t.start() + threads.append(t) + for t in threads: + t.join() + tout.clear_progress() + + local_versions = build_version_map(local_gcc) + + # Check for missing or version-mismatched toolchains + missing_map = {} + for mach in reachable: + missing = needed_archs - set(mach.toolchains.keys()) + # Also treat version mismatches as missing + for arch, path in mach.toolchains.items(): + local_ver = local_versions.get(arch) + if not local_ver: + continue + remote_ver = gcc_version(path) + if remote_ver and remote_ver != local_ver: + missing.add(arch) + if missing: + missing_map[mach] = missing + + if fetch and missing_map: + self._fetch_all_missing(missing_map, local_versions, + local_gcc, buildman_path) + + return missing_map + + def _fetch_all_missing(self, missing_map, local_versions, + local_gcc, buildman_path): + """Fetch missing toolchains on all machines in parallel + + For version-mismatched toolchains, removes the old version + directory on the remote before 
fetching, so the new version + takes its place. + + Updates missing_map in place, removing architectures that + were successfully fetched. + + Args: + missing_map (dict): Machine -> set of missing archs + local_versions (dict): arch -> version string (e.g. + 'gcc-13.1.0-nolibc') from the local machine + local_gcc (dict or None): arch -> gcc path on the boss, + passed to re-probe after fetching + buildman_path (str): Path to buildman on remote + """ + lock = threading.Lock() + done = [] + failed = [] + total = sum(len(v) for v in missing_map.values()) + + def _fetch_one(mach, missing): + fetched = set() + for arch in list(missing): + # Remove old mismatched version before fetching + old_ver = gcc_version(mach.toolchains.get(arch, '')) + if old_ver and old_ver != local_versions.get(arch): + try: + _run_ssh(mach.name, [ + 'rm', '-rf', + f'~/.buildman-toolchains/{old_ver}']) + except MachineError: + pass + ok = mach.fetch_toolchain(buildman_path, arch) + with lock: + done.append(arch) + if ok: + fetched.add(arch) + else: + failed.append(f'{mach.name}: {arch}') + tout.progress( + f'Fetching toolchains {len(done)}/{total}: ' + f'{mach.name} {arch}') + if fetched: + mach.probe_toolchains(buildman_path, + local_gcc=local_gcc) + missing -= fetched + if not missing: + with lock: + del missing_map[mach] + + tout.progress(f'Fetching {total} toolchains on ' + f'{len(missing_map)} machines') + threads = [] + for mach, missing in list(missing_map.items()): + t = threading.Thread(target=_fetch_one, args=(mach, missing)) + t.start() + threads.append(t) + for t in threads: + t.join() + tout.clear_progress() + + # Report failures + for msg in failed: + print(f' Failed to fetch {msg}') + + # Report remaining version mismatches grouped by machine + if missing_map: + print('Version mismatches (local vs remote):') + for mach, missing in sorted(missing_map.items(), + key=lambda x: x[0].name): + diffs = [] + for arch in sorted(missing): + local_ver = local_versions.get(arch, '?') + 
diffs.append(f'{arch}({local_ver})') + print(f' {mach.name}: {", ".join(diffs)}') + + def get_reachable(self): + """Get list of machines that were successfully probed + + This includes machines that are busy or low on resources, as long + as they were reachable via SSH. + + Returns: + list of Machine: Reachable machines (may not be available) + """ + return [m for m in self.machines + if m.avail or m.info.arch] + + def get_available(self): + """Get list of machines that are available for building + + Returns: + list of Machine: Available machines + """ + return [m for m in self.machines if m.avail] + + def get_total_weight(self): + """Get the total weight of all available machines + + Returns: + int: Sum of weights of all available machines + """ + return sum(m.weight for m in self.get_available()) + + def print_summary(self, col=None, local_archs=None, local_gcc=None): + """Print a summary of all machines in the pool + + Args: + col (terminal.Color or None): Colour object for output + local_archs (set of str or None): Toolchain architectures available + on the local host, used to compare remote toolchain status + local_gcc (dict or None): arch -> gcc path on local machine, + for version comparison + """ + if not col: + col = terminal.Color() + if not local_archs: + local_archs = set() + available = self.get_available() + total_weight = self.get_total_weight() + print(col.build(col.BLUE, + f'Machine pool: {len(available)} of {len(self.machines)} ' + f'machines available, total weight {total_weight}')) + print() + fmt = ' {:<10} {:>10} {:>7} {:>8} {:>6} {:>7} {:>7} {:>10} {}' + print(fmt.format('Name', 'Arch', 'Threads', 'BogoMIPS', + 'Load', 'Mem GB', 'Disk TB', 'Toolchains', 'Status')) + print(f' {"-" * 88}') + for mach in self.machines: + if mach.avail: + parts = [f'weight {mach.weight}'] + if mach.max_boards: + parts.append(f'max {mach.max_boards}') + status_text = ', '.join(parts) + status_colour = col.GREEN + elif mach.reason == 'not probed': + status_text = 
'not probed' + status_colour = col.YELLOW + else: + status_text = mach.reason + status_colour = col.RED + inf = mach.info + mem_gb = f'{inf.mem_avail_mb / 1024:.1f}' + disk_tb = f'{inf.disk_avail_mb / 1024 / 1024:.1f}' + tc_text, tc_colour = _toolchain_status( + mach, local_archs, local_gcc) + + # Format the line with plain text for correct alignment, + # then apply colour to the toolchain and status fields + line = fmt.format(mach.name, inf.arch or '-', inf.threads, + f'{inf.bogomips:.0f}', f'{inf.load:.1f}', + mem_gb, disk_tb, tc_text, status_text) + if tc_colour: + line = line.replace(tc_text, + col.build(tc_colour, tc_text), 1) + line = line.replace(status_text, + col.build(status_colour, status_text), 1) + print(line) + + local_versions = build_version_map(local_gcc) + + # Print toolchain errors and missing details after the table + notes = [] + for mach in self.machines: + err = mach.tc_error + if err: + notes.append(f' {mach.name}: {err}') + elif local_archs and mach.toolchains is not None: + missing = local_archs - set(mach.toolchains.keys()) + if missing: + parts = [] + for arch in sorted(missing): + ver = local_versions.get(arch) + if ver: + parts.append(f'{arch}({ver})') + else: + parts.append(arch) + notes.append( + f' {mach.name}: need {", ".join(parts)}') + if notes: + print() + for note in notes: + print(note) + + +class Machine: + """Represents a remote (or local) build machine + + Attributes: + hostname (str): SSH hostname (user@host or just host) + name (str): Short display name from config key + info (MachineInfo): Probed machine information + avail (bool): True if reachable and not too busy + reason (str): Reason the machine is unavailable, or '' + toolchains (dict): Available toolchain architectures, arch -> gcc path + tc_error (str): Error from last toolchain probe, or '' + weight (int): Number of build threads to allocate + max_boards (int): Max concurrent boards (0 = use nthreads) + """ + def __init__(self, hostname, name=None): + 
"""Set up a new Machine, initially unprobed + + Args: + hostname (str): SSH hostname (user@host or just host) + name (str or None): Short display name, defaulting to hostname + """ +
self.hostname = hostname + self.name = name or hostname + self.info = MachineInfo() + self.avail = False + self.reason = 'not probed' + self.toolchains = {} + self.tc_error = '' + self.weight = 0 + self.max_boards = 0 + + def probe(self, timeout=PROBE_TIMEOUT): + """Probe this machine's capabilities over SSH + + Runs a small Python script on the remote machine to collect + architecture, CPU count, thread count, load average, available memory + and disk space. + + Args: + timeout (int): SSH connect timeout in seconds + + Returns: + bool: True if the machine was probed successfully + """ + try: + result = _run_ssh(self.hostname, ['python3'], + timeout=timeout, stdin_data=PROBE_SCRIPT) + except MachineError as exc: + self.avail = False + self.reason = str(exc) + return False + + try: + info = json.loads(result) + except json.JSONDecodeError: + self.avail = False + self.reason = f'invalid probe response: {result[:100]}' + return False + + self.info = MachineInfo( + arch=info.get('arch', ''), + cpus=info.get('cpus', 0), + threads=info.get('threads', 0), + bogomips=info.get('bogomips', 0.0), + load=info.get('load_1m', 0.0), + mem_avail_mb=info.get('mem_avail_mb', 0), + disk_avail_mb=info.get('disk_avail_mb', 0), + ) + + # Check whether the machine is too busy or low on resources + self.avail = True + self.reason = '' + inf = self.info + if inf.cpus and inf.load / inf.cpus > LOAD_THRESHOLD: + self.avail = False + self.reason = (f'busy (load {inf.load:.1f} ' + f'with {inf.cpus} cpus)') + elif inf.disk_avail_mb < MIN_DISK_MB: + self.avail = False + self.reason = (f'low disk ' + f'({inf.disk_avail_mb} MB available)') + elif inf.mem_avail_mb < MIN_MEM_MB: + self.avail = False + self.reason = (f'low memory ' + f'({inf.mem_avail_mb} MB available)') + + if self.avail: + self._calc_weight() + return True + + def probe_toolchains(self, buildman_path, local_gcc=None): + """Probe available toolchains on this machine + + If local_gcc is provided, checks which of the boss's + 
toolchains exist on this machine by testing for the gcc + binary under ~/.buildman-toolchains. This avoids depending + on the remote machine's .buildman config. + + Falls back to running 'buildman --list-tool-chains' on the + remote when local_gcc is not provided (e.g. --machines + without a build). + + Args: + buildman_path (str): Path to buildman on the remote machine + local_gcc (dict or None): arch -> gcc path on the boss + + Returns: + dict: Architecture -> gcc path mapping + """ + self.tc_error = '' + if local_gcc: + return self._probe_toolchains_from_boss(local_gcc) + try: + result = _run_ssh(self.hostname, + [buildman_path, '--list-tool-chains']) + except MachineError as exc: + self.toolchains = {} + self.tc_error = str(exc) + return self.toolchains + + self.toolchains = _parse_toolchain_list(result) + return self.toolchains + + def _probe_toolchains_from_boss(self, local_gcc): + """Check which of the boss's toolchains exist on this machine + + For each architecture, extracts the path relative to the home + directory (e.g. .buildman-toolchains/gcc-13.1.0-nolibc/...) + and tests whether that gcc binary exists on the remote. This + makes the worker mirror the boss's toolchain choices. 
+ + Args: + local_gcc (dict): arch -> gcc path on the boss + + Returns: + dict: Architecture -> gcc path mapping (using remote paths) + """ + # Build a list of relative paths to check + home_prefix = os.path.expanduser('~') + checks = {} + for arch, gcc in local_gcc.items(): + if gcc.startswith(home_prefix): + rel = gcc[len(home_prefix):] + if rel.startswith('/'): + rel = rel[1:] + checks[arch] = rel + + if not checks: + self.toolchains = {} + return self.toolchains + + # Build a single SSH command that tests all paths + # Output: "arch:yes" or "arch:no" for each + test_cmds = [] + for arch, rel in checks.items(): + test_cmds.append( + f'test -f ~/{rel} && echo {arch}:yes || echo {arch}:no') + try: + result = _run_ssh(self.hostname, + ['; '.join(test_cmds)]) + except MachineError as exc: + self.toolchains = {} + self.tc_error = str(exc) + return self.toolchains + + self.toolchains = {} + for line in result.splitlines(): + line = line.strip() + if ':yes' in line: + arch = line.split(':')[0] + rel = checks.get(arch) + if rel: + self.toolchains[arch] = f'~/{rel}' + return self.toolchains + + def fetch_toolchain(self, buildman_path, arch): + """Fetch a toolchain for a given architecture on this machine + + Args: + buildman_path (str): Path to buildman on the remote + arch (str): Architecture to fetch (e.g. 'arm') + + Returns: + bool: True if the fetch succeeded + """ + try: + _run_ssh(self.hostname, + [buildman_path, '--fetch-arch', arch]) + return True + except MachineError: + return False + + def _calc_weight(self): + """Calculate the build weight (threads to allocate) for this machine + + Uses available threads minus the integer part of the current load + to avoid over-committing a partially loaded machine.
+ """ + if not self.avail: + self.weight = 0 + return + # Reserve some capacity based on current load + spare = max(1, self.info.threads - int(self.info.load)) + self.weight = spare + + def __repr__(self): + inf = self.info + status = 'avail' if self.avail else f'unavail: {self.reason}' + return (f'Machine({self.hostname}, arch={inf.arch}, ' + f'threads={inf.threads}, ' + f'bogomips={inf.bogomips:.0f}, load={inf.load:.1f}, ' + f'weight={self.weight}, {status})') diff --git a/tools/buildman/main.py b/tools/buildman/main.py index af289a46508..faff7d41ceb 100755 --- a/tools/buildman/main.py +++ b/tools/buildman/main.py @@ -41,6 +41,7 @@ def run_tests(skip_net_tests, debug, verbose, args): from buildman import test_bsettings from buildman import test_builder from buildman import test_cfgutil + from buildman import test_machine test_name = args.terms and args.terms[0] or None if skip_net_tests: @@ -63,6 +64,7 @@ def run_tests(skip_net_tests, debug, verbose, args): test_builder.TestCheckOutputForLoop, test_builder.TestMake, test_builder.TestPrintBuildSummary, + test_machine, 'buildman.toolchain']) return (0 if result.wasSuccessful() else 1) diff --git a/tools/buildman/test_machine.py b/tools/buildman/test_machine.py new file mode 100644 index 00000000000..1397a4a76c0 --- /dev/null +++ b/tools/buildman/test_machine.py @@ -0,0 +1,1046 @@ +# SPDX-License-Identifier: GPL-2.0+ +# Copyright 2026 Simon Glass <sjg@chromium.org> + +"""Tests for the machine module""" + +# pylint: disable=W0212 + +import json +import os +import unittest +from unittest import mock + +from u_boot_pylib import command +from u_boot_pylib import terminal + +from buildman import bsettings +from buildman import machine + + +# Base machine-info dict used by probe tests. Individual tests override +# fields as needed (e.g. load_1m, mem_avail_mb) via {**MACHINE_INFO, ...}. 
+MACHINE_INFO = { + 'arch': 'x86_64', + 'cpus': 4, + 'threads': 8, + 'bogomips': 5000.0, + 'load_1m': 0.5, + 'mem_avail_mb': 16000, + 'disk_avail_mb': 50000, +} + + +class TestParseToolchainList(unittest.TestCase): + """Test _parse_toolchain_list()""" + + def test_parse_normal(self): + """Test parsing normal toolchain list output""" + output = '''List of available toolchains (3): +arm : /usr/bin/arm-linux-gnueabi-gcc +aarch64 : /usr/bin/aarch64-linux-gnu-gcc +sandbox : /usr/bin/gcc +''' + result = machine._parse_toolchain_list(output) + self.assertEqual(result, { + 'arm': '/usr/bin/arm-linux-gnueabi-gcc', + 'aarch64': '/usr/bin/aarch64-linux-gnu-gcc', + 'sandbox': '/usr/bin/gcc', + }) + + def test_parse_empty(self): + """Test parsing empty output""" + self.assertEqual(machine._parse_toolchain_list(''), {}) + + def test_parse_none_toolchains(self): + """Test parsing when no toolchains are available""" + output = '''List of available toolchains (0): +None +''' + result = machine._parse_toolchain_list(output) + self.assertEqual(result, {}) + + def test_parse_with_colour(self): + """Test parsing output that has extra text before the list""" + output = """Some preamble text +List of available toolchains (2): +arm : /opt/toolchains/arm-gcc +x86 : /usr/bin/x86_64-linux-gcc + +Some trailing text +""" + result = machine._parse_toolchain_list(output) + self.assertEqual(result, { + 'arm': '/opt/toolchains/arm-gcc', + 'x86': '/usr/bin/x86_64-linux-gcc', + }) + + +class TestMachine(unittest.TestCase): + """Test Machine class""" + + def test_init(self): + """Test initial state of a Machine""" + m = machine.Machine('myhost') + self.assertEqual(m.hostname, 'myhost') + self.assertEqual(m.info.arch, '') + self.assertFalse(m.avail) + self.assertEqual(m.reason, 'not probed') + self.assertEqual(m.weight, 0) + self.assertEqual(m.toolchains, {}) + + @mock.patch('buildman.machine._run_ssh') + def test_probe_success(self, mock_ssh): + """Test successful probe""" + mock_ssh.return_value = 
json.dumps({ + **MACHINE_INFO, 'cpus': 8, 'threads': 16, 'load_1m': 1.5}) + m = machine.Machine('server1') + result = m.probe() + self.assertTrue(result) + self.assertTrue(m.avail) + self.assertEqual(m.info.cpus, 8) + self.assertEqual(m.info.threads, 16) + self.assertAlmostEqual(m.info.load, 1.5) + self.assertEqual(m.info.mem_avail_mb, 16000) + self.assertEqual(m.info.disk_avail_mb, 50000) + self.assertGreater(m.weight, 0) + + @mock.patch('buildman.machine._run_ssh') + def test_probe_busy(self, mock_ssh): + """Test probe of a busy machine""" + mock_ssh.return_value = json.dumps({ + **MACHINE_INFO, 'load_1m': 5.0}) + m = machine.Machine('busy-host') + result = m.probe() + self.assertTrue(result) + self.assertFalse(m.avail) + self.assertIn('busy', m.reason) + + @mock.patch('buildman.machine._run_ssh') + def test_probe_low_disk(self, mock_ssh): + """Test probe of a machine with low disk space""" + mock_ssh.return_value = json.dumps({ + **MACHINE_INFO, 'disk_avail_mb': 500}) + m = machine.Machine('low-disk') + result = m.probe() + self.assertTrue(result) + self.assertFalse(m.avail) + self.assertIn('disk', m.reason) + + @mock.patch('buildman.machine._run_ssh') + def test_probe_low_mem(self, mock_ssh): + """Test probe of a machine with low memory""" + mock_ssh.return_value = json.dumps({ + **MACHINE_INFO, 'mem_avail_mb': 200}) + m = machine.Machine('low-mem') + result = m.probe() + self.assertTrue(result) + self.assertFalse(m.avail) + self.assertIn('memory', m.reason) + + @mock.patch('buildman.machine._run_ssh') + def test_probe_ssh_failure(self, mock_ssh): + """Test probe when SSH fails""" + mock_ssh.side_effect = machine.MachineError('connection refused') + m = machine.Machine('bad-host') + result = m.probe() + self.assertFalse(result) + self.assertFalse(m.avail) + self.assertIn('connection refused', m.reason) + + @mock.patch('buildman.machine._run_ssh') + def test_probe_bad_json(self, mock_ssh): + """Test probe when remote returns invalid JSON""" + 
mock_ssh.return_value = 'not json at all' + m = machine.Machine('bad-json') + result = m.probe() + self.assertFalse(result) + self.assertFalse(m.avail) + self.assertIn('invalid probe response', m.reason) + + @mock.patch('buildman.machine._run_ssh') + def test_probe_toolchains(self, mock_ssh): + """Test probing toolchains""" + mock_ssh.return_value = '''List of available toolchains (2): +arm : /usr/bin/arm-linux-gnueabi-gcc +sandbox : /usr/bin/gcc + +''' + m = machine.Machine('server1') + archs = m.probe_toolchains('buildman') + self.assertEqual(archs, { + 'arm': '/usr/bin/arm-linux-gnueabi-gcc', + 'sandbox': '/usr/bin/gcc', + }) + + @mock.patch('buildman.machine._run_ssh') + def test_weight_calculation(self, mock_ssh): + """Test weight calculation based on load""" + mock_ssh.return_value = json.dumps({ + **MACHINE_INFO, 'cpus': 8, 'threads': 16, 'load_1m': 4.0}) + m = machine.Machine('server1') + m.probe() + # weight = threads - int(load) = 16 - 4 = 12 + self.assertEqual(m.weight, 12) + + @mock.patch('buildman.machine._run_ssh') + def test_weight_minimum(self, mock_ssh): + """Test weight is at least 1 when available""" + mock_ssh.return_value = json.dumps({ + **MACHINE_INFO, 'arch': 'aarch64', 'threads': 4, + 'bogomips': 48.0, 'load_1m': 3.1}) + m = machine.Machine('server1') + m.probe() + # weight = max(1, 4 - 3) = 1 + self.assertEqual(m.weight, 1) + + def test_repr(self): + """Test string representation""" + m = machine.Machine('server1') + self.assertIn('server1', repr(m)) + self.assertIn('unavail', repr(m)) + + +class TestMachinePool(unittest.TestCase): + """Test MachinePool class""" + + def setUp(self): + """Set up bsettings for each test""" + bsettings.setup(None) + + def test_empty_pool(self): + """Test pool with no machines configured""" + pool = machine.MachinePool() + self.assertEqual(pool.machines, []) + self.assertEqual(pool.get_available(), []) + self.assertEqual(pool.get_total_weight(), 0) + + def test_load_from_config(self): + """Test loading 
machines from config with bare hostnames""" + bsettings.add_file( + '[machines]\n' + 'ohau\n' + 'moa\n' + ) + pool = machine.MachinePool() + self.assertEqual(len(pool.machines), 2) + self.assertEqual(pool.machines[0].hostname, 'ohau') + self.assertEqual(pool.machines[1].hostname, 'moa') + + def test_load_from_config_key_value(self): + """Test loading machines from config with key=value pairs""" + bsettings.add_file( + '[machines]\n' + 'server1 = build1.example.com\n' + 'server2 = user@build2.example.com\n' + ) + pool = machine.MachinePool() + self.assertEqual(len(pool.machines), 2) + self.assertEqual(pool.machines[0].hostname, 'build1.example.com') + self.assertEqual(pool.machines[1].hostname, 'user@build2.example.com') + + @mock.patch('buildman.machine._run_ssh') + def test_probe_all(self, mock_ssh): + """Test probing all machines""" + mock_ssh.return_value = json.dumps({ + **MACHINE_INFO, 'load_1m': 1.0, + 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) + bsettings.add_file( + '[machines]\n' + 'host1\n' + 'host2\n' + ) + pool = machine.MachinePool() + available = pool.probe_all() + self.assertEqual(len(available), 2) + self.assertEqual(pool.get_total_weight(), 14) + + @mock.patch('buildman.machine._run_ssh') + def test_probe_mixed(self, mock_ssh): + """Test probing with some machines available and some not""" + def ssh_side_effect(hostname, _cmd, **_kwargs): + if hostname == 'host1': + return json.dumps({ + **MACHINE_INFO, 'load_1m': 1.0, + 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) + raise machine.MachineError('connection refused') + + mock_ssh.side_effect = ssh_side_effect + bsettings.add_file( + '[machines]\n' + 'host1\n' + 'host2\n' + ) + pool = machine.MachinePool() + available = pool.probe_all() + self.assertEqual(len(available), 1) + self.assertEqual(available[0].hostname, 'host1') + + @mock.patch('buildman.machine._run_ssh') + def test_check_toolchains(self, mock_ssh): + """Test checking toolchains on machines""" + def ssh_side_effect(_hostname, cmd, 
**_kwargs): + if 'python3' in cmd: + return json.dumps({ + **MACHINE_INFO, 'load_1m': 1.0, + 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) + if '--list-tool-chains' in cmd: + return '''List of available toolchains (2): +arm : /usr/bin/arm-gcc +sandbox : /usr/bin/gcc + +''' + return '' + + mock_ssh.side_effect = ssh_side_effect + bsettings.add_file( + '[machines]\n' + 'host1\n' + ) + pool = machine.MachinePool() + pool.probe_all() + missing = pool.check_toolchains({'arm', 'sandbox'}) + self.assertEqual(missing, {}) + + @mock.patch('buildman.machine._run_ssh') + def test_check_toolchains_missing(self, mock_ssh): + """Test checking toolchains with some missing""" + def ssh_side_effect(_hostname, cmd, **_kwargs): + if 'python3' in cmd: + return json.dumps({ + **MACHINE_INFO, 'load_1m': 1.0, + 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) + if '--list-tool-chains' in cmd: + return '''List of available toolchains (1): +sandbox : /usr/bin/gcc + +''' + return '' + + mock_ssh.side_effect = ssh_side_effect + bsettings.add_file( + '[machines]\n' + 'host1\n' + ) + pool = machine.MachinePool() + pool.probe_all() + missing = pool.check_toolchains({'arm', 'sandbox'}) + self.assertEqual(len(missing), 1) + m = list(missing.keys())[0] + self.assertIn('arm', missing[m]) + + +class TestRunSsh(unittest.TestCase): + """Test _run_ssh()""" + + @mock.patch('buildman.machine.command.run_pipe') + def test_success(self, mock_pipe): + """Test successful SSH command""" + mock_pipe.return_value = mock.Mock( + return_code=0, stdout='hello\n', stderr='') + result = machine._run_ssh('host1', ['echo', 'hello']) + self.assertEqual(result, 'hello\n') + + # Verify SSH options + pipe_list = mock_pipe.call_args[0][0] + cmd = pipe_list[0] + self.assertIn('ssh', cmd) + self.assertIn('BatchMode=yes', cmd) + self.assertIn('host1', cmd) + self.assertIn('echo', cmd) + + @mock.patch('buildman.machine.command.run_pipe') + def test_failure(self, mock_pipe): + """Test SSH command failure""" + 
mock_pipe.return_value = mock.Mock( + return_code=255, stdout='', + stderr='Connection refused') + with self.assertRaises(machine.MachineError) as ctx: + machine._run_ssh('host1', ['echo', 'hello']) + self.assertIn('Connection refused', str(ctx.exception)) + + @mock.patch('buildman.machine.command.run_pipe') + def test_failure_multiline_stderr(self, mock_pipe): + """Test SSH failure with multi-line stderr picks last line""" + mock_pipe.return_value = mock.Mock( + return_code=255, stdout='', + stderr='Warning: Added host key\n' + 'Permission denied (publickey).') + with self.assertRaises(machine.MachineError) as ctx: + machine._run_ssh('host1', ['echo', 'hello']) + self.assertIn('Permission denied', str(ctx.exception)) + self.assertNotIn('Warning', str(ctx.exception)) + + @mock.patch('buildman.machine.command.run_pipe') + def test_command_exc(self, mock_pipe): + """Test SSH command exception""" + mock_pipe.side_effect = command.CommandExc( + 'ssh failed', command.CommandResult()) + with self.assertRaises(machine.MachineError) as ctx: + machine._run_ssh('host1', ['echo', 'hello']) + self.assertIn('ssh failed', str(ctx.exception)) + + +class TestGetMachinesConfig(unittest.TestCase): + """Test get_machines_config()""" + + def setUp(self): + bsettings.setup(None) + + def test_empty(self): + """Test with no machines configured""" + self.assertEqual(machine.get_machines_config(), []) + + def test_with_machines(self): + """Test with machines configured""" + bsettings.add_file( + '[machines]\n' + 'host1\n' + 'host2\n' + ) + result = machine.get_machines_config() + self.assertEqual(result, ['host1', 'host2']) + + +class TestGccVersion(unittest.TestCase): + """Test gcc_version()""" + + def test_normal_path(self): + """Test extracting version from a standard toolchain path""" + path = ('~/.buildman-toolchains/gcc-13.1.0-nolibc/' + 'aarch64-linux/bin/aarch64-linux-gcc') + self.assertEqual(machine.gcc_version(path), 'gcc-13.1.0-nolibc') + + def test_no_match(self): + """Test 
path that does not contain a gcc-*-nolibc component""" + self.assertIsNone(machine.gcc_version('/usr/bin/gcc')) + + def test_empty(self): + """Test empty path""" + self.assertIsNone(machine.gcc_version('')) + + +class TestBuildVersionMap(unittest.TestCase): + """Test build_version_map()""" + + def test_normal(self): + """Test building version map from gcc dict""" + gcc = { + 'arm': '~/.buildman-toolchains/gcc-13.1.0-nolibc/arm/bin/gcc', + 'sandbox': '/usr/bin/gcc', + } + result = machine.build_version_map(gcc) + self.assertEqual(result, {'arm': 'gcc-13.1.0-nolibc'}) + + def test_none(self): + """Test with None input""" + self.assertEqual(machine.build_version_map(None), {}) + + +class TestResolveToolchainAliases(unittest.TestCase): + """Test resolve_toolchain_aliases()""" + + def setUp(self): + bsettings.setup(None) + + def test_alias(self): + """Test resolving aliases from config""" + bsettings.add_file( + '[toolchain-alias]\n' + 'x86 = i386 i686\n' + ) + gcc = {'i386': '/usr/bin/i386-gcc'} + machine.resolve_toolchain_aliases(gcc) + self.assertEqual(gcc['x86'], '/usr/bin/i386-gcc') + + def test_no_alias_needed(self): + """Test when arch already exists""" + bsettings.add_file( + '[toolchain-alias]\n' + 'x86 = i386\n' + ) + gcc = {'x86': '/usr/bin/x86-gcc', 'i386': '/usr/bin/i386-gcc'} + machine.resolve_toolchain_aliases(gcc) + # Should not overwrite existing + self.assertEqual(gcc['x86'], '/usr/bin/x86-gcc') + + +class TestToolchainStatus(unittest.TestCase): + """Test _toolchain_status()""" + + def test_no_toolchains_no_error(self): + """Test machine with no toolchains and no error""" + m = machine.Machine('host1') + m.avail = True + m.info.arch = 'x86_64' + text, colour = machine._toolchain_status(m, set()) + self.assertEqual(text, 'none') + + def test_no_toolchains_with_error(self): + """Test machine with toolchain error""" + m = machine.Machine('host1') + m.tc_error = 'SSH failed' + text, colour = machine._toolchain_status(m, set()) + self.assertEqual(text, 
'fail') + + def test_all_present(self): + """Test all local toolchains present on machine""" + m = machine.Machine('host1') + m.toolchains = {'arm': '/usr/bin/arm-gcc', + 'sandbox': '/usr/bin/gcc'} + local = {'arm', 'sandbox'} + text, colour = machine._toolchain_status(m, local) + self.assertEqual(text, 'OK') + + def test_some_missing(self): + """Test some toolchains missing""" + m = machine.Machine('host1') + m.toolchains = {'sandbox': '/usr/bin/gcc'} + local = {'arm', 'sandbox'} + text, colour = machine._toolchain_status(m, local) + self.assertIn('missing', text) + + def test_version_mismatch(self): + """Test version mismatch detection""" + m = machine.Machine('host1') + m.toolchains = { + 'arm': '~/.buildman-toolchains/gcc-12.0.0-nolibc/arm/bin/gcc'} + local_gcc = { + 'arm': '~/.buildman-toolchains/gcc-13.1.0-nolibc/arm/bin/gcc'} + text, colour = machine._toolchain_status( + m, {'arm'}, local_gcc=local_gcc) + self.assertIn('wrong ver', text) + + def test_no_local_archs(self): + """Test with empty local arch set""" + m = machine.Machine('host1') + m.toolchains = {'arm': '/usr/bin/gcc', 'x86': '/usr/bin/gcc'} + text, colour = machine._toolchain_status(m, set()) + self.assertEqual(text, '2') + + def test_unreachable_no_toolchains(self): + """Test unreachable machine with no arch info""" + m = machine.Machine('host1') + text, colour = machine._toolchain_status(m, {'arm'}) + self.assertEqual(text, '-') + + +class TestMachineExtended(unittest.TestCase): + """Extended Machine tests for coverage""" + + @mock.patch('buildman.machine._run_ssh') + def test_probe_toolchains_ssh_failure(self, mock_ssh): + """Test toolchain probe when SSH fails""" + mock_ssh.side_effect = machine.MachineError('timeout') + m = machine.Machine('host1') + result = m.probe_toolchains('buildman') + self.assertEqual(result, {}) + self.assertIn('timeout', m.tc_error) + + @mock.patch('buildman.machine._run_ssh') + def test_probe_toolchains_from_boss(self, mock_ssh): + """Test probing toolchains by 
checking boss's paths on remote""" + home = os.path.expanduser('~') + local_gcc = { + 'arm': f'{home}/.buildman-toolchains/gcc-13/arm/bin/gcc', + 'x86': f'{home}/.buildman-toolchains/gcc-13/x86/bin/gcc', + } + mock_ssh.return_value = 'arm:yes\nx86:no\n' + m = machine.Machine('host1') + result = m.probe_toolchains('buildman', local_gcc=local_gcc) + self.assertIn('arm', result) + self.assertNotIn('x86', result) + + @mock.patch('buildman.machine._run_ssh') + def test_probe_toolchains_from_boss_ssh_fail(self, mock_ssh): + """Test probing boss toolchains when SSH fails""" + home = os.path.expanduser('~') + local_gcc = { + 'arm': f'{home}/.buildman-toolchains/gcc-13/arm/bin/gcc', + } + mock_ssh.side_effect = machine.MachineError('conn refused') + m = machine.Machine('host1') + result = m.probe_toolchains('buildman', local_gcc=local_gcc) + self.assertEqual(result, {}) + self.assertIn('conn refused', m.tc_error) + + @mock.patch('buildman.machine._run_ssh') + def test_probe_toolchains_from_boss_no_home(self, mock_ssh): + """Test probing boss toolchains with non-home paths""" + local_gcc = {'sandbox': '/usr/bin/gcc'} + m = machine.Machine('host1') + result = m.probe_toolchains('buildman', local_gcc=local_gcc) + self.assertEqual(result, {}) + mock_ssh.assert_not_called() + + @mock.patch('buildman.machine._run_ssh') + def test_fetch_toolchain_success(self, mock_ssh): + """Test successful toolchain fetch""" + mock_ssh.return_value = 'Fetched arm toolchain' + m = machine.Machine('host1') + self.assertTrue(m.fetch_toolchain('buildman', 'arm')) + + @mock.patch('buildman.machine._run_ssh') + def test_fetch_toolchain_failure(self, mock_ssh): + """Test failed toolchain fetch""" + mock_ssh.side_effect = machine.MachineError('fetch failed') + m = machine.Machine('host1') + self.assertFalse(m.fetch_toolchain('buildman', 'arm')) + + @mock.patch('buildman.machine._run_ssh') + def test_weight_unavailable(self, mock_ssh): + """Test weight is 0 when unavailable""" + m = 
machine.Machine('host1') + m.avail = False + m._calc_weight() + self.assertEqual(m.weight, 0) + + +class TestRunSshExtended(unittest.TestCase): + """Extended _run_ssh tests""" + + @mock.patch('buildman.machine.command.run_pipe') + def test_failure_no_stderr(self, mock_pipe): + """Test SSH failure with no stderr""" + mock_pipe.return_value = mock.Mock( + return_code=1, stdout='', stderr='') + with self.assertRaises(machine.MachineError) as ctx: + machine._run_ssh('host1', ['cmd']) + self.assertIn('failed with code 1', str(ctx.exception)) + + @mock.patch('buildman.machine.command.run_pipe') + def test_stdin_data(self, mock_pipe): + """Test SSH with stdin_data""" + mock_pipe.return_value = mock.Mock( + return_code=0, stdout='result\n', stderr='') + result = machine._run_ssh('host1', ['python3'], + stdin_data='print("result")') + self.assertEqual(result, 'result\n') + # Verify stdin_data was passed through + _, kwargs = mock_pipe.call_args + self.assertEqual(kwargs['stdin_data'], 'print("result")') + + +class TestMachinePoolExtended(unittest.TestCase): + """Extended MachinePool tests for coverage""" + + def setUp(self): + bsettings.setup(None) + + def test_load_with_max_boards(self): + """Test loading machines with max_boards config""" + bsettings.add_file( + '[machines]\n' + 'server1\n' + '[machine:server1]\n' + 'max_boards = 50\n' + ) + pool = machine.MachinePool() + self.assertEqual(len(pool.machines), 1) + self.assertEqual(pool.machines[0].max_boards, 50) + + def test_load_filtered_names(self): + """Test loading only specified machine names""" + bsettings.add_file( + '[machines]\n' + 'host1\n' + 'host2\n' + 'host3\n' + ) + pool = machine.MachinePool(names=['host1', 'host3']) + self.assertEqual(len(pool.machines), 2) + names = [m.hostname for m in pool.machines] + self.assertEqual(names, ['host1', 'host3']) + + def test_get_reachable(self): + """Test get_reachable includes busy machines""" + bsettings.add_file('[machines]\nhost1\nhost2\n') + pool = 
machine.MachinePool() + # Simulate host1 reachable but busy, host2 unreachable + pool.machines[0].avail = False + pool.machines[0].info.arch = 'x86_64' + self.assertEqual(len(pool.get_reachable()), 1) + self.assertEqual(pool.get_reachable()[0].hostname, 'host1') + + @mock.patch('buildman.machine._run_ssh') + def test_print_summary(self, mock_ssh): + """Test print_summary runs without error""" + mock_ssh.return_value = json.dumps({ + **MACHINE_INFO, 'load_1m': 1.0, + 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + pool.probe_all() + # Just verify it doesn't crash + with terminal.capture(): + pool.print_summary() + + @mock.patch('buildman.machine._run_ssh') + def test_print_summary_with_toolchains(self, mock_ssh): + """Test print_summary with toolchain info""" + def ssh_side_effect(_hostname, cmd, **_kwargs): + if 'python3' in cmd: + return json.dumps({ + **MACHINE_INFO, 'load_1m': 1.0, + 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) + if '--list-tool-chains' in cmd: + return 'List of available toolchains (1):\narm : /bin/gcc\n\n' + return '' + + mock_ssh.side_effect = ssh_side_effect + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + pool.probe_all() + pool.check_toolchains({'arm', 'sandbox'}) + with terminal.capture(): + pool.print_summary(local_archs={'arm', 'sandbox'}) + + @mock.patch('buildman.machine._run_ssh') + def test_check_toolchains_version_mismatch(self, mock_ssh): + """Test version mismatch detection in check_toolchains""" + home = os.path.expanduser('~') + + # The remote has gcc-12, but the boss has gcc-13 + remote_gcc = (f'.buildman-toolchains/gcc-12.0.0-nolibc/' + 'arm/bin/arm-linux-gcc') + + def ssh_side_effect(_hostname, cmd, **_kwargs): + if 'python3' in cmd: + return json.dumps({ + **MACHINE_INFO, 'load_1m': 1.0, + 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) + # Boss probe: report arm as present (wrong version) + return 'arm:yes\n' + + 
mock_ssh.side_effect = ssh_side_effect + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + pool.probe_all() + + local_gcc = { + 'arm': f'{home}/.buildman-toolchains/gcc-13.1.0-nolibc/' + 'arm/bin/arm-linux-gcc', + } + # check_toolchains will re-probe, which sets the remote path + # via _probe_toolchains_from_boss. The SSH mock returns + # arm:yes, so the remote path uses the boss's relative path + # but we need it to have the old version. Patch the machine's + # toolchains after the probe runs. + orig_probe = machine.Machine._probe_toolchains_from_boss + + def fake_probe(self_mach, lg): + orig_probe(self_mach, lg) + # Override with the wrong version + if 'arm' in self_mach.toolchains: + self_mach.toolchains['arm'] = f'~/{remote_gcc}' + return self_mach.toolchains + + with mock.patch.object(machine.Machine, + '_probe_toolchains_from_boss', + fake_probe): + missing = pool.check_toolchains({'arm'}, local_gcc=local_gcc) + + # arm should be flagged as missing due to version mismatch + self.assertEqual(len(missing), 1) + + +class TestFetchMissing(unittest.TestCase): + """Test _fetch_all_missing()""" + + def setUp(self): + bsettings.setup(None) + + @mock.patch('buildman.machine._run_ssh') + def test_fetch_success(self, mock_ssh): + """Test successful toolchain fetch""" + mock_ssh.return_value = '' + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + m = pool.machines[0] + m.avail = True + m.info.arch = 'x86_64' + + missing_map = {m: {'arm'}} + with terminal.capture(): + pool._fetch_all_missing(missing_map, {}, None, 'buildman') + # After successful fetch + re-probe, arch is removed + # (re-probe returns empty since SSH mock returns '') + + @mock.patch('buildman.machine._run_ssh') + def test_fetch_failure(self, mock_ssh): + """Test failed toolchain fetch""" + mock_ssh.side_effect = machine.MachineError('failed') + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + m = pool.machines[0] + m.avail = True 
+ m.info.arch = 'x86_64' + + missing_map = {m: {'arm'}} + with terminal.capture(): + pool._fetch_all_missing(missing_map, {}, None, 'buildman') + # Should still have the missing arch + self.assertIn('arm', missing_map[m]) + + @mock.patch('buildman.machine._run_ssh') + def test_fetch_with_version_removal(self, mock_ssh): + """Test fetching removes old version first""" + calls = [] + + def ssh_side_effect(hostname, cmd, **_kwargs): + calls.append(cmd) + if '--fetch-arch' in cmd: + return '' + if 'rm' in cmd: + return '' + return '' + + mock_ssh.side_effect = ssh_side_effect + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + m = pool.machines[0] + m.avail = True + m.info.arch = 'x86_64' + m.toolchains = { + 'arm': '~/.buildman-toolchains/gcc-12.0.0-nolibc/arm/bin/gcc'} + + missing_map = {m: {'arm'}} + local_versions = {'arm': 'gcc-13.1.0-nolibc'} + with terminal.capture(): + pool._fetch_all_missing(missing_map, local_versions, None, + 'buildman') + # Should have called rm -rf for the old version + rm_calls = [c for c in calls if 'rm' in c] + self.assertTrue(len(rm_calls) > 0) + + +class TestPrintSummaryEdgeCases(unittest.TestCase): + """Test print_summary edge cases""" + + def setUp(self): + bsettings.setup(None) + + @mock.patch('buildman.machine._run_ssh') + def test_unavailable_machine(self, mock_ssh): + """Test summary with unavailable machine""" + mock_ssh.side_effect = machine.MachineError('refused') + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + pool.probe_all() + # Should not crash with unavailable machine + with terminal.capture(): + pool.print_summary() + + @mock.patch('buildman.machine._run_ssh') + def test_busy_machine(self, mock_ssh): + """Test summary with busy machine""" + mock_ssh.return_value = json.dumps({ + **MACHINE_INFO, 'load_1m': 10.0}) + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + pool.probe_all() + with terminal.capture(): + pool.print_summary() + + 
@mock.patch('buildman.machine._run_ssh') + def test_with_max_boards(self, mock_ssh): + """Test summary shows max_boards""" + mock_ssh.return_value = json.dumps({ + **MACHINE_INFO, 'load_1m': 1.0, + 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) + bsettings.add_file( + '[machines]\nhost1\n' + '[machine:host1]\nmax_boards = 50\n') + pool = machine.MachinePool() + pool.probe_all() + with terminal.capture(): + pool.print_summary() + + @mock.patch('buildman.machine._run_ssh') + def test_with_tc_error(self, mock_ssh): + """Test summary with toolchain error note""" + mock_ssh.return_value = json.dumps({ + **MACHINE_INFO, 'load_1m': 1.0, + 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + pool.probe_all() + pool.machines[0].tc_error = 'buildman not found' + with terminal.capture(): + pool.print_summary(local_archs={'arm'}) + + @mock.patch('buildman.machine._run_ssh') + def test_with_missing_toolchains(self, mock_ssh): + """Test summary with missing toolchain notes""" + mock_ssh.return_value = json.dumps({ + **MACHINE_INFO, 'load_1m': 1.0, + 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + pool.probe_all() + pool.machines[0].toolchains = {'sandbox': '/usr/bin/gcc'} + local_gcc = { + 'arm': os.path.expanduser( + '~/.buildman-toolchains/gcc-13/arm/bin/gcc'), + } + with terminal.capture(): + pool.print_summary(local_archs={'arm', 'sandbox'}, + local_gcc=local_gcc) + + +class TestCheckToolchainsEdge(unittest.TestCase): + """Test check_toolchains edge cases""" + + def setUp(self): + bsettings.setup(None) + + def test_no_reachable(self): + """Test check_toolchains with no reachable machines (line 482)""" + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + # Machine is not probed, so not reachable + result = pool.check_toolchains({'arm'}) + self.assertEqual(result, {}) + + 
@mock.patch('buildman.machine._run_ssh') + def test_fetch_flag(self, mock_ssh): + """Test check_toolchains with fetch=True (line 524)""" + home = os.path.expanduser('~') + + def ssh_side_effect(_hostname, cmd, **_kwargs): + if 'python3' in cmd: + return json.dumps({ + **MACHINE_INFO, 'load_1m': 1.0, + 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) + # No toolchains found + return '' + + mock_ssh.side_effect = ssh_side_effect + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + pool.probe_all() + + local_gcc = { + 'arm': f'{home}/.buildman-toolchains/gcc-13/arm/bin/gcc', + } + # fetch=True should trigger _fetch_all_missing + with mock.patch.object(pool, '_fetch_all_missing') as mock_fetch: + pool.check_toolchains({'arm'}, fetch=True, + local_gcc=local_gcc) + mock_fetch.assert_called_once() + + +class TestFetchVersionRemovalFailure(unittest.TestCase): + """Test _fetch_all_missing rm -rf failure path (lines 563-564)""" + + def setUp(self): + bsettings.setup(None) + + @mock.patch('buildman.machine._run_ssh') + def test_rm_failure_continues(self, mock_ssh): + """Test that rm -rf failure is silently ignored""" + call_count = [0] + + def ssh_side_effect(hostname, cmd, **_kwargs): + call_count[0] += 1 + if 'rm' in cmd: + raise machine.MachineError('rm failed') + if '--fetch-arch' in cmd: + return '' + return '' + + mock_ssh.side_effect = ssh_side_effect + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + m = pool.machines[0] + m.avail = True + m.info.arch = 'x86_64' + # Old version that differs from local + m.toolchains = { + 'arm': '~/.buildman-toolchains/gcc-12.0.0-nolibc/arm/bin/gcc'} + + missing_map = {m: {'arm'}} + local_versions = {'arm': 'gcc-13.1.0-nolibc'} + with terminal.capture(): + pool._fetch_all_missing(missing_map, local_versions, None, + 'buildman') + # Should have attempted rm and then fetch despite rm failure + self.assertGreater(call_count[0], 1) + + +class TestPrintSummaryNotProbed(unittest.TestCase): + 
"""Test print_summary 'not probed' branch (lines 669-670)""" + + def setUp(self): + bsettings.setup(None) + + def test_not_probed_machine(self): + """Test summary shows 'not probed' for unprobed machine""" + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + # Don't probe - machine stays in 'not probed' state + with terminal.capture(): + pool.print_summary() + + +class TestPrintSummaryMissingNoVersion(unittest.TestCase): + """Test print_summary missing toolchain without version (line 707)""" + + def setUp(self): + bsettings.setup(None) + + @mock.patch('buildman.machine._run_ssh') + def test_missing_with_and_without_version(self, mock_ssh): + """Test missing note for archs with and without version info""" + home = os.path.expanduser('~') + mock_ssh.return_value = json.dumps({ + **MACHINE_INFO, 'load_1m': 1.0, + 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) + bsettings.add_file('[machines]\nhost1\n') + pool = machine.MachinePool() + pool.probe_all() + pool.machines[0].toolchains = {} + # arm has a version (under ~/.buildman-toolchains), + # sandbox does not + local_gcc = { + 'arm': f'{home}/.buildman-toolchains/gcc-13.1.0-nolibc/' + 'arm/bin/gcc', + 'sandbox': '/usr/bin/gcc', + } + with terminal.capture(): + pool.print_summary(local_archs={'arm', 'sandbox'}, + local_gcc=local_gcc) + + +class TestDoProbe(unittest.TestCase): + """Test do_probe_machines()""" + + def setUp(self): + bsettings.setup(None) + + def test_no_machines(self): + """Test with no machines configured""" + with terminal.capture(): + ret = machine.do_probe_machines() + self.assertEqual(ret, 1) + + @mock.patch('buildman.machine._run_ssh') + def test_with_machines(self, mock_ssh): + """Test probing configured machines""" + def ssh_side_effect(_hostname, cmd, **_kwargs): + if 'python3' in cmd: + return json.dumps({ + **MACHINE_INFO, 'load_1m': 1.0, + 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) + if '--list-tool-chains' in cmd: + return 'List of available toolchains (0):\n\n' + 
return '' + + mock_ssh.side_effect = ssh_side_effect + bsettings.add_file('[machines]\nhost1\n') + with terminal.capture(): + ret = machine.do_probe_machines() + self.assertEqual(ret, 0) + + +if __name__ == '__main__': + unittest.main() -- 2.43.0
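[Reviewer note: the `gcc_version()` and `build_version_map()` behaviour exercised by the tests above can be sketched as below. This is an illustrative reimplementation inferred purely from the test expectations, not the actual machine.py code.]

```python
import re


def gcc_version(path):
    """Return the gcc-*-nolibc component of a toolchain path, or None.

    E.g. '~/.buildman-toolchains/gcc-13.1.0-nolibc/arm/bin/gcc' yields
    'gcc-13.1.0-nolibc'; a path like '/usr/bin/gcc' has no such
    component and yields None.
    """
    if not path:
        return None
    match = re.search(r'(gcc-[^/]+-nolibc)(?:/|$)', path)
    return match.group(1) if match else None


def build_version_map(gcc):
    """Map arch -> toolchain version, skipping paths with no version"""
    if not gcc:
        return {}
    return {arch: ver for arch, path in gcc.items()
            if (ver := gcc_version(path))}
```

With this sketch, a system toolchain such as sandbox's `/usr/bin/gcc` simply drops out of the version map, which is why the tests expect `{'arm': 'gcc-13.1.0-nolibc'}` for a mixed input.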
From: Simon Glass <sjg@chromium.org> Add the ability to fetch missing toolchains on remote machines when probing with the --machines option. When --machines-fetch-arch is passed alongside --machines, any toolchain architectures available locally but missing on a remote machine are fetched via 'buildman --fetch-arch' over SSH. Pass the local toolchain set as needed_archs to check_toolchains() so that missing toolchains can be identified by comparison with the local host. Signed-off-by: Simon Glass <sjg@chromium.org> --- tools/buildman/cmdline.py | 4 ++++ tools/buildman/control.py | 5 +++-- tools/buildman/test_machine.py | 14 ++++++++++++++ 3 files changed, 21 insertions(+), 2 deletions(-) diff --git a/tools/buildman/cmdline.py b/tools/buildman/cmdline.py index b284b2cbbfa..a85da069d24 100644 --- a/tools/buildman/cmdline.py +++ b/tools/buildman/cmdline.py @@ -109,6 +109,10 @@ def add_upto_m(parser): default=False, dest='machines', help='Probe all remote machines from [machines] config and show ' 'their status and available toolchains') + parser.add_argument('--machines-fetch-arch', action='store_true', + default=False, + help='Fetch missing toolchains on remote machines (implies ' + '--machines)') parser.add_argument('--machines-buildman-path', type=str, default='buildman', help='Path to buildman on remote machines (default: %(default)s)') diff --git a/tools/buildman/control.py b/tools/buildman/control.py index 97f6ffcbfd2..7515932a2e4 100644 --- a/tools/buildman/control.py +++ b/tools/buildman/control.py @@ -760,9 +760,10 @@ def do_buildman(args, toolchains=None, make_func=None, brds=None, col = terminal.Color() # Handle --machines: probe remote machines and show status - if args.machines: + if args.machines or args.machines_fetch_arch: return machine.do_probe_machines( - col, buildman_path=args.machines_buildman_path) + col, fetch=args.machines_fetch_arch, + buildman_path=args.machines_buildman_path) git_dir = os.path.join(args.git, '.git') diff --git 
a/tools/buildman/test_machine.py b/tools/buildman/test_machine.py index 1397a4a76c0..b635d1afb6f 100644 --- a/tools/buildman/test_machine.py +++ b/tools/buildman/test_machine.py @@ -172,6 +172,20 @@ sandbox : /usr/bin/gcc 'sandbox': '/usr/bin/gcc', }) + @mock.patch('buildman.machine._run_ssh') + def test_fetch_toolchain_success(self, mock_ssh): + """Test successful toolchain fetch""" + mock_ssh.return_value = 'Downloading...\nDone' + m = machine.Machine('server1') + self.assertTrue(m.fetch_toolchain('buildman', 'arm')) + + @mock.patch('buildman.machine._run_ssh') + def test_fetch_toolchain_failure(self, mock_ssh): + """Test failed toolchain fetch""" + mock_ssh.side_effect = machine.MachineError('fetch failed') + m = machine.Machine('server1') + self.assertFalse(m.fetch_toolchain('buildman', 'arm')) + @mock.patch('buildman.machine._run_ssh') def test_weight_calculation(self, mock_ssh): """Test weight calculation based on load""" -- 2.43.0
From: Simon Glass <sjg@chromium.org> Add --worker flag which runs buildman in worker mode, accepting JSON commands on stdin and streaming BM>-prefixed JSON responses on stdout. This is the remote side of the distributed build system: the boss starts a worker via 'ssh host buildman --worker', pushes source with 'git push', then sends build commands. The worker uses Builder and BuilderThread with a custom subclass that sends results over SSH instead of writing them to disk. Worktrees are created sequentially before the Builder starts, with progress messages sent to the boss. The worker supports both batch mode (build_boards) and demand-driven mode (build_prepare / build_board / build_done). Build settings (no_lto, allow_missing, verbose, etc.) are received from the boss via a 'configure' command and applied to every build. On exit (quit command, stdin close, or signal), the worker kills its entire process group to clean up child make/cc1/ld processes. Tests are included in test_worker.py covering command parsing, build coordination, error handling and cleanup. 
Signed-off-by: Simon Glass <sjg@chromium.org> --- tools/buildman/builder.py | 53 +- tools/buildman/cmdline.py | 3 + tools/buildman/control.py | 5 + tools/buildman/main.py | 2 + tools/buildman/test_builder.py | 28 +- tools/buildman/test_worker.py | 896 ++++++++++++++++++++++++++++++ tools/buildman/worker.py | 985 +++++++++++++++++++++++++++++++++ 7 files changed, 1930 insertions(+), 42 deletions(-) create mode 100644 tools/buildman/test_worker.py create mode 100644 tools/buildman/worker.py diff --git a/tools/buildman/builder.py b/tools/buildman/builder.py index c4ec22dbc77..3264978a616 100644 --- a/tools/buildman/builder.py +++ b/tools/buildman/builder.py @@ -199,15 +199,12 @@ class Builder: _complete_delay: Expected delay until completion (timedelta) _next_delay_update: Next time we plan to display a progress update (datatime) - num_threads: Number of builder threads to run _opts: DisplayOptions for result output _re_make_err: Compiled regex for make error detection _restarting_config: True if 'Restart config' is detected in output _result_handler: ResultHandler for displaying results _single_builder: BuilderThread object for the singer builder, if threading is not being used - _start_time: Start time for the build - _step: Step value for processing commits (1=all, 2=every other, etc.) 
_terminated: Thread was terminated due to an error _threads: List of active threads _timestamps: List of timestamps for the completion of the last @@ -318,8 +315,8 @@ class Builder: self._build_period_us = None self._complete_delay = None self._next_delay_update = datetime.now() - self._start_time = None - self._step = step + self.start_time = None + self.step = step self.no_subdirs = no_subdirs self.full_path = full_path self.verbose_build = verbose_build @@ -369,14 +366,14 @@ class Builder: # Attributes set by other methods self._build_period = None self._commit = None - self._upto = 0 + self.upto = 0 self._warned = 0 self.fail = 0 self.commit_count = 0 self.commits = None self.count = 0 - self._timestamps = collections.deque() - self._verbose = False + self.timestamps = collections.deque() + self.verbose = False # Note: baseline state for result summaries is now in ResultHandler @@ -478,9 +475,9 @@ class Builder: build (one board, one commit). """ now = datetime.now() - self._timestamps.append(now) - count = len(self._timestamps) - delta = self._timestamps[-1] - self._timestamps[0] + self.timestamps.append(now) + count = len(self.timestamps) + delta = self.timestamps[-1] - self.timestamps[0] seconds = delta.total_seconds() # If we have enough data, estimate build period (time taken for a @@ -489,7 +486,7 @@ class Builder: self._next_delay_update = now + timedelta(seconds=2) if seconds > 0: self._build_period = float(seconds) / count - todo = self.count - self._upto + todo = self.count - self.upto self._complete_delay = timedelta(microseconds= self._build_period * todo * 1000000) # Round it @@ -497,7 +494,7 @@ class Builder: microseconds=self._complete_delay.microseconds) if seconds > 60: - self._timestamps.popleft() + self.timestamps.popleft() count -= 1 def _select_commit(self, commit, checkout=True): @@ -580,7 +577,7 @@ class Builder: if result: target = result.brd.target - self._upto += 1 + self.upto += 1 if result.return_code != 0: self.fail += 1 elif 
result.stderr: @@ -592,7 +589,7 @@ class Builder: if self._opts.ide: if result.stderr: sys.stderr.write(result.stderr) - elif self._verbose: + elif self.verbose: terminal.print_clear() boards_selected = {target : result.brd} self._result_handler.reset_result_summary(boards_selected) @@ -602,13 +599,13 @@ class Builder: target = '(starting)' # Display separate counts for ok, warned and fail - ok = self._upto - self._warned - self.fail + ok = self.upto - self._warned - self.fail line = '\r' + self.col.build(self.col.GREEN, f'{ok:5d}') line += self.col.build(self.col.YELLOW, f'{self._warned:5d}') line += self.col.build(self.col.RED, f'{self.fail:5d}') line += f' /{self.count:<5d} ' - remaining = self.count - self._upto + remaining = self.count - self.upto if remaining: line += self.col.build(self.col.MAGENTA, f' -{remaining:<5d} ') else: @@ -1033,10 +1030,10 @@ class Builder: board_selected (dict): Selected boards to build """ # First work out how many commits we will build - count = (self.commit_count + self._step - 1) // self._step + count = (self.commit_count + self.step - 1) // self.step self.count = len(board_selected) * count - self._upto = self._warned = self.fail = 0 - self._timestamps = collections.deque() + self.upto = self._warned = self.fail = 0 + self.timestamps = collections.deque() def get_thread_dir(self, thread_num): """Get the directory path to the working dir for a thread. @@ -1114,7 +1111,7 @@ class Builder: else: raise ValueError(f"Can't setup git repo with {setup_git}.") - def _prepare_working_space(self, max_threads, setup_git): + def prepare_working_space(self, max_threads, setup_git): """Prepare the working directory for use. Set up the git repo for each thread. Creates a linked working tree @@ -1187,7 +1184,7 @@ class Builder: to_remove.append(dirname) return to_remove - def _prepare_output_space(self): + def prepare_output_space(self): """Get the output directories ready to receive files. 
We delete any output directories which look like ones we need to @@ -1223,16 +1220,16 @@ class Builder: """ self.commit_count = len(commits) if commits else 1 self.commits = commits - self._verbose = verbose + self.verbose = verbose self._result_handler.reset_result_summary(board_selected) builderthread.mkdir(self.base_dir, parents = True) - self._prepare_working_space(min(self.num_threads, len(board_selected)), + self.prepare_working_space(min(self.num_threads, len(board_selected)), board_selected and commits is not None) - self._prepare_output_space() + self.prepare_output_space() if not self._opts.ide: tprint('\rStarting build...', newline=False) - self._start_time = datetime.now() + self.start_time = datetime.now() self._setup_build(board_selected, commits) self.count += extra_count self.process_result(None) @@ -1246,7 +1243,7 @@ class Builder: job.work_in_output = self.work_in_output job.adjust_cfg = self.adjust_cfg job.fragments = fragments - job.step = self._step + job.step = self.step if self.num_threads: self.queue.put(job) else: @@ -1318,4 +1315,4 @@ class Builder: """ self._result_handler.print_build_summary( self.count, self._already_done, self.kconfig_reconfig, - self._start_time, self.thread_exceptions) + self.start_time, self.thread_exceptions) diff --git a/tools/buildman/cmdline.py b/tools/buildman/cmdline.py index a85da069d24..5f3c47bf7fe 100644 --- a/tools/buildman/cmdline.py +++ b/tools/buildman/cmdline.py @@ -119,6 +119,9 @@ def add_upto_m(parser): parser.add_argument( '--maintainer-check', action='store_true', help='Check that maintainer entries exist for each board') + parser.add_argument('--worker', action='store_true', default=False, + help='Run in worker mode, accepting build commands on stdin ' + '(used internally for distributed builds)') parser.add_argument( '--no-allow-missing', action='store_true', default=False, help='Disable telling binman to allow missing blobs') diff --git a/tools/buildman/control.py b/tools/buildman/control.py 
index 7515932a2e4..082db377293 100644 --- a/tools/buildman/control.py +++ b/tools/buildman/control.py @@ -759,6 +759,11 @@ def do_buildman(args, toolchains=None, make_func=None, brds=None, gitutil.setup() col = terminal.Color() + # Handle --worker: run in worker mode for distributed builds + if args.worker: + from buildman import worker # pylint: disable=C0415 + return worker.do_worker() + # Handle --machines: probe remote machines and show status if args.machines or args.machines_fetch_arch: return machine.do_probe_machines( diff --git a/tools/buildman/main.py b/tools/buildman/main.py index faff7d41ceb..225e341fc26 100755 --- a/tools/buildman/main.py +++ b/tools/buildman/main.py @@ -42,6 +42,7 @@ def run_tests(skip_net_tests, debug, verbose, args): from buildman import test_builder from buildman import test_cfgutil from buildman import test_machine + from buildman import test_worker test_name = args.terms and args.terms[0] or None if skip_net_tests: @@ -65,6 +66,7 @@ def run_tests(skip_net_tests, debug, verbose, args): test_builder.TestMake, test_builder.TestPrintBuildSummary, test_machine, + test_worker, 'buildman.toolchain']) return (0 if result.wasSuccessful() else 1) diff --git a/tools/buildman/test_builder.py b/tools/buildman/test_builder.py index 48be83cf645..74f19ec9528 100644 --- a/tools/buildman/test_builder.py +++ b/tools/buildman/test_builder.py @@ -277,7 +277,7 @@ class TestPrepareThread(unittest.TestCase): class TestPrepareWorkingSpace(unittest.TestCase): - """Tests for Builder._prepare_working_space()""" + """Tests for Builder.prepare_working_space()""" def setUp(self): """Set up test fixtures""" @@ -296,7 +296,7 @@ class TestPrepareWorkingSpace(unittest.TestCase): @mock.patch.object(builderthread, 'mkdir') def test_no_setup_git(self, mock_mkdir, mock_prepare_thread): """Test with setup_git=False""" - self.builder._prepare_working_space(2, False) + self.builder.prepare_working_space(2, False) mock_mkdir.assert_called_once() # Should prepare 2 threads 
with setup_git=False @@ -312,7 +312,7 @@ class TestPrepareWorkingSpace(unittest.TestCase): def test_worktree_available(self, _mock_mkdir, mock_check_worktree, mock_prune, mock_prepare_thread): """Test when worktree is available""" - self.builder._prepare_working_space(3, True) + self.builder.prepare_working_space(3, True) mock_check_worktree.assert_called_once() mock_prune.assert_called_once() @@ -329,7 +329,7 @@ class TestPrepareWorkingSpace(unittest.TestCase): def test_worktree_not_available(self, _mock_mkdir, mock_check_worktree, mock_prepare_thread): """Test when worktree is not available (falls back to clone)""" - self.builder._prepare_working_space(2, True) + self.builder.prepare_working_space(2, True) mock_check_worktree.assert_called_once() # Should prepare 2 threads with setup_git='clone' @@ -341,7 +341,7 @@ class TestPrepareWorkingSpace(unittest.TestCase): @mock.patch.object(builderthread, 'mkdir') def test_zero_threads(self, _mock_mkdir, mock_prepare_thread): """Test with max_threads=0 (should still prepare 1 thread)""" - self.builder._prepare_working_space(0, False) + self.builder.prepare_working_space(0, False) # Should prepare at least 1 thread self.assertEqual(mock_prepare_thread.call_count, 1) @@ -352,7 +352,7 @@ class TestPrepareWorkingSpace(unittest.TestCase): def test_no_git_dir(self, _mock_mkdir, mock_prepare_thread): """Test with no git_dir set""" self.builder.git_dir = None - self.builder._prepare_working_space(2, True) + self.builder.prepare_working_space(2, True) # _detect_git_setup returns False when git_dir is None self.assertEqual(mock_prepare_thread.call_count, 2) @@ -368,7 +368,7 @@ class TestPrepareWorkingSpace(unittest.TestCase): mock_prune, mock_prepare_thread): """Test lazy_thread_setup skips upfront thread preparation""" self.builder._lazy_thread_setup = True - self.builder._prepare_working_space(4, True) + self.builder.prepare_working_space(4, True) # Git setup type is detected so prepare_thread() can use it # later, but no 
threads are prepared upfront @@ -510,7 +510,7 @@ class TestShowNotBuilt(unittest.TestCase): class TestPrepareOutputSpace(unittest.TestCase): - """Tests for _prepare_output_space() and _get_output_space_removals()""" + """Tests for prepare_output_space() and _get_output_space_removals()""" def setUp(self): """Set up test fixtures""" @@ -561,25 +561,25 @@ class TestPrepareOutputSpace(unittest.TestCase): self.assertEqual(result, ['/tmp/test/02_g1234567_old']) @mock.patch.object(builder.Builder, '_get_output_space_removals') - def test_prepare_output_space_nothing_to_remove(self, mock_get_removals): - """Test _prepare_output_space with nothing to remove""" + def test_prepare_output_space_nothing_to_remove(self, mock_get_removals): + """Test prepare_output_space with nothing to remove""" mock_get_removals.return_value = [] terminal.get_print_test_lines() # Clear - self.builder._prepare_output_space() + self.builder.prepare_output_space() lines = terminal.get_print_test_lines() self.assertEqual(len(lines), 0) @mock.patch.object(shutil, 'rmtree') @mock.patch.object(builder.Builder, '_get_output_space_removals') - def test_prepare_output_space_removes_dirs(self, mock_get_removals, + def test_prepare_output_space_removes_dirs(self, mock_get_removals, mock_rmtree): - """Test _prepare_output_space removes old directories""" + """Test prepare_output_space removes old directories""" mock_get_removals.return_value = ['/tmp/test/old1', '/tmp/test/old2'] terminal.get_print_test_lines() # Clear - self.builder._prepare_output_space() + self.builder.prepare_output_space() # Check rmtree was called for each directory self.assertEqual(mock_rmtree.call_count, 2) diff --git a/tools/buildman/test_worker.py b/tools/buildman/test_worker.py new file mode 100644 index 00000000000..4a47501217b --- /dev/null +++ b/tools/buildman/test_worker.py @@ -0,0 +1,896 @@ +# SPDX-License-Identifier: GPL-2.0+ +# Copyright 2026 Simon Glass <sjg@chromium.org> + +"""Tests for the worker module""" + +# pylint:
disable=W0212,C0302 + +import io +import json +import os +import queue +import signal +import subprocess +import tempfile +import unittest +from unittest import mock + +from u_boot_pylib.command import CommandExc, CommandResult + +from buildman import worker + + +def _parse(line): + """Parse a protocol line into a dict""" + return json.loads(line[len(worker.RESPONSE_PREFIX):]) + + +class _ProtoTestBase(unittest.TestCase): + """Base class for tests that use the worker protocol + + Provides capture/parse helpers and resets _protocol_out on tearDown. + """ + + def setUp(self): + self.buf = io.StringIO() + worker._protocol_out = self.buf + + def tearDown(self): + worker._protocol_out = None + + def get_resp(self): + """Parse the last response written to the capture buffer""" + return _parse(self.buf.getvalue()) + + def get_all_resp(self): + """Parse all responses from the capture buffer""" + return [_parse(line) for line in self.buf.getvalue().strip().split('\n') + if line] + + def assert_resp(self, key, value): + """Assert a key in the last response equals value""" + self.assertEqual(self.get_resp()[key], value) + + def assert_in_output(self, text): + """Assert text appears in raw protocol output""" + self.assertIn(text, self.buf.getvalue()) + + +class _RunWorkerBase(_ProtoTestBase): + """Base class for tests that run the full worker loop""" + + def _run(self, stdin_text): + """Run the worker with given stdin, return (result, output lines)""" + buf = io.StringIO() + with mock.patch('buildman.worker.toolchain_mod.Toolchains'), \ + mock.patch('sys.stdin', io.StringIO(stdin_text)), \ + mock.patch('sys.stdout', buf): + result = worker.run_worker() + lines = [line for line in buf.getvalue().strip().split('\n') + if line] + return result, lines + + +class TestProtocol(_ProtoTestBase): + """Test _send(), _send_error() and _send_build_result()""" + + def test_send(self): + """Test sending a response""" + worker._send({'resp': 'ready', 'nthreads': 4}) + 
self.assertTrue(self.buf.getvalue().startswith( + worker.RESPONSE_PREFIX)) + self.assert_resp('resp', 'ready') + self.assert_resp('nthreads', 4) + + def test_send_error(self): + """Test sending an error response""" + worker._send_error('something broke') + self.assert_resp('resp', 'error') + self.assert_resp('msg', 'something broke') + + def test_send_build_result_with_sizes(self): + """Test sending result with sizes""" + worker._send_build_result( + 'sandbox', 0, 0, + sizes={'text': 1000, 'data': 200}) + self.assertEqual( + self.get_resp()['sizes'], {'text': 1000, 'data': 200}) + + def test_send_build_result_without_sizes(self): + """Test sending result without sizes""" + worker._send_build_result('sandbox', 0, 0) + self.assertNotIn('sizes', self.get_resp()) + + +class TestUtilityFunctions(unittest.TestCase): + """Test _get_nthreads(), _get_load_avg() and _get_sizes()""" + + def test_nthreads_normal(self): + """Test getting thread count""" + self.assertGreater(worker._get_nthreads(), 0) + + @mock.patch('os.cpu_count', return_value=None) + def test_nthreads_none(self, _mock): + """Test when cpu_count returns None""" + self.assertEqual(worker._get_nthreads(), 1) + + @mock.patch('os.cpu_count', side_effect=AttributeError) + def test_nthreads_attribute_error(self, _mock): + """Test when cpu_count raises AttributeError""" + self.assertEqual(worker._get_nthreads(), 1) + + @mock.patch('builtins.open', side_effect=OSError('no file')) + def test_load_avg_no_proc(self, _mock): + """Test when /proc/loadavg is not available""" + self.assertEqual(worker._get_load_avg(), 0.0) + + def test_get_sizes_no_elf(self): + """Test with no ELF file""" + with tempfile.TemporaryDirectory() as tmpdir: + self.assertEqual(worker._get_sizes(tmpdir), {}) + + @mock.patch('buildman.worker.subprocess.Popen') + def test_get_sizes_with_elf(self, mock_popen): + """Test with ELF file present""" + proc = mock.Mock() + proc.communicate.return_value = ( + b' text data bss dec hex filename\n' + b' 12345 
1234 567 14146 374a u-boot\n', + b'') + proc.returncode = 0 + mock_popen.return_value = proc + with tempfile.TemporaryDirectory() as tmpdir: + elf = os.path.join(tmpdir, 'u-boot') + with open(elf, 'w', encoding='utf-8') as fout: + fout.write('fake') + self.assertIn('raw', worker._get_sizes(tmpdir)) + + @mock.patch('buildman.worker.subprocess.Popen', + side_effect=OSError('no size')) + def test_get_sizes_popen_fails(self, _mock): + """Test when size command fails""" + with tempfile.TemporaryDirectory() as tmpdir: + elf = os.path.join(tmpdir, 'u-boot') + with open(elf, 'w', encoding='utf-8') as fout: + fout.write('fake') + self.assertEqual(worker._get_sizes(tmpdir), {}) + + +class TestCmdSetup(_ProtoTestBase): + """Test _cmd_setup()""" + + @mock.patch('buildman.worker.command.run_one') + def test_auto_work_dir(self, mock_run): + """Test setup with auto-created work directory""" + mock_run.return_value = mock.Mock(return_code=0) + state = {} + result = worker._cmd_setup({'work_dir': ''}, state) + self.assertTrue(result) + self.assertIn('work_dir', state) + self.assertTrue(state.get('auto_work_dir')) + mock_run.assert_called_once() + self.addCleanup(lambda: os.path.isdir(state['work_dir']) + and os.rmdir(state['work_dir'])) + + @mock.patch('buildman.worker.command.run_one') + def test_explicit_work_dir(self, mock_run): + """Test setup with explicit work directory""" + mock_run.return_value = mock.Mock(return_code=0) + with tempfile.TemporaryDirectory() as tmpdir: + state = {} + work_dir = os.path.join(tmpdir, 'build') + self.assertTrue( + worker._cmd_setup({'work_dir': work_dir}, state)) + self.assertEqual(state['work_dir'], work_dir) + self.assertTrue(os.path.isdir(work_dir)) + self.assertNotIn('auto_work_dir', state) + + @mock.patch('buildman.worker.command.run_one') + def test_setup_returns_git_dir(self, mock_run): + """Test setup response includes git_dir""" + mock_run.return_value = mock.Mock(return_code=0) + with tempfile.TemporaryDirectory() as tmpdir: + 
worker._cmd_setup({'work_dir': tmpdir}, {}) + self.assert_resp('resp', 'setup_done') + self.assertEqual( + self.get_resp()['git_dir'], + os.path.join(tmpdir, '.git')) + + def test_setup_existing_git(self): + """Test setup skips git init if .git already exists""" + with tempfile.TemporaryDirectory() as tmpdir: + os.makedirs(os.path.join(tmpdir, '.git')) + with mock.patch( + 'buildman.worker.command.run_one') as mock_run: + self.assertTrue( + worker._cmd_setup({'work_dir': tmpdir}, {})) + mock_run.assert_not_called() + + @mock.patch('buildman.worker.command.run_one') + def test_git_init_fails(self, mock_run): + """Test setup when git init fails""" + mock_run.side_effect = CommandExc( + 'git init failed', CommandResult()) + with tempfile.TemporaryDirectory() as tmpdir: + result = worker._cmd_setup( + {'work_dir': os.path.join(tmpdir, 'new')}, {}) + self.assertFalse(result) + self.assert_in_output('git init failed') + + +class TestCmdQuit(_ProtoTestBase): + """Test _cmd_quit()""" + + def test_quit(self): + """Test quit command""" + worker._cmd_quit({}) + self.assert_resp('resp', 'quit_ack') + + def test_quit_cleanup(self): + """Test quit cleans up auto work directory""" + tmpdir = tempfile.mkdtemp(prefix='bm-test-') + worker._cmd_quit({'work_dir': tmpdir, 'auto_work_dir': True}) + self.assertFalse(os.path.exists(tmpdir)) + + def test_quit_preserves_explicit_dir(self): + """Test quit does not remove non-auto work directory""" + with tempfile.TemporaryDirectory() as tmpdir: + worker._cmd_quit({'work_dir': tmpdir}) + self.assertTrue(os.path.isdir(tmpdir)) + + +class TestCmdConfigure(_ProtoTestBase): + """Test _cmd_configure()""" + + def test_configure(self): + """Test that configure stores settings in state""" + state = {} + settings = {'no_lto': True, 'allow_missing': True} + self.assertTrue( + worker._cmd_configure({'settings': settings}, state)) + self.assertEqual(state['settings'], settings) + self.assert_resp('resp', 'configure_done') + + def 
test_configure_empty(self): + """Test configure with empty settings""" + state = {} + worker._cmd_configure({'settings': {}}, state) + self.assertEqual(state['settings'], {}) + + +class TestRunWorker(_RunWorkerBase): + """Test run_worker()""" + + def test_empty_stdin(self): + """Test worker with empty stdin (unexpected close)""" + result, lines = self._run('') + self.assertEqual(result, 1) + self.assertEqual(_parse(lines[0])['resp'], 'ready') + + def test_ready_includes_slots(self): + """Test that the ready response includes slots""" + _, lines = self._run('{"cmd": "quit"}\n') + obj = _parse(lines[0]) + self.assertEqual(obj['resp'], 'ready') + self.assertIn('slots', obj) + self.assertGreaterEqual(obj['slots'], 1) + + def test_quit_command(self): + """Test worker with quit command""" + result, lines = self._run('{"cmd": "quit"}\n') + self.assertEqual(result, 0) + self.assertEqual(len(lines), 2) # ready + quit_ack + + def test_invalid_json(self): + """Test worker with invalid JSON input""" + _, lines = self._run('not json\n{"cmd": "quit"}\n') + self.assertEqual(len(lines), 3) # ready + error + quit_ack + self.assertIn('invalid JSON', _parse(lines[1])['msg']) + + def test_unknown_command(self): + """Test worker with unknown command""" + _, lines = self._run( + '{"cmd": "dance"}\n{"cmd": "quit"}\n') + self.assertIn('unknown command', _parse(lines[1])['msg']) + + def test_blank_lines(self): + """Test worker ignores blank lines""" + _, lines = self._run('\n\n{"cmd": "quit"}\n\n') + self.assertEqual(len(lines), 2) # ready + quit_ack + + def test_configure_command(self): + """Test configure command in worker loop""" + result, lines = self._run( + '{"cmd": "configure", "settings": {"no_lto": true}}\n' + '{"cmd": "quit"}\n') + self.assertEqual(result, 0) + self.assertEqual(_parse(lines[1])['resp'], 'configure_done') + + +class TestRunWorkerDispatch(_RunWorkerBase): + """Test run_worker command dispatch""" + + @mock.patch('buildman.worker._cmd_setup', return_value=True) + def 
test_setup(self, mock_fn): + """Test setup command is dispatched""" + self._run('{"cmd": "setup", "work_dir": "/tmp"}\n' + '{"cmd": "quit"}\n') + mock_fn.assert_called_once() + + @mock.patch('buildman.worker._cmd_build_boards') + def test_build_boards(self, mock_fn): + """Test build_boards command is dispatched""" + self._run('{"cmd": "build_boards", "boards": [], ' + '"commits": []}\n{"cmd": "quit"}\n') + mock_fn.assert_called_once() + + @mock.patch('buildman.worker._cmd_build_prepare') + def test_build_prepare(self, mock_fn): + """Test build_prepare command is dispatched""" + self._run('{"cmd": "build_prepare", "commits": []}\n' + '{"cmd": "quit"}\n') + mock_fn.assert_called_once() + + @mock.patch('buildman.worker._cmd_build_board') + def test_build_board(self, mock_fn): + """Test build_board command is dispatched""" + self._run('{"cmd": "build_board", "board": "x"}\n' + '{"cmd": "quit"}\n') + mock_fn.assert_called_once() + + @mock.patch('buildman.worker._cmd_build_done') + def test_build_done(self, mock_fn): + """Test build_done command is dispatched""" + self._run('{"cmd": "build_done"}\n{"cmd": "quit"}\n') + mock_fn.assert_called_once() + + def test_stdin_eof(self): + """Test worker handles stdin EOF""" + result, _lines = self._run('') + self.assertEqual(result, 1) + + def test_queue_empty_retry(self): + """Test dispatch retries on queue.Empty""" + eof = object() + mock_queue = mock.Mock() + mock_queue.get.side_effect = [ + queue.Empty(), + '{"cmd": "quit"}\n', + ] + worker._dispatch_commands(mock_queue, eof, {}) + self.assert_in_output('quit_ack') + self.assertEqual(mock_queue.get.call_count, 2) + + +class TestWorkerMake(unittest.TestCase): + """Test _worker_make()""" + + @mock.patch('buildman.worker.subprocess.Popen') + def test_success(self, mock_popen): + """Test successful make invocation""" + proc = mock.Mock() + proc.communicate.return_value = (b'built ok\n', b'') + proc.returncode = 0 + mock_popen.return_value = proc + + result = worker._worker_make( + 
None, None, None, '/tmp', + 'O=/tmp/out', '-s', '-j', '4', 'sandbox_defconfig', + env={'PATH': '/usr/bin'}) + self.assertEqual(result.return_code, 0) + self.assertEqual(result.stdout, 'built ok\n') + self.assertEqual(mock_popen.call_args[0][0][0], 'make') + + @mock.patch('buildman.worker.subprocess.Popen', + side_effect=FileNotFoundError('no make')) + def test_make_not_found(self, _mock_popen): + """Test when make binary is not found""" + result = worker._worker_make( + None, None, None, '/tmp', env={}) + self.assertEqual(result.return_code, 1) + self.assertIn('make failed', result.stderr) + + +class TestWorkerBuilderThread(_ProtoTestBase): + """Test _WorkerBuilderThread""" + + def _make_thread(self): + """Create an uninitialised thread instance for testing + + Uses __new__ to avoid calling __init__ which requires a real + Builder. Tests must set any attributes they need. + """ + return worker._WorkerBuilderThread.__new__( + worker._WorkerBuilderThread) + + def test_write_result_is_noop(self): + """Test that _write_result does nothing""" + self._make_thread()._write_result(None, False, False) + + def test_send_result(self): + """Test that _send_result sends a build_result message""" + thread = self._make_thread() + result = mock.Mock( + brd=mock.Mock(target='sandbox'), + commit_upto=0, return_code=0, + stderr='', stdout='', out_dir='/nonexistent') + thread._send_result(result) + self.assert_resp('resp', 'build_result') + self.assert_resp('board', 'sandbox') + + def test_run_job_sends_heartbeat(self): + """Test run_job sends heartbeat""" + thread = self._make_thread() + thread.thread_num = 0 + job = mock.Mock(brd=mock.Mock(target='sandbox')) + with mock.patch.object(worker._WorkerBuilderThread.__bases__[0], + 'run_job'): + thread.run_job(job) + self.assert_in_output('heartbeat') + + def test_checkout_with_commits(self): + """Test _checkout with commits""" + thread = self._make_thread() + thread.builder = mock.Mock() + commit = mock.Mock(hash='abc123') + 
thread.builder.commits = [commit] + thread.builder.checkout = True + + with mock.patch('buildman.worker._run_git') as mock_git, \ + mock.patch('buildman.worker._remove_stale_lock'), \ + tempfile.TemporaryDirectory() as tmpdir: + result = thread._checkout(0, tmpdir) + + self.assertEqual(result, commit) + mock_git.assert_called_once() + + def test_checkout_no_commits(self): + """Test _checkout without commits returns 'current'""" + thread = self._make_thread() + thread.builder = mock.Mock(commits=None) + self.assertEqual(thread._checkout(0, '/tmp'), 'current') + + +class TestCmdBuildBoards(_ProtoTestBase): + """Test _cmd_build_boards""" + + def _make_state(self, tmpdir, **overrides): + """Create a standard state dict for build tests""" + state = { + 'work_dir': tmpdir, + 'toolchains': mock.Mock(), + 'settings': {}, + 'nthreads': 4, + } + state.update(overrides) + return state + + def test_no_work_dir_error(self): + """Test error when no work directory set""" + worker._cmd_build_boards({ + 'boards': [{'board': 'x', 'arch': 'arm'}], + 'commits': ['abc'], + }, {'work_dir': None}) + self.assert_in_output('no work directory') + + def test_no_boards_error(self): + """Test error when no boards specified""" + with tempfile.TemporaryDirectory() as tmpdir: + worker._cmd_build_boards( + {'boards': [], 'commits': ['abc']}, + self._make_state(tmpdir)) + self.assert_in_output('no boards') + + def test_no_toolchains_error(self): + """Test error when toolchains not set up""" + with tempfile.TemporaryDirectory() as tmpdir: + worker._cmd_build_boards({ + 'boards': [{'board': 'x', 'arch': 'arm'}], + 'commits': ['abc'], + }, {'work_dir': tmpdir}) + self.assert_in_output('no toolchains') + + @mock.patch('buildman.worker._setup_worktrees') + @mock.patch('buildman.worker.builder_mod.Builder') + @mock.patch('buildman.worker.ResultHandler') + def test_creates_builder(self, _mock_rh_cls, mock_builder_cls, + mock_setup_wt): + """Test that build_boards creates a Builder correctly""" + 
mock_builder = mock.Mock() + mock_builder.run_build.return_value = (0, 0, []) + mock_builder_cls.return_value = mock_builder + + with tempfile.TemporaryDirectory() as tmpdir: + worker._cmd_build_boards({ + 'boards': [ + {'board': 'sandbox', 'arch': 'sandbox'}, + {'board': 'rpi', 'arch': 'arm'}, + ], + 'commits': ['abc123', 'def456'], + }, self._make_state(tmpdir, nthreads=8, + settings={'no_lto': True, + 'force_build': True, + 'kconfig_check': False})) + + mock_setup_wt.assert_called_once() + kwargs = mock_builder_cls.call_args[1] + self.assertEqual(kwargs['thread_class'], + worker._WorkerBuilderThread) + self.assertTrue(kwargs['no_lto']) + self.assertFalse(kwargs['kconfig_check']) + + call_args = mock_builder.init_build.call_args + self.assertEqual(len(call_args[0][0]), 2) + self.assertIn('rpi', call_args[0][1]) + self.assert_in_output('build_done') + + @mock.patch('buildman.worker.builder_mod.Builder') + @mock.patch('buildman.worker.ResultHandler') + def test_no_commits(self, _mock_rh_cls, mock_builder_cls): + """Test build_boards with no commits (current source)""" + mock_builder = mock.Mock() + mock_builder.run_build.return_value = (0, 0, []) + mock_builder_cls.return_value = mock_builder + + with tempfile.TemporaryDirectory() as tmpdir: + worker._cmd_build_boards({ + 'boards': [{'board': 'sandbox', 'arch': 'sandbox'}], + 'commits': [None], + }, self._make_state(tmpdir)) + + self.assertIsNone( + mock_builder.init_build.call_args[0][0]) + self.assertTrue(mock_builder_cls.call_args[1]['kconfig_check']) + + @mock.patch('buildman.worker._setup_worktrees') + @mock.patch('buildman.worker.builder_mod.Builder') + @mock.patch('buildman.worker.ResultHandler') + def test_build_crash(self, _mock_rh, mock_builder_cls, _mock_wt): + """Test build_boards when run_build crashes""" + mock_builder = mock.Mock() + mock_builder.run_build.side_effect = RuntimeError('crash') + mock_builder_cls.return_value = mock_builder + + with tempfile.TemporaryDirectory() as tmpdir: + 
worker._cmd_build_boards({ + 'boards': [{'board': 'x', 'arch': 'arm'}], + 'commits': ['abc'], + }, self._make_state(tmpdir, nthreads=2)) + + self.assert_in_output('"exceptions": 1') + + +class TestCmdBuildPrepare(_ProtoTestBase): + """Test _cmd_build_prepare()""" + + def test_no_work_dir(self): + """Test error when no work directory""" + worker._cmd_build_prepare({}, {}) + self.assert_in_output('no work directory') + + def test_no_toolchains(self): + """Test error when no toolchains""" + with tempfile.TemporaryDirectory() as tmpdir: + worker._cmd_build_prepare({}, {'work_dir': tmpdir}) + self.assert_in_output('no toolchains') + + @mock.patch('buildman.worker._setup_worktrees') + @mock.patch('buildman.worker._create_builder') + def test_success(self, mock_create, _mock_wt): + """Test successful build_prepare""" + mock_bldr = mock.Mock( + base_dir='/tmp/test', commit_count=1, + work_in_output=False) + mock_create.return_value = mock_bldr + + with tempfile.TemporaryDirectory() as tmpdir: + os.makedirs(os.path.join(tmpdir, '.git')) + state = { + 'work_dir': tmpdir, + 'toolchains': mock.Mock(), + 'nthreads': 2, + 'settings': {}, + } + worker._cmd_build_prepare( + {'commits': ['abc123']}, state) + + self.assert_in_output('build_prepare_done') + self.assertIn('builder', state) + + +class TestCmdBuildBoard(_ProtoTestBase): + """Test _cmd_build_board()""" + + def test_no_builder(self): + """Test error when no builder""" + worker._cmd_build_board({'board': 'x'}, {}) + self.assert_in_output('no builder') + + def test_queues_job(self): + """Test that build_board queues a job""" + mock_bldr = mock.Mock( + commit_count=1, count=0, work_in_output=False, + adjust_cfg=None, step=1) + worker._cmd_build_board( + {'board': 'sandbox', 'arch': 'sandbox'}, + {'builder': mock_bldr, 'commits': None}) + mock_bldr.queue.put.assert_called_once() + + +class TestCmdBuildDone(_ProtoTestBase): + """Test _cmd_build_done()""" + + def test_no_builder(self): + """Test build_done with no builder""" + 
worker._cmd_build_done({}) + self.assert_resp('resp', 'build_done') + self.assert_resp('exceptions', 0) + + def test_with_builder(self): + """Test build_done with a builder""" + mock_bldr = mock.Mock() + mock_bldr.run_build.return_value = (0, 0, []) + state = {'builder': mock_bldr, 'commits': ['abc']} + worker._cmd_build_done(state) + self.assert_resp('resp', 'build_done') + self.assertNotIn('builder', state) + + def test_builder_crash(self): + """Test build_done when run_build crashes""" + mock_bldr = mock.Mock() + mock_bldr.run_build.side_effect = RuntimeError('boom') + state = {'builder': mock_bldr, 'commits': ['abc']} + worker._cmd_build_done(state) + self.assert_resp('exceptions', 1) + self.assertNotIn('builder', state) + + +class TestSetupWorktrees(_ProtoTestBase): + """Test _setup_worktrees()""" + + @mock.patch('buildman.worker._run_git') + def test_creates_worktrees(self, mock_git): + """Test that worktrees are created for each thread""" + with tempfile.TemporaryDirectory() as tmpdir: + git_dir = os.path.join(tmpdir, '.git') + os.makedirs(git_dir) + worker._setup_worktrees(tmpdir, git_dir, 3) + + self.assertEqual(mock_git.call_count, 4) # prune + 3 adds + self.assertIn('prune', mock_git.call_args_list[0][0]) + resps = self.get_all_resp() + self.assertEqual(len(resps), 3) + for resp in resps: + self.assertEqual(resp['resp'], 'worktree_created') + + @mock.patch('buildman.worker._run_git') + def test_skips_existing_valid_worktree(self, mock_git): + """Test that valid existing worktrees are reused""" + with tempfile.TemporaryDirectory() as tmpdir: + git_dir = os.path.join(tmpdir, '.git') + os.makedirs(git_dir) + + thread_dir = os.path.join(tmpdir, '.bm-work', '00') + os.makedirs(thread_dir) + real_gitdir = os.path.join(git_dir, 'worktrees', '00') + os.makedirs(real_gitdir) + with open(os.path.join(thread_dir, '.git'), 'w', + encoding='utf-8') as fout: + fout.write(f'gitdir: {real_gitdir}\n') + + worker._setup_worktrees(tmpdir, git_dir, 1) + + 
self.assertEqual(mock_git.call_count, 1) # prune only + + @mock.patch('buildman.worker._run_git') + def test_replaces_stale_clone(self, mock_git): + """Test that a full .git directory (old clone) is replaced""" + with tempfile.TemporaryDirectory() as tmpdir: + git_dir = os.path.join(tmpdir, '.git') + os.makedirs(git_dir) + + thread_dir = os.path.join(tmpdir, '.bm-work', '00') + clone_git = os.path.join(thread_dir, '.git') + os.makedirs(os.path.join(clone_git, 'objects')) + + worker._setup_worktrees(tmpdir, git_dir, 1) + self.assertFalse(os.path.isdir(clone_git)) + + self.assertEqual(mock_git.call_count, 2) # prune + add + + @mock.patch('buildman.worker._run_git') + def test_stale_dot_git_file(self, mock_git): + """Test removing stale .git file pointing to non-existent dir""" + with tempfile.TemporaryDirectory() as tmpdir: + git_dir = os.path.join(tmpdir, '.git') + os.makedirs(git_dir) + + thread_dir = os.path.join(tmpdir, '.bm-work', '00') + os.makedirs(thread_dir) + dot_git = os.path.join(thread_dir, '.git') + with open(dot_git, 'w', encoding='utf-8') as fout: + fout.write('gitdir: /nonexistent\n') + + worker._setup_worktrees(tmpdir, git_dir, 1) + self.assertFalse(os.path.isfile(dot_git)) + + self.assertGreaterEqual(mock_git.call_count, 2) + + +class TestRunGit(unittest.TestCase): + """Test _run_git()""" + + @mock.patch('buildman.worker.subprocess.Popen') + def test_success(self, mock_popen): + """Test successful git command""" + proc = mock.Mock() + proc.communicate.return_value = (b'', b'') + proc.returncode = 0 + mock_popen.return_value = proc + worker._run_git('status', cwd='/tmp') + + @mock.patch('buildman.worker.subprocess.Popen') + def test_failure(self, mock_popen): + """Test failed git command""" + proc = mock.Mock() + proc.communicate.return_value = (b'', b'fatal: bad ref\n') + proc.returncode = 128 + mock_popen.return_value = proc + with self.assertRaises(OSError) as ctx: + worker._run_git('checkout', 'bad', cwd='/tmp') + self.assertIn('bad ref', 
str(ctx.exception)) + + @mock.patch('buildman.worker.subprocess.Popen') + def test_timeout(self, mock_popen): + """Test git command timeout""" + proc = mock.Mock() + proc.communicate.side_effect = [ + subprocess.TimeoutExpired('git', 30), + (b'', b''), + ] + mock_popen.return_value = proc + with self.assertRaises(OSError) as ctx: + worker._run_git('fetch', cwd='/tmp', timeout=30) + self.assertIn('timed out', str(ctx.exception)) + + +class TestResolveGitDir(unittest.TestCase): + """Test _resolve_git_dir()""" + + def test_directory(self): + """Test with a regular .git directory""" + with tempfile.TemporaryDirectory() as tmpdir: + git_dir = os.path.join(tmpdir, '.git') + os.makedirs(git_dir) + self.assertEqual(worker._resolve_git_dir(git_dir), git_dir) + + def test_gitdir_file_absolute(self): + """Test with a .git file with absolute path""" + with tempfile.TemporaryDirectory() as tmpdir: + real_git = os.path.join(tmpdir, 'real.git') + os.makedirs(real_git) + dot_git = os.path.join(tmpdir, '.git') + with open(dot_git, 'w', encoding='utf-8') as fout: + fout.write(f'gitdir: {real_git}\n') + self.assertEqual( + worker._resolve_git_dir(dot_git), real_git) + + def test_gitdir_file_relative(self): + """Test .git file with relative path""" + with tempfile.TemporaryDirectory() as tmpdir: + real_git = os.path.join(tmpdir, 'worktrees', 'wt1') + os.makedirs(real_git) + dot_git = os.path.join(tmpdir, '.git') + with open(dot_git, 'w', encoding='utf-8') as fout: + fout.write('gitdir: worktrees/wt1\n') + self.assertEqual( + worker._resolve_git_dir(dot_git), + os.path.join(tmpdir, 'worktrees', 'wt1')) + + +class TestProcessManagement(unittest.TestCase): + """Test _kill_group(), _dbg() and signal handling""" + + @mock.patch('os.killpg') + def test_kill_group_not_leader(self, mock_killpg): + """Test _kill_group when not group leader""" + old = worker._is_group_leader + worker._is_group_leader = False + worker._kill_group() + mock_killpg.assert_not_called() + worker._is_group_leader = 
old + + @mock.patch('os.killpg') + @mock.patch('os.getpgrp', return_value=12345) + def test_kill_group_leader(self, _mock_grp, mock_killpg): + """Test _kill_group when group leader""" + old = worker._is_group_leader + worker._is_group_leader = True + worker._kill_group() + mock_killpg.assert_called_once() + worker._is_group_leader = old + + @mock.patch('os.killpg', side_effect=OSError('no perm')) + @mock.patch('os.getpgrp', return_value=12345) + def test_kill_group_fails(self, _mock_grp, _mock_killpg): + """Test _kill_group handles OSError""" + old = worker._is_group_leader + worker._is_group_leader = True + worker._kill_group() # should not raise + worker._is_group_leader = old + + def test_dbg_off(self): + """Test _dbg when debug is off""" + old = worker._debug + worker._debug = False + worker._dbg('test message') # should not raise + worker._debug = old + + def test_dbg_on(self): + """Test _dbg when debug is on""" + old = worker._debug + worker._debug = True + with mock.patch('sys.stderr', new_callable=io.StringIO) as err: + worker._dbg('hello') + self.assertIn('hello', err.getvalue()) + worker._debug = old + + def test_dbg_stderr_oserror(self): + """Test _dbg handles OSError from stderr""" + old = worker._debug + worker._debug = True + err = mock.Mock() + err.write.side_effect = OSError('broken pipe') + with mock.patch('sys.stderr', err): + worker._dbg('will fail') # should not raise + worker._debug = old + + @mock.patch('buildman.worker._kill_group') + @mock.patch('os._exit') + def test_exit_handler(self, mock_exit, mock_kill): + """Test signal handler calls _kill_group and os._exit""" + handlers = {} + orig_signal = signal.signal + + def capture_signal(signum, handler): + handlers[signum] = handler + return orig_signal(signum, signal.SIG_DFL) + + with mock.patch('signal.signal', side_effect=capture_signal), \ + mock.patch('buildman.worker.toolchain_mod.Toolchains'), \ + mock.patch('sys.stdin', io.StringIO('{"cmd":"quit"}\n')), \ + mock.patch('sys.stdout', 
io.StringIO()): + worker.run_worker() + + handler = handlers.get(signal.SIGTERM) + self.assertIsNotNone(handler) + mock_kill.reset_mock() + mock_exit.reset_mock() + handler(signal.SIGTERM, None) + mock_kill.assert_called() + mock_exit.assert_called_with(1) + + +class TestDoWorker(unittest.TestCase): + """Test do_worker()""" + + @mock.patch('buildman.worker.run_worker', return_value=0) + @mock.patch('os.setpgrp') + @mock.patch('os.getpid', return_value=100) + @mock.patch('os.getpgrp', return_value=100) + def test_start(self, _grp, _pid, _setpgrp, mock_run): + """Test do_worker sets process group and runs""" + self.assertEqual(worker.do_worker(debug=False), 0) + mock_run.assert_called_once_with(False) + + @mock.patch('buildman.worker.run_worker', return_value=0) + @mock.patch('os.setpgrp', side_effect=OSError('not allowed')) + @mock.patch('os.getpid', return_value=100) + @mock.patch('os.getpgrp', return_value=1) + def test_setpgrp_fails(self, _grp, _pid, _setpgrp, _mock_run): + """Test do_worker handles setpgrp failure""" + self.assertEqual(worker.do_worker(debug=False), 0) + + +if __name__ == '__main__': + unittest.main() diff --git a/tools/buildman/worker.py b/tools/buildman/worker.py new file mode 100644 index 00000000000..ddf023a1979 --- /dev/null +++ b/tools/buildman/worker.py @@ -0,0 +1,985 @@ +# SPDX-License-Identifier: GPL-2.0+ +# Copyright 2026 Simon Glass <sjg@chromium.org> + +"""Worker mode for distributed builds + +A worker runs on a remote machine and receives build commands over stdin from a +boss. 
Commands and responses use a JSON-lines protocol: + +Commands (boss -> worker, on stdin): + {"cmd": "setup", "work_dir": "/path"} + {"cmd": "configure", "settings": {"no_lto": true, ...}} + {"cmd": "build_boards", + "boards": [{"board": "sandbox", "arch": "sandbox"}], + "commits": ["<hash>", ...]} + {"cmd": "build_prepare", "commits": ["<hash>", ...]} + {"cmd": "build_board", "board": "sandbox", "arch": "sandbox"} + {"cmd": "build_done"} + {"cmd": "quit"} + +Responses (worker -> boss, on stdout): + Each line is prefixed with 'BM> ' followed by a JSON object: + BM> {"resp": "ready", "nthreads": 8, "slots": 2} + BM> {"resp": "setup_done", "work_dir": "/path", "git_dir": "/path/.git"} + BM> {"resp": "configure_done"} + BM> {"resp": "build_prepare_done"} + BM> {"resp": "build_result", "board": "sandbox", "commit_upto": 0, + "return_code": 0, "stderr": "", "sizes": {...}} + BM> {"resp": "build_done", "exceptions": 0} + BM> {"resp": "error", "msg": "..."} + BM> {"resp": "quit_ack"} + +The 'BM> ' prefix allows the boss to distinguish protocol messages from +any stray output on the SSH connection (e.g. login banners, warnings). + +The worker uses Builder and BuilderThread from the local build path, +with a custom BuilderThread subclass that sends results over SSH +instead of writing them to disk. This means the worker inherits the +same board-first scheduling, per-thread worktrees, incremental builds +and retry logic as local builds. + +Typical flow (batch mode): + 1. Boss starts worker: ssh host buildman --worker + 2. Worker sends 'ready' with nthreads + 3. Boss sends 'setup' to create work directory with a git repo + 4. Worker sends 'setup_done' with git_dir path + 5. Boss pushes source: git push ssh://host/<git_dir> HEAD:refs/heads/work + 6. Boss sends 'build_boards' with all boards and commits + 7. 
Worker creates a Builder which sets up per-thread worktrees + and runs BuilderThread instances that pick boards from a queue, + build all commits for each, and stream 'build_result' responses + 8. Boss sends 'quit' when done + +Demand-driven flow: + Steps 1-5 same as above, then: + 6. Boss sends 'build_prepare' with commits + 7. Worker creates Builder and worktrees, sends 'build_prepare_done' + 8. Boss sends 'build_board' commands one at a time from a shared + pool, sending more as results arrive to keep threads busy + 9. Boss sends 'build_done' when no more boards + 10. Worker drains queue, sends 'build_done', boss sends 'quit' +""" + +import json +import os +import queue +import signal +import shutil +import subprocess +import sys +import tempfile +import traceback +import threading + +from buildman.board import Board +from buildman import builderthread +from buildman import builder as builder_mod +from buildman.outcome import DisplayOptions +from buildman.resulthandler import ResultHandler +from buildman import toolchain as toolchain_mod +from u_boot_pylib import command +from u_boot_pylib import terminal + +from patman.commit import Commit + +# Protocol prefix for all worker responses +RESPONSE_PREFIX = 'BM> ' + +# Lock to prevent interleaved stdout writes from concurrent build threads +_send_lock = threading.Lock() + +# Lock for debug output to stderr +_debug_lock = threading.Lock() + +# Whether debug output is enabled (set by run_worker) +_debug = False # pylint: disable=C0103 + +# Whether this process is a process group leader (set by do_worker) +_is_group_leader = False # pylint: disable=C0103 + +# The real stdout for protocol messages (set by run_worker) +_protocol_out = None # pylint: disable=C0103 + + +def _kill_group(): + """Kill all processes in our process group + + Sends SIGKILL to our entire process group, which includes this + process plus all make, cc1, as, ld, etc. spawned by build threads. 
+ Only works if do_worker() confirmed we are the process group leader. + Does nothing otherwise, to avoid killing unrelated processes + (e.g. the test runner). + """ + if not _is_group_leader: + _dbg('_kill_group: not leader, skipping') + return + _dbg(f'_kill_group: killing pgid {os.getpgrp()}') + try: + os.killpg(os.getpgrp(), signal.SIGKILL) + except OSError as exc: + _dbg(f'_kill_group: killpg failed: {exc}') + + +def _dbg(msg): + """Print a debug message to stderr if debug mode is enabled + + Args: + msg (str): Message to print + """ + if _debug: + with _debug_lock: + try: + sys.stderr.write(f'W: {msg}\n') + sys.stderr.flush() + except OSError: + pass + + +def _send(obj): + """Send a JSON response to the boss + + Thread-safe: uses a lock to prevent interleaved writes from + concurrent build threads. Writes to _protocol_out (the real + stdout) rather than sys.stdout which is redirected to stderr. + + Args: + obj (dict): Response object to send + """ + out = _protocol_out or sys.stdout + with _send_lock: + out.write(RESPONSE_PREFIX + json.dumps(obj) + '\n') + out.flush() + + +def _send_error(msg): + """Send an error response + + Args: + msg (str): Error message + """ + _send({'resp': 'error', 'msg': msg}) + + +def _send_build_result(board, commit_upto, return_code, **kwargs): + """Send a build result response + + Args: + board (str): Board target name + commit_upto (int): Commit number + return_code (int): Build return code + **kwargs: Optional keys: stderr, stdout, sizes + """ + result = { + 'resp': 'build_result', + 'board': board, + 'commit_upto': commit_upto, + 'return_code': return_code, + 'stderr': kwargs.get('stderr', ''), + 'stdout': kwargs.get('stdout', ''), + 'load_avg': _get_load_avg(), + } + sizes = kwargs.get('sizes') + if sizes: + result['sizes'] = sizes + _send(result) + + +def _get_nthreads(): + """Get the number of available build threads + + Returns: + int: Number of threads available for building + """ + try: + return os.cpu_count() or 1 + 
except (AttributeError, NotImplementedError): + return 1 + + +def _get_load_avg(): + """Get the 1-minute load average + + Returns: + float: 1-minute load average, or 0.0 if unavailable + """ + try: + with open('/proc/loadavg', encoding='utf-8') as inf: + return float(inf.read().split()[0]) + except (OSError, ValueError, IndexError): + return 0.0 + + +def _get_sizes(out_dir): + """Get the image sizes from a build output directory + + Uses subprocess.Popen directly instead of command.run_pipe() to + avoid the select() FD_SETSIZE limit in cros_subprocess. With many + threads running builds, pipe file descriptors can exceed 1024, + causing select() to fail or corrupt memory. + + Args: + out_dir (str): Build output directory + + Returns: + dict: Size information, or empty dict if not available + """ + elf = os.path.join(out_dir, 'u-boot') + if not os.path.exists(elf): + return {} + try: + proc = subprocess.Popen( # pylint: disable=R1732 + ['size', elf], stdin=subprocess.DEVNULL, + stdout=subprocess.PIPE, stderr=subprocess.PIPE) + stdout, _ = proc.communicate() + if proc.returncode == 0: + # Strip the header line from size output, keeping only data lines. + # This matches the format that local builderthread produces. + lines = stdout.decode('utf-8', errors='replace').splitlines() + if len(lines) > 1: + return {'raw': '\n'.join(lines[1:])} + except OSError: + pass + return {} + + +def _worker_make(_commit, _brd, _stage, cwd, *args, **kwargs): + """Run make using subprocess.Popen to avoid select() FD limit + + On workers with many parallel builds, file descriptor numbers can + exceed FD_SETSIZE (1024), causing the select()-based + communicate_filter in cros_subprocess to fail. Using + subprocess.Popen with communicate() avoids this. 
+ + Args: + _commit: Unused (API compatibility with Builder.make) + _brd: Unused + _stage: Unused + cwd (str): Working directory + *args: Make arguments + **kwargs: Must include 'env' dict + + Returns: + CommandResult: Result of the make command + """ + env = kwargs.get('env') + cmd = ['make'] + list(args) + try: + proc = subprocess.Popen( # pylint: disable=R1732 + cmd, cwd=cwd, env=env, stdin=subprocess.DEVNULL, + stdout=subprocess.PIPE, stderr=subprocess.PIPE) + stdout, stderr = proc.communicate() + result = command.CommandResult() + result.stdout = stdout.decode('utf-8', errors='replace') + result.stderr = stderr.decode('utf-8', errors='replace') + result.combined = result.stdout + result.stderr + result.return_code = proc.returncode + return result + except Exception as exc: # pylint: disable=W0718 + result = command.CommandResult() + result.return_code = 1 + result.stderr = f'make failed to start: {exc}' + result.combined = result.stderr + return result + + +def _run_git(*args, cwd=None, timeout=60): + """Run a git command using subprocess.Popen to avoid select() FD limit + + On workers with many parallel builds, file descriptor numbers can + exceed FD_SETSIZE (1024), causing the select()-based cros_subprocess + to fail. Using subprocess.Popen with communicate() avoids this. 
+ + Args: + *args: Git command arguments (without 'git' prefix) + cwd (str): Working directory + timeout (int): Timeout in seconds for the command + + Raises: + OSError: If the git command fails or times out + """ + cmd = ['git'] + list(args) + proc = subprocess.Popen( # pylint: disable=R1732 + cmd, cwd=cwd, stdin=subprocess.DEVNULL, + stdout=subprocess.PIPE, stderr=subprocess.PIPE) + try: + _, stderr = proc.communicate(timeout=timeout) + except subprocess.TimeoutExpired as exc: + proc.kill() + proc.communicate() + raise OSError( + f'git command timed out after {timeout}s: {cmd}') from exc + if proc.returncode != 0: + raise OSError(stderr.decode('utf-8', errors='replace').strip()) + + +def _resolve_git_dir(git_dir): + """Resolve a .git entry to the actual git directory + + For a regular repo, .git is a directory and is returned as-is. + For a worktree, .git is a file containing 'gitdir: <path>' and + the referenced directory is returned. + + Args: + git_dir (str): Path to a .git file or directory + + Returns: + str: Path to the actual git directory + """ + if os.path.isfile(git_dir): + with open(git_dir, encoding='utf-8') as inf: + line = inf.readline().strip() + if line.startswith('gitdir: '): + path = line[8:] + if not os.path.isabs(path): + path = os.path.join(os.path.dirname(git_dir), path) + return path + return git_dir + + +def _remove_stale_lock(git_dir): + """Remove a stale index.lock left by a previous SIGKILL'd run + + When the worker is killed (e.g. boss timeout or Ctrl-C), any + in-progress git checkout leaves behind an index.lock. This must + be cleaned up before the next checkout can proceed. 
+ + Args: + git_dir (str): Path to a .git file or directory + """ + real_dir = _resolve_git_dir(git_dir) + lock = os.path.join(real_dir, 'index.lock') + try: + os.remove(lock) + except FileNotFoundError: + pass + + +def _setup_worktrees(work_dir, git_dir, num_threads): + """Create per-thread worktrees sequentially with progress messages + + Sets up git worktrees for each build thread before the Builder is + created. This avoids the problems of lazy per-thread creation: + concurrent threads contending on the index lock, and no progress + messages reaching the boss while threads are blocked. + + For existing valid worktrees (from a previous run), creation is + skipped. A 'worktree_created' message is sent after each thread + so the boss can show setup progress. + + Args: + work_dir (str): Base work directory (Builder's base_dir) + git_dir (str): Git directory path (e.g. work_dir/.git) + num_threads (int): Number of threads to create worktrees for + """ + bm_work = os.path.join(work_dir, '.bm-work') + os.makedirs(bm_work, exist_ok=True) + src_dir = os.path.abspath(git_dir) + + # Clean up stale locks from a previous SIGKILL'd run. Both the + # main repo and the worktree gitdirs can have stale index.lock + # files that would make git commands hang indefinitely. + _remove_stale_lock(git_dir) + + # Prune stale worktree entries before creating new ones + _run_git('worktree', 'prune', cwd=work_dir) + + for i in range(num_threads): + thread_dir = os.path.join(bm_work, f'{i:02d}') + dot_git = os.path.join(thread_dir, '.git') + + need_worktree = not os.path.exists(dot_git) + if not need_worktree: + if os.path.isdir(dot_git): + # This is a full clone from an older buildman version, + # not a worktree. Remove it so we can create a proper + # worktree that shares objects with the main repo. 
+ shutil.rmtree(thread_dir) + need_worktree = True + else: + # Validate existing worktree — it may be stale from a + # previous killed run whose gitdir was pruned + real_dir = _resolve_git_dir(dot_git) + if not os.path.isdir(real_dir): + os.remove(dot_git) + need_worktree = True + + if need_worktree: + os.makedirs(thread_dir, exist_ok=True) + _run_git('--git-dir', src_dir, 'worktree', + 'add', '.', '--detach', cwd=thread_dir) + else: + _remove_stale_lock(dot_git) + + _send({'resp': 'worktree_created', 'thread': i}) + + +class _WorkerBuilderThread(builderthread.BuilderThread): + """BuilderThread subclass that sends results over SSH + + Overrides _write_result() (no-op, since the worker doesn't write + build output to a local directory tree), _send_result() (sends + the result back to the boss as a JSON protocol message instead of + putting it in the builder's out_queue), and _checkout() (uses + subprocess.Popen for git checkout to avoid the select() FD_SETSIZE + limit on machines with many threads). + + Worktrees are created sequentially before the Builder starts + threads (see _setup_worktrees), so _checkout() only needs to + do the checkout itself. 
+ """ + + def run_job(self, job): + """Run a job, sending a heartbeat so the boss knows we're alive""" + _send({'resp': 'heartbeat', 'board': job.brd.target, + 'thread': self.thread_num}) + super().run_job(job) + + def _write_result(self, result, keep_outputs, work_in_output): + """Skip disk writes — results are sent over SSH""" + + def _send_result(self, result): + """Send the build result to the boss over the SSH protocol""" + sizes = {} + if result.out_dir and result.return_code == 0: + sizes = _get_sizes(result.out_dir) + _send_build_result( + result.brd.target, result.commit_upto, result.return_code, + stderr=result.stderr or '', stdout=result.stdout or '', + sizes=sizes) + + def _checkout(self, commit_upto, work_dir): + """Check out a commit using subprocess to avoid select() FD limit + + Worktrees are already set up by _setup_worktrees() before + the Builder starts threads, so this only needs to do the + checkout itself. + """ + if self.builder.commits: + commit = self.builder.commits[commit_upto] + if self.builder.checkout: + git_dir = os.path.join(work_dir, '.git') + _remove_stale_lock(git_dir) + _run_git('checkout', '-f', commit.hash, cwd=work_dir) + else: + commit = 'current' + return commit + + +def _cmd_setup(req, state): + """Handle the 'setup' command + + Creates or re-uses a work directory and initialises a git repo in it. + The boss can then use 'git push' over SSH to send source code + to the repo before issuing build commands. + + Also scans for available toolchains so that the worker can select + the right cross-compiler for each board's architecture. 
+ + Args: + req (dict): Request with keys: + work_dir (str): Working directory path (auto-created if empty) + state (dict): Worker state, updated in place + + Returns: + bool: True on success + """ + work_dir = req.get('work_dir') + if not work_dir: + work_dir = tempfile.mkdtemp(prefix='bm-worker-') + state['auto_work_dir'] = True + os.makedirs(work_dir, exist_ok=True) + state['work_dir'] = work_dir + + # Initialise a git repo so the boss can push to it + git_dir = os.path.join(work_dir, '.git') + if not os.path.isdir(git_dir): + try: + command.run_one('git', 'init', cwd=work_dir, + capture=True, raise_on_error=True) + except command.CommandExc as exc: + _send_error(f'git init failed: {exc}') + return False + + _send({'resp': 'setup_done', 'work_dir': work_dir, + 'git_dir': git_dir}) + return True + + +def _cmd_configure(req, state): + """Handle the 'configure' command + + Stores build settings received from the boss. These settings mirror + the command-line flags that affect how make is invoked (verbose, + allow_missing, no_lto, etc.) and are applied to every subsequent + build. 
+ + Args: + req (dict): Request with 'settings' dict containing build flags + state (dict): Worker state, updated in place + + Returns: + bool: True on success + """ + settings = req.get('settings', {}) + state['settings'] = settings + _dbg(f'configure: {settings}') + _send({'resp': 'configure_done'}) + return True + + +def _parse_commits(commit_hashes): + """Convert commit hashes to Commit objects + + Args: + commit_hashes (list): Commit hashes, or [None] for current source + + Returns: + list of Commit or None: Commit objects, or None for current source + """ + if commit_hashes and commit_hashes[0] is not None: + return [Commit(h) for h in commit_hashes] + return None + + +def _parse_boards(board_dicts): + """Convert board dicts from the boss into Board objects + + Args: + board_dicts (list of dict): Each with 'board' and 'arch' keys + + Returns: + dict: target_name -> Board mapping + """ + board_selected = {} + for bd in board_dicts: + target = bd['board'] + brd = Board('Active', bd.get('arch', ''), '', '', '', + target, target, target) + board_selected[target] = brd + return board_selected + + +def _run_build(bldr, commits, board_selected): + """Run a build and send the result over the protocol + + Args: + bldr (Builder): Configured builder + commits (list of Commit or None): Commits to build + board_selected (dict): target_name -> Board mapping + """ + bldr.init_build(commits, board_selected, keep_outputs=False, + verbose=False, fragments=None) + try: + _fail, _warned, exceptions = bldr.run_build(delay_summary=True) + except Exception as exc: # pylint: disable=W0718 + _dbg(f'run_build crashed: {exc}') + _dbg(traceback.format_exc()) + _send({'resp': 'build_done', 'exceptions': 1}) + return + _send({'resp': 'build_done', + 'exceptions': len(exceptions)}) + + +def _cmd_build_boards(req, state): + """Handle the 'build_boards' command + + Creates a Builder with a _WorkerBuilderThread subclass and runs the + build. 
Results are streamed back over the SSH protocol as each + commit completes. + + Args: + req (dict): Request with: + boards (list of dict): Each with 'board' and 'arch' + commits (list): Commit hashes in order, or [None] for + current source + state (dict): Worker state + """ + work_dir = state.get('work_dir') + if not work_dir: + _send_error('no work directory set up') + return + + board_dicts = req.get('boards', []) + if not board_dicts: + _send_error('no boards specified') + return + + toolchains = state.get('toolchains') + if not toolchains: + _send_error('no toolchains available (run setup first)') + return + + nthreads = state.get('nthreads', _get_nthreads()) + commits = _parse_commits(req.get('commits', [None])) + board_selected = _parse_boards(board_dicts) + + # Calculate thread/job split: enough threads to keep all CPUs + # busy, with each thread running make with -j proportionally + num_threads = min(nthreads, len(board_selected)) + num_jobs = max(1, nthreads // num_threads) + + _dbg(f'build_boards: {len(board_selected)} boards x ' + f'{len(commits) if commits else 1} commits ' + f'threads={num_threads} -j{num_jobs}') + + # Set up worktrees sequentially before creating the Builder. + # This sends progress messages so the boss can show setup status + # (e.g. [ruru 3/256]) and avoids the build timeout firing before + # any build results arrive. 
+ git_dir = os.path.join(work_dir, '.git') + if commits is not None: + _send({'resp': 'build_started', 'num_threads': num_threads}) + _setup_worktrees(work_dir, git_dir, num_threads) + + bldr = _create_builder(state, num_threads, num_jobs) + _run_build(bldr, commits, board_selected) + + +def _create_builder(state, num_threads, num_jobs): + """Create a Builder configured for worker use + + Args: + state (dict): Worker state with toolchains, work_dir, settings + num_threads (int): Number of build threads + num_jobs (int): Make -j value per thread + + Returns: + Builder: Configured builder with threads started and waiting + """ + work_dir = state['work_dir'] + git_dir = os.path.join(work_dir, '.git') + settings = state.get('settings', {}) + toolchains = state['toolchains'] + + col = terminal.Color(terminal.COLOR_NEVER) + opts = DisplayOptions( + show_errors=False, show_sizes=False, show_detail=False, + show_bloat=False, show_config=False, show_environment=False, + show_unknown=False, ide=True, list_error_boards=False) + result_handler = ResultHandler(col, opts) + + bldr = builder_mod.Builder( + toolchains, work_dir, git_dir, + num_threads, num_jobs, col, result_handler, + thread_class=_WorkerBuilderThread, + make_func=_worker_make, + handle_signals=False, + lazy_thread_setup=True, + checkout=True, + per_board_out_dir=False, + force_build=settings.get('force_build', False), + force_build_failures=settings.get('force_build', False), + no_lto=settings.get('no_lto', False), + allow_missing=settings.get('allow_missing', False), + verbose_build=settings.get('verbose_build', False), + warnings_as_errors=settings.get('warnings_as_errors', False), + mrproper=settings.get('mrproper', False), + fallback_mrproper=settings.get('fallback_mrproper', False), + config_only=settings.get('config_only', False), + reproducible_builds=settings.get('reproducible_builds', False), + force_config_on_failure=True, + kconfig_check=settings.get('kconfig_check', True), + ) + 
result_handler.set_builder(bldr) + return bldr + + +def _cmd_build_prepare(req, state): + """Handle the 'build_prepare' command + + Creates a Builder with threads waiting for jobs. The boss follows + this with 'build_board' commands to feed boards one at a time, then + 'build_done' to signal completion. + + Args: + req (dict): Request with: + commits (list): Commit hashes in order, or [None] for + current source + state (dict): Worker state + """ + work_dir = state.get('work_dir') + if not work_dir: + _send_error('no work directory set up') + return + + toolchains = state.get('toolchains') + if not toolchains: + _send_error('no toolchains available (run setup first)') + return + + commits = _parse_commits(req.get('commits', [None])) + nthreads = state.get('nthreads', _get_nthreads()) + max_boards = req.get('max_boards', 0) + + num_threads = nthreads + num_jobs = None # dynamic: nthreads / active_boards + + _dbg(f'build_prepare: ' + f'{len(commits) if commits else 1} commits ' + f'threads={num_threads} max_boards={max_boards} -j=dynamic') + + # Set up worktrees before creating the Builder + git_dir = os.path.join(work_dir, '.git') + if commits is not None: + _send({'resp': 'build_started', 'num_threads': num_threads}) + _setup_worktrees(work_dir, git_dir, num_threads) + + bldr = _create_builder(state, num_threads, num_jobs) + bldr.max_boards = max_boards + + # Minimal init: set commits and prepare directories. Threads are + # already started by the Builder constructor, waiting on the queue. 
+ bldr.commit_count = len(commits) if commits else 1 + bldr.commits = commits + bldr.verbose = False + builderthread.mkdir(bldr.base_dir, parents=True) + bldr.prepare_working_space(num_threads, commits is not None) + bldr.prepare_output_space() + bldr.start_time = builder_mod.datetime.now() + bldr.count = 0 + bldr.upto = bldr._warned = bldr.fail = 0 + bldr.timestamps = builder_mod.collections.deque() + bldr.thread_exceptions = [] + + state['builder'] = bldr + state['commits'] = commits + _send({'resp': 'build_prepare_done'}) + + +def _cmd_build_board(req, state): + """Handle the 'build_board' command + + Adds one board to the running Builder's job queue. + + Args: + req (dict): Request with: + board (str): Board target name + arch (str): Board architecture + state (dict): Worker state with 'builder' from build_prepare + """ + bldr = state.get('builder') + if not bldr: + _send_error('no builder (send build_prepare first)') + return + + target = req['board'] + arch = req.get('arch', '') + brd = Board('Active', arch, '', '', '', target, target, target) + commits = state.get('commits') + + job = builderthread.BuilderJob() + job.brd = brd + job.commits = commits + job.keep_outputs = False + job.work_in_output = bldr.work_in_output + job.adjust_cfg = bldr.adjust_cfg + job.fragments = None + job.step = bldr.step + bldr.count += bldr.commit_count + bldr.queue.put(job) + + +def _cmd_build_done(state): + """Handle the 'build_done' command from the boss + + Waits for all queued jobs to finish, then sends build_done. 
+ + Args: + state (dict): Worker state with 'builder' from build_prepare + """ + bldr = state.get('builder') + if not bldr: + _send({'resp': 'build_done', 'exceptions': 0}) + return + + try: + _fail, _warned, exceptions = bldr.run_build(delay_summary=True) + except Exception as exc: # pylint: disable=W0718 + _dbg(f'run_build crashed: {exc}') + _dbg(traceback.format_exc()) + _send({'resp': 'build_done', 'exceptions': 1}) + state.pop('builder', None) + state.pop('commits', None) + return + _send({'resp': 'build_done', + 'exceptions': len(exceptions)}) + state.pop('builder', None) + state.pop('commits', None) + + +def _cmd_quit(state): + """Handle the 'quit' command + + Cleans up the work directory if auto-created, sends quit_ack, + then kills all child processes (make, cc1, etc.) and this process + via SIGKILL to the process group. + + Args: + state (dict): Worker state + """ + work_dir = state.get('work_dir', '') + if work_dir and state.get('auto_work_dir'): + shutil.rmtree(work_dir, ignore_errors=True) + _send({'resp': 'quit_ack'}) + _kill_group() + + +def run_worker(debug=False): + """Main worker loop + + Reads JSON commands from stdin and dispatches them. Sends responses + as 'BM> ' prefixed JSON lines on stdout. Builds run in parallel + using Builder with a _WorkerBuilderThread subclass. + + Args: + debug (bool): True to print debug messages to stderr + + Returns: + int: 0 on success, non-zero on error + """ + global _debug, _protocol_out # pylint: disable=W0603 + + _debug = debug + + # Save the real stdout for protocol messages, then redirect + # stdout to stderr so that tprint and other library output + # doesn't corrupt the JSON protocol on the SSH pipe. + _protocol_out = sys.stdout + sys.stdout = sys.stderr + + # Exit immediately on signals, killing all child processes. + # SIGHUP is sent by sshd when the SSH connection drops. + # _kill_group() sends SIGKILL to the process group which terminates + # everything including this process. 
+ def _exit_handler(_signum, _frame): + _kill_group() + os._exit(1) # pylint: disable=W0212 + signal.signal(signal.SIGTERM, _exit_handler) + signal.signal(signal.SIGINT, _exit_handler) + signal.signal(signal.SIGHUP, _exit_handler) + + nthreads = _get_nthreads() + + # Scan for toolchains at startup so we can select the right + # cross-compiler for each board's architecture. The boss normally + # sets up the git repo and pushes source via SSH before starting + # us, so the 'setup' command is optional — we are ready as soon as + # we start. + toolchains = toolchain_mod.Toolchains() + toolchains.get_settings(show_warning=False) + toolchains.scan(verbose=False, raise_on_error=False) + + _dbg(f'ready: {nthreads} threads') + _send({'resp': 'ready', 'nthreads': nthreads, 'slots': nthreads}) + + stop_event = threading.Event() + state = { + 'work_dir': os.getcwd(), + 'nthreads': nthreads, + 'toolchains': toolchains, + 'stop': stop_event, + } + + # Read stdin in a background thread so that EOF (boss + # disconnected) is detected even while a long-running command + # like build_boards is executing. When EOF is seen, kill the + # entire process group so that all child make processes die too. + cmd_queue = queue.Queue() + eof_sentinel = object() + + def _stdin_reader(): + while True: + line = sys.stdin.readline() + if not line: + # Boss disconnected — kill everything immediately. + # _kill_group() handles production (kills the process + # group); stop_event handles tests (where _kill_group + # is a no-op) by unblocking build threads. 
+ _dbg('stdin closed, killing group') + stop_event.set() + _kill_group() + cmd_queue.put(eof_sentinel) + return + line = line.strip() + if line: + cmd_queue.put(line) + + threading.Thread(target=_stdin_reader, daemon=True).start() + + return _dispatch_commands(cmd_queue, eof_sentinel, state) + + +def _dispatch_commands(cmd_queue, eof_sentinel, state): + """Read commands from the queue and dispatch them + + Args: + cmd_queue (queue.Queue): Queue of JSON command strings + eof_sentinel (object): Sentinel value indicating stdin closed + state (dict): Worker state + + Returns: + int: 0 on clean quit, 1 on unexpected stdin close + """ + while True: + try: + line = cmd_queue.get(timeout=1) + except queue.Empty: + continue + if line is eof_sentinel: + break + + try: + req = json.loads(line) + except json.JSONDecodeError as exc: + _send_error(f'invalid JSON: {exc}') + continue + + cmd = req.get('cmd', '') + + if cmd == 'setup': + _cmd_setup(req, state) + elif cmd == 'configure': + _cmd_configure(req, state) + elif cmd == 'build_boards': + _cmd_build_boards(req, state) + elif cmd == 'build_prepare': + _cmd_build_prepare(req, state) + elif cmd == 'build_board': + _cmd_build_board(req, state) + elif cmd == 'build_done': + _cmd_build_done(state) + elif cmd == 'quit': + _cmd_quit(state) + return 0 + else: + _send_error(f'unknown command: {cmd}') + + # stdin closed without quit — boss was interrupted + return 1 + + +def do_worker(debug=False): + """Entry point for 'buildman --worker' + + Args: + debug (bool): True to print debug messages to stderr + + Returns: + int: 0 on success + """ + global _is_group_leader # pylint: disable=W0603 + + # Ensure we are a process group leader so _kill_group() can kill + # all child processes (make, cc1, as, ld) on exit. When launched + # via SSH, sshd already makes us session + group leader (pid == + # pgid), so setpgrp() fails with EPERM — that's fine. 
This is + # done here rather than in run_worker() so that tests can call + # run_worker() without becoming a process group leader. + try: + os.setpgrp() + except OSError: + pass + _is_group_leader = os.getpid() == os.getpgrp() + return run_worker(debug) -- 2.43.0
From: Simon Glass <sjg@chromium.org> Add the boss side of the distributed build protocol. The RemoteWorker class manages a persistent SSH connection to a remote buildman worker, providing methods to set up a work directory, push source via git, send build commands and receive results. WorkerPool manages parallel worker startup, source push, build settings distribution, demand-driven board dispatch and result collection. Each worker gets boards from a shared pool as it finishes previous ones, so faster workers naturally get more work. Features include capacity-weighted board distribution (nthreads * bogomips), per-worker log files, build timeouts via reader threads, transfer statistics, and clean shutdown on Ctrl-C via two-phase quit. Tests are included in test_boss.py covering protocol handling, board dispatch, worker lifecycle and error paths. Signed-off-by: Simon Glass <sjg@chromium.org> --- tools/buildman/boss.py | 1507 ++++++++++++++++++ tools/buildman/main.py | 2 + tools/buildman/test_boss.py | 2645 ++++++++++++++++++++++++++++++++ tools/buildman/test_machine.py | 62 +- 4 files changed, 4195 insertions(+), 21 deletions(-) create mode 100644 tools/buildman/boss.py create mode 100644 tools/buildman/test_boss.py diff --git a/tools/buildman/boss.py b/tools/buildman/boss.py new file mode 100644 index 00000000000..b1d15fec28f --- /dev/null +++ b/tools/buildman/boss.py @@ -0,0 +1,1507 @@ +# SPDX-License-Identifier: GPL-2.0+ +# Copyright 2026 Simon Glass <sjg@chromium.org> +# pylint: disable=C0302 + +"""Boss side of the distributed build protocol + +Manages SSH connections to remote workers and communicates using the JSON-lines +protocol defined in worker.py. Each RemoteWorker wraps a persistent SSH process +whose stdin/stdout carry the protocol messages. 
+ +Typical usage: + w = RemoteWorker('myhost', buildman_path='buildman') + w.start() # launches ssh, waits for 'ready' + w.setup() # worker creates git repo + w.push_source(git_dir, ref) # git push to the worker's repo + w.build('sandbox', commit='abc123', commit_upto=0) + result = w.recv() # {'resp': 'build_result', ...} + w.quit() +""" + +import datetime +import json +import os +import queue +import subprocess +import sys +import threading +import time + +from buildman import builderthread +from buildman import worker as worker_mod +from u_boot_pylib import command +from u_boot_pylib import tools +from u_boot_pylib import tout + +# SSH options shared with machine.py +SSH_OPTS = [ + '-o', 'BatchMode=yes', + '-o', 'StrictHostKeyChecking=accept-new', +] + +# Per-build timeout in seconds. If a worker doesn't respond within this +# time, the boss assumes the worker is dead or hung and stops using it. +BUILD_TIMEOUT = 300 + +# Interval in seconds between status summaries in the boss log +STATUS_INTERVAL = 60 + + +class BossError(Exception): + """Error communicating with a remote worker""" + + +class WorkerBusy(BossError): + """Worker machine is already in use by another boss""" + + +def _run_ssh(hostname, remote_cmd, timeout=10): + """Run a one-shot SSH command on a remote host + + Args: + hostname (str): SSH hostname + remote_cmd (str): Shell command to run on the remote host + timeout (int): SSH connect timeout in seconds + + Returns: + str: stdout from the command + + Raises: + BossError: if the command fails + """ + ssh_cmd = [ + 'ssh', + '-o', f'ConnectTimeout={timeout}', + ] + SSH_OPTS + [hostname, '--', remote_cmd] + try: + result = command.run_pipe( + [ssh_cmd], capture=True, capture_stderr=True, + raise_on_error=True) + return result.stdout.strip() if result.stdout else '' + except command.CommandExc as exc: + raise BossError(f'SSH command failed on {hostname}: {exc}') from exc + + +def kill_workers(machines): + """Kill stale worker processes and remove lock 
files on remote machines + + Connects to each machine via SSH, kills any running worker processes + and removes the lock file. Useful for cleaning up after a failed or + interrupted distributed build. + + Args: + machines (list of str): SSH hostnames to clean up + + Returns: + int: 0 on success + """ + + results = {} + lock = threading.Lock() + + def _kill_one(hostname): + kill_script = ('pids=$(pgrep -f "[p]ython3.*--worker" 2>/dev/null); ' + 'if [ -n "$pids" ]; then ' + ' kill $pids 2>/dev/null; ' + ' echo "killed $pids"; ' + 'else ' + ' echo "no workers"; ' + 'fi; ' + 'rm -f ~/dev/.bm-worker/.lock' + ) + try: + output = _run_ssh(hostname, kill_script) + with lock: + results[hostname] = output + except BossError as exc: + with lock: + results[hostname] = f'FAILED: {exc}' + + threads = [] + for hostname in machines: + thr = threading.Thread(target=_kill_one, args=(hostname,)) + thr.start() + threads.append(thr) + for thr in threads: + thr.join() + + for hostname, output in sorted(results.items()): + print(f' {hostname}: {output}') + return 0 + + +class RemoteWorker: # pylint: disable=R0902 + """Manages one SSH connection to a remote buildman worker + + The startup sequence is: + 1. init_git() - create a bare git repo on the remote via one-shot SSH + 2. push_source() - git push the local tree to the remote repo + 3. start() - launch the worker from the pushed tree + + This ensures the worker runs the same version of buildman as the boss. 
+ + Attributes: + hostname (str): SSH hostname (user@host or just host) + nthreads (int): Number of build threads the worker reported + git_dir (str): Path to the worker's git directory + work_dir (str): Path to the worker's work directory + """ + + def __init__(self, hostname, timeout=10, name=None): + """Create a new remote worker connection + + Args: + hostname (str): SSH hostname + timeout (int): SSH connect timeout in seconds + name (str or None): Short display name, defaults to hostname + """ + self.hostname = hostname + self.name = name or hostname + self.timeout = timeout + self.nthreads = 0 + self.slots = 1 + self.max_boards = 0 + self.bogomips = 0.0 + self.git_dir = '' + self.work_dir = '' + self.toolchains = {} + self.closing = False + self.bytes_sent = 0 + self.bytes_recv = 0 + self._proc = None + self._stderr_lines = [] + self._stderr_thread = None + + def init_git(self, work_dir='~/dev/.bm-worker'): + """Ensure a git repo exists on the remote host via one-shot SSH + + Reuses an existing repo if present, so that subsequent pushes + only transfer the delta. Creates a lock file to prevent two + bosses from using the same worker simultaneously. A lock is + considered stale if no worker process is running. 
+ + Args: + work_dir (str): Fixed path for the work directory + + Raises: + WorkerBusy: if another boss holds the lock + BossError: if the SSH command fails + """ + lock = f'{work_dir}/.lock' + init_script = (f'mkdir -p {work_dir} && ' + # Check for lock — stale if no worker process is running + f'if [ -f {lock} ]; then ' + f' if pgrep -f "[p]ython3.*--worker" >/dev/null 2>&1; then ' + f' echo BUSY; exit 0; ' + f' fi; ' + f' rm -f {lock}; ' + f'fi && ' + # Create lock and init git + f'date +%s > {lock} && ' + f'(test -d {work_dir}/.git || git init -q {work_dir}) && ' + f'git -C {work_dir} config ' + f'receive.denyCurrentBranch updateInstead && ' + f'echo {work_dir}' + ) + output = _run_ssh(self.hostname, init_script, self.timeout) + if not output: + raise BossError( + f'init_git on {self.hostname} returned no work directory') + last_line = output.splitlines()[-1].strip() + if last_line == 'BUSY': + raise WorkerBusy(f'{self.hostname} is busy (locked)') + self.work_dir = last_line + self.git_dir = os.path.join(self.work_dir, '.git') + + def start(self, debug=False): + """Start the worker from the pushed source tree + + Launches the worker using the buildman from the pushed git tree. + The source must already have been pushed via init_git() and + push_source(). + + A background thread forwards the worker's stderr to the boss's + stderr, prefixed with the machine name, so that debug messages + and errors are always visible. 
+ + Args: + debug (bool): True to pass -D to the worker for tracebacks + + Raises: + BossError: if the SSH connection or worker startup fails + """ + if not self.work_dir: + raise BossError(f'No work_dir on {self.hostname} ' + f'(call init_git and push_source first)') + worker_cmd = 'python3 tools/buildman/main.py --worker' + if debug: + worker_cmd += ' -D' + ssh_cmd = [ + 'ssh', + '-o', f'ConnectTimeout={self.timeout}', + ] + SSH_OPTS + [ + self.hostname, '--', + f'cd {self.work_dir} && git checkout -qf work && ' + f'{worker_cmd}', + ] + try: + # pylint: disable=R1732 + self._proc = subprocess.Popen( + ssh_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, + stderr=subprocess.PIPE) + except OSError as exc: + raise BossError( + f'Failed to start SSH to {self.hostname}: {exc}') from exc + + # Forward worker stderr in a background thread so debug messages + # and errors are always visible + self._stderr_lines = [] + self._stderr_thread = threading.Thread( + target=self._forward_stderr, daemon=True) + self._stderr_thread.start() + + resp = self._recv() + if resp.get('resp') != 'ready': + self.close() + raise BossError( + f'Worker on {self.hostname} did not send ready: {resp}') + self.nthreads = resp.get('nthreads', 1) + self.slots = resp.get('slots', 1) + if not self.max_boards: + self.max_boards = self.nthreads + + def _forward_stderr(self): + """Forward worker stderr to boss stderr with machine name prefix + + Runs in a background thread. Saves lines for _get_stderr() too. 
+ """ + try: + for raw in self._proc.stderr: + line = raw.decode('utf-8', errors='replace').rstrip('\n') + if line: + self._stderr_lines.append(line) + sys.stderr.write(f'[{self.name}] {line}\n') + sys.stderr.flush() + except (OSError, ValueError): + pass + + def _send(self, obj): + """Send a JSON command to the worker + + Args: + obj (dict): Command object to send + + Raises: + BossError: if the SSH process is not running + """ + if not self._proc or self._proc.poll() is not None: + raise BossError(f'Worker on {self.hostname} is not running') + line = json.dumps(obj) + '\n' + data = line.encode('utf-8') + self.bytes_sent += len(data) + self._proc.stdin.write(data) + self._proc.stdin.flush() + + def _recv(self): + """Read the next protocol response from the worker + + Reads lines from stdout, skipping any that don't start with the + 'BM> ' prefix (e.g. SSH banners). + + Returns: + dict: Parsed JSON response + + Raises: + BossError: if the worker closes the connection or sends bad data + """ + while True: + raw = self._proc.stdout.readline() + if not raw: + stderr = self._get_stderr() + raise BossError(f'Worker on {self.hostname} closed connection' + f'{": " + stderr if stderr else ""}') + self.bytes_recv += len(raw) + line = raw.decode('utf-8', errors='replace').rstrip('\n') + if line.startswith(worker_mod.RESPONSE_PREFIX): + payload = line[len(worker_mod.RESPONSE_PREFIX):] + try: + return json.loads(payload) + except json.JSONDecodeError as exc: + raise BossError( + f'Bad JSON from {self.hostname}: {exc}') from exc + + def _get_stderr(self): + """Get the last stderr line from the worker + + Waits briefly for the stderr forwarding thread to finish + collecting output, then returns the last non-empty line. 
+
+        Returns:
+            str: Last non-empty line of stderr, or empty string
+        """
+        if self._stderr_thread:
+            self._stderr_thread.join(timeout=2)
+        for line in reversed(self._stderr_lines):
+            if line.strip():
+                return line.strip()
+        return ''
+
+    def push_source(self, local_git_dir, refspec):
+        """Push source code to the worker's git repo
+
+        Uses 'git push' over SSH to send commits to the worker.
+
+        Args:
+            local_git_dir (str): Path to local git directory
+            refspec (str): Git refspec to push (e.g. 'HEAD:refs/heads/work')
+
+        Raises:
+            BossError: if the push fails
+        """
+        if not self.git_dir:
+            raise BossError(
+                f'No git_dir on {self.hostname} (call init_git first)')
+        push_url = f'{self.hostname}:{self.git_dir}'
+        try:
+            command.run_pipe([['git', 'push', '--force', push_url, refspec]],
+                             capture=True, capture_stderr=True,
+                             raise_on_error=True, cwd=local_git_dir)
+        except command.CommandExc as exc:
+            raise BossError(
+                f'git push to {self.hostname} failed: {exc}') from exc
+
+    def configure(self, settings):
+        """Send build settings to the worker
+
+        Sends settings that affect how make is invoked (verbose, no_lto,
+        allow_missing, etc.). Must be called after start() and before
+        any build commands.
+ + Args: + settings (dict): Build settings, e.g.: + verbose_build (bool): Run make with V=1 + allow_missing (bool): Pass BINMAN_ALLOW_MISSING=1 + no_lto (bool): Pass NO_LTO=1 + reproducible_builds (bool): Pass SOURCE_DATE_EPOCH=0 + warnings_as_errors (bool): Pass KCFLAGS=-Werror + mrproper (bool): Run make mrproper before config + fallback_mrproper (bool): Retry with mrproper on failure + + Raises: + BossError: if the worker rejects the settings + """ + self._send({'cmd': 'configure', 'settings': settings}) + resp = self._recv() + if resp.get('resp') != 'configure_done': + raise BossError( + f'Worker on {self.hostname} rejected configure: {resp}') + + def build_boards(self, boards, commits): + """Send a build_boards command to the worker + + Tells the worker to build all boards for each commit. The + worker handles checkout scheduling, parallelism and -j + calculation internally. + + Args: + boards (list of dict): Board info dicts with keys: + board (str): Board target name + defconfig (str): Defconfig target + env (dict): Extra environment variables + commits (list): Commit hashes in order, or [None] for + current source + """ + self._send({ + 'cmd': 'build_boards', + 'boards': boards, + 'commits': commits, + }) + + def build_prepare(self, commits): + """Send a build_prepare command to the worker + + Creates the Builder and worktrees. Follow with build_board() + calls, then build_done(). 
+ + Args: + commits (list): Commit hashes in order, or [None] for + current source + """ + self._send({'cmd': 'build_prepare', 'commits': commits, + 'max_boards': self.max_boards}) + + def build_board(self, board, arch): + """Send a build_board command to add one board to the worker + + Args: + board (str): Board target name + arch (str): Board architecture + """ + self._send({ + 'cmd': 'build_board', + 'board': board, + 'arch': arch, + }) + + def build_done(self): + """Tell the worker no more boards are coming""" + self._send({'cmd': 'build_done'}) + + def recv(self): + """Receive the next response from the worker + + Returns: + dict: Parsed JSON response + """ + return self._recv() + + def quit(self): + """Tell the worker to quit, remove the lock and close""" + try: + self._send({'cmd': 'quit'}) + resp = self._recv() + except BossError: + resp = {} + self.close() + self.remove_lock() + return resp + + def remove_lock(self): + """Remove the lock file from the remote machine""" + if self.work_dir: + try: + _run_ssh(self.hostname, + f'rm -f {self.work_dir}/.lock', self.timeout) + except BossError: + pass + + def close(self): + """Close the SSH connection + + Closes stdin first so SSH can flush any pending data (e.g. a + quit command) to the remote, then waits briefly for SSH to + exit on its own before terminating it. 
+ """ + if self._proc: + try: + self._proc.stdin.close() + except OSError: + pass + try: + self._proc.wait(timeout=2) + except subprocess.TimeoutExpired: + self._proc.terminate() + try: + self._proc.wait(timeout=3) + except subprocess.TimeoutExpired: + self._proc.kill() + self._proc = None + + def __repr__(self): + status = 'running' if self._proc else 'stopped' + return (f'RemoteWorker({self.hostname}, ' + f'nthreads={self.nthreads}, {status})') + + def __del__(self): + self.close() + + +def _format_bytes(nbytes): + """Format a byte count as a human-readable string""" + if nbytes < 1024: + return f'{nbytes}B' + if nbytes < 1024 * 1024: + return f'{nbytes / 1024:.1f}KB' + return f'{nbytes / (1024 * 1024):.1f}MB' + + +class _BossLog: + """Central boss log for distributed builds + + Logs major events and periodic per-worker status summaries + to boss.log in the builder output directory. + """ + + def __init__(self, base_dir): + os.makedirs(base_dir, exist_ok=True) + path = os.path.join(base_dir, '.buildman.log') + # pylint: disable=R1732 + self._logf = open(path, 'w', encoding='utf-8') + self._lock = threading.Lock() + self._stats = {} + self._timer = None + self._closed = False + + def log(self, msg): + """Write a timestamped log entry""" + with self._lock: + if self._logf: + stamp = datetime.datetime.now().strftime('%H:%M:%S') + self._logf.write(f'{stamp} {msg}\n') + self._logf.flush() + + def init_worker(self, wrk): + """Register a worker for status tracking""" + with self._lock: + self._stats[wrk.name] = { + 'sent': 0, + 'recv': 0, + 'load_avg': 0.0, + 'nthreads': wrk.nthreads, + } + + def record_sent(self, wrk_name, count=1): + """Record boards sent to a worker""" + with self._lock: + if wrk_name in self._stats: + self._stats[wrk_name]['sent'] += count + + def record_recv(self, wrk_name, load_avg=0.0): + """Record a reply received from a worker""" + with self._lock: + if wrk_name in self._stats: + self._stats[wrk_name]['recv'] += 1 + 
self._stats[wrk_name]['load_avg'] = load_avg
+
+    def log_status(self):
+        """Log a status summary for all workers"""
+        with self._lock:
+            parts = []
+            total_load = 0.0
+            total_threads = 0
+            total_done = 0
+            total_sent = 0
+            for name, st in self._stats.items():
+                nthreads = st['nthreads']
+                cpu_pct = (st['load_avg'] / nthreads * 100 if nthreads else 0)
+                parts.append(f'{name}:done={st["recv"]}/{st["sent"]}'
+                             f' cpu={cpu_pct:.0f}%')
+                total_load += st['load_avg']
+                total_threads += nthreads
+                total_done += st['recv']
+                total_sent += st['sent']
+            total_cpu = (total_load / total_threads * 100
+                         if total_threads else 0)
+            parts.append(f'TOTAL:done={total_done}/{total_sent}'
+                         f' cpu={total_cpu:.0f}%')
+            if self._logf:
+                stamp = datetime.datetime.now().strftime('%H:%M:%S')
+                self._logf.write(f'{stamp} STATUS {", ".join(parts)}\n')
+                self._logf.flush()
+
+    def start_timer(self):
+        """Start the periodic status timer"""
+        def _tick():
+            if not self._closed:
+                self.log_status()
+                # Reschedule so a summary is logged every STATUS_INTERVAL
+                self._timer = threading.Timer(STATUS_INTERVAL, _tick)
+                self._timer.daemon = True
+                self._timer.start()
+        self._timer = threading.Timer(STATUS_INTERVAL, _tick)
+        self._timer.daemon = True
+        self._timer.start()
+
+    def close(self):
+        """Stop the timer and close the log file"""
+        self._closed = True
+        if self._timer:
+            self._timer.cancel()
+            self._timer = None
+        with self._lock:
+            if self._logf:
+                self._logf.close()
+                self._logf = None
+
+
+def split_boards(board_selected, toolchains):
+    """Split boards between local and remote machines
+
+    Boards whose architecture has a toolchain on at least one remote machine
+    are assigned to remote workers. The rest stay local.
+
+    Args:
+        board_selected (dict): target_name -> Board for all selected boards
+        toolchains (dict): Architecture -> gcc path on remote machines.
+            Combined from all machines.
+ + Returns: + tuple: + dict: target_name -> Board for local builds + dict: target_name -> Board for remote builds + """ + remote_archs = set(toolchains.keys()) if toolchains else set() + local = {} + remote = {} + for name, brd in board_selected.items(): + if brd.arch in remote_archs: + remote[name] = brd + else: + local[name] = brd + return local, remote + + +def _write_remote_result(builder, resp, board_selected, hostname): + """Write a remote build result and update builder progress + + Creates the same directory structure and files that BuilderThread would + create for a local build, then calls builder.process_result() to + update the progress display. + + Args: + builder (Builder): Builder object + resp (dict): build_result response from a worker + board_selected (dict): target_name -> Board, for looking up + the Board object + hostname (str): Remote machine that built this board + """ + board = resp.get('board', '') + commit_upto = resp.get('commit_upto', 0) + return_code = resp.get('return_code', 1) + stderr = resp.get('stderr', '') + + build_dir = builder.get_build_dir(commit_upto, board) + builderthread.mkdir(build_dir, parents=True) + + tools.write_file(os.path.join(build_dir, 'done'), + f'{return_code}\n', binary=False) + + err_path = os.path.join(build_dir, 'err') + if stderr: + tools.write_file(err_path, stderr, binary=False) + elif os.path.exists(err_path): + os.remove(err_path) + + tools.write_file(os.path.join(build_dir, 'log'), + resp.get('stdout', ''), binary=False) + + sizes = resp.get('sizes', {}) + if sizes.get('raw'): + # Strip any header line (starts with 'text') in case the worker + # sends raw size output including the header + raw = sizes['raw'] + lines = raw.splitlines() + if lines and lines[0].lstrip().startswith('text'): + raw = '\n'.join(lines[1:]) + if raw.strip(): + tools.write_file(os.path.join(build_dir, 'sizes'), + raw, binary=False) + + # Update the builder's progress display + brd = board_selected.get(board) + if brd: + result 
= command.CommandResult(stderr=stderr, return_code=return_code) + result.brd = brd + result.commit_upto = commit_upto + result.already_done = False + result.kconfig_reconfig = False + result.remote = hostname + builder.process_result(result) + + +class DemandState: # pylint: disable=R0903 + """Mutable state for a demand-driven worker build + + Tracks how many boards have been sent, received and are in-flight + for a single worker during demand-driven dispatch. + + Attributes: + sent: Total boards sent to the worker + in_flight: Boards currently being built (sent - completed) + expected: Total results expected (sent * ncommits) + received: Results received so far + board_results: Per-board result count (target -> int) + ncommits: Number of commits being built + grab_func: Callable(wrk, count) -> list of Board to get more + boards from the shared pool + """ + + def __init__(self, sent, ncommits, grab_func): + self.sent = sent + self.in_flight = sent + self.expected = sent * ncommits + self.received = 0 + self.board_results = {} + self.ncommits = ncommits + self.grab_func = grab_func + + +class _DispatchContext: + """Shared infrastructure for dispatching builds to workers + + Manages per-worker log files, worktree progress tracking, reader + threads, and result processing. Used by both _dispatch_jobs() and + _dispatch_demand() to avoid duplicating this infrastructure. 
+ """ + + def __init__(self, workers, builder, board_selected, boss_log): + self.builder = builder + self.board_selected = board_selected + self.boss_log = boss_log + self._worktree_counts = {} + self._worktree_lock = threading.Lock() + + # Open a log file per worker + os.makedirs(builder.base_dir, exist_ok=True) + self.log_files = {} + for wrk in workers: + path = os.path.join(builder.base_dir, f'worker-{wrk.name}.log') + self.log_files[wrk] = open( # pylint: disable=R1732 + path, 'w', encoding='utf-8') + + def log(self, wrk, direction, msg): + """Write a timestamped entry to a worker's log file""" + logf = self.log_files.get(wrk) + if logf: + stamp = datetime.datetime.now().strftime('%H:%M:%S') + logf.write(f'{stamp} {direction} {msg}\n') + logf.flush() + + def update_progress(self, resp, wrk): + """Handle worktree progress messages from a worker + + Args: + resp (dict): Response from the worker + wrk (RemoteWorker): Worker that sent the response + + Returns: + bool: True if the response was a progress message + """ + resp_type = resp.get('resp') + if resp_type == 'build_started': + with self._worktree_lock: + num = resp.get('num_threads', wrk.nthreads) + self._worktree_counts[wrk.name] = (self._worktree_counts.get( + wrk.name, (0, num))[0], num) + return True + if resp_type == 'worktree_created': + with self._worktree_lock: + done, total = self._worktree_counts.get( + wrk.name, (0, wrk.nthreads)) + self._worktree_counts[wrk.name] = (done + 1, total) + self._refresh_progress() + return True + return False + + def _refresh_progress(self): + """Update the builder's progress string from worktree counts""" + with self._worktree_lock: + parts = [] + for name, (done, total) in sorted(self._worktree_counts.items()): + if done < total: + parts.append(f'{name} {done}/{total}') + self.builder.progress = ', '.join(parts) + if self.builder.progress: + self.builder.process_result(None) + + def start_reader(self, wrk): + """Start a background reader thread for a worker + + 
Returns: + queue.Queue: Queue that receives (status, value) tuples + """ + recv_q = queue.Queue() + + def _reader(): + while True: + try: + resp = wrk.recv() + recv_q.put(('ok', resp)) + except BossError as exc: + recv_q.put(('error', exc)) + break + except Exception: # pylint: disable=W0718 + recv_q.put(('error', BossError( + f'Worker on {wrk.name} connection lost'))) + break + + threading.Thread(target=_reader, daemon=True).start() + return recv_q + + def recv(self, wrk, recv_q): + """Get next response from queue with timeout + + Returns: + dict or None: Response, or None on error + """ + try: + status, val = recv_q.get(timeout=BUILD_TIMEOUT) + except queue.Empty: + self.log(wrk, '!!', f'Worker timed out after {BUILD_TIMEOUT}s') + if not wrk.closing: + print(f'\n Error from {wrk.name}: timed out') + return None + if status == 'error': + self.log(wrk, '!!', str(val)) + if not wrk.closing: + print(f'\n Error from {wrk.name}: {val}') + return None + resp = val + self.log(wrk, '<<', json.dumps(resp, separators=(',', ':'))) + if resp.get('resp') == 'error': + if not wrk.closing: + print(f'\n Worker error on {wrk.name}: ' + f'{resp.get("msg", "unknown")}') + return None + return resp + + def write_result(self, wrk, resp): + """Write a build result and update progress + + Returns: + bool: True on success, False on error + """ + if self.boss_log: + self.boss_log.record_recv(wrk.name, resp.get('load_avg', 0.0)) + try: + _write_remote_result( + self.builder, resp, self.board_selected, wrk.name) + except Exception as exc: # pylint: disable=W0718 + self.log(wrk, '!!', f'unexpected: {exc}') + print(f'\n Unexpected error on {wrk.name}: {exc}') + return False + return True + + def wait_for_prepare(self, wrk, recv_q): + """Wait for build_prepare_done, handling progress messages + + Returns: + bool: True if prepare succeeded + """ + while True: + resp = self.recv(wrk, recv_q) + if resp is None: + return False + if self.update_progress(resp, wrk): + continue + resp_type = 
resp.get('resp') + if resp_type == 'build_prepare_done': + return True + if resp_type == 'heartbeat': + continue + self.log(wrk, '!!', f'unexpected during prepare: {resp_type}') + return False + + @staticmethod + def send_batch(wrk, boards): + """Send a batch of boards to a worker + + Returns: + int: Number of boards sent, or -1 on error + """ + for brd in boards: + try: + wrk.build_board(brd.target, brd.arch) + except BossError: + return -1 + return len(boards) + + def collect_results(self, wrk, recv_q, state): + """Collect results and send more boards as threads free up + + Args: + wrk (RemoteWorker): Worker to collect from + recv_q (queue.Queue): Response queue from start_reader() + state (DemandState): Mutable build state for this worker + """ + while state.received < state.expected: + resp = self.recv(wrk, recv_q) + if resp is None: + return False + resp_type = resp.get('resp') + if resp_type == 'heartbeat': + continue + if resp_type == 'build_done': + return True + if resp_type != 'build_result': + continue + + if not self.write_result(wrk, resp): + return False + state.received += 1 + + target = resp.get('board') + results = state.board_results + results[target] = results.get(target, 0) + 1 + if results[target] == state.ncommits: + state.in_flight -= 1 + if state.in_flight < wrk.max_boards: + more = state.grab_func(wrk, 1) + if more and self.send_batch(wrk, more) > 0: + state.sent += 1 + state.in_flight += 1 + state.expected += state.ncommits + if self.boss_log: + self.boss_log.record_sent( + wrk.name, state.ncommits) + return True + + def recv_one(self, wrk, recv_q): + """Receive one build result, skipping progress messages + + Returns: + bool: True to continue, False to stop this worker + """ + while True: + resp = self.recv(wrk, recv_q) + if resp is None: + return False + if self.update_progress(resp, wrk): + continue + resp_type = resp.get('resp') + if resp_type == 'heartbeat': + continue + if resp_type == 'build_done': + nexc = resp.get('exceptions', 0) 
+                if nexc:
+                    self.log(wrk, '!!', f'worker finished with {nexc} '
+                             f'thread exception(s)')
+                return False
+            if resp_type == 'build_result':
+                return self.write_result(wrk, resp)
+            return True
+
+    def close(self):
+        """Close all log files and the boss log"""
+        for logf in self.log_files.values():
+            logf.close()
+        if self.boss_log:
+            self.boss_log.log_status()
+            self.boss_log.log('dispatch: end')
+            self.boss_log.close()
+
+
+class WorkerPool:
+    """Manages a pool of remote workers for distributed builds
+
+    Handles starting workers, pushing source, distributing build jobs
+    and collecting results.
+
+    Attributes:
+        workers (list of RemoteWorker): Active workers
+    """
+
+    def __init__(self, machines):
+        """Create a worker pool from available machines
+
+        Args:
+            machines (list of Machine): Available machines from MachinePool
+        """
+        self.workers = []
+        self._machines = machines
+        self._boss_log = None
+
+    def start_all(self, git_dir, refspec, debug=False, settings=None):
+        """Start workers on all machines
+
+        Uses a four-phase approach so that each worker runs the same
+        version of buildman as the boss:
+        1. Create git repos on all machines (parallel)
+        2. Push source to all repos (parallel)
+        3. Start workers from pushed source (parallel)
+        4.
Send build settings to all workers (parallel) + + Args: + git_dir (str): Local git directory to push + refspec (str): Git refspec to push + debug (bool): True to pass -D to workers for tracebacks + settings (dict or None): Build settings to send to workers + + Returns: + list of RemoteWorker: Workers that started successfully + """ + # Phase 1: init git repos + ready = self._run_parallel( + 'Preparing', self._machines, self._init_one) + + # Phase 2: push source + ready = self._run_parallel('Pushing source to', ready, + lambda wrk: wrk.push_source(git_dir, refspec)) + + # Phase 3: start workers + self.workers = self._run_parallel('Starting', ready, + lambda wrk: self._start_one(wrk, debug)) + + # Phase 4: send build settings + if settings and self.workers: + self._run_parallel('Configuring', self.workers, + lambda wrk: wrk.configure(settings)) + + return self.workers + + def _init_one(self, mach): + """Create a RemoteWorker and initialise its git repo + + Args: + mach: Machine object with hostname attribute + + Returns: + RemoteWorker: Initialised worker + """ + wrk = RemoteWorker(mach.hostname, name=mach.name) + wrk.toolchains = dict(mach.toolchains) + wrk.bogomips = mach.info.bogomips if mach.info else 0.0 + wrk.max_boards = mach.max_boards + wrk.init_git() + return wrk + + @staticmethod + def _start_one(wrk, debug=False): + """Start the worker process from the pushed tree + + Args: + wrk (RemoteWorker): Worker to start + debug (bool): True to pass -D to the worker + """ + wrk.start(debug=debug) + + def _run_parallel(self, label, items, func): + """Run a function on items in parallel, collecting successes + + Args: + label (str): Progress label (e.g. 'Pushing source to') + items (list): Items to process + func (callable): Function to call on each item. May return + a replacement item; if None is returned, the original + item is kept. 
+ + Returns: + list: Items that succeeded (possibly replaced by func) + """ + lock = threading.Lock() + results = [] + done = [] + + def _run_one(item): + name = getattr(item, 'name', getattr(item, 'hostname', str(item))) + try: + replacement = func(item) + with lock: + results.append(replacement if replacement else item) + done.append(name) + tout.progress(f'{label} workers {len(done)}/' + f'{len(items)}: {", ".join(done)}') + except WorkerBusy: + with lock: + done.append(f'{name} (BUSY)') + tout.progress(f'{label} workers {len(done)}/' + f'{len(items)}: {", ".join(done)}') + except BossError as exc: + # Clean up lock if the worker was initialised + if hasattr(item, 'remove_lock'): + item.remove_lock() + with lock: + done.append(f'{name} (FAILED)') + tout.progress(f'{label} workers {len(done)}/' + f'{len(items)}: {", ".join(done)}') + print(f'\n Worker failed on {name}: {exc}') + + tout.progress(f'{label} workers on {len(items)} machines') + threads = [] + for item in items: + thr = threading.Thread(target=_run_one, args=(item,)) + thr.start() + threads.append(thr) + for thr in threads: + thr.join() + tout.clear_progress() + return results + + @staticmethod + def _get_capacity(wrk): + """Get a worker's build capacity score + + Uses nthreads * bogomips as the capacity metric. Falls back to + nthreads alone if bogomips is not available. + + Args: + wrk (RemoteWorker): Worker to score + + Returns: + float: Capacity score (higher is faster) + """ + bogo = wrk.bogomips if wrk.bogomips else 1.0 + return wrk.nthreads * bogo + + def _get_worker_for_arch(self, arch, assigned): + """Pick the next worker that supports a given architecture + + Distributes boards proportionally to each worker's capacity + (nthreads * bogomips). Picks the capable worker whose current + assignment is most below its fair share. + + Args: + arch (str): Board architecture (e.g. 
'arm', 'aarch64') + assigned (dict): worker -> int count of boards assigned so far + + Returns: + RemoteWorker or None: A worker with the right toolchain + """ + if arch == 'sandbox': + capable = list(self.workers) + else: + capable = [w for w in self.workers if arch in w.toolchains] + if not capable: + return None + + total_cap = sum(self._get_capacity(w) for w in capable) + if not total_cap: + total_cap = len(capable) + + # Pick the worker with the lowest assigned / capacity ratio + best = min(capable, key=lambda w: (assigned.get(w, 0) / + (self._get_capacity(w) or 1))) + assigned[best] = assigned.get(best, 0) + 1 + return best + + def build_boards(self, board_selected, commits, builder, local_count=0): + """Build boards on remote workers and write results locally + + Uses demand-driven dispatch: boards are fed to workers from a + shared pool. Each worker gets one board per thread initially, + then one more each time a board completes. Faster workers + naturally get more boards. + + Args: + board_selected (dict): target_name -> Board to build remotely + commits (list of Commit or None): Commits to build + builder (Builder): Builder object for result processing + local_count (int): Number of boards being built locally + """ + if not self.workers or not board_selected: + return + + ncommits = max(1, len(commits)) if commits else 1 + + # Build a pool of boards that have work remaining + pool = list(board_selected.values()) + if not builder.force_build: + commit_range = range(len(commits)) if commits else range(1) + pool = [b for b in pool + if any(not os.path.exists( + builder.get_done_file(cu, b.target)) + for cu in commit_range)] + + # Filter boards that no worker can handle + capable_archs = set() + for wrk in self.workers: + capable_archs.update(wrk.toolchains.keys()) + capable_archs.add('sandbox') + skipped = [b for b in pool if b.arch not in capable_archs] + pool = [b for b in pool if b.arch in capable_archs] + if skipped: + builder.count -= len(skipped) * 
ncommits
+
+        if not pool:
+            print('No remote jobs to dispatch')
+            return
+
+        total_jobs = len(pool) * ncommits
+        nmach = len(self.workers)
+        if local_count:
+            nmach += 1
+        parts = [f'{len(pool)} boards', f'{ncommits} commits',
+                 f'{nmach} machines']
+        print(f'Building {" x ".join(parts)} (demand-driven)')
+
+        self._boss_log = _BossLog(builder.base_dir)
+        self._boss_log.log(f'dispatch: {len(self.workers)} workers, '
+                           f'{total_jobs} total jobs')
+        for wrk in self.workers:
+            self._boss_log.init_worker(wrk)
+
+        self._dispatch_demand(pool, commits, builder, board_selected)
+
+    @staticmethod
+    def _grab_boards(pool, pool_lock, wrk, count):
+        """Take up to count boards from pool that wrk can build
+
+        Args:
+            pool (list of Board): Shared board pool (modified in place)
+            pool_lock (threading.Lock): Lock protecting the pool
+            wrk (RemoteWorker): Worker to match toolchains against
+            count (int): Maximum number of boards to take
+
+        Returns:
+            list of Board: Boards taken from the pool
+        """
+        with pool_lock:
+            batch = []
+            remaining = []
+            for brd in pool:
+                if len(batch) >= count:
+                    remaining.append(brd)
+                elif (brd.arch == 'sandbox'
+                      or brd.arch in wrk.toolchains):
+                    batch.append(brd)
+                else:
+                    remaining.append(brd)
+            pool[:] = remaining
+            return batch
+
+    def _dispatch_jobs(self, worker_jobs, builder, board_selected):
+        """Send build jobs to workers and collect results
+
+        Opens a log file per worker, then runs each worker's jobs
+        in a separate thread.
+ + Args: + worker_jobs (dict): worker -> list of (board, commit_upto, + commit) tuples + builder (Builder): Builder for result processing + board_selected (dict): target_name -> Board mapping + """ + ctx = _DispatchContext(worker_jobs.keys(), builder, board_selected, + self._boss_log) + + if ctx.boss_log: + ctx.boss_log.start_timer() + + threads = [] + for wrk, wjobs in worker_jobs.items(): + thr = threading.Thread( + target=self._run_batch_worker, args=(wrk, wjobs, ctx), + daemon=True) + thr.start() + threads.append(thr) + for thr in threads: + thr.join() + + ctx.close() + self._boss_log = None + + @staticmethod + def _run_batch_worker(wrk, wjobs, ctx): + """Send build commands to one worker and collect results + + Args: + wrk (RemoteWorker): Worker to run + wjobs (list): List of (board, commit_upto, commit) tuples + ctx (_DispatchContext): Shared dispatch infrastructure + """ + recv_q = ctx.start_reader(wrk) + + board_infos = {} + commit_list = [] + commit_set = set() + for brd, _, commit in wjobs: + target = brd.target + if target not in board_infos: + board_infos[target] = { + 'board': target, 'arch': brd.arch} + commit_hash = commit.hash if commit else None + if commit_hash not in commit_set: + commit_set.add(commit_hash) + commit_list.append(commit_hash) + + boards_list = list(board_infos.values()) + total = len(boards_list) * len(commit_list) + + ctx.log(wrk, '>>', f'{len(boards_list)} boards x ' + f'{len(commit_list)} commits') + if ctx.boss_log: + ctx.boss_log.log(f'{wrk.name}: {len(boards_list)} boards' + f' x {len(commit_list)} commits') + + try: + wrk.build_boards(boards_list, commit_list) + except BossError as exc: + ctx.log(wrk, '!!', str(exc)) + if not wrk.closing: + print(f'\n Error from {wrk.name}: {exc}') + return + if ctx.boss_log: + ctx.boss_log.record_sent(wrk.name, total) + + for _ in range(total): + if not ctx.recv_one(wrk, recv_q): + return + + def _start_demand_worker( # pylint: disable=R0913 + self, wrk, ctx, commit_list, ncommits, pool, 
pool_lock): + """Prepare a worker and send the initial batch of boards + + Args: + wrk (RemoteWorker): Worker to run + ctx (_DispatchContext): Shared dispatch infrastructure + commit_list (list of str): Commit hashes to build + ncommits (int): Number of commits + pool (list of Board): Shared board pool + pool_lock (threading.Lock): Lock protecting the pool + + Returns: + tuple: (recv_q, state) on success, or (None, None) if the + worker failed during prepare or had no boards to build + """ + recv_q = ctx.start_reader(wrk) + + try: + wrk.build_prepare(commit_list) + except BossError as exc: + ctx.log(wrk, '!!', str(exc)) + if not wrk.closing: + print(f'\n Error from {wrk.name}: {exc}') + return None, None + + if not ctx.wait_for_prepare(wrk, recv_q): + return None, None + + initial = self._grab_boards(pool, pool_lock, wrk, wrk.max_boards) + if not initial: + try: + wrk.build_done() + except BossError: + pass + return None, None + + count = ctx.send_batch(wrk, initial) + if count < 0: + return None, None + if ctx.boss_log: + ctx.boss_log.record_sent(wrk.name, count * ncommits) + ctx.log(wrk, '>>', f'{count} boards (initial,' + f' max_boards={wrk.max_boards})') + + def _grab(w, n): + return self._grab_boards(pool, pool_lock, w, n) + + state = DemandState(count, ncommits, _grab) + return recv_q, state + + @staticmethod + def _finish_demand_worker(wrk, ctx, recv_q, state): + """Collect results and finish the demand-driven protocol + + Args: + wrk (RemoteWorker): Worker to collect from + ctx (_DispatchContext): Shared dispatch infrastructure + recv_q (queue.Queue): Response queue from start_reader() + state (DemandState): Build state from _start_demand_worker() + """ + ctx.collect_results(wrk, recv_q, state) + + ctx.log(wrk, '>>', f'{state.sent} boards total') + try: + wrk.build_done() + except BossError as exc: + ctx.log(wrk, '!!', str(exc)) + return + + # Wait for worker's build_done + while True: + resp = ctx.recv(wrk, recv_q) + if resp is None: + return + if 
resp.get('resp') == 'build_done': + break + + def _dispatch_demand(self, pool, commits, builder, board_selected): + """Dispatch boards on demand from a shared pool + + Each worker gets boards from the pool as it finishes previous + ones, so faster workers naturally get more work. + + Args: + pool (list of Board): Boards available to build + commits (list of Commit or None): Commits to build + builder (Builder): Builder for result processing + board_selected (dict): target_name -> Board mapping + """ + commit_list = [c.hash if c else None for c in (commits or [None])] + ncommits = len(commit_list) + pool_lock = threading.Lock() + + ctx = _DispatchContext( + self.workers, builder, board_selected, self._boss_log) + + if ctx.boss_log: + ctx.boss_log.start_timer() + + def _run_worker(wrk): + recv_q, state = self._start_demand_worker( + wrk, ctx, commit_list, ncommits, pool, pool_lock) + if recv_q is not None: + self._finish_demand_worker(wrk, ctx, recv_q, state) + + threads = [] + for wrk in self.workers: + thr = threading.Thread(target=_run_worker, args=(wrk,), + daemon=True) + thr.start() + threads.append(thr) + for thr in threads: + thr.join() + + ctx.close() + self._boss_log = None + + def quit_all(self): + """Quit all workers gracefully""" + self.print_transfer_summary() + if self._boss_log: + self._boss_log.log('quit: shutting down') + self._boss_log.log_status() + self._boss_log.close() + self._boss_log = None + for wrk in self.workers: + try: + wrk.quit() + except BossError: + wrk.close() + self.workers = [] + + def print_transfer_summary(self): + """Print data transfer summary for all workers""" + if not self.workers: + return + total_sent = 0 + total_recv = 0 + parts = [] + for wrk in self.workers: + sent = getattr(wrk, 'bytes_sent', 0) + recv = getattr(wrk, 'bytes_recv', 0) + total_sent += sent + total_recv += recv + name = getattr(wrk, 'name', '?') + parts.append(f'{name}:' + f'{_format_bytes(sent)}/' + f'{_format_bytes(recv)}') + 
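The demand-driven dispatch used above can be modelled as a shared pool that workers drain as they finish boards. This is a toy sketch, not buildman's code: the `dispatch` and `build` names are hypothetical, and one puller thread per build thread stands in for the "one board per thread initially, then one more per completion" protocol.

```python
import queue
import threading

def dispatch(boards, workers, build):
    """Toy model of demand-driven dispatch: workers pull a new board
    from a shared pool as each one finishes, so faster workers (or
    those with more threads) naturally build more boards."""
    pool = queue.SimpleQueue()
    for brd in boards:
        pool.put(brd)
    results = []
    lock = threading.Lock()

    def pull(name):
        # Keep taking boards until the pool is empty
        while True:
            try:
                brd = pool.get_nowait()
            except queue.Empty:
                return
            res = build(name, brd)
            with lock:
                results.append((name, brd, res))

    threads = []
    for name, nthreads in workers:
        # One puller per build thread models the initial batch of
        # nthreads boards plus one more each time a board completes
        for _ in range(nthreads):
            thr = threading.Thread(target=pull, args=(name,))
            thr.start()
            threads.append(thr)
    for thr in threads:
        thr.join()
    return results
```

With six boards and a two-thread plus a one-thread worker, all six boards get built exactly once, with the split between workers decided by timing rather than up-front partitioning.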
sys.stderr.write(f'\nTransfer (sent/recv): {", ".join(parts)}'
+                         f' total:{_format_bytes(total_sent)}/'
+                         f'{_format_bytes(total_recv)}\n')
+        sys.stderr.flush()
+
+    def close_all(self):
+        """Stop all workers immediately
+
+        Use this on Ctrl-C. Sends a quit command to all workers first,
+        then waits briefly for the commands to travel through SSH
+        before closing the connections. This two-phase approach avoids
+        a race where closing SSH kills the connection before the quit
+        command is forwarded to the remote worker.
+        """
+        self.print_transfer_summary()
+        if self._boss_log:
+            self._boss_log.log('interrupted: Ctrl-C')
+            self._boss_log.log_status()
+            self._boss_log.close()
+            self._boss_log = None
+
+        # Suppress error messages from reader threads
+        for wrk in self.workers:
+            wrk.closing = True
+
+        # Phase 1: send quit to all workers
+        for wrk in self.workers:
+            try:
+                wrk._send({'cmd': 'quit'})  # pylint: disable=W0212
+            except BossError:
+                pass
+
+        # Brief pause so SSH can forward the quit commands to the
+        # remote workers before we tear down the connections
+        time.sleep(0.5)
+
+        # Phase 2: close all connections
+        for wrk in self.workers:
+            wrk.close()
+            wrk.remove_lock()
+        self.workers = []
diff --git a/tools/buildman/main.py b/tools/buildman/main.py
index 225e341fc26..b9779c38408 100755
--- a/tools/buildman/main.py
+++ b/tools/buildman/main.py
@@ -43,6 +43,7 @@ def run_tests(skip_net_tests, debug, verbose, args):
     from buildman import test_cfgutil
     from buildman import test_machine
     from buildman import test_worker
+    from buildman import test_boss
 
     test_name = args.terms and args.terms[0] or None
     if skip_net_tests:
@@ -67,6 +68,7 @@
         test_builder.TestPrintBuildSummary,
         test_machine,
         test_worker,
+        test_boss,
         'buildman.toolchain'])
 
     return (0 if result.wasSuccessful() else 1)
diff --git a/tools/buildman/test_boss.py b/tools/buildman/test_boss.py
new file mode 100644
index 00000000000..2f6710c1a58
--- /dev/null
+++ 
b/tools/buildman/test_boss.py
@@ -0,0 +1,2645 @@
+# SPDX-License-Identifier: GPL-2.0+
+# Copyright 2026 Simon Glass <sjg@chromium.org>
+
+"""Tests for the boss module"""
+
+# pylint: disable=C0302,E1101,W0212,W0612
+
+import io
+import json
+import os
+import queue
+import random
+import shutil
+import subprocess
+import tempfile
+import threading
+import time
+import types
+import unittest
+from unittest import mock
+
+from u_boot_pylib import command
+from u_boot_pylib import terminal
+from u_boot_pylib import tools
+
+from buildman import boss
+from buildman import bsettings
+from buildman import machine
+from buildman import worker as worker_mod
+
+
+def _make_response(obj):
+    """Create a BM>-prefixed response line as bytes"""
+    return (worker_mod.RESPONSE_PREFIX + json.dumps(obj) + '\n').encode()
+
+
+class FakeProc:
+    """Fake subprocess.Popen for testing"""
+
+    def __init__(self, responses=None):
+        self.stdin = io.BytesIO()
+        self._responses = responses or []
+        self._resp_idx = 0
+        self.stdout = self
+        self.stderr = io.BytesIO(b'')
+        self._returncode = None
+
+    def readline(self):
+        """Return the next canned response line"""
+        if self._resp_idx < len(self._responses):
+            line = self._responses[self._resp_idx]
+            self._resp_idx += 1
+            return line
+        return b''
+
+    def poll(self):
+        """Return the process return code"""
+        return self._returncode
+
+    def terminate(self):
+        """Simulate SIGTERM"""
+        self._returncode = -15
+
+    def kill(self):
+        """Simulate SIGKILL"""
+        self._returncode = -9
+
+    def wait(self, timeout=None):
+        """Wait for the process (no-op)"""
+
+
+class TestRunSsh(unittest.TestCase):
+    """Test _run_ssh()"""
+
+    @mock.patch('buildman.boss.command.run_pipe')
+    def test_success(self, mock_pipe):
+        """Test successful one-shot SSH command"""
+        mock_pipe.return_value = mock.Mock(
+            stdout='/tmp/bm-worker-abc\n')
+        result = boss._run_ssh('host1', 'echo hello')
+        self.assertEqual(result, '/tmp/bm-worker-abc')
+
+    @mock.patch('buildman.boss.command.run_pipe')
+ def test_failure(self, mock_pipe): + """Test SSH command failure""" + mock_pipe.side_effect = command.CommandExc( + 'connection refused', command.CommandResult()) + with self.assertRaises(boss.BossError) as ctx: + boss._run_ssh('host1', 'echo hello') + self.assertIn('SSH command failed', str(ctx.exception)) + + +class TestKillWorkers(unittest.TestCase): + """Test kill_workers()""" + + @mock.patch('buildman.boss._run_ssh') + def test_kill_workers(self, mock_ssh): + """Test killing workers on multiple machines""" + mock_ssh.side_effect = ['killed 1234', 'no workers'] + with terminal.capture(): + result = boss.kill_workers(['host1', 'host2']) + self.assertEqual(result, 0) + self.assertEqual(mock_ssh.call_count, 2) + + @mock.patch('buildman.boss._run_ssh') + def test_kill_workers_ssh_failure(self, mock_ssh): + """Test that SSH failures are reported but do not abort""" + mock_ssh.side_effect = boss.BossError('connection refused') + with terminal.capture(): + result = boss.kill_workers(['host1']) + self.assertEqual(result, 0) + + +class TestRemoteWorkerInitGit(unittest.TestCase): + """Test RemoteWorker.init_git()""" + + @mock.patch('buildman.boss._run_ssh') + def test_init_git(self, mock_ssh): + """Test successful git init""" + mock_ssh.return_value = '/tmp/bm-worker-abc' + w = boss.RemoteWorker('host1') + w.init_git() + self.assertEqual(w.work_dir, '/tmp/bm-worker-abc') + self.assertEqual(w.git_dir, '/tmp/bm-worker-abc/.git') + + @mock.patch('buildman.boss._run_ssh') + def test_init_git_busy(self, mock_ssh): + """Test init_git when machine is locked""" + mock_ssh.return_value = 'BUSY' + w = boss.RemoteWorker('host1') + with self.assertRaises(boss.WorkerBusy) as ctx: + w.init_git() + self.assertIn('busy', str(ctx.exception)) + + @mock.patch('buildman.boss._run_ssh') + def test_init_git_empty_output(self, mock_ssh): + """Test init_git with empty output""" + mock_ssh.return_value = '' + w = boss.RemoteWorker('host1') + with self.assertRaises(boss.BossError) as ctx: + 
w.init_git() + self.assertIn('returned no work directory', str(ctx.exception)) + + @mock.patch('buildman.boss._run_ssh') + def test_init_git_ssh_failure(self, mock_ssh): + """Test init_git when SSH fails""" + mock_ssh.side_effect = boss.BossError('connection refused') + w = boss.RemoteWorker('host1') + with self.assertRaises(boss.BossError): + w.init_git() + + +def _make_result(board, commit_upto=0, return_code=0, + stderr='', stdout=''): + """Create a build_result response dict""" + return { + 'resp': 'build_result', + 'board': board, + 'commit_upto': commit_upto, + 'return_code': return_code, + 'stderr': stderr, + 'stdout': stdout, + } + + +def _make_builder(tmpdir, force_build=True): + """Create a mock Builder with base_dir set""" + builder = mock.Mock() + builder.force_build = force_build + builder.base_dir = tmpdir + builder.count = 0 + builder.get_build_dir.side_effect = ( + lambda c, b: os.path.join(tmpdir, b)) + return builder + + +def _start_worker(hostname, mock_popen, proc): + """Helper to create a worker with work_dir set and start it""" + mock_popen.return_value = proc + wrk = boss.RemoteWorker(hostname) + wrk.work_dir = '/tmp/bm-worker-123' + wrk.git_dir = '/tmp/bm-worker-123/.git' + wrk.start() + return wrk + + +class TestRemoteWorkerStart(unittest.TestCase): + """Test RemoteWorker.start()""" + + @mock.patch('subprocess.Popen') + def test_start_success(self, mock_popen): + """Test successful worker start""" + proc = FakeProc([ + _make_response({'resp': 'ready', 'nthreads': 8}), + ]) + w = _start_worker('myhost', mock_popen, proc) + self.assertEqual(w.nthreads, 8) + + # Check SSH command runs from pushed tree + cmd = mock_popen.call_args[0][0] + self.assertIn('ssh', cmd) + self.assertIn('myhost', cmd) + self.assertIn('--worker', cmd[-1]) + + @mock.patch('subprocess.Popen') + def test_start_not_ready(self, mock_popen): + """Test start when worker sends unexpected response""" + proc = FakeProc([ + _make_response({'resp': 'error', 'msg': 'broken'}), + ]) 
+ mock_popen.return_value = proc + + w = boss.RemoteWorker('badhost') + w.work_dir = '/tmp/bm-test' + with self.assertRaises(boss.BossError) as ctx: + w.start() + self.assertIn('did not send ready', str(ctx.exception)) + + @mock.patch('subprocess.Popen') + def test_start_ssh_failure(self, mock_popen): + """Test start when SSH fails to launch""" + mock_popen.side_effect = OSError('No such file') + + w = boss.RemoteWorker('badhost') + w.work_dir = '/tmp/bm-test' + with self.assertRaises(boss.BossError) as ctx: + w.start() + self.assertIn('Failed to start SSH', str(ctx.exception)) + + @mock.patch('subprocess.Popen') + def test_start_connection_closed(self, mock_popen): + """Test start when connection closes immediately""" + proc = FakeProc([]) # No responses + mock_popen.return_value = proc + + w = boss.RemoteWorker('deadhost') + w.work_dir = '/tmp/bm-test' + with self.assertRaises(boss.BossError) as ctx: + w.start() + self.assertIn('closed connection', str(ctx.exception)) + + def test_start_no_work_dir(self): + """Test start without init_git raises error""" + w = boss.RemoteWorker('host1') + with self.assertRaises(boss.BossError) as ctx: + w.start() + self.assertIn('call init_git', str(ctx.exception)) + + @mock.patch('subprocess.Popen') + def test_max_boards_defaults_to_nthreads(self, mock_popen): + """Test max_boards defaults to nthreads when not configured""" + proc = FakeProc([ + _make_response({'resp': 'ready', 'nthreads': 64}), + ]) + w = _start_worker('host1', mock_popen, proc) + self.assertEqual(w.max_boards, 64) + + @mock.patch('subprocess.Popen') + def test_max_boards_preserved_when_set(self, mock_popen): + """Test max_boards keeps its configured value after start""" + proc = FakeProc([ + _make_response({'resp': 'ready', 'nthreads': 256}), + ]) + mock_popen.return_value = proc + w = boss.RemoteWorker('host1') + w.work_dir = '/tmp/bm-test' + w.max_boards = 64 + w.start() + self.assertEqual(w.nthreads, 256) + self.assertEqual(w.max_boards, 64) + + +class 
TestRemoteWorkerPush(unittest.TestCase): + """Test RemoteWorker.push_source()""" + + def test_push_no_init(self): + """Test push before init_git raises error""" + w = boss.RemoteWorker('host1') + with self.assertRaises(boss.BossError) as ctx: + w.push_source('/tmp/repo', 'HEAD:refs/heads/work') + self.assertIn('call init_git first', str(ctx.exception)) + + @mock.patch('buildman.boss.command.run_pipe') + def test_push_success(self, mock_pipe): + """Test successful git push""" + mock_pipe.return_value = mock.Mock(return_code=0) + w = boss.RemoteWorker('host1') + w.git_dir = '/tmp/bm-worker-123/.git' + + w.push_source('/home/user/u-boot', 'HEAD:refs/heads/work') + cmd = mock_pipe.call_args[0][0][0] + self.assertIn('git', cmd) + self.assertIn('push', cmd) + self.assertIn('host1:/tmp/bm-worker-123/.git', cmd) + self.assertIn('HEAD:refs/heads/work', cmd) + + @mock.patch('buildman.boss.command.run_pipe') + def test_push_failure(self, mock_pipe): + """Test git push failure""" + mock_pipe.side_effect = command.CommandExc( + 'push failed', command.CommandResult()) + w = boss.RemoteWorker('host1') + w.git_dir = '/tmp/bm/.git' + + with self.assertRaises(boss.BossError) as ctx: + w.push_source('/tmp/repo', 'HEAD:refs/heads/work') + self.assertIn('git push', str(ctx.exception)) + + +class TestRemoteWorkerBuildBoards(unittest.TestCase): + """Test RemoteWorker.build_boards() and recv()""" + + @mock.patch('subprocess.Popen') + def test_build_boards_and_recv(self, mock_popen): + """Test sending build_boards and receiving results""" + proc = FakeProc([ + _make_response({'resp': 'ready', 'nthreads': 4}), + _make_response({ + 'resp': 'build_result', 'board': 'sandbox', + 'commit_upto': 0, 'return_code': 0, + 'stderr': '', 'stdout': '', + }), + ]) + w = _start_worker('host1', mock_popen, proc) + boards = [{'board': 'sandbox', 'defconfig': 'sandbox_defconfig', + 'env': {}}] + w.build_boards(boards, ['abc123']) + + # Check the command was sent + sent = proc.stdin.getvalue().decode() + obj 
= json.loads(sent) + self.assertEqual(obj['cmd'], 'build_boards') + self.assertEqual(len(obj['boards']), 1) + self.assertEqual(obj['boards'][0]['board'], 'sandbox') + self.assertEqual(obj['commits'], ['abc123']) + + # Receive result + resp = w.recv() + self.assertEqual(resp['resp'], 'build_result') + self.assertEqual(resp['board'], 'sandbox') + + +class TestRemoteWorkerQuit(unittest.TestCase): + """Test RemoteWorker.quit() and close()""" + + @mock.patch('buildman.boss._run_ssh') + @mock.patch('subprocess.Popen') + def test_quit(self, mock_popen, mock_ssh): + """Test clean quit removes the lock""" + proc = FakeProc([ + _make_response({'resp': 'ready', 'nthreads': 4}), + _make_response({'resp': 'quit_ack'}), + ]) + w = _start_worker('host1', mock_popen, proc) + resp = w.quit() + self.assertEqual(resp.get('resp'), 'quit_ack') + self.assertIsNone(w._proc) + # Lock removal SSH should have been called + mock_ssh.assert_called_once() + self.assertIn('rm -f', mock_ssh.call_args[0][1]) + + @mock.patch('subprocess.Popen') + def test_close_without_quit(self, mock_popen): + """Test close without sending quit""" + proc = FakeProc([ + _make_response({'resp': 'ready', 'nthreads': 4}), + ]) + w = _start_worker('host1', mock_popen, proc) + self.assertIsNotNone(w._proc) + w.close() + self.assertIsNone(w._proc) + + def test_close_when_not_started(self): + """Test close on a worker that was never started""" + w = boss.RemoteWorker('host1') + w.close() # Should not raise + + +class TestRemoteWorkerRecv(unittest.TestCase): + """Test response parsing""" + + @mock.patch('subprocess.Popen') + def test_skip_non_protocol_lines(self, mock_popen): + """Test that non-BM> lines are skipped""" + proc = FakeProc([ + b'Welcome to myhost\n', + b'Last login: Mon Jan 1\n', + _make_response({'resp': 'ready', 'nthreads': 2}), + ]) + w = _start_worker('host1', mock_popen, proc) + self.assertEqual(w.nthreads, 2) + + @mock.patch('subprocess.Popen') + def test_bad_json(self, mock_popen): + """Test bad JSON 
in protocol line""" + proc = FakeProc([ + (worker_mod.RESPONSE_PREFIX + 'not json\n').encode(), + ]) + mock_popen.return_value = proc + + w = boss.RemoteWorker('host1') + w._proc = proc + with self.assertRaises(boss.BossError) as ctx: + w._recv() + self.assertIn('Bad JSON', str(ctx.exception)) + + +class TestRemoteWorkerSend(unittest.TestCase): + """Test _send()""" + + def test_send_when_not_running(self): + """Test sending to a stopped worker""" + w = boss.RemoteWorker('host1') + with self.assertRaises(boss.BossError) as ctx: + w._send({'cmd': 'quit'}) + self.assertIn('not running', str(ctx.exception)) + + +class TestRemoteWorkerRepr(unittest.TestCase): + """Test __repr__""" + + def test_repr_stopped(self): + """Test repr when stopped""" + w = boss.RemoteWorker('host1') + self.assertIn('host1', repr(w)) + self.assertIn('stopped', repr(w)) + + @mock.patch('subprocess.Popen') + def test_repr_running(self, mock_popen): + """Test repr when running""" + proc = FakeProc([ + _make_response({'resp': 'ready', 'nthreads': 8}), + ]) + w = _start_worker('host1', mock_popen, proc) + self.assertIn('running', repr(w)) + self.assertIn('nthreads=8', repr(w)) + w.close() + + +class FakeBoard: # pylint: disable=R0903 + """Fake board for testing split_boards()""" + + def __init__(self, target, arch): + self.target = target + self.arch = arch + + +class TestSplitBoards(unittest.TestCase): + """Test split_boards()""" + + def test_all_local(self): + """Test when no remote toolchains match""" + boards = { + 'sandbox': FakeBoard('sandbox', 'sandbox'), + 'rpi': FakeBoard('rpi', 'arm'), + } + local, remote = boss.split_boards(boards, {'x86': '/usr/bin/gcc'}) + self.assertEqual(len(local), 2) + self.assertEqual(len(remote), 0) + + def test_all_remote(self): + """Test when all boards have remote toolchains""" + boards = { + 'rpi': FakeBoard('rpi', 'arm'), + 'odroid': FakeBoard('odroid', 'arm'), + } + local, remote = boss.split_boards(boards, {'arm': '/usr/bin/gcc'}) + 
self.assertEqual(len(local), 0) + self.assertEqual(len(remote), 2) + + def test_mixed(self): + """Test split with some local, some remote""" + boards = { + 'sandbox': FakeBoard('sandbox', 'sandbox'), + 'rpi': FakeBoard('rpi', 'arm'), + 'qemu': FakeBoard('qemu', 'riscv'), + } + local, remote = boss.split_boards( + boards, {'arm': '/usr/bin/gcc', 'riscv': '/usr/bin/gcc'}) + self.assertEqual(len(local), 1) + self.assertIn('sandbox', local) + self.assertEqual(len(remote), 2) + self.assertIn('rpi', remote) + self.assertIn('qemu', remote) + + def test_empty_toolchains(self): + """Test with no remote toolchains""" + boards = {'sandbox': FakeBoard('sandbox', 'sandbox')} + local, remote = boss.split_boards(boards, {}) + self.assertEqual(len(local), 1) + self.assertEqual(len(remote), 0) + + def test_none_toolchains(self): + """Test with None toolchains""" + boards = {'sandbox': FakeBoard('sandbox', 'sandbox')} + local, remote = boss.split_boards(boards, None) + self.assertEqual(len(local), 1) + self.assertEqual(len(remote), 0) + + +class TestWriteRemoteResult(unittest.TestCase): + """Test _write_remote_result()""" + + def test_success(self): + """Test writing a successful build result""" + with tempfile.TemporaryDirectory() as tmpdir: + build_dir = os.path.join(tmpdir, 'sandbox') + builder = mock.Mock() + builder.get_build_dir.return_value = build_dir + + resp = { + 'resp': 'build_result', + 'board': 'sandbox', + 'commit_upto': 0, + 'return_code': 0, + 'stderr': '', + 'stdout': 'build output', + } + boss._write_remote_result(builder, resp, {}, 'host1') + + build_dir = builder.get_build_dir.return_value + self.assertTrue(os.path.isdir(build_dir)) + + self.assertEqual( + tools.read_file(os.path.join(build_dir, 'done'), + binary=False), '0\n') + self.assertEqual( + tools.read_file(os.path.join(build_dir, 'log'), + binary=False), 'build output') + self.assertFalse(os.path.exists( + os.path.join(build_dir, 'err'))) + + def test_failure_with_stderr(self): + """Test writing a 
failed build result with stderr""" + with tempfile.TemporaryDirectory() as tmpdir: + build_dir = os.path.join(tmpdir, 'rpi') + builder = mock.Mock() + builder.get_build_dir.return_value = build_dir + + resp = { + 'resp': 'build_result', + 'board': 'rpi', + 'commit_upto': 1, + 'return_code': 2, + 'stderr': 'error: undefined reference', + 'stdout': '', + } + boss._write_remote_result(builder, resp, {}, 'host1') + + build_dir = builder.get_build_dir.return_value + self.assertEqual( + tools.read_file(os.path.join(build_dir, 'done'), + binary=False), '2\n') + self.assertEqual( + tools.read_file(os.path.join(build_dir, 'err'), + binary=False), + 'error: undefined reference') + + def test_with_sizes(self): + """Test writing a result with size information""" + with tempfile.TemporaryDirectory() as tmpdir: + build_dir = os.path.join(tmpdir, 'sandbox') + builder = mock.Mock() + builder.get_build_dir.return_value = build_dir + + sizes_raw = (' text data bss dec hex\n' + ' 12345 1234 567 14146 374a\n') + resp = { + 'resp': 'build_result', + 'board': 'sandbox', + 'commit_upto': 0, + 'return_code': 0, + 'stderr': '', + 'stdout': '', + 'sizes': {'raw': sizes_raw}, + } + boss._write_remote_result(builder, resp, {}, 'host1') + + # Boss strips the header line from sizes + build_dir = builder.get_build_dir.return_value + self.assertEqual( + tools.read_file(os.path.join(build_dir, 'sizes'), + binary=False), + ' 12345 1234 567 14146 374a') + + +class FakeMachineInfo: # pylint: disable=R0903 + """Fake machine info for testing""" + bogomips = 5000.0 + + +class FakeMachine: # pylint: disable=R0903 + """Fake machine for testing WorkerPool""" + + def __init__(self, hostname): + self.hostname = hostname + self.name = hostname + self.toolchains = {'arm': '/usr/bin/arm-linux-gnueabihf-gcc'} + self.info = FakeMachineInfo() + self.max_boards = 0 + +def _make_worker(): + """Create a RemoteWorker with mocked subprocess for testing + + Uses __new__ to avoid calling __init__ which requires real SSH. 
+ Sets all attributes to safe defaults. + """ + wrk = boss.RemoteWorker.__new__(boss.RemoteWorker) + wrk.hostname = 'host1' + wrk.name = 'host1' + wrk.nthreads = 4 + wrk.bogomips = 5000.0 + wrk.max_boards = 0 + wrk.slots = 2 + wrk.toolchains = {} + wrk._proc = mock.Mock() + wrk._proc.poll.return_value = None # process is running + wrk._log = None + wrk._closed = False + wrk._closing = False + wrk._stderr_buf = [] + wrk._stderr_thread = None + wrk._stderr_lines = [] + wrk._ready = queue.Queue() + wrk._lock_file = None + wrk._work_dir = '' + wrk._git_dir = '' + wrk.work_dir = '' + wrk.timeout = 10 + wrk.bytes_sent = 0 + wrk.bytes_recv = 0 + wrk.closing = False + return wrk + + +def _make_ctx(board_selected=None): + """Create a _DispatchContext with temp directory for testing + + Returns: + tuple: (ctx, wrk, tmpdir) — caller must call ctx.close() and + shutil.rmtree(tmpdir) + """ + wrk = mock.Mock(nthreads=4, closing=False, max_boards=0, + slots=2, toolchains={'arm': '/gcc'}) + wrk.name = 'host1' + builder = mock.Mock() + tmpdir = tempfile.mkdtemp() + builder.base_dir = tmpdir + ctx = boss._DispatchContext( + workers=[wrk], builder=builder, + board_selected=board_selected or {}, + boss_log=mock.Mock()) + return ctx, wrk, tmpdir + + +class TestWorkerPool(unittest.TestCase): + """Test WorkerPool""" + + @mock.patch('subprocess.Popen') + @mock.patch('buildman.boss._run_ssh') + @mock.patch('buildman.boss.command.run_pipe') + def test_start_all(self, mock_pipe, mock_ssh, mock_popen): + """Test starting workers on multiple machines""" + mock_ssh.return_value = '/tmp/bm-1' + mock_pipe.return_value = mock.Mock(return_code=0) + proc1 = FakeProc([ + _make_response({'resp': 'ready', 'nthreads': 4}), + ]) + proc2 = FakeProc([ + _make_response({'resp': 'ready', 'nthreads': 8}), + ]) + mock_popen.side_effect = [proc1, proc2] + + machines = [FakeMachine('host1'), FakeMachine('host2')] + pool = boss.WorkerPool(machines) + with terminal.capture(): + workers = 
pool.start_all('/tmp/repo', 'HEAD:refs/heads/work') + self.assertEqual(len(workers), 2) + + @mock.patch('subprocess.Popen') + @mock.patch('buildman.boss._run_ssh') + @mock.patch('buildman.boss.command.run_pipe') + def test_start_all_with_settings(self, mock_pipe, mock_ssh, mock_popen): + """Test that start_all sends settings via configure""" + mock_ssh.return_value = '/tmp/bm-1' + mock_pipe.return_value = mock.Mock(return_code=0) + proc1 = FakeProc([ + _make_response({'resp': 'ready', 'nthreads': 4}), + _make_response({'resp': 'configure_done'}), + ]) + proc2 = FakeProc([ + _make_response({'resp': 'ready', 'nthreads': 8}), + _make_response({'resp': 'configure_done'}), + ]) + mock_popen.side_effect = [proc1, proc2] + + machines = [FakeMachine('host1'), FakeMachine('host2')] + pool = boss.WorkerPool(machines) + settings = {'no_lto': True, 'allow_missing': True} + with terminal.capture(): + workers = pool.start_all('/tmp/repo', 'HEAD:refs/heads/work', + settings=settings) + self.assertEqual(len(workers), 2) + # Check that configure was sent (2nd response consumed) + self.assertEqual(proc1._resp_idx, 2) + self.assertEqual(proc2._resp_idx, 2) + + @mock.patch('buildman.boss._run_ssh') + def test_start_all_init_failure(self, mock_ssh): + """Test start_all when init_git fails on one machine""" + call_count = [0] + + def _side_effect(_hostname, _cmd, **_kwargs): + call_count[0] += 1 + if call_count[0] % 2 == 0: + raise boss.BossError('connection refused') + return '/tmp/bm-1' + + mock_ssh.side_effect = _side_effect + + machines = [FakeMachine('good'), FakeMachine('bad')] + pool = boss.WorkerPool(machines) + with terminal.capture(): + workers = pool.start_all('/tmp/repo', 'HEAD:refs/heads/work') + # Only 'good' survives init phase; push and start not reached + # for 'bad' + self.assertLessEqual(len(workers), 1) + + def test_quit_all(self): + """Test quitting all workers""" + pool = boss.WorkerPool([]) + w1 = mock.Mock(spec=boss.RemoteWorker) + w2 = 
mock.Mock(spec=boss.RemoteWorker) + pool.workers = [w1, w2] + with terminal.capture(): + pool.quit_all() + w1.quit.assert_called_once() + w2.quit.assert_called_once() + self.assertEqual(len(pool.workers), 0) + + def test_quit_all_with_error(self): + """Test quit_all when a worker raises BossError""" + pool = boss.WorkerPool([]) + w1 = mock.Mock(spec=boss.RemoteWorker) + w1.quit.side_effect = boss.BossError('connection lost') + pool.workers = [w1] + with terminal.capture(): + pool.quit_all() + w1.close.assert_called_once() + self.assertEqual(len(pool.workers), 0) + + def test_build_boards_empty(self): + """Test build_boards with no workers or boards""" + pool = boss.WorkerPool([]) + with terminal.capture(): + pool.build_boards({}, None, mock.Mock()) # Should not raise + + +class TestBuildBoards(unittest.TestCase): + """Test WorkerPool.build_boards() end-to-end""" + + def setUp(self): + self._tmpdir = tempfile.mkdtemp() + + def tearDown(self): + shutil.rmtree(self._tmpdir, ignore_errors=True) + + @staticmethod + def _make_worker(hostname, toolchains, responses): + """Create a mock RemoteWorker with canned responses + + Args: + hostname (str): Hostname for the worker + toolchains (dict): arch -> gcc path + responses (list of dict): Responses to return from recv() + + Returns: + Mock: Mock RemoteWorker + """ + wrk = mock.Mock(spec=boss.RemoteWorker) + wrk.hostname = hostname + wrk.name = hostname + wrk.toolchains = toolchains + wrk.nthreads = 4 + wrk.max_boards = 4 + wrk.bogomips = 5000.0 + wrk.slots = 4 + wrk.recv.side_effect = list(responses) + return wrk + + def _demand_responses(self, *results): + """Build recv responses for the demand-driven protocol + + Returns a list starting with build_prepare_done, followed by + the given results, and ending with build_done. 
+        """
+        return ([{'resp': 'build_prepare_done'}] + list(results) +
+                [{'resp': 'build_done'}])
+
+    def test_build_boards_success(self):
+        """Test building boards across workers with correct results"""
+        wrk1 = self._make_worker('host1', {'arm': '/usr/bin/arm-gcc'},
+                                 self._demand_responses(_make_result('rpi')))
+        wrk2 = self._make_worker('host2',
+                                 {'riscv': '/usr/bin/riscv64-gcc'},
+                                 self._demand_responses(_make_result('odroid')))
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk1, wrk2]
+
+        boards = {
+            'rpi': FakeBoard('rpi', 'arm'),
+            'odroid': FakeBoard('odroid', 'riscv'),
+        }
+        builder = _make_builder(self._tmpdir)
+
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+
+        # Both workers should have received build_prepare
+        wrk1.build_prepare.assert_called_once()
+        wrk2.build_prepare.assert_called_once()
+
+        # Verify done files were written
+        for board in ['rpi', 'odroid']:
+            done_path = os.path.join(self._tmpdir, board, 'done')
+            self.assertTrue(os.path.exists(done_path))
+
+    def test_board_stays_on_same_worker(self):
+        """Test that all commits for a board go to the same worker"""
+        commits = [types.SimpleNamespace(hash=f'abc{i}') for i in range(3)]
+
+        # Two workers with different archs, each gets one board
+        wrk1 = self._make_worker('host1', {'arm': '/usr/bin/arm-gcc'},
+                                 self._demand_responses(
+                                     *[_make_result('rpi', commit_upto=i)
+                                       for i in range(3)]))
+        wrk2 = self._make_worker('host2',
+                                 {'riscv': '/usr/bin/riscv64-gcc'},
+                                 self._demand_responses(
+                                     *[_make_result('odroid', commit_upto=i)
+                                       for i in range(3)]))
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk1, wrk2]
+
+        boards = {
+            'rpi': FakeBoard('rpi', 'arm'),
+            'odroid': FakeBoard('odroid', 'riscv'),
+        }
+        builder = _make_builder(self._tmpdir)
+
+        with terminal.capture():
+            pool.build_boards(boards, commits, builder)
+
+        # Each worker gets one board via build_board
+        wrk1.build_board.assert_called_once()
+        wrk2.build_board.assert_called_once()
+
+        # Check each worker got the right board
+        self.assertEqual(wrk1.build_board.call_args[0][0], 'rpi')
+        self.assertEqual(wrk2.build_board.call_args[0][0], 'odroid')
+
+    def test_arch_passed(self):
+        """Test that the board's arch is sent to the worker"""
+        wrk = self._make_worker(
+            'host1',
+            {'arm': '/usr/bin/arm-linux-gnueabihf-gcc'},
+            self._demand_responses(_make_result('rpi')))
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk]
+
+        boards = {'rpi': FakeBoard('rpi', 'arm')}
+        builder = _make_builder(self._tmpdir)
+
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+
+        wrk.build_board.assert_called_once()
+        # build_board(board, arch)
+        self.assertEqual(wrk.build_board.call_args[0][1], 'arm')
+
+    def test_worker_error_response(self):
+        """Test that error responses are caught and stop the worker"""
+        wrk = self._make_worker(
+            'host1', {'arm': '/usr/bin/arm-gcc'}, [
+                {'resp': 'error', 'msg': 'no work directory set up'},
+            ])
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk]
+
+        boards = {
+            'rpi': FakeBoard('rpi', 'arm'),
+            'odroid': FakeBoard('odroid', 'arm'),
+        }
+        builder = _make_builder(self._tmpdir)
+
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+
+        # build_prepare sent, error on first recv stops the worker
+        self.assertTrue(wrk.build_prepare.called)
+
+    def test_boss_error_stops_worker(self):
+        """Test that BossError from recv() stops the worker"""
+        wrk = self._make_worker(
+            'host1', {'arm': '/usr/bin/arm-gcc'}, [])
+        wrk.recv.side_effect = boss.BossError('connection lost')
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk]
+
+        boards = {
+            'rpi': FakeBoard('rpi', 'arm'),
+            'odroid': FakeBoard('odroid', 'arm'),
+        }
+        builder = _make_builder(self._tmpdir)
+
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+
+        # build_prepare sent, but only one recv attempted before error
+        self.assertTrue(wrk.build_prepare.called)
+        self.assertEqual(wrk.recv.call_count, 1)
+
+    def test_toolchain_matching(self):
+        """Test boards only go to workers with the right toolchain"""
+        wrk_arm = self._make_worker(
+            'arm-host', {'arm': '/usr/bin/arm-gcc'},
+            self._demand_responses(
+                _make_result('rpi'),
+                _make_result('odroid'),
+            ))
+        wrk_riscv = self._make_worker(
+            'rv-host', {'riscv': '/usr/bin/riscv64-gcc'},
+            self._demand_responses(
+                _make_result('qemu_rv'),
+            ))
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk_arm, wrk_riscv]
+
+        boards = {
+            'rpi': FakeBoard('rpi', 'arm'),
+            'odroid': FakeBoard('odroid', 'arm'),
+            'qemu_rv': FakeBoard('qemu_rv', 'riscv'),
+        }
+        builder = _make_builder(self._tmpdir)
+
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+
+        # arm boards go to wrk_arm, riscv to wrk_riscv
+        arm_boards = {call[0][0]
+                      for call in wrk_arm.build_board.call_args_list}
+        rv_boards = {call[0][0]
+                     for call in wrk_riscv.build_board.call_args_list}
+        self.assertEqual(arm_boards, {'rpi', 'odroid'})
+        self.assertEqual(rv_boards, {'qemu_rv'})
+
+    def test_sandbox_any_worker(self):
+        """Test that sandbox boards can go to any worker"""
+        wrk = self._make_worker(
+            'host1', {'arm': '/usr/bin/arm-gcc'},
+            self._demand_responses(
+                _make_result('sandbox'),
+            ))
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk]
+
+        boards = {'sandbox': FakeBoard('sandbox', 'sandbox')}
+        builder = _make_builder(self._tmpdir)
+
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+
+        # sandbox should be sent to the worker even though it has
+        # no 'sandbox' toolchain, since sandbox uses the host compiler
+        wrk.build_board.assert_called_once()
+        self.assertEqual(wrk.build_board.call_args[0][1], 'sandbox')
+
+    def test_skip_done_boards(self):
+        """Test that already-done boards are skipped without force"""
+        # Create a done file for 'rpi'
+        done_path = os.path.join(self._tmpdir, 'rpi_done')
+        tools.write_file(done_path, '0\n', binary=False)
+
+        wrk = self._make_worker(
+            'host1', {'arm': '/usr/bin/arm-gcc'},
+            self._demand_responses(
+                _make_result('odroid'),
+            ))
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk]
+
+        boards = {
+            'rpi': FakeBoard('rpi', 'arm'),
+            'odroid': FakeBoard('odroid', 'arm'),
+        }
+        builder = _make_builder(self._tmpdir, force_build=False)
+        builder.get_done_file.side_effect = (
+            lambda c, b: os.path.join(self._tmpdir, f'{b}_done'))
+
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+
+        # Only odroid should be built (rpi has done file)
+        wrk.build_board.assert_called_once()
+        self.assertEqual(wrk.build_board.call_args[0][0], 'odroid')
+
+    def test_no_capable_worker(self):
+        """Test boards with no capable worker are silently skipped"""
+        wrk = self._make_worker(
+            'host1', {'arm': '/usr/bin/arm-gcc'},
+            [{'resp': 'build_prepare_done'}])
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk]
+
+        boards = {'qemu_rv': FakeBoard('qemu_rv', 'riscv')}
+        builder = _make_builder(self._tmpdir)
+
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+
+        # No build_board should be sent (no riscv worker)
+        wrk.build_board.assert_not_called()
+
+    def test_progress_updated(self):
+        """Test that process_result is called for each build result"""
+        wrk = self._make_worker(
+            'host1', {'arm': '/usr/bin/arm-gcc'},
+            self._demand_responses(
+                _make_result('rpi'),
+                _make_result('odroid'),
+            ))
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk]
+
+        brd_rpi = FakeBoard('rpi', 'arm')
+        brd_odroid = FakeBoard('odroid', 'arm')
+        boards = {'rpi': brd_rpi, 'odroid': brd_odroid}
+        builder = _make_builder(self._tmpdir)
+
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+
+        # process_result should be called for each board
+        self.assertEqual(builder.process_result.call_count, 2)
+        # Each result should have remote set to hostname
+        for call in builder.process_result.call_args_list:
+            result = call[0][0]
+            self.assertEqual(result.remote, 'host1')
+
+    def test_log_files_created(self):
+        """Test that worker log files are created in the output dir"""
+        wrk = self._make_worker(
+            'myhost', {'arm': '/usr/bin/arm-gcc'},
+            self._demand_responses(
+                _make_result('rpi'),
+            ))
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk]
+
+        boards = {'rpi': FakeBoard('rpi', 'arm')}
+        builder = _make_builder(self._tmpdir)
+
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+
+        log_path = os.path.join(self._tmpdir, 'worker-myhost.log')
+        self.assertTrue(os.path.exists(log_path))
+        content = tools.read_file(log_path, binary=False)
+        self.assertIn('>> 1 boards', content)
+        self.assertIn('<< ', content)
+        self.assertIn('build_result', content)
+
+    def test_heartbeat_resets_timeout(self):
+        """Test that heartbeat messages are accepted without error"""
+        wrk = self._make_worker(
+            'host1', {'arm': '/usr/bin/arm-gcc'}, [
+                {'resp': 'build_prepare_done'},
+                {'resp': 'heartbeat', 'board': 'rpi', 'thread': 0},
+                _make_result('rpi'),
+                {'resp': 'build_done'},
+            ])
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk]
+
+        boards = {'rpi': FakeBoard('rpi', 'arm')}
+        builder = _make_builder(self._tmpdir)
+
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+
+        # The heartbeat should be silently consumed, result processed
+        self.assertEqual(builder.process_result.call_count, 1)
+
+    def test_build_done_stops_worker(self):
+        """Test that build_done ends collection without timeout"""
+        wrk = self._make_worker(
+            'host1', {'arm': '/usr/bin/arm-gcc'}, [
+                {'resp': 'build_prepare_done'},
+                _make_result('rpi'),
+                # Worker had 2 boards but only 1 result, then
+                # build_done
+                {'resp': 'build_done', 'exceptions': 1},
+                # Final build_done response after boss sends
+                # build_done
+                {'resp': 'build_done'},
+            ])
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk]
+
+        boards = {
+            'rpi': FakeBoard('rpi', 'arm'),
+            'odroid': FakeBoard('odroid', 'arm'),
+        }
+        builder = _make_builder(self._tmpdir)
+
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+
+        # Only 1 result processed (odroid was lost to a thread
+        # exception)
+        self.assertEqual(builder.process_result.call_count, 1)
+
+
+class TestPipelinedBuilds(unittest.TestCase):
+    """Test pipelined builds with multiple slots"""
+
+    def setUp(self):
+        self._tmpdir = tempfile.mkdtemp()
+
+    def tearDown(self):
+        shutil.rmtree(self._tmpdir, ignore_errors=True)
+
+    def test_multiple_boards(self):
+        """Test that boss sends build_prepare then build_board for each"""
+        wrk = TestBuildBoards._make_worker(
+            'host1', {'arm': '/usr/bin/arm-gcc'}, [
+                {'resp': 'build_prepare_done'},
+                _make_result('b1'),
+                _make_result('b2'),
+                _make_result('b3'),
+                _make_result('b4'),
+                {'resp': 'build_done'},
+            ])
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk]
+
+        boards = {
+            'b1': FakeBoard('b1', 'arm'),
+            'b2': FakeBoard('b2', 'arm'),
+            'b3': FakeBoard('b3', 'arm'),
+            'b4': FakeBoard('b4', 'arm'),
+        }
+        builder = _make_builder(self._tmpdir)
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+
+        # build_prepare called once, build_board called 4 times
+        wrk.build_prepare.assert_called_once()
+        sent_boards = {call[0][0]
+                       for call in wrk.build_board.call_args_list}
+        self.assertEqual(sent_boards, {'b1', 'b2', 'b3', 'b4'})
+        # All 4 results collected
+        self.assertEqual(builder.process_result.call_count, 4)
+
+    @mock.patch('subprocess.Popen')
+    def test_slots_from_ready(self, mock_popen):
+        """Test that slots is read from the worker's ready response"""
+        proc = FakeProc([
+            _make_response(
+                {'resp': 'ready', 'nthreads': 20, 'slots': 5}),
+        ])
+        wrk = _start_worker('host1', mock_popen, proc)
+        self.assertEqual(wrk.nthreads, 20)
+        self.assertEqual(wrk.slots, 5)
+
+    @mock.patch('subprocess.Popen')
+    def test_slots_default(self, mock_popen):
+        """Test that slots defaults to 1 for old workers"""
+        proc = FakeProc([
+            _make_response({'resp': 'ready', 'nthreads': 8}),
+        ])
+        wrk = _start_worker('host1', mock_popen, proc)
+        self.assertEqual(wrk.slots, 1)
+
+
+class TestBuildTimeout(unittest.TestCase):
+    """Test that the build timeout prevents hangs"""
+
+    def setUp(self):
+        self._tmpdir = tempfile.mkdtemp()
+        self._orig_timeout = boss.BUILD_TIMEOUT
+
+    def tearDown(self):
+        shutil.rmtree(self._tmpdir, ignore_errors=True)
+        boss.BUILD_TIMEOUT = self._orig_timeout
+
+    def test_recv_timeout(self):
+        """Test that a hung worker times out instead of blocking"""
+
+        # Use a very short timeout so the test runs quickly
+        boss.BUILD_TIMEOUT = 0.5
+
+        wrk = TestBuildBoards._make_worker(
+            'slowhost', {'arm': '/usr/bin/arm-gcc'}, [])
+        wrk.nthreads = 1
+        wrk.max_boards = 1
+        wrk.slots = 1
+
+        # Simulate a worker that never responds: recv blocks forever
+        wrk.recv.side_effect = lambda: time.sleep(60)
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk]
+
+        boards = {'rpi': FakeBoard('rpi', 'arm')}
+        builder = _make_builder(self._tmpdir)
+
+        start = time.monotonic()
+        with terminal.capture():
+            pool.build_boards(boards, None, builder)
+        elapsed = time.monotonic() - start
+
+        # Should complete quickly (within a few seconds), not hang
+        self.assertLess(elapsed, 10)
+
+        # No results should have been processed
+        builder.process_result.assert_not_called()
+
+
+class TestMachineMaxBoards(unittest.TestCase):
+    """Test per-machine max_boards config"""
+
+    def test_max_boards_from_config(self):
+        """Test [machine:name] section sets max_boards"""
+        bsettings.setup('')
+        bsettings.settings.read_string("""
+[machines]
+ruru
+weka
+
+[machine:ruru]
+max_boards = 64
+""")
+        pool = machine.MachinePool()
+        pool._load_from_config()
+        by_name = {m.name: m for m in pool.machines}
+        self.assertEqual(by_name['ruru'].max_boards, 64)
+        self.assertEqual(by_name['weka'].max_boards, 0)
+
+    def test_max_boards_default(self):
+        """Test max_boards is 0 when no per-machine section exists"""
+        mach = machine.Machine('host1')
+        self.assertEqual(mach.max_boards, 0)
+
+
+class TestGccVersion(unittest.TestCase):
+    """Test gcc_version()"""
+
+    def test_buildman_toolchain(self):
+        """Test extracting version from a buildman-fetched toolchain"""
+        path = ('/home/sglass/.buildman-toolchains/gcc-13.1.0-nolibc/'
+                'aarch64-linux/bin/aarch64-linux-gcc')
+        self.assertEqual(machine.gcc_version(path), 'gcc-13.1.0-nolibc')
+
+    def test_different_version(self):
+        """Test a different gcc version"""
+        path = ('/home/sglass/.buildman-toolchains/gcc-11.1.0-nolibc/'
+                'aarch64-linux/bin/aarch64-linux-gcc')
+        self.assertEqual(machine.gcc_version(path), 'gcc-11.1.0-nolibc')
+
+    def test_system_gcc(self):
+        """Test a system gcc with no version directory"""
+        self.assertIsNone(machine.gcc_version('/usr/bin/gcc'))
+
+    def test_empty(self):
+        """Test an empty path"""
+        self.assertIsNone(machine.gcc_version(''))
+
+
+class _PipelineWorker:  # pylint: disable=too-many-instance-attributes
+    """Mock worker that simulates the demand-driven build protocol
+
+    Responds to build_prepare, build_board, and build_done commands
+    via the recv() queue, simulating real worker behaviour.
+    """
+
+    def __init__(self, nthreads, max_boards=0):
+        self.hostname = 'sim-host'
+        self.name = 'sim-host'
+        self.toolchains = {'arm': '/usr/bin/arm-gcc'}
+        self.nthreads = nthreads
+        self.max_boards = max_boards or nthreads
+        self.slots = nthreads
+        self.bogomips = 5000.0
+        self.closing = False
+
+        self._ready = queue.Queue()
+        self._commits = None
+
+    def build_prepare(self, commits):
+        """Accept prepare command and queue ready response"""
+        self._commits = commits
+        self._ready.put({'resp': 'build_prepare_done'})
+
+    def build_board(self, board, _arch):
+        """Queue results for this board across all commits"""
+
+        def _produce():
+            for cu in range(len(self._commits)):
+                time.sleep(random.uniform(0.001, 0.005))
+                self._ready.put({
+                    'resp': 'build_result',
+                    'board': board,
+                    'commit_upto': cu,
+                    'return_code': 0,
+                    'stderr': '',
+                    'stdout': '',
+                })
+
+        threading.Thread(target=_produce, daemon=True).start()
+
+    def build_done(self):
+        """Queue the build_done response after a short delay"""
+        def _respond():
+            time.sleep(0.05)
+            self._ready.put({'resp': 'build_done'})
+
+        threading.Thread(target=_respond, daemon=True).start()
+
+    def recv(self):
+        """Wait for the next result"""
+        return self._ready.get()
+
+
+class TestBuildBoardsUtilisation(unittest.TestCase):
+    """Test that build_boards dispatches correctly to workers
+
+    The boss sends one build_boards command per worker. The mock
+    worker simulates board-first scheduling and produces results.
+    """
+
+    def setUp(self):
+        self._tmpdir = tempfile.mkdtemp()
+
+    def tearDown(self):
+        shutil.rmtree(self._tmpdir, ignore_errors=True)
+
+    def _run_build(self, nboards, ncommits, nthreads, max_boards=0):
+        """Run a build and return the mock worker"""
+        wrk = _PipelineWorker(nthreads, max_boards=max_boards)
+
+        pool = boss.WorkerPool([])
+        pool.workers = [wrk]
+
+        boards = {}
+        for i in range(nboards):
+            target = f'board{i}'
+            boards[target] = FakeBoard(target, 'arm')
+
+        commits = [types.SimpleNamespace(hash=f'commit{i}')
+                   for i in range(ncommits)]
+
+        builder = mock.Mock()
+        builder.force_build = True
+        builder.base_dir = self._tmpdir
+        builder.count = 0
+        builder.get_build_dir.side_effect = (
+            lambda c, b: os.path.join(self._tmpdir, b))
+
+        with terminal.capture():
+            pool.build_boards(boards, commits, builder)
+        return wrk
+
+    def test_all_results_collected(self):
+        """Verify boss collects all board x commit results"""
+        nboards = 30
+        ncommits = 10
+        wrk = self._run_build(nboards, ncommits, nthreads=8)
+
+        # The ready queue should be empty (boss drained everything)
+        self.assertTrue(wrk._ready.empty())
+
+    def test_large_scale(self):
+        """Test at realistic scale: 200 boards x 10 commits"""
+        wrk = self._run_build(nboards=200, ncommits=10,
+                              nthreads=32)
+        self.assertTrue(wrk._ready.empty())
+
+    def test_max_boards_caps_batch(self):
+        """Test that max_boards limits initial and in-flight boards"""
+        wrk = self._run_build(nboards=30, ncommits=5,
+                              nthreads=32, max_boards=8)
+        self.assertTrue(wrk._ready.empty())
+        # Worker should still receive all boards despite the cap
+        self.assertEqual(wrk.max_boards, 8)
+
+
+class TestFormatBytes(unittest.TestCase):
+    """Test _format_bytes()"""
+
+    def test_format_bytes(self):
+        """Test bytes, KB and MB ranges"""
+        self.assertEqual(boss._format_bytes(0), '0B')
+        self.assertEqual(boss._format_bytes(1023), '1023B')
+        self.assertEqual(boss._format_bytes(1024), '1.0KB')
+        self.assertEqual(boss._format_bytes(1536), '1.5KB')
+        self.assertEqual(boss._format_bytes(1024 * 1024), '1.0MB')
+        self.assertEqual(boss._format_bytes(5 * 1024 * 1024), '5.0MB')
+
+
+class TestWriteRemoteResultErr(unittest.TestCase):
+    """Test _write_remote_result() handling of the err file"""
+
+    def test_with_stderr(self):
+        """Test writing result with stderr"""
+        bldr = mock.Mock()
+        bldr.get_build_dir.return_value = tempfile.mkdtemp()
+        resp = {
+            'board': 'sandbox',
+            'commit_upto': 0,
+            'return_code': 2,
+            'stderr': 'error: missing header\n',
+            'stdout': '',
+        }
+        boss._write_remote_result(bldr, resp, {'sandbox': mock.Mock()},
+                                  'host1')
+        build_dir = bldr.get_build_dir.return_value
+        err_path = os.path.join(build_dir, 'err')
+        self.assertIn('error:',
+                      tools.read_file(err_path, binary=False))
+
+        shutil.rmtree(build_dir)
+
+    def test_removes_stale_err(self):
+        """Test that stale err file is removed on success"""
+        bldr = mock.Mock()
+        build_dir = tempfile.mkdtemp()
+        bldr.get_build_dir.return_value = build_dir
+        err_path = os.path.join(build_dir, 'err')
+        tools.write_file(err_path, 'old error', binary=False)
+        resp = {
+            'board': 'sandbox',
+            'commit_upto': 0,
+            'return_code': 0,
+            'stderr': '',
+            'stdout': '',
+        }
+        boss._write_remote_result(bldr, resp, {'sandbox': mock.Mock()},
+                                  'host1')
+        self.assertFalse(os.path.exists(err_path))
+
+        shutil.rmtree(build_dir)
+
+
+class TestRemoteWorkerMethods(unittest.TestCase):
+    """Test RemoteWorker send/recv/close methods"""
+
+    def test_send(self):
+        """Test _send writes JSON to stdin"""
+        wrk = _make_worker()
+        wrk._proc.stdin = mock.Mock()
+        wrk._send({'cmd': 'quit'})
+        wrk._proc.stdin.write.assert_called_once()
+        data = wrk._proc.stdin.write.call_args[0][0]
+        self.assertIn(b'quit', data)
+
+    def test_send_broken_pipe(self):
+        """Test _send raises BrokenPipeError on broken pipe"""
+        wrk = _make_worker()
+        wrk._proc.stdin.write.side_effect = BrokenPipeError()
+        with self.assertRaises(BrokenPipeError):
+            wrk._send({'cmd': 'quit'})
+
+    def test_build_commands(self):
+        """Test build_prepare, build_board and build_done commands"""
+        wrk = _make_worker()
+        wrk._send = mock.Mock()
+
+        wrk.build_prepare(['abc123'])
+        self.assertEqual(wrk._send.call_args[0][0]['cmd'],
+                         'build_prepare')
+
+        wrk._send.reset_mock()
+        wrk.build_board('sandbox', 'sandbox')
+        self.assertEqual(wrk._send.call_args[0][0]['cmd'],
+                         'build_board')
+
+        wrk._send.reset_mock()
+        wrk.build_done()
+        self.assertEqual(wrk._send.call_args[0][0]['cmd'],
+                         'build_done')
+
+    def test_quit(self):
+        """Test quit sends command and closes, handles errors"""
+        wrk = _make_worker()
+        wrk._send = mock.Mock()
+        wrk._recv = mock.Mock(return_value={'resp': 'quit_ack'})
+        wrk.close = mock.Mock()
+        wrk.quit()
+        wrk._send.assert_called_once()
+        wrk.close.assert_called_once()
+
+        # Error path: BossError during quit still closes
+        wrk2 = _make_worker()
+        wrk2._send = mock.Mock(side_effect=boss.BossError('gone'))
+        wrk2.close = mock.Mock()
+        wrk2.quit()
+        wrk2.close.assert_called_once()
+
+    def test_close_idempotent(self):
+        """Test close can be called multiple times"""
+        wrk = _make_worker()
+        wrk.close()
+        self.assertIsNone(wrk._proc)
+        wrk.close()  # should not raise
+
+    @mock.patch('buildman.boss._run_ssh')
+    def test_remove_lock(self, mock_ssh):
+        """Test remove_lock: SSH call, no work_dir, SSH failure"""
+        wrk = _make_worker()
+        wrk.work_dir = '/tmp/bm'
+        wrk.remove_lock()
+        mock_ssh.assert_called_once()
+
+        # No work_dir: does nothing
+        wrk2 = _make_worker()
+        wrk2.work_dir = ''
+        wrk2.remove_lock()
+
+        # SSH failure: silently ignored
+        mock_ssh.side_effect = boss.BossError('gone')
+        wrk3 = _make_worker()
+        wrk3.work_dir = '/tmp/bm'
+        wrk3.remove_lock()
+
+
+class TestWorkerPoolCapacity(unittest.TestCase):
+    """Test WorkerPool capacity and arch assignment"""
+
+    def test_get_capacity(self):
+        """Test capacity calculation"""
+        wrk = mock.Mock(nthreads=8, bogomips=5000.0)
+        self.assertEqual(
+            boss.WorkerPool._get_capacity(wrk), 40000.0)
+
+    def test_get_capacity_no_bogomips(self):
+        """Test capacity with no bogomips falls back to nthreads"""
+        wrk = mock.Mock(nthreads=4, bogomips=0)
+        self.assertEqual(boss.WorkerPool._get_capacity(wrk), 4.0)
+
+    def test_get_worker_for_arch(self):
+        """Test arch-based worker selection"""
+        w1 = mock.Mock(nthreads=8, bogomips=5000.0,
+                       toolchains={'arm': '/gcc'})
+        w2 = mock.Mock(nthreads=4, bogomips=5000.0,
+                       toolchains={'arm': '/gcc'})
+        pool = boss.WorkerPool.__new__(boss.WorkerPool)
+        pool.workers = [w1, w2]
+        assigned = {}
+
+        # First assignment should go to w1 (higher capacity)
+        wrk = pool._get_worker_for_arch('arm', assigned)
+        self.assertEqual(wrk, w1)
+
+        # Second should go to w2 (w1 already has 1)
+        wrk = pool._get_worker_for_arch('arm', assigned)
+        self.assertEqual(wrk, w2)
+
+    def test_get_worker_sandbox(self):
+        """Test sandbox goes to any worker"""
+        w1 = mock.Mock(nthreads=4, bogomips=1000.0, toolchains={})
+        pool = boss.WorkerPool.__new__(boss.WorkerPool)
+        pool.workers = [w1]
+        assigned = {}
+        wrk = pool._get_worker_for_arch('sandbox', assigned)
+        self.assertEqual(wrk, w1)
+
+    def test_get_worker_no_capable(self):
+        """Test returns None when no worker supports arch"""
+        w1 = mock.Mock(nthreads=4, bogomips=1000.0,
+                       toolchains={'arm': '/gcc'})
+        pool = boss.WorkerPool.__new__(boss.WorkerPool)
+        pool.workers = [w1]
+        self.assertIsNone(
+            pool._get_worker_for_arch('mips', {}))
+
+
+class TestBossLog(unittest.TestCase):
+    """Test _BossLog"""
+
+    def test_log_and_close(self):
+        """Test logging and closing"""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            blog = boss._BossLog(tmpdir)
+            wrk = mock.Mock(name='host1', nthreads=4)
+            wrk.name = 'host1'
+            blog.init_worker(wrk)
+            blog.log('test message')
+            blog.record_sent('host1', 3)
+            blog.record_recv('host1', load_avg=2.5)
+            blog.close()
+
+            log_path = os.path.join(tmpdir, '.buildman.log')
+            content = tools.read_file(log_path, binary=False)
+            self.assertIn('test message', content)
+
+    def test_log_status(self):
+        """Test log_status writes counts"""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            blog = boss._BossLog(tmpdir)
+            wrk = mock.Mock(name='host1', nthreads=4)
+            wrk.name = 'host1'
+            blog.init_worker(wrk)
+            blog.record_sent('host1', 5)
+            blog.record_recv('host1')
+            blog.record_recv('host1')
+            blog.log_status()
+            blog.close()
+
+            content = tools.read_file(
+                os.path.join(tmpdir, '.buildman.log'), binary=False)
+            self.assertIn('host1', content)
+
+    def test_start_timer(self):
+        """Test start_timer and close with elapsed"""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            blog = boss._BossLog(tmpdir)
+            blog.start_timer()
+            blog.close()
+
+
+class TestDispatchContext(unittest.TestCase):
+    """Test _DispatchContext"""
+
+    def test_update_progress_build_started(self):
+        """Test worktree progress: build_started"""
+        wrk = mock.Mock(nthreads=4)
+        wrk.name = 'host1'
+        builder = mock.Mock()
+        with tempfile.TemporaryDirectory() as tmpdir:
+            builder.base_dir = tmpdir
+            ctx = boss._DispatchContext(
+                workers=[wrk], builder=builder,
+                board_selected={}, boss_log=mock.Mock())
+            resp = {'resp': 'build_started', 'num_threads': 8}
+            self.assertTrue(ctx.update_progress(resp, wrk))
+            ctx.close()
+
+    def test_update_progress_worktree_created(self):
+        """Test worktree progress: worktree_created"""
+        wrk = mock.Mock(nthreads=2)
+        wrk.name = 'host1'
+        builder = mock.Mock()
+        with tempfile.TemporaryDirectory() as tmpdir:
+            builder.base_dir = tmpdir
+            ctx = boss._DispatchContext(
+                workers=[wrk], builder=builder,
+                board_selected={}, boss_log=mock.Mock())
+            ctx.update_progress(
+                {'resp': 'build_started', 'num_threads': 2}, wrk)
+            self.assertTrue(ctx.update_progress(
+                {'resp': 'worktree_created'}, wrk))
+            ctx.close()
+
+    def test_update_progress_other(self):
+        """Test non-progress messages return False"""
+        wrk = mock.Mock(nthreads=4)
+        wrk.name = 'host1'
+        builder = mock.Mock()
+        with tempfile.TemporaryDirectory() as tmpdir:
+            builder.base_dir = tmpdir
+            ctx = boss._DispatchContext(
+                workers=[wrk], builder=builder,
+                board_selected={}, boss_log=mock.Mock())
+            self.assertFalse(ctx.update_progress(
+                {'resp': 'build_result'}, wrk))
+            ctx.close()
+
+    def test_log(self):
+        """Test per-worker log file"""
+        wrk = mock.Mock(nthreads=4)
+        wrk.name = 'host1'
+        builder = mock.Mock()
+        with tempfile.TemporaryDirectory() as tmpdir:
+            builder.base_dir = tmpdir
+            ctx = boss._DispatchContext(
+                workers=[wrk], builder=builder,
+                board_selected={}, boss_log=mock.Mock())
+            ctx.log(wrk, '>>>', 'test message')
+            ctx.close()
+            content = tools.read_file(
+                os.path.join(tmpdir, 'worker-host1.log'),
+                binary=False)
+            self.assertIn('test message', content)
+
+
+class TestRemoteWorkerClose(unittest.TestCase):
+    """Test RemoteWorker.close() error paths"""
+
+    def test_close_error_paths(self):
+        """Test close: stdin OSError, terminate timeout, kill timeout"""
+        # stdin.close() raises OSError
+        wrk = _make_worker()
+        wrk._proc.stdin.close.side_effect = OSError('broken')
+        wrk.close()
+        self.assertIsNone(wrk._proc)
+
+        # wait() times out, terminate succeeds
+        wrk2 = _make_worker()
+        wrk2._proc.wait.side_effect = [
+            subprocess.TimeoutExpired('ssh', 2),
+            None,
+        ]
+        wrk2.close()
+        self.assertIsNone(wrk2._proc)
+
+        # wait() times out twice, falls back to kill
+        wrk3 = _make_worker()
+        wrk3._proc.wait.side_effect = [
+            subprocess.TimeoutExpired('ssh', 2),
+            subprocess.TimeoutExpired('ssh', 3),
+        ]
+        wrk3.close()
+        self.assertIsNone(wrk3._proc)
+
+    def test_configure_rejected(self):
+        """Test configure raises on rejection"""
+        wrk = _make_worker()
+        wrk._send = mock.Mock()
+        wrk._recv = mock.Mock(return_value={'resp': 'error', 'msg': 'bad'})
+        with self.assertRaises(boss.BossError):
+            wrk.configure({'no_lto': True})
+
+    def test_get_stderr(self):
+        """Test _get_stderr returns last non-empty line, or empty"""
+        wrk = _make_worker()
+        wrk._stderr_thread = mock.Mock()
+        wrk._stderr_lines = ['first', '', 'last error', '']
+        self.assertEqual(wrk._get_stderr(), 'last error')
+
+        wrk._stderr_lines = []
+        self.assertEqual(wrk._get_stderr(), '')
+
+
+class TestWriteRemoteResultSizes(unittest.TestCase):
+    """Test _write_remote_result() handling of size output"""
+
+    def test_sizes_with_header(self):
+        """Test that size output header line is stripped"""
+        bldr = mock.Mock()
+        build_dir = tempfile.mkdtemp()
+        bldr.get_build_dir.return_value = build_dir
+        resp = {
+            'board': 'sandbox',
+            'commit_upto': 0,
+            'return_code': 0,
+            'stderr': '',
+            'stdout': '',
+            'sizes': {
+                'raw': ('   text    data     bss     dec     hex\n'
+                        '   1000     200     100    1300     514\n')},
+        }
+        boss._write_remote_result(bldr, resp, {'sandbox': mock.Mock()},
+                                  'host1')
+        sizes_content = tools.read_file(
+            os.path.join(build_dir, 'sizes'), binary=False)
+        self.assertNotIn('text', sizes_content)
+        self.assertIn('1000', sizes_content)
+
+        shutil.rmtree(build_dir)
+
+
+class TestWorkerPoolEdgeCases(unittest.TestCase):
+    """Test WorkerPool edge cases"""
+
+    def test_print_transfer_empty(self):
+        """Test print_transfer_summary with no workers"""
+        pool = boss.WorkerPool([])
+        with terminal.capture():
+            pool.print_transfer_summary()
+
+    def test_quit_all_with_boss_log(self):
+        """Test quit_all closes boss_log"""
+        pool = boss.WorkerPool([])
+        blog = mock.Mock()
+        pool._boss_log = blog
+        with terminal.capture():
+            pool.quit_all()
+        blog.log.assert_called()
+        blog.close.assert_called_once()
+
+    def test_build_boards_with_local_count(self):
+        """Test build_boards progress includes local count"""
+        pool = boss.WorkerPool([])
+        wrk = mock.Mock(
+            nthreads=4, bogomips=1000.0, slots=2,
+            toolchains={'arm': '/gcc'}, closing=False)
+        wrk.name = 'host1'
+        wrk.recv.side_effect = [
+            {'resp': 'build_prepare_done'},
+            {'resp': 'build_result', 'board': 'rpi',
+             'commit_upto': 0, 'return_code': 0,
+             'stderr': '', 'stdout': ''},
+            {'resp': 'build_done'},
+        ]
+        pool.workers = [wrk]
+
+        builder = mock.Mock()
+        builder.force_build = True
+        builder.base_dir = tempfile.mkdtemp()
+        builder.count = 0
+        builder.get_build_dir.return_value = tempfile.mkdtemp()
+
+        boards = {'rpi': mock.Mock(target='rpi', arch='arm')}
+        with terminal.capture():
+            pool.build_boards(boards, None, builder, local_count=5)
+
+        shutil.rmtree(builder.base_dir)
+        shutil.rmtree(builder.get_build_dir.return_value)
+
+    def test_get_capacity_no_bogomips(self):
+        """Test _get_worker_for_arch falls back when bogomips is 0"""
+        w1 = mock.Mock(nthreads=4, bogomips=0, toolchains={'arm': '/gcc'})
+        pool = boss.WorkerPool.__new__(boss.WorkerPool)
+        pool.workers = [w1]
+        wrk = pool._get_worker_for_arch('arm', {})
+        self.assertEqual(wrk, w1)
+
+    def test_close_all(self):
+        """Test close_all with boss_log"""
+        pool = boss.WorkerPool([])
+        wrk = mock.Mock()
+        wrk.closing = False
+        wrk.bytes_sent = 100
+        wrk.bytes_recv = 200
+        wrk.name = 'host1'
+        pool.workers = [wrk]
+        blog = mock.Mock()
+        pool._boss_log = blog
+        with terminal.capture():
+            pool.close_all()
+        wrk.close.assert_called()
+        wrk.remove_lock.assert_called()
+        blog.close.assert_called_once()
+        self.assertEqual(len(pool.workers), 0)
+
+
+class TestDispatchContextRecv(unittest.TestCase):
+    """Test _DispatchContext.recv() error paths"""
+
+    @mock.patch.object(boss, 'BUILD_TIMEOUT', 0.01)
+    def test_recv_timeout(self):
+        """Test recv returns None on timeout"""
+        ctx, wrk, tmpdir = _make_ctx()
+        recv_q = queue.Queue()  # empty queue
+        with terminal.capture():
+            result = ctx.recv(wrk, recv_q)
+        self.assertIsNone(result)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+
+    def test_recv_error_response(self):
+        """Test recv returns None on error response"""
+        ctx, wrk, tmpdir = _make_ctx()
+        recv_q = queue.Queue()
+        recv_q.put(('error', 'something broke'))
+        with terminal.capture():
+            result = ctx.recv(wrk, recv_q)
+        self.assertIsNone(result)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+
+    def test_recv_worker_error(self):
+        """Test recv returns None on worker error response"""
+        ctx, wrk, tmpdir = _make_ctx(
+            {'rpi': mock.Mock(target='rpi', arch='arm')})
+        recv_q = queue.Queue()
+        recv_q.put(('resp', {'resp': 'error', 'msg': 'oops'}))
+        with terminal.capture():
+            result = ctx.recv(wrk, recv_q)
+        self.assertIsNone(result)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+
+    def test_write_result_exception(self):
+        """Test write_result catches exceptions"""
+        ctx, wrk, tmpdir = _make_ctx(
+            {'rpi': mock.Mock(target='rpi', arch='arm')})
+        ctx.builder.get_build_dir.side_effect = RuntimeError('boom')
+        resp = {'board': 'sandbox', 'commit_upto': 0,
+                'return_code': 0}
+        with terminal.capture():
+            result = ctx.write_result(wrk, resp)
+        self.assertFalse(result)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+
+    def test_wait_for_prepare(self):
+        """Test wait_for_prepare with prepare_done"""
+        ctx, wrk, tmpdir = _make_ctx()
+        recv_q = queue.Queue()
+        recv_q.put(('resp', {'resp': 'build_prepare_done'}))
+        result = ctx.wait_for_prepare(wrk, recv_q)
+        self.assertTrue(result)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+
+    @mock.patch.object(boss, 'BUILD_TIMEOUT', 0.01)
+    def test_wait_for_prepare_timeout(self):
+        """Test wait_for_prepare returns False on timeout"""
+        ctx, wrk, tmpdir = _make_ctx()
+        recv_q = queue.Queue()
+        with terminal.capture():
+            result = ctx.wait_for_prepare(wrk, recv_q)
+        self.assertFalse(result)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+
+    def test_send_batch_error(self):
+        """Test send_batch returns -1 on BossError"""
+        ctx, wrk, tmpdir = _make_ctx(
+            {'rpi': mock.Mock(target='rpi', arch='arm')})
+        wrk.build_board.side_effect = boss.BossError('gone')
+        brd = mock.Mock(target='rpi', arch='arm')
+        result = ctx.send_batch(wrk, [brd])
+        self.assertEqual(result, -1)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+
+    def test_collect_results_heartbeat(self):
+        """Test collect_results skips heartbeat"""
+        ctx, wrk, tmpdir = _make_ctx(
+            {'rpi': mock.Mock(target='rpi', arch='arm')})
+        ctx.board_selected = {'rpi': mock.Mock(target='rpi', arch='arm')}
+        build_dir = tempfile.mkdtemp()
+        ctx.builder.get_build_dir.return_value = build_dir
+        recv_q = queue.Queue()
+        recv_q.put(('resp', {'resp': 'heartbeat'}))
+        recv_q.put(('resp', {'resp': 'build_result', 'board': 'rpi',
+                             'commit_upto': 0, 'return_code': 0,
+                             'stderr': '', 'stdout': ''}))
+        recv_q.put(('resp', {'resp': 'build_done'}))
+        state = boss.DemandState(
+            sent=1, ncommits=1, grab_func=lambda w, c: [])
+        with terminal.capture():
+            result = ctx.collect_results(wrk, recv_q, state)
+        self.assertTrue(result)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+        shutil.rmtree(build_dir)
+
+
+class TestForwardStderr(unittest.TestCase):
+    """Test RemoteWorker._forward_stderr()"""
+
+    def test_forward_stderr(self):
+        """Test stderr collection and OSError handling"""
+        wrk = boss.RemoteWorker.__new__(boss.RemoteWorker)
+        wrk.name = 'host1'
+        wrk._stderr_lines = []
+        wrk._proc = mock.Mock()
+        wrk._proc.stderr = [b'error line 1\n', b'error line 2\n', b'']
+        with terminal.capture():
+            wrk._forward_stderr()
+        self.assertEqual(wrk._stderr_lines,
+                         ['error line 1', 'error line 2'])
+
+        # OSError path: silently handled. Use a MagicMock so iter()
+        # finds __iter__ on the type, not the instance
+        wrk2 = boss.RemoteWorker.__new__(boss.RemoteWorker)
+        wrk2.name = 'host1'
+        wrk2._stderr_lines = []
+        wrk2._proc = mock.Mock()
+        wrk2._proc.stderr = mock.MagicMock()
+        wrk2._proc.stderr.__iter__.side_effect = OSError('closed')
+        wrk2._forward_stderr()
+
+
+class TestStartDebug(unittest.TestCase):
+    """Test RemoteWorker.start() debug flag"""
+
+    @mock.patch('subprocess.Popen')
+    @mock.patch('buildman.boss._run_ssh')
+    def test_debug_flag(self, _mock_ssh, mock_popen):
+        """Test start() passes -D when debug=True"""
+        proc = mock.Mock()
+        proc.stdout.readline.return_value = (
+            b'BM> {"resp":"ready","nthreads":4,"slots":2}\n')
+        proc.poll.return_value = None
+        proc.stderr = []  # empty iterable for _forward_stderr thread
+        mock_popen.return_value = proc
+
+        wrk = boss.RemoteWorker.__new__(boss.RemoteWorker)
+        wrk.hostname = 'host1'
+        wrk.name = 'host1'
+        wrk._proc = None
+        wrk._closed = False
+        wrk._closing = False
+        wrk._stderr_lines = []
+        wrk._stderr_thread = None
+        wrk._ready = queue.Queue()
+        wrk._log = None
+        wrk.bytes_sent = 0
+        wrk.bytes_recv = 0
+        wrk.nthreads = 0
+        wrk.slots = 0
+        wrk.max_boards = 0
+        wrk.toolchains = {}
+        wrk.closing = False
+        wrk._work_dir = '/tmp/bm'
+        wrk._git_dir = '/tmp/bm/.git'
+        wrk.work_dir = '/tmp/bm'
+        wrk.timeout = 10
+
+        wrk.start(debug=True)
+        cmd = mock_popen.call_args[0][0]
+        self.assertIn('-D', ' '.join(cmd))
+
+
+class TestBossLogTimer(unittest.TestCase):
+    """Test _BossLog.start_timer()"""
+
+    def test_timer_ticks(self):
+        """Test that the timer fires and logs status"""
+        with tempfile.TemporaryDirectory() as tmpdir:
+            blog = boss._BossLog(tmpdir)
+            wrk = mock.Mock(nthreads=4)
+            wrk.name = 'host1'
+            blog.init_worker(wrk)
+            blog.record_sent('host1', 5)
+
+            # Patch STATUS_INTERVAL to fire quickly
+            with mock.patch.object(boss, 'STATUS_INTERVAL', 0.01):
+                blog.start_timer()
+                time.sleep(0.05)
+                blog.close()
+
+            content = tools.read_file(
+                os.path.join(tmpdir, '.buildman.log'), binary=False)
+            # Timer should have logged at least one status line
+            self.assertIn('host1', content)
+
+
+class TestWaitForPrepareProgress(unittest.TestCase):
+    """Test wait_for_prepare with progress and heartbeat messages"""
+
+    def test_heartbeat_during_prepare(self):
+        """Test heartbeat messages are skipped during prepare"""
+        ctx, wrk, tmpdir = _make_ctx()
+        recv_q = queue.Queue()
+        recv_q.put(('resp', {'resp': 'heartbeat'}))
+        recv_q.put(('resp', {'resp': 'build_prepare_done'}))
+        result = ctx.wait_for_prepare(wrk, recv_q)
+        self.assertTrue(result)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+
+    def test_progress_during_prepare(self):
+        """Test worktree progress during prepare"""
+        ctx, wrk, tmpdir = _make_ctx()
+        recv_q = queue.Queue()
+        recv_q.put(('resp', {'resp': 'build_started', 'num_threads': 2}))
+        recv_q.put(('resp', {'resp': 'worktree_created'}))
+        recv_q.put(('resp', {'resp': 'build_prepare_done'}))
+        result = ctx.wait_for_prepare(wrk, recv_q)
+        self.assertTrue(result)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+
+    def test_unexpected_during_prepare(self):
+        """Test unexpected response during prepare returns False"""
+        ctx, wrk, tmpdir = _make_ctx()
+        recv_q = queue.Queue()
+        recv_q.put(('resp', {'resp': 'something_weird'}))
+        result = ctx.wait_for_prepare(wrk, recv_q)
+        self.assertFalse(result)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+
+
+class TestRecvOne(unittest.TestCase):
+    """Test _DispatchContext.recv_one()"""
+
+    @mock.patch.object(boss, 'BUILD_TIMEOUT', 0.01)
+    def test_recv_one_timeout(self):
+        """Test recv_one returns False on timeout"""
+        ctx, wrk, tmpdir = _make_ctx()
+        recv_q = queue.Queue()
+        with terminal.capture():
+            result = ctx.recv_one(wrk, recv_q)
+        self.assertFalse(result)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+
+    def test_recv_one_build_done(self):
+        """Test recv_one handles build_done with exceptions"""
+        ctx, wrk, tmpdir = _make_ctx(
+            {'rpi': mock.Mock(target='rpi', arch='arm')})
+        recv_q = queue.Queue()
+        recv_q.put(('resp', {'resp': 'build_done', 'exceptions': 2}))
+        result = ctx.recv_one(wrk, recv_q)
+        self.assertFalse(result)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+
+    def test_recv_one_build_result(self):
+        """Test recv_one processes build_result"""
+        ctx, wrk, tmpdir = _make_ctx(
+            {'rpi': mock.Mock(target='rpi', arch='arm')})
+        build_dir = tempfile.mkdtemp()
+        ctx.builder.get_build_dir.return_value = build_dir
+        recv_q = queue.Queue()
+        recv_q.put(('resp', {'resp': 'build_result', 'board': 'rpi',
+                             'commit_upto': 0, 'return_code': 0,
+                             'stderr': '', 'stdout': ''}))
+        with terminal.capture():
+            result = ctx.recv_one(wrk, recv_q)
+        self.assertTrue(result)
+        ctx.close()
+        shutil.rmtree(tmpdir)
+        shutil.rmtree(build_dir)
+
+    def test_recv_one_heartbeat(self):
+        """Test recv_one skips heartbeat
then gets result""" + ctx, wrk, tmpdir = _make_ctx( + {'rpi': mock.Mock(target='rpi', arch='arm')}) + build_dir = tempfile.mkdtemp() + ctx.builder.get_build_dir.return_value = build_dir + recv_q = queue.Queue() + recv_q.put(('resp', {'resp': 'heartbeat'})) + recv_q.put(('resp', {'resp': 'build_result', 'board': 'rpi', + 'commit_upto': 0, 'return_code': 0, + 'stderr': '', 'stdout': ''})) + with terminal.capture(): + result = ctx.recv_one(wrk, recv_q) + self.assertTrue(result) + ctx.close() + shutil.rmtree(tmpdir) + shutil.rmtree(build_dir) + + def test_recv_one_other(self): + """Test recv_one returns True for unknown response""" + ctx, wrk, tmpdir = _make_ctx( + {'rpi': mock.Mock(target='rpi', arch='arm')}) + recv_q = queue.Queue() + recv_q.put(('resp', {'resp': 'something_else'})) + result = ctx.recv_one(wrk, recv_q) + self.assertTrue(result) + ctx.close() + shutil.rmtree(tmpdir) + + def test_recv_one_progress(self): + """Test recv_one handles progress then result""" + ctx, wrk, tmpdir = _make_ctx( + {'rpi': mock.Mock(target='rpi', arch='arm')}) + build_dir = tempfile.mkdtemp() + ctx.builder.get_build_dir.return_value = build_dir + recv_q = queue.Queue() + recv_q.put(('resp', {'resp': 'build_started', 'num_threads': 2})) + recv_q.put(('resp', {'resp': 'build_result', 'board': 'rpi', + 'commit_upto': 0, 'return_code': 0, + 'stderr': '', 'stdout': ''})) + with terminal.capture(): + result = ctx.recv_one(wrk, recv_q) + self.assertTrue(result) + ctx.close() + shutil.rmtree(tmpdir) + shutil.rmtree(build_dir) + + +class TestCollectResultsExtended(unittest.TestCase): + """Test collect_results with more board grabbing""" + + + def test_collect_grabs_more(self): + """Test collect_results grabs more boards when in_flight drops""" + ctx, wrk, tmpdir = _make_ctx( + {'rpi': mock.Mock(target='rpi', arch='arm')}) + wrk.max_boards = 2 + build_dir = tempfile.mkdtemp() + ctx.builder.get_build_dir.return_value = build_dir + extra_brd = mock.Mock(target='extra', arch='arm') + 
grab_calls = [0] + + def grab(_w, _n): + grab_calls[0] += 1 + if grab_calls[0] == 1: + return [extra_brd] + return [] + + state = boss.DemandState(sent=1, ncommits=1, grab_func=grab) + + recv_q = queue.Queue() + recv_q.put(('resp', {'resp': 'build_result', 'board': 'rpi', + 'commit_upto': 0, 'return_code': 0, + 'stderr': '', 'stdout': ''})) + # After rpi completes, in_flight drops to 0 < max_boards=2, + # grab returns extra_brd, then build_done ends collection + recv_q.put(('resp', {'resp': 'build_done'})) + + with terminal.capture(): + ctx.collect_results(wrk, recv_q, state) + + self.assertEqual(state.received, 1) + self.assertGreater(grab_calls[0], 0) + ctx.close() + shutil.rmtree(tmpdir) + shutil.rmtree(ctx.builder.get_build_dir.return_value) + + def test_collect_write_failure(self): + """Test collect_results stops on write_result failure""" + ctx, wrk, tmpdir = _make_ctx( + {'rpi': mock.Mock(target='rpi', arch='arm')}) + ctx.builder.get_build_dir.side_effect = RuntimeError('boom') + + state = boss.DemandState(sent=1, ncommits=1, + grab_func=lambda w, n: []) + recv_q = queue.Queue() + recv_q.put(('resp', {'resp': 'build_result', 'board': 'rpi', + 'commit_upto': 0, 'return_code': 0, + 'stderr': '', 'stdout': ''})) + with terminal.capture(): + result = ctx.collect_results(wrk, recv_q, state) + self.assertFalse(result) + ctx.close() + shutil.rmtree(tmpdir) + + +class TestRunParallelErrors(unittest.TestCase): + """Test WorkerPool._run_parallel error paths""" + + def test_worker_busy_and_boss_error(self): + """Test _run_parallel handles WorkerBusy and BossError""" + pool = boss.WorkerPool([]) + busy_wrk = mock.Mock(name='busy1') + busy_wrk.name = 'busy1' + fail_wrk = mock.Mock(name='fail1') + fail_wrk.name = 'fail1' + + calls = [] + + def func(item): + calls.append(item.name) + if item.name == 'busy1': + raise boss.WorkerBusy('too busy') + raise boss.BossError('failed') + + with terminal.capture(): + pool._run_parallel('Testing', [busy_wrk, fail_wrk], func) + 
fail_wrk.remove_lock.assert_called_once() + + +class TestCloseAll(unittest.TestCase): + """Test WorkerPool.close_all() (lines 1533-1544)""" + + def test_close_all_sends_quit(self): + """Test close_all sends quit and closes workers""" + pool = boss.WorkerPool([]) + wrk = mock.Mock() + wrk.closing = False + wrk.bytes_sent = 0 + wrk.bytes_recv = 0 + wrk.name = 'host1' + pool.workers = [wrk] + blog = mock.Mock() + pool._boss_log = blog + + with terminal.capture(): + pool.close_all() + + wrk.close.assert_called() + wrk.remove_lock.assert_called() + self.assertEqual(len(pool.workers), 0) + + +class TestCollectResultsTimeout(unittest.TestCase): + """Test collect_results recv timeout and non-result responses""" + + + @mock.patch.object(boss, 'BUILD_TIMEOUT', 0.01) + def test_collect_timeout(self): + """Test collect_results returns False on recv timeout (line 933)""" + ctx, wrk, tmpdir = _make_ctx() + state = boss.DemandState(sent=1, ncommits=1, + grab_func=lambda w, n: []) + recv_q = queue.Queue() + with terminal.capture(): + result = ctx.collect_results(wrk, recv_q, state) + self.assertFalse(result) + ctx.close() + shutil.rmtree(tmpdir) + + def test_collect_skips_unknown(self): + """Test collect_results skips non-build_result responses (line 940)""" + ctx, wrk, tmpdir = _make_ctx() + state = boss.DemandState(sent=1, ncommits=1, + grab_func=lambda w, n: []) + recv_q = queue.Queue() + recv_q.put(('resp', {'resp': 'configure_done'})) # unknown + recv_q.put(('resp', {'resp': 'build_done'})) + with terminal.capture(): + result = ctx.collect_results(wrk, recv_q, state) + self.assertTrue(result) + ctx.close() + shutil.rmtree(tmpdir) + + +class TestDispatchJobs(unittest.TestCase): + """Test WorkerPool._dispatch_jobs() (lines 1290-1309)""" + + def test_dispatch_jobs(self): + """Test _dispatch_jobs runs batch workers and closes context""" + pool = boss.WorkerPool([]) + wrk = mock.Mock(nthreads=4, closing=False, slots=2) + wrk.name = 'host1' + + brd = mock.Mock(target='rpi', 
arch='arm') + commit = mock.Mock(hash='abc123') + wjobs = [(brd, 0, commit)] + + builder = mock.Mock() + tmpdir = tempfile.mkdtemp() + builder.base_dir = tmpdir + board_selected = {'rpi': brd} + blog = mock.Mock() + pool._boss_log = blog + + # Mock build_boards to avoid actual protocol + wrk.build_boards = mock.Mock() + # recv_one will be called once — return False to end + with mock.patch.object(boss._DispatchContext, 'start_reader', + return_value=queue.Queue()), \ + mock.patch.object(boss._DispatchContext, 'recv_one', + return_value=False): + with terminal.capture(): + pool._dispatch_jobs({wrk: wjobs}, builder, + board_selected) + + self.assertIsNone(pool._boss_log) + shutil.rmtree(tmpdir) + + +class TestRunBatchWorker(unittest.TestCase): + """Test WorkerPool._run_batch_worker() (lines 1320-1358)""" + + def test_batch_worker_success(self): + """Test _run_batch_worker sends build_boards and collects""" + wrk = mock.Mock(nthreads=4, closing=False, slots=2) + wrk.name = 'host1' + brd = mock.Mock(target='rpi', arch='arm') + commit = mock.Mock(hash='abc123') + + builder = mock.Mock() + tmpdir = tempfile.mkdtemp() + builder.base_dir = tmpdir + + ctx = boss._DispatchContext( + workers=[wrk], builder=builder, + board_selected={'rpi': brd}, boss_log=mock.Mock()) + + recv_q = queue.Queue() + recv_q.put(('resp', {'resp': 'build_result', 'board': 'rpi', + 'commit_upto': 0, 'return_code': 0, + 'stderr': '', 'stdout': ''})) + build_dir = tempfile.mkdtemp() + builder.get_build_dir.return_value = build_dir + + with mock.patch.object(ctx, 'start_reader', + return_value=recv_q): + with terminal.capture(): + boss.WorkerPool._run_batch_worker( + wrk, [(brd, 0, commit)], ctx) + + wrk.build_boards.assert_called_once() + ctx.close() + shutil.rmtree(tmpdir) + shutil.rmtree(build_dir) + + def test_batch_worker_build_error(self): + """Test _run_batch_worker handles build_boards BossError""" + wrk = mock.Mock(nthreads=4, closing=False, slots=2) + wrk.name = 'host1' + 
wrk.build_boards.side_effect = boss.BossError('gone') + brd = mock.Mock(target='rpi', arch='arm') + commit = mock.Mock(hash='abc123') + + builder = mock.Mock() + tmpdir = tempfile.mkdtemp() + builder.base_dir = tmpdir + + ctx = boss._DispatchContext( + workers=[wrk], builder=builder, + board_selected={}, boss_log=mock.Mock()) + + with mock.patch.object(ctx, 'start_reader', + return_value=queue.Queue()): + with terminal.capture(): + boss.WorkerPool._run_batch_worker( + wrk, [(brd, 0, commit)], ctx) + + ctx.close() + shutil.rmtree(tmpdir) + + +class TestStartDemandWorker(unittest.TestCase): + """Test WorkerPool._start_demand_worker() (lines 1380-1400)""" + + def _make_pool_and_ctx(self): + pool = boss.WorkerPool([]) + wrk = mock.Mock(nthreads=4, closing=False, max_boards=2, + slots=2, toolchains={'arm': '/gcc'}) + wrk.name = 'host1' + builder = mock.Mock() + tmpdir = tempfile.mkdtemp() + builder.base_dir = tmpdir + ctx = boss._DispatchContext( + workers=[wrk], builder=builder, + board_selected={}, boss_log=mock.Mock()) + return pool, wrk, ctx, tmpdir + + def test_prepare_error(self): + """Test _start_demand_worker handles build_prepare BossError""" + pool, wrk, ctx, tmpdir = self._make_pool_and_ctx() + wrk.build_prepare.side_effect = boss.BossError('gone') + + with mock.patch.object(ctx, 'start_reader', + return_value=queue.Queue()): + with terminal.capture(): + recv_q, state = pool._start_demand_worker( + wrk, ctx, ['abc'], 1, [], threading.Lock()) + self.assertIsNone(recv_q) + self.assertIsNone(state) + ctx.close() + shutil.rmtree(tmpdir) + + def test_prepare_timeout(self): + """Test _start_demand_worker when wait_for_prepare fails""" + pool, wrk, ctx, tmpdir = self._make_pool_and_ctx() + + with mock.patch.object(ctx, 'start_reader', + return_value=queue.Queue()), \ + mock.patch.object(ctx, 'wait_for_prepare', + return_value=False): + with terminal.capture(): + recv_q, state = pool._start_demand_worker( + wrk, ctx, ['abc'], 1, [], threading.Lock()) + 
self.assertIsNone(recv_q) + ctx.close() + shutil.rmtree(tmpdir) + + def test_no_boards(self): + """Test _start_demand_worker when no boards available + + Also covers the BossError path in build_done (lines 1395-1396). + """ + pool, wrk, ctx, tmpdir = self._make_pool_and_ctx() + wrk.build_done.side_effect = boss.BossError('gone') + + with mock.patch.object(ctx, 'start_reader', + return_value=queue.Queue()), \ + mock.patch.object(ctx, 'wait_for_prepare', + return_value=True): + with terminal.capture(): + recv_q, state = pool._start_demand_worker( + wrk, ctx, ['abc'], 1, [], threading.Lock()) + self.assertIsNone(recv_q) + ctx.close() + shutil.rmtree(tmpdir) + + def test_send_batch_failure(self): + """Test _start_demand_worker when send_batch fails""" + pool, wrk, ctx, tmpdir = self._make_pool_and_ctx() + brd = mock.Mock(target='rpi', arch='arm') + pool_list = [brd] + + wrk.build_board.side_effect = boss.BossError('gone') + + with mock.patch.object(ctx, 'start_reader', + return_value=queue.Queue()), \ + mock.patch.object(ctx, 'wait_for_prepare', + return_value=True): + with terminal.capture(): + recv_q, state = pool._start_demand_worker( + wrk, ctx, ['abc'], 1, pool_list, + threading.Lock()) + self.assertIsNone(recv_q) + ctx.close() + shutil.rmtree(tmpdir) + + def test_success(self): + """Test _start_demand_worker success path""" + pool, wrk, ctx, tmpdir = self._make_pool_and_ctx() + brd = mock.Mock(target='rpi', arch='arm') + pool_list = [brd] + + with mock.patch.object(ctx, 'start_reader', + return_value=queue.Queue()), \ + mock.patch.object(ctx, 'wait_for_prepare', + return_value=True): + with terminal.capture(): + recv_q, state = pool._start_demand_worker( + wrk, ctx, ['abc'], 1, pool_list, + threading.Lock()) + self.assertIsNotNone(recv_q) + self.assertIsNotNone(state) + self.assertEqual(state.sent, 1) + ctx.close() + shutil.rmtree(tmpdir) + + +class TestFinishDemandWorker(unittest.TestCase): + """Test WorkerPool._finish_demand_worker() (lines 1429-1437)""" + + 
def test_build_done_error(self): + """Test _finish_demand_worker when build_done raises""" + wrk = mock.Mock(nthreads=4, closing=False, max_boards=0, + slots=2) + wrk.name = 'host1' + wrk.build_done.side_effect = boss.BossError('gone') + builder = mock.Mock() + tmpdir = tempfile.mkdtemp() + builder.base_dir = tmpdir + + ctx = boss._DispatchContext( + workers=[wrk], builder=builder, + board_selected={}, boss_log=mock.Mock()) + + state = boss.DemandState(sent=0, ncommits=1, + grab_func=lambda w, n: []) + recv_q = queue.Queue() + + with mock.patch.object(ctx, 'collect_results'): + boss.WorkerPool._finish_demand_worker( + wrk, ctx, recv_q, state) + ctx.close() + shutil.rmtree(tmpdir) + + def test_finish_waits_for_done(self): + """Test _finish_demand_worker waits for build_done response""" + wrk = mock.Mock(nthreads=4, closing=False, max_boards=0, + slots=2) + wrk.name = 'host1' + builder = mock.Mock() + tmpdir = tempfile.mkdtemp() + builder.base_dir = tmpdir + + ctx = boss._DispatchContext( + workers=[wrk], builder=builder, + board_selected={}, boss_log=mock.Mock()) + + state = boss.DemandState(sent=0, ncommits=1, + grab_func=lambda w, n: []) + recv_q = queue.Queue() + recv_q.put(('resp', {'resp': 'heartbeat'})) + recv_q.put(('resp', {'resp': 'build_done'})) + + with mock.patch.object(ctx, 'collect_results'): + boss.WorkerPool._finish_demand_worker( + wrk, ctx, recv_q, state) + ctx.close() + shutil.rmtree(tmpdir) + + @mock.patch.object(boss, 'BUILD_TIMEOUT', 0.01) + def test_finish_recv_timeout(self): + """Test _finish_demand_worker handles recv timeout""" + wrk = mock.Mock(nthreads=4, closing=False, max_boards=0, + slots=2) + wrk.name = 'host1' + builder = mock.Mock() + tmpdir = tempfile.mkdtemp() + builder.base_dir = tmpdir + + ctx = boss._DispatchContext( + workers=[wrk], builder=builder, + board_selected={}, boss_log=mock.Mock()) + + state = boss.DemandState(sent=0, ncommits=1, + grab_func=lambda w, n: []) + recv_q = queue.Queue() + + with mock.patch.object(ctx, 
'collect_results'), \ + terminal.capture(): + boss.WorkerPool._finish_demand_worker( + wrk, ctx, recv_q, state) + ctx.close() + shutil.rmtree(tmpdir) + + +class TestCloseAllSignal(unittest.TestCase): + """Test close_all quit/close path (lines 1543-1544)""" + + def test_close_all_quit_error(self): + """Test close_all handles _send BossError during quit""" + pool = boss.WorkerPool([]) + wrk = mock.Mock() + wrk.closing = False + wrk.bytes_sent = 0 + wrk.bytes_recv = 0 + wrk.name = 'host1' + wrk._send.side_effect = boss.BossError('gone') + pool.workers = [wrk] + pool._boss_log = mock.Mock() + + with terminal.capture(): + pool.close_all() + wrk.close.assert_called() + self.assertEqual(len(pool.workers), 0) + + +class TestZeroCapacity(unittest.TestCase): + """Test _get_worker_for_arch with zero total capacity (line 1183)""" + + def test_zero_nthreads(self): + """Test workers with 0 nthreads don't cause division by zero""" + w1 = mock.Mock(nthreads=0, bogomips=0, + toolchains={'arm': '/gcc'}) + pool = boss.WorkerPool.__new__(boss.WorkerPool) + pool.workers = [w1] + # Should not raise ZeroDivisionError + wrk = pool._get_worker_for_arch('arm', {}) + self.assertEqual(wrk, w1) + + +if __name__ == '__main__': + unittest.main() diff --git a/tools/buildman/test_machine.py b/tools/buildman/test_machine.py index b635d1afb6f..eb28d313e3e 100644 --- a/tools/buildman/test_machine.py +++ b/tools/buildman/test_machine.py @@ -264,7 +264,8 @@ class TestMachinePool(unittest.TestCase): 'host2\n' ) pool = machine.MachinePool() - available = pool.probe_all() + with terminal.capture(): + available = pool.probe_all() self.assertEqual(len(available), 2) self.assertEqual(pool.get_total_weight(), 14) @@ -285,7 +286,8 @@ class TestMachinePool(unittest.TestCase): 'host2\n' ) pool = machine.MachinePool() - available = pool.probe_all() + with terminal.capture(): + available = pool.probe_all() self.assertEqual(len(available), 1) self.assertEqual(available[0].hostname, 'host1') @@ -311,8 +313,10 @@ 
sandbox : /usr/bin/gcc 'host1\n' ) pool = machine.MachinePool() - pool.probe_all() - missing = pool.check_toolchains({'arm', 'sandbox'}) + with terminal.capture(): + pool.probe_all() + with terminal.capture(): + missing = pool.check_toolchains({'arm', 'sandbox'}) self.assertEqual(missing, {}) @mock.patch('buildman.machine._run_ssh') @@ -336,8 +340,10 @@ sandbox : /usr/bin/gcc 'host1\n' ) pool = machine.MachinePool() - pool.probe_all() - missing = pool.check_toolchains({'arm', 'sandbox'}) + with terminal.capture(): + pool.probe_all() + with terminal.capture(): + missing = pool.check_toolchains({'arm', 'sandbox'}) self.assertEqual(len(missing), 1) m = list(missing.keys())[0] self.assertIn('arm', missing[m]) @@ -683,7 +689,8 @@ class TestMachinePoolExtended(unittest.TestCase): 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) bsettings.add_file('[machines]\nhost1\n') pool = machine.MachinePool() - pool.probe_all() + with terminal.capture(): + pool.probe_all() # Just verify it doesn't crash with terminal.capture(): pool.print_summary() @@ -703,8 +710,10 @@ class TestMachinePoolExtended(unittest.TestCase): mock_ssh.side_effect = ssh_side_effect bsettings.add_file('[machines]\nhost1\n') pool = machine.MachinePool() - pool.probe_all() - pool.check_toolchains({'arm', 'sandbox'}) + with terminal.capture(): + pool.probe_all() + with terminal.capture(): + pool.check_toolchains({'arm', 'sandbox'}) with terminal.capture(): pool.print_summary(local_archs={'arm', 'sandbox'}) @@ -728,7 +737,8 @@ class TestMachinePoolExtended(unittest.TestCase): mock_ssh.side_effect = ssh_side_effect bsettings.add_file('[machines]\nhost1\n') pool = machine.MachinePool() - pool.probe_all() + with terminal.capture(): + pool.probe_all() local_gcc = { 'arm': f'{home}/.buildman-toolchains/gcc-13.1.0-nolibc/' @@ -751,7 +761,8 @@ class TestMachinePoolExtended(unittest.TestCase): with mock.patch.object(machine.Machine, '_probe_toolchains_from_boss', fake_probe): - missing = pool.check_toolchains({'arm'}, 
local_gcc=local_gcc) + with terminal.capture(): + missing = pool.check_toolchains({'arm'}, local_gcc=local_gcc) # arm should be flagged as missing due to version mismatch self.assertEqual(len(missing), 1) @@ -839,7 +850,8 @@ class TestPrintSummaryEdgeCases(unittest.TestCase): mock_ssh.side_effect = machine.MachineError('refused') bsettings.add_file('[machines]\nhost1\n') pool = machine.MachinePool() - pool.probe_all() + with terminal.capture(): + pool.probe_all() # Should not crash with unavailable machine with terminal.capture(): pool.print_summary() @@ -851,7 +863,8 @@ class TestPrintSummaryEdgeCases(unittest.TestCase): **MACHINE_INFO, 'load_1m': 10.0}) bsettings.add_file('[machines]\nhost1\n') pool = machine.MachinePool() - pool.probe_all() + with terminal.capture(): + pool.probe_all() with terminal.capture(): pool.print_summary() @@ -865,7 +878,8 @@ class TestPrintSummaryEdgeCases(unittest.TestCase): '[machines]\nhost1\n' '[machine:host1]\nmax_boards = 50\n') pool = machine.MachinePool() - pool.probe_all() + with terminal.capture(): + pool.probe_all() with terminal.capture(): pool.print_summary() @@ -877,7 +891,8 @@ class TestPrintSummaryEdgeCases(unittest.TestCase): 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) bsettings.add_file('[machines]\nhost1\n') pool = machine.MachinePool() - pool.probe_all() + with terminal.capture(): + pool.probe_all() pool.machines[0].tc_error = 'buildman not found' with terminal.capture(): pool.print_summary(local_archs={'arm'}) @@ -890,7 +905,8 @@ class TestPrintSummaryEdgeCases(unittest.TestCase): 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) bsettings.add_file('[machines]\nhost1\n') pool = machine.MachinePool() - pool.probe_all() + with terminal.capture(): + pool.probe_all() pool.machines[0].toolchains = {'sandbox': '/usr/bin/gcc'} local_gcc = { 'arm': os.path.expanduser( @@ -912,7 +928,8 @@ class TestCheckToolchainsEdge(unittest.TestCase): bsettings.add_file('[machines]\nhost1\n') pool = machine.MachinePool() # Machine is not 
probed, so not reachable - result = pool.check_toolchains({'arm'}) + with terminal.capture(): + result = pool.check_toolchains({'arm'}) self.assertEqual(result, {}) @mock.patch('buildman.machine._run_ssh') @@ -931,15 +948,17 @@ class TestCheckToolchainsEdge(unittest.TestCase): mock_ssh.side_effect = ssh_side_effect bsettings.add_file('[machines]\nhost1\n') pool = machine.MachinePool() - pool.probe_all() + with terminal.capture(): + pool.probe_all() local_gcc = { 'arm': f'{home}/.buildman-toolchains/gcc-13/arm/bin/gcc', } # fetch=True should trigger _fetch_all_missing with mock.patch.object(pool, '_fetch_all_missing') as mock_fetch: - pool.check_toolchains({'arm'}, fetch=True, - local_gcc=local_gcc) + with terminal.capture(): + pool.check_toolchains({'arm'}, fetch=True, + local_gcc=local_gcc) mock_fetch.assert_called_once() @@ -1011,7 +1030,8 @@ class TestPrintSummaryMissingNoVersion(unittest.TestCase): 'mem_avail_mb': 8000, 'disk_avail_mb': 20000}) bsettings.add_file('[machines]\nhost1\n') pool = machine.MachinePool() - pool.probe_all() + with terminal.capture(): + pool.probe_all() pool.machines[0].toolchains = {} # arm has a version (under ~/.buildman-toolchains), # sandbox does not -- 2.43.0
From: Simon Glass <sjg@chromium.org> Add the plumbing to connect the boss and worker modules to the build flow via control.py and cmdline.py. Add --distribute (--dist) flag to activate distributed builds and --use-machines to select a subset of configured machines (implies --dist). Add --no-local to skip local building entirely, sending all boards to remote workers. Add --kill-workers to clean up stale worker processes on remote machines. In control.py, _setup_remote_builds() probes machines, collects toolchains, resolves toolchain aliases, checks gcc versions across machines, splits boards between local and remote, and starts the worker pool. Build settings (verbose, no_lto, allow_missing, etc.) are forwarded to remote workers via the configure command. Show the machine name in the build-progress line for remote results and on verbose output when there is an error or warning. Signed-off-by: Simon Glass <sjg@chromium.org> --- tools/buildman/builder.py | 12 +- tools/buildman/builderthread.py | 2 + tools/buildman/cmdline.py | 16 ++ tools/buildman/control.py | 297 ++++++++++++++++++++++++++++++-- 4 files changed, 315 insertions(+), 12 deletions(-) diff --git a/tools/buildman/builder.py b/tools/buildman/builder.py index 3264978a616..b3cc136d036 100644 --- a/tools/buildman/builder.py +++ b/tools/buildman/builder.py @@ -374,6 +374,7 @@ class Builder: self.count = 0 self.timestamps = collections.deque() self.verbose = False + self.progress = '' # Note: baseline state for result summaries is now in ResultHandler @@ -591,6 +592,9 @@ class Builder: sys.stderr.write(result.stderr) elif self.verbose: terminal.print_clear() + machine = result.remote + if machine and (result.return_code or result.stderr): + tprint(f'[{machine}]') boards_selected = {target : result.brd} self._result_handler.reset_result_summary(boards_selected) self._result_handler.produce_result_summary( @@ -616,7 +620,13 @@ class Builder: if self._complete_delay: line += f'{self._complete_delay} : ' - line += 
target + machine = result.remote if result else None + if machine: + line += f'{target} [{machine}]' + elif self.progress: + line += f'{target} [{self.progress}]' + else: + line += f'{target} [local]' if not self._opts.ide: terminal.print_clear() tprint(line, newline=False, limit_to_line=True) diff --git a/tools/buildman/builderthread.py b/tools/buildman/builderthread.py index 6f4f257dedb..3f487ba57c7 100644 --- a/tools/buildman/builderthread.py +++ b/tools/buildman/builderthread.py @@ -434,6 +434,7 @@ class BuilderThread(threading.Thread): - result.stderr set to 'bad' if stderr output was recorded """ result = command.CommandResult() + result.remote = None done_file = self.builder.get_done_file(commit_upto, brd.target) result.already_done = os.path.exists(done_file) result.kconfig_reconfig = False @@ -728,6 +729,7 @@ class BuilderThread(threading.Thread): req, commit_upto, do_config, mrproper, config_only, out_dir, out_rel_dir, result) + result.remote = None result.toolchain = self.toolchain result.brd = req.brd result.commit_upto = commit_upto diff --git a/tools/buildman/cmdline.py b/tools/buildman/cmdline.py index 5f3c47bf7fe..5396ee640fa 100644 --- a/tools/buildman/cmdline.py +++ b/tools/buildman/cmdline.py @@ -105,6 +105,18 @@ def add_upto_m(parser): parser.add_argument( '-M', '--allow-missing', action='store_true', default=False, help='Tell binman to allow missing blobs and generate fake ones as needed') + parser.add_argument('--dist', '--distribute', action='store_true', + dest='distribute', + default=False, + help='Distribute builds to remote machines from [machines] config') + parser.add_argument('--use-machines', type=str, default=None, + dest='use_machines', + help='Comma-separated list of machine names to use for ' + 'distributed builds (default: all from [machines] config)') + parser.add_argument('--no-local', action='store_true', default=False, + dest='no_local', + help='Do not build on the local machine; send all boards to ' + 'remote workers 
(requires --dist)') parser.add_argument('--mach', '--machines', action='store_true', default=False, dest='machines', help='Probe all remote machines from [machines] config and show ' @@ -119,6 +131,10 @@ def add_upto_m(parser): parser.add_argument( '--maintainer-check', action='store_true', help='Check that maintainer entries exist for each board') + parser.add_argument('--kill-workers', action='store_true', default=False, + dest='kill_workers', + help='Kill stale worker processes and remove lock files on all ' + 'remote machines, then exit') parser.add_argument('--worker', action='store_true', default=False, help='Run in worker mode, accepting build commands on stdin ' '(used internally for distributed builds)') diff --git a/tools/buildman/control.py b/tools/buildman/control.py index 082db377293..bb866910491 100644 --- a/tools/buildman/control.py +++ b/tools/buildman/control.py @@ -10,9 +10,11 @@ This holds the main control logic for buildman, when not running tests. import getpass import multiprocessing import os +import signal import shutil import sys import tempfile +import threading import time from buildman import boards @@ -67,7 +69,8 @@ def count_build_commits(commits, step): return 0 -def get_action_summary(is_summary, commit_count, selected, threads, jobs): +def get_action_summary(is_summary, commit_count, selected, threads, jobs, + no_local=False): """Return a string summarising the intended action. 
Args: @@ -76,6 +79,7 @@ def get_action_summary(is_summary, commit_count, selected, threads, jobs): selected (list of Board): List of Board objects that are marked threads (int): Number of processor threads being used jobs (int): Number of jobs to build at once + no_local (bool): True if all builds are remote (no local threads) Returns: str: Summary string @@ -86,8 +90,9 @@ def get_action_summary(is_summary, commit_count, selected, threads, jobs): commit_str = 'current source' msg = (f"{'Summary of' if is_summary else 'Building'} " f'{commit_str} for {len(selected)} boards') - msg += (f' ({threads} thread{get_plural(threads)}, ' - f'{jobs} job{get_plural(jobs)} per thread)') + if not no_local: + msg += (f' ({threads} thread{get_plural(threads)}, ' + f'{jobs} job{get_plural(jobs)} per thread)') return msg # pylint: disable=R0913,R0917 @@ -377,7 +382,8 @@ def get_toolchains(toolchains, col, override_toolchain, fetch_arch, if no_toolchains: toolchains.get_settings() - toolchains.scan(list_tool_chains and verbose) + toolchains.scan(list_tool_chains and verbose, + raise_on_error=not list_tool_chains) if list_tool_chains: toolchains.list() print() @@ -538,12 +544,240 @@ def setup_output_dir(output_dir, work_in_output, branch, no_subdirs, col, return output_dir +def _filter_mismatched_toolchains(machines, local_toolchains): + """Remove remote toolchains whose gcc version differs from local + + Compares the gcc version directory (e.g. gcc-13.1.0-nolibc) in + each toolchain path. If a remote machine has a different version + for an architecture, that architecture is removed from the + machine's toolchain list so no boards are sent to it for that arch. 
+
+    Args:
+        machines (list of Machine): Remote machines with toolchains
+        local_toolchains (dict): arch -> gcc path on the local machine
+    """
+    local_versions = {}
+    for arch, gcc in local_toolchains.items():
+        ver = machine.gcc_version(gcc)
+        if ver:
+            local_versions[arch] = ver
+
+    for mach in machines:
+        mismatched = []
+        for arch, gcc in mach.toolchains.items():
+            local_ver = local_versions.get(arch)
+            if not local_ver:
+                continue
+            remote_ver = machine.gcc_version(gcc)
+            if remote_ver and remote_ver != local_ver:
+                mismatched.append(arch)
+        for arch in mismatched:
+            del mach.toolchains[arch]
+
+
+def _collect_worker_settings(args):
+    """Collect build settings to send to remote workers
+
+    Gathers the command-line flags that affect how make is invoked and
+    returns them as a dict for the worker's 'configure' command.
+
+    Args:
+        args (Namespace): Command-line arguments
+
+    Returns:
+        dict: Settings dict (only includes flags that are set)
+    """
+    settings = {}
+    flag_names = [
+        'verbose_build', 'allow_missing', 'no_lto',
+        'reproducible_builds', 'warnings_as_errors',
+        'mrproper', 'fallback_mrproper', 'config_only',
+        'force_build', 'kconfig_check',
+    ]
+    for name in flag_names:
+        val = getattr(args, name, None)
+        if val is not None:
+            settings[name] = val
+    return settings
+
+
+def _setup_remote_builds(board_selected, args, git_dir):
+    """Set up remote workers if machines are configured
+
+    Probes machines, checks toolchains and splits boards into local
+    and remote sets. Returns a WorkerPool for the remote boards.
+
+    Args:
+        board_selected (dict): All selected boards
+        args (Namespace): Command-line arguments
+        git_dir (str): Path to local .git directory
+
+    Returns:
+        tuple:
+            dict: Boards to build locally
+            dict: Boards to build remotely
+            WorkerPool or None: Pool of remote workers, or None
+    """
+    from buildman import boss  # pylint: disable=C0415
+
+    # Parse machine name filter from --use-machines
+    machine_names = None
+    if args.use_machines:
+        machine_names = [n.strip() for n in args.use_machines.split(',')]
+
+    no_local = args.no_local
+
+    def _fail(msg):
+        """Handle a failure to set up remote builds
+
+        With --no-local, prints the error and returns empty dicts so
+        nothing is built. Otherwise falls back to building everything
+        locally.
+        """
+        if no_local:
+            tprint(msg)
+            return {}, {}, None
+        return board_selected, {}, None
+
+    machines_config = machine.get_machines_config()
+    if not machines_config:
+        return _fail('No machines configured')
+
+    # Probe machines and their toolchains
+    pool = machine.MachinePool(names=machine_names)
+    available = pool.probe_all()
+    if not available:
+        return _fail('No machines available')
+
+    # Check which of the boss's toolchains exist on each remote
+    # machine. This makes workers use the boss's toolchain choices
+    # rather than their own .buildman config.
+    local_tc = toolchain.Toolchains()
+    local_tc.get_settings(show_warning=False)
+    local_tc.scan(verbose=False)
+    local_gcc = {arch: tc.gcc for arch, tc in local_tc.toolchains.items()}
+
+    # Resolve toolchain aliases (e.g. x86->i386) so that board
+    # architectures using alias names are recognised by split_boards()
+    machine.resolve_toolchain_aliases(local_gcc)
+
+    pool.check_toolchains(
+        set(), buildman_path=args.machines_buildman_path,
+        local_gcc=local_gcc)
+    remote_toolchains = {}
+    for mach in available:
+        remote_toolchains.update(mach.toolchains)
+
+    if not remote_toolchains:
+        return _fail('No remote toolchains available')
+
+    if no_local:
+        local = {}
+        remote = board_selected
+    else:
+        local, remote = boss.split_boards(
+            board_selected, remote_toolchains)
+
+    if not remote:
+        return board_selected, {}, None
+
+    # Collect build settings to send to workers. Resolve allow_missing
+    # using the .buildman config, since workers don't have it.
+    settings = _collect_worker_settings(args)
+    settings['allow_missing'] = get_allow_missing(
+        args.allow_missing, args.no_allow_missing,
+        len(board_selected), args.branch)
+
+    # Start workers: init git, push source, start from tree
+    worker_pool = boss.WorkerPool(available)
+    workers = worker_pool.start_all(git_dir, 'HEAD:refs/heads/work',
+                                    debug=args.debug,
+                                    settings=settings)
+    if not workers:
+        return _fail('No remote workers available')
+
+    return local, remote, worker_pool
+
+
+def _start_remote_builds(builder, commits, board_selected, args):
+    """Start remote builds in a background thread
+
+    Splits boards between local and remote machines, launches remote
+    builds in a background thread, and installs a SIGINT handler for
+    clean shutdown.
+
+    Args:
+        builder (Builder): Builder to use
+        commits (list of Commit): Commits to build, or None
+        board_selected (dict): target -> Board for all selected boards
+        args (Namespace): Command-line arguments
+
+    Returns:
+        tuple: (local_boards, remote_thread, worker_pool, extra_count,
+            old_sigint)
+    """
+    local_boards, remote_boards, worker_pool = (
+        _setup_remote_builds(board_selected, args, builder.git_dir))
+
+    extra_count = 0
+    if worker_pool and remote_boards:
+        commit_count = len(commits) if commits else 1
+        extra_count = len(remote_boards) * commit_count
+
+    remote_thread = None
+    if worker_pool and remote_boards:
+        remote_thread = threading.Thread(
+            target=worker_pool.build_boards,
+            args=(remote_boards, commits, builder,
+                  len(local_boards)))
+        remote_thread.daemon = True
+        remote_thread.start()
+
+    # Install a SIGINT handler that cleanly shuts down workers.
+    # This is more reliable than try/except KeyboardInterrupt since
+    # SIGINT may terminate the process before the exception handler
+    # runs.
+    old_sigint = None
+    if worker_pool:
+        def _sigint_handler(_signum, _frame):
+            worker_pool.close_all()
+            signal.signal(signal.SIGINT, old_sigint or signal.SIG_DFL)
+            os.kill(os.getpid(), signal.SIGINT)
+        old_sigint = signal.signal(signal.SIGINT, _sigint_handler)
+
+    return local_boards, remote_thread, worker_pool, extra_count, old_sigint
+
+
+def _finish_remote_builds(remote_thread, worker_pool, old_sigint, builder):
+    """Wait for remote builds to finish and clean up
+
+    Args:
+        remote_thread (Thread or None): Background remote build thread
+        worker_pool (WorkerPool or None): Worker pool to shut down
+        old_sigint: Previous SIGINT handler to restore
+        builder (Builder): Builder for printing the summary
+    """
+    if remote_thread:
+        try:
+            while remote_thread.is_alive():
+                remote_thread.join(timeout=0.5)
+        except KeyboardInterrupt:
+            worker_pool.close_all()
+            raise
+        worker_pool.quit_all()
+        builder.print_summary()
+
+    if worker_pool and old_sigint is not None:
+        signal.signal(signal.SIGINT, old_sigint)
+
+
 def run_builder(builder, commits, board_selected, display_options, args):
     """Run the builder or show the summary
 
     Args:
         builder (Builder): Builder to use
-        commits (list of Commit): List of commits being built, None if no branch
+        commits (list of Commit): List of commits being built, None if
+            no branch
         board_selected (dict): Dict of selected boards:
             key: target name
             value: Board object
@@ -562,8 +796,9 @@ def run_builder(builder, commits, board_selected, display_options, args):
     if not args.ide:
         commit_count = count_build_commits(commits, args.step)
-        tprint(get_action_summary(args.summary, commit_count, board_selected,
-                                  args.threads, args.jobs))
+        tprint(get_action_summary(args.summary, commit_count,
+                                  board_selected, args.threads,
+                                  args.jobs, no_local=args.no_local))
 
     builder.set_display_options(
         display_options, args.filter_dtb_warnings,
@@ -573,9 +808,31 @@
         builder.result_handler.show_summary(
             commits, board_selected, args.step)
     else:
-        fail, warned, excs = builder.build_boards(
-            commits, board_selected, args.keep_outputs, args.verbose,
-            args.fragments)
+        local_boards = board_selected
+        remote_thread = None
+        worker_pool = None
+        extra_count = 0
+        old_sigint = None
+
+        if args.distribute:
+            (local_boards, remote_thread, worker_pool,
+             extra_count, old_sigint) = _start_remote_builds(
+                builder, commits, board_selected, args)
+
+        try:
+            fail, warned, excs = builder.build_boards(
+                commits, local_boards, args.keep_outputs,
+                args.verbose, args.fragments,
+                extra_count=extra_count,
+                delay_summary=bool(remote_thread))
+        except KeyboardInterrupt:
+            if worker_pool:
+                worker_pool.close_all()
+            raise
+
+        _finish_remote_builds(remote_thread, worker_pool,
+                              old_sigint, builder)
+
         if args.build_summary:
             builder.commits = commits
             builder.result_handler.show_summary(
@@ -762,7 +1019,17 @@ def do_buildman(args, toolchains=None, make_func=None, brds=None,
     # Handle --worker: run in worker mode for distributed builds
     if args.worker:
         from buildman import worker  # pylint: disable=C0415
-        return worker.do_worker()
+        return worker.do_worker(args.debug)
+
+    # Handle --kill-workers: kill stale workers and exit
+    if args.kill_workers:
+        from buildman import boss  # pylint: disable=C0415
+
+        machines_config = machine.get_machines_config()
+        if not machines_config:
+            print('No machines configured')
+            return 1
+        return boss.kill_workers(machines_config)
 
     # Handle --machines: probe remote machines and show status
     if args.machines or args.machines_fetch_arch:
@@ -770,6 +1037,14 @@ def do_buildman(args, toolchains=None, make_func=None, brds=None,
             col, fetch=args.machines_fetch_arch,
             buildman_path=args.machines_buildman_path)
 
+    # --use-machines implies --dist
+    if args.use_machines:
+        args.distribute = True
+
+    if args.no_local and not args.distribute:
+        print('--no-local requires --dist')
+        return 1
+
     git_dir = os.path.join(args.git, '.git')
     toolchains = get_toolchains(toolchains, col, args.override_toolchain,
-- 
2.43.0
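The SIGINT strategy in the patch above (close workers from the signal handler, restore the previous handler, then re-send the signal) can be shown in isolation. This is an editorial sketch, not buildman code: `WorkerPool` here is a stand-in class that only records the shutdown, and `install_sigint_handler` is a hypothetical helper name.

```python
import os
import signal


class WorkerPool:
    """Stand-in for a real worker pool: just records that it was closed"""
    def __init__(self):
        self.closed = False

    def close_all(self):
        """Pretend to shut down all remote workers"""
        self.closed = True


def install_sigint_handler(pool):
    """Install a SIGINT handler that closes workers before re-raising

    On Ctrl-C the handler shuts the pool down, restores the previous
    handler and re-sends SIGINT to the process, so the normal interrupt
    behaviour (e.g. KeyboardInterrupt) still takes effect afterwards.

    Returns the previous handler so the caller can restore it later.
    """
    old = signal.getsignal(signal.SIGINT)

    def _handler(_signum, _frame):
        pool.close_all()
        signal.signal(signal.SIGINT, old or signal.SIG_DFL)
        os.kill(os.getpid(), signal.SIGINT)

    signal.signal(signal.SIGINT, _handler)
    return old
```

Re-sending the signal, rather than raising KeyboardInterrupt directly from the handler, delivers the interrupt through whichever handler was installed before, so any outer cleanup still runs.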
From: Simon Glass <sjg@chromium.org>

Add a 'Distributed builds' section covering machine configuration,
probing, the --dist, --use-machines and --no-local flags, per-machine
max_boards config, which build flags are forwarded to workers, debug
mode and the protocol overview.

Fix test_full_help to compare UTF-8 byte lengths, since the protocol
overview uses → (U+2192) arrows which are multi-byte in UTF-8.

Signed-off-by: Simon Glass <sjg@chromium.org>
---
 tools/buildman/buildman.rst | 132 ++++++++++++++++++++++++++++++++++++
 tools/buildman/func_test.py |   3 +-
 2 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/tools/buildman/buildman.rst b/tools/buildman/buildman.rst
index b64b52e9353..2bc0265a61d 100644
--- a/tools/buildman/buildman.rst
+++ b/tools/buildman/buildman.rst
@@ -1517,6 +1517,138 @@ output directory. It also writes the commands used to build U-Boot in an
 `out-cmd` file. You can check these if you suspect something strange is
 happening.
 
+Distributed builds
+------------------
+
+Buildman can distribute builds across multiple machines over SSH, so that
+builds which need cross-compilers not available locally can be offloaded to
+machines that have them.
+
+To use this, add a ``[machines]`` section to your ``~/.buildman`` config file
+listing the remote machines::
+
+    [machines]
+    ohau
+    moa
+    myserver = user@build1.example.com
+
+Each entry is either a bare hostname (used as both the name and SSH target) or
+a ``name = hostname`` pair. The machines must be accessible via SSH without a
+password (use ``ssh-agent`` or key-based authentication).
+
+Per-machine settings can be added in ``[machine:<name>]`` sections::
+
+    [machine:ruru]
+    max_boards = 64
+
+The ``max_boards`` setting limits how many boards a machine builds
+concurrently. By default, a machine builds one board per CPU thread, but
+machines with very high thread counts (e.g. 256) can suffer from resource
+contention when every thread runs a separate build. Capping concurrent
+boards means each board gets a higher ``make -j`` value (e.g. 256 threads /
+64 boards = ``-j4``), reducing contention and improving throughput.
+
+A lower ``max_boards`` also reduces the build "tail": when the shared pool
+of boards is nearly empty, a capped machine has fewer in-flight boards and
+finishes sooner, while the remaining pool boards flow to other machines.
+
+You can check which machines are reachable and what toolchains they have::
+
+    buildman --mach
+
+This probes each machine over SSH and reports its architecture, thread count,
+load average, available memory and disk space. Machines that are too busy (load
+average exceeds 80% of CPU count), low on disk (<1 GB) or low on memory
+(<512 MB) are marked as unavailable and will not receive builds. Re-run
+``--mach`` after the load drops to see them become available again.
+
+To fetch missing toolchains on the remote machines::
+
+    buildman --mach --machines-fetch-arch
+
+Once machines are configured, use ``--dist`` to distribute builds::
+
+    buildman --dist arm
+
+This probes the configured machines, checks which toolchains each has, and
+splits the selected boards between local and remote workers based on
+architecture support. Each remote worker builds its assigned boards in parallel,
+using all available CPU threads.
+
+Use ``--use-machines`` to select specific machines::
+
+    buildman --use-machines ohau,moa arm
+
+This implies ``--dist``, so there is no need to pass both.
+
+Use ``--no-local`` to skip local builds entirely and send everything to the
+remote machines::
+
+    buildman --dist --no-local arm
+
+Most build flags are forwarded to remote workers automatically. For example
+``-f`` (force build), ``-L`` (no LTO), ``-M`` (allow missing blobs),
+``-V`` (verbose build), ``-E`` (warnings as errors), ``-m`` (mrproper),
+``--fallback-mrproper``, ``--config-only`` and ``-r`` (reproducible builds)
+all take effect on remote workers just as they do locally. The ``-f`` flag
+also cleans stale output on the workers so that builds start from scratch.
+
+Use ``-D`` to enable debug output from the workers, which shows each build
+as it starts and finishes::
+
+    buildman --dist -D arm
+
+The ``--machines-buildman-path`` option allows specifying a custom path to
+buildman on the remote machines, if it is not in the default ``PATH``::
+
+    buildman --dist --machines-buildman-path /opt/tools/buildman arm
+
+The distributed build protocol uses JSON messages over SSH stdin/stdout.
+The boss pushes the local source tree to each worker via ``git push``, so
+workers always build with the same code as the boss. Build results (return
+codes, stdout, stderr, sizes) are streamed back and written into the same
+output directory structure as local builds.
+
+Per-worker log files are written to the output directory as
+``worker-<hostname>.log`` for debugging protocol issues.
+
+Protocol overview
+~~~~~~~~~~~~~~~~~
+
+The boss starts each worker via ``ssh host buildman --worker``. The worker
+reads JSON commands from stdin and writes ``BM>``-prefixed JSON responses
+to stdout. Stderr is forwarded to the boss for diagnostics.
+
+A typical session looks like::
+
+    boss → worker: {"cmd": "setup", "work_dir": "~/dev/.bm-worker"}
+    worker → boss: BM> {"resp": "setup_done", "work_dir": "...", "git_dir": "..."}
+
+    boss: git push host:~/.bm-worker/.git HEAD:refs/heads/work
+
+    boss → worker: {"cmd": "configure", "settings": {"no_lto": true, ...}}
+    worker → boss: BM> {"resp": "configure_done"}
+
+    boss → worker: {"cmd": "build_prepare", "commits": ["abc123", ...]}
+    worker → boss: BM> {"resp": "build_started", "num_threads": 8}
+    worker → boss: BM> {"resp": "worktree_created", "thread": 0}
+    ...
+    worker → boss: BM> {"resp": "build_prepare_done"}
+
+    boss → worker: {"cmd": "build_board", "board": "sandbox", "arch": "sandbox"}
+    worker → boss: BM> {"resp": "build_result", "board": "sandbox", ...}
+
+    boss → worker: {"cmd": "build_done"}
+    worker → boss: BM> {"resp": "build_done", "exceptions": 0}
+
+    boss → worker: {"cmd": "quit"}
+    worker → boss: BM> {"resp": "quit_ack"}
+
+The boss uses demand-driven dispatch: it sends an initial batch of boards to
+each worker, then sends more as results come back. This naturally balances load
+across workers with different build speeds.
+
+
 TODO
 ----
 
diff --git a/tools/buildman/func_test.py b/tools/buildman/func_test.py
index aa206cf75df..90c4974960e 100644
--- a/tools/buildman/func_test.py
+++ b/tools/buildman/func_test.py
@@ -298,7 +298,8 @@ class TestFunctional(unittest.TestCase):
         # Remove possible extraneous strings
         extra = '::::::::::::::\n' + help_file + '\n::::::::::::::\n'
         gothelp = result.stdout.replace(extra, '')
-        self.assertEqual(len(gothelp), os.path.getsize(help_file))
+        self.assertEqual(len(gothelp.encode('utf-8')),
+                         os.path.getsize(help_file))
         self.assertEqual(0, len(result.stderr))
         self.assertEqual(0, result.return_code)
-- 
2.43.0
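The command/response framing documented above (one JSON command per stdin line, one ``BM>``-prefixed JSON reply per response) can be modelled in a few lines. This is an illustrative sketch only: ``handle_command`` and ``serve`` are hypothetical names, and only the ``configure``/``quit`` handshake is modelled, not buildman's real worker, which also runs builds and streams progress messages.

```python
import json
import sys

PREFIX = 'BM> '


def handle_command(cmd):
    """Return the response dict for a single protocol command

    Only the handshake messages from the overview are modelled here.
    """
    name = cmd.get('cmd')
    if name == 'configure':
        return {'resp': 'configure_done'}
    if name == 'quit':
        return {'resp': 'quit_ack'}
    return {'resp': 'error', 'message': f"unknown command '{name}'"}


def serve(infile=sys.stdin, outfile=sys.stdout):
    """Read one JSON command per line, write a BM>-prefixed JSON reply"""
    for line in infile:
        line = line.strip()
        if not line:
            continue
        cmd = json.loads(line)
        resp = handle_command(cmd)
        outfile.write(PREFIX + json.dumps(resp) + '\n')
        outfile.flush()
        if cmd.get('cmd') == 'quit':
            break
```

The fixed ``BM> `` prefix lets the boss separate protocol responses from any stray output a build might write to the worker's stdout.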