
Delphi 10 Seattle, multi-threaded/core performance

Jim Eckels, published 2017-11-14 19:53:23Z

I have an application that is 100% Delphi code. It is a 64-bit Windows console application with a workload manager and a fixed number of workers. Each worker is a thread. The threads do not die; each one pulls work from its own queue, which the workload manager populates.

This appears to work just fine.

What I am finding, however, is that on a 16-core system processing takes around 90 minutes (there are 2,000,000+ workloads, and each one does database work). When I went from 16 to 32 cores, performance dropped! There is no database contention. Essentially, the DB is waiting for things to do.

Each thread has its own DB connection. Each thread's queries use only that thread's connection.

I switched the Delphi memory manager to ScaleMM2, which made a big improvement, but I am still at a loss as to why increasing the number of cores reduces performance.

With 256 threads on 32 cores, total CPU use is 80%. With 256 threads on 16 cores, total CPU use is 100% (which is why I wanted to add cores), and it got slower :-(

I have applied as much of the advice as I can understand to the code base.

i.e., avoiding functions that return strings, using const for arguments, and protecting "shared" data with small critical sections (actually a multi-read exclusive-write synchronizer). I currently do not set processor affinity; I read conflicting advice about it, so I have left it out (it would be trivial to add, it's just not there today).

My questions are slanted towards what I "think" the issue is: thread contention.

How do I confirm that thread contention is the issue? Are there tools available specifically for identifying this type of contention? How can I determine what is using the heap and what is not, to further reduce contention there?
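One low-tech way to answer the first question without a profiler is to instrument the locks themselves. The sketch below assumes a hypothetical `TCountingLock` wrapper (not part of the RTL) around `TCriticalSection` from `System.SyncObjs`: it first calls `TryEnter`, and only when that fails does it time the blocking `Enter` with `TStopwatch`. A high `Contended`/`Acquires` ratio or a large total wait points at that lock; a low one suggests the lock is not the bottleneck. The four-thread loop in the main block is only a stand-in for real workers.

```delphi
program LockContentionDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs, System.Diagnostics;

type
  // Hypothetical helper: a TCriticalSection that records how often Enter
  // had to wait for another thread, and for how long.
  TCountingLock = class
  private
    FLock: TCriticalSection;
    FAcquires: Int64;   // total acquisitions
    FContended: Int64;  // acquisitions that had to block
    FWaitTicks: Int64;  // TStopwatch ticks spent blocked
  public
    constructor Create;
    destructor Destroy; override;
    procedure Enter;
    procedure Leave;
    property Acquires: Int64 read FAcquires;
    property Contended: Int64 read FContended;
    property WaitTicks: Int64 read FWaitTicks;
  end;

constructor TCountingLock.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
end;

destructor TCountingLock.Destroy;
begin
  FLock.Free;
  inherited;
end;

procedure TCountingLock.Enter;
var
  SW: TStopwatch;
begin
  if not FLock.TryEnter then
  begin
    // Lock is held elsewhere: measure how long the blocking Enter takes.
    SW := TStopwatch.StartNew;
    FLock.Enter;
    // Counters are only touched while the lock is held, so plain Inc is safe.
    Inc(FContended);
    Inc(FWaitTicks, SW.ElapsedTicks);
  end;
  Inc(FAcquires);
end;

procedure TCountingLock.Leave;
begin
  FLock.Leave;
end;

var
  Lock: TCountingLock;
  Threads: array[0..3] of TThread;
  I: Integer;
begin
  Lock := TCountingLock.Create;
  try
    for I := Low(Threads) to High(Threads) do
    begin
      Threads[I] := TThread.CreateAnonymousThread(
        procedure
        var
          J: Integer;
        begin
          for J := 1 to 100000 do
          begin
            Lock.Enter;
            Lock.Leave;
          end;
        end);
      Threads[I].FreeOnTerminate := False; // so we can WaitFor and Free it
      Threads[I].Start;
    end;
    for I := Low(Threads) to High(Threads) do
    begin
      Threads[I].WaitFor;
      Threads[I].Free;
    end;
    Writeln(Format('acquires=%d contended=%d waited=%.1f ms',
      [Lock.Acquires, Lock.Contended,
       Lock.WaitTicks * 1000 / TStopwatch.Frequency]));
  finally
    Lock.Free;
  end;
end.
```

The same idea can be adapted to other lock types by timing every acquire call. Per-lock numbers like these tell you which critical section to shrink or split, which a whole-process "locks" verdict from a profiler does not.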

Insights, guidance, pointers would be appreciated.

I can provide the relevant code areas ... if I knew what was relevant.

Procedure TXETaskWorkloadExecuterThread.Enqueue(Const Workload: TXETaskWorkload);
Begin
  // protect your own queue
  FWorkloadQueue.Enter;
  FWorkloads.Add(Workload);
  FWorkloadQueue.Leave;
End;

Procedure TXETaskManager.Enqueue(Const Workload: TXETaskWorkload);
Begin
  If FWorkloadCount >= FMaxQueueSize Then Begin
    WaitForEmptyQueue;
    FWorkloadCount := 0;
  End;

  FExecuters[FNextThread].Enqueue(Workload);
  // round-robin the queue
  Inc(FNextThread);
  Inc(FWorkloadCount);
  If FNextThread >= FWorkerThreads Then Begin
    FNextThread := 0;
  End;
End;


Function TXETaskWorkloadExecuterThread.Dequeue(Var Workload: TXETaskWorkload): Boolean;
Begin
  Workload := Nil;
  Result := False;

  FWorkloadQueue.Enter;
  Try
    If FNextWorkload < FWorkloads.Count Then Begin
      Workload := FWorkloads[FNextWorkload];
      Inc(FNextWorkload);
      If Workload Is TXETaskWorkLoadSynchronize Then Begin
        FreeAndNil(Workload);
        Exit;
      End;
      Result := True;
    End Else Begin
      FWorkloads.Clear;
      FNextWorkload := 0;
      FHaveWorkloadInQueue.ResetEvent;
      FEmptyAndFinishedQueue.SetEvent;
    End;
  Finally
    FWorkloadQueue.Leave;
  End;
End;

EDIT ---

Thanks for all the comments. Clarifications.

This system/VM has nothing else on it. The executable in question is the only thing using the CPU. Single-threaded, the work is processed linearly, one item after another; I have simply turned it into divide-and-conquer. If I have 5,000,000 cars to park and 30 drivers with 30 different parking lots, telling each driver to wait for the other drivers to finish parking will be slower than telling all 30 drivers to park cars concurrently.
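The parking analogy assumes the drivers never get in each other's way. One standard model that adds that interference is Neil Gunther's Universal Scalability Law, offered here as a lens rather than a diagnosis. C(N) is relative throughput on N cores, α is the fraction of work that is serialized (for example, waiting on a shared lock) and β is the cross-core coherency cost (for example, cache lines holding a lock word or a string reference count bouncing between cores):

```latex
C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)},
\qquad
N^{*} = \sqrt{\frac{1 - \alpha}{\beta}} \quad (\beta > 0)
```

With β = 0 this reduces to Amdahl's law, N / (1 + α(N − 1)), and throughput never falls as N grows. With any β > 0 throughput peaks at N* and then declines. With the purely illustrative values α = 0.05 and β = 0.002, C(16) = 16 / 2.23 ≈ 7.2 and C(32) = 32 / 4.534 ≈ 7.1, so doubling the cores makes the job slightly slower, which is the shape of the symptom described above.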

Profiling single-threaded shows nothing that explains this. I have seen mention on this board of Delphi multi-core performance "gotchas" (mostly related to string handling and LOCK).

The DB is essentially saying that it is bored and waiting for things to do. I have checked with a copy of Intel's VTune. Generally speaking, it says ... locks. But I cannot find out where. What I have seems pretty simple to me, and the current lock areas are necessary and small. What I cannot see are locks that might be happening due to other things, like strings taking a lock, or thread 1 causing some issue in the main process by accessing that data (even though it is protected by a critical section).
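On the "strings creating a lock" point: heap-allocated Delphi strings are reference counted, and the count is changed with atomic (LOCK-prefixed) operations on the string's header. That is not a lock you can see in your code, but when many threads copy the same string, they all hammer the same cache line. The minimal sketch below uses the RTL's `StringRefCount` and `UniqueString` to show the count changing; the variable names `Shared` and `Local` are hypothetical stand-ins for data read by workers.

```delphi
program StringRefCountDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Shared: string; // hypothetical: a string many worker threads read
  Local: string;  // hypothetical: one worker's copy of it

begin
  // Built at run time, so it is a heap string with a real reference count
  // (string literals have a count of -1 and never trigger atomic updates).
  Shared := 'workload-' + IntToStr(42);
  Writeln('after build:        ', StringRefCount(Shared));

  // Plain assignment shares the buffer: the count is bumped with an atomic
  // increment on the string header, the same cache line for every thread
  // that copies this string.
  Local := Shared;
  Writeln('after assignment:   ', StringRefCount(Shared));

  // UniqueString gives Local a private buffer, so later copies made from
  // Local no longer touch the shared header.
  UniqueString(Local);
  Writeln('after UniqueString: ', StringRefCount(Shared));
end.
```

In a worker, taking a private copy once (at thread start or when a workload is dequeued) rather than repeatedly copying a shared string is one way to keep this traffic off a shared cache line; whether it matters for this application would need to be measured.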

Continuing to research. Thanks again for the feedback/ideas.

Remy Lebeau, replied 2017-11-15 00:10:39Z

Your workload manager is deciding which thread gets which work item. If a given thread blocks (say the work is long, DB latency, etc.), you are queuing more items to that thread even though they might not get processed for a while, if at all.

Typically, work items should be stored in a single shared queue that multiple threads then pull from. When any given thread is ready, it pulls the next available work item. For example:

constructor TXETaskManager.Create;
var
  I: Integer;
begin
  FWorkloadQueue := TCriticalSection.Create;
  FWorkloads := TList<TXETaskWorkload>.Create;
  FEmptyQueue := TEvent.Create(nil, True, True, '');
  FHaveWorkloadInQueue := TEvent.Create(nil, True, False, '');
  FNotFullQueue := TEvent.Create(nil, True, True, '');
  FTermEvent := TEvent.Create(nil, True, False, '');
  ...
  FMaxQueueSize := ...;
  FWorkerThreads := ...;
  SetLength(FExecuters, FWorkerThreads); // assuming FExecuters is a dynamic array
  for I := 0 to FWorkerThreads-1 do
    FExecuters[I] := TXETaskWorkloadExecuterThread.Create(Self);
end;

destructor TXETaskManager.Destroy;
var
  I: Integer;
begin
  for I := 0 to FWorkerThreads-1 do
    FExecuters[I].Terminate;
  FTermEvent.SetEvent;
  for I := 0 to FWorkerThreads-1 do
  begin
    FExecuters[I].WaitFor;
    FExecuters[I].Free;
  end;
  FWorkloadQueue.Free;
  FWorkloads.Free;
  FEmptyQueue.Free;
  FHaveWorkloadInQueue.Free;
  FNotFullQueue.Free;
  FTermEvent.Free;
  ...

  inherited;
end;

procedure TXETaskManager.Enqueue(Const Workload: TXETaskWorkload);
begin
  FWorkloadQueue.Enter;
  try
    while FWorkloads.Count >= FMaxQueueSize do
    begin
      FWorkloadQueue.Leave;
      FNotFullQueue.WaitFor(INFINITE);
      FWorkloadQueue.Enter;
    end;

    FWorkloads.Add(Workload);

    if FWorkloads.Count = 1 then
    begin
      FEmptyQueue.ResetEvent;
      FHaveWorkloadInQueue.SetEvent;
    end;

    if FWorkloads.Count >= FMaxQueueSize then
      FNotFullQueue.ResetEvent;
  finally
    FWorkloadQueue.Leave;
  end;
end;

function TXETaskManager.Dequeue(var Workload: TXETaskWorkload): Boolean;
begin
  Result := False;
  Workload := nil;

  FWorkloadQueue.Enter;
  try
    if FWorkloads.Count > 0 then
    begin
      Workload := FWorkloads[0];
      FWorkloads.Delete(0);
      Result := True;

      if FWorkloads.Count = (FMaxQueueSize-1) then
        FNotFullQueue.SetEvent;

      if FWorkloads.Count = 0 then
      begin
        FHaveWorkloadInQueue.ResetEvent;
        FEmptyQueue.SetEvent;
      end;
    end;
  finally
    FWorkloadQueue.Leave;
  end;
end;

constructor TXETaskWorkloadExecuterThread.Create(ATaskManager: TXETaskManager);
begin
  inherited Create(False);
  FTaskManager := ATaskManager;
end;

procedure TXETaskWorkloadExecuterThread.Execute;
var
  Arr: THandleObjectArray;
  Event: THandleObject;
  Workload: TXETaskWorkload;
begin
  SetLength(Arr, 2);
  Arr[0] := FTaskManager.FHaveWorkloadInQueue;
  Arr[1] := FTaskManager.FTermEvent;

  while not Terminated do
  begin
    case TEvent.WaitForMultiple(Arr, INFINITE, False, Event) of
      wrSignaled:
      begin
        if Event = FTaskManager.FHaveWorkloadInQueue then
        begin
          if FTaskManager.Dequeue(Workload) then
          try
            // process Workload as needed...
          finally
            Workload.Free;
          end;
        end;
      end;
      wrError: begin
        RaiseLastOSError;
      end;
    end;
  end;
end; 

If you find threads are not getting enough work, you can adjust your thread count as needed. You typically shouldn't be using very many more threads than you have CPU cores available.
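As a starting point for "not very many more threads than cores", the worker count can be derived from the machine instead of being fixed at 256. The sketch below uses a common rule of thumb for threads that block on I/O, workers ≈ cores × (1 + wait time / compute time); `SuggestedWorkerCount` and its `WaitToComputeRatio` parameter are hypothetical, and the ratio has to be measured for your own workloads.

```delphi
program WorkerCountDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Math;

// Hypothetical helper. Rule of thumb for threads that block on I/O:
//   workers ~= cores * (1 + wait time / compute time)
// WaitToComputeRatio is an assumption you must measure, e.g. time spent
// waiting on the DB divided by time spent computing, per workload.
function SuggestedWorkerCount(WaitToComputeRatio: Double; MaxWorkers: Integer): Integer;
begin
  Result := Round(TThread.ProcessorCount * (1 + WaitToComputeRatio));
  Result := EnsureRange(Result, 1, MaxWorkers); // clamp to [1, MaxWorkers]
end;

begin
  Writeln('Logical processors: ', TThread.ProcessorCount);
  // e.g. a workload that waits on the DB about as long as it computes
  Writeln('Suggested workers:  ', SuggestedWorkerCount(1.0, 256));
end.
```

Starting from a computed value and then trying a few counts above and below it gives a quick way to see where throughput peaks on a given box.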
