Discussion:
SCPClient slower than scp
Roman Yakovenko
2010-02-01 07:52:42 UTC
Hello.

I am using the SCPClient class from this branch
(http://bazaar.launchpad.net/~jbardin/paramiko/paramiko_scp/annotate/500?file_id=scp.py-20081117202350-5q0ozjv6zz9ww66y-1)
with paramiko 1.7.6 and Python 2.6 on Ubuntu "Karmic Koala".

I am testing my code with a 1 GB file.

The SCPClient upload rate starts at 10 MB/s and then drops to 5.2
MB/s. The average is 5.2 MB/s. I tried changing the buffer size, but this
didn't help.
The scp command upload rate starts at 20 MB/s and then drops to 10
MB/s. The average is 10 MB/s.
To complete the statistics, the average rate of paramiko's built-in
SFTPClient is 2.2 MB/s. I use the "put" method as is, with the default
configuration.
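
To compare the three transfer methods on equal footing, the average rate can be computed from a progress callback. What follows is a minimal sketch, assuming the callback is invoked with (bytes_transferred, total_bytes), which is the form SFTPClient.put accepts through its callback argument in paramiko versions that support it. RateMeter and its injectable clock are hypothetical names used only for illustration.

```python
import time


class RateMeter(object):
    """Progress callback that tracks the average transfer rate."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so the meter can be tested
        self._start = clock()    # time at which measurement began
        self._last = self._start
        self.transferred = 0
        self.total = 0

    def __call__(self, transferred, total):
        # Called by the transfer code with cumulative byte counts.
        self.transferred = transferred
        self.total = total
        self._last = self._clock()

    def average_mb_per_s(self):
        # Average over the whole transfer, not the instantaneous rate.
        elapsed = self._last - self._start
        if elapsed <= 0:
            return 0.0
        return self.transferred / float(1024 * 1024) / elapsed
```

With an already connected SFTPClient (called sftp here as an assumption), usage would look like meter = RateMeter(); sftp.put(local_path, remote_path, callback=meter); print(meter.average_mb_per_s()).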

I am not sure where to start to solve the problem. Initially, I
suspected that "local file reading" was the problem, but that
functionality works pretty well.
Right now, I am using a workaround (executing scp with subprocess),
but it is a less than optimal solution.
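
For reference, the subprocess workaround described above could look roughly like the sketch below. The function names, the default port, and the choice of the -q (quiet) and -B (batch mode, no password prompts) flags are assumptions for illustration; the original post does not show the actual command used.

```python
import subprocess


def build_scp_command(local_path, host, remote_path, user=None, port=22):
    """Return the argv list for an scp upload (nothing is executed here)."""
    target = "%s:%s" % (host, remote_path)
    if user:
        target = "%s@%s" % (user, target)
    # -q: no progress meter, -B: batch mode (never prompt), -P: remote port
    return ["scp", "-q", "-B", "-P", str(port), local_path, target]


def scp_upload(local_path, host, remote_path, user=None, port=22):
    """Run scp; raises subprocess.CalledProcessError on a non-zero exit."""
    cmd = build_scp_command(local_path, host, remote_path, user, port)
    subprocess.check_call(cmd)
```

Passing the arguments as a list avoids going through a shell, so paths containing spaces need no extra quoting on the local side.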

Any help is appreciated.

Thank you
--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
james bardin
2010-02-01 15:36:22 UTC
On Mon, Feb 1, 2010 at 2:52 AM, Roman Yakovenko
Post by Roman Yakovenko
Hello.
I am using the SCPClient class from this branch
(http://bazaar.launchpad.net/~jbardin/paramiko/paramiko_scp/annotate/500?file_id=scp.py-20081117202350-5q0ozjv6zz9ww66y-1)
with paramiko 1.7.6 and Python 2.6 on Ubuntu "Karmic Koala".
I am testing my code with a 1 GB file.
The SCPClient upload rate starts at 10 MB/s and then drops to 5.2
MB/s. The average is 5.2 MB/s. I tried changing the buffer size, but this
didn't help.
The scp command upload rate starts at 20 MB/s and then drops to 10
MB/s. The average is 10 MB/s.
To complete the statistics, the average rate of paramiko's built-in
SFTPClient is 2.2 MB/s. I use the "put" method as is, with the default
configuration.
I am not sure where to start to solve the problem. Initially, I
suspected that "local file reading" was the problem, but that
functionality works pretty well.
You would normally start by using a profiler to see where the
performance bottleneck is, before you start speculating. You would
have seen that most of the time is spent in paramiko.Transport
manipulating data, and waiting for pyCrypto. SCPClient adds almost
nothing to the overall time.
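
As a rough illustration of the profiling step suggested here, the sketch below wraps any callable in cProfile and returns a report sorted by cumulative time. It is written for Python 3; profile_call and cpu_workload are hypothetical names, and cpu_workload is only a stand-in for the real upload call.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(15)  # top 15 entries only
    return result, out.getvalue()


def cpu_workload(n):
    """Stand-in for the transfer: a small CPU-bound loop."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(cpu_workload, 100000)
    print(report)
```

In practice one would pass whatever function performs the upload in place of cpu_workload and look at which paramiko or crypto functions dominate the cumulative column.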
Post by Roman Yakovenko
Right now, I am using a workaround (executing scp with subprocess),
but it is a less than optimal solution.
Any help is appreciated.
Yes, the solution written entirely in c will be significantly faster.
Since this is mostly python, cpu is the limiting factor. There may be
some places where optimizations could be made in paramiko and
pyCrypto, but I haven't looked into it myself.

-jim
Roman Yakovenko
2010-02-01 16:49:09 UTC
Post by james bardin
On Mon, Feb 1, 2010 at 2:52 AM, Roman Yakovenko
Post by Roman Yakovenko
I am testing my code with a 1 GB file.
The SCPClient upload rate starts at 10 MB/s and then drops to 5.2
MB/s. The average is 5.2 MB/s. I tried changing the buffer size, but this
didn't help.
The scp command upload rate starts at 20 MB/s and then drops to 10
MB/s. The average is 10 MB/s.
To complete the statistics, the average rate of paramiko's built-in
SFTPClient is 2.2 MB/s. I use the "put" method as is, with the default
configuration.
I am not sure where to start to solve the problem. Initially, I
suspected that "local file reading" was the problem, but that
functionality works pretty well.
You would normally start by using a profiler to see where the
performance bottleneck is, before you start speculating. You would
have seen that most of the time is spent in paramiko.Transport
manipulating data, and waiting for pyCrypto. SCPClient adds almost
nothing to the overall time.
Thanks for the advice. I'll follow it. It was not complete speculation:
the CPU usage was pretty much the same for all solutions.
Post by james bardin
Post by Roman Yakovenko
Right now, I am using a workaround (executing scp with subprocess),
but it is a less than optimal solution.
Any help is appreciated.
Yes, the solution written entirely in c will be significantly faster.
Since this is mostly python, cpu is the limiting factor.
I have zero experience with ssh and encryption, but my expectation was
that, at least when transferring 10+ GB files, the process
would be bound by the network and not the CPU.
Post by james bardin
There may be
some places where optimizations could be made in paramiko and
pyCrypto, but I haven't looked into it myself.
Thank you.
--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/
james bardin
2010-02-01 17:16:14 UTC
On Mon, Feb 1, 2010 at 11:49 AM, Roman Yakovenko
Post by Roman Yakovenko
Post by james bardin
You would normally start by using a profiler to see where the
performance bottleneck is, before you start speculating. You would
have seen that most of the time is spent in paramiko.Transport
manipulating data, and waiting for pyCrypto. SCPClient adds almost
nothing to the overall time.
Thanks for the advice. I'll follow it. It was not complete speculation:
the CPU usage was pretty much the same for all solutions.
There are some other limiting factors in both paramiko and
openssh (http://www.psc.edu/networking/projects/hpn-ssh/), but I have
always hit the cpu wall with paramiko long before anything else is
relevant. If you're maxing out 1 processor core for each, you're just
seeing the difference in the efficiency of the c code vs the python+c
code (pyCrypto does the heavy lifting in c).
Post by Roman Yakovenko
Post by james bardin
Yes, the solution written entirely in c will be significantly faster.
Since this is mostly python, cpu is the limiting factor.
I have zero experience with ssh and encryption, but my expectation was
that, at least when transferring 10+ GB files, the process
would be bound by the network and not the CPU.
The size of the file has nothing to do with it once the connection and
negotiation time become irrelevant. It's an encrypted stream of data,
so you're limited by how fast you can process it, not by how long it
is.
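
To get a feel for how fast one machine can process such a stream, the sketch below times HMAC-SHA1 over fixed-size packets and reports MB/s. It is only an illustration: it uses Python's standard hmac and hashlib modules rather than pyCrypto or paramiko's own code path, it leaves out the cipher entirely, and the packet size and key are arbitrary assumptions.

```python
import hashlib
import hmac
import time


def hmac_throughput_mb_per_s(total_mb=16, packet_size=32 * 1024, key=b"k" * 20):
    """Authenticate total_mb of data packet by packet; return the MB/s achieved."""
    packet = b"\0" * packet_size
    packets = (total_mb * 1024 * 1024) // packet_size
    start = time.time()
    for _ in range(packets):
        # One MAC per packet, as a per-packet integrity check would do.
        hmac.new(key, packet, hashlib.sha1).digest()
    elapsed = time.time() - start
    return total_mb / max(elapsed, 1e-9)


if __name__ == "__main__":
    print("HMAC-SHA1: %.1f MB/s" % hmac_throughput_mb_per_s())
```

The rate depends only on the per-byte processing cost, not on the total amount of data, which is the point made above about file size.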
james bardin
2010-02-01 22:26:46 UTC
Post by Roman Yakovenko
Post by james bardin
Yes, the solution written entirely in c will be significantly faster.
Since this is mostly python, cpu is the limiting factor.
I have zero experience with ssh and encryption, but my expectation was
that, at least when transferring 10+ GB files, the process
would be bound by the network and not the CPU.
Your email got me thinking, so I did a few tests:

The biggest boost in performance came from using the latest
pycrypto (2.1.0). You'll get a deprecation warning from paramiko that
you can ignore for now (a bug has already been submitted on github). There
was a change to the HMAC code that made a huge difference in paramiko's
performance.

I tried using a limited bandwidth connection, and paramiko was on par
with openssh when cpu wasn't a concern.
When bandwidth wasn't an issue (using loopback), paramiko was about
85% of the speed of openssh on my machine.
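
These two observations fit a simple model in which throughput is capped by whichever is slower, the link or the local processing. The sketch below is only that model; the numbers in the usage note are made-up placeholders, not measurements from this thread.

```python
def effective_rate(link_mb_per_s, cpu_mb_per_s):
    """Throughput is capped by the slower of the network link and local processing."""
    return min(link_mb_per_s, cpu_mb_per_s)
```

For example, with a hypothetical 1 MB/s link, implementations able to process 40 MB/s and 34 MB/s both deliver 1 MB/s. On a link that is effectively unlimited, the same two implementations deliver 40 and 34 MB/s, a ratio of 0.85.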

Each newer version of python2.X was slightly faster as well.
Roman Yakovenko
2010-02-01 22:51:16 UTC
Post by james bardin
Post by Roman Yakovenko
Post by james bardin
Yes, the solution written entirely in c will be significantly faster.
Since this is mostly python, cpu is the limiting factor.
I have zero experience with ssh and encryption, but my expectation was
that, at least when transferring 10+ GB files, the process
would be bound by the network and not the CPU.
The biggest boost in performance came from using the latest
pycrypto (2.1.0). You'll get a deprecation warning from paramiko that
you can ignore for now (a bug has already been submitted on github). There
was a change to the HMAC code that made a huge difference in paramiko's
performance.
I tried using a limited bandwidth connection, and paramiko was on par
with openssh when cpu wasn't a concern.
As expected, since sending the data takes much more time than encrypting it.
Post by james bardin
When bandwidth wasn't an issue (using loopback), paramiko was about
85% of the speed of openssh on my machine.
That is really good news. I will try to upgrade the code.

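I am using the real IP address to test my code.

(I found the following code on the internet.)

import fcntl
import socket
import struct

def get_ip_address(fname='eth0'):
    # Linux-only: SIOCGIFADDR asks the kernel for the interface's IPv4 address.
    SIOCGIFADDR = 0x8915
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # The interface name is packed into a 256-byte ifreq buffer;
        # encode() keeps the call working where struct.pack needs bytes.
        io_result = fcntl.ioctl(s.fileno(), SIOCGIFADDR,
                                struct.pack('256s', fname[:15].encode('ascii')))
    finally:
        s.close()
    # Bytes 20-23 of the result hold the packed IPv4 address.
    return socket.inet_ntoa(io_result[20:24])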

On the one hand, all requests go via the router; on the other hand, I have
"local" access to both ends. After a file transfer, md5sum
is run on both files and the results are compared.
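
The checksum comparison can also be done from Python. Below is a minimal sketch, assuming both files are reachable as local paths (for a remote file, md5sum would have to be run on the remote host instead); file_md5 and same_content are hypothetical helper names.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files have the same MD5 digest."""
    return file_md5(path_a) == file_md5(path_b)
```

Reading in chunks keeps memory use flat, which matters for the 1 GB test files mentioned earlier in the thread.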
Post by james bardin
Each newer version of python2.X was slightly faster as well.
I am using Python 2.4 in production (sysadmins are so conservative :-) )
and 2.6 in development, but as you noted, there is no big
difference between them.

Thank you for help.
--
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/