<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	
	>
<channel>
	<title>
	Comments on: Network performance comparison	</title>
	<atom:link href="https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/</link>
	<description>A running description of activity related to DragonFly BSD.</description>
	<lastBuildDate>Wed, 08 Mar 2017 21:08:20 +0000</lastBuildDate>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>
	<item>
		<title>
		By: Anonymous		</title>
		<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/comment-page-1/#comment-486228</link>

		<dc:creator><![CDATA[Anonymous]]></dc:creator>
		<pubDate>Wed, 08 Mar 2017 21:08:20 +0000</pubDate>
		<guid isPermaLink="false">https://www.dragonflydigest.com/?p=19425#comment-486228</guid>

					<description><![CDATA[My experiences: Routing in DF is slower than in FBSD.]]></description>
			<content:encoded><![CDATA[<p>My experiences: Routing in DF is slower than in FBSD.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Sepherosa Ziehau		</title>
		<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/comment-page-1/#comment-486223</link>

		<dc:creator><![CDATA[Sepherosa Ziehau]]></dc:creator>
		<pubDate>Tue, 07 Mar 2017 13:30:47 +0000</pubDate>
		<guid isPermaLink="false">https://www.dragonflydigest.com/?p=19425#comment-486223</guid>

					<description><![CDATA[XDP is just another netmap/DPDK work-alike, which bypasses the kernel network stack and operates directly on the NICs.  Other folks may be interested in completing the netmap port (we did have netmap imported but never finished it), but not me; I&#039;d focus on improving the kernel network stack.]]></description>
			<content:encoded><![CDATA[<p>XDP is just another netmap/DPDK work-alike, which bypasses the kernel network stack and operates directly on the NICs.  Other folks may be interested in completing the netmap port (we did have netmap imported but never finished it), but not me; I&#8217;d focus on improving the kernel network stack.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: ????????? ?????????????????? ??????? ?????????? DragonFly BSD, FreeBSD ? ???? Linux &#8212; IT-News.club		</title>
		<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/comment-page-1/#comment-486222</link>

		<dc:creator><![CDATA[????????? ?????????????????? ??????? ?????????? DragonFly BSD, FreeBSD ? ???? Linux &#8212; IT-News.club]]></dc:creator>
		<pubDate>Tue, 07 Mar 2017 11:13:15 +0000</pubDate>
		<guid isPermaLink="false">https://www.dragonflydigest.com/?p=19425#comment-486222</guid>

					<description><![CDATA[[&#8230;] opennet.ru dragonflydigest.com [&#8230;]]]></description>
			<content:encoded><![CDATA[<p>[&#8230;] opennet.ru dragonflydigest.com [&#8230;]</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Sepherosa Ziehau		</title>
		<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/comment-page-1/#comment-486220</link>

		<dc:creator><![CDATA[Sepherosa Ziehau]]></dc:creator>
		<pubDate>Tue, 07 Mar 2017 01:49:30 +0000</pubDate>
		<guid isPermaLink="false">https://www.dragonflydigest.com/?p=19425#comment-486220</guid>

					<description><![CDATA[@Jared

Thank you very much, I will update my presentation.]]></description>
			<content:encoded><![CDATA[<p>@Jared</p>
<p>Thank you very much, I will update my presentation.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: bsd		</title>
		<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/comment-page-1/#comment-486219</link>

		<dc:creator><![CDATA[bsd]]></dc:creator>
		<pubDate>Tue, 07 Mar 2017 01:07:01 +0000</pubDate>
		<guid isPermaLink="false">https://www.dragonflydigest.com/?p=19425#comment-486219</guid>

					<description><![CDATA[Do we understand the times we live in? The CPU became the bottleneck. Are others bothering with anything near the hard work that went into dfly&#039;s network stack and related code? I DON&#039;T THINK SO! Think of a cheap (hundred bucks) PC-to-PC Ethernet connection in year 201X. Why would I need insane amounts of threads just to have that? FreeBSD, Linux, etc. all want something at or close to the wire speed of 14.88 Mpps on one thread! Enter the network stack bypassing solutions: DPDK, netmap/ptnetmap, and the VALE software switch in FreeBSD. Linux went further: they want something that isn&#039;t run from userland, as in, it doesn&#039;t bypass the kernel or the TCP/IP stack but runs in concert with it. It is called XDP (eXpress Data Path), and it works with something that isn&#039;t yet in DragonFly BSD, for reasons Sephe has already explained at least twice: I am talking about the Mellanox InfiniBand cards, the ones that use the mlx4 driver, for example, since I can&#039;t think of anything cheaper that is rated at 40Gbit/s!

https://people.netfilter.org/hawk/presentations/xdp2016/xdp_intro_and_use_cases_sep2016.pdf

also read:

https://2016.eurobsdcon.org/PresentationSlides/NanakoMomiyama_TowardsFastIPForwarding.pdf

https://events.linuxfoundation.org/sites/events/files/slides/pushing-kernel-networking.pdf

https://people.netfilter.org/hawk/presentations/nfws2016/nfws2016_next_steps_for_linux.pdf

https://www.iovisor.org/technology/xdp]]></description>
			<content:encoded><![CDATA[<p>Do we understand the times we live in? The CPU became the bottleneck. Are others bothering with anything near the hard work that went into dfly&#8217;s network stack and related code? I DON&#8217;T THINK SO! Think of a cheap (hundred bucks) PC-to-PC Ethernet connection in year 201X. Why would I need insane amounts of threads just to have that? FreeBSD, Linux, etc. all want something at or close to the wire speed of 14.88 Mpps on one thread! Enter the network stack bypassing solutions: DPDK, netmap/ptnetmap, and the VALE software switch in FreeBSD. Linux went further: they want something that isn&#8217;t run from userland, as in, it doesn&#8217;t bypass the kernel or the TCP/IP stack but runs in concert with it. It is called XDP (eXpress Data Path), and it works with something that isn&#8217;t yet in DragonFly BSD, for reasons Sephe has already explained at least twice: I am talking about the Mellanox InfiniBand cards, the ones that use the mlx4 driver, for example, since I can&#8217;t think of anything cheaper that is rated at 40Gbit/s!</p>
<p><a href="https://people.netfilter.org/hawk/presentations/xdp2016/xdp_intro_and_use_cases_sep2016.pdf" rel="nofollow ugc">https://people.netfilter.org/hawk/presentations/xdp2016/xdp_intro_and_use_cases_sep2016.pdf</a></p>
<p>also read:</p>
<p><a href="https://2016.eurobsdcon.org/PresentationSlides/NanakoMomiyama_TowardsFastIPForwarding.pdf" rel="nofollow ugc">https://2016.eurobsdcon.org/PresentationSlides/NanakoMomiyama_TowardsFastIPForwarding.pdf</a></p>
<p><a href="https://events.linuxfoundation.org/sites/events/files/slides/pushing-kernel-networking.pdf" rel="nofollow ugc">https://events.linuxfoundation.org/sites/events/files/slides/pushing-kernel-networking.pdf</a></p>
<p><a href="https://people.netfilter.org/hawk/presentations/nfws2016/nfws2016_next_steps_for_linux.pdf" rel="nofollow ugc">https://people.netfilter.org/hawk/presentations/nfws2016/nfws2016_next_steps_for_linux.pdf</a></p>
<p><a href="https://www.iovisor.org/technology/xdp" rel="nofollow ugc">https://www.iovisor.org/technology/xdp</a></p>
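<p>For reference, the 14.88 Mpps figure is the packet rate of a 10 Gbit/s Ethernet link carrying minimum-size frames: each 64-byte frame also occupies 8 bytes of preamble and start delimiter plus a 12-byte inter-frame gap on the wire. A quick sketch of that arithmetic follows; the function and constant names are made up for illustration.</p>

```python
# Sketch of the wire-speed arithmetic; names are illustrative only.
PREAMBLE_SFD_BYTES = 8     # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12  # minimum gap between frames, in byte times

def line_rate_pps(link_bps, frame_bytes=64):
    """Maximum frames per second a link can carry at a given frame size."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits

# Packet rates in Mpps for minimum-size frames on 10 and 40 Gbit/s links.
print(round(line_rate_pps(10e9) / 1e6, 2), "Mpps at 10 Gbit/s")
print(round(line_rate_pps(40e9) / 1e6, 2), "Mpps at 40 Gbit/s")
```

<p>The same formula shows why a 40 Gbit/s card raises the bar: the minimum-size packet rate scales with the link speed.</p>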
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Trevor		</title>
		<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/comment-page-1/#comment-486218</link>

		<dc:creator><![CDATA[Trevor]]></dc:creator>
		<pubDate>Mon, 06 Mar 2017 22:39:01 +0000</pubDate>
		<guid isPermaLink="false">https://www.dragonflydigest.com/?p=19425#comment-486218</guid>

					<description><![CDATA[Thanks Matt for the detailed response. 

Much appreciated.]]></description>
			<content:encoded><![CDATA[<p>Thanks Matt for the detailed response. </p>
<p>Much appreciated.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Matthew Dillon		</title>
		<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/comment-page-1/#comment-486217</link>

		<dc:creator><![CDATA[Matthew Dillon]]></dc:creator>
		<pubDate>Mon, 06 Mar 2017 22:02:01 +0000</pubDate>
		<guid isPermaLink="false">https://www.dragonflydigest.com/?p=19425#comment-486217</guid>

					<description><![CDATA[If you had a 64-thread machine then 64 rings could potentially scale performance further, up to the wire-line cap.  There are still a lot of assumptions there (you have to have 64 MSI-X vectors, and the firmware on the chipset has to be able to handle that many rings efficiently too).  DragonFly is perfectly capable of handling 64; actually, since 64 is a power of 2, it&#039;s easier than handling 24.

The point on the graphs showing 16 vs 24 is that the Linux driver is using a mode that is not well documented to get to 24.  The DFly driver is using what at the time we thought was the chipset limit, as described by the chipset documentation.  In order to use that mode on DFly we will have to make adjustments to the way the kernel maps the hash.  Currently we use a simple mask (hence the power-of-2 requirement).  That will have to change to a table lookup in order to map to a non-power-of-2 number of rings.

Generally speaking, scaling the number of rings past the available CPU threads should not lead to any real improvement in performance.  If you have a 16-thread machine then 64 rings won&#039;t do any better than 16 rings, for example.  The issue with the number of rings is entirely a CPU localization issue.  The number of rings alone does not imply scale.

-Matt]]></description>
			<content:encoded><![CDATA[<p>If you had a 64-thread machine then 64 rings could potentially scale performance further, up to the wire-line cap.  There are still a lot of assumptions there (you have to have 64 MSI-X vectors, and the firmware on the chipset has to be able to handle that many rings efficiently too).  DragonFly is perfectly capable of handling 64; actually, since 64 is a power of 2, it&#8217;s easier than handling 24.</p>
<p>The point on the graphs showing 16 vs 24 is that the Linux driver is using a mode that is not well documented to get to 24.  The DFly driver is using what at the time we thought was the chipset limit, as described by the chipset documentation.  In order to use that mode on DFly we will have to make adjustments to the way the kernel maps the hash.  Currently we use a simple mask (hence the power-of-2 requirement).  That will have to change to a table lookup in order to map to a non-power-of-2 number of rings.</p>
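<p>As a rough illustration of the difference between the two mappings: a mask can only select among a power-of-2 number of rings, while a small lookup table can spread the hash over any ring count. This is a sketch only, not the DragonFly kernel code; the function names, the 128-entry table size, and the sample hash values are assumptions made for the example.</p>

```python
# Illustrative sketch only; names and table size are hypothetical,
# not taken from the DragonFly or Linux sources.

def ring_by_mask(rss_hash, nrings):
    """Pick a ring with a simple mask; only valid for power-of-2 ring counts."""
    if nrings <= 0 or nrings & (nrings - 1):
        raise ValueError("mask mapping needs a power-of-2 number of rings")
    return rss_hash & (nrings - 1)

def build_table(nrings, size=128):
    """Fill a redirect table round-robin so any ring count can be used."""
    return [i % nrings for i in range(size)]

def ring_by_table(rss_hash, table):
    """Index the table with the low bits of the hash (table size is a power of 2)."""
    index = rss_hash & (len(table) - 1)
    return table[index]

table24 = build_table(24)
hashes = [0x1234ABCD, 0xDEADBEEF, 0x00000017, 0xFFFFFFFF]
print([ring_by_mask(h, 16) for h in hashes])        # mask mapping, 16 rings
print([ring_by_table(h, table24) for h in hashes])  # table mapping, 24 rings
```

<p>One trade-off of the table in this sketch: 128 entries do not divide evenly by 24, so some rings receive one more table slot than others.</p>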
<p>Generally speaking, scaling the number of rings past the available CPU threads should not lead to any real improvement in performance.  If you have a 16-thread machine then 64 rings won&#8217;t do any better than 16 rings, for example.  The issue with the number of rings is entirely a CPU localization issue.  The number of rings alone does not imply scale.</p>
<p>-Matt</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Trevor Hillel		</title>
		<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/comment-page-1/#comment-486216</link>

		<dc:creator><![CDATA[Trevor Hillel]]></dc:creator>
		<pubDate>Mon, 06 Mar 2017 21:39:40 +0000</pubDate>
		<guid isPermaLink="false">https://www.dragonflydigest.com/?p=19425#comment-486216</guid>

					<description><![CDATA[On the last slide of the performance benchmark it says that Linux can use all 64 rings (but it was not tested as such).

I really wonder what the performance chart would look like if all 64 rings had been tested. 

Linux might be 4x the performance of DragonFly ... and it would be disingenuous not to include that if it&#039;s the case. 

Linux seems to scale linearly, as others have said, at a rate of 0.5Mpps/ring. 

So Linux with 64 rings might be at 32Mpps, whereas DragonFly was at 8Mpps.]]></description>
			<content:encoded><![CDATA[<p>On the last slide of the performance benchmark it says that Linux can use all 64 rings (but it was not tested as such).</p>
<p>I really wonder what the performance chart would look like if all 64 rings had been tested. </p>
<p>Linux might be 4x the performance of DragonFly &#8230; and it would be disingenuous not to include that if it&#8217;s the case. </p>
<p>Linux seems to scale linearly, as others have said, at a rate of 0.5Mpps/ring. </p>
<p>So Linux with 64 rings might be at 32Mpps, whereas DragonFly was at 8Mpps.</p>
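<p>The extrapolation above is plain linear arithmetic. A minimal sketch, assuming the 0.5 Mpps/ring rate quoted in this comment holds at any ring count (an assumption, not a measured result; the function name is made up for illustration):</p>

```python
def extrapolate_mpps(rings, mpps_per_ring=0.5):
    """Linear estimate: throughput grows in proportion to the ring count."""
    return rings * mpps_per_ring

# Ring counts discussed in the thread: 16, 24, and the untested 64.
for rings in (16, 24, 64):
    print(rings, "rings ->", extrapolate_mpps(rings), "Mpps")
```

<p>At 64 rings this reproduces the 32 Mpps guess; whether real hardware follows the line that far was not tested.</p>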
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: mer		</title>
		<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/comment-page-1/#comment-486215</link>

		<dc:creator><![CDATA[mer]]></dc:creator>
		<pubDate>Mon, 06 Mar 2017 21:15:42 +0000</pubDate>
		<guid isPermaLink="false">https://www.dragonflydigest.com/?p=19425#comment-486215</guid>

					<description><![CDATA[Actually it&#039;s interesting to compare the different workloads (object size).  Yes, in raw numbers FreeBSD fares worst, but its numbers don&#039;t change that much, while the other ones show a big change.  DF at 220K for 1KB drops to a little under 140K for 16KB.  The others show similar drops, while FBSD only goes from, say, 77000 to 75000.
Performance stuff is always interesting.]]></description>
			<content:encoded><![CDATA[<p>Actually it&#8217;s interesting to compare the different workloads (object size).  Yes, in raw numbers FreeBSD fares worst, but its numbers don&#8217;t change that much, while the other ones show a big change.  DF at 220K for 1KB drops to a little under 140K for 16KB.  The others show similar drops, while FBSD only goes from, say, 77000 to 75000.<br />
Performance stuff is always interesting.</p>
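<p>To put those changes side by side, the relative drop can be computed directly. This sketch uses only the approximate figures quoted above (220K to about 140K for DF, about 77000 to 75000 for FBSD); the helper name is made up for illustration.</p>

```python
def relative_drop(before, after):
    """Fraction of throughput lost going from `before` to `after`."""
    return (before - after) / before

# Approximate figures quoted in the comment, not exact measurements.
for name, small, large in (("DF", 220_000, 140_000), ("FBSD", 77_000, 75_000)):
    print(f"{name}: {relative_drop(small, large):.1%} lower at 16KB than at 1KB")
```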
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Jared Barr		</title>
		<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/comment-page-1/#comment-486214</link>

		<dc:creator><![CDATA[Jared Barr]]></dc:creator>
		<pubDate>Mon, 06 Mar 2017 21:06:06 +0000</pubDate>
		<guid isPermaLink="false">https://www.dragonflydigest.com/?p=19425#comment-486214</guid>

					<description><![CDATA[@Vijay, yes, the Linux TCP stack has been lockless since kernel 4.4. 

I agree the perf PDF slides should be updated, since slide #1 makes it seem that only DragonFly BSD is lockless and all others are lock-based. 

More info on Linux being lockless

https://lwn.net/Articles/659199/]]></description>
			<content:encoded><![CDATA[<p>@Vijay, yes, the Linux TCP stack has been lockless since kernel 4.4. </p>
<p>I agree the perf PDF slides should be updated, since slide #1 makes it seem that only DragonFly BSD is lockless and all others are lock-based. </p>
<p>More info on Linux being lockless</p>
<p><a href="https://lwn.net/Articles/659199/" rel="nofollow ugc">https://lwn.net/Articles/659199/</a></p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Vijay		</title>
		<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/comment-page-1/#comment-486213</link>

		<dc:creator><![CDATA[Vijay]]></dc:creator>
		<pubDate>Mon, 06 Mar 2017 20:54:24 +0000</pubDate>
		<guid isPermaLink="false">https://www.dragonflydigest.com/?p=19425#comment-486213</guid>

					<description><![CDATA[&quot;Francis: The older and newer Linux implementations are locking and lockless, and there’s a definite performance difference, which at least suggests though does not prove that it makes a difference.&quot;

JUSTIN: 

Are you suggesting that Linux 3.10 is lock-based and Linux 4.9 is lockless?

If so, that&#039;s major news. I must have missed the fact that Linux is now lockless.

https://leaf.dragonflybsd.org/~sephe/perf_cmp.pdf]]></description>
			<content:encoded><![CDATA[<p>&#8220;Francis: The older and newer Linux implementations are locking and lockless, and there’s a definite performance difference, which at least suggests though does not prove that it makes a difference.&#8221;</p>
<p>JUSTIN: </p>
<p>Are you suggesting that Linux 3.10 is lock-based and Linux 4.9 is lockless?</p>
<p>If so, that&#8217;s major news. I must have missed the fact that Linux is now lockless.</p>
<p><a href="https://leaf.dragonflybsd.org/~sephe/perf_cmp.pdf" rel="nofollow ugc">https://leaf.dragonflybsd.org/~sephe/perf_cmp.pdf</a></p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Justin Sherrill		</title>
		<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/comment-page-1/#comment-486212</link>

		<dc:creator><![CDATA[Justin Sherrill]]></dc:creator>
		<pubDate>Mon, 06 Mar 2017 20:46:17 +0000</pubDate>
		<guid isPermaLink="false">https://www.dragonflydigest.com/?p=19425#comment-486212</guid>

					<description><![CDATA[Francis: The older and newer Linux implementations are locking and lockless, and there&#039;s a definite performance difference, which at least suggests though does not prove that it makes a difference.  

I understand what you say about latency being consistent - but it is also lower, which I think is the real positive feature.

As for FreeBSD being worst - it&#039;s not doing well in this benchmark, *but* this is a very specific benchmark, with all the caveats that implies.]]></description>
			<content:encoded><![CDATA[<p>Francis: The older and newer Linux implementations are locking and lockless, and there&#8217;s a definite performance difference, which at least suggests though does not prove that it makes a difference.  </p>
<p>I understand what you say about latency being consistent &#8211; but it is also lower, which I think is the real positive feature.</p>
<p>As for FreeBSD being worst &#8211; it&#8217;s not doing well in this benchmark, *but* this is a very specific benchmark, with all the caveats that implies.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Francis		</title>
		<link>https://www.dragonflydigest.com/2017/03/06/network-performance-comparison/comment-page-1/#comment-486210</link>

		<dc:creator><![CDATA[Francis]]></dc:creator>
		<pubDate>Mon, 06 Mar 2017 19:48:53 +0000</pubDate>
		<guid isPermaLink="false">https://www.dragonflydigest.com/?p=19425#comment-486210</guid>

					<description><![CDATA[@Justin

&#062;&#062;&quot;That, if anything, is the real takeaway; that DragonFly’s model has benefits not just to plain speed but to the system’s responsiveness under load.&quot;

Is that an accurate conclusion?

The takeaway to me seems to be:

1. Whether the implementation is lock-based (Linux) or lockless (DragonFly) has no impact on overall throughput. 

2. DragonFly has a smaller standard deviation in latency. Said another way, latency is more consistent with DragonFly than with Linux. 

3. Linux has more throughput on non-power-of-2 systems (e.g. 24 cores).  

4. The myth that FreeBSD is best for networking is false. It&#039;s the worst.]]></description>
			<content:encoded><![CDATA[<p>@Justin</p>
<p>&gt;&gt;&#8221;That, if anything, is the real takeaway; that DragonFly’s model has benefits not just to plain speed but to the system’s responsiveness under load.&#8221;</p>
<p>Is that an accurate conclusion?</p>
<p>The takeaway to me seems to be:</p>
<p>1. Whether the implementation is lock-based (Linux) or lockless (DragonFly) has no impact on overall throughput. </p>
<p>2. DragonFly has a smaller standard deviation in latency. Said another way, latency is more consistent with DragonFly than with Linux. </p>
<p>3. Linux has more throughput on non-power-of-2 systems (e.g. 24 cores).  </p>
<p>4. The myth that FreeBSD is best for networking is false. It&#8217;s the worst.</p>
]]></content:encoded>
		
			</item>
	</channel>
</rss>
