An upcoming patch needs to pass more than just sioc as the opaque
pointer to an AioContext; but since our AioContext code in general
(and its QIO Channel wrapper code) lacks a notify callback present
with GSource, we do not have the trivial option of just g_malloc'ing a
small struct to hold all that data coupled with a notify of g_free.
Instead, the data pointer must outlive the registered handler; in
fact, having the data pointer have the same lifetime as QIONetListener
is adequate.
But the cleanest way to stick such a helper struct in QIONetListener
will be to rearrange internal struct members. And that in turn means
that all existing code that currently directly accesses
listener->nsioc and listener->sioc[] should instead go through
accessor functions, to be immune to the upcoming struct layout
changes. So this patch adds accessor methods qio_net_listener_nsioc()
and qio_net_listener_sioc(), and puts them to use.
While at it, notice that the pattern of grabbing an sioc from the
listener only to turn around can call
qio_channel_socket_get_local_address is common enough to also warrant
the helper of qio_net_listener_get_local_address, and fix a copy-paste
error in the corresponding documentation.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251113011625.878876-24-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The kernel allocates extra metadata SKBs in case of a zerocopy send,
eventually used for zerocopy's notification mechanism. This metadata
memory is accounted for in the OPTMEM limit. The kernel queues
completion notifications on the socket error queue and this error queue
is freed when userspace reads it.
Usually, in the case of in-order processing, the kernel will batch the
notifications and merge the metadata into a single SKB and free the
rest. As a result, it never exceeds the OPTMEM limit. However, if there
is any out-of-order processing or intermittent zerocopy failures, this
error chain can grow significantly, exhausting the OPTMEM limit. As a
result, all new sendmsg requests fail to allocate any new SKB, leading
to an ENOBUF error. Depending on the amount of data queued before the
flush (i.e., large live migration iterations), even large OPTMEM limits
are prone to failure.
To work around this, if we encounter an ENOBUF error with a zerocopy
sendmsg, flush the error queue and retry once more.
Co-authored-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Tejus GK <tejus.gk@nutanix.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[DB: change TRUE/FALSE to true/false for 'bool' type;
add more #ifdef QEMU_MSG_ZEROCOPY blocks]
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Add a 'blocking' boolean field to QIOChannelSocket to track whether the
underlying socket is in blocking or non-blocking mode.
Signed-off-by: Tejus GK <tejus.gk@nutanix.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Testing reading and writing from qemu-nbd using a unix domain socket
shows that the platform default send buffer size is too low, leading to
poor performance and hight cpu usage.
Add a helper for setting socket send buffer size to be used in NBD code.
It can also be used in other contexts.
We don't need a helper for receive buffer size since it is not used with
unix domain sockets. This is documented for Linux, and not documented
for macOS.
Failing to set the socket buffer size is not a fatal error, but the
caller may want to warn about the failure.
Signed-off-by: Nir Soffer <nirsof@gmail.com>
Message-ID: <20250517201154.88456-2-nirsof@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
For CONFIG_LINUX, implement the new zero copy flag and the optional callback
io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
feature is available in the host kernel, which is checked on
qio_channel_socket_connect_sync()
qio_channel_socket_flush() was implemented by counting how many times
sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
socket's error queue, in order to find how many of them finished sending.
Flush will loop until those counters are the same, or until some error occurs.
Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
1: Buffer
- As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
some caution is necessary to avoid overwriting any buffer before it's sent.
If something like this happen, a newer version of the buffer may be sent instead.
- If this is a problem, it's recommended to call qio_channel_flush() before freeing
or re-using the buffer.
2: Locked memory
- When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and
unlocked after it's sent.
- Depending on the size of each buffer, and how often it's sent, it may require
a larger amount of locked memory than usually available to non-root user.
- If the required amount of locked memory is not available, writev_zero_copy
will return an error, which can abort an operation like migration,
- Because of this, when an user code wants to add zero copy as a feature, it
requires a mechanism to disable it, so it can still be accessible to less
privileged users.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220513062836.965425-4-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.
Signed-off-by: Chetan Pant <chetan4windows@gmail.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20201014134033.14095-1-chetan4windows@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Some typedefs and macros are defined after the type check macros.
This makes it difficult to automatically replace their
definitions with OBJECT_DECLARE_TYPE.
Patch generated using:
$ ./scripts/codeconverter/converter.py -i \
--pattern=QOMStructTypedefSplit $(git grep -l '' -- '*.[ch]')
which will split "typdef struct { ... } TypedefName"
declarations.
Followed by:
$ ./scripts/codeconverter/converter.py -i --pattern=MoveSymbols \
$(git grep -l '' -- '*.[ch]')
which will:
- move the typedefs and #defines above the type check macros
- add missing #include "qom/object.h" lines if necessary
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-9-ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-10-ehabkost@redhat.com>
Message-Id: <20200831210740.126168-11-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We have worked on qio_task_run_in_thread() already. Further, let
all the qio channel APIs use that context.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
SocketAddressLegacy is a simple union, and simple unions are awkward:
they have their variant members wrapped in a "data" object on the
wire, and require additional indirections in C. SocketAddress is the
equivalent flat union. Convert all users of SocketAddressLegacy to
SocketAddress, except for existing external interfaces.
See also commit fce5d53..9445673 and 85a82e8..c5f1ae3.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1493192202-3184-7-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Minor editing accident fixed, commit message and a comment tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The next commit will rename SocketAddressFlat to SocketAddress, and
the commit after that will replace most uses of SocketAddressLegacy by
SocketAddress, replacing most of this commit's renames right back.
Note that checkpatch emits a few "line over 80 characters" warnings.
The long lines are all temporary; the SocketAddressLegacy replacement
will shorten them again.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1493192202-3184-5-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
In the docs for qio_channel_socket_connect_async,
qio_channel_socket_listen_async and
qio_channel_socket_dgram_async, mention that the
SocketAddress parameters are copied, so can be freed
immediately.
Reviewed-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Implement a QIOChannel subclass that supports sockets I/O.
The implementation is able to manage a single socket file
descriptor, whether a TCP/UNIX listener, TCP/UNIX connection,
or a UDP datagram. It provides APIs which can listen and
connect either asynchronously or synchronously. Since there
is no asynchronous DNS lookup API available, it uses the
QIOTask helper for spawning a background thread to ensure
non-blocking operation.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>