Age | Commit message (Collapse) | Author |
|
Rearrange the cache cleanup so that we always scan following a final
unreference, and guard against multiple scans in a single second.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
When allocating a tiled buffer, if we remove the desired tiling mode due
to it being beyond hardware limits, also remove the stride. This ensures
that we only ever use stride 0 with I915_TILING_NONE.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As we now expose a method to allocate tiled buffers, it makes more sense
to defer the SET_TILING until required. Besides the slim chance that it
will be a no-op, by delaying the change we are less likely to stall on
waiting for a bound buffer to release a fence register.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We need to inform the kernel if the tiling stride changes and not only
for changes of the tiling mode.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
|
|
We had two cases recently where the rounding to powers of two hurt
badly: 4:2:0 YUV HD video frames would round up from 2.2MB to 4MB, and
Urban Terror was hitting aperture size limitations. For UT, this is
because mipmap trees for power of two texture sizes will land right in
the middle between two cache buckets.
By giving a few more sizes between powers of two, Urban Terror on my
945 ends up consuming 207MB of GEM objects instead of 272MB, and HD
video decode on Ironlake goes from 99MB to 75MB.
cairo-perf-diff of the benchmarks for gl and xlib shows a 1.09x and
1.06x speedup and a 1.07x, 1.08x, and 1.11x slowdown. From this, I
think this patch was really a no-op in terms of performance for these
CPU-bound workloads.
|
|
If the pitch is too large for the hardware to tile, recompute the
required surface size based on the untiled pitch and alignments. For the
older hardware, which has smaller limits and greater restrictions, this
may be a considerable saving in allocation size.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
I'm using this in experiments with the i965 Mesa driver.
|
|
This introduces a new API to exec on BSD ring buffer, for H.264 VLD
decoding.
Signed-off-by: Xiang Hai hao <haihao.xiang@intel.com>
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
|
|
|
|
Fixes:
Bug 26686 - Some textures are distorted with libdrm 2.4.18 in GTAVC>A3
http://bugs.freedesktop.org/show_bug.cgi?id=26686
This bug continues to haunt me. The kernel SET_TILING ioctl is
inconsistent in its return values when reporting an error. If one of its
sanity checks fail, then the input values are left unchanged. If the
kernel later fails to change the tiling mode, then the input values are
modified to match the current tiling on the object. In short, userspace
cannot trust the return values upon error and so we must assume that
upon error our current tiling mode matches reality and not update.
|
|
This reverts commit 7ca558494dd3f68f29bb6ca981de9b8f49620b60.
This was pushed ahead of an essential review of bo level locking in
mesa, without which we cannot know whether removing this lock is safe.
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
|
|
Thomas tracked down this error with kdm and commit b509640:
==4320== Invalid write of size 8
==4320== at 0x9A97998: do_bo_emit_reloc (in /usr/lib/libdrm_intel.so.1.0.0)
==4320== by 0x9A97B9C: drm_intel_gem_bo_emit_reloc (in /usr/lib/libdrm_intel.so.1.0.0)
==4320== by 0xAED3234: intel_batchbuffer_emit_reloc (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xAF13827: brw_emit_vertices (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xAF1F14D: brw_upload_state (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xAF12122: brw_draw_prims (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xB256824: vbo_exec_vtx_flush (in /usr/lib/xorg/modules/dri/libdricore.so)
==4320== by 0xB2523BB: vbo_exec_FlushVertices_internal (in /usr/lib/xorg/modules/dri/libdricore.so)
==4320== by 0xB252411: vbo_exec_FlushVertices (in /usr/lib/xorg/modules/dri/libdricore.so)
==4320== by 0xB195A3D: _mesa_PopAttrib (in /usr/lib/xorg/modules/dri/libdricore.so)
==4320== by 0x8DF0F02: __glXDisp_Render (in /usr/lib/xorg/modules/extensions/libglx.xorg)
==4320== by 0x8DF517F: __glXDispatch (in /usr/lib/xorg/modules/extensions/libglx.xorg)
==4320== Address 0x126a8b80 is 0 bytes after a block of size 16,368 alloc'd
==4320== at 0x4C23E03: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4320== by 0x9A97A64: do_bo_emit_reloc (in /usr/lib/libdrm_intel.so.1.0.0)
==4320== by 0x9A97B9C: drm_intel_gem_bo_emit_reloc (in /usr/lib/libdrm_intel.so.1.0.0)
==4320== by 0xAED3234: intel_batchbuffer_emit_reloc (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xAF191DB: upload_binding_table_pointers (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xAF1F14D: brw_upload_state (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xAF12122: brw_draw_prims (in /usr/lib/xorg/modules/dri/i965_dri.so)
==4320== by 0xB255EF6: vbo_exec_DrawArrays (in /usr/lib/xorg/modules/dri/libdricore.so)
==4320== by 0x8DF67A3: __glXDisp_DrawArrays (in /usr/lib/xorg/modules/extensions/libglx.xorg)
==4320== by 0x8DF0F02: __glXDisp_Render (in /usr/lib/xorg/modules/extensions/libglx.xorg)
==4320== by 0x8DF517F: __glXDispatch (in /usr/lib/xorg/modules/extensions/libglx.xorg)
==4320== by 0x446293: ??? (in /usr/bin/Xorg)
which is simply due to only allocating space for the pointers and not
the structs themselves. D'oh.
Reported-by: Thomas Bächler <thomas@archlinux.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Suggested-by: Rémi Cardona <remi@gentoo.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
intel_bufmgr.h is installed in ${includedir} directly, and the other
headers are taken care of by libdrm.pc's Cflags.
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
This is the largest untiled pitch requirement from gen2 through gen4.
It's only the case for gen3 rendering to color regions with depth, but
it's rare for this to be a significant factor in memory usage -- for
example, gen4 requires 1 or 2 times the element size, or up to 64
bytes depending on the size of the elements. This is easier than
encoding all the various little quirks for untiled pitch alignment,
since we rarely do untiled now.
|
|
intel_atomic.h includes very usefull atomic operations for
lock free parrallel access of variables. Moving these to
core libdrm for code sharing with radeon.
Signed-off-by: Pauli Nieminen <suokkos@gmail.com>
|
|
Repeat while EINTR, not EAGAIN! One more source of corruption
erradicated, hurray!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Fixes piglit depth-tex-modes on gen4.
|
|
Ensure that errors from the kernel are propagated back to the caller,
and not masked with return 0;
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Fixes fbo-copyteximage on i915 with texture tiling and execbuf2 fenced
relocs.
|
|
|
|
This allows Mesa to use drm_intel_bo_alloc_tiled() for its tiled
buffers, since it makes its decision about pitch before telling
libdrm. They happen to be the same choices for the tiled case.
|
|
Luckily I caught the bug with the first consumer of the interface.
|
|
|
|
Saves a bunch of comparisons in hot paths.
|
|
This patch to libdrm adds support for the new execbuf2 ioctl. If
detected, it will be used instead of the old ioctl. By using the new
drm_intel_bufmgr_gem_enable_fenced_relocs(), you can indicate that any
time a fence register is actually required for a relocation target you
will call drm_intel_bo_emit_reloc_fence instead of
drm_intel_bo_emit_reloc, which will reduce fence register pressure.
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
|
|
The SET_TILING is pernicious in that it overwrites the input arguments
following an error in order to report the current tiling state of the
buffer. This caught us by surprise as we then fed those arguments back
into to the ioctl unmodified following an EINTR and so the kernel then
reported success for the no-op. We interpreted this success as meaning
that the tiling on the buffer had changed so updated our state and
started using the buffer incorrectly in the new tiled/untiled manner.
This lead to all sorts of random corruption and GPU hangs, even though
the batch buffers would look sane (when the GPU had not wandered off
into forbidden territory).
References:
Bug 25475 - [i915] Xorg crash / Execbuf while wedged
http://bugs.freedesktop.org/show_bug.cgi?id=25475
Bug 25554 - i830_uxa_prepare_access: gtt bo map failed: Input/output error
http://bugs.freedesktop.org/show_bug.cgi?id=25554
(And probably every other weird bug in the last few months.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As the kernel reports the total number of fences, we must guess how many
fences are likely to be pinned. In the typical system these will be only
used by the scanout buffers, of which there may be one per pipe, and any
number of manually pinned fenced buffers. So take a conservative guess
and reserve two fences for use by the system.
Note this reduces the number of fences to 3 for i915 and prior.
Reference:
http://bugs.freedesktop.org/show_bug.cgi?id=25911
The latest intel driver 2.10.0 causes kernel oops and system hangs
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
|
|
|
|
Signed-off-by: Alan Coopersmith <alan.coopersmith@sun.com>
|
|
Signed-off-by: Alan Coopersmith <alan.coopersmith@sun.com>
|
|
Don't store the error return in bo_gem->gtt_virtual or else we will
attempt to use that as a valid pointer in future mappings.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This removes the foremost prolific user of mutexes in libdrm_intel.so.
The other uses of the bufmgr_gem->mutex to serial access to individual
bos are currently required by Mesa, and are far less frequent.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[anholt: This chunk looks good...]
Acked-by: Eric Anholt <eric@anholt.net>
|
|
This has the unfortunate behaviour of releasing our malloc cache, but
the alternative is for X to consume a couple of gigabytes of ram and
die during testing. Fortunately the extra mallocs have little impact on
performance whereas avoiding swap and death, lots.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Instead of forcing the caller to check after every emit_reloc(), we can
flag the object as being in error, propagating that error upwards through
the relocation tree, and failing the eventual batch buffer execution.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
EAGAIN cannot be raised by the current code, but the system call maybe
interrupted and so return EINTR.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Hitting this error lead to a segfault:
intel_bufmgr_gem.c:919: Error mapping buffer 48607 (pixmap):
Cannot allocate memory.
because the errno was reused as the function return value after being
reset by the fprintf(), so caller thought the mapping had succeeded. The
convention established by libdrm is that the return value is the
negative errno and that uses of libdrm cannot trust the value of errno
afterwards, but must use the return code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Buffers on the relocation tree are guarded by the reference to the batch
object and so do not need an extra reference whilst constructing the
list of execution buffer objects.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Having been bitten by a missing EINTR check during mmap_gtt(), I thought
it prudent to add some more protection around the ioctls.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
This allows us to keep the assert added in the previous commit that we do
not modify the tree_reloc_size after inserting the buffer into a relocation
tree, which was being hit here:
#0 0xb78c2424 in __kernel_vsyscall ()
#1 0xb74f6401 in raise () from /lib/libc.so.6
#2 0xb74f7b42 in abort () from /lib/libc.so.6
#3 0xb74ef5a8 in __assert_fail () from /lib/libc.so.6
#4 0xb737e78b in drm_intel_bo_gem_set_in_aperture_size (bufmgr_gem=<value optimized out>, bo_gem=0x6) at intel_bufmgr_gem.c:373
#5 0xb737f519 in drm_intel_gem_bo_set_tiling (bo=0xa1030a0, tiling_mode=0xbff6c85c, stride=0) at intel_bufmgr_gem.c:1386
#6 0xb737f67f in drm_intel_gem_bo_unreference_final (bo=0xa1030a0, time=<value optimized out>) at intel_bufmgr_gem.c:768
#7 0xb737f5e3 in drm_intel_gem_bo_unreference_locked_timed (bo=0xa1e50d0, time=<value optimized out>) at intel_bufmgr_gem.c:805
#8 drm_intel_gem_bo_unreference_final (bo=0xa1e50d0, time=<value optimized out>) at intel_bufmgr_gem.c:756
#9 0xb737fcbb in drm_intel_gem_bo_unreference (bo=0xa1e50d0) at intel_bufmgr_gem.c:821
#10 0xb737b4e6 in drm_intel_bo_unreference (bo=0x0) at intel_bufmgr.c:80
#11 0xb7325625 in intel_batch_flush (scrn=0x9d91f78, flush=1) at i830_batchbuffer.c:200
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
For the older chipsets, i.e. pre-i965, which have severe alignment
restrictions for tiled buffers we need to pessimistically assume that we
will waste the size of buffer to meet those alignment constraints.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If the kernel immediately frees the backing store for a buffer when
marking it purgeable, then there is not point adding to the cache. Free
it immediately, instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
|
|
|