Age | Commit message (Collapse) | Author |
|
Notably when freeing a batchbuffer, we often end up freeing many of the
buffers it points at as well. Avoiding repeated calls brings us a 9% CPU
win for cairo-gl.
[ # ] backend test min(s) median(s) stddev. count
before:
[ 0] gl firefox-talos-gfx 58.941 58.966 0.75% 3/3
after:
[ 0] gl firefox-talos-gfx 54.186 54.195 0.49% 3/3
|
|
If the target we're asking about hasn't ever been used as a relocation
target, then it obviously hasn't been used as a target by the batch's reloc
tree. This is the common case for good GL programming where you only map
fresh buffers, and gives us a 5% win in cairo-gl.
[ # ] backend test min(s) median(s) stddev. count
before:
[ 0] gl firefox-talos-gfx 64.680 64.756 0.06% 3/3
after:
[ 0] gl firefox-talos-gfx 60.816 60.970 0.29% 3/3
|
|
Use the external implementation for atomic operations across a wide
range of architectures.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
I slipped it in with the alloc_tiled changes, since we were explicitly
throwing the parameter away. It caught some bogus released code, which
we've now fixed, so remove the asserts to keep old drivers working.
|
|
This simplifies driver code in handling object allocation, and also gives us
an opportunity to possibly cache tiled buffers if it turns out to be a win.
[anholt: This is chopped out of the execbuf2 patch, as it seems to be useful
separately and cleans up the execbuf2 changes to be more obvious]
|
|
|
|
This is done with:
Lindent *.[ch]
perl -pi -e 's|drm_intel_bo \* |drm_intel_bo *|g' *.[ch]
perl -pi -e 's|drm_intel_bufmgr \* |drm_intel_bufmgr *|g' *.[ch]
perl -pi -e 's|drm_intel_bo_gem \* |drm_intel_bo_gem *|g' *.[ch]
perl -pi -e 's|drm_intel_bufmgr_gem \* |drm_intel_bufmgr_gem *|g' *.[ch]
perl -pi -e 's|_fake \* |_fake *|g' *.[ch]
hand-editing to whack indented comments into line and other touchups.
|
|
This saves 32k of relocation entry storage for many 965 state buffers. No
noticeable impact on performance for cairo-gl firefox.
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As the target architecture for Intel GPUs is the x86, we can presume to
have reasonable compiler support for Intel atomic intrinsics, i.e. gcc,
and so use those in preference to pulling in a complicated mess of
fragile assembly.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[anholt: hand-resolved against my previous commit. This brings cairo-gl
firefox-talos-gfx time from 65 seconds back down to 62 seconds.]
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
Set the DONTNEED flag on cached buffers so that the kernel is free to
discard those when under memory pressure.
[anholt: This takes firefox-talos-gfx time from ~62 seconds to ~65 seconds
on my GM965, but it seems like a hit worth taking for the improved
functionality from saving memory]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
This takes firefox-talos-gfx from 74 seconds to 70 seconds on my GM965.
|
|
There are a bunch of places in GL where if we can't do this we have to
flush the batchbuffer, and the cost of lookups here is outweighed by flush
savings.
|
|
I thought I was going to do all sorts of crazy experiments with it. I never
did, and it turned out the free-after-a-few-seconds plan is working out fine.
|
|
It hasn't been doing anything effective since
52e5d24fae4af6f2f4a5304a516c8c5ab347a11b, and we pretty much don't bo_map
pinned buffers any more anyway.
|
|
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
New ids for G41, Clarkdale and Arrandale.
Make sure we don't need to count fence also on new chips.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
|
|
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
The goal of the BO cache is to keep buffers on hand for fast continuous use,
as in every frame of a game or every batchbuffer of the X Server. Keeping
older buffers on hand not only doesn't serve this purpose, it may hurt
performance by resulting in disk cache getting kicked out, or even driving
the system to swap.
Bug #20766.
|
|
The logbase2 would overflow and wrap the size around to 0, making the code
allocate a 4kb object instead. By simplifying the code to just walk the
14-entry bucket array comparing sizes instead of indexing on
ffs(1 << logbase2(size)), we avoid silly math errors and have code of
approximately the same speed.
Many thanks to Simon Farnsworth for debugging and providing a working patch.
Bug #27365.
|
|
bug #21999
|
|
|
|
This avoids making objects significantly bigger than they would be
otherwise, which would result in some failing at binding to the GTT.
Found from firefox hanging on:
http://upload.wikimedia.org/wikipedia/commons/b/b7/Singapore_port_panorama.jpg
due to a software fallback trying to do a GTT-mapped copy between two 73MB
BOs that were instead each 128MB, and failing because both couldn't fit
simultaneously.
The cost here is that we get no opportunity to cache these objects and
avoid the mapping. But since the objects are a significant percentage
of the aperture size, each mapped access is likely having to fault and rebind
the object most of the time anyway.
Bug #20152 (2/3)
|
|
The convention is that all APIs are per-bufmgr, so make this one the same.
Then, have it return -1 on failure so that the application can know what's
going on and do something sensible.
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
This wraps the new DRM_IOCTL_I915_GET_PIPE_FROM_CRTC_ID ioctl,
allowing applications to discover the pipe number corresponding
to a given CRTC ID. This is necessary for doing pipe-specific
operations such as waiting for vblank on a given CRTC.
|
|
Scanout buffers need to be freed through the kernel as it holds a reference
to them; exposing this API allows applications allocating scanout buffers to
flag them as not reusable.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
Add assertions to drm_intel_gem_bo_reference,
drm_intel_gem_bo_reference_locked and drm_intel_gem_bo_unreference_locked
that the object has not been freed (refcount > 0). Mistakes in refcounting
lead to attempts to insert a bo into a free list more than once which causes
application failure as empty free lists are dereferenced as buffer objects.
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
Fixes assertion failures on later use of the object.
|
|
libdrm has some support for GTT mapping already, but there are bugs
with it (no surprise since it hasn't been used much).
In fixing 20803, I found that sharing bo_gem->virtual was a bad idea,
since a previously mapped object might not end up getting GTT mapped,
leading to corruption. So this patch splits the fields according to
use, taking care to unmap both at free time (but preserving the map
caching).
There's still a risk we might run out of mappings (there's a sysctl
tunable for max number of mappings per process, defaulted to 64k or so
it looks like) but at least GTT maps will work with these changes (and
some others for fixing PAT breakage in the kernel).
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
|
This helps avoid the n^2 performance cost of counting tree size when we
get a lot of relocations into our batch buffer. rgb10text on keithp's laptop
went from 136k glyphs/sec to 234k glyphs/sec.
|
|
|
|
This avoids using the oldest BO in the BO cache and waiting for it to be
idle before we turn around and render to it with the GPU. Thanks to
Chris Wilson for pointing out how silly we were being.
|
|
The minor CPU cost here is probably outweighed by bothering us with noise in
the tool.
|
|
|
|
|
|
This patch tries to use the available fence count to figure out whether a
given batch can succeed or not (just like the aperture check).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
|
|
It's also unused, so worthless.
|
|
Remember tiling mode values provided by appplications, and
record tiling mode when creating a buffer from another application. This
eliminates any need to ask the kernel for tiling values and also makes
reused buffers get the right tiling.
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
Applications may actually care if the mapping operation failed, so when
it happens, return an error indication. errno is probably trashed by
fprintf though.
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
The execbuffer ioctl returns ENOMEM when it fails to pin all of the buffers
in the GTT. This is usually caused by the DRM client attempting to use too
much memory in a single request. Dumping out the requested and available
memory values should help point out failures in the DRM code to catch over
commitments of this form.
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
Add mode setting files to libdrm, including xf86drmMode.* and the new
drm_mode.h header. Also add a couple of tests to sanity check the
kernel interfaces and update code to support them.
|
|
|
|
We wouldn't want some remaining 3D rendering to scribble on our batchbuffer.
|
|
|
|
This reverts commit 6656db10551bbb8770dd945b6d81d5138521f208.
We really just want the libdrm and ioctl bits, not all the driver
stuff.
|
|
|
|
This function can also serve the role that the bo_wait_rendering did, when
write_enable is unset.
|