Age | Commit message (Collapse) | Author |
|
Use the external implementation for atomic operations across a wide
range of architectures.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
I slipped it in with the alloc_tiled changes, since we were explicitly
throwing the parameter away. It caught some bogus released code, which
we've now fixed, so remove the asserts to keep old drivers working.
|
|
This simplifies driver code in handling object allocation, and also gives us
an opportunity to possibly cache tiled buffers if it turns out to be a win.
[anholt: This is chopped out of the execbuf2 patch, as it seems to be useful
separately and cleans up the execbuf2 changes to be more obvious]
|
|
|
|
This is done with:
Lindent *.[ch]
perl -pi -e 's|drm_intel_bo \* |drm_intel_bo *|g' *.[ch]
perl -pi -e 's|drm_intel_bufmgr \* |drm_intel_bufmgr *|g' *.[ch]
perl -pi -e 's|drm_intel_bo_gem \* |drm_intel_bo_gem *|g' *.[ch]
perl -pi -e 's|drm_intel_bufmgr_gem \* |drm_intel_bufmgr_gem *|g' *.[ch]
perl -pi -e 's|_fake \* |_fake *|g' *.[ch]
hand-editing to whack indented comments into line and other touchups.
|
|
This saves 32k of relocation entry storage for many 965 state buffers. No
noticeable impact on performance for cairo-gl firefox.
|
|
In conjunction with the atomic operation patch, it may be more
convenient for some people to disable building libdrm-intel and its
dependencies upon the atomic intrinsics then it is for them to use a
supported compiler.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Nicolai Hähnle <nhaehnle@gmail.com>
|
|
As the target architecture for Intel GPUs is the x86, we can presume to
have reasonable compiler support for Intel atomic intrinsics, i.e. gcc,
and so use those in preference to pulling in a complicated mess of
fragile assembly.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[anholt: hand-resolved against my previous commit. This brings cairo-gl
firefox-talos-gfx time from 65 seconds back down to 62 seconds.]
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
Set the DONTNEED flag on cached buffers so that the kernel is free to
discard those when under memory pressure.
[anholt: This takes firefox-talos-gfx time from ~62 seconds to ~65 seconds
on my GM965, but it seems like a hit worth taking for the improved
functionality from saving memory]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
This takes firefox-talos-gfx from 74 seconds to 70 seconds on my GM965.
|
|
There are a bunch of places in GL where if we can't do this we have to
flush the batchbuffer, and the cost of lookups here is outweighed by flush
savings.
|
|
I thought I was going to do all sorts of crazy experiments with it. I never
did, and it turned out the free-after-a-few-seconds plan is working out fine.
|
|
It hasn't been doing anything effective since
52e5d24fae4af6f2f4a5304a516c8c5ab347a11b, and we pretty much don't bo_map
pinned buffers any more anyway.
|
|
These were leaking.
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
Not sure what intptr_t was up to here.
Reported and tested by: Kevin DeKorte
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
New ids for G41, Clarkdale and Arrandale.
Make sure we don't need to count fence also on new chips.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
|
|
This reverts commit 0a732983f059c353b267b6bf877e1f0eea4e033f.
Paul Nieminen and Dave Airlie pointed out on IRC that this shouldn't be
necessary. I was seeing visual corruption in X before I made this change, but
I can't reproduce that anymore so it was probably an unrelated issue.
|
|
This allows users to eliminate explicit bo_wait calls before bo_map calls.
|
|
airlied: modified the interface to drop busy return value, just return
it normally, also fixed int->uint32_t for domain
Signed-off-by: Pauli Nieminen <suokkos@gmail.com>
|
|
No idea why G80 doesn't hit this, but, this fixes at least one NV40 card.
|
|
Using this call in OUT_BATCH_TABLE reduces radeonEmitState cpu usage from
9% to 5% and emit_vpu goes from 7% to 1.5%. I did use calgrind to profile
gears for cpu hotspots with r500 card.
Signed-off-by: Pauli Nieminen <suokkos@gmail.com>
|
|
GCC did war about optimization not possible because possible forever loop.
Signed-off-by: Pauli Nieminen <suokkos@gmail.com>
|
|
Signed-off-by: Pauli Nieminen <suokkos@gmail.com>
|
|
|
|
Nasty, but nicer than silently not writing into the pushbuf
|
|
|
|
Noticed by vehemens on irc.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Signed-off-by: Eric Anholt <eric@anholt.net>
|
|
|
|
This caches the mapping and just use mapping as a sync point
|
|
|
|
|
|
|
|
inbalances cpu_prep/cpu_finish
- The bo was mapped with sysmem == NULL, so this means cpu prep is called.
- The bo was unmapped with sysmem != NULL, so this means cpu finish is not called.
- This can lead to a non-zero "cpu writers" count in ttm_bo.
|
|
The goal of the BO cache is to keep buffers on hand for fast continuous use,
as in every frame of a game or every batchbuffer of the X Server. Keeping
older buffers on hand not only doesn't serve this purpose, it may hurt
performance by resulting in disk cache getting kicked out, or even driving
the system to swap.
Bug #20766.
|
|
|
|
If call was interrupted by signal we have to make call again.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
The logbase2 would overflow and wrap the size around to 0, making the code
allocate a 4kb object instead. By simplifying the code to just walk the
14-entry bucket array comparing sizes instead of indexing on
ffs(1 << logbase2(size)), we avoid silly math errors and have code of
approximately the same speed.
Many thanks to Simon Farnsworth for debugging and providing a working patch.
Bug #27365.
|
|
bug #21999
|
|
integers.
|
|
Based on patch by Pauli Nieminen. Thanks.
|
|
This ports a lot of the space checking code into a the common
library, so that the DDX and mesa can use it.
|
|
We always realloc at least 0x1000 dwords (page on most system)
when growing the cs buffer this is to avoid having to realloc
at each cs_begin.
|
|
This should use ndw not cdw, using cdw leads to realloc alignment going wrong
|
|
|
|
the DDX does this and used to handle it internally
|