Age | Commit message (Collapse) | Author |
|
When reading from multiple domains, allow each cache to continue
to hold data until writes occur somewhere. This is done by
first leaving the read_domains alone at bind time (presumably the CPU read
cache contains valid data still) and then in set_domain, if no write_domain
is specified, the new read domains are simply merged into the existing read
domains.
A huge comment was added above set_domain to explain how things are
expected to work.
|
|
Newly allocated objects need to be in the CPU domain as they've just been
cleared by the CPU. Also, unmapping objects from the GTT needs to put them
into the CPU domain, both to flush rendering as well as to ensure that any
paging action gets flushed before we remap to the GTT.
|
|
Commands in the ring are parsed and started when the head pointer passes by
them, but they are not necessarily finished until a MI_FLUSH happens. This
patch inserts a flush after the execbuffer (the only place a flush wasn't
already happening).
|
|
Ring locals must be reloaded from hardware in case the X server ran.
|
|
There are now 3 lists. Active is buffers currently in the ringbuffer.
Flushing is not in the ringbuffer, but needs a flush before unbinding.
Inactive is as before. This prevents object_free → unbind →
wait_rendering → object_reference and a kernel oops about weird refcounting.
This also avoids an synchronous extra flush and wait when freeing a buffer
which had a write_domain set (such as a temporary rendered to and then from
using the 2d engine). It will sit around on the flushing list until the
appropriate flush gets emitted, or we need the GTT space for another
operation.
|
|
|
|
Since it's a circular list, the entry won't be NULL at termination.
|
|
Otherwise, in the middle of the function called using it the last ref
might disappear.
|
|
Ring locals must be reloaded from hardware in case the X server ran.
|
|
This lets us get some qualities we desire, such as using the full 32-bit
range (except zero), avoiding DRM_WAIT_ON, and a 1:1 mapping of active
sequence numbers to request structs, which will be used soon for throttling
and interrupt-driven list cleanup.
|
|
|
|
Additionally, a boolean active field is added to indicate which list an
object is on, rather than smashing last_rendering_cookie to 0 to show
inactive. This will help with flush-reduction later on, and makes the code
clearer.
|
|
|
|
|
|
No need to fill the ring that much; wait for it to become nearly empty
before adding the execbuffer request. A better fix will involve scheduling
ring insertion in the irq handler.
|
|
pread and pwrite must update the memory domains to ensure consistency with
the GPU. At some point, it should be possible to avoid clflush through this
path, but that isn't working for me.
|
|
The exec list contains all objects, in order of use. The lru list contains
only unpinned objects ready to be evicted. This required two changes -- the
first was to not migrate pinned objects from exec to lru, the second was to
search for the first unpinned object in the exec list when doing eviction.
|
|
|
|
Rename 'validate_entry' to 'exec_object', then clean up some field names in
structures (renaming buffer_offset to just offset, for example).
|
|
It takes (offset, size), not (offset, end).
|
|
Now, the LRU list has objects that are completely done rendering and ready
to kick out, while the execution list has things with active rendering,
which have associated cookies and reference counts on them.
|
|
Still managing to get something wrong with this, oopsing down in agp.
|
|
|
|
|
|
Whitespace changes, a few too-long-lines and some extra braces.
|
|
|
|
Domain information is about buffer relationships, not buffer contents. That
means a relocation contains the domain information as it knows how the
source buffer references the target buffer.
This also adds the set_domain ioctl so that user space can move buffers to
the cpu domain.
|
|
This should already have been generally safe since we don't change contents
and put in new relocations between execbufs, so if we were writing in a new
relocation then we'd already waited rendering to complete when we moved
the target of the relocation. However, doing the right thing will be required
if we do buffer reuse.
|
|
One of our MI_FLUSH bits is reserved on 965, being always implied, and there's
a vertex cache that was forgotten.
|
|
If objects on the lru aren't ref counted, they'll get pulled from the gtt as
soon as they are freed. This change does cause objects to get stuck in the
gtt until they're forced out by new requests. The lru should get cleaned
when the irq occurs.
|
|
Memory domains allow the kernel to track which caches to flush and how to
move objects before buffer execution.
|
|
|
|
|
|
|
|
|
|
pages come back from find_or_create_page locked, but must not stay locked
for long. Unlock them immediately instead of waiting until we're done with
them to avoid deadlock when applications try to touch them.
|
|
Conflicts:
linux-core/i915_gem.c
|
|
I was wrong about how the data structure worked, and didn't care to fix it
to support debugging code.
|
|
|
|
The relocated value was being written to the wrong location, missing
the object base address.
|
|
|
|
|
|
This function submits a gem-based execbuffer to the ring.
It doesn't work yet.
|
|
|
|
|
|
Track named objects in /proc/dri/0/gem_names.
Track total object count in /proc/dri/0/gem_objects.
Initialize device gem data.
return -ENODEV for gem ioctls if the driver doesn't support gem.
Call unlock_page when unbinding from gtt.
Add numerous misssing calls to drm_gem_object_unreference.
|
|
Names are just another unique integer set (from another idr object).
Names are removed when the user refernces (handles) are all destroyed --
this required that handles for objects be counted separately from
internal kernel references (so that we can tell when the handles are all
gone).
|
|
Now that drm_gem_object has a drm_driver * in it, functions don't need both
parameters.
|
|
When pinning buffers, or using execbuffer, allow the application to specify
the necessary aperture allocation alignment constraints.
|
|
mixed 32/64 bit systems need 'special' help for ioctl where the user-space
and kernel-space datatypes differ. Fixing the datatypes to be the same size,
and align the same way for both 32 and 64-bit ppc and x86 environments will
elimiante the need to have magic 32/64-bit ioctl translation code.
|