Age | Commit message | Author |
|
|
|
When i915_gem_retire_request has a flush which matches an object's write
domain, clear the write domain. This will move the object to the inactive
list rather than the flushing list, avoiding trouble with objects left stuck
on the flushing list.
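
A minimal user-space sketch of the rule described above, not the driver code
itself. The struct layout, the domain bits and the retire_object() helper are
all hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical domain bits, for illustration only. */
#define DOMAIN_CPU    0x1u
#define DOMAIN_RENDER 0x2u

enum obj_list { LIST_ACTIVE, LIST_FLUSHING, LIST_INACTIVE };

struct gem_object {
    uint32_t write_domain;   /* domain with a pending write, 0 if none */
    enum obj_list list;      /* which list the object currently sits on */
};

/*
 * Retire one object for a request that flushed `flush_domains`.
 * If the flush covers the object's write domain, the write is no
 * longer pending, so clear it and let the object go inactive.
 * Otherwise the object still needs a flush and stays on the
 * flushing list.
 */
static void retire_object(struct gem_object *obj, uint32_t flush_domains)
{
    if (obj->write_domain & flush_domains)
        obj->write_domain = 0;

    obj->list = obj->write_domain ? LIST_FLUSHING : LIST_INACTIVE;
}
```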
|
|
In i915_gem_object_wait_rendering, if the object write domain is being
written by the GPU, the appropriate flushing commands are written to the
device and an additional request queued to mark that flush. Finally, the
function blocks on that new request.
The bug was that the write_domain in the object was cleared before the
function blocked.
If the wait is interrupted by a signal, the flushing commands may still be
pending. With the write_domain information lost, the restarted syscall will
drop right through the write_domain test, and so the function will not block
at all. Oops.
Fixed by simply moving the write_domain clear to after the wait_request
succeeds. Note that the restarted system call will generate an additional
flush sequence and request, but that should be 'harmless', aside from a
slight performance impact.
Someday we'll track flushing more accurately and clear write_domains more
efficiently, but for now, this should suffice.
This bug was discovered in the 2d gem development by running x11perf
-copypixwin500 and noticing that the window got cleared accidentally.
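
The ordering fix can be sketched in user-space C as follows. This is an
illustration under assumptions, not the actual i915 code: wait_rendering(),
flush_and_queue_request() and the two fake wait functions are hypothetical
stand-ins, and the wait is modelled as a callback that returns 0 or -EINTR.

```c
#include <errno.h>
#include <stdint.h>

#define DOMAIN_RENDER 0x2u   /* hypothetical GPU write domain bit */

struct gem_object {
    uint32_t write_domain;
};

/* Stand-in for the blocking wait; returns 0 on success or -EINTR. */
typedef int (*wait_fn)(uint32_t seqno);

/* Stand-in for emitting flush commands and queuing a request. */
static uint32_t next_seqno = 1;
static uint32_t flush_and_queue_request(void)
{
    return next_seqno++;
}

static int wait_rendering(struct gem_object *obj, wait_fn wait_request)
{
    if (obj->write_domain & DOMAIN_RENDER) {
        uint32_t seqno = flush_and_queue_request();
        int ret = wait_request(seqno);

        if (ret)
            return ret;          /* interrupted: write_domain is kept */
        obj->write_domain = 0;   /* cleared only after the wait succeeded */
    }
    return 0;
}

/* Fake waits used to exercise both paths. */
static int wait_interrupted(uint32_t seqno) { (void)seqno; return -EINTR; }
static int wait_ok(uint32_t seqno) { (void)seqno; return 0; }
```

Calling wait_rendering() with wait_interrupted and then again with wait_ok
models the restarted syscall: the second call still sees the write domain,
queues one more flush request, and blocks as intended.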
|
|
This reverts commit 3ad8db2071d30c198403e605f2726fc5c3e46bfd.
We ended up not needing that namespace, and I'd rather not have the churn
for producing diffs.
|
|
Conflicts:
linux-core/Makefile.kernel
shared-core/i915_dma.c
shared-core/i915_drv.h
shared-core/i915_irq.c
|
|
|
|
Main fix is an oops that was triggered by the gtt pwrite path when we don't
have the gtt initialized. Also, settle on -EBADF for "bad object handle",
and -EINVAL for "reading/writing beyond object boundary".
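
A small sketch of that error-code convention, assuming a hypothetical handle
table indexed by handle value. lookup() and check_pwrite() are illustrative
names, not functions from the driver.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct gem_object {
    uint64_t size;   /* object size in bytes */
};

/* Hypothetical handle table: a handle is an index; NULL means unused. */
static struct gem_object *lookup(struct gem_object **table, size_t n,
                                 uint32_t handle)
{
    return handle < n ? table[handle] : NULL;
}

static int check_pwrite(struct gem_object **table, size_t n,
                        uint32_t handle, uint64_t offset, uint64_t size)
{
    struct gem_object *obj = lookup(table, n, handle);

    if (obj == NULL)
        return -EBADF;    /* bad object handle */

    /* Written this way so that offset + size cannot overflow. */
    if (offset > obj->size || size > obj->size - offset)
        return -EINVAL;   /* access beyond the object boundary */

    return 0;
}
```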
|
|
This is a speedup of around 3x, since we previously read wide rows at a time
and clflushed each tile 8 times as a result. We'll want code related to this anyway
when we do fault-based per-page clflushing for sw fallbacks.
|
|
|
|
Fixes an oops in fbotexture from walking off the end of the page list.
|
|
This increases overhead for the large-readpixels case due to the repeated
page cache accessing, but greatly reduces overhead for the small-readpixels
case.
|
|
These will be covered by the fence, while pread/pwrite are supposed to be
CPU-perspective writes, with manual detiling done by the client.
|
|
|
|
This requires an updated 2D driver to not try to set it up as well.
|
|
On some distros, missing prototypes cause kernel builds to fail. These
are hacks to make the code build.
|
|
Various chips have exciting interactions between the CPU and the GPU's
different ways of accessing interleaved memory, so we need some kernel
assistance in determining how it works.
Only fully tested on GM965 so far.
|
|
|
|
|
|
Clean up queues, free objects. On the next entervt, unmark the hardware to
let the user try again (presumably after resetting the chip). Someday we'll
automatically recover...
|
|
Pin/copy_from_user/unpin through the GTT to eliminate clflush costs.
Benchmarks say this helps quite a bit.
|
|
This tracks most of the interrupt-related status, including the
interrupt registers in the chip and the sequence number variables.
|
|
|
|
While waiting for the hardware to idle on leavevt or lastclose, poll
for the sync sequence number instead of waiting for an interrupt. This
allows the code to bail if the hardware hangs for some reason. Also, this
avoids issues with signals as the existing wait function is interruptible.
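
A rough user-space sketch of a bounded polling loop of this kind. The poll
budget, the -EBUSY return value and the fake hardware are assumptions made
for illustration; the commit does not say which error the driver returns
when it bails.

```c
#include <errno.h>
#include <stdint.h>

/* Callback that reads the current hardware sequence number. */
typedef uint32_t (*read_seqno_fn)(void *ctx);

/*
 * Poll until the sequence number reaches `target`, giving up after
 * `max_polls` reads so that a hung chip cannot block the caller forever.
 */
static int poll_for_seqno(read_seqno_fn read_seqno, void *ctx,
                          uint32_t target, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_seqno(ctx) >= target)
            return 0;
        /* A real driver would delay briefly between reads. */
    }
    return -EBUSY;   /* assumed error code for "hardware looks hung" */
}

/* Fake hardware: the sequence number advances by `step` on every read. */
struct fake_hw {
    uint32_t seqno;
    uint32_t step;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_hw *hw = ctx;

    hw->seqno += hw->step;
    return hw->seqno;
}
```

With step set to 0 the fake hardware never advances, which models a hang:
the loop exhausts its budget and returns instead of sleeping on an
interrupt that will never arrive.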
|
|
This adds gem_active, gem_flushing, gem_inactive, gem_request and gem_seqno
entries to monitor gem operation and help debug issues.
|
|
find_or_create_page doesn't quite set up pages correctly; any newly created
pages aren't hooked into the shmem object quite right; user space mmaps of
those pages end up mapping pages full of zeros which then get written to the
real pages inappropriately. This patch requires that the kernel export
shmem_getpage.
|
|
When a software fallback has completed, usermode must notify the kernel so
that any scanout buffers can be synchronized. This ioctl should be called
whenever a fallback completes to flush CPU and chipset caches.
|
|
In leavevt_ioctl, queue an MI_FLUSH and then block waiting for it to
complete. This will empty the active and flushing lists. That leaves only
the inactive list to evict.
|
|
Pin/unpin need to know whether to remove/add objects from the inactive list.
Inactive objects cannot be in any GPU write domain as those would be on the
flushing list instead. However, inactive objects may be in the CPU write
domain.
|
|
Now that gem_object_unbind waits for rendering to complete, objects should
not be active when they are being pulled from the GTT. BUG_ON if this is
broken.
|
|
Inactive list elements may not be pinned, active or have non-CPU write
domains.
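
The invariant can be written as a single predicate. This is a sketch with a
hypothetical struct layout and domain bits, not the driver's own check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical domain bits, for illustration only. */
#define DOMAIN_CPU    0x1u
#define DOMAIN_RENDER 0x2u

struct gem_object {
    unsigned pin_count;      /* number of outstanding pins */
    bool active;             /* still referenced by queued GPU work */
    uint32_t write_domain;   /* domain with a pending write, 0 if none */
};

/* True if the object satisfies the inactive-list invariant. */
static bool may_be_inactive(const struct gem_object *obj)
{
    return obj->pin_count == 0 &&
           !obj->active &&
           (obj->write_domain & ~DOMAIN_CPU) == 0;
}
```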
|
|
|
|
|
|
Moving to the CPU domain doesn't ensure that rendering is finished; the
buffer may still be in use as a texture or other data source.
|
|
Not quite portable, but these are useful for intel. Some more general
mechanism could be done...
|
|
Loop end variable 'pinned' was set one too low.
|
|
Pinning the objects avoids accidentally evicting them while binding
other objects.
|
|
Record the last execbuffer sequence for each client.
Record that sequence in the throttle ioctl as the 'throttle sequence'.
Wait for the last throttle sequence in the throttle ioctl.
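
One reading of the three steps above, sketched in user-space C. The struct
and function names are hypothetical, and the sketch assumes that each
throttle call waits on the sequence recorded by the previous throttle call
before recording a new one. The wait itself is left to the caller.

```c
#include <stdint.h>

struct client {
    uint32_t last_exec_seqno;   /* sequence of the latest execbuffer */
    uint32_t throttle_seqno;    /* sequence recorded by the last throttle */
};

/* Called for every execbuffer submitted by this client. */
static void on_execbuffer(struct client *c, uint32_t seqno)
{
    c->last_exec_seqno = seqno;
}

/*
 * Returns the sequence number the caller should wait for (0 means there
 * is nothing to wait for yet), then records the latest execbuffer
 * sequence as the new throttle sequence.
 */
static uint32_t throttle(struct client *c)
{
    uint32_t wait_for = c->throttle_seqno;

    c->throttle_seqno = c->last_exec_seqno;
    return wait_for;
}
```

Under this reading a client is allowed to run about one throttle interval
ahead of the hardware before it has to block.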
|
|
When i915_wait_request clears objects from the active list, it may end up
freeing them and not moving them to the inactive list. This ends up
unbinding objects from the GTT without there ever being new objects visible
to i915_gem_evict_something on the inactive list. As the only success
condition required the presence of objects on the inactive list, this would
falsely assume that no GTT space had been made available, and end up
returning -ENOMEM to the application.
|
|
We want request retirement to occur about once a second when the request
queue is non-empty. This was done with a timer that queued a work_struct;
using a delayed_work instead makes a lot more sense.
|
|
i915_add_request was calling schedule_delayed_work before adding the request
to the list; it makes more sense to do that last.
|
|
|
|
We no longer need to use it to protect against shared ringbuffer access.
|
|
|
|
This is the create (may want location flags), pread/pwrite/mmap
(performance tuning hints), and set_domain (will 32 bits be enough for
everyone?) ioctls. Left in the generic set are just flink/open/close.
The 2D driver must be updated for this change, and API but not ABI is broken
for 3D. The driver version is bumped to mark this.
|
|
They are now unnecessary since the kernel's the only thing touching the ring.
|
|
This requires that the X Server use the execbuf interface for buffer
submission, as it no longer has direct access to the ring. This is
therefore a flag day for the gem interface.
This also adds enter/leavevt ioctls for use by the X Server. These would
get stubbed out in a modesetting implementation, but are required while
in an environment where the device's state is only managed by the DRM while
X has the VT.
|
|
The driver can know what hardware requires MI_BATCH_BUFFER vs
MI_BATCH_BUFFER_START; there's no reason to let user mode configure this.
|
|
Without the user IRQ running constantly, there's no wakeup when the ring
empties to go retire requests and free buffers. Use a 1 second timer to make
that happen more often.
|
|
Instead of throttling at execbuffer time, have the application ask to
throttle explicitly. This allows the throttle to happen less often, and
without holding the DRM lock.
|
|
|