Meeting the needs of your business from a distance

Ensure Smallest .NET Install Download for Clients

by Mark Shiffer 26. August 2008 20:20
If you distribute software built on .NET, you have more than likely run into the problem of making sure your clients have the correct version of the .NET Framework installed. Microsoft makes that simple enough with bootstrapper installs. However, if you want to keep the number of bits your clients download to a minimum, so that your products are easier to distribute, the following website might help. Scott Hanselman had a recent post explaining the different sizes of the framework versions, and he created a website that detects your installed framework version and tells you what you will need to download. Could be a useful tool. Check it out here.

Research | Websites | Programming

Free digital magazines

by Mark Shiffer 25. August 2008 21:58
Ran across this site and thought it was an interesting concept. If you just want to drop in and check out a magazine, or want to read an article or two without having to buy or subscribe, this is a way to do it. Check out Mygazines.

Shortcuts | Research | Websites

Organize Usings Across Your Entire Solution

by Mark Shiffer 25. August 2008 20:00

Macro to organize (remove and sort) using clauses in Visual Studio. (Original Source)

Imports System
Imports EnvDTE

Public Module Module1
    Sub OrganizeSolution()
        Dim sol As Solution = DTE.Solution
        For i As Integer = 1 To sol.Projects.Count
            OrganizeProject(sol.Projects.Item(i))
        Next
    End Sub
 
    Private Sub OrganizeProject(ByVal proj As Project)
        For i As Integer = 1 To proj.ProjectItems.Count
            OrganizeProjectItem(proj.ProjectItems.Item(i))
        Next
    End Sub
 
    Private Sub OrganizeProjectItem(ByVal projectItem As ProjectItem)
        Dim fileIsOpen As Boolean = False
        If projectItem.Kind = Constants.vsProjectItemKindPhysicalFile Then
            'If this is a C# file (EndsWith avoids a false match on two-character names without an extension)
            If projectItem.Name.EndsWith(".cs", System.StringComparison.OrdinalIgnoreCase) Then
                'Set flag to true if file is already open 
                fileIsOpen = projectItem.IsOpen
                Dim window As Window = projectItem.Open(Constants.vsViewKindCode)
                window.Activate()
                projectItem.Document.DTE.ExecuteCommand("Edit.RemoveAndSort")
                'Only close the file if it was not already open 
                If Not fileIsOpen Then
                    window.Close(vsSaveChanges.vsSaveChangesYes)
                End If
            End If
        End If
        'Be sure to apply RemoveAndSort on all of the ProjectItems. 
        If Not projectItem.ProjectItems Is Nothing Then
            For i As Integer = 1 To projectItem.ProjectItems.Count
                OrganizeProjectItem(projectItem.ProjectItems.Item(i))
            Next
        End If
        'Apply RemoveAndSort on a SubProject if it exists. 
        If Not projectItem.SubProject Is Nothing Then
            OrganizeProject(projectItem.SubProject)
        End If
    End Sub
End Module

Tools | Programming

JavaScript API frameworks

by Mark Shiffer 14. August 2008 17:13

I was doing some research and came across a list of some JavaScript frameworks. I have not been doing a whole lot of web development lately, but I have definitely been exposed to the pains of JavaScript and the compatibility problems it has across browsers. The idea of starting from a framework that handles many of the incompatibilities and accelerates JavaScript coding sounds great to me. Eventually I'll pick up one (or a few) of these APIs and give them a run.

There are several popular JavaScript API frameworks to choose from:

  1. Prototype and Script.aculo.us
  2. jQuery
  3. Yahoo UI Library
  4. ExtJS
  5. Dojo
  6. MooTools

Research | Programming

Most Common Performance Issues in Parallel Programs

by Mark Shiffer 13. August 2008 14:35

This post comes from the Microsoft Parallel Team and provides a pretty good primer on some parallel issues. The original post can be found here.

Since the goal of Parallel Extensions is to simplify parallel programming, and the motivation behind parallel programming is performance, it is not surprising that many of the questions we receive about our CTP releases are performance-related.

Developers ask why one program shows a parallel speedup but another one does not, or how to modify a program so that it scales better on multi-core machines.

The answers tend to vary. In some cases, the problem observed is a clear issue in our code base that is relatively easy to fix. In other cases, the problem is that Parallel Extensions does not adapt well to a particular workload. This may be harder to fix on our side, but understanding real-world workloads is a first step toward doing that, so please continue to send us your feedback. In yet another class of issues, the problem is in the way the program uses Parallel Extensions, and there is little we can do on our side to resolve the issue.

Interestingly, it turns out that there are common patterns that underlie most performance issues, regardless of whether the source of the problem is in our code or the user code. In this blog posting, we will walk through the common performance issues to take into consideration while developing applications using Parallel Extensions.

Amount of Parallelizable CPU-Bound Work 

The number one requirement for parallelization is that the program must have enough work that can be performed in parallel. If only half of the work can be parallelized, Amdahl’s Law dictates that we are not going to be able to speed up the program by more than a factor of two.

Also, additional CPUs thrown at a task will help the most if the CPU was the performance bottleneck. If the program spends 90% of its time waiting for a server to execute SQL queries, then parallelizing the program likely will not achieve significant benefits. (Well, you may still observe a speedup if there are multiple requests that we can send, say to multiple servers. In such cases, though, the asynchronous programming model would likely result in even better performance.)

To benefit from parallelism, the total amount of processor-intensive work in a program must be large enough to dwarf the overheads of parallelization, and a large fraction of that work must be decomposable to run in parallel.
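
To make the Amdahl's Law bound above concrete, here is a minimal C# sketch. The class and method names are illustrative and are not part of Parallel Extensions. It computes the best-case speedup 1 / ((1 - p) + p / n) for a parallelizable fraction p on n cores, and it ignores all scheduling overhead.

```csharp
using System;

public static class Amdahl
{
    // Best-case speedup when parallelFraction (0..1) of the work can run
    // in parallel on the given number of cores; the rest stays sequential.
    public static double Speedup(double parallelFraction, int cores)
    {
        if (parallelFraction < 0.0 || parallelFraction > 1.0)
            throw new ArgumentOutOfRangeException("parallelFraction");
        if (cores < 1)
            throw new ArgumentOutOfRangeException("cores");

        double sequentialFraction = 1.0 - parallelFraction;
        return 1.0 / (sequentialFraction + parallelFraction / cores);
    }
}
```

With half of the work parallelizable, `Amdahl.Speedup(0.5, 2)` is about 1.33 and `Amdahl.Speedup(0.5, 64)` is about 1.97; no core count pushes the result to 2.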

Task Granularity

Even if a program does a lot of parallelizable work, we must be careful to ensure that we will split up the work into appropriately-sized chunks which will execute in parallel. If we create too many chunks, the overheads of managing and scheduling the chunks will be large. If we create too few chunks, some cores on the machine will have nothing to do.

In some parts of the Parallel Extensions API, such as Parallel.For and PLINQ, the code in our library is responsible for deciding on the proper granularity of tasks. In other parts of the API, such as tasks and futures, it is the responsibility of the user code. Regardless of who is responsible for creating the tasks, appropriate task granularity is important in order to achieve good performance.
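
When the user code is responsible for granularity, one common approach is to cut an index range into fixed-size chunks and hand each chunk to a task. The sketch below shows only the chunking step; `Chunk` and `Chunking.Split` are hypothetical names, and the right chunk size has to be found by measurement.

```csharp
using System;
using System.Collections.Generic;

public struct Chunk
{
    public readonly int Start; // inclusive
    public readonly int End;   // exclusive
    public Chunk(int start, int end) { Start = start; End = end; }
}

public static class Chunking
{
    // Splits [0, length) into contiguous chunks of at most chunkSize items.
    // A small chunkSize yields many chunks (more scheduling overhead);
    // a large chunkSize yields few chunks (cores may sit idle).
    public static List<Chunk> Split(int length, int chunkSize)
    {
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        if (chunkSize < 1) throw new ArgumentOutOfRangeException("chunkSize");

        var chunks = new List<Chunk>();
        int start = 0;
        while (start < length)
        {
            int end = start + Math.Min(chunkSize, length - start);
            chunks.Add(new Chunk(start, end));
            start = end;
        }
        return chunks;
    }
}
```

For example, `Chunking.Split(10, 4)` produces three chunks covering [0, 4), [4, 8) and [8, 10).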

Load Balancing

Even if there is enough parallelizable CPU work to make parallelism worthwhile, we need to ensure that the work will be evenly distributed among cores on the machine. This is complicated by the fact that different “chunks” of work may differ widely in the time required to execute them. Also, we often don’t know how much work each chunk will require until we have run it to completion.

For example, if Parallel.For were to assume that all iterations of the loop take the same amount of time, we could simply divide the range into as many contiguous ranges as we have cores, and assign each range to one core. Unfortunately, since the work per iteration may vary, it is possible that one core will end up with many expensive iterations, while other cores will have less work to do. In an extreme situation, one core may end up with nearly all work, and we are back to the sequential case.

Fortunately, our implementation of Parallel.For provides load balancing that should work well for most workloads. But, you are likely to encounter the load-balancing problem when writing your own concurrent code.
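
The imbalance described above is easy to see with a little arithmetic. This sketch is not how Parallel.For partitions work; it only models the naive scheme of one contiguous range per core. It adds up a made-up per-iteration cost for each core, and all names are illustrative.

```csharp
using System;

public static class StaticPartition
{
    // Returns the total cost assigned to each core when the iterations are
    // divided into one contiguous range per core. costs[i] is an assumed,
    // known-in-advance cost of iteration i, used only for illustration.
    public static long[] ContiguousLoads(int[] costs, int cores)
    {
        if (costs == null) throw new ArgumentNullException("costs");
        if (cores < 1) throw new ArgumentOutOfRangeException("cores");

        var loads = new long[cores];
        for (int c = 0; c < cores; c++)
        {
            // Core c gets iterations [start, end).
            int start = (int)((long)costs.Length * c / cores);
            int end = (int)((long)costs.Length * (c + 1) / cores);
            for (int i = start; i < end; i++)
                loads[c] += costs[i];
        }
        return loads;
    }
}
```

With costs `{ 1, 1, 1, 1, 10, 10, 10, 10 }` and two cores, the first core gets 4 units of work and the second gets 40, so the second core determines the running time.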

Memory Allocations and Garbage Collection

Some programs spend a lot of time in memory allocations and garbage collections. For example, programs that manipulate strings tend to allocate a lot of memory, particularly if they are not designed carefully to prevent unnecessary allocations.

Unfortunately, allocating memory is an operation that may require synchronization. After all, we need to ensure that memory regions allocated by different threads will not overlap.

Perhaps even more seriously, allocating a lot of memory typically means that we will also need to do a lot of garbage collection work to reclaim memory that has been freed. If the garbage collection dominates the running time of your program, the program will only scale as well as the garbage collection algorithm.

It is possible to mitigate this issue by turning on the server GC. See Chris Lyon’s blog post Server, Workstation and Concurrent GC for more information about how server GC works and how to enable it.
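
As a quick check of which collector a process is actually running under, the `GCSettings.IsServerGC` property can be queried at startup. The wrapper class below is a hypothetical name used only for this sketch.

```csharp
using System;
using System.Runtime;

public static class GcMode
{
    // Server GC is typically switched on in the application's config file:
    //   <configuration>
    //     <runtime>
    //       <gcServer enabled="true"/>
    //     </runtime>
    //   </configuration>
    // This method only reports which mode the current process ended up with.
    public static string Describe()
    {
        return GCSettings.IsServerGC ? "server" : "workstation";
    }
}
```

Logging `GcMode.Describe()` once at startup makes it obvious whether the setting took effect on the machine where the program runs.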

False Cache-Line Sharing

In order to explain this particular performance problem of parallel programs, let’s quickly review a few details about how caches work on today’s mainstream computers. When a CPU reads a value from main memory, it copies the value to cache, so that subsequent accesses to that value are much faster. In fact, rather than bringing just that particular value into cache, the CPU also brings in nearby memory locations, because if a program reads a particular memory location, chances are that it is going to read nearby values too. So, values are moved between main memory and cache in chunks called cache lines, typically 64 or 128 bytes in size.

One problem that arises on machines with multiple cores is that if one core writes to a particular memory location, the copy of that memory location cached by another core gets invalidated. Then, the core with an invalid cached copy must go all the way to the main memory on the next read of that memory location. So, if two cores keep writing and reading a particular memory location, they may end up continuously invalidating each other’s caches, sometimes dramatically reducing the performance of the program.

The trickiest part of the issue is that the two cores don’t even have to be writing into the same memory location. The same problem occurs also if they are writing into two memory locations that are on the same cache line (hence the term “false cache-line sharing”).

Strangely, this problem turns up in practice fairly regularly. For example, a parallel program that computes the sum of integers in an array will typically have a separate intermediate result for each thread. The intermediate results are likely to be either elements in an array or fields in a class. And, in both cases, they would likely end up close in memory.

There are various techniques to prevent false sharing, or to at least make it unlikely: padding data structures with garbage data, allocating them in an order that makes false sharing less likely, or allocating them by different threads.
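
The padding technique can be sketched for the array-sum example as follows. This is an illustration under assumptions, not a tuned implementation: the 128-byte slot size is a guess at an upper bound on the cache-line size, and `PaddedLong` and `PaddedSum` are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Each slot occupies 128 bytes, so two threads' partial sums are unlikely
// to land on the same cache line (128 is an assumed cache-line bound).
[StructLayout(LayoutKind.Explicit, Size = 128)]
public struct PaddedLong
{
    [FieldOffset(0)] public long Value;
}

public static class PaddedSum
{
    public static long Sum(int[] data, int threadCount)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (threadCount < 1) throw new ArgumentOutOfRangeException("threadCount");

        var partial = new PaddedLong[threadCount];
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            int id = t; // capture a per-thread copy of the loop variable
            threads[t] = new Thread(() =>
            {
                int start = (int)((long)data.Length * id / threadCount);
                int end = (int)((long)data.Length * (id + 1) / threadCount);
                for (int i = start; i < end; i++)
                    partial[id].Value += data[i];
            });
            threads[t].Start();
        }

        long total = 0;
        for (int t = 0; t < threadCount; t++)
        {
            threads[t].Join();
            total += partial[t].Value;
        }
        return total;
    }
}
```

An even simpler fix for this particular loop is to accumulate into a local variable and write the shared slot once at the end; the padded version is shown because it matches the technique named in the text.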

Locality Issues

Sometimes modifying a program to run in parallel negatively affects locality. For example, let’s say that we want to perform an operation on each element in an array. On a dual-core machine, we could create two tasks: one to perform the operation on the elements with even indices and one to handle odd indices.

But, as a negative consequence, the locality of reference degrades for each thread. The elements that a thread needs to access are interleaved with elements it does not care about. Each cache line will contain only half as many elements as it did previously, which may mean that twice as many memory accesses will go all the way to the main memory.

In this particular case, one solution is to split up the array into a left half and a right half, rather than odd and even elements. In more complicated cases, the problem and the solution may not be as obvious, so the effect of parallelization on locality of reference is one of the things to keep in mind when designing parallel algorithms.
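
The two ways of dividing the array can be compared by listing the indices each worker touches. In this sketch (all names are illustrative), `Interleaved` is the even/odd scheme generalized to any number of workers, and `Blocked` is the left-half/right-half scheme.

```csharp
using System;
using System.Collections.Generic;

public static class Partitioning
{
    // Worker w touches w, w + workers, w + 2 * workers, ...
    // Neighbouring elements on a cache line belong to other workers.
    public static IEnumerable<int> Interleaved(int length, int worker, int workers)
    {
        for (int i = worker; i < length; i += workers)
            yield return i;
    }

    // Worker w touches one contiguous range, so every cache line it
    // loads is filled with elements it actually needs.
    public static IEnumerable<int> Blocked(int length, int worker, int workers)
    {
        int start = (int)((long)length * worker / workers);
        int end = (int)((long)length * (worker + 1) / workers);
        for (int i = start; i < end; i++)
            yield return i;
    }
}
```

For an array of 8 elements and 2 workers, worker 1 visits 1, 3, 5, 7 under the interleaved scheme and 4, 5, 6, 7 under the blocked scheme.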

Summary 

We discussed the most common reasons why a concurrent program may not achieve a parallel speedup that you would expect. If you keep these issues in mind, you should have an easier time designing and developing parallel programs.

Research | Programming

Work Stealing Queue for Thread Pools

by Mark Shiffer 12. August 2008 20:14

This post comes courtesy of Joe Duffy's blog, reposted for my own use. Sorry for the code formatting; some day I will figure out how to format code properly with this blog editor. Joe has some great insights into the CLR memory model and multi-threading in general. The original post is here.

The primary reason a traditional thread pool doesn’t scale is that there’s a single work queue protected by a global lock.  For obvious reasons, this can easily become a bottleneck.  Two primary things contribute heavily to whether the global lock becomes a limiting factor for a particular workload’s throughput:

  1. As the size of work items becomes smaller, the frequency at which the pool’s threads must acquire the global lock increases. Moving forward, we expect the granularity of latent parallelism to become smaller such that programs can scale as more processors are added.
  2. As more processors are added, the arrival rate at the lock will increase when compared to the same workload run with fewer processors.  This inherently limits the ability to “get more work through” that single straw that is the global queue.

For coarse-grained work items, and for small numbers of processors, these problems simply aren’t too great. That has been the CLR ThreadPool’s forte for quite some time; most work items range in the 1,000s to 10,000s (or more) of CPU cycles, and 8 processors was considered pushing the limits. Clearly the direction the whole industry is headed in exposes these fundamental flaws very quickly. We’d like to enable work items with 100s and 1,000s of cycles, and we must scale well beyond 4, 8, 16, 32, 64, ... processors.

Decentralized scheduling techniques can be used to combat this problem.  In other words, if we give different components their own work queues, we can eliminate the central bottleneck.  This approach works to a degree but becomes complicated very quickly because clearly we don’t want each such queue to have its own pool of dedicated threads.  So we’d need some way of multiplexing a very dynamic and comparatively large number of work pools onto a mostly-fixed and comparatively small number of OS threads.

Introducing work stealing

Another technique – and the main subject of this blog post – is to use a so-called work stealing queue (WSQ).  A WSQ is a special kind of queue in that it has two ends, and allows lock-free pushes and pops from one end (“private”), but requires synchronization from the other end (“public”).  When the queue is sufficiently small that private and public operations could conflict, synchronization is necessary.  It is array-based and can grow dynamically.  This data structure was made famous in the 90’s when much work on dynamic work scheduling was done in the research community.

In the context of a thread pool, the WSQ can augment the traditional global queue to enable more efficient private queuing and dequeuing.  It works roughly as follows:

  • We still have a global queue protected by a global lock.
  • (We can of course consider the ability to have separate pools to reduce pressure on this.)
  • Each thread in the pool has its own private WSQ.
  • When work is queued from a pool thread, the work goes into the WSQ, avoiding all locking.
  • When work is queued from a non-pool thread, it goes into the global queue.
  • When threads are looking for work, they can have a preferred search order:
    • Check the local WSQ.  Work here can be dequeued without locks.
    • Check the global queue.  Work here must be dequeued using locks.
    • Check other threads’ WSQs.  This is called “stealing”, and requires locks.

If you haven’t guessed, this is by-and-large how the Task Parallel Library (TPL) schedules work.

For workloads that recursively queue a lot of work, the use of a per-thread WSQ substantially reduces the synchronization necessary to complete the work, leading to far better throughput.  There are also fewer cache effects due to sharing of the global queue information.  “Stealing” is our last course of action in the abovementioned search logic, because it has the secondary effect of causing another thread to have to visit the global queue (or steal) sooner.  In some sense, it is double the cost of merely getting an item from the global queue.

Another (subtle) aspect of WSQs is that they are LIFO for private operations and FIFO for steals.  This is inherent in how the WSQ’s synchronization works (and is key to enabling lock-freedom), but has additional rationale:

  1. By executing the work most recently pushed into the queue in LIFO order, chances are that memory associated with it will still be hot in the cache.
  2. By stealing in FIFO order, chances are that a larger “chunk” of work will be stolen (possibly reducing the chance of needing additional steals).  The reason for this is that many work stealing workloads are divide-and-conquer in nature; in such cases, the recursion forms a tree, and the oldest items in the queue lie closer to the root; hence, stealing one of those implicitly also steals a (potentially) large subtree of computations that will unfold once that piece of work is stolen and run.

This decision clearly changes the regular order of execution when compared to a mostly-FIFO system, and is the reason we’re contemplating exposing options to control this behavior from TPL.
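
The search order and the LIFO/FIFO behaviour described above can be sketched with a deliberately simplified deque. This is not the lock-free queue developed later in this post and not TPL's scheduler: every operation here takes a lock, and `SimpleDeque` and `Scheduler.TryFindWork` are hypothetical names that show only the ordering semantics.

```csharp
using System.Collections.Generic;

// Fully locked stand-in for a WSQ: private end is the tail, public end is the head.
public class SimpleDeque<T>
{
    private readonly LinkedList<T> m_items = new LinkedList<T>();

    public void LocalPush(T item)
    {
        lock (m_items) { m_items.AddLast(item); }
    }

    // Private end: newest item first (LIFO), likely still hot in cache.
    public bool TryLocalPop(out T item)
    {
        lock (m_items)
        {
            if (m_items.Count == 0) { item = default(T); return false; }
            item = m_items.Last.Value;
            m_items.RemoveLast();
            return true;
        }
    }

    // Public end: oldest item first (FIFO), likely the largest subtree of work.
    public bool TrySteal(out T item)
    {
        lock (m_items)
        {
            if (m_items.Count == 0) { item = default(T); return false; }
            item = m_items.First.Value;
            m_items.RemoveFirst();
            return true;
        }
    }
}

public static class Scheduler
{
    // Preferred search order: local queue, then global queue, then steal.
    public static bool TryFindWork<T>(SimpleDeque<T> local, Queue<T> global,
                                      IEnumerable<SimpleDeque<T>> others, out T item)
    {
        if (local.TryLocalPop(out item))
            return true;

        lock (global)
        {
            if (global.Count > 0) { item = global.Dequeue(); return true; }
        }

        foreach (SimpleDeque<T> victim in others)
        {
            if (!ReferenceEquals(victim, local) && victim.TrySteal(out item))
                return true;
        }

        item = default(T);
        return false;
    }
}
```

After pushing 1, 2 and 3 onto a `SimpleDeque<int>`, a local pop returns 3 while a steal returns 1, which is the asymmetry the two numbered points above rely on.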

A simple WorkStealingQueue<T> type

With all that background behind us, let’s jump straight into a really simple implementation of a work stealing queue written in C#.

public class WorkStealingQueue<T>
{

The queue is array-based, and we keep two indexes: a head and a tail. The tail represents the private end and the head represents the public end. We also maintain a mask that is always equal to the size of the list minus one, helping with some of the bounds-checking arithmetic and handling automatic wraparound for indexing into the array. Because of the way we use the mask (we will assume all legal bits for indexing into the list are on), the count must always be a power of two. We arbitrarily select the number 32 as the queue’s initial (power of two) size.
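
As a standalone aside, separate from the class being built here, the mask arithmetic can be checked in isolation. `MaskDemo` is a hypothetical helper, not part of the queue.

```csharp
public static class MaskDemo
{
    // Maps an ever-increasing, non-negative logical index onto a slot of an
    // array whose length is a power of two. For such lengths, index & mask
    // gives the same result as index % length, with mask == length - 1.
    public static int Slot(int index, int mask)
    {
        return index & mask;
    }

    // When the array doubles, shift the mask left by one and turn the
    // least significant bit back on: 31 (11111b) becomes 63 (111111b).
    public static int GrowMask(int mask)
    {
        return (mask << 1) | 1;
    }
}
```

With the initial size of 32 the mask is 31, so logical index 32 wraps to slot 0 and index 33 to slot 1; after one resize the mask is 63.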

 

    private const int INITIAL_SIZE = 32;
    private T[] m_array = new T[INITIAL_SIZE];
    private int m_mask = INITIAL_SIZE - 1;
    private volatile int m_headIndex = 0;
    private volatile int m_tailIndex = 0;
 
We also need a lock to protect the operations that require synchronization.

    private object m_foreignLock = new object();

Although they aren’t exercised very much in the code, we have some helper properties.  The queue is empty when the head is equal to or greater than the tail, and the count can be computed by subtracting the head from the tail.  Because these fields never wrap (because we use the mask), this is correct.

    public bool IsEmpty
    {
        get { return m_headIndex >= m_tailIndex; }
    }

    public int Count
    {
        get { return m_tailIndex - m_headIndex; }
    }
 
OK, let’s get into the meat of the implementation.  Pushing is the obvious place to start, and, for obvious reasons, we only support private pushes.  Public pushes are useless given the protocol explained above, i.e., the only public operation we will support is stealing.  Keep in mind when reading this code that m_tailIndex and m_headIndex are both volatile variables.

 

    public void LocalPush(T obj)
    {
        int tail = m_tailIndex;


First we must check whether there is room in the queue.  To do so, we just see if m_tailIndex is less than the sum of m_mask (the size of the list minus one) and m_headIndex.  False negatives are OK, and are certainly possible because a concurrent steal may come along and take an element, making room, immediately after the check.  We will handle this by synchronizing in a moment.

        if (tail < m_headIndex + m_mask)
        {

If there is indeed room, we can merely stick the object into the array (masking m_tailIndex with m_mask to ensure we’re within the legal range) and then increment m_tailIndex by one.  This may look unsafe, but it is in fact safe: writes retire in order in .NET’s memory model, and we know no other thread is changing m_tailIndex (only private operations write to it) and that no thread will try to access the current array slot into which we’re storing the element.

 

            m_array[tail & m_mask] = obj;
            m_tailIndex = tail + 1;
        }

Otherwise, we need to head down the slow path which involves resizing.

 

        else
        {

We will take the lock and check that we still need to make room.  

 

            lock (m_foreignLock)
            {
                int head = m_headIndex;
                int count = m_tailIndex - m_headIndex;

                if (count >= m_mask)
                {

Assuming we need to make more room, we will just double the size of the array, copy elements, fix up the fields, and move on.  Remember that the array length is always a power of two, so we can get the next power of two by simply bitshifting to the left by one.  We do that for the mask too, but need to remember to “turn on” the least significant bit by oring one into the mask.

 

                    T[] newArray = new T[m_array.Length << 1];
                    for (int i = 0; i < m_array.Length; i++)
                        newArray[i] = m_array[(i + head) & m_mask];

                    m_array = newArray;
                    m_headIndex = 0;
                    m_tailIndex = tail = count;
                    m_mask = (m_mask << 1) | 1;


After we’re done resizing, the m_headIndex is reset to 0, and the m_tailIndex is the previous size of the queue. We can then store into the queue in the same way we would have earlier.

                }

                m_array[tail & m_mask] = obj;
                m_tailIndex = tail + 1;
            }
        }
    }

And that’s that: we’ve added an item into the queue with a local push.  Now let’s look at the reverse: removing an element with a local pop.  Remember, it’s impossible for a local push and pop to interleave with one another because they must be executed by the same thread serially.

 

    public bool LocalPop(ref T obj)
    {

First we read the current value of m_tailIndex.  If the queue is currently empty, i.e., m_headIndex >= m_tailIndex, then we just return false right away.  This is how “emptiness” is conveyed to callers.

 

        int tail = m_tailIndex;
        if (m_headIndex >= tail)
            return false;
 
Next we disable an annoying C# compiler warning.

#pragma warning disable 0420
 
Now we have determined there is at least one element in the queue (or was during our previous check). We will now subtract one from the tail, which effectively removes the element. There is still a chance that we will “lose” in a race with another thread doing a steal, so we’ll need to be very careful. In fact, there is a subtle .NET memory model gotcha to be aware of: we must guarantee our write to take the element does not get trapped in the write buffer beyond a subsequent read of the m_headIndex. If that could happen, we might mistakenly think we took the element, while at the same time a stealing thread thought it took the same element! The result would be that the same item will be dequeued by two threads, which could lead to disaster. In a thread pool, it’d amount to the same work item being run twice. To ensure this reordering can’t happen, we must use an XCHG to perform the write to m_tailIndex.

        tail -= 1;

        Interlocked.Exchange(ref m_tailIndex, tail);
 
We detect whether we lost the race by checking to see if our dequeuing of the element has made the queue empty.  If it hasn’t, we can just read the array element in the new m_tailIndex position and return it.

        if (m_headIndex <= tail)
        {
            obj = m_array[tail & m_mask];
            return true;
        }
        else
        {

Otherwise, we take the lock and see what to do.  This blocks out all steals.  Either we will find that there indeed is an element remaining, and we can just return it as we would have done above, or we must “put the element back” by just incrementing the m_tailIndex.  If we have to back out our modification, we just return false to indicate that the queue has become empty.  We know we aren’t racing with it becoming non-empty because only private pushes are supported.

 

            lock (m_foreignLock)
            {
                if (m_headIndex <= tail)
                {
                    // Element still available. Take it.
                    obj = m_array[tail & m_mask];
                    return true;
                }
                else
                {
                    // We lost the race, element was stolen, restore the tail.
                    m_tailIndex = tail + 1;
                    return false;
                }
            }
        }
    }

Lastly, let’s take a look at the public pop capability.  We allow a timeout to be supplied, because it’s often useful during the stealing logic to use a 0-timeout on the first pass through all the WSQs.  This can help to eliminate lock wait times and more evenly distribute contention across the list of WSQs.
 

    public bool TrySteal(ref T obj, int millisecondsTimeout)
    {

First we acquire the WSQ’s lock, ensuring mutual exclusion among all other concurrent steals, resize operations, and local pops that may make the queue empty.

 

        bool taken = false;
        try
        {
            taken = Monitor.TryEnter(m_foreignLock, millisecondsTimeout);
            if (taken)
            {

Once inside the lock, we must increment m_headIndex by one.  This moves the head towards the tail, and has the effect of taking an element.  Now this part gets quite tricky.  We must ensure that we don’t remove the last element when racing with a local pop that went down its fast path (i.e., it didn’t acquire the lock).  Given two threads racing to take an element—a steal and a local pop—we must ensure precisely one of them “wins”.  Having both succeed will lead to the same element being popped twice, and having neither succeed could lead to reporting back an empty queue when in fact an element exists.

To do that, we will write to the m_headIndex variable to tentatively take the element, and must then read the m_tailIndex right afterward to ensure that the queue is still non-empty.  As with the pop logic earlier, we need to use an XCHG operation to write the m_headIndex field, otherwise we will potentially suffer from a similar legal memory reordering bug.

 

                int head = m_headIndex;

                Interlocked.Exchange(ref m_headIndex, head + 1);
 
If the queue is non-empty, we just read the element as we usually do: by indexing into the array with the head value we read before the increment, using the proper masking. We then return true to indicate an element was found.

 

                if (head < m_tailIndex)
                {
                    obj = m_array[head & m_mask];
                    return true;
                }

Otherwise, the queue is empty and we must return.  Clearly this is racy and by the time we return the queue may be non-empty.  If the pool will subsequently wait for work to arrive, this must be taken into consideration so as not to incur lost wake-ups.

 

                else
                {
                    m_headIndex = head;
                    return false;
                }
            }
        }

We of course need to release the lock at the end of it all.

 

        finally
        {
            if (taken)
                Monitor.Exit(m_foreignLock);
        }

        return false;
    }
}

And that’s it!  As with most lock-free algorithms, the core idea is surprisingly simple but deceptively subtle and intricate.  After seeing it written out and explained in detail, I hope that you’ll have that “Ah hah!” moment that always happens after staring at this kind of code for a little while.  In future posts, we’ll take a closer look at the performance differences between this and a traditional globally synchronized queue, and discuss what it takes to merge the two ideas implementation-wise.

Appendix

For reference, here’s the full code without all the explanation intertwined:

using System;
using System.Threading;

public class WorkStealingQueue<T>
{
    private const int INITIAL_SIZE = 32;
    private T[] m_array = new T[INITIAL_SIZE];
    private int m_mask = INITIAL_SIZE - 1;
    private volatile int m_headIndex = 0;
    private volatile int m_tailIndex = 0;
    private object m_foreignLock = new object();

    public bool IsEmpty
    {
        get { return m_headIndex >= m_tailIndex; }
    }

    public int Count
    {
        get { return m_tailIndex - m_headIndex; }
    }

    public void LocalPush(T obj)
    {
        int tail = m_tailIndex;
        if (tail < m_headIndex + m_mask)
        {
            m_array[tail & m_mask] = obj;
            m_tailIndex = tail + 1;
        }
        else
        {
            lock (m_foreignLock)
            {
                int head = m_headIndex;
                int count = m_tailIndex - m_headIndex;

                if (count >= m_mask)
                {
                    T[] newArray = new T[m_array.Length << 1];
                    for (int i = 0; i < m_array.Length; i++)
                        newArray[i] = m_array[(i + head) & m_mask];

                    // Reset the field values, incl. the mask.
                    m_array = newArray;
                    m_headIndex = 0;
                    m_tailIndex = tail = count;
                    m_mask = (m_mask << 1) | 1;
                }

                m_array[tail & m_mask] = obj;
                m_tailIndex = tail + 1;
            }
        }
    }

    public bool LocalPop(ref T obj)
    {
        int tail = m_tailIndex;
        if (m_headIndex >= tail)
            return false;

#pragma warning disable 0420

        tail -= 1;
        Interlocked.Exchange(ref m_tailIndex, tail);

        if (m_headIndex <= tail)
        {
            obj = m_array[tail & m_mask];
            return true;
        }
        else
        {
            lock (m_foreignLock)
            {
                if (m_headIndex <= tail)
                {
                    // Element still available. Take it.
                    obj = m_array[tail & m_mask];
                    return true;
                }
                else
                {
                    // We lost the race, element was stolen, restore the tail.
                    m_tailIndex = tail + 1;
                    return false;
                }
            }
        }
    }

    // Public so that other threads can steal from this queue.
    public bool TrySteal(ref T obj, int millisecondsTimeout)
    {
        bool taken = false;
        try
        {
            taken = Monitor.TryEnter(m_foreignLock, millisecondsTimeout);
            if (taken)
            {
                int head = m_headIndex;
                Interlocked.Exchange(ref m_headIndex, head + 1);
                if (head < m_tailIndex)
                {
                    obj = m_array[head & m_mask];
                    return true;
                }
                else
                {
                    m_headIndex = head;
                    return false;
                }
            }
        }
        finally
        {
            if (taken)
                Monitor.Exit(m_foreignLock);
        }

        return false;
    }
}

 

Research | Programming

Improving .NET Application Performance and Scalability

by Mark Shiffer 11. August 2008 22:13

This is a bookmark so that I can later go back and read this MSDN site. Seems like it is a topic worth reading about. Improving .NET Application Performance and Scalability

 

Research | Programming

P/Invoke Interop Assistant

by Mark Shiffer 5. August 2008 14:28

I ran across this tool while doing some research. I watched a video on Channel 9 in which its author describes and demos the tool. It looks like it has a pretty easy search function that spits out VB code for the structures and method signatures. There are a few other functions in the program as well, all of which have to do with wrapping native calls for a managed .NET environment.

P/Invoke Interop Assistant is available on CodePlex. The tool helps with converting unmanaged C code to managed P/Invoke signatures and vice versa. Say goodbye to digging through random header files or MSDN documentation to find the right constants, structures and signatures. The P/Invoke Interop Assistant does a smarter translation for you using SAL (Source Code Annotation Language).

 

Research | Tools | Programming

Win32: The library, drive, or media pool must be empty to perform this.

by Mark Shiffer 4. August 2008 15:52
I recently received this error when programmatically attempting to delete an application pool in IIS. Upon examination of IIS, the system thought the application pool was assigned to a virtual directory that no longer existed. I tried to delete the application pool manually in IIS and received a similar error (worded differently though). I searched through all of the virtual directories on the server and did not find any that were referencing the application pool. Still, IIS was showing the application pool as being assigned to a virtual directory. To get around this, I created a virtual directory with the name that it thought it was already assigned to and assigned the application pool to that new virtual directory. I then unassigned the application pool and was able to delete it. Odd...

Issues

Ensuring a Single Instance of an Application

by Mark Shiffer 4. August 2008 15:46
Most people who have been at this for a while immediately think of a Mutex when they need to ensure an application runs only one instance at a time. I recently came across a .NET class that provides a different approach with some added functionality. The class is WindowsFormsApplicationBase. Unfortunately that class is in the Microsoft.VisualBasic.ApplicationServices namespace, so for C# you have to dirty your code with a VB namespace. The class is pretty simple. Just create it and set IsSingleInstance to true. You then have the ability to hook into OnStartupNextInstance, OnShutdown, OnCreateSplashScreen, etc. Worth checking out if you have this need.
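
For comparison, the traditional Mutex approach mentioned at the start can be sketched in a few lines of C#. This shows the Mutex technique only, not WindowsFormsApplicationBase; the `SingleInstance` class and the mutex name passed in are hypothetical.

```csharp
using System;
using System.Threading;

public static class SingleInstance
{
    // Tries to become the one running instance. Returns the owned mutex on
    // success, or null if a mutex with this name already exists. The caller
    // should hold the returned mutex for the life of the process and call
    // ReleaseMutex() and Close() on shutdown.
    public static Mutex TryAcquire(string name)
    {
        bool createdNew;
        var mutex = new Mutex(true, name, out createdNew);
        if (createdNew)
            return mutex;

        // Someone else created it first; drop our handle and report failure.
        mutex.Close();
        return null;
    }
}
```

A typical `Main` would call `SingleInstance.TryAcquire` with an application-specific name and exit immediately when it returns null. Unlike WindowsFormsApplicationBase, this gives the first instance no notification that a second launch was attempted.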

Research | Programming

Copyright © 2001-2010 MS Consulting, Inc. All Rights Reserved.